We noticed a major performance problem when parsing a long list of references similar to this benchmark: https://github.com/markdown-it/markdown-it/blob/master/benchmark/samples/block-ref-list.md
In our case we have a list of 1000+ references.
The root cause seems to be this termination logic (markdown-it/lib/rules_block/reference.mjs, lines 29 to 51 in d07d585):

    const terminatorRules = state.md.block.ruler.getRules('reference')

    const oldParentType = state.parentType
    state.parentType = 'reference'

    for (; nextLine < endLine && !state.isEmpty(nextLine); nextLine++) {
      // this would be a code block normally, but after paragraph
      // it's considered a lazy continuation regardless of what's there
      if (state.sCount[nextLine] - state.blkIndent > 3) { continue }

      // quirk for blockquotes, this line should already be checked by that rule
      if (state.sCount[nextLine] < 0) { continue }

      // Some tags can terminate paragraph without empty line.
      let terminate = false
      for (let i = 0, l = terminatorRules.length; i < l; i++) {
        if (terminatorRules[i](state, nextLine, endLine, true)) {
          terminate = true
          break
        }
      }
      if (terminate) { break }
    }

Removing this logic doesn't break any tests and makes parsing our long list 30x faster 🙀
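To illustrate why this loop might get so expensive on a long list, here is a toy model, not the actual markdown-it code. It assumes the reference rule is entered once per reference line and each time scans forward over every remaining line of the block, trying every terminator rule on each of them. `countTerminatorCalls` and its parameters are hypothetical names used only for this sketch.

```javascript
// Toy model (an assumption, not markdown-it itself): count how many
// terminator-rule calls happen for a block of consecutive reference lines
// with no empty line in between.
function countTerminatorCalls (lineCount, terminatorCount) {
  let calls = 0
  // The rule is entered once for every reference line in the block...
  for (let startLine = 0; startLine < lineCount; startLine++) {
    // ...and each time it walks all following lines of the block.
    for (let nextLine = startLine + 1; nextLine < lineCount; nextLine++) {
      // A plain reference line matches no terminator, so every rule is tried.
      calls += terminatorCount
    }
  }
  return calls
}

console.log(countTerminatorCalls(1000, 7)) // 3496500
```

Under these assumptions the number of calls grows quadratically with the number of lines: 1000 lines and 7 terminator rules already give about 3.5 million rule invocations.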
I tried to find some similar problems and found this thread: #54
I believe this table is incorrect, but I'm not sure (markdown-it/lib/parser_block.js, lines 13 to 25 in d72c68b):

    // First 2 params - rule name & source. Secondary array - list of rules,
    // which can be terminated by this one.
    [ 'table',      require('./rules_block/table'),      [ 'paragraph', 'reference' ] ],
    [ 'code',       require('./rules_block/code') ],
    [ 'fence',      require('./rules_block/fence'),      [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
    [ 'blockquote', require('./rules_block/blockquote'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
    [ 'hr',         require('./rules_block/hr'),         [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
    [ 'list',       require('./rules_block/list'),       [ 'paragraph', 'reference', 'blockquote' ] ],
    [ 'reference',  require('./rules_block/reference') ],
    [ 'html_block', require('./rules_block/html_block'), [ 'paragraph', 'reference', 'blockquote' ] ],
    [ 'heading',    require('./rules_block/heading'),    [ 'paragraph', 'reference', 'blockquote' ] ],
    [ 'lheading',   require('./rules_block/lheading') ],
    [ 'paragraph',  require('./rules_block/paragraph') ]

From the CommonMark spec, I can't see that a reference can be terminated by other rules. It actually seems to be the other way around: a reference can terminate some of the other rules. Am I correct?
I tried changing the table above to the variant below. All the tests pass and performance is still fast:

    const _rules = [
      // First 2 params - rule name & source. Secondary array - list of rules,
      // which can be terminated by this one.
      ['table',      r_table,      ['paragraph']],
      ['code',       r_code],
      ['fence',      r_fence,      ['paragraph', 'blockquote', 'list']],
      ['blockquote', r_blockquote, ['paragraph', 'blockquote', 'list']],
      ['hr',         r_hr,         ['paragraph', 'blockquote', 'list']],
      ['list',       r_list,       ['paragraph', 'blockquote']],
      ['reference',  r_reference,  ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
      ['html_block', r_html_block, ['paragraph', 'blockquote']],
      ['heading',    r_heading,    ['paragraph', 'blockquote']],
      ['lheading',   r_lheading],
      ['paragraph',  r_paragraph]
    ]

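To make the effect of the change concrete, here is a small sketch of how I understand the secondary array to be used: a lookup for a chain name returns every rule whose list contains that name. `terminatorsFor` is a hypothetical helper written for this example and only stands in for what `getRules('reference')` does; the rule sources are left out and only the names and lists from the two tables are kept.

```javascript
// Rule name plus the list of rules it can terminate, copied from the
// two tables above (rule sources omitted for brevity).
const originalRules = [
  ['table', ['paragraph', 'reference']],
  ['code', []],
  ['fence', ['paragraph', 'reference', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'reference', 'blockquote', 'list']],
  ['hr', ['paragraph', 'reference', 'blockquote', 'list']],
  ['list', ['paragraph', 'reference', 'blockquote']],
  ['reference', []],
  ['html_block', ['paragraph', 'reference', 'blockquote']],
  ['heading', ['paragraph', 'reference', 'blockquote']],
  ['lheading', []],
  ['paragraph', []]
]

const proposedRules = [
  ['table', ['paragraph']],
  ['code', []],
  ['fence', ['paragraph', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'blockquote', 'list']],
  ['hr', ['paragraph', 'blockquote', 'list']],
  ['list', ['paragraph', 'blockquote']],
  ['reference', ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
  ['html_block', ['paragraph', 'blockquote']],
  ['heading', ['paragraph', 'blockquote']],
  ['lheading', []],
  ['paragraph', []]
]

// Hypothetical helper: names of the rules that may terminate `chainName`.
function terminatorsFor (rules, chainName) {
  return rules
    .filter(([, canTerminate]) => canTerminate.includes(chainName))
    .map(([name]) => name)
}

console.log(terminatorsFor(originalRules, 'reference'))
// [ 'table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading' ]
console.log(terminatorsFor(proposedRules, 'reference'))
// []
```

If this reading is right, the proposed table leaves the terminator list for `'reference'` empty, so the inner loop in the reference rule has nothing to call, while the list for `'paragraph'` is unchanged.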
Could someone check if my understanding is correct? I would be happy to open a PR.