Major performance issue when parsing a long list of reference links #996

@RomanHotsiy

We noticed a major performance problem when parsing a long list of references similar to this benchmark: https://github.com/markdown-it/markdown-it/blob/master/benchmark/samples/block-ref-list.md

In our case we have a list of 1000+ references.

The root cause seems to be this termination logic:

const terminatorRules = state.md.block.ruler.getRules('reference')
const oldParentType = state.parentType
state.parentType = 'reference'
for (; nextLine < endLine && !state.isEmpty(nextLine); nextLine++) {
  // this would be a code block normally, but after paragraph
  // it's considered a lazy continuation regardless of what's there
  if (state.sCount[nextLine] - state.blkIndent > 3) { continue }
  // quirk for blockquotes, this line should already be checked by that rule
  if (state.sCount[nextLine] < 0) { continue }
  // Some tags can terminate paragraph without empty line.
  let terminate = false
  for (let i = 0, l = terminatorRules.length; i < l; i++) {
    if (terminatorRules[i](state, nextLine, endLine, true)) {
      terminate = true
      break
    }
  }
  if (terminate) { break }
}

Removing this logic doesn't break any tests and makes parsing our long list 30x faster 🙀
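A rough way to see why this loop could be so expensive is the toy cost model below. It rests on an assumption that is not verified here: the reference rule is invoked once per definition, and each invocation scans every remaining line up to the next empty line, trying every terminator rule on each one. `countTerminatorCalls` and its parameters are hypothetical names for illustration, not markdown-it APIs. The count of 7 terminators is taken from the rules that list 'reference' in the table further down.

```javascript
// Toy cost model, not markdown-it code.
// Assumption: the rule runs once per reference definition, and each run
// scans all remaining lines of the block, calling every terminator rule
// on each scanned line.
function countTerminatorCalls(lineCount, terminatorCount) {
  let calls = 0
  for (let startLine = 0; startLine < lineCount; startLine++) {
    // Scan from the line after the current definition to the end of the block.
    for (let nextLine = startLine + 1; nextLine < lineCount; nextLine++) {
      calls += terminatorCount
    }
  }
  return calls
}

console.log(countTerminatorCalls(1000, 7)) // 3496500
console.log(countTerminatorCalls(1000, 0)) // 0
```

Under that assumption the work grows quadratically with the number of consecutive references, so a list of 1000+ lines without blank lines between them would trigger millions of terminator calls.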

I looked for similar problems and found this thread: #54

I believe this table is incorrect but I'm not sure:

// First 2 params - rule name & source. Secondary array - list of rules,
// which can be terminated by this one.
[ 'table', require('./rules_block/table'), [ 'paragraph', 'reference' ] ],
[ 'code', require('./rules_block/code') ],
[ 'fence', require('./rules_block/fence'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
[ 'blockquote', require('./rules_block/blockquote'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
[ 'hr', require('./rules_block/hr'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
[ 'list', require('./rules_block/list'), [ 'paragraph', 'reference', 'blockquote' ] ],
[ 'reference', require('./rules_block/reference') ],
[ 'html_block', require('./rules_block/html_block'), [ 'paragraph', 'reference', 'blockquote' ] ],
[ 'heading', require('./rules_block/heading'), [ 'paragraph', 'reference', 'blockquote' ] ],
[ 'lheading', require('./rules_block/lheading') ],
[ 'paragraph', require('./rules_block/paragraph') ]

From the CommonMark spec, I can't see that a reference can be terminated by other rules. It seems to be the other way around: a reference can terminate some of the other rules. Am I correct?

I tried modifying the code above to the variant below: all the tests pass and performance is still fast:

const _rules = [
  // First 2 params - rule name & source. Secondary array - list of rules,
  // which can be terminated by this one.
  ['table',      r_table,      ['paragraph']],
  ['code',       r_code],
  ['fence',      r_fence,      ['paragraph', 'blockquote', 'list']],
  ['blockquote', r_blockquote, ['paragraph', 'blockquote', 'list']],
  ['hr',         r_hr,         ['paragraph', 'blockquote', 'list']],
  ['list',       r_list,       ['paragraph', 'blockquote']],
  ['reference',  r_reference,  ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
  ['html_block', r_html_block, ['paragraph', 'blockquote']],
  ['heading',    r_heading,    ['paragraph', 'blockquote']],
  ['lheading',   r_lheading],
  ['paragraph',  r_paragraph]
]
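To make the difference between the two tables concrete, here is a simplified model of how the third column could translate into a terminator chain, going by the comment "list of rules, which can be terminated by this one". `getTerminators` is a hypothetical stand-in for `state.md.block.ruler.getRules(...)`, and the rule functions are replaced by their names; the real Ruler may behave differently.

```javascript
// Simplified stand-in for the ruler lookup, not markdown-it's implementation.
// Each entry: [ruleName, chainsThisRuleCanTerminate].
const originalTable = [
  ['table',      ['paragraph', 'reference']],
  ['code',       []],
  ['fence',      ['paragraph', 'reference', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'reference', 'blockquote', 'list']],
  ['hr',         ['paragraph', 'reference', 'blockquote', 'list']],
  ['list',       ['paragraph', 'reference', 'blockquote']],
  ['reference',  []],
  ['html_block', ['paragraph', 'reference', 'blockquote']],
  ['heading',    ['paragraph', 'reference', 'blockquote']],
  ['lheading',   []],
  ['paragraph',  []]
]

const proposedTable = [
  ['table',      ['paragraph']],
  ['code',       []],
  ['fence',      ['paragraph', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'blockquote', 'list']],
  ['hr',         ['paragraph', 'blockquote', 'list']],
  ['list',       ['paragraph', 'blockquote']],
  ['reference',  ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
  ['html_block', ['paragraph', 'blockquote']],
  ['heading',    ['paragraph', 'blockquote']],
  ['lheading',   []],
  ['paragraph',  []]
]

// Return the names of all rules allowed to terminate the given chain.
function getTerminators(table, chainName) {
  return table
    .filter(([, chains]) => chains.includes(chainName))
    .map(([name]) => name)
}

console.log(getTerminators(originalTable, 'reference'))
// [ 'table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading' ]
console.log(getTerminators(proposedTable, 'reference'))
// []
```

In this model the proposed table leaves the 'reference' chain empty, so the inner terminator loop has nothing to call. That would be consistent with the speedup, but it also means no rule can interrupt a reference block, which is exactly the behavior that needs checking against the spec.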

Could someone check if my understanding is correct? I would be happy to open a PR.
