Major performance issue when parsing a long list of reference links #996

We noticed a major performance problem when parsing a long list of references similar to this benchmark: https://github.com/markdown-it/markdown-it/blob/master/benchmark/samples/block-ref-list.md

In our case we have a list of 1000+ references.
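For anyone who wants to reproduce this, a synthetic input along these lines might be enough. This is only a sketch: `buildReferenceList` and `timeRender` are hypothetical helpers, and the `[ref-N]: https://example.com/N` line format is an assumption, not copied from the benchmark sample.

```js
// Hypothetical helper: builds `count` reference definitions, one per line.
// The exact line format is an assumption, not taken from the benchmark file.
function buildReferenceList(count) {
  const lines = []
  for (let i = 1; i <= count; i++) {
    lines.push(`[ref-${i}]: https://example.com/${i}`)
  }
  return lines.join('\n') + '\n'
}

// Hypothetical helper: times one call of `render(src)` in milliseconds.
// `render` is whatever parser entry point is being measured.
function timeRender(render, src) {
  const start = Date.now()
  render(src)
  return Date.now() - start
}

const src = buildReferenceList(1000)
// Placeholder render function so the sketch runs without dependencies;
// swap in the real parser call when measuring.
const elapsed = timeRender((text) => text.split('\n').length, src)
console.log(`${src.split('\n').length - 1} references, ${elapsed} ms`)
```

With markdown-it installed, the placeholder could be replaced by something like `(text) => md.render(text)` on a `MarkdownIt` instance.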


The root cause seems to be this termination logic:
https://github.com/markdown-it/markdown-it/blob/d07d585b6b15aaee2bc8f7a54b994526dad4dbc5/lib/rules_block/reference.mjs#L29-L51

Removing this logic doesn't break any tests and makes parsing our long list 30x faster 🙀
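A rough way to see why this loop could dominate: assuming the rule is invoked once per reference line and, in a block with no blank lines, scans every remaining line while trying each terminator rule, the number of terminator calls grows quadratically with the list length. The sketch below only counts calls in that simplified model. `countTerminatorCalls` is a hypothetical helper, and 7 is the number of entries that list `'reference'` in the linked rule table; none of this is a measurement.

```js
// Simplified cost model, not markdown-it code.
// Assumption: one rule invocation per line, each scanning to the end of the block.
function countTerminatorCalls(lineCount, terminatorCount) {
  let calls = 0
  for (let startLine = 0; startLine < lineCount; startLine++) {
    // Scan from the line after startLine to the end of the block.
    for (let nextLine = startLine + 1; nextLine < lineCount; nextLine++) {
      // Worst case: every terminator rule is tried and none matches.
      calls += terminatorCount
    }
  }
  return calls
}

console.log(countTerminatorCalls(1000, 7)) // 3496500
```

Under these assumptions, 1000 consecutive references trigger about 3.5 million terminator checks, which would be consistent with the loop being the hot spot.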

I tried to find some similar problems and found this thread: https://github.com/markdown-it/markdown-it/issues/54

I believe this table is incorrect but I'm not sure:

https://github.com/markdown-it/markdown-it/blob/d72c68b520cedacae7878caa92bf7fe32e3e0e6f/lib/parser_block.js#L13-L25

From the CommonMark spec, I can't see that a reference can be terminated by other rules. It actually seems to be the other way around: a reference can terminate some of the other rules. Am I correct?

I tried modifying the table above to the variant below; all the tests pass and performance is still fast:

```js
const _rules = [
  // First 2 params - rule name & source. Secondary array - list of rules,
  // which can be terminated by this one.
  ['table',      r_table,      ['paragraph']],
  ['code',       r_code],
  ['fence',      r_fence,      ['paragraph', 'blockquote', 'list']],
  ['blockquote', r_blockquote, ['paragraph', 'blockquote', 'list']],
  ['hr',         r_hr,         ['paragraph', 'blockquote', 'list']],
  ['list',       r_list,       ['paragraph', 'blockquote']],
  ['reference',  r_reference,  ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
  ['html_block', r_html_block, ['paragraph', 'blockquote']],
  ['heading',    r_heading,    ['paragraph', 'blockquote']],
  ['lheading',   r_lheading],
  ['paragraph',  r_paragraph]
]
```
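To make the effect of this change easier to check, here is a simplified model of how the table might be consumed. It assumes that the third element of each entry lists the chains in which that rule acts as a terminator, and that `getRules('reference')` returns the rules registered for the `'reference'` chain. `terminatorsFor` is a hypothetical stand-in for that lookup, not the real Ruler code, and the rule functions are omitted.

```js
// Simplified model of chain lookup, not the real Ruler implementation.
// Each entry: [ruleName, chainsThisRuleCanTerminate].
const originalTable = [
  ['table', ['paragraph', 'reference']],
  ['code', []],
  ['fence', ['paragraph', 'reference', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'reference', 'blockquote', 'list']],
  ['hr', ['paragraph', 'reference', 'blockquote', 'list']],
  ['list', ['paragraph', 'reference', 'blockquote']],
  ['reference', []],
  ['html_block', ['paragraph', 'reference', 'blockquote']],
  ['heading', ['paragraph', 'reference', 'blockquote']],
  ['lheading', []],
  ['paragraph', []]
]

const proposedTable = [
  ['table', ['paragraph']],
  ['code', []],
  ['fence', ['paragraph', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'blockquote', 'list']],
  ['hr', ['paragraph', 'blockquote', 'list']],
  ['list', ['paragraph', 'blockquote']],
  ['reference', ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
  ['html_block', ['paragraph', 'blockquote']],
  ['heading', ['paragraph', 'blockquote']],
  ['lheading', []],
  ['paragraph', []]
]

// Hypothetical stand-in for getRules(chain): names of rules listed for `chain`.
function terminatorsFor(table, chain) {
  return table
    .filter(([, chains]) => chains.includes(chain))
    .map(([name]) => name)
}

console.log(terminatorsFor(originalTable, 'reference'))
// [ 'table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading' ]
console.log(terminatorsFor(proposedTable, 'reference'))
// []
```

If this model is right, the variant leaves the `'reference'` chain empty, so the terminator loop in the reference rule has nothing to call. That would explain why it performs the same as removing the logic outright.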

Could someone check if my understanding is correct? I would be happy to open a PR.

The termination logic linked above (`lib/rules_block/reference.mjs`, lines 29-51):

```js
const terminatorRules = state.md.block.ruler.getRules('reference')

const oldParentType = state.parentType
state.parentType = 'reference'

for (; nextLine < endLine && !state.isEmpty(nextLine); nextLine++) {
  // this would be a code block normally, but after paragraph
  // it's considered a lazy continuation regardless of what's there
  if (state.sCount[nextLine] - state.blkIndent > 3) { continue }

  // quirk for blockquotes, this line should already be checked by that rule
  if (state.sCount[nextLine] < 0) { continue }

  // Some tags can terminate paragraph without empty line.
  let terminate = false
  for (let i = 0, l = terminatorRules.length; i < l; i++) {
    if (terminatorRules[i](state, nextLine, endLine, true)) {
      terminate = true
      break
    }
  }
  if (terminate) { break }
}
```

The rule table linked above (`lib/parser_block.js`, lines 13-25):

```js
  // First 2 params - rule name & source. Secondary array - list of rules,
  // which can be terminated by this one.
  [ 'table',      require('./rules_block/table'),      [ 'paragraph', 'reference' ] ],
  [ 'code',       require('./rules_block/code') ],
  [ 'fence',      require('./rules_block/fence'),      [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
  [ 'blockquote', require('./rules_block/blockquote'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
  [ 'hr',         require('./rules_block/hr'),         [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
  [ 'list',       require('./rules_block/list'),       [ 'paragraph', 'reference', 'blockquote' ] ],
  [ 'reference',  require('./rules_block/reference') ],
  [ 'html_block', require('./rules_block/html_block'), [ 'paragraph', 'reference', 'blockquote' ] ],
  [ 'heading',    require('./rules_block/heading'),    [ 'paragraph', 'reference', 'blockquote' ] ],
  [ 'lheading',   require('./rules_block/lheading') ],
  [ 'paragraph',  require('./rules_block/paragraph') ]
```
