Description
There appears to be a bug in the lexer. It is reproduced by the following JavaScript:
import { lexer } from 'dt-python-parser';
// This Python code is processed with no problem:
const python = `"""it is for test"""\nvar1 = "Hello World!"\n# comment here\nfor i in range(5):\n print(i)`;
const commentTokens = lexer(python);
console.log(commentTokens);
/*
[
{
type: 'Comment',
value: '"""it is for test"""',
start: 0,
lineNumber: 1,
end: 20
}
]
*/
// The bug is reproduced here:
const commentTokens2 = lexer('# hi');
console.log(commentTokens2); // never reaches this point.
Here is the stack trace I get:
RangeError: Invalid string length
at lexer (C:\Users\josh.greig\Desktop\turtle\python-parser\node_modules\dt-python-parser\dist\utils\index.js:76:26)
at file:///C:/Users/josh.greig/Desktop/turtle/python-parser/comments.mjs:20:24
at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
at async Loader.import (internal/modules/esm/loader.js:166:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)
Oddly enough, I can parse the same code without a problem. The resulting tree doesn't contain the single-line comments, but I assume that is how parse is intended to work.
I'm working around this by appending a '\n' to the Python code before passing it to the lexer. The bug is triggered when the Python code ends with a '#' comment that has no newline character after it.
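In case it helps anyone else hitting this, the workaround can be sketched as a small helper. The name `ensureTrailingNewline` is my own and is not part of dt-python-parser. It only appends a line break when the source does not already end with one:

```javascript
// Hypothetical helper (not part of dt-python-parser): make sure the source
// ends with a line break before handing it to the lexer.
function ensureTrailingNewline(source) {
  return /[\r\n]$/.test(source) ? source : source + '\n';
}

// Usage with the library, assuming the `lexer` import shown earlier:
//   const tokens = lexer(ensureTrailingNewline('# hi'));
console.log(JSON.stringify(ensureTrailingNewline('# hi'))); // prints "# hi\n"
```

Source that already ends with a line break is returned unchanged, so the helper should be safe to apply to every input.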
Could the problem be that the grammar does not match EOF after a comment and instead strictly looks for a line break? This is the fragment I am looking at:
fragment COMMENT
: '#' ~[\r\n]*
;
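For what it's worth, the fragment on its own does not seem to require a line break: `~[\r\n]*` matches zero or more non-newline characters, so it should simply stop at the end of the input. A rough check with a JavaScript regex that approximates the fragment (an approximation for illustration, not code from the library; `matchComment` is a made-up name):

```javascript
// Rough regex equivalent of the ANTLR fragment  '#' ~[\r\n]*
// (an approximation for illustration, not code taken from dt-python-parser).
const COMMENT = /#[^\r\n]*/;

// Returns the first comment found in the source, or null if there is none.
function matchComment(source) {
  const m = COMMENT.exec(source);
  return m === null ? null : m[0];
}

console.log(matchComment('# hi'));        // prints: # hi
console.log(matchComment('# hi\nx = 1')); // prints: # hi
```

If that reasoning holds, the cause may be in whichever rule or lexer code consumes COMMENT rather than in the fragment itself, but I have not confirmed this.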