
Lexer throws RangeError: Invalid string length #21

@joshi1983


There appears to be a bug in the lexer that can be reproduced with the following JavaScript:

```javascript
import { lexer } from 'dt-python-parser';

// This Python code is processed with no problem:
const python = `"""it is for test"""\nvar1 = "Hello World!"\n# comment here\nfor i in range(5):\n    print(i)`;
const commentTokens = lexer(python);
console.log(commentTokens);
/*
    [
      {
        type: 'Comment',
        value: '"""it is for test"""',
        start: 0,
        lineNumber: 1,
        end: 20
      }
    ]
*/

// HERE is where the bug is reproduced: a '#' comment with no trailing newline.
const commentTokens2 = lexer('# hi'); // throws RangeError: Invalid string length
console.log(commentTokens2); // never reaches this point.
```

Here is the stack trace I get:

```
RangeError: Invalid string length
    at lexer (C:\Users\josh.greig\Desktop\turtle\python-parser\node_modules\dt-python-parser\dist\utils\index.js:76:26)
    at file:///C:/Users/josh.greig/Desktop/turtle/python-parser/comments.mjs:20:24
    at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
    at async Loader.import (internal/modules/esm/loader.js:166:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)
```

Oddly enough, I can `parse` the same code without a problem. The resulting tree doesn't contain the single-line comments, but I understand that is how you intended `parse` to work.

The bug is triggered by Python code that ends in a `#` comment with no newline character at the very end. I'm working around it by appending `'\n'` to the Python code before passing it to the lexer.
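The workaround can be wrapped in a small helper so every call site gets it. This is a minimal sketch; `ensureTrailingNewline` and `lexWithTrailingNewline` are hypothetical names, and the lexer is passed in as a parameter so the snippet does not depend on how `dt-python-parser` is installed or imported.

```javascript
// Hypothetical helper: append '\n' only when the source does not already
// end with one, so the lexer never sees a '#' comment right at EOF.
function ensureTrailingNewline(source) {
  return source.endsWith('\n') ? source : source + '\n';
}

// Hypothetical wrapper: `lexFn` would be the `lexer` export from
// 'dt-python-parser'; it is a parameter here to keep the sketch self-contained.
function lexWithTrailingNewline(lexFn, source) {
  return lexFn(ensureTrailingNewline(source));
}

// Usage with the real library (not executed here):
//   import { lexer } from 'dt-python-parser';
//   const tokens = lexWithTrailingNewline(lexer, '# hi');
```

Checking for an existing `'\n'` first avoids adding a blank line to sources that are already terminated.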

Could the problem be this rule in the grammar, which might not match EOF and instead strictly expect a line break?

```antlr
fragment COMMENT
    : '#' ~[\r\n]*
    ;
```
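Another possibility worth checking: `~[\r\n]*` should also stop cleanly at EOF in ANTLR, and the stack trace points at `dist/utils/index.js` rather than generated grammar code. The sketch below is hypothetical and is not the library's actual source. It assumes a hand-written comment loop that stops only on `'\n'`. At the end of the input, `input[i]` is `undefined`, which never equals `'\n'`, so such a loop would keep appending the string `'undefined'` until V8 throws `RangeError: Invalid string length`. Adding a bounds check avoids that. `scanComment` is an illustrative name.

```javascript
// Hypothetical '#' comment scanner, NOT the actual dt-python-parser code.
//
// Suspected buggy shape (left as a comment because it never terminates at EOF):
//   while (input[i] !== '\n') { value += input[i]; i++; }
//
// Fixed shape: also stop at the end of the input, mirroring '#' ~[\r\n]*.
function scanComment(input, start) {
  let i = start; // input[start] is expected to be '#'
  let value = '';
  while (i < input.length && input[i] !== '\n' && input[i] !== '\r') {
    value += input[i];
    i++;
  }
  // `end` is exclusive, i.e. the index just past the last comment character.
  return { type: 'Comment', value, start, end: i };
}

console.log(scanComment('# hi', 0));
// { type: 'Comment', value: '# hi', start: 0, end: 4 }
```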
