PostgreSQL Tokenization: Fix unexpected characters after question mark being silently ignored #2129
Merged
iffyio merged 3 commits into apache:main on Dec 18, 2025
de8db69 to
571009c
Contributor
Author
I have added the test case. I also stand corrected about one of my previous comments, |
iffyio
reviewed
Dec 18, 2025
…tion mark if it is not one of the expected characters
70f0da3 to
f23087e
iffyio
approved these changes
Dec 18, 2025
ayman-sigma
pushed a commit
to sigmacomputing/sqlparser-rs
that referenced
this pull request
Feb 3, 2026
…k being silently ignored (apache#2129)
fmguerreiro
pushed a commit
to fmguerreiro/datafusion-sqlparser-rs
that referenced
this pull request
Feb 20, 2026
…k being silently ignored (apache#2129)
Description of Issue
Upon encountering '?', the tokenizer first consumes that character and then peeks at the next one to match any of the following: '|', '&', '-', '#'. If none of these symbols is present, it calls `consume_and_return(chars, Token::Question)`, which consumes an additional character but only returns a `Token::Question`. This is also reflected in `tokenize_with_location`, where the relevant `Token::Question` will have a span of 2 characters.
Reproducing the Issue
Both tests will fail on the current main branch.
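The behaviour described above can be modelled with a small, self-contained sketch. This is a toy lexer, not the sqlparser-rs source: `Tok`, `next_tok`, `lex`, and the `buggy` flag are hypothetical names introduced for illustration, and `consume_and_return` is assumed to advance by one character before returning the token, as the description states. Only the '&' follow-up is modelled.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical, simplified token type; the real crate has many more variants.
#[derive(Debug, PartialEq)]
enum Tok {
    Question,
    QuestionAnd,
    Char(char),
}

// Models the helper named in the description (assumption): it advances past
// one more character and then returns the given token.
fn consume_and_return(chars: &mut Peekable<Chars<'_>>, t: Tok) -> Tok {
    chars.next();
    t
}

// Lexes a single token. `buggy` selects which fallback runs when '?' is not
// followed by an expected character.
fn next_tok(chars: &mut Peekable<Chars<'_>>, buggy: bool) -> Option<Tok> {
    match chars.next()? {
        // The '?' itself has already been consumed by `chars.next()` above.
        '?' => Some(match chars.peek() {
            // Expected follow-up: consuming it here is correct.
            Some('&') => consume_and_return(chars, Tok::QuestionAnd),
            // Old fallback: swallows whatever character follows the '?'.
            _ if buggy => consume_and_return(chars, Tok::Question),
            // Non-consuming fallback: the next character stays in the stream.
            _ => Tok::Question,
        }),
        c => Some(Tok::Char(c)),
    }
}

// Runs the toy lexer over the whole input.
fn lex(input: &str, buggy: bool) -> Vec<Tok> {
    let mut chars = input.chars().peekable();
    let mut out = Vec::new();
    while let Some(t) = next_tok(&mut chars, buggy) {
        out.push(t);
    }
    out
}
```

In this model, `lex("?x", true)` yields only `[Question]`, so the `x` disappears without an error, while `lex("?x", false)` yields `[Question, Char('x')]`. The input `"?&"` produces `[QuestionAnd]` in both modes, since consuming an expected follow-up character is the intended behaviour.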
The Proposed Fix
The PR replaces the call to `self.consume_and_return(chars, Token::Question)` with `Ok(Some(Token::Question))`, so the additional character is no longer consumed.
Additional considerations
As far as I am aware, `Token::Question` is not a valid PostgreSQL token, and the best course of action might be to explicitly not support it.