Permit dollar signs ($) after the first char in unquoted IDs#2748
Permit dollar signs ($) after the first char in unquoted IDs#2748MKleusberg merged 1 commit intosqlitebrowser:masterfrom
Conversation
Follow the canonical SQLite lexer in recognizing $ as a valid character in unquoted identifiers, for compatibility with certain other databases. See: https://www.sqlite.org/cgi/src/file?name=src/tokenize.c&ci=c5e2de1d24d34589&ln=162-165 https://www.sqlite.org/cgi/src/file?name=src/global.c&ci=fd8b68a474226e3b&ln=113-116
|
Awesome, thanks for this PR @oldlaptop! The best person to review this is @MKleusberg, whom Manuel has already requested review from. Hopefully he'll get to it in the next few days. 😄 |
|
Thank you very much, @oldlaptop! This is a very good catch 👍 I'll add a test case too in a second.
In theory that would be very desirable. I've actually tried this before developing the current parser but it turned out that we would have to make so many adjustments to the parser code that the amount of work would be similar to developing a parser from scratch. The reason is that the parser actions need to be completely different and at the same time we cannot just replace their function bodies because sometimes we're interested in different information than SQLite is when it parses a statement and calls its functions. Also we're only a small developer team and keeping up with SQLite's changes might be tricky and is potentially more time consuming than a little fix to the lexer or parser from time to time. But that's just my thinking from a couple of years ago. Maybe one day we'll actually switch to using the original parser code, for example when we also want to parse SELECT statements and similar in the future. |
DB4S uses its own lexer and parser to process the database's schema for display in the Database Structure tab. This lexer does not permit dollar signs (
$) in unquoted identifiers. SQLite's tokenizer, however, permits (and therefore real SQLite databases may have) dollar signs in unquoted identifiers, for compatibility with certain other database systems that also permit these non-standard identifiers. (I tripped over this issue while playing with databases exported from Oracle, which often uses dollar signs in identifiers for special "system" tables and columns.) See:https://www.sqlite.org/cgi/src/file?name=src/tokenize.c&ci=c5e2de1d24d34589&ln=162-165
https://www.sqlite.org/cgi/src/file?name=src/global.c&ci=fd8b68a474226e3b&ln=113-116
My opinion from the peanut gallery is that it might be better to use SQLite's tokenizer and parser (which could be included inline or something) so as to absolutely assure precise, bug-for-bug compatibility with at least some version of SQLite, but that's not a five-minute drive-by patch, and adding one character to one regex is. This follows what appears to be the practice in this codebase of committing flex's output after making any change to the source; if there's something undesirable about my copy of flex's output, I can leave that out.
The behavior without this patch is that tables with unquoted names containing dollar signs, or which contain unquoted column names containing dollar signs, appear in the Database Structure tab to have no columns, but otherwise seem to behave normally. Test case (as an sqlite3(1) .dump of a database with offending names):