Permit dollar signs ($) after the first char in unquoted IDs by oldlaptop · Pull Request #2748 · sqlitebrowser/sqlitebrowser

oldlaptop · 2021-06-19T22:01:47Z

DB4S uses its own lexer and parser to process the database's schema for display in the Database Structure tab. This lexer does not permit dollar signs ($) in unquoted identifiers. SQLite's tokenizer, however, permits (and therefore real SQLite databases may have) dollar signs in unquoted identifiers, for compatibility with certain other database systems that also permit these non-standard identifiers. (I tripped over this issue while playing with databases exported from Oracle, which often uses dollar signs in identifiers for special "system" tables and columns.) See:

https://www.sqlite.org/cgi/src/file?name=src/tokenize.c&ci=c5e2de1d24d34589&ln=162-165
https://www.sqlite.org/cgi/src/file?name=src/global.c&ci=fd8b68a474226e3b&ln=113-116

My opinion from the peanut gallery is that it might be better to use SQLite's tokenizer and parser (which could be included inline or something) so as to absolutely assure precise, bug-for-bug compatibility with at least some version of SQLite, but that's not a five-minute drive-by patch, and adding one character to one regex is. This follows what appears to be the practice in this codebase of committing flex's output after making any change to the source; if there's something undesirable about my copy of flex's output, I can leave that out.

The behavior without this patch is that tables with unquoted names containing dollar signs, or which contain unquoted column names containing dollar signs, appear in the Database Structure tab to have no columns, but otherwise seem to behave normally. Test case (as an sqlite3(1) .dump of a database with offending names):

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE foo(bar$);
INSERT INTO foo VALUES(1);
INSERT INTO foo VALUES(2);
INSERT INTO foo VALUES(3);
INSERT INTO foo VALUES(4);
CREATE TABLE baz$(a);
INSERT INTO "baz$" VALUES('a');
INSERT INTO "baz$" VALUES('b');
INSERT INTO "baz$" VALUES('c');
INSERT INTO "baz$" VALUES('d');
COMMIT;

Follow the canonical SQLite lexer in recognizing $ as a valid character in unquoted identifiers, for compatibility with certain other databases. See: https://www.sqlite.org/cgi/src/file?name=src/tokenize.c&ci=c5e2de1d24d34589&ln=162-165 https://www.sqlite.org/cgi/src/file?name=src/global.c&ci=fd8b68a474226e3b&ln=113-116

justinclift · 2021-06-20T02:19:33Z

Awesome, thanks for this PR @oldlaptop!

The best person to review this is @MKleusberg, whom Manuel has already requested review from. Hopefully he'll get to it in the next few days. 😄

MKleusberg · 2021-07-10T12:28:31Z

Thank you very much, @oldlaptop! This is a very good catch 👍 I'll add a test case too in a second.

My opinion from the peanut gallery is that it might be better to use SQLite's tokenizer and parser (which could be included inline or something) so as to absolutely assure precise, bug-for-bug compatibility with at least some version of SQLite, but that's not a five-minute drive-by patch, and adding one character to one regex is

In theory that would be very desirable. I've actually tried this before developing the current parser but it turned out that we would have to make so many adjustments to the parser code that the amount of work would be similar to developing a parser from scratch. The reason is that the parser actions need to be completely different and at the same time we cannot just replace their function bodies because sometimes we're interested in different information than SQLite is when it parses a statement and calls its functions. Also we're only a small developer team and keeping up with SQLite's changes might be tricky and is potentially more time consuming than a little fix to the lexer or parser from time to time. But that's just my thinking from a couple of years ago. Maybe one day we'll actually switch to using the original parser code, for example when we also want to parse SELECT statements and similar in the future.

mgrojo requested a review from MKleusberg June 19, 2021 22:32

MKleusberg merged commit c585372 into sqlitebrowser:master Jul 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Permit dollar signs ($) after the first char in unquoted IDs#2748

Permit dollar signs ($) after the first char in unquoted IDs#2748
MKleusberg merged 1 commit intosqlitebrowser:masterfrom
oldlaptop:permit_dollar_in_ids

oldlaptop commented Jun 19, 2021 •

edited

Loading

Uh oh!

justinclift commented Jun 20, 2021 •

edited

Loading

Uh oh!

MKleusberg commented Jul 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

oldlaptop commented Jun 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

justinclift commented Jun 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MKleusberg commented Jul 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

oldlaptop commented Jun 19, 2021 •

edited

Loading

justinclift commented Jun 20, 2021 •

edited

Loading