Skip to content

Implement raw SQL mode for prepared, non-batched commands#3834

Closed
roji wants to merge 6 commits intomainfrom
RawMode
Closed

Implement raw SQL mode for prepared, non-batched commands#3834
roji wants to merge 6 commits intomainfrom
RawMode

Conversation

@roji
Copy link
Member

@roji roji commented Jun 13, 2021

Npgsql currently does SQL parsing/rewriting of user-provided SQL, for two reasons:

  1. For legacy reasons, we support the named@p instead of the PostgreSQL positional native $1.
  2. We support semicolon-batching, where CommandText contains semicolon. Since PG requires protocol messages to be per-statement, we need to split the SQL.

This is problematic for two reasons:

  1. Performance: we have parsing/rewriting in the hot path. Even for prepared statements, we must perform the parsing before we can look up the statements in the internal SQL cache (since individual statements are cached, not the entire SQL with semicolons).
  2. Correctness: a DB driver should not touch SQL, and writing a correct parser for PostgreSQL isn't trivial (e.g. detect semicolons, but not inside strings). In practice, we've had very little bug reports over the years.

This PR removes SQL parsing/rewriting for non-batched, prepared commands (explicit or auto): PostgreSQL's native parameter placeholders can be used (so $1 instead of @p). The same will be done for batched commands in a separate PR, as part of implementing the new .NET batching API.

Some notes:

  • I used to think about raw mode as an opt-in flag. Here are some issues with that approach:

    • This isn't just a connection string parameter you turn on, since code creating commands has to be changed as well (different placeholders in SQL, etc), so layers like EF/Dapper would need to make changes to use this feature. At that point the connection string parameter becomes required - since EF wouldn't work with named parameters any more - so raw mode would somehow need to be turned on regardless of the connection string provided by the user (EF/Dapper don't typically touch/rewrite connection strings).
    • We could have an additional programmatic way to activate raw mode (e.g. some new boolean property on NpgsqlCommand), but that would be an Npgsql-specific API, so impossible to use in provider-agnostic code.
    • I generally hate adding behavior flags/parameters.
  • After thinking about it, I ended up with a version that doesn't require an opt-in.

    • We automatically detect if we're in positional parameter mode by checking the first parameter's name. If it's empty, we're in positional mode.
    • When preparing a command (explicit or auto), we go through the existing SqlQueryParser. This tells us if we're in legacy batching mode. For legacy batching, only the individual statements get stored in PreparedStatementManager - never the full CommandText with semicolons.
    • The next time we execute the same command, if we found it in PreparedStatementManager we know there's no legacy batching. And if we're in positional mode (and have no output parameters), we can simply send the command's CommandText and Parameters as-is - no need to parse/rewrite any more.
  • This means that we parse - but not for prepared commands which use positional parameters; this is new the hot path.

  • This addresses the 1st point above (perf), but not the 2nd (completely remove the parsing/rewriting logic when using positional parameters). Since we haven't had any actual trouble with the parser, it seems reasonable.

  • Breaking change: having an NpgsqlParameter which isn't referenced in the SQL no longer works (silly)

Part of #1042
Leads to #2317

@roji roji requested review from Brar, NinoFloris and vonzshik June 13, 2021 13:34
@roji
Copy link
Member Author

roji commented Jun 13, 2021

Here's some perf data for this PR.

tl;dr I'm seeing 2.5%-3% RPS improvement in the Fortunes Platform benchmark. Memory allocations are down 33% in number of allocations, 17% in bytes.

RPS runs

commit Before/after RPS
98e7af8 Before change 479,858
04eb3b1 After change 492,215 (+2.5%)
Detailed run results

Fortunes platform before (98e7af8)

dotnet run -- --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --scenario fortunes --profile aspnet-citrine-lin --application.framework net6.0 --application.options.outputFiles Z:\projects\npgsql\src\Npgsql\bin\Release\net6.0\Npgsql.dll --variable warmup=45 --variable duration=120

| db                  |         |
| ------------------- | ------- |
| CPU Usage (%)       | 39      |
| Cores usage (%)     | 1,093   |
| Working Set (MB)    | 45      |
| Build Time (ms)     | 1,825   |
| Start Time (ms)     | 358     |
| Published Size (KB) | 914,277 |


| application           |                                 |
| --------------------- | ------------------------------- |
| CPU Usage (%)         | 96                              |
| Cores usage (%)       | 2,688                           |
| Working Set (MB)      | 463                             |
| Private Memory (MB)   | 1,582                           |
| Build Time (ms)       | 2,590                           |
| Start Time (ms)       | 1,585                           |
| Published Size (KB)   | 90,962                          |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12      |
| ASP.NET Core Version  | 6.0.0-preview.6.21313.1+56ddfd2 |
| .NET Runtime Version  | 6.0.0-preview.6.21313.1+4bb1318 |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 31         |
| Cores usage (%)        | 872        |
| Working Set (MB)       | 37         |
| Private Memory (MB)    | 363        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 73         |
| Requests/sec           | 479,858    |
| Requests               | 57,630,609 |
| Mean latency (ms)      | 1.12       |
| Max latency (ms)       | 21.80      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 622.83     |
| Latency 50th (ms)      | 1.01       |
| Latency 75th (ms)      | 1.21       |
| Latency 90th (ms)      | 1.50       |
| Latency 99th (ms)      | 3.80       |

Fortunes platform after (04eb3b1)

dotnet run -- --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --scenario fortunes --profile aspnet-citrine-lin --application.framework net6.0 --application.options.outputFiles Z:\projects\npgsql\src\Npgsql\bin\Release\net6.0\Npgsql.dll --variable warmup=45 --variable duration=120

| db                  |         |
| ------------------- | ------- |
| CPU Usage (%)       | 39      |
| Cores usage (%)     | 1,100   |
| Working Set (MB)    | 45      |
| Build Time (ms)     | 1,702   |
| Start Time (ms)     | 358     |
| Published Size (KB) | 914,277 |


| application           |                                 |
| --------------------- | ------------------------------- |
| CPU Usage (%)         | 96                              |
| Cores usage (%)       | 2,699                           |
| Working Set (MB)      | 453                             |
| Private Memory (MB)   | 1,581                           |
| Build Time (ms)       | 2,629                           |
| Start Time (ms)       | 1,602                           |
| Published Size (KB)   | 90,962                          |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12      |
| ASP.NET Core Version  | 6.0.0-preview.6.21313.1+56ddfd2 |
| .NET Runtime Version  | 6.0.0-preview.6.21313.1+4bb1318 |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 32         |
| Cores usage (%)        | 897        |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 363        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 74         |
| Requests/sec           | 492,215    |
| Requests               | 59,114,682 |
| Mean latency (ms)      | 1.09       |
| Max latency (ms)       | 34.83      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 638.87     |
| Latency 50th (ms)      | 0.99       |
| Latency 75th (ms)      | 1.18       |
| Latency 90th (ms)      | 1.46       |
| Latency 99th (ms)      | 3.91       |

Memory profiling results (for 50k iterations):

Scenario Number of allocations Bytes allocated
Before 1400K 130.05MB
After 934.3K 108.17MB

/cc @sebastienros @benaadams @ajcvickers

@benaadams
Copy link
Contributor

tl;dr I'm seeing 2.5%-3% RPS improvement in the Fortunes Platform benchmark. Memory allocations are down 33% in number of allocations, 17% in bytes.

What about the Multiple queries test as that runs 20 queries sequentially (also 21% of composite score)

Copy link
Member

@NinoFloris NinoFloris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a fairly thorough review of the changes, not sure yet why we can't skip parsing entirely when positional mode is triggered.

Somewhat related to this all I hope we get to close some of the preparation gaps and splitting things out. Explicit preparation for multiplexing would really make my day for instance but in general I think we could really improve named param/legacy batching perf when auto prepare is enabled.

Breaking change: having an NpgsqlParameter which isn't referenced in the SQL no longer works (silly)

Just for positional or named as well?


NpgsqlStatement TruncateStatementsToOne()
{
switch (_statements.Count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this hit anything but 0 or 1 (given the check for postitional + > 0 statement index up in the parser?) Also, as I have never given this any thought, how exactly do you construct a 0 statement command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this hit anything but 0 or 1 (given the check for postitional + > 0 statement index up in the parser?)

This happens when you execute an NpgsqlCommand with legacy batching (so now _statements.Length > 1). If you then reuse that same NpgsqlCommand instance to execute a single statement, we truncate the statement list.

how exactly do you construct a 0 statement command?

If you mean when _statements.Count is 0 - that's just when a command is initially created (so on first execution). Otherwise I don't think it's possible to actually execute a 0-statement command - CommandText has to be non-empty...

@roji
Copy link
Member Author

roji commented Jun 15, 2021

Did a fairly thorough review of the changes, not sure yet why we can't skip parsing entirely when positional mode is triggered.

One important issue is whether there's a semicolon in there ("legacy batching") - we don't know that until we parse. We could say that in positional mode we don't care: legacy batching simply isn't supported (use the upcoming batching API instead), and if you do it, PG will error. But that leaves the case of no parameters at all... We do need to parse there since legacy batching is valid.

So we could indeed skip parsing only when there are positional parameters - but we'd still need to do it for no parameters, which seems inconsistent/weird. My logic here was that if we get parsing/rewriting out of the hot path (the prepared case) - which this PR does - then it's better to just always parse for the non-prepared case for consistency etc. Had there been a way to totally get rid of parsing (e.g. the opt-in path), I'd have like that, but given that we keep it in...

Does that make sense?

Explicit preparation for multiplexing would really make my day for instance

I'm assuming you want this in order to "pin" certain SQLs and make sure they stay prepared forever (and not subject to the LRU logic of auto-preparation), right? Otherwise explicit preparation simply isn't very useful given that Dapper/EF don't support it etc. In any case, it's probably possible to make preparation and multiplexing work, by deferring the preparation to execution time.

I think we could really improve named param/legacy batching perf when auto prepare is enabled.

Is that important? My mindset is that starting 6.0, the recommended pattern would be positional parameters with the new batching API, and anything else would be sub-optimal. In other words, I'd rather not spend time on optimizing SqlQueryParser (which is already not bad) when there's a far better API that avoids it entirely.

@roji
Copy link
Member Author

roji commented Jun 15, 2021

Breaking change: having an NpgsqlParameter which isn't referenced in the SQL no longer works (silly)

Just for positional or named as well?

Both - though the note is a bit too concise/imprecise :)

In SqlQueryParser, if nothing needed to be rewritten (no parameter placeholders, no semicolon), we just take the user SQL and param list as-is. This is what happens for unprepared positional parameters (no legacy batching supported) - we only parse, no rewriting. This means that if you have unreferenced positional parameters, we just send them as-is, and PG complains because the Parse and Bind messages mismatch.

The same optimization also kicks in if you have a SQL without placeholders, but some named parameters in the list - we just send everything as-is. So in named parameter mode, if there's at least one placeholder (or legacy batching), we will rewrite and everything will be OK (we pick out only the referenced parameters). But if it's a single statement with no placeholders, but parameters are present (named or not), it will fail.

I could add a check to not rewrite only when doing positional, though the whole thing is a bit ridiculous (why are you putting that parameter in there if it isn't referenced...).

@roji
Copy link
Member Author

roji commented Jun 15, 2021

Thanks for the thorough reviewing @NinoFloris!

@roji
Copy link
Member Author

roji commented Jun 15, 2021

Breaking change: having an NpgsqlParameter which isn't referenced in the SQL no longer works (silly)

Just for positional or named as well?

What I wrote above wasn't accurate - PG actually doesn't have a problem with parameters in Bind which aren't referenced in Parse (the problem was specifically because the parameter in the test had a null value without NpgsqlDbType, so PG couldn't infer the type).

Regardless, I changed things so that SqlQueryParser only avoids rewriting for positional parameters (in the named or zero parameters cases it always rewrites). But once the command is prepared all that goes away anyway.

@NinoFloris
Copy link
Member

One important issue is whether there's a semicolon in there ("legacy batching") - we don't know that until we parse. We could say that in positional mode we don't care: legacy batching simply isn't supported (use the upcoming batching API instead), and if you do it, PG will error. But that leaves the case of no parameters at all... We do need to parse there since legacy batching is valid.

So we could indeed skip parsing only when there are positional parameters - but we'd still need to do it for no parameters, which seems inconsistent/weird. My logic here was that if we get parsing/rewriting out of the hot path (the prepared case) - which this PR does - then it's better to just always parse for the non-prepared case for consistency etc. Had there been a way to totally get rid of parsing (e.g. the opt-in path), I'd have like that, but given that we keep it in...

Ok I get where you're coming from, but I still think we should go down the fast path (no parsing) when we notice we're in positional mode. IMO there should be some 'zero overhead' way to execute a query without preparation. Speaking for myself there are queries today that I'd rather not prepare (due to dynamic compilation) but would still like to execute with the lowest latency, and as they're not tiny queries parsing does add measurable overhead, sorry...

Is that important? My mindset is that starting 6.0, the recommended pattern would be positional parameters with the new batching API, and anything else would be sub-optimal. In other words, I'd rather not spend time on optimizing SqlQueryParser (which is already not bad) when there's a far better API that avoids it entirely.

I'd say this is hopelessly optimistic ;) I can see legacy batching being entirely replaced (though dapper will take its time) but named parameters are so useful that I don't see people (or me) dropping them any time soon * for handwritten sql * (orms is another matter). Making people choose between them and perf feels like we're going for a false choice. Once batching is supported it would be entirely reasonable to add a flag (I know) to disable legacy batching entirely. And at that point we can just directly lookup statements in the preparation cache without parsing, for unprepared cases it would still not be free but it could become more straight forward than what's there today.

Thoughts?

@roji
Copy link
Member Author

roji commented Jun 15, 2021

[...] I still think we should go down the fast path (no parsing) when we notice we're in positional mode. IMO there should be some 'zero overhead' way to execute a query without preparation.

I'm not totally against doing that (in fact it was written that way at some point). But at the end of the day, there's already so much more overhead with non-prepared execution, that the parsing/rewriting is unlikely to have a noticeable effect... I'd like to see some numbers showing it's worth it first...

I'd say this is hopelessly optimistic ;) I can see legacy batching being entirely replaced (though dapper will take its time) but named parameters are so useful that I don't see people (or me) dropping them any time soon * for handwritten sql * (orms is another matter).

I definitely don't expect everyone to switch away from legacy batching and named parameters - I'm just saying they'll need to do it if they want the absolute best possible perf, that's all. I mean, putting this in proportion - the parsing/rewriting overhead is quite small at the end of the day, and unlikely to matter to all but the most perf-hungry users. And those guys really should just use the optimized API, no?

And at that point we can just directly lookup statements in the preparation cache without parsing, for unprepared cases it would still not be free but it could become more straight forward than what's there today.

For named parameters, it's a bit more complicated than that - even for already-prepared commands. I could sent SELECT @p1, @p2, and execute it once with parameters in the order [@p1, @p2], and another time in the order [@p2, @p1]. Today this is taken care of by always doing the parsing/rewriting (which also orders the parameters correctly); of course we could optimize that further, saving some sort of mapping structure which says "the parameter with name @p2 needs to go in the second position in Bind". But at that point we're introducing a lot of complexity (and potential bugs) for optimizing a slow path, when you have the option of just using positional parameters for ultimate perf...

To summarize, I'm definitely not against optimizing the slower paths - but I'd rather avoid lots of complexity (the driver is already pretty complex). I think that we can also treat some of the above outside of this PR, as later optimizations, no?

@NinoFloris
Copy link
Member

NinoFloris commented Jun 16, 2021

so much more overhead with non-prepared execution

From the pg side you mean? Well the thing is, dynamic sql execution puts you between a rock and a hard place, you can (auto) prepare everything and fill your pg memory and npgsql caches with nonsense or you choose to do it unprepared. Isn't it worth it to try and get that perf overhead as low as possible for a general purpose driver?

I definitely don't expect everyone to switch away from legacy batching and named parameters - I'm just saying they'll need to do it if they want the absolute best possible perf, that's all. I mean, putting this in proportion - the parsing/rewriting overhead is quite small at the end of the day, and unlikely to matter to all but the most perf-hungry users. And those guys really should just use the optimized API, no?

Sure I just wanted make sure that was what was behind your initial thoughts on it.

of course we could optimize that further, saving some sort of mapping structure which says "the parameter with name @p2 needs to go in the second position in Bind".

That would have been my initial guess yeah, good to know this isn't how it works today ;)

@roji
Copy link
Member Author

roji commented Jun 16, 2021

so much more overhead with non-prepared execution

From the pg side you mean?

Yeah - planning, 2 additional wire protocol messages, all that stuff.

Well the thing is, dynamic sql execution puts you between a rock and a hard place, you can (auto) prepare everything and fill your pg memory and npgsql caches with nonsense or you choose to do it unprepared. Isn't it worth it to try and get that perf as low as possible for a general purpose driver?

Well, if your dynamic SQLs tend to repeat themselves, then auto-preparation should work for that scenario (just like non-dynamic). If you're really producing a lot of one-off SQLs which will never repeat, then it definitely doesn't make sense to prepare. Whether that scenario is worth optimizing depends on how much overhead the SQL parsing/rewriting actually adds for those cases... In other words, you can see the impact of removing this for prepared statements precisely because it's already quite optimized - I'm just not sure you'll see that for unprepared.

Can always benchmark and see though.

@NinoFloris NinoFloris self-requested a review June 17, 2021 13:38
Copy link
Member

@NinoFloris NinoFloris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @roji looks good to me now!

@roji
Copy link
Member Author

roji commented Jun 17, 2021

@benaadams

What about the Multiple queries test as that runs 20 queries sequentially (also 21% of composite score)

First, thanks for suggesting - that uncovered a regression I accidentally introduced... should be fixed (waiting to confirm on the perf lab soon).

I'm seeing very little effect on multiple queries, and that make sense, since that scenario already benefits from an optimization today: if you re-execute the same NpgsqlCommand instance multiple times and it's prepared (either explicitly or auto), then subsequent executions take a very fast path (no need to parse/rewrite again).

So basically this PR only improves things for the first time you execute on a given NpgsqlCommand instance, and multiple queries is all about reusing the instance many times.

I'll run single query though.

@roji roji closed this Jun 17, 2021
@roji roji deleted the RawMode branch June 17, 2021 22:03
@roji
Copy link
Member Author

roji commented Jun 17, 2021

The approach in this PR has the problem of tying together the detection of raw mode (absence of legacy semicolon batching) with the preparation SQL cache. The SQL cache is connection-specific, and in multiplexing there is no connection until we're in the write loop; adding SQL parsing/rewriting there (e.g. for the named parameter case) caused significant perf regressions (25k->18k RPS in multiple query).

#3852 is an alternative approach that does not suffer from this problem, but does parse/rewrite for the 0-param case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants