-
Notifications
You must be signed in to change notification settings - Fork 874
Description
I've written a blog post detailing the context for this, go read it.
Summary of benefits:
- Parsing the PostgreSQL SQL dialect isn’t trivial. For example, we must avoid manipulating string literals, which may contain semicolons or text that looks like placeholders. Of course, Npgsql doesn’t include a full SQL parser - that would be very hard to do - but rather a very small parser that knows the absolute minimum in order to perform its job. Now, we haven’t had any bugs recently, but I’m sure that if I really dove in there, I could produce cases where the parser mistakenly identifies a placeholder or semicolon where it shouldn’t, or vice versa. It’s an inherently unsafe situation.
- Beyond correctness, both parsing and producing the rewritten SQL is work, which can hurt performance. The longer the SQL query and the more parameters it has, the more overhead this process adds to query execution. Nobody wants that.
- When managing a parameter collection (e.g. NpgsqlParameterCollection), we have to maintain an internal dictionary that indexes parameters by their name. If we didn’t have to handle names, the collection would become a simple list - this is more efficient.
Subtasks:
- Implement support for positional parameters #3852
- Implementation of the ADO.NET batching API #3860
- Implement AppContext-switch mode to disable SQL parsing/rewriting #3920
- Report positional parameters in DataSourceInformation #4028
https://github.com/dotnet/corefx/issues/3688 discussed the same issue on a general ADO.NET level, if that issue receives attention this one is no longer necessary.
ADO.NET currently "supports" batching multiple SQL statements into a single DbCommand by packing them into the command's CommandText, delimited by semicolons (INSERT ... ; INSERT ...). In PostgreSQL's case, to support this Npgsql must parse the CommandText to find the semicolons, and break up the text into several PostgreSQL protocol messages.
Having Npgsql parse SQL is clearly problematic. SQL (and it's PostgreSQL variant) is a complex language, with non-trivial quoting and escaping, and we've had issues in the past where the SQL parser had issues. An ADO.NET provider should be as minimal and "pass-through" as possible, and not perform such a complex task.
Npgsql already maintains (and even exposes) a non-standard NpgsqlStatement class, which represents a single SQL statement within a multistatement command. In 3.0 NpgsqlStatement instances can be accessed after executing the command to access rows affected, for example. We could allow users to populate NpgsqlCommand's with NpgsqlStatements, providing a structured API for creating multistatement commands rather than the current semicolon-delimited method.
A second issue is parameters. In ADO.NET the parameter list is set on the command, which means that parameters are global to all statements. However, in the PostgreSQL protocol things don't work that way - each statement is completely separate and has its own parameter set (in the Bind message). Npgsql 3.0 works around this problem by again, parsing the SQL, figuring out which statement references which parameter, and sending each statement with its own parameters.
The solution here would be to have a parameter list on each NpgsqlStatement, allowing the user to manually specify parameters on a per-statement basis.
An example of what this API would look like:
using (var cmd = new NpgsqlCommand
{
Connection = conn,
Statements =
{
new NpgsqlStatement("INSERT INTO data (first_name, last_name) VALUES ($1, $2)") {
Parameters =
{
new NpgsqlParameter { Value = "John" },
new NpgsqlParameter { Value = "Doe" }
}
},
new NpgsqlStatement("INSERT INTO data (first_name, last_name) VALUES ($1, $2)") {
Parameters =
{
new NpgsqlParameter { Value = "Alice" },
new NpgsqlParameter { Value = "Doe" }
}
}
}
})
{
cmd.ExecuteNonQuery();
}Some notes:
- In order to truly stop parsing the SQL, users would also have to switch to PostgreSQL's parameter placeholders, which are positional ($1, $2) rather than names (@myParam).
- For single-statement commands, the command-level CommandText and parameter list can still be used.
- It should be impossible to mix both command-level parameters and statement-level parameters.
- This should obviously be an opt-in option
- Port the EF6 and EFCore providers to use this API