The Lock Escalation Threshold – Part 3
https://sqlperformance.com/2022/10/sql-performance/lock-escalation-threshold-part-3
Thu, 20 Oct 2022
In the third part of his series on lock escalation, Paul White discusses how long SQL Server holds locks and why.

Lock Lifetimes

The examples so far in this series (part 1 | part 2) have all used repeatable read isolation to acquire shared locks on qualifying rows and hold them to the end of the transaction. This was convenient for the demos but repeatable read isolation isn’t common in everyday use.

Locking optimizations

When SQL Server reads data under locking read committed isolation (the default for non-cloud offerings) and chooses row-level locking granularity, it normally only locks one row at a time. The engine processes the row through parent operators before releasing the shared lock prior to locking and reading the next row.

When the engine chooses page-level locking granularity, it takes and holds a shared page lock while processing rows on the current page and then releases the lock right before acquiring a lock on the next page.

In general, SQL Server is careful to take and hold locks only when needed and only long enough to guarantee correct results, taking account of the current isolation level. When locking at row granularity, the engine can skip row locks entirely when it’s safe to do.
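You can watch this one-row-at-a-time behaviour yourself while a long scan runs. The sketch below is an illustration, not part of the series' test harness: the session id (52) is an assumption you'd replace with the reading session's actual id, and it must be run from a second session.

```sql
-- From a second session, sample the locks held by the reading session while
-- its scan is in progress. Under locking read committed at row granularity,
-- you'll typically see at most one granted KEY lock at a time, alongside
-- short-lived page and object intent locks.
SELECT
    TL.resource_type,
    TL.request_mode,
    TL.request_status
FROM sys.dm_tran_locks AS TL
WHERE
    TL.request_session_id = 52; -- assumption: the reading session's id
```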

Changing data

From a lock escalation perspective, we’re more concerned with the times locks need to be held longer than usual for locking read committed.

This is common when identifying rows to update (or delete) when a blocking operator appears in the execution plan. Rows qualifying for the update must continue to qualify until the change is performed and committed.

In this context, a blocking operator is any operator potentially consuming more than one row before producing an output row. It need not be fully blocking (reading its entire input before producing any output).

There’s no problem with only locking one row at a time if any changes needed are always applied before processing the next row. This is the case for a pipelined (or streaming) execution plan (one without blocking operators).

When a blocking operator is present (for Halloween protection or any other reason), this scheme breaks down because more than one row can be read before any changes are made.

Releasing locks after each row would allow another transaction to change a row we’ve decided to update, allowing lost updates and incorrect results. For an example, see Craig Freedman’s article Read Committed and Updates on the Microsoft documentation site.

Lock classes

SQL Server solves this problem using lock classes.

A lock class is a container for locks needing a lifetime extension within the current statement. Only locks needing to be held longer than the default are added to the lock class. Locks associated with a lock class are released in bulk at the end of the statement but may be released earlier in some circumstances, as I’ll describe later on. The maximum life extension granted by a lock class is the duration of the current statement.

This is different from using an isolation level like repeatable read, which holds shared locks for the duration of the current transaction. Using lock classes is better for resource usage, concurrency, and performance than internally escalating to repeatable read would be.
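As a minimal sketch of the repeatable read contrast (using the series' dbo.LockTest table), the shared locks taken under the hint remain visible after the statement completes because they're held to the end of the transaction:

```sql
-- Shared locks acquired under REPEATABLEREAD persist past the statement,
-- unlike lock-class locks, which last at most until the statement ends.
BEGIN TRANSACTION;

SELECT COUNT_BIG(*)
FROM dbo.LockTest AS LT
    WITH (REPEATABLEREAD, ROWLOCK);

-- The shared KEY locks are still held at this point.
SELECT COUNT_BIG(*) AS held_locks
FROM sys.dm_tran_locks AS TL
WHERE TL.request_session_id = @@SPID;

COMMIT TRANSACTION;
```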

Lock classes are an internal implementation detail and not visible in regular execution plans.

Reading data

Lock classes may be needed when reading data. This can happen for several reasons when the server needs to ensure more than one row at a time remains unchanged for longer than normal during statement execution. This typically requires a blocking operator (as previously defined) in the plan.

Microsoft’s Craig Freedman gave some examples in blog posts now hosted on the Microsoft documentation site:

Lock classes are needed for key lookups when a blocking operator is present before the lookup. This includes the hidden operators for nested loops prefetching and batch sorting, as well as visible blocking operators like Sort.

The second of Craig’s posts explains locks are held longer than usual when a blocking operator is used and large objects (LOBs) are accessed by reference as an internal optimization. SQL Server needs to hold locks in this case to ensure the by-reference LOB pointers remain valid.

Let’s look at this in a bit of detail.

Test 5—Lock escalation reading data

Replace the test statement in the procedure from part one with the following:

-- Test statement(s)
SELECT
    LT.lob
FROM dbo.LockTest AS LT
    WITH (ROWLOCK)
ORDER BY
    LT.i
    OFFSET 0 ROWS
    FETCH FIRST 1 ROW ONLY
OPTION (MAXDOP 1);
-- End test statement(s)

Note the absence of a REPEATABLEREAD isolation level hint. All the tests in this part are run under locking read committed isolation.

Run the test with 6214 rows in a clustered table:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan reads the LOB column before a blocking sort:

Test 5 execution plan with a blocking sort

The test results confirm lock escalation occurred:

Test 5 results showing lock escalation

No locks are shown in the output because lock classes only hold locks for the duration of the current statement at most. The procedure displays locks still held after the test statement completes.

Nevertheless, we can see locks were held longer than usual because four lock promotion attempts were recorded, and lock escalation was successful. The escalated table lock was associated with the lock class and released at the end of the test statement.

Refer back to part one of the series if you need a reminder of how lock ‘promotion attempts’ and escalation work in detail.

Test 6—No escalation on a heap

Let’s run test 5 again, but this time on a heap table:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'false';

The execution plan is:

Execution plan for test 6 on a heap table

The results show no lock ‘promotion attempts’, and no escalation:

Test 6 results showing no escalation

The by-reference internal LOB optimization is only applied to indexes, so locks don’t need to be held longer than usual in this heap table test.

Lock classes are available in this execution plan, but heap tables only use them to hold locks longer (for stability reasons) when the LOB data is off-row. All of the LOB data in this test is held in-row because it’s small enough and the data type is nvarchar(max).

I’m not going to demonstrate it for space reasons, but if you change the column data type to the deprecated ntext type (which defaults to off-row LOB storage) or use sp_tableoption to store the nvarchar(max) data off-row, locks will be held for longer.
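If you want to reproduce the off-row variation, the documented sp_tableoption call below is one way to push the nvarchar(max) data off-row on the series' test table (run it before repopulating the table so existing rows are affected):

```sql
-- Force large value types off-row for dbo.LockTest (documented option).
EXECUTE sys.sp_tableoption
    @TableNamePattern = N'dbo.LockTest',
    @OptionName = 'large value types out of row',
    @OptionValue = 'ON';
```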

You would need to increase the number of rows in the test to 6418 to see lock escalation due to the lock class usage (at 6250 held locks). The increase to 6418 rows compensates for the smaller number of in-row pages when LOB data is off-row. Fewer intent-share page locks mean we need a few more row locks to reach the target.

I mention this as a side note to avoid giving the misleading impression lock escalation can’t occur on heap tables because of LOB data. It can, but not in this test.

Let’s now look at another example based on one of Craig’s LOB tests where lock escalation occurs without a blocking operator. I’ll give some additional explanation and show how lock classes are involved.

Test 7—Lock escalation without a blocking operator

Replace the test statement with:

-- Test statement(s)
SELECT
    LT.i
FROM dbo.LockTest AS LT
    WITH (ROWLOCK)
WHERE
    LT.lob = N'banana'
OPTION (MAXDOP 1);
-- End test statement(s)

None of the rows in the table will match the test predicate, but this isn’t important for the test—it merely makes the output smaller and faster.

Run the test with 6214 rows in a clustered table:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The post-execution plan contains no blocking operators:

Filtering a LOB column

The results are:

Test 7 procedure results showing lock escalation

This test shows lock escalation where none is expected. Craig explains this as SQL Server being “overly conservative” in deciding when to hold locks for longer. This is true, of course, but there’s a bit more to say.

Test 8—No lock escalation with LOB projection

Let’s run test 7 again with a small modification:

-- Test statement(s)
SELECT
    LT.lob
FROM dbo.LockTest AS LT
    WITH (ROWLOCK)
WHERE
    LT.lob = N'banana'
OPTION (MAXDOP 1);
-- End test statement(s)

The only change is to return the lob column instead of column i.

Run the test again with the same number of rows on a clustered table:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan is the same as for test 7, but the results are different:

Test 8 results showing no lock promotion attempts or escalation

This time we see no lock ‘promotion attempts’ and no lock escalation. As is usual under locking read committed isolation, locks were acquired and released one at a time, and lock classes weren’t needed.

SQL Server can avoid the (overly conservative) lock class when we project the LOB column unmodified. Other columns and expressions may be output as well, but it’s essential the LOB column itself appears undisturbed.


Test 9—Lock escalation with variable assignment

When I say the LOB column must appear in the output unmodified, I mean no changes at all. Even assigning it to a variable is enough to cause lock escalation again:

-- Test statement(s)
DECLARE @lob nvarchar(max);

SELECT
    @lob = LT.lob
FROM dbo.LockTest AS LT
    WITH (ROWLOCK)
WHERE
    LT.lob = N'banana'
OPTION (MAXDOP 1);
-- End test statement(s)

The execution plan is the same once again, but the results show lock escalation has returned:

Test 9 results with lock escalation due to variable assignment

This is an interesting example of variable assignment changing test results.

To be clear, holding locks for longer than usual is unnecessary in this test, as it was in test 7. The logic SQL Server uses to determine when lock classes and longer locks are needed is imperfect.

Update lock lifetimes

The documentation states that update locks (U) requested via the UPDLOCK table hint are taken and held to the end of the transaction.

This is generally true, but there’s an exception. Locks are taken before the row is tested to see if it qualifies for the change. When the same query plan operator taking the U lock determines the row doesn’t qualify, the U lock is released immediately.
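The documented to-end-of-transaction behaviour for qualifying rows is easy to confirm. This sketch is hedged: dbo.LockTest and the pk = 1 predicate are assumptions standing in for any row that qualifies at the locking operator itself:

```sql
-- A U lock taken via UPDLOCK on a qualifying row persists to the end
-- of the transaction.
BEGIN TRANSACTION;

SELECT LT.i
FROM dbo.LockTest AS LT
    WITH (UPDLOCK, ROWLOCK)
WHERE LT.pk = 1; -- assumption: an existing, qualifying row

-- Expect a granted KEY lock with request_mode = 'U' still held here.
SELECT TL.resource_type, TL.request_mode
FROM sys.dm_tran_locks AS TL
WHERE TL.request_session_id = @@SPID
    AND TL.resource_type = N'KEY';

COMMIT TRANSACTION;
```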

Let’s look at a couple of examples to see when this does and doesn’t apply.

Test 10—Update row locks released early

Change the test statement to one which locates 10 rows, with five qualifying for the update:

-- Test statement(s)
UPDATE dbo.LockTest
    WITH (ROWLOCK, UPDLOCK)
SET i += 1
WHERE
    pk BETWEEN 1 AND 10
    AND i <= 5;
-- End test statement(s)

Run the test on a clustered table:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan is:

Update lock test with locks released early

The Clustered Index Seek locates the 10 rows matching the pk predicate and applies the unindexed test on column i as a residual predicate.

The results of the test are:

Test 10 results showing 5 exclusive locks

The Clustered Index Seek took 10 row-level U locks but released five of them early (before the end of the transaction) because the residual predicate wasn’t satisfied. This was possible because the same operator was responsible for both acquiring and releasing the locks.

The five remaining U locks were converted to exclusive X locks at the Clustered Index Update operator right before making the required data changes.

Notice the final result set shows 15 row-level locks were taken: 10 U locks at the seek and five X locks at the update. This is a consequence of the way the sys.dm_db_index_operational_stats DMV counts lock requests.
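The counters the test procedure reports come from that DMV; they're cumulative request counts and aren't decremented when a lock is released. A direct query against the series' test table looks like this:

```sql
-- Cumulative lock request counters for dbo.LockTest, including the
-- 'promotion attempt' counter discussed throughout this series.
SELECT
    IOS.row_lock_count,
    IOS.page_lock_count,
    IOS.index_lock_promotion_attempt_count,
    IOS.index_lock_promotion_count
FROM sys.dm_db_index_operational_stats
    (DB_ID(), OBJECT_ID(N'dbo.LockTest'), NULL, NULL) AS IOS;
```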

Test 11—Intent-update locks not released early

Expand the pk range from test 10 to cover the whole table:

-- Test statement(s)
UPDATE dbo.LockTest
    WITH (ROWLOCK, UPDLOCK)
SET i += 1
WHERE
    pk BETWEEN 1 AND 6214
    AND i <= 0;
-- End test statement(s)

Run the test on the same table:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan is the same, but the results are:

Test 11 showing 35 intent-update locks

The row-level U locks were released, but the intent-update IU locks remain. On large tables, these unnecessary page-level locks could lead to lock escalation. A similar thing happens with released row-level S locks under repeatable read isolation—the IS page locks remain.

Test 12—Update row locks not released early

Let’s now return to the 10-row update statement from test 10, modifying the residual predicate slightly:

-- Test statement(s)
UPDATE dbo.LockTest
    WITH (ROWLOCK, UPDLOCK)
SET i += 1
WHERE
    pk BETWEEN 1 AND 10
    AND i <= CONVERT(sql_variant, 5);
-- End test statement(s)

Run the test again:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan has now changed to feature a separate Filter operator because sql_variant tests can’t be pushed down into a seek or scan as a residual predicate:

Execution plan with a separate Filter

The residual predicate on column i is no longer tested by the same operator acquiring the U locks. This is reflected in the test results:

Test 12 results with extra update locks

The five X locks converted from U locks seen in test 10 are still present, but they’ve been joined by five U locks from rows locked at the Clustered Index Seek that didn’t qualify at the Filter operator. Because the rows were found not to qualify by a different plan operator, the unnecessary U locks weren’t released early.

The test is a little contrived for compactness, but it’s extremely common for plans to apply predicate tests outside the original locking operator, for example after a join, aggregation, or row-numbering step.

This is something to be aware of if you use UPDLOCK hints.

Test 13—Lock escalation from update locks

It’s easy to see how this could lead to lock escalation, but for completeness, I’ll show a quick example where no rows qualify for the update:

-- Test statement(s)
UPDATE dbo.LockTest
    WITH (ROWLOCK, UPDLOCK)
SET i += 1
WHERE
    i <= CONVERT(sql_variant, 0);
-- End test statement(s)

Run the test as before (switch to a heap table if you like):

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan again features a separate Filter:

Plan with separate filter

The results are:

Test 13 showing lock escalation due to update locks

The unnecessary U locks couldn’t be released early, resulting in lock escalation to an exclusive table lock (table-level update locks don’t exist). This lock will be held to the end of the transaction, despite our update affecting no rows.

If you run the test again with only 6213 rows in the table, lock escalation will not occur, and you’ll see 6213 row-level U locks, 35 IU page locks, and an IX table lock. All of these locks will be held to the end of the transaction.

Internal update locks

Even when a data-changing statement (not just an update) doesn’t include the UPDLOCK table hint, the engine still acquires some update locks. This is an optimization intended to avoid a common cause of deadlocks when multiple processes read from and write to the same table.

SQL Server takes internal update locks when reading rows needed for a data-changing operation, but only at the plan operator responsible for initially retrieving the row locator (clustering key or RID for heaps).

Internal update locks differ from update locks acquired via the UPDLOCK hint in three important ways:

  1. The maximum lifetime of an internal update lock is the current statement. They’re not held to the end of the transaction, though they may be converted to an exclusive lock, which will be held until the transaction ends.
  2. Internal update locks can usually be released by any operator in the plan. The outcomes seen in tests 12 and 13 don’t generally apply to internal update locks.
  3. When a row-level internal update lock is released, the associated IU page lock is also released.

I’m not going to demonstrate all these differences for space reasons and because my test procedure isn’t set up to show the types of locks released before the end of the transaction. You can validate for yourself by running the UPDLOCK tests without the hint while monitoring the locks taken and released.

I qualified point two above because there’s an important exception. Internal update locks can be released by any operator unless the lock is associated with a lock class. In this case, the lock can only be released by the acquiring operator.

Test 14—Internal update locks released with a lock class

The following update requires a lock class because it retrieves a LOB column accessed by reference:

-- Test statement(s)
UPDATE dbo.LockTest
    WITH (ROWLOCK)
SET lob = LEFT(lob, 1)
WHERE
    i <= 0;
-- End test statement(s)

A clustered table is necessary for the by-reference LOB access:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan shows the residual predicate on column i pushed down into the scan. No rows qualify for the update:

Residual predicate applied at the scan

The results show all 6214 internal update row locks were released:

Results of test 14

A lock class was needed, but the internal update row locks were released by the same operator acquiring them.

Test 15—Internal update locks held in a lock class

Modifying the residual predicate in test 14 introduces the familiar Filter operator:

-- Test statement(s)
UPDATE dbo.LockTest
    WITH (ROWLOCK)
SET lob = LEFT(lob, 1)
WHERE
    i <= CONVERT(sql_variant, 0);
-- End test statement(s)

Run the test again:

EXECUTE dbo.RunTest 
    @Rows = 6214,
    @Clustered = 'true';

The execution plan is:

Plan with filter

The results are:

Test 15 results showing lock escalation with internal update locks

The internal update row locks associated with a lock class couldn’t be released this time because they were acquired at the scan and only found not to qualify at the filter. This led to lock escalation, although no rows were updated.

The escalated table X lock couldn’t be released at the end of the statement because exclusive locks are always held to the end of the transaction.

Without lock escalation, the internal update locks are held to the end of the statement, then released in bulk by the lock class. You can see this by running the test again with 6213 rows in the clustered table:

The results in this case are:

Results with 6213 rows

Notice there were three lock ‘promotion attempts’ confirming the update locks were held for the duration of the statement. As usual, no escalation occurred because the HoBt only had 4998 locks when the check at 5000 held locks was made.

Monitoring

You can monitor lock escalation with Profiler, an extended event, or undocumented trace flag 611 (with 3604 for output to the SSMS messages pane).

All methods provide the reason for escalation (lock threshold or memory usage), the number of locks escalated (including the current lock), and the number of locks held by the HoBt at the time (not including the current lock).

Trace flag 611 output

lock_escalation event

Unfortunately, there’s no way to observe the lock manager’s held lock count without a debugger.
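A sketch of the monitoring options mentioned above follows. The trace flag is undocumented, so use it only on a test system, and the extended event session name is an arbitrary choice:

```sql
-- Send lock escalation messages from undocumented trace flag 611 to the
-- SSMS Messages pane (3604 redirects trace output to the client).
DBCC TRACEON (3604, 611);

-- Capture escalations with the documented lock_escalation extended event.
CREATE EVENT SESSION [TrackLockEscalation] ON SERVER
ADD EVENT sqlserver.lock_escalation
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [TrackLockEscalation] ON SERVER STATE = START;
```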

Use of the by-reference LOB access optimization can be monitored with the Access Methods performance counter object.
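Those counters can also be read from T-SQL. Hedged sketch: the exact counter names can vary by version, so this uses a pattern match against the Access Methods object rather than exact names:

```sql
-- Look for the by-reference LOB counters in the Access Methods object.
SELECT
    PC.counter_name,
    PC.cntr_value
FROM sys.dm_os_performance_counters AS PC
WHERE
    PC.object_name LIKE N'%Access Methods%'
    AND PC.counter_name LIKE N'By-reference Lob%';
```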

End notes

There’s no way to observe lock classes in regular execution plans. You can see whether they were available at the root node of a verbose showplan (enabled with undocumented trace flag 8666). The verbose plan only shows lock class availability, not where and when particular plan operators used them (or not). I didn’t include this in the tests because verbose showplan makes them less repeatable.

An example screenshot from test 15 is shown below:

Verbose showplan lock class information

You may see extra lock escalations for some of the tests when verbose showplan is enabled because the lock manager counts all locks, including metadata and statistics locks acquired during showplan production. An extra lock or two can be enough to trigger an escalation check when trying to demo edge cases. Many of these locks are taken before the tests start, but they don’t contribute to the held lock count since they’re released.

The test procedure includes extra statements to prevent plan caching in another attempt to make the tests more repeatable. With caching, an extra intent-share object lock can be acquired on the dropped test table during the schema-changed recompilation process.

With a partitioned test table and lock escalation set to AUTO, an additional IS lock is acquired on the partition. This is counted by the lock manager as a held lock but doesn’t count for the per-HoBt counters. The net effect: escalation occurs with one less row in the test table. At 5000 held locks, the HoBt counter is 4997 (omitting the table, HoBt, and current locks). The next test at 6250 held locks is reached one row earlier due to the extra HoBt lock. Escalation occurs with 6247 locks on the HoBt instead of 6248.
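The partition-level behaviour above requires the AUTO setting, controlled by a documented ALTER TABLE option (TABLE is the default; DISABLE turns statement-triggered escalation off for the table):

```sql
-- Allow escalation to the partition (HoBt) level on a partitioned table.
ALTER TABLE dbo.LockTest
    SET (LOCK_ESCALATION = AUTO);
```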

I hope this series has given you more insight into the complexities of locking in the SQL Server engine.

The Lock Escalation Threshold – Part 2
https://sqlperformance.com/2022/10/sql-performance/lock-escalation-threshold-part-2
Tue, 04 Oct 2022
Paul White continues his series on lock escalation by diving into the mechanics involving parallelism.

Parallelism

In part one of this series, I explained how the lock manager maintains a count of held locks per transaction, and individual access methods keep a separate lock count per HoBt for each plan operator performing data access.

Parallelism adds an extra layer of complexity because both lock counts are maintained per thread.

There are two underlying reasons for this:

  1. Each additional parallel worker thread operates inside a nested sub-transaction with the original transaction as its parent. A count of held locks is maintained per sub-transaction and so per thread.
  2. Each worker in a parallel data-access operator has its own access method, meaning the per-HoBt counts are also separate.
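The per-thread split can be observed directly. This sketch is an illustration only: it must run from a second session while the parallel statement executes, and the session id (52) is an assumption you'd replace with the real one:

```sql
-- Locks grouped by parallel worker (execution context). Context 0 is the
-- coordinating parent task; contexts 1..DOP are the worker sub-transactions.
SELECT
    TL.request_exec_context_id,
    COUNT_BIG(*) AS locks_held
FROM sys.dm_tran_locks AS TL
WHERE
    TL.request_session_id = 52 -- assumption: the parallel query's session id
GROUP BY
    TL.request_exec_context_id;
```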

Let’s look at a couple of examples to see what this means in practice.

Test 3—20,000 rows at DOP 4

This test reuses the procedure from part one with a modified test statement designed to produce a parallel plan:

-- Test statement(s)
DECLARE @i bigint;
 
SELECT 
    @i = LT.i
FROM dbo.LockTest AS LT 
    WITH (REPEATABLEREAD)
WHERE 
    LT.i > (SELECT 0)
OPTION
(
    USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'),
    MAXDOP 4
); 
-- End test statement(s)

Now run the test with 20,000 rows:

EXECUTE dbo.RunTest @Rows = 20000;

The post-execution (actual) plan is:

Actual parallel plan for 20,000 rows

Example results:

Results of test 3 with 20,000 rows at DOP 4

No lock escalation occurs despite a total of 20,113 locks being held. This is because the total held locks and HoBt checks are per-thread. None of the four threads has individually acquired 5000 or more HoBt locks when checks take place at 2500, 3750, and 5000 held locks per sub-transaction.

Remember, the HoBt lock count doesn’t include the table-level lock or the current lock being acquired. In this test run, three of the threads read more than 5000 rows, but each had an HoBt lock count of 4998 when the check at 5000 total locks occurred.

A total of 11 lock ‘promotion attempts’ are recorded in this run. Three threads passed checkpoints at 2500, 3750, and 5000 total held locks, accounting for nine checks between them. The remaining thread checked at 2500 and 3750 total locks, accounting for the two other checks.

None of these checks resulted in an escalation attempt because no thread had acquired 5000 HoBt locks at the time.

Test 4—24,000 rows at DOP 4

We’ll now increase the number of rows to 24,000:

EXECUTE dbo.RunTest @Rows = 24000;

A sample post-execution plan is:

Parallel plan with 24,000 rows

Example results:

Results of test 4 with 24,000 rows at DOP 4

This time we do see a lock escalation, but there are still 17,836 locks held at the end of the test.

Thread two reached 6250 locks held (with 6248 on its access method), so its locks were escalated. None of the other threads reached the threshold, so their locks were not escalated.

The single escalation reduced the total number of locks held at the end of test four (17,836) compared with test three (20,113) despite the extra 4000 rows in the test table.

If you have a machine with more than four cores, you might like to try the tests with more rows and a higher degree of parallelism. You’ll find each thread considers lock escalation separately.

I’ll show one last example of the plan and test output running at DOP 12 on 64,000 rows, where three threads escalated:

Plan at DOP 12 with 64,000 rows

The results confirm three per-thread escalations, with 45,459 locks held at the end:

Output at DOP 12 with 64,000 rows

Interestingly, these tests show a single-table query holding a table-level shared lock while also holding intent-share locks on pages and shared locks on rows in the same table.

Promotion attempts

Let’s replace the test statement with:

-- Test statement(s)
DECLARE @i bigint;

SELECT
    @i = LT.i 
FROM dbo.LockTest AS LT
    WITH (REPEATABLEREAD)
WHERE
    LT.i <= 2483;
-- End test statement(s)

Run the test procedure for 2685 rows in a heap table:

EXECUTE dbo.RunTest @Rows = 2685;

The execution plan couldn’t be more straightforward:

Table scan

The results show 179 lock promotion attempts on a single serial scan of this 15-page table with no actual escalation:

179 lock promotion attempts on a scan of 15 pages

Of course, the test is engineered to produce this remarkable result, but why does it happen?

Explanation

The key observation is lock promotion tests happen whenever the total held lock count increases to become exactly divisible by 1250 (except for 1250 itself, as explained in part one).

The table scan happens to encounter rows in ascending order with respect to the column i. This is not guaranteed, but it’s likely in this case.

Each row is read and tested to see if it qualifies for the predicate LT.i <= 2483. The first 2483 rows encountered all qualify and are locked with a shared row lock. These rows appear on 14 different pages, each of which acquires an intent-share lock. Adding the intent-share lock on the table gives a total of 2483 + 14 + 1 = 2498 held locks at this point in the full table scan.

Not all rows on page 14 qualify, but they are still locked before being tested. This shared row lock is released when SQL Server determines the row doesn’t qualify. If this surprises you, remember repeatable read isolation only holds locks to the end of the transaction for qualifying rows.

Having scanned 14 pages so far, the table scan now moves to the last page in the table. Preparing to read the first row on that page, it acquires an intent-share lock, bringing the total held locks to 2499.

The scan next takes a shared lock on the first row on page 15 bringing the total held locks to 2500. This is divisible by 1250, so it triggers an escalation check and increments the poorly-named ‘promotion attempts’ counter. No escalation is attempted because the HoBt doesn’t hold at least 5000 locks.

The scan finds the current row doesn’t meet the query predicate, so it releases the shared row lock, bringing the total lock count back to 2499. Note the intent-share page lock is not released, despite no rows on page 15 qualifying.

The table’s row width means 179 rows fit on each 8K page. Each of the non-qualifying rows on page 15 is tested in the same way. This causes 179 escalation checks as the total held lock count briefly hits 2500 and then retreats to 2499 for each non-qualifying row.
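You can check the 15-page, 179-rows-per-page layout (15 × 179 = 2,685) with the documented physical stats DMF. DETAILED mode scans the whole structure, so reserve it for a test table like this one:

```sql
-- Leaf-level page and record counts for the test table.
SELECT
    IPS.page_count,
    IPS.record_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID(N'dbo.LockTest'), NULL, NULL, 'DETAILED') AS IPS
WHERE
    IPS.index_level = 0;
```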

Promotion attempts on a clustered table

If you’re (rightly) bothered by the idea of scanning a heap table in a predictable order, try the following test statement:

-- Test statement(s)
DECLARE @i bigint;

SELECT
    @i = LT.i 
FROM dbo.LockTest AS LT
    WITH (REPEATABLEREAD)
WHERE
    LT.i <= 2483
ORDER BY
    LT.pk ASC;
-- End test statement(s)

Run the test procedure specifying a clustered table:

EXECUTE dbo.RunTest 
    @Rows = 2685,
    @Clustered = 'true';

This produces a plan with the Ordered:True and Scan Direction: FORWARD attributes on the Clustered Index Scan. We know the columns i and pk have the same value in every row, so this is an acceptable guarantee of ordered processing.

Clustered index scan properties

The results are:

Results for the clustered table showing 180 lock promotion attempts

This shows one more than the 179 lock promotion attempts seen for the heap table example. The mechanism is exactly the same, aside from one minor locking implementation detail:

The heap scan is driven from IAM page(s) for the table, so each table page is fully processed in isolation. In particular, the shared lock on the last row of a page is released before moving to the next page.

This clustered index scan follows leaf-level forward and backward pointers since it doesn’t qualify for an allocation-order scan. The shared lock on the last row of a page is released after acquiring an intent-share lock on the next page but before acquiring the first shared row lock on the new page.

This slight difference means the held lock count reaches 2500 when the intent share lock is taken for page 15. The count returns to 2499 when the lock on the last row of page 14 is released, then hits 2500 again when the first row on page 15 is locked. This accounts for the one extra lock ‘promotion attempt.’

Avoiding lock promotions on the clustered table

The clustered example allows us to avoid lock ‘promotion attempts’ by scanning the index backward:

-- Test statement(s)
DECLARE @i bigint;

SELECT
    @i = LT.i 
FROM dbo.LockTest AS LT
    WITH (REPEATABLEREAD)
WHERE
    LT.i <= 2483
ORDER BY
    LT.pk DESC; -- Now descending
-- End test statement(s)

Run the test as before:

EXECUTE dbo.RunTest 
    @Rows = 2685,
    @Clustered = 'true';

The execution plan shows a BACKWARD scan of the index:

Backward index scan

The results show no lock promotion attempts:

No lock promotion attempts

The backward scan ensures all non-qualifying rows are encountered before the qualifying ones. Locks on non-qualifying rows are released, so the 2500 first-check threshold is never hit. The highest held lock count hit during the scan is 2499, as shown in the third result set.

The final result set shows 2685 row locks because sys.dm_db_index_operational_stats counts all locks taken—it doesn’t decrement when a lock is released. All 2685 rows in the table were locked at some point, but only 2483 row-level locks were held on the qualifying rows.

End of part two

The final part of this series will cover lock lifetimes and further internal details about lock escalation.

The post The Lock Escalation Threshold – Part 2 appeared first on SQLPerformance.com.

The Lock Escalation Threshold – Part 1
https://sqlperformance.com/2022/09/sql-performance/lock-escalation-threshold-part-1
Tue, 06 Sep 2022. Paul White explains how and when lock escalation is triggered in SQL Server, clarifying misleading information you may have seen in official documentation and other sources.

This article isn’t about the act of lock escalation itself, which is already well documented and generally well understood. Some myths (like row locks escalating to page locks) persist, but challenging those yet again probably wouldn’t change much.

Instead, the question I’ll address here is exactly how and when lock escalation is triggered. Much of the documentation is incorrect or at least imprecise about this and I’ve been unable to find a correct description in other writings.

There are good reasons you haven’t seen a simple demo of lock escalation taking place at 5000 locks. I’ve seen suggestions such as lock escalation isn’t deterministic, or some types of locks don’t count toward the 5000 lock threshold. Neither of those assertions is true, but the details are interesting, as I’ll explain.

Testing environment

Let’s look at some examples using a fresh database. I happen to be using SQL Server 2019 CU16 but the details I’ll describe haven’t materially changed since partition level lock escalation was added to SQL Server 2008.

USE master;

-- Create a new test database
-- COLLATE clause only to remind people it exists
CREATE DATABASE Escalation
    COLLATE Latin1_General_100_CI_AS;

-- Disable row-versioning isolation levels
-- to avoid the 14-byte row overhead
-- Not required, but it makes the later tests more precise
ALTER DATABASE Escalation SET 
    ALLOW_SNAPSHOT_ISOLATION OFF;
    
ALTER DATABASE Escalation SET 
    READ_COMMITTED_SNAPSHOT OFF;

I’m creating a stored procedure to make it easier to run tests with different row counts while showing consistent diagnostic output. The procedure creates a test table, populates it with a configurable number of rows, runs a test statement inside a transaction, and displays information about locks and escalations.

USE Escalation;
GO
CREATE OR ALTER PROCEDURE dbo.RunTest
    @Rows bigint,
    @Clustered bit = 'false',
    @HidePhysicalStats bit = 'false',
    @HideLockDetail bit = 'false',
    @HideLockSummary bit = 'false'
AS
SET NOCOUNT, XACT_ABORT ON;

-- Prevent plan caching for this procedure
-- See https://sqlperformance.com/2022/03/sql-performance/simple-param-trivial-plans-1
GOTO Start
    OPEN SYMMETRIC KEY Banana 
        DECRYPTION BY CERTIFICATE Banana;
Start:

-- Recreate the test table 
DROP TABLE IF EXISTS dbo.LockTest;

CREATE TABLE dbo.LockTest 
(
    pk integer IDENTITY NOT NULL, 
    i bigint NOT NULL, 
    lob nvarchar(max) NOT NULL
);

-- Add data
INSERT dbo.LockTest 
    WITH (TABLOCKX)
    (i, lob)
SELECT
    i = ROW_NUMBER() OVER (ORDER BY @@SPID),
    lob = CONVERT(nvarchar(max), N'big string')
FROM sys.all_columns AS AC1
CROSS JOIN sys.all_columns AS AC2
ORDER BY i
    OFFSET 0 ROWS
    FETCH NEXT @Rows ROWS ONLY;

-- Convert from heap to clustered if required
IF @Clustered = 'true'
BEGIN
    ALTER TABLE dbo.LockTest
        ADD CONSTRAINT [PK dbo.LockTest pk]
        PRIMARY KEY CLUSTERED (pk)
        WITH (ONLINE = OFF, MAXDOP = 1, FILLFACTOR = 100);
END;

-- Show table physical statistics
IF @HidePhysicalStats = 'false'
BEGIN
    SELECT
        IPS.index_type_desc,
        IPS.alloc_unit_type_desc,
        IPS.index_depth,
        IPS.index_level,
        IPS.page_count
    FROM sys.dm_db_index_physical_stats
    (
        DB_ID(), OBJECT_ID(N'dbo.LockTest', 'U'), NULL, NULL, NULL
    ) AS IPS;
END;

-- Start a transaction
BEGIN TRANSACTION;

    -- Test statement(s)
    DECLARE @i bigint;

    SELECT @i = LT.i
    FROM dbo.LockTest AS LT 
        WITH (REPEATABLEREAD);

    -- End test statement(s)

    -- Show locks held by the current transaction
    IF @HideLockDetail = 'false'
    BEGIN
        SELECT
            DTL.resource_type,
            DTL.resource_description,
            DTL.resource_associated_entity_id,
            DTL.request_mode,
            DTL.request_status
        FROM sys.dm_tran_locks AS DTL 
        WHERE 
            DTL.request_owner_type = N'TRANSACTION'
            AND DTL.request_session_id = @@SPID
            AND DTL.request_request_id = CURRENT_REQUEST_ID()
            AND DTL.request_owner_id = CURRENT_TRANSACTION_ID()
        ORDER BY
            CASE DTL.resource_type
                WHEN N'OBJECT' THEN 1
                WHEN N'PAGE' THEN 2
                WHEN N'KEY' THEN 3
                WHEN N'RID' THEN 3
                WHEN N'APPLICATION' THEN 999
            END,
            DTL.resource_description;
    END;

    -- Transaction lock summary
    IF @HideLockSummary = 'false'
    BEGIN
        SELECT 
            TotalLockCount = COUNT_BIG(*),
            HoBtLockCount = 
                SUM
                (
                    IIF
                    (
                        DTL.resource_associated_entity_id = P.hobt_id, 
                        1, 
                        0
                    )
                ),
            PageLocks =
                SUM
                (
                    IIF
                    (
                        DTL.resource_type = N'PAGE',
                        1,
                        0
                    )
                ),
            RowLocks =
                SUM
                (
                    IIF
                    (
                        DTL.resource_type IN (N'RID', N'KEY'),
                        1,
                        0
                    )
                ),
            TableLockType = 
                MAX
                (
                    IIF
                    (
                        DTL.resource_type = N'OBJECT', 
                        DTL.request_mode, 
                        N''
                    )
                )
        FROM sys.partitions AS P
        JOIN sys.dm_tran_locks AS DTL 
            ON DTL.resource_associated_entity_id IN 
                (P.[hobt_id], P.[object_id])
        WHERE 
            P.[object_id] = OBJECT_ID(N'dbo.LockTest', 'U')
            AND DTL.request_owner_type = N'TRANSACTION'
            AND DTL.request_owner_id = CURRENT_TRANSACTION_ID()
            AND DTL.request_session_id = @@SPID
            AND DTL.request_request_id = CURRENT_REQUEST_ID();
    END;

    -- Index operational statistics
    SELECT 
        IOS.row_lock_count, 
        IOS.page_lock_count, 
        IOS.index_lock_promotion_attempt_count, 
        IOS.index_lock_promotion_count
    FROM sys.dm_db_index_operational_stats
    (
        DB_ID(), OBJECT_ID(N'dbo.LockTest', 'U'), NULL, NULL
    ) AS IOS;

ROLLBACK TRANSACTION; 

DROP TABLE dbo.LockTest;
GO

Test 1 with 6213 rows

My first test adds 6213 rows to a heap table and selects all rows using REPEATABLEREAD isolation to ensure shared locks are held to the end of the transaction.

EXECUTE dbo.RunTest @Rows = 6213;

The outcome is essentially the same when using a clustered table. If you prefer that configuration, run the procedure with the optional @Clustered parameter set to true.

For a heap, example results are:

Results from selecting 6213 rows

The first result set shows our test table is a heap with 35 allocated pages. The second set shows the locks held in detail. There are a large number of shared RID locks further down the list.

The third table shows a summary of the locks. There are a total of 6249 locks held: 6213 row locks, 35 page locks, and one table (OBJECT) lock.

The displayed HoBt (heap or b-tree) lock count (6248) includes all locks except the table lock. A HoBt (pronounced “hobbit”) is a partition. A table may have many partitions so a table-level lock is not associated with any particular partition.

The data comes from sys.dm_tran_locks, where the resource associated with a lock is a HoBt for KEY, RID, and PAGE locks. The associated resource for an OBJECT lock is an object id.

Lock escalation can be configured to escalate to the partition level instead of the table level. My test table isn’t partitioned, except in the technical sense that every table has at least one partition. I won’t talk specifically about tables with multiple partitions for space reasons, but the details aren’t materially different.
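For reference, the escalation target is controlled per table through the documented LOCK_ESCALATION option. A minimal sketch using this article’s test table (AUTO only changes behaviour for partitioned tables; TABLE is the default):

```sql
-- Escalate to the partition (HoBt) level where possible
-- instead of taking a full table lock. Only meaningful
-- for partitioned tables; otherwise behaves like TABLE.
ALTER TABLE dbo.LockTest
    SET (LOCK_ESCALATION = AUTO);

-- Revert to the default table-level escalation
ALTER TABLE dbo.LockTest
    SET (LOCK_ESCALATION = TABLE);
```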

The final result in the screenshot shows output from sys.dm_db_index_operational_stats. As expected, for a full scan of the 6213-row test table under repeatable read isolation, 6213 row locks were taken. Each shared (S) RID lock is covered by an intent-share (IS) lock on its immediate parent page (only) and an IS lock at the OBJECT level.

The last result set confirms no lock escalation occurred, but does indicate three lock promotion attempts were made.

This appears to contradict the documentation in several respects. First, no lock escalation occurred at 5000 held locks. Second, additional attempts are supposed to happen every subsequent 1250 locks. With 6249 locks held, we didn’t quite reach 5000 + 1250 = 6250 for the first ‘escalation retry’. I’ll explain these observations shortly.

Test 2 with 6214 rows

Let’s run a second test with one more row in the test table:

EXECUTE dbo.RunTest @Rows = 6214;

The output is:

Results from selecting 6214 rows

The output shows lock escalation turning the IS lock at the OBJECT level into an S lock and releasing all the RID and PAGE locks below it.

There were four lock escalation attempts, and one successful lock promotion according to the dynamic management views (DMVs).

There’s no difference between a ‘lock escalation’ and a ‘lock promotion’. The documentation and DMVs use different terminology to mean the same thing in different places.

Explanation

Lock escalation is described in several places in the documentation. The one coming closest to being completely accurate is in the (very long) Transaction locking and row versioning guide under the heading Escalation threshold for a Transact-SQL statement, where it says:

When the Database Engine checks for possible escalations at every 1,250 newly acquired locks, a lock escalation will occur if and only if a Transact-SQL statement has acquired at least 5,000 locks on a single reference of a table.

Careful reading is necessary to draw the correct inferences from those statements. It also helps if you already know what’s happening and why before reading. I’ll unpack the necessary details next.

Initial check for lock escalation

The engine does indeed check for possible lock escalation at (almost) every 1250 locks acquired by the transaction. Checking after every lock would create unacceptably high overhead since locks can be acquired and released with great frequency.

The SQL Server lock manager keeps track of the number of locks held by the current transaction. When a lock is acquired, the count is incremented. When a lock is released, the counter is decremented. Converting an existing lock, for example from U to X, has no net effect on the counter. It’s the cumulative number of locks held by the transaction that’s important.

This counter isn’t limited to table/page/row locks on user objects: it includes all locks acquired via the lock manager, including intent and application locks. System locks also count, but these are rarely held for long and aren’t associated with the user transaction.
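As a quick illustration that application locks share the same counter, each transaction-owned sp_getapplock call below adds one lock to the transaction’s total (a sketch; the resource name is arbitrary):

```sql
BEGIN TRANSACTION;

    -- One application lock owned by the transaction
    EXECUTE sys.sp_getapplock
        @Resource = N'ArbitraryName',
        @LockMode = N'Exclusive',
        @LockOwner = N'Transaction';

    -- The APPLICATION lock appears alongside any other
    -- locks owned by the current transaction
    SELECT
        DTL.resource_type,
        DTL.request_mode
    FROM sys.dm_tran_locks AS DTL
    WHERE
        DTL.request_session_id = @@SPID
        AND DTL.request_owner_type = N'TRANSACTION';

ROLLBACK TRANSACTION;
```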

When the number of locks held divides evenly by 1250, a check for lock escalation is started. Well, almost. A quirk of the implementation means no check is performed when the count of held locks is exactly 1250.

The sqlmin!lck_lockInternal locking code below checks if the number of locks is greater than 1250 (0x4E2 hexadecimal):

mov r8d,dword ptr [rcx+64h]         -- load lock count
cmp r8d,4E2h                        -- compare with 1,250
ja  sqlmin!lck_lockInternal+0x1070  -- jump if above

The branch is only taken if the lock count is greater than 1250, not if it’s equal. The first number of held locks that will take the branch is therefore 1251, not 1250. This seems like an oversight.

Integer division

When the branch is taken, SQL Server next divides the held lock count by 1250 using integer division, then multiplies the result by 1250. This implements the idea of checking once every 1250 newly-held locks.

For example, 1251 divided by 1250 gives 1 (using integer division), then 1250 after the multiplication. This result doesn’t match the original value of 1251, so we know the original didn’t evenly divide by 1250. When 2500 locks are held, the integer division results in 2, giving the original 2500 when multiplied by 1250.

When the number of held locks is evenly divisible by 1250, the engine continues to check for lock escalation by calling into sqlmin!XactLockInfo::EscalateLocks. Otherwise, no escalation occurs.

This means escalation checks occur in practice when the number of held locks is 2500, 3750, 5000, 6250, and so on. Notice the lack of a check at 1250 due to the code quirk I mentioned. Also, the checks don’t start at 5000 held locks.
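The schedule can be mimicked in T-SQL by reproducing the strictly-greater-than test and the integer division (purely illustrative; this is not the engine’s code):

```sql
-- Which held-lock counts trigger an escalation check?
DECLARE @held integer = 1;

WHILE @held <= 7500
BEGIN
    IF @held > 1250                        -- quirk: strictly greater than
        AND (@held / 1250) * 1250 = @held  -- evenly divisible by 1,250
    BEGIN
        PRINT @held;  -- prints 2500, 3750, 5000, 6250, 7500
    END;

    SET @held += 1;
END;
```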

As a side note, the compiled code doesn’t actually divide by 1250 since integer division is typically a much slower CPU operation than integer multiplication. Instead, it multiplies by 1/1250 using a technique known as Division by Invariant Multiplication:

mov  eax,0D1B71759h  -- 0xD1B71759 = 0.8192 scaled by 2^32 (fixed point)
mul  eax,r8d         -- multiply #locks
shr  edx,0Ah         -- shift right 10 (divide by 1024)
                     -- so we have multiplied by 0.0008 = 1/1,250
imul eax,edx,4E2h    -- multiply by 1,250
cmp  r8d,eax         -- did we get the original number back?
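The same fixed-point trick can be reproduced with bigint arithmetic. This is only an illustration of the math, not the engine’s code:

```sql
DECLARE
    @c bigint = CONVERT(bigint, 0xD1B71759),  -- 3,518,437,209
    @n bigint = 6250;                         -- example lock count

SELECT
    -- (n * c) >> 42: drop the 32-bit low half plus 10 more bits
    ByMultiplication = (@n * @c) / CONVERT(bigint, 4398046511104), -- 2^42
    ByDivision       = @n / 1250;  -- both expressions return 5
```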

Access methods

To understand the next part of the process accurately, I’ll first need to give an introduction to access methods.

When the SQL Server query processor needs to read or change data, it does so via one or more storage engine access methods. Whenever you see a data-accessing operator in an execution plan—like an update, insert, delete, seek, or scan—there’s at least one connection to the storage engine via access methods.

Each data-accessing plan operator has its own access methods. A plan accessing the same underlying object (heap or index) multiple times has different access methods for each operator. A common example of this would be a self-join using the same index both times.

Some individual plan operators can have multiple access methods. For example, an Update operator may have one for inserts and one for deletes if it is preceded by a Split operator. A Merge operator may have up to three access methods for inserts, deletes, and updates.

I like to think of access methods as an API used by the query processor to read or change data managed by the storage engine. These connections aren’t visible in execution plans.

The main access method elements relevant to this article are the heap and index dataset sessions sqlmin!HeapDataSetSession and sqlmin!IndexDataSetSession respectively. These are created when a data-accessing operator opens.

Each access method dataset session is associated with a particular HoBt and maintains a count of locks held. This is separate from the total lock count maintained by the lock manager per transaction.

Access method checks

In sqlmin!XactLockInfo::EscalateLocks SQL Server sequentially checks all active access methods associated with the current statement to see if any qualify for lock escalation.

As I mentioned, an access method typically becomes active when its parent query processor operator is opened during statement execution. If you need a refresher on how execution plans operate in detail see my article, Iterators, Query Plans, and Why They Run Backwards.

My tests have only a single HeapDataSetSession controlling the heap table scan so only one access method is checked in the present case.

Each access method check begins with a call to its ‘escalate locks’ method. In my heap example, this is a call to HeapDataSetSession::EscalateLocks. For an index, the call would be to IndexDataSetSession::EscalateLocks.

An important lock count disparity

The first escalation check in my example occurs when the lock manager counts 2500 locks associated with the current transaction. However, the call to HeapDataSetSession::EscalateLocks finds only 2498 locks associated with its HoBt at that time.

There are two reasons for this difference:

First, the HeapDataSetSession doesn’t count the intent-shared table lock because it isn’t a HoBt-level lock. Remember, a table can have several partitions. This accounts for one ‘missing’ lock.

Second, the HeapDataSetSession updates its count after the lock is acquired, while the lock escalation check happens in the lock manager during lock acquisition. The current lock is therefore included in the lock manager’s count but hasn’t yet been recorded by the access method. This accounts for the second lock ‘missing’ from the HoBt count.

These differences again feel unintentional to me.

Lock escalation attempts

The code in HeapDataSetSession::EscalateLocks next calls sqlmin!IsEscalationPossible to check if the current access method qualifies for lock escalation.

If documented trace flag 1211 is enabled, this method returns immediately, and no escalation takes place.

Otherwise, the current access method increments its lock escalation attempts counter. This happens regardless of the number of locks it currently holds. This means a ‘lock escalation attempt’ is really only a check, not an attempt per se. We’re still checking to see if we should try to escalate locks; we haven’t decided to do it yet.

Bear this in mind when interpreting index_lock_promotion_attempt_count in sys.dm_db_index_operational_stats. The documentation describes this as the “Cumulative number of times the Database Engine tried to escalate locks”, but this really isn’t what it measures. Tools and scripts relying on this DMV may also report misleading information for this reason.

Whenever the lock manager’s total transaction locks counter divides evenly by 1250 (except at exactly 1250 locks as previously noted), each active access method associated with the current statement will have its ‘lock promotion attempt’ counter incremented. There’s no guarantee any access method will qualify for a lock promotion attempt at the time this check is made.

In my example, the lock manager’s count is currently 2500, so there’s no prospect of any lock escalation due to the lock threshold of 5000 being met. Nevertheless, the ‘lock promotion attempt’ counter for the heap table is incremented.

This explains why test 1 saw three ‘lock promotion attempts’ and test 2 saw four ‘attempts’. Checks were triggered at 2500, 3750, and 5000 held locks in both cases. Test 2 triggered an extra check when it reached 6250 held locks (6214 row locks, 35 page locks, and a table lock).

Lock escalation

After incrementing the ‘promotion attempts’ counter, the code in sqlmin!IsEscalationPossible next checks the number of locks held by the access method against the threshold for escalation. This threshold is exactly 5000 locks. The precise test is:

cmp esi,1388h -- compare locks held with 5000 (hex 1388)
jb  sqlmin!IsEscalationPossible+0x222 -- jump if below

Lock escalation is attempted if the number of HoBt locks held by the current access method is greater than or equal to 5000.

In test 1, this check failed at 2500, 3750, and 5000 held locks (according to the lock manager) because the HoBt only had 2498, 3748, and 4998 locks at those times. Remember, the HoBt-level counter doesn’t include the table lock or the current lock being acquired.

In test 2, the first three checks failed in exactly the same way. The last check at 6250 locks succeeded because the HoBt lock count was 6248, meeting the 5000-lock threshold.

In case you’re wondering why test 1 didn’t trigger lock escalation at 5002 held locks (with 5000 recorded against the HoBt at that time), it’s because 5002 doesn’t divide evenly by 1250. The next check after 5000 held locks doesn’t occur until 6250 total locks, and test 1 never quite hit that mark.

Final check

When the access method has 5000 or more locks, SQL Server checks trace flag 1224 to see if lock escalation due to the lock threshold has been disabled.

If not, the physical process of lock escalation begins in earnest with a call to sqlmin!HeapDataSetSession::ForceLockEscalation. This converts the IS table lock to S and releases the lower-level HoBt locks in a bulk operation.
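For completeness, both trace flags mentioned in this section are documented. A hedged sketch, assuming the RunTest procedure created earlier; note 1211 disables lock escalation entirely, while 1224 disables only the lock-count trigger (escalation can still occur under lock memory pressure):

```sql
DBCC TRACEON (1224);  -- session scope; add -1 for global

-- With the lock-count trigger disabled, this no
-- longer escalates at 6,250 total held locks
EXECUTE dbo.RunTest @Rows = 6214;

DBCC TRACEOFF (1224);
```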

End of part one

Summarizing the main points from this article:

  • The total number of locks held is incremented when a lock of any type is acquired by the current transaction. The count is decremented when a lock is released.
  • Lock escalation checking is triggered at 2500 held locks per transaction and every 1250 additional. No checks happen between these times.
  • A separate count per HoBt is maintained for each active access method associated with each data-accessing plan operator.
  • These HoBt counts don’t include the table-level intent lock or the current lock being acquired.
  • A lock escalation ‘attempt’ is counted for all active access methods in the current statement whenever the total held lock count divides exactly by 1250, starting at 2500.
  • The ‘attempt’ counter is misleadingly named. Escalation is not guaranteed to be attempted at the time it’s incremented. It indicates only that the HoBt access method counter is being checked.
  • Escalation is attempted for any active HoBt access method in the current statement with at least 5000 held locks. Remember, this count doesn’t include the table lock or current lock.
  • Simple tests will typically escalate at 6250 total held locks (where the single HoBt lock counter is 6248) despite the threshold being 5000. This is because the prior check at 5000 locks finds only 4998 locks on the HoBt.

The second and third parts of this series will continue my coverage of lock escalation thresholds with the impact of parallelism, a discussion of lock lifetimes, and some interesting edge case examples.

The post The Lock Escalation Threshold – Part 1 appeared first on SQLPerformance.com.

Simple Parameterization and Trivial Plans — Part 6
https://sqlperformance.com/2022/08/sql-optimizer/simple-parameterization-and-trivial-plans-part-6
Thu, 04 Aug 2022. Paul White concludes his series with trivial plans and when simple parameterization is considered safe or unsafe.

[ This series: Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

In part 5, I described how the early phases of query compilation affect simple parameterization. To recap:

  • Normalization and decoding promote cached plan reuse
  • The decoding step has fixed and limited capabilities
  • The Failed Auto-Params/sec counter is incremented when decoding fails and the statement is not parameterized
  • Shell plans optimize for very frequent execution of simple statements, bypassing the parsing, parameter replacement, normalization, and decoding stages
  • The first application of constant folding occurs after decoding, resulting in a separate parameter for each value in a foldable expression

Let’s now continue following the compilation process to see how SQL Server decides if simple parameterization is safe or unsafe.

Code examples use the Stack Overflow 2010 database on SQL Server 2019 CU 16 with the following additional nonclustered index:

CREATE INDEX [IX dbo.Users Reputation (DisplayName)] 
ON dbo.Users (Reputation) 
INCLUDE (DisplayName);

Simplification

The simplification compilation stage aims to remove logical redundancy and reduce expression complexity.

Some opportunities for logical tree simplification arise from prior compilation activity, others when the original statement contains unnecessary elements. The latter is common for SQL generated by applications and frameworks but also often occurs in dynamic queries. Opportunities may also arise as database CHECK constraints are incorporated in the logical tree during simplification.

Timing is a key point here. Since decoding succeeded earlier, the query processor is now dealing with a parameterized (prepared) statement. Many simplifications can't be applied to parameters, only constant values.

Let’s look at an example:

SELECT 
    U.DisplayName
FROM dbo.Users AS U 
WHERE 
    U.Reputation BETWEEN 1 AND 999
    AND U.Reputation BETWEEN 999 AND 1234;

This statement qualifies for simple parameterization. Four constants are identified, with inferred integer data types shrunk to smallint or tinyint due to the comparison operator parsing context (described in part 4):

Plan with four parameters

This plan will continue to return correct results when reused with different parameter values. Let’s run the query again with an extra element to prevent simple parameterization:

SELECT 
    U.DisplayName
FROM dbo.Users AS U 
WHERE 
    U.Reputation BETWEEN 1 AND 999
    AND U.Reputation BETWEEN 999 AND 1234
OPTION (KEEP PLAN); -- New

The query hint has no effect on the statement besides disabling simple parameterization at the parsing stage. Without parameters, simplification removes logical redundancy from the predicates to leave a single equality test:

BETWEEN predicates replaced with an equality test

None of this is particularly surprising. The outcome is the same as when we choose to parameterize client-side. Reusable plans often permit fewer optimizations than ones built for specific values.

Nevertheless, there’s an important observation to make here. The query processor is not yet fully committed to producing a prepared plan. The simplification stage will not be repeated using constant values if the parameterization attempt fails later.

There are several compilation stages after simplification but they don’t interact much with simple parameterization. I’m going to skip those and move on to where the final decision to apply simple parameterization is made.

Trivial Plans

Like simple parameterization, the trivial plan compilation stage aims to reduce the cost of creating execution plans for simple statements. It’s the final step before possibly invoking the full cost-based optimizer (CBO) where alternatives are subject to detailed costing analysis.

The trivial plan stage provides a fast path through compilation for simple statements, whether parameterized or not. The CBO has significant start-up and runtime costs and may consume significant server resources. These costs are unlikely to be recovered through finding an incrementally better plan in simple cases.

Avoiding CBO can confer significant performance and resource usage advantages for workloads frequently executing simple queries.

Generating a Trivial Plan

Trivial plans are often referred to as having no cost-based choices. This is perhaps a useful shorthand, but it isn’t the whole story.

SQL Server generates trivial plans for ‘common’ statement types having an ‘obvious’ implementation and a ‘short’ expected run time. Whether a statement is ‘common’ and ‘obvious’ is largely determined by heuristics and implementation complexity. The ‘expected run time’ is approximated by estimated plan cost.

The trivial plan stage has entry conditions like CBO stages do. If the logical tree has features that can't be implemented in a trivial plan, the stage is skipped. Otherwise, a limited set of substitution and implementation rules are evaluated where they apply to the current tree.

Entering the trivial plan stage is not a guarantee of success. The limited rules available still might fail to generate a complete execution plan. In this case, the trivial plan stage fails and compilation moves on to the CBO.

Qualification

Microsoft deliberately don’t document the precise criteria used to decide if a statement qualifies for a trivial plan. The rules can and do change from time to time.

Even so, it’s useful to say a simple statement with an obvious ‘best’ index is most likely to get a trivial plan. The index need not be covering if it's unique and an equality index seek is possible. Trivial plans are possible for index scans as well as seeks. A table scan (heap or clustered) can also feature in a trivial plan.

In principle, pretty much any relatively simple statement pattern could be implemented in SQL Server at the trivial plan stage, assuming a known good plan shape exists. In practice, SQL Server is quite conservative.

This doesn’t mean SQL Server only generates a trivial plan when no cost-based choices exist. One could argue every decision is cost-based to some extent, including which index to use at the trivial plan stage.

Index analysis is a big part of the trivial plan decision. SQL Server uses heuristics to decide if a particular index is good enough to make CBO unlikely worth pursuing.

For example, a covering nonclustered index will usually be selected as trivial if it involves reading fewer pages than any other index (including the heap or clustered index). This determination takes into account the selectivity of any predicates and index fill factors.

The trivial plan selection heuristics also don’t account for facilities only available in CBO, like index intersection plans. SQL Server might therefore select a single-index trivial plan when full CBO analysis would’ve found a lower-cost index intersection plan.
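As a sketch of the sort of case in question (hypothetical single-column indexes on the Stack Overflow Users table; whether CBO actually chooses an intersection depends on statistics and costing):

```sql
-- Two hypothetical single-column nonclustered indexes
CREATE INDEX [IX dbo.Users Reputation] ON dbo.Users (Reputation);
CREATE INDEX [IX dbo.Users Age] ON dbo.Users (Age);

-- Neither index covers both predicates on its own. Full
-- optimization could intersect (join) the two indexes;
-- the trivial plan stage cannot consider that option.
SELECT U.Id
FROM dbo.Users AS U
WHERE
    U.Reputation = 1000
    AND U.Age = 30;
```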

Parameterization Safety

When a statement passes the earlier parser and decoder checks, it arrives at the trivial plan stage as a prepared (parameterized) statement. The query processor now needs to decide if the parameterization attempt is safe.

Parameterization is considered safe if the query processor would generate the same plan for all possible future parameter values. This might seem like a complex determination to make, but SQL Server takes a practical approach.

Safe Parameterization

Simply stated, a simple parameterization attempt is considered safe if the trivial plan stage produces a plan for the prepared statement.

When this occurs, the Safe Auto-Params/sec counter of the SQL Statistics object is incremented. Refer back to part 3 for a script to reliably detect simple parameterization and trivial plan outcomes.

If the trivial plan stage is disabled with trace flag 8757, simple parameterization is never applied because a trivial plan can’t be generated.

Unsafe Parameterization

If the trivial plan stage doesn’t produce a plan for the current prepared statement, the simple parameterization attempt is considered unsafe.

When this occurs, the Unsafe Auto-Params/sec counter is incremented.
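Both counters can be sampled directly from sys.dm_os_performance_counters (the object_name prefix varies with the instance name, hence the wildcards):

```sql
SELECT
    DOPC.counter_name,
    DOPC.cntr_value
FROM sys.dm_os_performance_counters AS DOPC
WHERE
    DOPC.[object_name] LIKE N'%SQL Statistics%'
    AND DOPC.counter_name LIKE N'%Auto-Param%';
```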

After an unsafe parameterization attempt, SQL Server replaces parameters in the prepared plan with the original constant values before invoking the CBO. The statement remains technically prepared but contains no parameters.

Parameters are only replaced by constants inside query plan operators. Other elements of the prepared statement aren't cleaned up. This explains why an actual execution plan produced after an unsafe attempt at simple parameterization retains the parameterized text and a parameter list (see part 3).

Parallelism

While a simple parameterization attempt is considered safe if a trivial plan is found, it doesn’t mean the final execution plan will be TRIVIAL.

If the estimated cost of the trivial plan found exceeds the cost threshold for parallelism, SQL Server will invoke the CBO to consider more alternatives, possibly including parallel execution plans. The plan will be marked as having been through FULL optimization.

The outcome of CBO need not be a parallel plan, but it will remain parameterized in any case. I showed an example of simple parameterization producing a parallel query plan in part 3.
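The threshold in question is the server-wide cost threshold for parallelism setting, which defaults to 5 cost units. You can check the value currently in effect with:

```sql
SELECT 
    C.[name], 
    C.value_in_use
FROM sys.configurations AS C
WHERE C.[name] = N'cost threshold for parallelism';
```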

Feature Interaction

Simple parameterization and the trivial plan stage are closely related but remain independent activities.

A statement can qualify for simple parameterization without producing a TRIVIAL execution plan.

This happens when a trivial plan is found with a cost exceeding the parallelism threshold. The final plan will remain parameterized after CBO but might be serial or parallel.

A statement containing constants can produce a TRIVIAL plan without also qualifying for simple parameterization.

This happens when the statement contains syntax elements considered unsuitable for simple parameterization by parsing or decoding (covered in parts four and five).

Process Summary

I’ve covered a lot of ground in this series, so I thought it would be useful to finish up with a flowchart showing the main decision points during consideration of simple parameterization and trivial plans:

Simple Parameterization and Trivial Plans Flowchart

[ This series: Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

The post Simple Parameterization and Trivial Plans — Part 6 appeared first on SQLPerformance.com.

10 For 10 : My 10 Favorite SQLPerformance Posts Over 10 Years https://sqlperformance.com/2022/07/sql-performance/10-for-10-favorite-posts https://sqlperformance.com/2022/07/sql-performance/10-for-10-favorite-posts#respond Sun, 24 Jul 2022 09:00:52 +0000 https://sqlperformance.com/?p=11420 On this site’s 10th birthday, Aaron Bertrand looks back on his favorite posts – one from each year.

The post 10 For 10 : My 10 Favorite SQLPerformance Posts Over 10 Years appeared first on SQLPerformance.com.

This is a month of milestones. I was awarded my 25th MVP award on July 5th, which I believe ties me with Tibor for the longest-tenured MVP. SQL Server 2012 went out of all forms of official support about two weeks ago, on July 12th. And this site turns 10 years old today – the first post was on July 24th, 2012. I thought it would be fun to look back and comment on some of my contributions over that decade.

2012

Best approaches for running totals – updated for SQL Server 2012

In the very first post on this site, I compared the performance of long-standing running totals methods with the new windowing functions added in SQL Server 2012. In almost all cases, the new approach worked better – but there’s always a catch, right?

2013

Break large delete operations into chunks

I wrote this 2013 post as a response to seeing many people try to delete millions of rows from very large tables, and face blocking that crippled their applications. Nearly 10 years later I still reference it frequently in comments or answers on Stack Overflow.

2014

Readable Secondaries on a Budget

When Availability Groups were first introduced, they were only available in Enterprise Edition. Customers who didn’t have unlimited cap-ex funds were clamoring for a way to scale out reads. In this 2014 post, I detailed a way to do this with good old log shipping.

2015

Partitioning on a Budget

In a similar vein to my favorite post from 2014, I described a method to simulate partitioning – another feature that was Enterprise-only at the time – using filtered indexes.

2016

A Big Deal : SQL Server 2016 Service Pack 1

Toward the end of the year, I was able to spread the news that a SQL Server 2016 service pack would make most Enterprise-only features available in all editions. I detailed all the changes and explained why this was such a huge win for non-Enterprise customers.

2017

One way to get an index seek for a leading %wildcard

Leading wildcard searches are notoriously inefficient because they require inspecting every single value. One way to get a seek is to have an indexed computed column using REVERSE. In this post from 2017, I explored the use of a trigram as a similar strategy.

2018

What to do (or not do) about top wait stats

Usually, wait stats associated with an individual query don’t help you solve the problem, because the wait is actually caused by some other process. This post from 2018 took a look at the top 10 wait types observed across SQL Sentry customers at the time and, for each one, asked the question, “Can you solve this problem with just the wait type and the query that experienced it?”

2019

Filtered Indexes and Forced Parameterization (redux)

In this 2019 post, I explained a trivial workaround to a frustrating limitation with filtered indexes. This was not my first post complaining about filtered indexes (other posts are linked in the opening paragraph).

2020

Please stop using this UPSERT anti-pattern

In 2020, I talked about an all-too-common pattern: checking if a row exists, if so, update, otherwise, insert. This pattern is a shortcut to deadlocks and, in this post, I explain a slightly better approach (using graphics I’m quite proud of).

2021

Refreshing Tables With Less Interruption Using Partition Switching

Last year, I wrote about a way to use partition switching to refresh lookup data with minimal or zero user disruption. I explained the new ABORT_AFTER_WAIT option and demonstrated the tradeoffs involved.

2022 (BONUS!)

Serializing Deletes From Clustered Columnstore Indexes

Well, 2022 isn’t over yet but, so far, my favorite post has been this one about reducing the impact of deleting data from a table with a clustered columnstore index.

Simple Parameterization and Trivial Plans — Part 5 https://sqlperformance.com/2022/06/sql-optimizer/simple-parameterization-and-trivial-plans-part-5 https://sqlperformance.com/2022/06/sql-optimizer/simple-parameterization-and-trivial-plans-part-5#respond Wed, 08 Jun 2022 09:00:58 +0000 https://sqlperformance.com/?p=11380 Paul White continues his series on simple parameterization and trivial plans explaining how maximum plan reuse is achieved.

The post Simple Parameterization and Trivial Plans — Part 5 appeared first on SQLPerformance.com.

[ This series:  Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

In part 4, I described the role the parser plays turning a statement with constant values into a parameterized prepared statement. To recap briefly, the parser:

  • Identifies potential parameters
  • Assigns an initial data type
  • Accounts for both simple and forced parameterization rules
  • Sets a flag if it determines simple parameterization is impossible

When the parser allows simple parameterization to continue, SQL Server® increments the Auto-Param Attempts/sec counter of the SQL Statistics object.

Let’s now continue following the compilation process noting the effects on simple parameterization as we go.

As in previous parts, code examples use the Stack Overflow 2010 database on SQL Server 2019 CU 16 with the following additional nonclustered index:

CREATE INDEX [IX dbo.Users Reputation (DisplayName)] 
ON dbo.Users (Reputation) 
INCLUDE (DisplayName);

Normalization

The output of parsing is a logical representation of the statement called a parse tree. This tree does not contain SQL language elements. It's an abstraction of the logical elements of the original query specification.

It's frequently possible to write the same logical requirement in different ways. Using a SQL analogy, x IN (4, 8) is logically the same as writing x = 4 OR x = 8. This flexibility can be useful when writing queries, but it makes implementing an optimizer more difficult.

In general terms, normalization is an attempt to standardize. It recognises common variations expressing the same logic, and rewrites them in a standard way. For example, normalization is responsible for turning x BETWEEN y AND z into x >= y AND x <= z. The normalized, or standardized, form chosen is one the query processor finds convenient to work with internally.
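As a sketch using the demo table from this post, the following two statements express the same logical requirement, so normalization rewrites the first into the shape of the second before optimization:

```sql
SELECT U.DisplayName 
FROM dbo.Users AS U
WHERE U.Reputation BETWEEN 1 AND 5;

-- Normalized equivalent of the BETWEEN predicate
SELECT U.DisplayName 
FROM dbo.Users AS U
WHERE U.Reputation >= 1 AND U.Reputation <= 5;
```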

I’m simplifying a bit here. Like parameterization, normalization happens in stages at different points during statement compilation and optimization. Logical operator trees and expressions are normalized at different times and in different ways. It's only important to understand the broad concept for what follows.

Maximizing Parameterized Plan Reuse

SQL Server could use the parameter locations determined during parsing to directly replace constants with parameter markers. This would allow plan reuse, but it would also require future statements to be written in exactly the same way (constant values aside) to benefit.

This might be useful enough in itself, but we can do better by normalizing. Expressing the parameterized statement in a standardized way allows future statements to match if they normalize to the same form, even where the original text is quite different. I’ll show you some examples shortly.

Decoding

SQL Server implements this important normalization aspect by decoding the logical operator tree back into a SQL representation. Decoding uses slightly different rules for simple and forced parameterization but in either case the normalized SQL will usually be quite different from the original.

For simple parameterization, decoding emits keywords in upper case, data types and intrinsic functions in lower case, delimits object names with square brackets, and removes unnecessary spaces and comments.

Forced parameterization emits keywords in lower case, does not use bracket delimiters, and places a space either side of the ‘dot’ between schema and object names.

Decoding also standardizes elements like comparison operators, the AS keyword for aliases, and the presence of an ASC or DESC specification in an ORDER BY clause. For example, decoding always expresses ‘not equal’ using <> even if != was used in the original statement.

Neither normalization scheme changes the casing of non-keyword elements like object names. This could be unsafe in case-sensitive databases.

Whichever normalization scheme is used, the key point is AdHoc statements can be written using different keyword casing, spacing, comments, and delimiters but still benefit from plan reuse through server-side parameterization. The text will normalize to the same parameterized form, allowing a cached prepared plan to be matched and reused.

Let’s look at an example.

Example 1

The following statements generate the same normalized text under simple parameterization:

ALTER DATABASE SCOPED CONFIGURATION 
    CLEAR PROCEDURE_CACHE;
GO
SELECT U.DisplayName FROM dbo.Users AS U 
WHERE U.Reputation IN (2) ORDER BY U.Id;
GO
SELECT U.DisplayName 
FROM dbo.Users AS U 
WHERE U.Reputation =3
ORDER BY U.Id;
GO
select 
    U.DisplayName 
FROM [dbo].[Users] as U 
WhErE 
    U.Reputation = (4)
Order By
    U.[Id];
GO
SELECT
    U.DisplayName 
    -- Note: No AS used for the alias
from [dbo] . Users U 
where
    U.Reputation = 5
order by 
    U.Id asc;
GO

Let’s look at the plan cache:

SELECT
    CP.usecounts,
    CP.objtype,
    ST.[text]
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text (CP.plan_handle) AS ST
WHERE 
    ST.[text] NOT LIKE '%dm_exec_cached_plans%'
    AND ST.[text] LIKE '%DisplayName%Users%'
ORDER BY 
    CP.objtype ASC;

The output shows four AdHoc shell plans pointing to a single Prepared plan:

Normalized statement plan reuse

Each AdHoc statement is different, but the logical trees all decode to the same form (line breaks added for clarity):

(@1 tinyint)
SELECT [U].[DisplayName]
FROM [dbo].[Users] [U]
WHERE [U].[Reputation]=@1
ORDER BY [U].[Id] ASC

The normalized and parameterized form is used in cache lookups so the prepared plan is successfully found and reused.

This is a neat feature of simple parameterization but it's still best to use a consistent SQL style. Admittedly, this can be difficult to enforce in practical work environments over time.

Normalization isn’t perfect. You can probably find any number of ways to subtly change the statements above to produce different ‘normalized’ parameterized forms. Normalization promotes reuse of parameterized plans—it doesn’t guarantee it.

Failed Parameterization

The decoding step for simple parameterization has deliberately limited capabilities. It's only able to decode certain logical tree elements into SQL. These limitations determine which clauses and intrinsic functions (among other things) are compatible with simple parameterization.

If the decoding step successfully produces a complete SQL representation, the compilation process proceeds from this point as a prepared (parameterized) statement.

When decoding fails, the Failed Auto-Params/sec counter of the SQL Statistics object is incremented and compilation continues as an unparameterized (not prepared) statement.

Statements that fail at the parsing or decoding steps don't show any parameter details in estimated or actual plans because a prepared version of the statement is never created. Refer back to part three for execution plan details.

Decoder Limitations

For a parameterization attempt to fail here, it must have passed the generic tests performed by the parser, but fail at the decoding stage. The syntax elements, global variables, and intrinsic functions supported by the simple parameterization decoder aren't documented.

The decoder limitations explain why (and when) examples shown earlier in this series failed with LOWER and CEILING built-in functions, but succeeded with FLOOR and ABS.

Constant to constant comparison is also unsupported by the simple parameterization decoder, which explains why the well-known WHERE 1 = 1 trick prevents simple parameterization. As further examples, the decoder supports the @@SPID and @@TRANCOUNT global variables, but not @@ROWCOUNT or @@IDENTITY.
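A quick sketch of the WHERE 1 = 1 case against the demo table. Running this alongside the counter test rig should show both Auto-Param Attempts/sec and Failed Auto-Params/sec incrementing:

```sql
-- The redundant 1 = 1 predicate can't be decoded back to SQL, so the
-- whole statement compiles without parameters.
SELECT U.DisplayName
FROM dbo.Users AS U
WHERE 1 = 1
  AND U.Reputation = 42;
```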

The performance counter test rig in part three can be used to explore which statement features fail at the decoding stage. Such statements will increment both Auto-Param Attempts/sec and Failed Auto-Params/sec.

Remember a statement disqualified from simple parameterization by the parser doesn’t increment Auto-Param Attempts/sec.

Example 2 – Plan Reuse

Try to predict how many plans of which type (ad-hoc or prepared) will be cached if we remove the GO batch separators from example 1:

-- Single batch
SELECT U.DisplayName FROM dbo.Users AS U 
WHERE U.Reputation IN (2) ORDER BY U.Id;

SELECT U.DisplayName 
FROM dbo.Users AS U 
WHERE U.Reputation =3
ORDER BY U.Id;

select 
    U.DisplayName 
FROM [dbo].[Users] as U 
WhErE 
    U.Reputation = (4)
Order By
    U.[Id];

SELECT
    U.DisplayName 
    -- Note: No AS used for the alias
from [dbo] . Users U 
where
    U.Reputation = 5
order by 
    U.Id asc;

You may be surprised to find only two plans are cached, one Adhoc and one Prepared.

The prepared plan is used four times:

Two plans cached

Explanation

Back in part one, I said it can be useful to think of a prepared statement as an (unnamed) stored procedure.

Imagine example 2 had contained four calls to a single stored procedure with different parameter values each time. That would cache the ad-hoc batch once and the stored procedure plan once. The single stored procedure plan would be executed four times with the different parameter values.

Something conceptually similar happened with simple parameterization as I’ll now explain.

SQL Server compiles a single plan for all statements in a batch, but it compiles each statement in the batch sequentially. In example 2, the first statement encountered qualifies for simple parameterization so a separate prepared statement is built and cached.

The second, third, and fourth statements in the batch also qualify for simple parameterization. After normalization and decoding they match the prepared plan cached by the first statement. The single prepared plan is therefore executed a total of four times.

Shell Plans Revisited

The Adhoc plan cached in example 2 contains four shell plans (covered in part 1). Each shell plan points to the same Prepared plan.

Caching the Adhoc shell ensures that future execution of exactly the same statement text (including constant values) will quickly find the parameterized plan.

Without the shells, the statement would need to be parsed, parameterized, normalized, and decoded back to SQL representation before it could be matched to the prepared plan.

This is less work than generating even a trivial plan, but it is still less efficient than using the shell to locate the prepared statement directly from the source text.

The goal of simple parameterization is to optimize performance for simple and frequently-executed ad-hoc SQL statements that parameterize to the same form. For an OLTP workload submitting such statements very rapidly, every millisecond counts. Besides, all compilation activity consumes some server resources.

Only the Prepared plan is cached

Algebrization and Constant Folding

The next step of the compilation process is algebrization. This complex stage performs a number of tasks, including binding object names found in the parse tree to physical database entities, constant folding, matching aggregates to grouping elements, and deriving final data types from metadata using type conversion rules as necessary. The output from this activity is a bound tree of logical operators.

The most relevant part of algebrization to simple parameterization is the initial round of constant folding it performs.

As the linked documentation states, constant folding is the early evaluation of expressions to improve runtime performance. For example, the expression DATEFROMPARTS(2022, 07, 11) can be evaluated early to the date value 11 July 2022.
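The folding happens entirely at compile time; the expression below never appears in the runtime plan, only its result does:

```sql
-- Folded during compilation to the constant date 2022-07-11
SELECT DATEFROMPARTS(2022, 07, 11) AS folded_date;
```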

The algebrization stage is the first time constant folding is applied during compilation, so earlier stages will always see unfolded expressions.

Let’s look at this with an example.

Example 3

SELECT 
    U.DisplayName
FROM dbo.Users AS U 
WHERE 
    U.Reputation = 900 + 90 + 9;

This statement qualifies for simple parameterization. Each of the constants is identified as a separate parameter by the parser, and initially typed as integer. No integer type shrinking to smallint or tinyint is performed because the immediate parsing context is an arithmetic operator as described in part four.

The post-execution (actual) plan confirms parameterization occurred before constant folding had a chance to evaluate 900 + 90 + 9:

Simple parameterization applied before constant folding

Notice the three integer parameters and the normalized representation of the parameterized statement text in the top bar.

This example is specific to simple parameterization. Forced parameterization doesn’t parameterize constant-foldable expressions that are arguments of the +, -, *, /, and % operators.

Constant folding may be repeated several times later in the compilation and optimization process as new folding opportunities arise. These later constant folding runs may apply to the whole tree, newly generated alternative subtrees, or an individual expression.

End of Part 5

In the final part of this series, I’ll continue analysis of the compilation process with the simplification and trivial plan stages, including how and when SQL Server decides if simple parameterization is safe or unsafe.

[ This series:  Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

T-SQL Can Be Expressive Without Sacrificing Performance https://sqlperformance.com/2022/06/t-sql-queries/expressive https://sqlperformance.com/2022/06/t-sql-queries/expressive#respond Thu, 02 Jun 2022 09:00:38 +0000 https://sqlperformance.com/?p=11388 There are often multiple ways to express a query and get the same results, often without any change in performance. Learn about one example.

The post T-SQL Can Be Expressive Without Sacrificing Performance appeared first on SQLPerformance.com.

Even with its warts and limitations, Transact-SQL is a beautiful language, allowing for flexible declarative expression about what you’re asking the database engine to do for you.

Itzik Ben-Gan has shown time after time that you can get the same results in many different ways—look what he demonstrated recently with window ordering. I have also discussed this when dealing with anti-semi-joins, which can be solved using APPLY or EXCEPT or NOT EXISTS or LEFT JOIN. In some of those cases, different queries led to different performance because query semantics changed (either obviously or, at least, according to the engine). But writing a query differently—even one more verbose—can have any of the following impacts, some subjective, some not:

  • More or less intuitive code
  • Zero or negligible performance differences
  • Meaningful performance differences
  • Trade one resource for another (e.g., use more memory but less CPU, or more CPU but less I/O)

And in each of those cases, you can decide which version of the query is most beneficial for you based on your priorities. Maybe you have a memory- or CPU-bound system, or maybe you prefer one syntax over another for subjective reasons, or maybe you think one form will be easier for future maintainers and newcomers.

As an analogy, there are many routes you can take from New York City to Dallas. Some may be faster than others, some may be fewer miles but take longer, some are more fuel-efficient due to average speed limits, some more scenic, and some more toll-friendly. The beauty is that if you and I are independently planning the same trip, we can choose our routes based on our individual priorities. I may not like interstates, or I may prefer to drive more westerly until the sun starts setting, and you may want to see a particular tourist attraction, visit an uncle, or stop in a certain city.

A query is similar. Usually, performance is of utmost importance, but even that isn’t always true. When two or more queries give the same answer and have identical (or “close enough”) performance, the choice can come down to other factors, as mentioned above. I recently answered a question on Stack Overflow where the user was asking how to filter a grouping where an aggregate condition was true.

For some context, Stack Overflow is a place where I tend to cater to people with a wide variety of experience with T-SQL or queries in general. Sometimes solutions are necessarily complex, or they need less-used syntax that is not universally understood, so it can take more explanation and a better breakdown of the code for the user to benefit. One of the ways I like to help with this breakdown is to isolate different aspects of the query in a derived table or, more commonly, a common table expression (CTE) because it can be a lightbulb moment to think about that part independently. With that in mind, let’s look at a boiled-down version of the question:

    Given this table dbo.tablename, I want to return a single row for each name and division combination, but only where there is both a row with source = 'comp' and a row where source = 'manual':

    name division source
    host1 abc comp
    host2 xy manual
    host3 zyx comp
    host3 zyx manual
    host2 xy manual

    I’ve highlighted the only rows they want to consider for aggregation (the two host3 / zyx rows, which have both source values), with the desired output being a single row:

    name division
    host3 zyx

In T-SQL, setting up this sample would look like this:

  CREATE TABLE dbo.tablename
  (
    name     varchar(128),
    division varchar(128),
    source   varchar(128)
  );

  INSERT dbo.tablename(name, division, source) VALUES
  ('host1',  'abc',  'comp'),
  ('host2',  'xy',   'manual'),
  ('host3',  'zyx',  'comp'),
  ('host3',  'zyx',  'manual'),
  ('host2',  'xy',   'manual');

To get the desired result, the first (and later accepted) answer used this syntax, which is perfectly adequate:

  SELECT name, division  
  FROM dbo.tablename
  WHERE source in ('comp', 'manual')
  GROUP BY name, division
  HAVING COUNT(DISTINCT source) > 1; -- or = 2

Since I know users commonly have difficulty with the HAVING clause, I offered a different approach, one that breaks the logic down, as I mentioned earlier.

Another way to think about it is to calculate the counts inside a CTE and then filter:

  ;WITH cte AS
  (
    SELECT name, division, SourceCount = COUNT(DISTINCT source)
    FROM dbo.tablename
    WHERE source IN ('comp', 'manual')
    GROUP BY name, division
  )
  SELECT name, division FROM cte 
    WHERE SourceCount = 2;

And yes, my CTEs always start with ;WITH (see why).

Or, if you don’t like CTEs:

  SELECT name, division FROM
  (
    SELECT name, division, SourceCount = COUNT(DISTINCT source)
    FROM dbo.tablename
    WHERE source IN ('comp', 'manual')
    GROUP BY name, division
  ) AS q WHERE SourceCount = 2;

Yes, it’s more typing, but the intention is to think about the counting and grouping separate from the filtering, like how in an INNER JOIN you can logically think about the joining conditions (in the ON clause) separate from the filter conditions (in the WHERE clause).

As for performance, they all perform the same because SQL Server is smart and can generate the same plan. I inserted 50,000 rows into the table and ran all three queries; each had a duration of 30–33ms, a memory grant of 1,584 KB, and an estimated subtree cost of 0.5972860. Here is the plan shape in all three cases:

Identical plan for three different queries

The plan would look different if the table had a clustered index; or let’s try an index designed to support this query specifically:

  CREATE INDEX testing ON dbo.tablename (source) INCLUDE (name, division);

Now the time is down to 26–28ms, the memory grant is still 1.5MB, and the estimated subtree cost has dropped by a whopping amount, to 0.5769890. Here is the new plan (again, identical for all three queries):

Plan for index-supported version

This is not a complex example, but it illustrates that we can often find multiple ways to reach the same destination using the most expressive form we like. Variations in syntax that are identical in results and underlying meaning can help give someone that “lightbulb” moment, and they encourage testing “identical” variations for cases where the performance might actually differ.

T-SQL Windowing Improvements in SQL Server 2022 https://sqlperformance.com/2022/05/t-sql-queries/windowing-improvements-sql-server-2022 https://sqlperformance.com/2022/05/t-sql-queries/windowing-improvements-sql-server-2022#comments Wed, 25 May 2022 04:01:06 +0000 https://sqlperformance.com/?p=11391 Itzik Ben-Gan dives deep on two of the more interesting T-SQL enhancements in SQL Server 2022: the new WINDOW clause and the windowing NULL treatment clause.

The post T-SQL Windowing Improvements in SQL Server 2022 appeared first on SQLPerformance.com.

Microsoft recently released the first public preview of SQL Server 2022. This release has a number of T-SQL improvements. In this article I focus on windowing and NULL-related improvements. These include the new WINDOW clause and the windowing NULL treatment clause.

I’ll be using the sample database TSQLV6 in the examples in this article. You can download this sample database here.

The WINDOW Clause

The WINDOW clause is part of the ISO/IEC SQL standard. It allows you to name parts of a window specification—or an entire one—and then use the window name in the OVER clause of your query’s window functions. This clause allows you to shorten your code by avoiding the repetition of identical parts of your window specifications. This clause is now available in Azure SQL Database and SQL Server 2022, provided you use database compatibility level 160 or higher.
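You can confirm the compatibility level of the current database, and raise it if necessary (changing the compatibility level affects other behavior too, so test first):

```sql
-- Check the current database compatibility level
SELECT D.[name], D.compatibility_level
FROM sys.databases AS D
WHERE D.[name] = DB_NAME();

-- Level 160 (SQL Server 2022) or higher enables the WINDOW clause
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 160;
```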

The WINDOW clause is located between the query’s HAVING and ORDER BY clauses:

SELECT
FROM
WHERE
GROUP BY
HAVING
WINDOW
ORDER BY

The WINDOW clause has the following syntax:

WINDOW window_name AS ( [ reference_window_name ]   
                        [ <window partition clause> ]  
                        [ <window order clause> ]   
                        [ <window frame clause> ] )

As an example where the WINDOW clause can be handy in shortening your code, consider the following query:

USE TSQLV6;

SELECT orderid, custid, orderdate, qty, val,
  SUM(qty) OVER( PARTITION BY custid 
                 ORDER BY orderdate, orderid
                 ROWS UNBOUNDED PRECEDING ) AS runsumqty,
  SUM(val) OVER( PARTITION BY custid 
                 ORDER BY orderdate, orderid
                 ROWS UNBOUNDED PRECEDING ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid, orderdate, orderid;

This query generates the following output:

orderid     custid      orderdate  qty         val     runsumqty   runsumval
----------- ----------- ---------- ----------- ------- ----------- ----------
10643       1           2021-08-25 38          814.50  38          814.50
10692       1           2021-10-03 20          878.00  58          1692.50
10702       1           2021-10-13 21          330.00  79          2022.50
10835       1           2022-01-15 17          845.80  96          2868.30
10952       1           2022-03-16 18          471.20  114         3339.50
11011       1           2022-04-09 60          933.50  174         4273.00
10308       2           2020-09-18 6           88.80   6           88.80
10625       2           2021-08-08 18          479.75  24          568.55
10759       2           2021-11-28 10          320.00  34          888.55
10926       2           2022-03-04 29          514.40  63          1402.95

In this query you can see two window functions using identical window specifications, including window partitioning, ordering and framing clauses. To shorten the query, you can use the WINDOW clause to name a window specification with all three elements, say as W, and then specify OVER W in both window functions, like so:

SELECT orderid, custid, orderdate, qty, val,
  SUM(qty) OVER W AS runsumqty,
  SUM(val) OVER W AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW W AS ( PARTITION BY custid 
              ORDER BY orderdate, orderid
              ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;

As you can see, when the window name represents the whole window specification that you need and not just part of it, you specify the window name right after the OVER clause without parentheses.

You may have noticed in the WINDOW clause’s syntax one window name specification can have a reference to another window name. This is especially useful when your query has different window functions with different window specifications and one window specification is the same as part of another. Consider the following query as an example:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER( PARTITION BY custid
                     ORDER BY orderdate, orderid ) AS ordernum,
  MAX(orderdate) OVER( PARTITION BY custid ) AS maxorderdate,
  SUM(qty) OVER( PARTITION BY custid 
                 ORDER BY orderdate, orderid
                 ROWS UNBOUNDED PRECEDING ) AS runsumqty,
  SUM(val) OVER( PARTITION BY custid           
                 ORDER BY orderdate, orderid   
                 ROWS UNBOUNDED PRECEDING ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
ORDER BY custid, orderdate, orderid;

This query generates the following output:

orderid  custid  orderdate  qty  val     ordernum  maxorderdate runsumqty   runsumval
-------- ------- ---------- ---- ------- --------- ------------ ----------- -----------
10643    1       2021-08-25 38   814.50  1         2022-04-09   38          814.50
10692    1       2021-10-03 20   878.00  2         2022-04-09   58          1692.50
10702    1       2021-10-13 21   330.00  3         2022-04-09   79          2022.50
10835    1       2022-01-15 17   845.80  4         2022-04-09   96          2868.30
10952    1       2022-03-16 18   471.20  5         2022-04-09   114         3339.50
11011    1       2022-04-09 60   933.50  6         2022-04-09   174         4273.00
10308    2       2020-09-18 6    88.80   1         2022-03-04   6           88.80
10625    2       2021-08-08 18   479.75  2         2022-03-04   24          568.55
10759    2       2021-11-28 10   320.00  3         2022-03-04   34          888.55
10926    2       2022-03-04 29   514.40  4         2022-03-04   63          1402.95

The MAX function’s window specification has only a window partition clause. The ROW_NUMBER function’s window specification has a window partition clause that is the same as the MAX function’s, plus a window order clause. Both SUM functions have the same window partition and order clauses as the ROW_NUMBER function’s, plus a window frame clause.

The recursive capability of the WINDOW clause’s syntax allows you to shorten the query’s code, like so:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER PO AS ordernum,
  MAX(orderdate) OVER P AS maxorderdate,
  SUM(qty) OVER POF AS runsumqty,
  SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid ),
       PO AS ( P ORDER BY orderdate, orderid ),
       POF AS ( PO ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;

The order of the window name definitions in the WINDOW clause is insignificant. For example, the following code is valid and has the same meaning as the above query:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER PO AS ordernum,
  MAX(orderdate) OVER P AS maxorderdate,
  SUM(qty) OVER POF AS runsumqty,
  SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW POF AS ( PO ROWS UNBOUNDED PRECEDING ),
       PO AS ( P ORDER BY orderdate, orderid ),
       P AS ( PARTITION BY custid )
ORDER BY custid, orderdate, orderid;

Note, though, you can't use multiple window name references in one window name specification. You're limited to only one window name reference, plus any relevant additional window specification elements. For example, the following code isn’t valid for this reason:

SELECT orderid, custid, orderdate, qty, val,
  SUM(qty) OVER ( P O F ) AS runsumqty,
  SUM(val) OVER ( P O F ) AS runsumval
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid ),
       O AS ( ORDER BY orderdate, orderid ),
       F AS ( ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;

This code generates the following error:

Msg 102, Level 15, State 1, Line 106
Incorrect syntax near 'O'.

You're allowed to mix one window name and additional windowing elements in a window specification, like so:

SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER ( P ORDER BY orderdate, orderid ) AS ordernum,
  MAX(orderdate) OVER P AS maxorderdate
FROM Sales.OrderValues
WHERE custid IN (1, 2)
WINDOW P AS ( PARTITION BY custid )
ORDER BY custid, orderdate, orderid;

This query generates the following output:

orderid     custid      orderdate  qty         val     ordernum             maxorderdate
----------- ----------- ---------- ----------- ------- -------------------- ------------
10643       1           2021-08-25 38          814.50  1                    2022-04-09
10692       1           2021-10-03 20          878.00  2                    2022-04-09
10702       1           2021-10-13 21          330.00  3                    2022-04-09
10835       1           2022-01-15 17          845.80  4                    2022-04-09
10952       1           2022-03-16 18          471.20  5                    2022-04-09
11011       1           2022-04-09 60          933.50  6                    2022-04-09
10308       2           2020-09-18 6           88.80   1                    2022-03-04
10625       2           2021-08-08 18          479.75  2                    2022-03-04
10759       2           2021-11-28 10          320.00  3                    2022-03-04
10926       2           2022-03-04 29          514.40  4                    2022-03-04

As I mentioned before, when a window name represents the whole window specification, like with the MAX function in this query, you specify the window name right after the OVER clause without parentheses. When the window name is only part of the window specification, like with the ROW_NUMBER function in this query, you specify the window name followed by the rest of the windowing elements within parentheses.

By now, you know you're allowed to recursively define one window name based on another. However, in case it wasn’t obvious, cyclic references aren’t allowed. For example, the following query is valid since the window name definitions aren’t cyclic:

SELECT 'This is valid'
WINDOW W1 AS (), W2 AS (W1), W3 AS (W2);

This query generates the following output:

-------------
This is valid

However, the following query is invalid since the window name definitions are cyclic:

SELECT 'This is invalid'
WINDOW W1 AS (W2), W2 AS (W3), W3 AS (W1);

This code generates the following error:

Msg 5365, Level 15, State 1, Line 108
Cyclic window references are not permitted.

Lastly, the scope of a defined window name is the immediate query/table expression and can't cross table expression boundaries. For instance, if you define a window name in the inner query of a CTE, derived table, view, or inline table-valued function, the outer query won't recognize the inner window name. As an example, the following query is invalid for this reason:

WITH C AS
(
  SELECT orderid, custid, orderdate, qty, val,
    SUM(qty) OVER W AS runsumqtyall
  FROM Sales.OrderValues
  WHERE custid IN (1, 2)
  WINDOW W AS ( PARTITION BY custid 
                ORDER BY orderdate, orderid
                ROWS UNBOUNDED PRECEDING )
)
SELECT *,
  SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >= '20220101';

This code generates the following error:

Msg 5362, Level 15, State 3, Line 172
Window 'W' is undefined.

You have to define a window name you want to use in each of the scopes where you want to use it, like so:

WITH C AS
(
  SELECT orderid, custid, orderdate, qty, val,
    SUM(qty) OVER W AS runsumqtyall
  FROM Sales.OrderValues
  WHERE custid IN (1, 2)
  WINDOW W AS ( PARTITION BY custid 
                ORDER BY orderdate, orderid
                ROWS UNBOUNDED PRECEDING )
)
SELECT *,
  SUM(qty) OVER W AS runsumqty22
FROM C
WHERE orderdate >= '20220101'
WINDOW W AS ( PARTITION BY custid 
              ORDER BY orderdate, orderid
              ROWS UNBOUNDED PRECEDING );

This query generates the following output:

orderid     custid      orderdate  qty         val     runsumqtyall runsumqty22
----------- ----------- ---------- ----------- ------- ------------ -----------
10835       1           2022-01-15 17          845.80  96           17
10952       1           2022-03-16 18          471.20  114          35
11011       1           2022-04-09 60          933.50  174          95
10926       2           2022-03-04 29          514.40  63           29

Each of the scopes defines its own window name W, and they don’t have to be based on the same specification (though they are in this example).
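Note the restriction is about visibility, not definition: you can define a window name in the inner query of a view or inline table-valued function and use it there. For instance, here's a sketch (the view name is hypothetical):

```sql
CREATE OR ALTER VIEW Sales.OrderRunningTotals
AS
SELECT orderid, custid, orderdate, qty, val,
  SUM(qty) OVER W AS runsumqty,
  SUM(val) OVER W AS runsumval
FROM Sales.OrderValues
WINDOW W AS ( PARTITION BY custid 
              ORDER BY orderdate, orderid
              ROWS UNBOUNDED PRECEDING );
```

Queries against the view get the running totals without needing to know about W; an outer query cannot reference W itself.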

The Windowing NULL Treatment Clause

The NULL treatment clause is part of the ISO/IEC SQL standard and is available to the offset window functions FIRST_VALUE, LAST_VALUE, LAG and LEAD. This clause has the following syntax:

<function>(<scalar_expression>[, <other args>]) [IGNORE NULLS | RESPECT NULLS] OVER( <specification> )

The RESPECT NULLS option is the default if you don't specify the clause. It means you want the function to return the value of <scalar_expression> in the requested position (first, last, previous, next), whether it's NULL or non-NULL. The IGNORE NULLS option introduces a new capability people have been eagerly waiting for in T-SQL for a long time. It means you want the function to return the value of <scalar_expression> in the requested position if it's non-NULL. If it is NULL, the function keeps going in the relevant direction (backward for LAST_VALUE and LAG, forward for FIRST_VALUE and LEAD) until a non-NULL value is found. If no non-NULL value is found, the function returns NULL.

To illustrate the utility of this clause, I’ll use a table called T1 in my examples. Use the following code to create and populate T1:

DROP TABLE IF EXISTS dbo.T1;

CREATE TABLE dbo.T1
(
  id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
  col1 INT NULL,
  col2 INT NULL
);
GO

INSERT INTO dbo.T1(id, col1, col2) VALUES
  ( 2, NULL,  200),
  ( 3,   10, NULL),
  ( 5,   -1, NULL),
  ( 7, NULL,  202),
  (11, NULL,  150),
  (13,  -12,   50),
  (17, NULL,  180),
  (19, NULL,  170),
  (23, 1759, NULL);

Suppose the column id represents the chronological order of the events recorded in T1. Each row represents an event where one or more attribute values have changed. A NULL means the attribute retains whatever last non-NULL value it had up to that point.

Suppose you need to return the last known (non-NULL) col1 value per event. Without access to the NULL treatment clause, you'd need to use a fairly complex technique such as the following:

WITH C AS
(
  SELECT id, col1,
    MAX(CASE WHEN col1 IS NOT NULL THEN id END)
      OVER(ORDER BY id
           ROWS UNBOUNDED PRECEDING) AS grp
  FROM dbo.T1
)
SELECT id, col1,
  MAX(col1) OVER(PARTITION BY grp
                 ORDER BY id
                 ROWS UNBOUNDED PRECEDING) AS lastknowncol1
FROM C;

If you aren't already familiar with this technique, it can take a bit to figure out the logic here. The inner MAX computes, for each row, the id of the most recent row so far with a non-NULL col1 value (aliased grp). Within each grp partition there is exactly one non-NULL col1 value, belonging to the row whose id equals grp, so the outer MAX(col1) returns that last known value for every row in the partition.

This code generates the following output:

id          col1        lastknowncol1
----------- ----------- -------------
2           NULL        NULL
3           10          10
5           -1          -1
7           NULL        -1
11          NULL        -1
13          -12         -12
17          NULL        -12
19          NULL        -12
23          1759        1759

Having access to the NULL treatment clause, you can easily achieve the same using the LAST_VALUE function with the IGNORE NULLS option, like so:

SELECT id, col1,
  LAST_VALUE(col1) IGNORE NULLS OVER( ORDER BY id ROWS UNBOUNDED PRECEDING ) AS lastknowncol
FROM dbo.T1;

The difference is of course more dramatic if you need to apply this logic to multiple attributes.

Without access to the NULL treatment clause, you'd use the following code to return the last known col1 and col2 values:

WITH C AS
(
  SELECT id, col1, col2,
    MAX(CASE WHEN col1 IS NOT NULL THEN id END)
      OVER(ORDER BY id
           ROWS UNBOUNDED PRECEDING) AS grp1,
    MAX(CASE WHEN col2 IS NOT NULL THEN id END)
      OVER(ORDER BY id
           ROWS UNBOUNDED PRECEDING) AS grp2
  FROM dbo.T1
)
SELECT id,
  col1,
  MAX(col1) OVER(PARTITION BY grp1
                 ORDER BY id
                 ROWS UNBOUNDED PRECEDING) AS lastknowncol1,
  col2,
  MAX(col2) OVER(PARTITION BY grp2
                 ORDER BY id
                 ROWS UNBOUNDED PRECEDING) AS lastknowncol2
FROM C;

This code generates the following output:

id          col1        lastknowncol1 col2        lastknowncol2
----------- ----------- ------------- ----------- -------------
2           NULL        NULL          200         200
3           10          10            NULL        200
5           -1          -1            NULL        200
7           NULL        -1            202         202
11          NULL        -1            150         150
13          -12         -12           50          50
17          NULL        -12           180         180
19          NULL        -12           170         170
23          1759        1759          NULL        170

I should also note that even though the table T1 has a supporting covering index with id as the key, each of the last known attribute calculations in the query above results in an explicit sort operator in the plan, as shown in Figure 1.

Figure 1: Plan for query without the NULL treatment clause

This fact makes this solution quite expensive.

Here’s the alternative using the NULL treatment clause:

SELECT id, 
  col1, LAST_VALUE(col1) IGNORE NULLS OVER W AS lastknowncol1,
  col2, LAST_VALUE(col2) IGNORE NULLS OVER W AS lastknowncol2
FROM dbo.T1
WINDOW W AS ( ORDER BY id ROWS UNBOUNDED PRECEDING );

This solution is so much shorter and more elegant, and the optimization of the functions with this option can rely on an ordered scan of a supporting index, and thus avoid explicit sorting, as shown in the plan for this query in Figure 2.

Figure 2: Plan for query with the NULL treatment clause

As mentioned, the NULL treatment clause is available to all offset window functions (FIRST_VALUE, LAST_VALUE, LAG, and LEAD). Here’s an example using LAG to return the previous known value:

SELECT id, col1, 
  LAG(col1) IGNORE NULLS OVER ( ORDER BY id ) AS prevknowncol1
FROM dbo.T1;

This code generates the following output:

id          col1        prevknowncol1
----------- ----------- -------------
2           NULL        NULL
3           10          NULL
5           -1          10
7           NULL        -1
11          NULL        -1
13          -12         -1
17          NULL        -12
19          NULL        -12
23          1759        -12

Want to try to achieve the same without the NULL treatment clause? I bet you don’t!
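For the record, one way to emulate LAG with IGNORE NULLS without the clause is a correlated subquery that scans backward for the latest preceding non-NULL value. This is a sketch only, and far less efficient than the window function:

```sql
SELECT T1.id, T1.col1,
  (SELECT TOP (1) T2.col1
   FROM dbo.T1 AS T2
   WHERE T2.id < T1.id
     AND T2.col1 IS NOT NULL
   ORDER BY T2.id DESC) AS prevknowncol1
FROM dbo.T1 AS T1;
```

This returns the same prevknowncol1 values as the LAG query above, but at the cost of a backward scan per row.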

Conclusion and Other T-SQL Improvements in SQL Server 2022

In this article I covered T-SQL improvements in SQL Server 2022 concerning window functions and NULL handling. I showed how to:

  • Reuse parts of—or entire—window definitions with the WINDOW clause
  • Control NULL treatment in offset window functions with the NULL treatment clause

SQL Server 2022 has additional T-SQL improvements, covered by Aaron Bertrand in this article:

  • GREATEST / LEAST
  • STRING_SPLIT
  • DATE_BUCKET
  • GENERATE_SERIES

The post T-SQL Windowing Improvements in SQL Server 2022 appeared first on SQLPerformance.com.

Simple Parameterization and Trivial Plans — Part 4
https://sqlperformance.com/2022/05/sql-optimizer/simple-parameterization-and-trivial-plans-part-4
Fri, 20 May 2022 13:35:37 +0000

Paul White continues his series explaining how the parser affects simple parameterization and trivial plans. Learn more in part 4.

The post Simple Parameterization and Trivial Plans — Part 4 appeared first on SQLPerformance.com.

[ This series:  Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

The Compilation Process

The most important things to understand about server-side parameterization are that it doesn't happen all at once, and that the final decision to parameterize isn't made until the end of the process.

Multiple compilation components are involved in successfully turning a SQL statement containing constants into a parameterized prepared statement with typed parameters. Each component is active at a different time during compilation and performs its own parameterization-related tasks using the information available at that time.

The scope and timing of each component’s activity explain the curious things seen in the previous parts of this series. Even when the parameterization attempt isn’t successful, there can still be effects from preparations for parameterization visible in the final execution plan.

There are a number of details to explore to gain a complete understanding. Let’s follow the main compilation stages noting the effects on parameterization as we go.

As in previous parts, most code examples use the Stack Overflow 2010 database on SQL Server 2019 CU 16 with the following additional nonclustered index:

CREATE INDEX [IX dbo.Users Reputation (DisplayName)] 
ON dbo.Users (Reputation) 
INCLUDE (DisplayName);

Database compatibility is set to 150, and the cost threshold for parallelism is set to 50 to avoid parallelism for the time being:

ALTER DATABASE StackOverflow2010
SET COMPATIBILITY_LEVEL = 150;
GO
EXECUTE sys.sp_configure
    @configname = 'show advanced options',
    @configvalue = 1;
RECONFIGURE;
GO
EXECUTE sys.sp_configure
    @configname = 'cost threshold for parallelism',
    @configvalue = 50;
RECONFIGURE;

Statement Parsing

The parser identifies constants in the statement and marks them as potential parameters if they are in an allowable context. For example, the constants in TOP (50) and CONVERT(varchar(11), x) are not parameterizable, but the literal values in WHERE x = 123 and WHERE y IN (3, 8) are.

Decisions about parameterization at this early stage are generic and conservative because the only information available is the statement itself. Names of tables and columns haven’t been resolved to database objects yet, and no type information is available.

Imagine someone handing you a query written for an unknown SQL Server database and asking you to identify parameterizable constants. That’s roughly the task facing the parser.

Parsing marks constants as potential parameters unless the current clause context forbids it, accommodating both simple and forced parameterization rules.

For example, AND x = 100 + 50 is not acceptable to forced parameterization, but simple parameterization allows it, so the parser marks both constants as potential parameters.

In contrast, the earlier example WHERE y IN (3, 8) is acceptable to forced parameterization, but not simple. Again, the parser marks both constants as potential parameters just in case.

Initial Data Types

Under simple parameterization, constants are assigned an initial data type based on the textual representation (see part two for details). The data type may then be refined depending on the context.

For example, in WHERE x = 5 the constant is initially parsed as an integer because the textual form is not surrounded by quotation marks and has no decimal point. The context is a comparison operator (equals), so the data type is shrunk to a tinyint. This is the smallest integer type able to contain the value 5.

As a second example, in the expression 123 + 456 both constants are initially typed as integer based on the textual representation. Neither is shrunk to a smaller integer subtype because the context is an arithmetical operation, not a comparison. This explains why the constant 7 was typed as an integer rather than tinyint in the arithmetic operators section of part two.

These rules might seem odd or arbitrary but they were created for SQL Server 7.0 where simple parameterization was a new feature, then called “automatic parameters” or “auto-parameterization”. The engine’s ability to match indexes and reason through implicit conversions has improved markedly since then but the parsing rules remain the same for compatibility.
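You can observe these typing rules directly in the cached prepared statement text. Here's a sketch, reusing the Users table and the plan cache inspection query shown later in this article:

```sql
ALTER DATABASE SCOPED CONFIGURATION 
    CLEAR PROCEDURE_CACHE;
GO
-- Comparison context: the value 5 fits in tinyint, so the type is shrunk
SELECT U.DisplayName 
FROM dbo.Users AS U 
WHERE U.Reputation = 5;
GO
-- Inspect the cached prepared statement
SELECT CP.usecounts, CP.objtype, ST.[text]
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text (CP.plan_handle) AS ST
WHERE ST.[text] LIKE '%Reputation%'
  AND ST.[text] NOT LIKE '%dm_exec_cached_plans%';
```

The prepared statement text should begin with a parameter declaration like (@1 tinyint), reflecting the shrunken type.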

CAST and CONVERT

In part two I described how these very specific inferred parameter data types could prevent plan reuse—a particular problem with numeric and decimal data types:

Separate prepared statements

An obvious solution would be to provide an explicit type for each constant, but T-SQL doesn’t provide a way to do this for all constant values. The best we can do sometimes is add CAST or CONVERT, but this does not work well with simple parameterization for reasons I will now set out.

In principle, the parser could incorporate the CAST or CONVERT in its parameter data typing decision, but this would require either a call into the expression services component or an early round of constant folding.

Neither of these facilities are available to the parser. It’s simply too early in the process. For example, constant folding expects an operator tree that doesn’t exist yet. All things are possible with enough engineering effort of course, but as things stand the parser cannot use a wrapping CAST or CONVERT to determine the data type of a constant literal.

The end result today is a parameter with the original parser-derived type surrounded by an explicit CAST or CONVERT. This doesn’t solve the problem of plan reuse at all.

Example 1

This is a slightly simplified version of the decimal example from part two. An explicit CONVERT matching the column data type has been added around each constant in an attempt to promote plan reuse:

ALTER DATABASE SCOPED CONFIGURATION 
    CLEAR PROCEDURE_CACHE;
GO
DROP TABLE IF EXISTS dbo.Test;
GO
CREATE TABLE dbo.Test
(
    SomeValue decimal(19,8) NOT NULL
);
GO
SELECT T.SomeValue 
FROM dbo.Test AS T 
WHERE T.SomeValue = CONVERT(decimal(19,8), 1.23);
GO
SELECT T.SomeValue 
FROM dbo.Test AS T 
WHERE T.SomeValue = CONVERT(decimal(19,8), 12.345);
GO
SELECT T.SomeValue 
FROM dbo.Test AS T 
WHERE T.SomeValue = CONVERT(decimal(19,8), 123.4567);
GO

Let’s look at the plan cache:

SELECT
    CP.usecounts,
    CP.objtype,
    ST.[text]
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text (CP.plan_handle) AS ST
WHERE 
    ST.[text] NOT LIKE '%dm_exec_cached_plans%'
    AND ST.[text] LIKE '%SomeValue%Test%'
ORDER BY 
    CP.objtype ASC;

It shows a prepared statement for each query:

Separate prepared statements again

This is the same outcome as before we added the CONVERT. The parameter data types are still different, so separate plans are cached and no plan reuse occurs.

Example 2

This is the other example from part two with a CONVERT added to match the integer type of the Reputation column:

ALTER DATABASE SCOPED CONFIGURATION 
    CLEAR PROCEDURE_CACHE;
GO
SELECT U.DisplayName
FROM dbo.Users AS U 
WHERE U.Reputation = CONVERT(integer, 252);
GO
SELECT U.DisplayName
FROM dbo.Users AS U 
WHERE U.Reputation = CONVERT(integer, 25221);
GO
SELECT U.DisplayName
FROM dbo.Users AS U 
WHERE U.Reputation = CONVERT(integer, 252552);
GO

As a reminder, without the CONVERT these statements resulted in three separate prepared cached plans due to the parser’s typing rules:

Different guessed types

Let’s look at the plan cache after running the statements with the CONVERT:

SELECT
    CP.usecounts,
    CP.objtype,
    ST.[text]
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text (CP.plan_handle) AS ST
WHERE 
    ST.[text] NOT LIKE '%dm_exec_cached_plans%'
    AND ST.[text] LIKE '%DisplayName%Users%'
ORDER BY 
    CP.objtype ASC;

We see one prepared statement used three times:

Plan cache showing one prepared statement used three times

This result should surprise you. Didn't we just establish that adding a CAST or CONVERT doesn't help plan reuse?

This case is slightly different though. The constants in these statements were initially typed by the parser as integer then shrunk to the smallest possible integer subtype (smallint or tinyint) capable of holding the specific value. The shrinking caused different prepared statements without the CONVERT.

Remember from earlier this shrinking happens under simple parameterization only when the parse context is a comparison operator. Without the CONVERT the immediate context for the constant is the equality comparison operator, so shrinking is applied.

With the CONVERT the immediate context is the conversion, which is not a comparison operator so no shrinking occurs. All three constants remain typed as integer resulting in a single prepared statement used three times.

As an aside, notice the explicit CONVERT to integer remains in the prepared statement text even though the parameter is typed as integer.

A CAST or CONVERT isn’t the only operator capable of preventing integer type shrinking by the parser. Anything that gets between the constant and a comparison operator will do the job, as long as the extra item is acceptable for simple parameterization.

For example, we could use FLOOR or ABS around the constant value—but not CEILING. The list of intrinsic functions compatible with simple parameterization is quite limited and undocumented.
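For example, wrapping the constant in ABS should prevent the shrinking, because the immediate parse context for the constant is then the function rather than the comparison. This is a sketch, not verified here:

```sql
SELECT U.DisplayName
FROM dbo.Users AS U 
WHERE U.Reputation = ABS(252);
GO
SELECT U.DisplayName
FROM dbo.Users AS U 
WHERE U.Reputation = ABS(252552);
GO
-- Both constants should remain typed as integer, so the two statements
-- can share a single prepared plan, as in the CONVERT example above.
```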

Parameterization Attempts

If the parser encounters a syntax element that always prevents simple parameterization it sets a flag so components involved in later stages can avoid wasted effort.

These syntax checks are not exhaustive. For example, the presence of a subquery, TOP clause, or query hint is sufficient to set the flag, but an IN clause, constant-to-constant comparison, or disallowed intrinsic function like LOWER or CEILING is not.

There is a partial list of the syntax elements that prevent simple parameterization in Appendix A of the Microsoft Technical Paper Plan Caching and Recompilation in SQL Server 2012. The list is not complete or maintained, and doesn’t detail why each item excludes parameterization, or at what stage of the compilation process the test is applied.

When the parser decides simple parameterization is impossible, none of the auto-parameterization performance counters mentioned in part 3 are incremented.

In particular, the Auto-Param Attempts/sec counter of the SQL Statistics object is not incremented. This is the primary way to detect a statement with constants was determined unsuitable for simple parameterization by the parser.

If Auto-Param Attempts/sec is incremented, it means the parser was satisfied simple parameterization might succeed. Later components will determine the eventual outcome of the parameterization attempt, either Failed, Safe, or Unsafe. I will cover these details later in this series.

In either case, the parser performs the lightweight work to identify potential parameters and assign an initial data type. Partly this is due to the streaming nature of the parser—it might encounter constants in the token stream before anything that disallows simple parameterization. The work might still prove useful if forced parameterization is active, either at the database level, via a plan guide, or due to undocumented trace flag 144.
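To watch the auto-parameterization counters mentioned above, you can query sys.dm_os_performance_counters directly. The values are cumulative since instance start, so sample them before and after the statement of interest:

```sql
SELECT 
    PC.[object_name],
    PC.counter_name,
    PC.cntr_value
FROM sys.dm_os_performance_counters AS PC
WHERE 
    PC.counter_name LIKE N'%Auto-Param%';
```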

End of Part 4

In the next part of this series, I’ll continue the compilation process at the algebrization and normalization stages, showing how these components explain some of the curious things we’ve seen with simple parameterization and trivial plans.

[ This series:  Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

Simple Parameterization and Trivial Plans — Part 3
https://sqlperformance.com/2022/04/sql-performance/simple-parameterization-and-trivial-plans-part-3
Mon, 18 Apr 2022 09:00:28 +0000

Paul White continues his series on simple parameterization and trivial plans with a look at the information available in execution plans.

The post Simple Parameterization and Trivial Plans — Part 3 appeared first on SQLPerformance.com.

[ This series:  Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

Execution Plans

Telling whether a SQL statement uses simple parameterization from the information provided in execution plans is more complicated than you might expect. Given the contradictory information often supplied to us, it's no surprise even highly experienced SQL Server users tend to get this wrong.

Let’s look at some examples using the Stack Overflow 2010 database on SQL Server 2019 CU 14, with database compatibility set to 150.

To begin, we’ll need a new nonclustered index:

CREATE INDEX [IX dbo.Users Reputation (DisplayName)] 
ON dbo.Users (Reputation) 
INCLUDE (DisplayName);

1. Simple Parameterization Applied

This first example query uses simple parameterization:

SELECT U.DisplayName 
FROM dbo.Users AS U 
WHERE U.Reputation = 999;

The estimated (pre-execution) plan has the following parameterization-related elements:

Estimated plan parameterization properties

Notice the @1 parameter is introduced everywhere except the query text shown across the top.

The actual (post-execution) plan has:

Actual plan parameterization properties

Notice the properties window has now lost the ParameterizedText element, while gaining information about the parameter runtime value. The parameterized query text is now shown across the top of the window with ‘@1’ instead of ‘999’.

2. Simple Parameterization Not Applied

This second example does not use simple parameterization:

-- Projecting an extra column
SELECT 
    U.DisplayName, 
    U.CreationDate -- NEW
FROM dbo.Users AS U 
WHERE 
    U.Reputation = 999;

The estimated plan shows:

Estimated non-parameterized plan

This time, the parameter @1 is missing from the Index Seek tooltip, but the parameterized text and other parameter list elements are the same as before.

Let’s look at the actual execution plan:

Actual non-parameterized plan

The results are the same as the previous parameterized actual plan, except now the Index Seek tooltip displays the non-parameterized value ‘999’. The query text shown across the top uses the @1 parameter marker. The properties window also uses @1 and displays the runtime value of the parameter.

The query is not a parameterized statement despite all the evidence to the contrary.

3. Parameterization Failed

My third example is also not parameterized by the server:

-- LOWER function used
SELECT 
    U.DisplayName, 
    LOWER(U.DisplayName)
FROM dbo.Users AS U 
WHERE 
    U.Reputation = 999;

The estimated plan is:

Estimated plan parameterization failed

There’s no mention of a @1 parameter anywhere now, and the Parameter List section of the properties window is missing.

The actual execution plan is the same, so I won’t bother showing it.

4. Parallel Parameterized Plan

I want to show you one more example using parallelism in the execution plan. The low estimated cost of my test queries means we need to lower the cost threshold for parallelism to 1:

EXECUTE sys.sp_configure
    @configname = 'cost threshold for parallelism',
    @configvalue = 1;
RECONFIGURE;

The example is a bit more complex this time:

SELECT 
    U.DisplayName 
FROM dbo.Users AS U 
WHERE 
    U.Reputation >= 5 
    AND U.DisplayName > N'ZZZ' 
ORDER BY 
    U.Reputation DESC;

The estimated execution plan is:

Estimated parallel parameterized plan

The query text across the top remains unparameterized while everything else is. There are two parameter markers now, @1 and @2, because simple parameterization found two suitable literal values.

The actual execution plan follows the pattern of example 1:

Actual parallel parameterized plan

The query text across the top is now parameterized and the properties window contains runtime parameter values. This parallel plan (with a Sort operator) is definitely parameterized by the server using simple parameterization.

Reliable Methods

There are reasons for all the behaviours shown so far, and a few more besides. I’ll attempt to explain many of these in the next part of this series when I cover plan compilation.

In the meantime, the situation with showplan in general, and SSMS in particular, is less than ideal. It’s confusing for people who’ve been working with SQL Server their entire careers. Which parameter markers do you trust, and which ones do you ignore?

There are several reliable methods for determining if a particular statement had simple parameterization successfully applied to it or not.

Query Store

I’ll start with one of the most convenient, the query store. Unfortunately, it’s not always as straightforward as you might imagine.

You must enable the query store feature for the database context where the statement is executed and the OPERATION_MODE must be set to READ_WRITE, allowing the query store to actively collect data.
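As a reminder, enabling the query store in the required mode takes a single statement. A minimal sketch, using an illustrative database name:

```sql
-- Enable the query store and request active data collection
-- (the database name is illustrative)
ALTER DATABASE [StackOverflow2013]
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
```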

After meeting these conditions, post-execution showplan output contains extra attributes, including the StatementParameterizationType. As the name suggests, this contains a code describing the type of parameterization used for the statement.

It’s visible in the SSMS properties window when the root node of a plan is selected:

StatementParameterizationType

The values are documented in sys.query_store_query:

  • 0 – None
  • 1 – User (explicit parameterization)
  • 2 – Simple parameterization
  • 3 – Forced parameterization

This helpful attribute only appears in SSMS when an actual plan is requested; it's missing when an estimated plan is selected. It's also important to remember the plan must be cached. Requesting an estimated plan from SSMS does not cache the plan produced (since SQL Server 2012).

Once the plan is cached, the StatementParameterizationType appears in the usual places, including via sys.dm_exec_query_plan.

You can also trust the other places parameterization type is recorded in the query store, such as the query_parameterization_type_desc column in sys.query_store_query.
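For example, a sketch joining the query store catalog views on query_text_id (the text filter is illustrative):

```sql
-- List query store entries with their parameterization type
SELECT
    Q.query_id,
    QT.query_sql_text,
    Q.query_parameterization_type_desc
FROM sys.query_store_query AS Q
JOIN sys.query_store_query_text AS QT
    ON QT.query_text_id = Q.query_text_id
WHERE
    QT.query_sql_text LIKE N'%Reputation%';
```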

One important caveat. When the query store OPERATION_MODE is set to READ_ONLY, the StatementParameterizationType attribute is still populated in SSMS actual plans—but it’s always zero—giving a false impression the statement was not parameterized when it might well have been.

If you’re happy enabling query store, are sure it’s read-write, and only look at post-execution plans in SSMS, this will work for you.
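Checking the query store really is read-write takes a quick look at sys.database_query_store_options, since the actual state can differ from the requested one:

```sql
-- Verify the query store is actively collecting data
-- (actual_state_desc should be READ_WRITE)
SELECT
    QSO.desired_state_desc,
    QSO.actual_state_desc,
    QSO.readonly_reason
FROM sys.database_query_store_options AS QSO;
```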

Standard Plan Predicates

The query text shown across the top of the graphical showplan window in SSMS isn’t reliable, as the examples have shown. Neither can you rely on the ParameterList displayed in the Properties window when the root node of the plan is selected. The ParameterizedText attribute, shown for estimated plans only, is also not conclusive.

You can, however, rely on the properties associated with individual plan operators. The given examples show these are present in the tooltips when hovering over an operator.

A predicate containing a parameter marker like @1 or @2 indicates a parameterized plan. The operators most likely to contain a parameter are Index Scan, Index Seek, and Filter.

Predicates with parameter markers

If the numbering starts at @1, simple parameterization was used; forced parameterization starts numbering at @0. I should mention the numbering scheme documented here is subject to change at any time:

Change warning

Nevertheless, this is the method I use most often to determine if a plan was subject to server-side parameterization. It’s generally quick and easy to check a plan visually for predicates containing parameter markers. This method also works for both types of plans, estimated and actual.
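For those who'd rather automate the visual check, here's a sketch using XQuery against cached plans. It simply tests whether the showplan XML references marker @1 anywhere (parameter markers appear as ColumnReference elements in predicates and the ParameterList); the text filter is illustrative:

```sql
-- Flag cached plans whose showplan references parameter marker @1
WITH XMLNAMESPACES (DEFAULT
    'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    T.[text],
    ReferencesMarker1 =
        QP.query_plan.exist('//ColumnReference[@Column = "@1"]')
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text (CP.plan_handle) AS T
CROSS APPLY sys.dm_exec_query_plan (CP.plan_handle) AS QP
WHERE
    T.[text] LIKE N'%Users%';
```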

Dynamic Management Objects

There are several ways to query the plan cache and related DMOs to determine if a statement was parameterized. Naturally, these queries only work on plans in cache, so the statement must have been executed to completion, cached, and not subsequently evicted for any reason.

The most direct approach is to look for an Adhoc plan using an exact SQL textual match to the statement of interest. The Adhoc plan will be a shell containing a ParameterizedPlanHandle if the statement is parameterized by the server. The plan handle is then used to locate the Prepared plan. An Adhoc plan will not exist if the optimize for ad hoc workloads server option is enabled and the statement in question has only executed once.
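A sketch of the shell-plan lookup (the text filter is illustrative):

```sql
-- Find Adhoc shell plans that point at a Prepared parameterized plan
WITH XMLNAMESPACES (DEFAULT
    'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    CP.objtype,
    T.[text],
    ParameterizedPlanHandle =
        QP.query_plan.value(
            '(//StmtSimple/@ParameterizedPlanHandle)[1]',
            'nvarchar(130)')
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text (CP.plan_handle) AS T
CROSS APPLY sys.dm_exec_query_plan (CP.plan_handle) AS QP
WHERE
    CP.objtype = N'Adhoc'
    AND T.[text] LIKE N'%Reputation%';
```

A non-null ParameterizedPlanHandle on the shell plan confirms server-side parameterization; the handle then locates the matching Prepared plan.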

This type of enquiry often ends up shredding a significant amount of XML and scanning the entire plan cache at least once. It’s also easy to get the code wrong, not least because plans in cache cover an entire batch. A batch may contain multiple statements, each of which may or may not be parameterized. Not all the DMOs work at the same granularity (batch or statement), making it quite easy to come unstuck.

An efficient way to list statements of interest, together with plan fragments for just those individual statements, is shown below:

SELECT
    StatementText =
        SUBSTRING(T.[text], 
            1 + (QS.statement_start_offset / 2), 
            1 + ((QS.statement_end_offset - 
                QS.statement_start_offset) / 2)),
    IsParameterized = 
        IIF(T.[text] LIKE N'(%',
            'Yes',
            'No'),
    query_plan = 
        TRY_CONVERT(xml, P.query_plan)
FROM sys.dm_exec_query_stats AS QS
CROSS APPLY sys.dm_exec_sql_text (QS.[sql_handle]) AS T
CROSS APPLY sys.dm_exec_text_query_plan (
    QS.plan_handle, 
    QS.statement_start_offset, 
    QS.statement_end_offset) AS P
WHERE 
    -- Statements of interest
    T.[text] LIKE N'%DisplayName%Users%'
    -- Exclude queries like this one
    AND T.[text] NOT LIKE N'%sys.dm%'
ORDER BY
    QS.last_execution_time ASC,
    QS.statement_start_offset ASC;

To illustrate, let’s run a single batch containing the four examples from earlier:

ALTER DATABASE SCOPED CONFIGURATION 
    CLEAR PROCEDURE_CACHE;
GO
-- Example 1
SELECT U.DisplayName 
FROM dbo.Users AS U 
WHERE U.Reputation = 999;

-- Example 2
SELECT 
    U.DisplayName, 
    U.CreationDate 
FROM dbo.Users AS U 
WHERE 
    U.Reputation = 999;

-- Example 3
SELECT 
    U.DisplayName, 
    LOWER(U.DisplayName)
FROM dbo.Users AS U 
WHERE 
    U.Reputation = 999;

-- Example 4
SELECT 
    U.DisplayName 
FROM dbo.Users AS U 
WHERE 
    U.Reputation >= 5 
    AND U.DisplayName > N'ZZZ' 
ORDER BY 
    U.Reputation DESC;
GO

The output of the DMO query is:

DMO query output

This confirms only examples 1 and 4 were successfully parameterized.

Performance Counters

It’s possible to use the SQL Statistics performance counters to get a detailed insight into parameterization activity for both estimated and actual plans. The counters used aren’t scoped per-session, so you’ll need to use a test instance with no other concurrent activity to get accurate results.

I’m going to supplement the parameterization counter information with data from the sys.dm_exec_query_optimizer_info DMO to provide statistics on trivial plans as well.

Some care is needed to prevent statements reading the counter information from modifying those counters themselves. I’m going to address this by creating a couple of temporary stored procedures:

CREATE PROCEDURE #TrivialPlans
AS
SET NOCOUNT ON;

SELECT
    OI.[counter],
    OI.occurrence
FROM sys.dm_exec_query_optimizer_info AS OI
WHERE
    OI.[counter] = N'trivial plan';
GO
CREATE PROCEDURE #PerfCounters
AS
SET NOCOUNT ON;

SELECT
    PC.[object_name],
    PC.counter_name,
    PC.cntr_value
FROM 
    sys.dm_os_performance_counters AS PC
WHERE 
    PC.counter_name LIKE N'%Param%';

The script to test a particular statement then looks like this:

ALTER DATABASE SCOPED CONFIGURATION 
    CLEAR PROCEDURE_CACHE;
GO
EXECUTE #PerfCounters;
EXECUTE #TrivialPlans;
GO
SET SHOWPLAN_XML ON;
GO
-- The statement(s) under test:
-- Example 3
SELECT 
    U.DisplayName, 
    LOWER(U.DisplayName)
FROM dbo.Users AS U 
WHERE 
    U.Reputation = 999;
GO
SET SHOWPLAN_XML OFF;
GO
EXECUTE #TrivialPlans;
EXECUTE #PerfCounters;

Comment the SHOWPLAN_XML batches out to run the target statement(s) and get actual plans. Leave them in place for estimated execution plans.

Running the whole thing as written gives the following results:

Performance counter test results

I’ve highlighted above where values changed when testing example 3.

The increase in the “trivial plan” counter from 1050 to 1051 shows a trivial plan was found for the test statement.

The simple parameterization counters increased by 1 for both attempts and failures, showing SQL Server tried to parameterize the statement, but failed.

End of Part 3

In the next part of this series, I’ll explain the curious things we’ve seen by describing how simple parameterization and trivial plans interact with the compilation process.

If you changed your cost threshold for parallelism to run the examples, remember to reset it (mine was set to 50):

EXECUTE sys.sp_configure
    @configname = 'cost threshold for parallelism',
    @configvalue = 50;
RECONFIGURE;

[ This series:  Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 ]

The post Simple Parameterization and Trivial Plans — Part 3 appeared first on SQLPerformance.com.
