SQL Server 2016 – SQL TECH BLOG

Query tuning 101: What’s a probe residual?

Daniel Janik — Mon, 17 Apr 2017 17:08:16 +0000

Query tuning is an important process that will probably never go away and sharpening your tuning skills is always a good idea.

I’ve spoken on this topic many times and talked about probe residuals before. Mostly at SQL Saturday events and for some online webinars.

A probe residual is important because they can indicate key performance problems that might not otherwise be brought to your attention.

What is a probe residual?

Simply put, a probe residual is an extra operation that must be performed to compete the matching process. Extra being left over things to do.

Let’s look at the following example:

SELECT   P.Name,
D.OrderQty,
D.UnitPrice
FROM     Production.ProductArchive2013 AS P
INNER JOIN Sales.SalesOrderDetail AS D ON (p.ProductID = D.ProductID)

SELECT   P.Name,
D.OrderQty,
D.UnitPrice
FROM     Production.ProductArchive2014 AS P
INNER JOIN Sales.SalesOrderDetail AS D ON (p.ProductID = D.ProductID)
GO

These two queries are the same minus selecting from two different productarchive tables. Note that the top query has a slightly lower cost.

What’s different?

Without knowing about Probe Residual you’d think these plans were the same when in fact they are not.

On further investigation, the Hash Match operator reveals a residual operation:

Why is that and what is that?

In this particular case, the residual operation is due to an implicit conversion; though, this isn’t indicated anywhere in the query plan whatsoever. The implicit isn’t even found searching through the plan XML.

Reviewing the table DDL shows that the ProductID column of the ProductArchive2014 was created with BIGINT and it’s INT everywhere else.

CREATE TABLE [Production].[ProductArchive2014](
[ProductID] [bigint] NOT NULL, …

CREATE TABLE [Production].[ProductArchive2013](
[ProductID] [int] NOT NULL, …

CREATE TABLE [Sales].[SalesOrderDetail](
…
[ProductID] [int] NOT NULL, …

Simply put, this query could be faster by fixing the data types and the Probe Residual helps reveal this.

Note that not every Probe Residual will be caused by an implicit conversion; however, they are a noteworthy tuning item to review.

For more about query plans be sure to check out my other site: How’s My Plan (howsmyplan.com)

DBA 101: Comparing two SQL plans

Daniel Janik — Fri, 14 Apr 2017 20:24:26 +0000

Simple post today: How to compare two plans in SSMS.

This task is a task that is often done by a DBA or developer who is wanting to compare changes of an index or perhaps even comparing queries between DEV and PROD.

As an example, we take the following AdventureWorks2012 query:

SELECT DISTINCT City
FROM Person.Address
WHERE ModifiedDate = ‘2007-05-01’

And save the query plan as a .SQLPLAN file.

Having noted the KEY LOOKUP operator, I know this query can be improved with by adding City to the ncl_demo index as an included column.

For fun, I also ran the plan through my query plan website: HowsMyPlan.com and it noted the same concern.

After adding the index, here’s the new result:

Let’s say you want to compare these two plans and send the results to your team for training or whatever other purpose.

SSMS (SQL Server Management Studio) allows you to compare query plans with a simple right click context menu.

Clicking on “Compare Showplan” opens a file dialog which will allow you to choose a previously saved .SQLPLAN file.

Once selected, the “Showplan Comparison” tab will open with a top and bottom plan and details listed side by side. Clicking the operators in the graphical plan will change the details list and allow you to easily compare the data between each plan.

I hope you found this simple post helpful. Have a great weekend and be sure to check out HowsMyPlan.com

Also feel free to add me on LinkedIn and Twitter.

Using Memory Optimized tables for reporting

Daniel Janik — Tue, 27 Dec 2016 20:03:40 +0000

I recently worked on a project for fraud. This project needed to relate 67 million accounts to one another “Kevin Bacon” style. This means that each unique account may tie to another account on ssn, email, home phone, business phone, or any other PII type metric.

The goal is to catch accounts that may be fraud by relating the data together in levels of separation from a point of entry such as a single email address.

This process took more than 3 days to execute, relating each account to one another but didn’t didn’t provide levels and was somewhat inaccurate. To top this off it also only included a small subset of accounts (4.7 mil) from a single region.

This process needed to relate all 67 million accounts, be dynamic, and include the levels of separation.

I was able to get the initial matching process down to 5 hours for all regions. This was down from unfinished after 3 days for one region. After quite a bit of tuning the dynamic matching was taking 15 seconds which I didn’t feel good enough about. It needed to be faster.

How we go from days to seconds

I thought I’d do an experiment with Hekaton (Memory Optimized Tables) to see if I could speed up this process. Large groups with many levels of separation were taking minutes to match with the largest taking more than 30 minutes. Could Hekaton make this process faster?

It’s just for OLTP right? It’s not useful for reporting’ or, is it?

Experimenting paid off in a big way! The memory optimized table was able to gobble up the code and spit out a result in sub-second times!! My report query that took 30 minutes now only took 3 seconds.

The Demo setup

I’ve recreated my own demo scenario so the client’s code is not revealed. In this demo I created two tables: Customer & CustomerMO.

CREATE TABLE Customer
( accountnum varchar(17) PRIMARY KEY CLUSTERED,
companyname varchar(50),
Fname varchar(20),
lname varchar(20),
SSN char(9),
Homephone varchar(15),
Businessphone varchar(15),
Mobilephone varchar(15),
EmailAddress varchar(100)
)

Creating a memory optimized table is a lot different and there are quite a few limitations.

First you need a filegroup that supports memory optimized data

–Add MEMORY_OPTIMIZED_DATA filegroup to the database.
ALTER DATABASE EpicFail
ADD FILEGROUP fgMO CONTAINS MEMORY_OPTIMIZED_DATA

–Add file to the MEMORY_OPTIMIZED_DATA filegroup.
ALTER DATABASE EpicFail
ADD FILE
( NAME = moFile,
FILENAME = N’C:\Data\moFile.mdf’)
TO FILEGROUP fgMO

Next we create the table

–Create memory optimized table and indexes on the memory optimized table.
CREATE TABLE CustomerMO
( accountnum varchar(17) PRIMARY KEY NONCLUSTERED,
companyname varchar(50),
Fname varchar(20),
lname varchar(20),
SSN char(9),
Homephone varchar(15),
Businessphone varchar(15),
Mobilephone varchar(15),
EmailAddress varchar(100)
INDEX ix_moh_CustomerMo HASH (accountnum) WITH (BUCKET_COUNT = 5000000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

Next I used dbForge Data Generator to load 1,000,000 realistic rows of data. I attempted to do 5 million at first but my Surface with only 8 GB of RAM couldn’t handle the memory optimized table that large. Not enough RAM.

The data was loaded into Customer and then directly copied into CustomerMO to ensure the data was the same for fairness.

The comparison

A quick comparison shows some interesting results.

The emailaddress non-clustered index was created with only “emailaddress”. Note that “select *” results in a key lookup for the traditional table but does not for the memory optimized.
The memory optimized table is blazingly fast comparatively.

Next we add accountnum to the query and the performance gap is even wider.

For the matching process itself, it’s required that each entry point be matched with rows and those rows be matched with rows until no more are matched. Again, matching is based off PII data; so, for this example We start with loading all records with an email address of ‘[email protected]’ into #accts.

Next we loop over queries that join the accounts based on homephone = homephone, homephone = businessphone, businessphone = homephone, email address = email address, and SSN = SSN. Looping is an expensive process but for this it is a must. Windowing functions are too limited to show what was matched on what at which level, remember that we’re matching on many things for each level.

As you can see by the example image the cost for the Memory Optimized query is much less and as data is added to #accts for each new level of separation discovered the top query becomes slower and slower.

I wouldn’t have thought that Hekaton could take my report query down from 30+ min to 3 seconds but in the end it did. *Note that the source data is static and repopulated just twice a week. With that said I didn’t bother looking into any limitations that “report style” queries may cause OLTP operations. I’ll leave that to you.

What are some key concerns/limitations of memory optimized tables?

Cross database transactions are not allowed.
Some complex queries seem to confuse Memory Optimized tables. Consider simplifying your queries for the best performance.
Cannot truncate Memory Optimized tables.
Identity columns are limited.
FK’s are limited.

For a full list of limitations check out the following MSDN resource: https://msdn.microsoft.com/en-us/library/dn246937.aspx

I am an independent consultant. If you’d like to work with me or need help tuning your environment, please reach out on LinkedIn, Twitter, or [email protected].

What keeps clearing my query stats?

Daniel Janik — Thu, 08 Dec 2016 21:00:37 +0000

I was building out a blog post about cardinality today and noticed that sys.dm_exec_query_stats was getting cleared my my Surface Pro 3 dev instance about every 30 to 60 seconds.

I opened object explorer to check if any jobs were running in SQL Agent that may be running DBCC FREEPROCCACHE

I know that sounds odd but I often demo random things in the name of performance and this sounds like a “hands on troubleshooting” demo I may have setup.

SQL Agent was actually disabled so this was not the issue.

*Note: Don’t run DBCC FREEPROCCACHE in production it does exactly what it sounds like it will do.

Next stop: SQL Server Logs. There were no messages about the cache being cleared. If you’re not sure what to look for in the logs here’s an example:

Again, this was not found in today’s log. What I did notice is plenty of messages reading “A significant part of sql server process memory has been paged out. This may result in a performance degradation.”

Could that cause the proc cache to clear?

Since the cache was getting cleared so quickly it was easiest to repeat the following query over and over until the cache cleared, which I’d know happened because the query would return 0 rows.

select plan_handle from sys.dm_exec_query_stats

I wouldn’t query sys.dm_exec_query_stats without a where clause on a production server since the results would probably be too large. This method worked out for this case since this is my dev box and I’m the only one using it.

As soon as the query stats dmv was cleared, I checked the log and sure enough; the message was there at the exact time the dmv cleared.

How do I resolve those messages?

Lock Pages in Memory will resolve the “A significant part of sql server process memory has been paged out. This may result in a performance degradation.” messages.

If you’re using SQL 2016 you can use the following query to determine if Lock Pages is enabled:

SELECT sql_memory_model_desc
FROM sys.dm_os_sys_info

Enable Lock Pages in Memory by doing the following:

On the Start menu, click Run. In the Open box, type gpedit.msc.
On the Local Group Policy Editor console, expand Computer Configuration, and then expand Windows Settings.
Expand Security Settings, and then expand Local Policies.
Select the User Rights Assignment folder.The policies will be displayed in the details pane.
In the pane, double-click Lock pages in memory.
In the Local Security Setting – Lock pages in memory dialog box, click Add User or Group.
In the Select Users, Service Accounts, or Groups dialog box, add an account with privileges to run sqlservr.exe.

Why is paging out SQL Server’s process memory a bad thing?

Well, to start with, you can see that the proc cache got flushed and that’s probably not good for performance. To dig any deeper we need to understand how memory works in Windows.

Let’s say you have Notepad.EXE open and Windows orders a trim of Notepad’s working set (RAM).

Which of the following can be paged out?

The associated DLL files
The EXE file
The unsaved text you’ve written
All text in the open document
All of the above

Take a moment and answer.

Did you say 5. All of the above?

If you did, I’m sorry that is not correct.

The answer is 3. The unsaved text. This is because the Page File only holds data that has been modified. If you’re going to free up RAM by swapping out used RAM pages for pages in the Page File on disk, why would you move the EXE or DLLs into the Page File? They’re already unmodified and on disk.

When we see the “A significant part of sql server process memory has been paged out. This may result in a performance degradation.” message, this indicates that memory has been swapped out to disk and cannot be used again until it is swapped back into RAM. This process can be time consuming and expensive.

My Surface was reporting 58% memory consumed. With so little RAM used you wouldn’t think that Windows would trigger a trim. After all SQL Server was only set to a maximum memory of 1024 MB. Also, increasing the maximum memory to 4096 MB did not help this problem. This is because we’re not talking about memory pressure in SQL, we’re talking about memory pressure in Windows.

Yet; after configuring Locked Pages In Memory for the SQL Server service account, the warning messages stopped appearing and my machine is still performing the same as before.

It’s usually a best practice to enable this policy but I never thought I’d need it on my dev/demo instance.

If you liked this post be sure to follow me on Twitter and LinkedIn. I always welcome endorsements and new followers.

Query tuning 101: How to measure Query Plan I/O cost

Daniel Janik — Fri, 18 Nov 2016 03:47:02 +0000

If you’re a DBA or a developer chances are you’ve looked at a query plan or two. While looking at your plan you may have noticed that each operator has a cost. Did you know that the cost is measurable?

When you’re looking at your plan where do you start? Do you look at the highest cost operator? Do you look for key or RID lookups? Do you look for scans? There’s a lot of places to start and each of these are valid. I generally look for Spools, Filters, Probe Residuals, Warnings, and more. Look for posts on each of these in the future. You can also read my other posts on Query tuning here: SQL Server Central and SQL Tech Blog.com.

When looking at an operator such as an index seek or index scan you’ll notice that there are a few metrics for cost. One of those is the cost for IO. What would you say if I told you I could tell you what the cost of a scan was before you ever ran the query? This is another little gem that I can credit to my friend Joe Sack.

Let’s take a look at the following AdventureWorks2012 query:

SELECT ModifiedDate, rowguid
FROM Person.Address

When we analyze the Clustered Index Scan we find that the Estimated I/O Cost is 0.257199.

How much does an I/O cost?

I/O cost can be measured with two different measurements (Random and Sequential). A single random I/O costs 0.003125, while a single sequential I/O costs 0.000740741.

Keep in mind the plan does not know if the page(s) are in cache or somewhere out on storage. These numbers are what the plan thinks it will need. Also note that the I/O costs do not take your storage hardware into account. It assumes one for all so SSD and Flash get the same score as spinning disks.

Let’s put this to a test with the following query:

SELECT City
FROM Person.Address
WHERE AddressID = 207

Below you’ll see that the cost is 0.003125 for reading the single data page.

Now let’s examine the math for a scan. You should have a single random I/O at 0.003125 followed by sequential I/O’s at 0.000740741 for the rest of the index. Since we know the cost of each I/O type, we can determine the cost of the scan ahead of time using the following query:

SELECT index_id,
in_row_data_page_count,
.003125 + (.000740741 * (in_row_data_page_count-1)) as [Cost for Scan]
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(‘Person.Address’)

We can see that the cost of the Clustered Index Scan we did earlier would be 0.257199.

That’s great but now what?

Don’t get me wrong, this is not the first place I’d start with tuning; however, this knowledge can be helpful for a whole host of things. Knowing this, I’d ask “What if the cost of the scan was 5.0 or greater? What affect may that have on the optimization process and parallelism?”

I’d like to hear more from you. Please feel free to comment and let me know your thoughts.

Also, be sure to follow me at SQLTechBlog.com, Twitter, and LinkedIn.

Query tuning 101: Problems with IN ()

Daniel Janik — Thu, 10 Nov 2016 12:00:01 +0000

If you’re a DBA or database developer, chances are you will have to write or rewrite queries from time to time. Rewriting may be for new features or simply to improve the performance of existing code.

There are tens of thousands of tips and tricks for tuning. Just about every one of them you can find online. Though I still see many developers writing queries in less than efficient ways. I’ve even been told that tuning doesn’t matter much anymore since the advent of SSD and FLASH storage. If only this were true.

Today I’d like to discuss IN (Transact-SQL) and a neat tip that I learned from Joe Sack. Using IN with a list of values is the same as a list of OR’s.

Let’s take a look at the following queries:

IN with 15 vs 16 values

SELECT SalesOrderID, ProductID, UnitPrice
FROM Sales.SalesOrderDetail
WHERE ModifiedDate in (‘2007-01-05’, ‘2007-01-17’, ‘2007-01-26’, ‘2007-02-01’, ‘2007-02-04’, ‘2007-02-14’, ‘2007-02-28’, ‘2007-03-06’, ‘2007-03-07’, ‘2007-03-10’, ‘2007-04-15’, ‘2007-04-17’, ‘2007-08-08’, ‘2007-11-11’, ‘2007-12-25’)

SELECT SalesOrderID, ProductID, UnitPrice
FROM Sales.SalesOrderDetail
WHERE ModifiedDate in (‘2007-01-05′, ‘2007-01-17’, ‘2007-01-26’, ‘2007-02-01’, ‘2007-02-04’, ‘2007-02-14’, ‘2007-02-28’, ‘2007-03-06’, ‘2007-03-07’, ‘2007-03-10’, ‘2007-04-15’, ‘2007-04-17’, ‘2007-08-08’, ‘2007-11-11’, ‘2007-12-25’, ‘2007-12-26’)

Normally, you’d think that these two queries would run with the exact same plan. After all, they are the same query but one has 16 values in the IN while the other only has 15. Let’s examine the two plans.

As you can see the second query is much slower and the extra value in the IN caused late filtering. This is a limitation on some types of operators such as this clustered index scan.

There isn’t just a limitation of 15 input values. There’s also one at 64. On the 65th input value the list will be converted to a constant scan which is then sorted and joined. Most interestingly enough is that the list in my demo query is already sorted ascending.

IN with 64 vs 65 values

SELECT DISTINCT city
FROM Person.Address
WHERE city in (
‘Boulogne-Billancourt’,’Boulogne-sur-Mer’,’Bountiful’,
‘Bracknell’,’Bradenton’,’Braintree’,’Brampton’,
‘Branch’,’Branson’,’Braunschweig’,’Bremerton’,
‘Brisbane’,’Brossard’,’Burbank’,’Burien’,
‘Burlingame’,’Burnaby’,’Bury’,’Byron’,’Calgary’,
‘Caloundra’,’Camarillo’,’Cambridge’,’Campbellsville’,
‘Canoga Park’,’Carnation’,’Carol Stream’,’Carrollton’,
‘Carson’,’Casper’,’Cedar City’,’Cedar Park’,
‘Central Valley’,’Cergy’,’Cerritos’,’Chalk Riber’,
‘Chandler’,’Chantilly’,’Charlotte’,’Chatou’,
‘Cheektowaga’,’Chehalis’,’Cheltenham’,’Cheyenne’,
‘Chicago’,’Chula Vista’,’Cincinnati’,’Citrus Heights’,
‘City Of Commerce’,’Clackamas’,’Clarkston’,’Clay’,
‘Clearwater’,’Cliffside’,’Cloverdale’,’Coffs Harbour’,
‘College Station’,’Colma’,’Colombes’,’Colomiers’,
‘Columbus’,’Concord’,’Coronado’,’Corpus Christi’
)

SELECT DISTINCT city
FROM Person.Address
WHERE city in (
‘Boulogne-Billancourt’,’Boulogne-sur-Mer’,’Bountiful’,
‘Bracknell’,’Bradenton’,’Braintree’,’Brampton’,
‘Branch’,’Branson’,’Braunschweig’,’Bremerton’,
‘Brisbane’,’Brossard’,’Burbank’,’Burien’,
‘Burlingame’,’Burnaby’,’Bury’,’Byron’,’Calgary’,
‘Caloundra’,’Camarillo’,’Cambridge’,’Campbellsville’,
‘Canoga Park’,’Carnation’,’Carol Stream’,’Carrollton’,
‘Carson’,’Casper’,’Cedar City’,’Cedar Park’,
‘Central Valley’,’Cergy’,’Cerritos’,’Chalk Riber’,
‘Chandler’,’Chantilly’,’Charlotte’,’Chatou’,
‘Cheektowaga’,’Chehalis’,’Cheltenham’,’Cheyenne’,
‘Chicago’,’Chula Vista’,’Cincinnati’,’Citrus Heights’,
‘City Of Commerce’,’Clackamas’,’Clarkston’,’Clay’,
‘Clearwater’,’Cliffside’,’Cloverdale’,’Coffs Harbour’,
‘College Station’,’Colma’,’Colombes’,’Colomiers’,
‘Columbus’,’Concord’,’Coronado’,’Corpus Christi’,’Seattle’
)

Here we see a significant difference between the two query plans by simply adding ‘Seattle’ to the list.

How do we work around these limitations?

I’ve always liked joining lists instead of using IN and NOT IN. For IN a simple INNER JOIN works great. For NOT IN a LEFT JOIN WHERE VALUE IS NULL works great as well. Let’s look at the IN

DECLARE @in TABLE (city nvarchar(60))

INSERT INTO @in
VALUES (‘Boulogne-Billancourt’),(‘Boulogne-sur-Mer’),(‘Bountiful’),
(‘Bracknell’),(‘Bradenton’),(‘Braintree’),(‘Brampton’),
(‘Branch’),(‘Branson’),(‘Braunschweig’),(‘Bremerton’),
(‘Brisbane’),(‘Brossard’),(‘Burbank’),(‘Burien’),
(‘Burlingame’),(‘Burnaby’),(‘Bury’),(‘Byron’),(‘Calgary’),
(‘Caloundra’),(‘Camarillo’),(‘Cambridge’),(‘Campbellsville’),
(‘Canoga Park’),(‘Carnation’),(‘Carol Stream’),(‘Carrollton’),
(‘Carson’),(‘Casper’),(‘Cedar City’),(‘Cedar Park’),
(‘Central Valley’),(‘Cergy’),(‘Cerritos’),(‘Chalk Riber’),
(‘Chandler’),(‘Chantilly’),(‘Charlotte’),(‘Chatou’),
(‘Cheektowaga’),(‘Chehalis’),(‘Cheltenham’),(‘Cheyenne’),
(‘Chicago’),(‘Chula Vista’),(‘Cincinnati’),(‘Citrus Heights’),
(‘City Of Commerce’),(‘Clackamas’),(‘Clarkston’),(‘Clay’),
(‘Clearwater’),(‘Cliffside’),(‘Cloverdale’),(‘Coffs Harbour’),
(‘College Station’),(‘Colma’),(‘Colombes’),(‘Colomiers’),
(‘Columbus’),(‘Concord’),(‘Coronado’),(‘Corpus Christi’),(‘Seattle’)

SELECT DISTINCT a.city
FROM Person.Address a
INNER JOIN @in i on (i.city = a.City)

Since we don’t have a table to join with already we create a table variable and insert the 65 values. Then we join the table variable to our query and you can see that this approach appears to cost much less. Notice that the IN forces an index scan while the join uses an index seek.

Summary

I hope you’ve found this post useful. We all know the saying “It depends”. It’s a phrase to live by in SQL Server and one you should always keep in mind. Remember that adequate testing is always needed for any proper tuning job.

When tuning or writing new queries take this tip into account. IN and OR can be limited and these limitations can slow your query. Test with an IN and with a JOIN to see which is better for your environment.

If you liked this post you may want to read some more from me at: http://www.sqlservercentral.com/blogs/confessions-of-a-microsoft-addict/ and https://sqltechblog.com/

I also invite you to follow me on Twitter and LinkedIn.

Why I don’t love SELECT INTO

Daniel Janik — Thu, 20 Oct 2016 11:30:00 +0000

Long ago there were locking / blocking problems with the SELECT INTO statement. That’s not the case anymore and for AdHoc operations and investigation of data SELECT INTO is very helpful.

If you’re not aware of what SELECT INTO is or its benefits here’s an example:

SELECT City, COUNT(1) as CNT
INTO #MyDuplicateCities
FROM Person.Address
GROUP BY City
HAVING COUNT(1) > 1

This query for AdventureWorks will dump all of its results into a table named #MyDuplicateCities. Note that there is no CREATE TABLE statement. The INTO [tablename] will create the table for you.

Running this query a second time will result in failure if you haven’t dropped the #MyDuplicateCities table.

Using this syntax can be really helpful if you just need to do some quick and dirty cleanup; however, it should be avoided for stored procedures. Here’s why…

Have you ever tried to get an Estimated plan from a stored procedure and it results in an ERROR 208 Invalid object name?

When you’re attempting to get an estimated plan the query / stored procedure isn’t actually running. Since the DDL doesn’t exist SQL Server errors with a 208 message and cannot continue to analyze the statement.

Creating the DDL at the beginning of your procedure will always ensure that the estimated plan can be retrieved and as an added bonus the code will be easier to debug for someone that isn’t familiar with the data.

Summing up

Avoid using SELECT INTO for code you’re going to push to production for the following reasons:

It can make estimated plans fail.
It’s slightly harder to debug or modify since the schema isn’t easily shown.

Be lazy and efficient when investigating data. For code that you’re not going to reuse SELECT INTO is a champ. It’s faster than building the typing the CREATE TABLE statement and works just as well for AdHoc scenarios.

*Update for Mr Ozar*

When I originally wrote this post I thought I did include a counter point or at least some mention of parallelism for SELECT INTO as a benefit. I think it’s easy to say that in the world of SQL Server “It depends.” and this is always why a talking point like this has two sides and both are valid. I typically always select the highest performing path. I think it’s up to you, the reader, to decide which is best for your environment. Given the new Parallel Insert operation in SQL Server 2016, I’d say why not create your temp table first and have the best of both worlds?

Read this before using SQL 2016 Temporal Tables

Daniel Janik — Mon, 17 Oct 2016 11:30:00 +0000

I’ve been testing the new Temporal Tables feature over the past day to see about using it in one of our production databases. It’s a neat feature that honestly adds a boat load of possibility around logging.

In my testing I noticed that user created tables seem to store the rows over quite a bit more pages. User created history tables were nearly double the size of an auto generated one. If you’re currently using the feature or plan to use it in the near future, you’ll want to think about this storage issue before you implement.

If you’re not aware of what Temporal tables are, they are like a running history of your table. You can read more about them here:

https://msdn.microsoft.com/en-us/library/dn935015.aspx

User created vs System generated history tables?

With Temporal Tables, you can create you history table in one of three ways:

Create the table yourself ahead of time and then reference it.
Don’t specify a history table at all and SQL 2016 will create a new table with a cryptic system generated name. – ick
Specify a name and SQL 2016 will create the table for you.

Each of these are done with the ALTER TABLE statement.

ALTER TABLE dbo.Address
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = History.AddressHistory))

ALTER TABLE dbo.Address
SET (SYSTEM_VERSIONING = ON)

The second example creates a table with a system generated name.

There’s something else you should know

At first I was a bit puzzled. I noticed that the system generated table was consistently smaller than my user created table. It was not only smaller it was twice as small!

I did some further testing on my Surface this weekend and here’s what I found:

— Side note: I use Person.Address a lot in demos, so I decided to create a new table to test with in hopes of not breaking any other demos I do regularly.

The script

use AdventureWorks2012
go

–CREATE THE TABLE
SELECT * INTO dbo.Address FROM Person.Address

–ADD REQUIRED COLUMNS
ALTER TABLE dbo.Address
ADD ValidFrom datetime2(0) GENERATED ALWAYS AS ROW START HIDDEN,
ValidTill datetime2(0) GENERATED ALWAYS AS ROW END HIDDEN,
PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTill)
GO

CREATE SCHEMA History
GO

CREATE TABLE History.AddressHistory
( AddressID int NOT NULL,
AddressLine1 nvarchar(60) NOT NULL,
AddressLine2 nvarchar(60) NULL,
City nvarchar(30) NOT NULL,
StateProvinceID int NOT NULL,
PostalCode nvarchar(15) NOT NULL,
SpatialLocation geography NULL,
rowguid uniqueidentifier NOT NULL,
ModifiedDate datetime NOT NULL,
ValidFrom datetime2(0) NOT NULL,
ValidTill datetime2(0) NOT NULL)
GO

ALTER TABLE dbo.Address
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = History.AddressHistory))
GO

UPDATE dbo.Address SET AddressLine2 = ‘ABC’

GO
— RUN TOP TABLES REPORT
— History table has 3 MB of data

UPDATE dbo.Address SET AddressLine2 = ‘ABC’
–Note that no data actually changes but the update is executed
GO

— RUN TOP TABLES REPORT AGAIN
— History table has 6 MB of data

UPDATE dbo.Address SET AddressLine2 = ‘ABC’
GO 3

— RUN TOP TABLES REPORT AGAIN
— History table has 15 MB of data

— NOW LET’S LOOK AT A SYSTEM GENERATED TABLE

ALTER TABLE dbo.address
SET (SYSTEM_VERSIONING = OFF)
go

ALTER TABLE dbo.Address
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = History.AddressHistoryNEW))
GO

UPDATE dbo.Address SET AddressLine2 = ‘ABC’
GO

— RUN TOP TABLES REPORT AGAIN
— History table has 1.8 MB of data?

UPDATE dbo.Address SET AddressLine2 = ‘ABC’
GO

— RUN TOP TABLES REPORT AGAIN
— History table has 3.6 MB of data?

UPDATE dbo.Address SET AddressLine2 = ‘ABC’
GO 3

As you can see the system generated table is nearly half the size of the user created table. I scripted the tables to compare and they were the same! I then looked at Object Explorer and noted that the system generated table actually had a bit more (clustered PK and some defaults).

Why is the table SQL generated so much smaller?

Towards the end of the weekend I decided to pick this up again and I looked at sys.partitions. Here’s what I found:

That’s right! When SQL generates your table for you it adds a clustered index and also applies page compression.

Summing up – the takeaways

To make things very easy and ensure the table is as optimal as it can be, I recommend letting SQL Server 2016 create your history table. Give it a meaningful name; but, let SQL created it for you.

This will ensure that page compression is enabled on the history table and will create an index that may be useful as well.

I hope you’ve found this post helpful! If you did be sure you check out my other posts:

http://www.sqlservercentral.com/blogs/confessions-of-a-microsoft-addict/

http://www.sqltechblog.com

Also find me on LinkedIn and Twitter!

Understanding the new MAXDOP settings in SQL 2016

Daniel Janik — Wed, 05 Oct 2016 01:45:18 +0000

If you have just begun using SQL Server 2016 or you have been using it for a while now you may not have noticed the new MAXDOP settings.

Here’s what you need to know

SQL Server 2016 introduced a really neat feature which allows you to configure MAXDOP and some other settings at the database. Simply open the database properties dialog in the UI from Object Explorer and navigate to the Options tab.

Once on the Options tab you’ll see a new section that holds configuration settings for MAXDOP, Legacy Cardinality Estimation, Parameter Sniffing, and Query Optimizer Fixes.

The default configuration for Max DOP in the Database Scope is 0. Please don’t mistake this to equal all like you see on the instance configuration in SP_CONFIGURE.

Here’s how it works

The Maximum Degree of Parallelism (MAXDOP) can be defined in one of three ways:

Instance Scoped via sp_configure
Database Scoped via database properties
Query Scoped via query option

Which of these trumps the other? Let’s imagine we have an instance with 32 processors:

The chart above attempts to show the following

A query hint always overrides the database and instance configuration
The database scoped configuration will override the instance configuration only when it is not the default value ( 0 ).
The instance scoped configuration limits MAXDOP across all databases and queries when the a query option and database scope have not been defined.

One more thing to consider is that if you use 3 part naming and execute your queries from a database without scope defined while accessing objects in a database that has a scope defined, you will not be limited.

Think of this like the following

USE TEMPDB

GO

SELECT * FROM DEMODB.DBO.Table1

This query would run under the TempDB context but accesses data in a different database. If DEMODB had defined a MAXDOP scope it would be ignored since the scope of your session is in TempDB and not DEMODB.

Here’s what this looks like in action:

Note that Query A was run from Adventureworks2012 which is configured with a database scoped Max DOP of 1. Query B ran the same query pointed at the same objects but from TempDB which has no database scoped configuration for Max DOP. The results speak for themselves. Query A was limited and B was not.

I hope that this helps you understand the new configuration settings and you’ll be able to take advantage of their awesomeness very soon.

Please also be sure to check out my other blog posts at these links:

Follow me on LinkedIn and Twitter!

http://www.sqlservercentral.com/blogs/confessions-of-a-microsoft-addict/

http://www.sqltechblog.com

http://www.sqldatapros.com

SQL 2016 SSMS memory leak?

Daniel Janik — Tue, 04 Oct 2016 06:07:35 +0000

Has SSMS (SQL Server Management Studio) been crashing on you? Have you been getting Out of Memory messages when attempting to run queries?

You may have noticed that this tends to occur after you’ve opened and closed 40 to 50 query windows. I’ve noticed this when I have had as little as 5 query windows open after having already opened and closed 30 or so other query windows.

Here’s what you end up seeing with the “System.OutOfMemoryException”:

After a simple search on Bing I found KB2874903, which states that SSMS is a 32 bit application and large queries can cause this message.

Taking a deeper look

In the world of current Windows operating systems there are 32 bit and 64 bit applications. If you’re a gamer you may recall all the memory problems Skyrim had on PC. They were all related to the executable being released as 32 bit. The 32 bit address space is limited to 2 GB on a 32 bit machine or up to 3 GB if the /3GB switch is used and the process is large address aware (LAA). On a 64 bit machine a 32 bit process is limited to 2 GB or 4 GB if the process is large address aware. A 64 bit process is limited to 8 TB. Quite a difference, right?

So why would a vendor release an application compiled as 32 bit? Simply put there are many more sales opportunities when you add 32 bit in the mix. So why limit your sales to just 64 bit users? The answer is you wouldn’t and they don’t.

What’s SSMS look like?

SSMS is a 32 bit process. This can be seen by attaching the process to VMMAP. Simply add everything up and you’ll see that the limit is 2GB of VAS (Virtual Address Space). Here’s one question though. Why isn’t SSMS compiled as LAA (Large Address Aware)? This would allow the process to consume the entire 4 GB 32bit address space on a 64 bit machine. While this wouldn’t resolve the issue it would give delay it by 100%.

So you’ve got plenty of memory but SSMS still gives OOM errors?

Yes that’s right. RAM doesn’t really have much to do with this message at all. This message is related to your virtual address space for the process. From the image above you can see that there is only 114 MB free of 2 GB. When a process allocates and deallocates bits in the VAS the area can become fragmented.

The little bits of white in this graph represent the free areas where new bits can go. Memory in Windows is allocated contiguously so if there isn’t a space large enough to fit your data it could result in an OOM error.

After opening and closing several query windows the free space declined and didn’t release.

You can see that the free memory has reduced to 67 MB.

Examining the process with Perfmon

When using PERFMON to examine a possible memory leak you want to monitor the following counters in the Process object:

Handle Count – Shows the number of references to objects like files, windows, and registry keys
Thread Count – Shows the number of threads created
Private Bytes – Shows the total committed bytes for the process
Working Set – Shows the total RAM for the process

The PERFMON graph above shows that when a query window is opened the handles, threads, and private bytes all grow.

When the windows are closed, SSMS does not release the associated resources. This certainly looks like a leak or at least has symptoms of one.

After a short while of these OOM messages SSMS decided to crash.

After the crash, SSMS restarted and you can see that all of the memory has been released.

And now the cycle begins again with the clock being restarted. After opening and closing enough query windows I’ll be back in the same Out of Memory boat.

Let’s hope that the team working on SSMS can look into this issue and resolve it quickly. I’m seeing quite a few people starting to complain about the apparent increase of these events since SSMS 2016 was released.

I’d be interested in hearing about any issues you’ve had with SSMS. Let me know.

*UPDATE 2016-10-07* – Microsoft reached out to me on Twitter saying that this has been fixed in the latest update for SSMS. I have tested SSMS release 16.4.1 (build 13.0.15900.1) and can confirm that the Virtual Bytes and Private Bytes appear to be cleaning up appropriately. Be sure to get the latest build from https://msdn.microsoft.com/library/mt238290.aspx to avoid any memory related crashes that may cause you to lose your work.

*UPDATE 2016-10-14* – I was working on a small query this morning and the issue happened. The query only returned a couple rows. I ended up closing a few windows and was able to run the query. I then ran a slightly larger result for the same query and found that after scrolling down I could no longer scroll up. I checked my Virtual Bytes and sure enough I had nearly 1.85 GB used. Following this SSMS crashed and everything was back to normal. @Microsoft, release 16.4.1 did not resolve all memory leaks in SSMS.

*UPDATE 2016-11-22* – I’ve been using build 13.0.15900.1 for a few weeks now and I can say that it is much better. I’ve only had one Out of Memory (OOM) message in the past month. I’ll be upgrading to the latest today and will keep an eye on it. One of my co-workers has been getting a lot of OOM messages from SSMS and he’s on version 13.0.16000.28 (16.5).