System Configuration Archives - SQLPerformance.com
https://sqlperformance.com/category/system-configuration
SQL Server performance articles curated by SentryOne

"0 to 60": Switching to indirect checkpoints
https://sqlperformance.com/2020/05/system-configuration/0-to-60-switching-to-indirect-checkpoints
Fri, 01 May 2020
Aaron Bertrand talks about a recent change where he started turning on indirect checkpoints across all user databases.

The post "0 to 60" : Switching to indirect checkpoints appeared first on SQLPerformance.com.

]]>
In a recent tip, I described a scenario where a SQL Server 2016 instance seemed to be struggling with checkpoint times. The error log was populated with an alarming number of FlushCache entries like this one:

FlushCache: cleaned up 394031 bufs with 282252 writes in 65544 ms (avoided 21 new dirty bufs) for db 19:0
      average writes per second: 4306.30 writes/sec
      average throughput:        46.96 MB/sec, I/O saturation: 53644, context switches 101117
      last target outstanding:   639360, avgWriteLatency 1

I was a bit perplexed by this issue, since the system was certainly no slouch — plenty of cores, 3TB of memory, and XtremIO storage. And none of these FlushCache messages were ever paired with the telltale 15-second I/O warnings in the error log. Still, if you stack a bunch of high-transaction databases on there, checkpoint processing can get pretty sluggish. Not so much because of the direct I/O, but more because of the reconciliation that has to be done against a massive number of dirty pages (not just from committed transactions) scattered across such a large amount of memory, potentially while waiting on the lazywriter (since there is only one for the whole instance).
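Switching a database to indirect checkpoints is itself a one-line change; the database name here is a placeholder:

```sql
-- Enable indirect checkpoints by setting a non-zero target recovery time.
-- 60 seconds is the default for databases created on SQL Server 2016+;
-- a value of 0 means the legacy (automatic) checkpoint behavior.
ALTER DATABASE [YourDatabase] SET TARGET_RECOVERY_TIME = 60 SECONDS;
```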

I did some quick "freshen-up" reading of some very valuable posts:

I quickly decided that I wanted to track checkpoint durations for a few of these more troublesome databases, before and after changing their target recovery interval from 0 (the old way) to 60 seconds (the new way). Back in January, I borrowed an Extended Events session from friend and fellow Canadian Hannah Vernon:

CREATE EVENT SESSION CheckpointTracking ON SERVER 
ADD EVENT sqlserver.checkpoint_begin
(
  WHERE 
  (
       sqlserver.database_id = 19 -- db4
    OR sqlserver.database_id = 78 -- db2
    ...
  )
)
, ADD EVENT sqlserver.checkpoint_end
(
  WHERE 
  (
       sqlserver.database_id = 19 -- db4
    OR sqlserver.database_id = 78 -- db2
    ...
  )
)
ADD TARGET package0.event_file
(
  SET filename = N'L:\SQL\CP\CheckPointTracking.xel',
      max_file_size = 50, -- MB
      max_rollover_files = 50
)
WITH 
(
  MAX_MEMORY = 4096 KB,
  MAX_DISPATCH_LATENCY = 30 SECONDS, 
  TRACK_CAUSALITY = ON,
  STARTUP_STATE = ON
);
GO

ALTER EVENT SESSION CheckpointTracking ON SERVER 
  STATE = START;

I marked the time that I changed each database, and then analyzed the results from the Extended Events data using a query published in the original tip. The results showed that after changing to indirect checkpoints, each database went from checkpoints averaging 30 seconds to checkpoints averaging less than a tenth of a second (and far fewer checkpoints in most cases, too). There's a lot to unpack from this graphic, but this is the raw data I used to present my argument (click to enlarge):

My evidence

Once I proved my case across these problematic databases, I got the green light to implement this across all of our user databases throughout our environment. In dev first, and then in production, I ran the following via a CMS query to get a gauge for how many databases we were talking about:

DECLARE @sql nvarchar(max) = N'';

SELECT @sql += CASE
  WHEN (ag.role = N'PRIMARY' AND ag.ag_status = N'READ_WRITE') OR ag.role IS NULL THEN N'
    ALTER DATABASE ' + QUOTENAME(d.name) + N' SET TARGET_RECOVERY_TIME = 60 SECONDS;' 
  ELSE N'
    PRINT N''-- fix ' + QUOTENAME(d.name) + N' on Primary.'';' 
  END
FROM sys.databases AS d 
OUTER APPLY
(
  SELECT role = s.role_desc, 
    ag_status = DATABASEPROPERTYEX(c.database_name, N'Updateability')
    FROM sys.dm_hadr_availability_replica_states AS s
    INNER JOIN sys.availability_databases_cluster AS c
       ON s.group_id = c.group_id 
       AND d.name = c.database_name
    WHERE s.is_local = 1
) AS ag
WHERE d.target_recovery_time_in_seconds <> 60
  AND d.database_id > 4 
  AND d.[state] = 0 
  AND d.is_in_standby = 0 
  AND d.is_read_only = 0;

SELECT DatabaseCount = @@ROWCOUNT, Version = @@VERSION, cmd = @sql;

--EXEC sys.sp_executesql @sql;

Some notes about the query:

  • database_id > 4
    I didn't want to touch master at all, and I didn't want to change tempdb yet because we're not on the latest SQL Server 2017 CU (see KB #4497928 for one reason that detail is important). The latter rules out model, too, because changing model would affect tempdb on the next failover / restart. I could have changed msdb, and I may go back to do that at some point, but my focus here was on user databases.
     
  • [state] / is_read_only / is_in_standby
    We need to make sure the databases we're trying to change are online and not read-only (I hit one that was currently set to read-only, and will have to come back to that one later).
     
  • OUTER APPLY (...)
    We want to restrict our actions to databases that are either the primary in an AG or not in an AG at all (and also have to account for distributed AGs, where we can be primary and local but still not be writable). If you happen to run the check on a secondary, you can't fix the issue there, but you should still get a warning about it. Thanks to Erik Darling for helping with this logic, and Taylor Martell for motivating improvements.
     
  • If you have instances running older versions like SQL Server 2008 R2 (I found one!), you'll have to tweak this a little, since the target_recovery_time_in_seconds column doesn't exist there. I had to use dynamic SQL to get around this in one case, but you could also temporarily move or remove where those instances fall in your CMS hierarchy. Or don't be lazy like me, and run the code in PowerShell instead of a CMS query window, where you can easily filter out databases by any number of properties before ever hitting compile-time issues.
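As a rough sketch of that workaround (the COL_LENGTH guard is my illustration, not the exact code I ran), you can keep the version-specific column inside dynamic SQL so older instances skip it instead of failing at compile time:

```sql
-- target_recovery_time_in_seconds only exists on SQL Server 2012 and up;
-- referencing it directly fails to compile on older instances, so
-- resolve it at runtime behind a metadata check instead.
IF COL_LENGTH(N'sys.databases', N'target_recovery_time_in_seconds') IS NOT NULL
BEGIN
  EXEC sys.sp_executesql N'SELECT name FROM sys.databases
    WHERE target_recovery_time_in_seconds <> 60 AND database_id > 4;';
END
ELSE
BEGIN
  PRINT N'-- Pre-2012 instance; indirect checkpoints are not available here.';
END
```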

In production, there were 102 instances (about half) and 1,590 total databases using the old setting. Everything was on SQL Server 2017, so why was this setting so prevalent? Because they were created before indirect checkpoints became the default in SQL Server 2016. Here is a sample of the results:

Partial results from CMS query.

Then I ran the CMS query again, this time with sys.sp_executesql uncommented. It took about 12 minutes to run that across all 1,590 databases. Within the hour, I was already getting reports of people observing a significant drop in CPU on some of the busier instances.

I still have more to do. For example, I need to test out the potential impact on tempdb, and whether there is any weight in our use case to the horror stories I've heard. And we need to make sure that the 60 second setting is part of our automation and all database creation requests, especially those that are scripted or restored from backups.

The post "0 to 60" : Switching to indirect checkpoints appeared first on SQLPerformance.com.

]]>
https://sqlperformance.com/2020/05/system-configuration/0-to-60-switching-to-indirect-checkpoints/feed 4
AMD EPYC 7002 Series Processors and SQL Server
https://sqlperformance.com/2019/08/sql-performance/amd-epyc-7002-sql-server
Tue, 27 Aug 2019
Glenn Berry talks about AMD's current streak of processor innovation and how the single-socket EPYC may better serve your SQL Server workloads.

On August 7, 2019, AMD finally unveiled their new 7nm EPYC 7002 Series of server processors, formerly code-named "Rome", at the AMD EPYC Horizon Event in San Francisco. This is the second-generation EPYC server processor, using the same Zen 2 architecture as the AMD Ryzen 3000 Series desktop processors. These new processors are socket-compatible with the previous generation AMD EPYC 7001 Series processors, so they will work in existing model servers (with a BIOS update). You will, however, need a new model server to take advantage of the PCIe 4.0 support in the newer processors.

The AMD EPYC 7002 series includes 19 public launch SKUs that have anywhere from 8 to 64 physical cores, plus SMT, for twice the number of logical cores per processor. There are fourteen SKUs that will work in both one-socket and two-socket servers. There are also five less expensive processor SKUs (which have a “P” suffix) that only work in one-socket servers. This processor family has enough compute horsepower, memory bandwidth and capacity, and I/O bandwidth to support large server workloads on a single-socket server.

Single-Socket Database Servers

AMD is definitely pushing the idea that a one-socket AMD EPYC 7002-based server is a superior replacement for nearly any two-socket Intel-based server. A one-socket AMD EPYC 7002 Series server can have up to 64 physical cores, 4TB of DDR4-3200 RAM, and 128 PCIe 4.0 I/O lanes. It will also use less power than a two-socket Intel server, with a lower hardware cost, and potentially lower licensing costs (for things like VMware). Figure 1 shows an example from AMD comparing a two-socket server with two Intel Xeon Gold 6262V processors to a one-socket server with one AMD EPYC 7702P processor.

Figure 1: AMD Example of Two-socket vs. One-socket Server

This idea is not unique to AMD. Dell Vice President/Server CTO Robert W. Hormuth recently argued “Why Single-Socket Servers Could Rule the Future”, which talks about Amdahl’s Law, NUMA overhead, and power usage, among other things. As processor core counts continue to increase, single-socket servers will make more sense for many workloads.

There is already very broad industry support behind these new processors, with Dell EMC, HPE, Gigabyte, Lenovo, SuperMicro, and TYAN announcing new server models for these processors.

The initial reviews and benchmarks for these processors have been very impressive:

TPC-H Benchmark Results with SQL Server 2017

HPE has already submitted a new official TPC-H result on SQL Server 2017 for a single-socket, HPE ProLiant DL325 Gen10 1U server that scored 743,750 QphH@1000GB. This system has one AMD EPYC 7502P 32-core processor and 512GB of RAM. If we divide 743,750 by 32 cores, we get 23,242 per core. The price per QphH for this system is 0.34 USD.

HPE has a previous official TPC-H result on SQL Server 2017 for a two-socket, HPE ProLiant DL380 Gen10 server that scored 1,009,065 QphH@1000GB. That is a higher score, but this system used two Intel Xeon Platinum 8180 28-core processors (a total of 56C/112T) and 512GB of RAM. If we divide 1,009,065 by 56 cores, we get 18,019 per core. The price per QphH for this system is 0.47 USD.

TPC-E Benchmark Results with SQL Server 2017

Lenovo has also submitted a new official TPC-E result for a single-socket, Lenovo ThinkSystem SR655 2U server that has a TPC-E Throughput score of 6,716.88. This system has one AMD EPYC 7742 64-core processor and 1TB of RAM. This is the first time any server vendor has even bothered to submit a TPC-E result for an AMD-based system since 2013.

The TPC-E Throughput score is a good measure of the overall CPU capacity of the system for an OLTP workload. If you divide that total score by the number of physical cores in the system, you will get a “score per core” which is a good measure of the single-threaded performance of the processor used in the system. In this case, 6,716.88/64 equals 104.95 per core.

For comparison’s sake, Lenovo has an earlier submission of an official TPC-E result for a two-socket, Lenovo ThinkSystem SR650 2U server that has a TPC-E Throughput score of 7,012.53. This system has two Intel Xeon Platinum 8280 28-core processors and 1.5TB of RAM. If we divide 7,012.53 by 56, the score per core ends up being 125.22, which is about 19% higher than the AMD result. Single-threaded CPU performance is the one relative weak point of the AMD EPYC 7742 processor.
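The score-per-core arithmetic above is easy to sanity-check; the inputs are the published benchmark figures already quoted in this article:

```sql
-- Per-core scores derived from the published TPC results quoted above
SELECT
    [EPYC 7502P QphH/core] = 743750.0  / 32,  -- ~23,242
    [Xeon 8180 QphH/core]  = 1009065.0 / 56,  -- ~18,019
    [EPYC 7742 tpsE/core]  = 6716.88   / 64,  -- ~104.95
    [Xeon 8280 tpsE/core]  = 7012.53   / 56;  -- ~125.22 (~19% higher)
```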

I would argue that the advantages of the AMD EPYC 7002 Series include:

  • A more modern modular architecture
  • Higher memory density/capacity
  • Higher memory speed and bandwidth
  • PCIe 4.0 support
  • More total PCIe lanes and bandwidth
  • Lower power usage
  • Much lower pricing, especially compared to “M” and “L” suffix Intel processors
  • Better security, with Secure Memory Encryption and Secure Encrypted Virtualization

Preferred AMD EPYC Processors

Among the 19 public processor SKUs, there are preferred choices for SQL Server at each available core count. AMD does a lot less product segmentation than Intel. They don’t artificially limit the number of PCIe 4.0 lanes on the lower cost SKUs, and they don’t charge more for extra memory capacity like Intel does. Intel charges $3K extra for the “M” suffix SKUs, and $7K extra for the “L” suffix SKUs.

All of these AMD EPYC processors support 4TB of RAM in a single-socket server. The main specifications for my preferred AMD processors at each core count are shown in Figure 2.

Figure 2: Preferred AMD EPYC 7002 Processor Specifications

The corresponding preferred 2nd Generation Intel Xeon Scalable processors for SQL Server at each core count are shown in Figure 3.

Figure 3: Preferred 2nd Generation Intel Xeon Scalable Processor Specifications

Conclusion

AMD has released a game-changing processor family for the server market. AMD EPYC 7002 Series processors have more memory capacity (on standard SKUs), more memory bandwidth, and more general-purpose integer and floating-point performance per socket than current Intel Cascade Lake-SP processors. They also have PCIe 4.0 support and a higher number of PCIe lanes than the latest Intel server processors. AMD also gives you more performance per dollar and per watt with the EPYC 7002 Series.

Because of this, I think they will be a great choice for virtualization hosts and for Storage Spaces Direct nodes. I also think they will be a great choice for SQL Server DW/Reporting workloads because of the memory and I/O capacity/bandwidth advantages compared to Intel.

The single-threaded CPU performance is not quite as good as Intel’s best current offerings, but it is closer than it has ever been before. If you are upgrading from an older Intel processor family (especially if the exact processor was a lower-frequency SKU), you will be able to get better single-threaded CPU performance with a new EPYC 7002 Series processor in most cases.

AMD’s biggest challenge is going to be convincing customers to actually buy AMD-platform servers, given how dominant Intel has been in the server market since the Nehalem architecture era back in 2008. AMD is not going to stand still and wait for Intel to catch up; they are on track to release the next Zen 3 architecture, probably in mid-to-late 2020.

Figure 4: AMD EPYC Roadmap

tempdb Enhancements in SQL Server 2019
https://sqlperformance.com/2019/08/tempdb/tempdb-enhancements-in-sql-server-2019
Fri, 02 Aug 2019
Erin Stellato uses a simple example to demonstrate the performance benefits you might expect from enabling memory-optimized tempdb metadata in SQL Server 2019.

I have been making the same recommendations about tempdb since I started working with SQL Server over 15 years ago, when I was working with customers running version 2000. The gist of it: create multiple data files that are sized the same, with the same auto-growth settings, enable trace flag 1118 (and maybe 1117), and reduce your tempdb use. From the customer side, this has been the limit of what can be done*, until SQL Server 2019.

*There are a few additional coding recommendations that Pam Lahoud discusses in her very informative post, TEMPDB – Files and Trace Flags and Updates, Oh My!

What I find interesting is that, after all this time, tempdb is still a problem. The SQL Server team has made many changes over the years to try and mitigate issues, but the abuse continues. The latest adaptation by the SQL Server team is moving the system tables (metadata) for tempdb to In-Memory OLTP (aka memory-optimized). Some information is available in the SQL Server 2019 release notes, and there was a demo from Bob Ward and Conor Cunningham during the first day of the PASS Summit keynote. Pam Lahoud also did a quick demo in her PASS Summit general session. Now that 2019 CTP 3.2 is out, I thought it might be time to do a bit of testing myself.

Setup

I have SQL Server 2019 CTP 3.2 installed on my virtual machine, which has 8GB of memory (max server memory set to 6 GB) and 4 vCPUs. I created four (4) tempdb data files, each sized to 1GB.

I restored a copy of WideWorldImporters and then created three stored procedures (definitions below). Each stored procedure accepts a date input, and pushes all rows from Sales.Orders and Sales.OrderLines for that date into the temporary object. In Sales.usp_OrderInfoTV the object is a table variable, in Sales.usp_OrderInfoTT the object is a temporary table defined via SELECT … INTO with a nonclustered index added afterwards, and in Sales.usp_OrderInfoTTALT the object is a pre-defined temporary table which is then altered to have an additional column. After the data is added to the temporary object, there is a SELECT statement against the object that joins to the Sales.Customers table.

  /*
  	Create the stored procedures
  */
  USE [WideWorldImporters];
  GO

  DROP PROCEDURE IF EXISTS Sales.usp_OrderInfoTV
  GO

  CREATE PROCEDURE Sales.usp_OrderInfoTV @OrderDate DATE
  AS
  BEGIN
  	DECLARE @OrdersInfo TABLE (
  		OrderID INT,
  		OrderLineID INT,
  		CustomerID INT,
  		StockItemID INT,
  		Quantity INT,
  		UnitPrice DECIMAL(18,2),
  		OrderDate DATE);

  	INSERT INTO @OrdersInfo (
  		OrderID,
  		OrderLineID,
  		CustomerID,
  		StockItemID,
  		Quantity,
  		UnitPrice,
  		OrderDate)
  	SELECT 
  		o.OrderID,
  		ol.OrderLineID,
  		o.CustomerID,
  		ol.StockItemID,
  		ol.Quantity,
  		ol.UnitPrice,
  		OrderDate
  	FROM Sales.Orders o
  	INNER JOIN Sales.OrderLines ol
  		ON o.OrderID = ol.OrderID
  	WHERE o.OrderDate = @OrderDate;

  	SELECT o.OrderID,
  		c.CustomerName,
  		SUM (o.Quantity),
  		SUM (o.UnitPrice)
  	FROM @OrdersInfo o
  	JOIN Sales.Customers c
  		ON o.CustomerID = c.CustomerID
  	GROUP BY o.OrderID, c.CustomerName;
  END
  GO

  DROP PROCEDURE IF EXISTS  Sales.usp_OrderInfoTT
  GO

  CREATE PROCEDURE Sales.usp_OrderInfoTT @OrderDate DATE
  AS
  BEGIN
  	SELECT 
  		o.OrderID,
  		ol.OrderLineID,
  		o.CustomerID,
  		ol.StockItemID,
  		ol.Quantity,
  		ol.UnitPrice,
  		OrderDate
  	INTO #temporderinfo 
  	FROM Sales.Orders o
  	INNER JOIN Sales.OrderLines ol
  		ON o.OrderID = ol.OrderID
  	WHERE o.OrderDate = @OrderDate;
  	
  	SELECT o.OrderID,
  		c.CustomerName,
  		SUM (o.Quantity),
  		SUM (o.UnitPrice)
  	FROM #temporderinfo o
  	JOIN Sales.Customers c
  		ON o.CustomerID = c.CustomerID
  	GROUP BY o.OrderID, c.CustomerName
  END
  GO

  DROP PROCEDURE IF EXISTS  Sales.usp_OrderInfoTTALT
  GO

  CREATE PROCEDURE Sales.usp_OrderInfoTTALT @OrderDate DATE
  AS
  BEGIN
  	CREATE TABLE #temporderinfo (
  		OrderID INT,
  		OrderLineID INT,
  		CustomerID INT,
  		StockItemID INT,
  		Quantity INT,
  		UnitPrice DECIMAL(18,2));
  		
  	INSERT INTO #temporderinfo (
  		OrderID,
  		OrderLineID,
  		CustomerID,
  		StockItemID,
  		Quantity,
  		UnitPrice)
  	SELECT 
  		o.OrderID,
  		ol.OrderLineID,
  		o.CustomerID,
  		ol.StockItemID,
  		ol.Quantity,
  		ol.UnitPrice
  	FROM Sales.Orders o
  	INNER JOIN Sales.OrderLines ol
  		ON o.OrderID = ol.OrderID
  	WHERE o.OrderDate = @OrderDate;
  	
  	SELECT o.OrderID,
  		c.CustomerName,
  		SUM (o.Quantity),
  		SUM (o.UnitPrice)
  	FROM #temporderinfo o
  	JOIN Sales.Customers c
  		ON o.CustomerID = c.CustomerID
  	GROUP BY o.OrderID, c.CustomerName
  END
  GO

  /*
  	Create tables to hold testing data
  */

  USE [WideWorldImporters];
  GO

  CREATE TABLE [dbo].[PerfTesting_Tests] (
  	[TestID] INT IDENTITY(1,1), 
  	[TestName] VARCHAR (200),
  	[TestStartTime] DATETIME2,
  	[TestEndTime] DATETIME2
  ) ON [PRIMARY];
  GO

  CREATE TABLE [dbo].[PerfTesting_WaitStats]   (
    [TestID] [int] NOT NULL,
    [CaptureDate] [datetime] NOT NULL DEFAULT (sysdatetime()),
    [WaitType] [nvarchar](60) NOT NULL,
    [Wait_S] [decimal](16, 2) NULL,
    [Resource_S] [decimal](16, 2) NULL,
    [Signal_S] [decimal](16, 2) NULL,
    [WaitCount] [bigint] NULL,
    [Percentage] [decimal](5, 2) NULL,
    [AvgWait_S] [decimal](16, 4) NULL,
    [AvgRes_S] [decimal](16, 4) NULL,
    [AvgSig_S] [decimal](16, 4) NULL
  ) ON [PRIMARY];
  GO

  /*
  	Enable Query Store
  	(testing settings, not exactly what 
  	I would recommend for production)
  */

  USE [master];
  GO
  
  ALTER DATABASE [WideWorldImporters] SET QUERY_STORE = ON;
  GO

  ALTER DATABASE [WideWorldImporters] SET QUERY_STORE (
  	OPERATION_MODE = READ_WRITE, 
  	CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30), 
  	DATA_FLUSH_INTERVAL_SECONDS = 600, 
  	INTERVAL_LENGTH_MINUTES = 10, 
  	MAX_STORAGE_SIZE_MB = 1024, 
  	QUERY_CAPTURE_MODE = AUTO, 
  	SIZE_BASED_CLEANUP_MODE = AUTO);
  GO

Testing

The default behavior for SQL Server 2019 is that the tempdb metadata is not memory-optimized, and we can confirm this by checking sys.configurations:

  SELECT *
  FROM sys.configurations
  WHERE configuration_id = 1589;

For all three stored procedures we will use sqlcmd to generate 20 concurrent threads running one of two different .sql files. The first .sql file, which will be used by 19 threads, will execute the procedure in a loop 1,000 times. The second .sql file, used by just one (1) thread, will execute the procedure in a loop 3,000 times. That file also includes T-SQL to capture two metrics of interest: total duration and wait statistics. We will use Query Store to capture average duration for the procedure.

  /*
  	Example of first .sql file
    which calls the SP 1000 times
  */

  SET NOCOUNT ON;
  GO

  USE [WideWorldImporters];
  GO

  DECLARE @StartDate DATE;
  DECLARE @MaxDate DATE;
  DECLARE @Date DATE;
  DECLARE @Counter INT = 1;

  SELECT @StartDATE = MIN(OrderDate) FROM [WideWorldImporters].[Sales].[Orders];
  SELECT @MaxDATE = MAX(OrderDate) FROM [WideWorldImporters].[Sales].[Orders];

  SET @Date = @StartDate;

  WHILE @Counter <= 1000
  BEGIN
  	EXEC [Sales].[usp_OrderInfoTT] @Date;

  	IF @Date <= @MaxDate
  	BEGIN
  		SET @Date = DATEADD(DAY, 1, @Date);
  	END
  	ELSE
  	BEGIN
  		SET @Date = @StartDate;
  	END
  	
  	SET @Counter = @Counter + 1;
  END
  GO

  /*
  	Example of second .sql file
    which calls the SP 3000 times
    and captures total duration and
    wait statistics
  */

  SET NOCOUNT ON;
  GO

  USE [WideWorldImporters];
  GO

  DECLARE @StartDate DATE;
  DECLARE @MaxDate DATE;
  DECLARE @DATE DATE;
  DECLARE @Counter INT = 1;
  DECLARE @TestID INT;
  DECLARE @TestName VARCHAR(200) = 'Execution of usp_OrderInfoTT - Disk Based System Tables';

  INSERT INTO [WideWorldImporters].[dbo].[PerfTesting_Tests] ([TestName]) VALUES (@TestName);

  SELECT @TestID = MAX(TestID) FROM [WideWorldImporters].[dbo].[PerfTesting_Tests];

  SELECT @StartDATE = MIN(OrderDate) FROM [WideWorldImporters].[Sales].[Orders];

  SELECT @MaxDATE = MAX(OrderDate) FROM [WideWorldImporters].[Sales].[Orders];

  SET @Date = @StartDate;

  IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
      WHERE [name] = N'##SQLskillsStats1')
      DROP TABLE [##SQLskillsStats1];

  IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
      WHERE [name] = N'##SQLskillsStats2')
      DROP TABLE [##SQLskillsStats2];

  SELECT [wait_type], [waiting_tasks_count], [wait_time_ms],
         [max_wait_time_ms], [signal_wait_time_ms]
  INTO ##SQLskillsStats1
  FROM sys.dm_os_wait_stats;

  /* 
  	set start time 
  */

  UPDATE [WideWorldImporters].[dbo].[PerfTesting_Tests] 
  SET [TestStartTime] = SYSDATETIME()
  WHERE [TestID] = @TestID;

  WHILE @Counter <= 3000
  BEGIN
  	EXEC [Sales].[usp_OrderInfoTT] @Date;

  	IF @Date <= @MaxDate
  	BEGIN
  		SET @Date = DATEADD(DAY, 1, @Date);
  	END
  	ELSE
  	BEGIN
  		SET @Date = @StartDate;
  	END
  	
  	SET @Counter = @Counter + 1
  END

  /* 
  	set end time 
  */

  UPDATE [WideWorldImporters].[dbo].[PerfTesting_Tests] 
  SET [TestEndTime] = SYSDATETIME() 
  WHERE [TestID] = @TestID;

  SELECT [wait_type], [waiting_tasks_count], [wait_time_ms],
         [max_wait_time_ms], [signal_wait_time_ms]
  INTO ##SQLskillsStats2
  FROM sys.dm_os_wait_stats;

  WITH [DiffWaits] AS
  (SELECT
    -- Waits that weren't in the first snapshot
          [ts2].[wait_type],
          [ts2].[wait_time_ms],
          [ts2].[signal_wait_time_ms],
          [ts2].[waiting_tasks_count]
      FROM [##SQLskillsStats2] AS [ts2]
      LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
          ON [ts2].[wait_type] = [ts1].[wait_type]
      WHERE [ts1].[wait_type] IS NULL
      AND [ts2].[wait_time_ms] > 0
  UNION
  SELECT
  -- Diff of waits in both snapshots
          [ts2].[wait_type],
          [ts2].[wait_time_ms] - [ts1].[wait_time_ms] AS [wait_time_ms],
          [ts2].[signal_wait_time_ms] - [ts1].[signal_wait_time_ms] AS [signal_wait_time_ms],
          [ts2].[waiting_tasks_count] - [ts1].[waiting_tasks_count] AS [waiting_tasks_count]
      FROM [##SQLskillsStats2] AS [ts2]
      LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1]
          ON [ts2].[wait_type] = [ts1].[wait_type]
      WHERE [ts1].[wait_type] IS NOT NULL
      AND [ts2].[waiting_tasks_count] - [ts1].[waiting_tasks_count] > 0
      AND [ts2].[wait_time_ms] - [ts1].[wait_time_ms] > 0),
  [Waits] AS
      (SELECT
          [wait_type],
          [wait_time_ms] / 1000.0 AS [WaitS],
          ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
          [signal_wait_time_ms] / 1000.0 AS [SignalS],
          [waiting_tasks_count] AS [WaitCount],
          100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
          ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
      FROM [DiffWaits]
      WHERE [wait_type] NOT IN (
          -- These wait types are almost 100% never a problem and so they are
          -- filtered out to avoid them skewing the results.
          N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP', 
          N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE', 
          N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT',
          N'CLR_SEMAPHORE', N'CXCONSUMER', N'DBMIRROR_DBM_EVENT', 
          N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD', 
          N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE', N'EXECSYNC', 
          N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX', 
          N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT',
          N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE', 
          N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE', 
          N'MEMORY_ALLOCATION_EXT', N'ONDEMAND_TASK_QUEUE',  N'PARALLEL_REDO_DRAIN_WORKER', 
          N'PARALLEL_REDO_LOG_CACHE', N'PARALLEL_REDO_TRAN_LIST', N'PARALLEL_REDO_WORKER_SYNC', 
          N'PARALLEL_REDO_WORKER_WAIT_WORK', N'PREEMPTIVE_XE_GETTARGETSTATE', 
          N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'PWAIT_DIRECTLOGCONSUMER_GETNEXT', 
          N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_ASYNC_QUEUE', 
          N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
          N'QDS_SHUTDOWN_QUEUE', N'REDO_THREAD_PENDING_WORK', N'REQUEST_FOR_DEADLOCK_SEARCH', 
          N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', 
          N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY',
          N'SLEEP_MASTERMDREADY', N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', 
          N'SLEEP_SYSTEMTASK', N'SLEEP_TASK', N'SLEEP_TEMPDBSTARTUP', 
          N'SNI_HTTP_ACCEPT', N'SOS_WORK_DISPATCHER', N'SP_SERVER_DIAGNOSTICS_SLEEP',
          N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', 
          N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS', N'WAITFOR', 
          N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_RECOVERY', N'WAIT_XTP_HOST_WAIT', 
          N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE',
          N'XE_DISPATCHER_JOIN', N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT' 
      )
    )
  INSERT INTO [WideWorldImporters].[dbo].[PerfTesting_WaitStats] (
  	[TestID],
  	[WaitType] ,
  	[Wait_S] ,
  	[Resource_S] ,
  	[Signal_S] ,
  	[WaitCount] ,
  	[Percentage] ,
  	[AvgWait_S] ,
  	[AvgRes_S] ,
  	[AvgSig_S]
  )
  SELECT
  	@TestID,
      [W1].[wait_type] AS [WaitType],
      CAST ([W1].[WaitS] AS DECIMAL (16, 2)) AS [Wait_S],
      CAST ([W1].[ResourceS] AS DECIMAL (16, 2)) AS [Resource_S],
      CAST ([W1].[SignalS] AS DECIMAL (16, 2)) AS [Signal_S],
      [W1].[WaitCount] AS [WaitCount],
      CAST ([W1].[Percentage] AS DECIMAL (5, 2)) AS [Percentage],
      CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (16, 4)) AS [AvgWait_S],
      CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (16, 4)) AS [AvgRes_S],
      CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (16, 4)) AS [AvgSig_S]
  FROM [Waits] AS [W1]
  INNER JOIN [Waits] AS [W2]
      ON [W2].[RowNum] <= [W1].[RowNum]
  GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS],
      [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage]
  HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; -- percentage threshold
  GO

  -- Cleanup
  IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
      WHERE [name] = N'##SQLskillsStats1')
      DROP TABLE [##SQLskillsStats1];
    
  IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
      WHERE [name] = N'##SQLskillsStats2')
      DROP TABLE [##SQLskillsStats2];
  GO

Example of command line file:
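The original file isn't reproduced here, but a rough sketch of what such a batch file might look like follows; the server name, authentication flag, and script paths are hypothetical placeholders, not the originals:

```batch
:: Hypothetical sketch: 19 sessions run the 1,000-iteration script,
:: and one session runs the 3,000-iteration script that also
:: captures duration and wait statistics.
for /L %%i in (1,1,19) do (
  start "" sqlcmd -S YourServer -E -i "C:\Perf\Run1000.sql"
)
start "" sqlcmd -S YourServer -E -i "C:\Perf\Run3000.sql"
```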

Results

After executing the command line files that generate 20 threads for each stored procedure, checking the total duration for the 22,000 executions of each procedure (19 threads × 1,000 iterations, plus one thread × 3,000 iterations) shows the following:

  SELECT *, DATEDIFF(SECOND, TestStartTime, TestEndTime) AS [TotalDuration]
  FROM [dbo].[PerfTesting_Tests]
  ORDER BY [TestID];

The stored procedures with the temporary tables (usp_OrderInfoTT and usp_OrderInfoTTALT) took longer to complete. If we look at individual query performance:

  SELECT
  	[qsq].[query_id], 
  	[qsp].[plan_id],
  	OBJECT_NAME([qsq].[object_id]) AS [ObjectName],
  	[rs].[count_executions],
  	[rs].[last_execution_time],
  	[rs].[avg_duration],
  	[rs].[avg_logical_io_reads],
  	[qst].[query_sql_text]
  FROM [sys].[query_store_query] [qsq] 
  JOIN [sys].[query_store_query_text] [qst]
  	ON [qsq].[query_text_id] = [qst].[query_text_id]
  JOIN [sys].[query_store_plan] [qsp] 
  	ON [qsq].[query_id] = [qsp].[query_id]
  JOIN [sys].[query_store_runtime_stats] [rs] 
  	ON [qsp].[plan_id] = [rs].[plan_id]
  WHERE ([qsq].[object_id] = OBJECT_ID('Sales.usp_OrderInfoTT'))
  OR ([qsq].[object_id] = OBJECT_ID('Sales.usp_OrderInfoTV'))
  OR ([qsq].[object_id] = OBJECT_ID('Sales.usp_OrderInfoTTALT'))
  ORDER BY [qsq].[query_id], [rs].[last_execution_time];

We can see that the SELECT … INTO for usp_OrderInfoTT took about 28ms on average (duration in Query Store is stored in microseconds), and only took 9ms when the temporary table was pre-created. For the table variable, the INSERT took just over 22ms on average. Interestingly, the SELECT query took just over 1ms for the temporary tables, and approximately 2.7ms for the table variable.

A check of wait statistics data finds a familiar wait_type, PAGELATCH*:

  SELECT * 
  FROM [dbo].[PerfTesting_WaitStats]
  ORDER BY [TestID], [Percentage] DESC;

Notice that we only see PAGELATCH* waits for tests 1 and 2, which were the procedures with the temporary tables. For usp_OrderInfoTV, which used a table variable, we only see SOS_SCHEDULER_YIELD waits. Please note: This does not in any way imply that you should use table variables instead of temporary tables, nor does it imply that you will not have PAGELATCH waits with table variables. This is a contrived scenario; I highly recommend you test with YOUR code to see what wait_types appear.

Now we’ll change the instance to use memory-optimized tables for the tempdb metadata. There are two ways this can be done: via the ALTER SERVER CONFIGURATION command, or by using sp_configure. As this setting is an advanced option, if you use sp_configure you’ll need to enable advanced options first.

ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;
GO
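The sp_configure route would look something like the following sketch; 'tempdb metadata memory-optimized' is the documented option name for SQL Server 2019, but verify it on your build:

```sql
-- Alternative to ALTER SERVER CONFIGURATION: sp_configure.
-- This is an advanced option, so advanced options must be shown first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'tempdb metadata memory-optimized', 1;
RECONFIGURE;
-- A restart of the instance is still required for the change to take effect.
```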

After this change it’s necessary to restart the instance. (NOTE: you can change this back to NOT use memory-optimized tables; you just have to restart the instance again.) After the restart, if we check sys.configurations again we can see the metadata tables are memory-optimized:
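After the restart, the setting can be confirmed with a query along these lines (the option name matches the documented SQL Server 2019 setting; verify on your build):

```sql
SELECT [name], [value], [value_in_use]
FROM sys.configurations
WHERE [name] = N'tempdb metadata memory-optimized';
```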

After executing the command line files again, the total duration for the 21,000 executions of each procedure shows the following (note that results are ordered by stored procedure for easier comparison):

There was definitely an improvement in performance for both usp_OrderInfoTT and usp_OrderInfoTTC, and a slight increase in performance for usp_OrderInfoTV. Let’s check query durations:

For all queries, query duration is nearly the same, except for the increase in INSERT duration when the table is pre-created, which is completely unexpected. We do see an interesting change in wait statistics:

For usp_OrderInfoTT, a SELECT … INTO is executed to create the temporary table. The waits change from being PAGELATCH_EX and PAGELATCH_SH to only PAGELATCH_EX and SOS_SCHEDULER_YIELD. We no longer see the PAGELATCH_SH waits.

For usp_OrderInfoTTC, which creates the temporary table and then inserts, the PAGELATCH_EX and PAGELATCH_SH waits no longer appear, and we only see SOS_SCHEDULER_YIELD waits.

Finally, for usp_OrderInfoTV, the waits are consistent – only SOS_SCHEDULER_YIELD, with almost the same total wait time.

Summary

Based on this testing, we do see an improvement in all cases, significantly for the stored procedures with temporary tables. There is a slight change for the table variable procedure. It’s extremely important to remember that this is one scenario, with a small load test. I was very interested to try these three very simple scenarios, to try to understand what might benefit the most from making the tempdb metadata memory-optimized. This workload was small and ran for a very limited time – in fact I had more varied results with more threads, which is worth exploring in another post. The biggest takeaway is that, as with all new features and functionality, testing is important. For this feature, you want to have a baseline of current performance against which to compare metrics such as Batch Requests/Sec and wait statistics after making the metadata memory-optimized.

Additional Considerations

Using In-Memory OLTP requires a filegroup of the MEMORY_OPTIMIZED_DATA type. However, after enabling MEMORY_OPTIMIZED TEMPDB_METADATA, no additional filegroup is created for tempdb. In addition, it’s not known whether the memory-optimized tables are durable (SCHEMA_AND_DATA) or not (SCHEMA_ONLY). Typically this can be determined via sys.tables (durability_desc), but nothing is returned for the involved system tables when querying this in tempdb, even when using the Dedicated Administrator Connection. You do have the ability to view nonclustered indexes for the memory-optimized tables. You can use the following query to see which tables are memory-optimized in tempdb:

  SELECT *
  FROM tempdb.sys.dm_db_xtp_object_stats x
  JOIN tempdb.sys.objects o
  	ON x.object_id = o.object_id
  JOIN tempdb.sys.schemas s
  	ON o.schema_id = s.schema_id;

Then, for any of the tables, run sp_helpindex, for example:

EXEC sys.sp_helpindex N'sys.sysobjvalues';

Note that if it’s a hash index (which requires estimating the BUCKET_COUNT as part of creation), the description would include “nonclustered hash.”

The post tempdb Enhancements in SQL Server 2019 appeared first on SQLPerformance.com.

]]>
https://sqlperformance.com/2019/08/tempdb/tempdb-enhancements-in-sql-server-2019/feed 3
Is Intel Doomed in the Server CPU Space? https://sqlperformance.com/2019/07/sql-performance/is-intel-doomed-in-the-server-cpu-space https://sqlperformance.com/2019/07/sql-performance/is-intel-doomed-in-the-server-cpu-space#comments Tue, 16 Jul 2019 09:00:09 +0000 https://sqlperformance.com/?p=9901 Glenn Berry explains why, if given the choice, the next time you purchase hardware to run SQL Server, you should opt for AMD instead of Intel processors.

The post Is Intel Doomed in the Server CPU Space? appeared first on SQLPerformance.com.

]]>
If you have been paying attention to the technology press over the past 12-18 months, you may have noticed a rather large number of negative stories about Intel’s processor business. Close monitoring of the hardware enthusiast community, including many of the most respected hardware analysts and reviewers, paints an even more dire picture of Intel in the server processor space.

Despite all of this, Intel is not going to lose their entire server processor business any time soon. However, I am firmly convinced that Intel will lose significant market share during the next 12-18 months after the release of the upcoming 7nm AMD EPYC “Rome” server processors. By significant market share, I am talking in the 10-15% range during that time period. The previous AMD EPYC “Naples” processors have “primed the pump” in the server space, and the major server vendors are now much more receptive to AMD.

For many years, I explicitly advised people not to run their SQL Server workloads on AMD hardware because of the much lower single-threaded CPU performance and consequently higher SQL Server core license costs. Now, I am advising people to strongly consider AMD for SQL Server workloads as the AMD EPYC “Rome” processors are released in Q3 of 2019. So, what has changed my mind?

The Death of Tick-Tock

From 2007 until 2016, Intel was able to successfully execute their Tick-Tock release strategy, where they would introduce a new processor microarchitecture roughly every two years (a Tock release). One year after a Tock release Intel would take that same microarchitecture (with some minor improvements), and use a shrink of the manufacturing process to create a Tick release.

This created a predictable release cadence, and also delivered significant performance gains and other improvements with each release, especially Tock releases. This made it easier for database professionals to make the case for a hardware upgrade, and made the typical upgrade more worthwhile.

The Tick-Tock release cycle basically fell apart by about 2015, as Intel was unable to move from a 14nm manufacturing process to a 10nm manufacturing process. Intel has been stuck at 14nm in the server space since the Broadwell release in Q4 of 2016. Intel officially moved on to what they call “Process-Architecture-Optimization (PAO)” in early 2016.

This has led to a very noticeable reduction in generational performance increases since Broadwell-EP, as shown in Figure 1. These numbers are estimated TPC-E scores for a two-socket server with two, eight-core processors, using the fastest eight-core processor from each generation.

Figure 1: Generational Intel Xeon Performance Increases

Lack of Competition in the Server Space

Intel server processors have historically delivered significantly better single-threaded CPU performance and lower power consumption than competing AMD processors since the Intel Nehalem microarchitecture in 2008. This situation was so bad that Microsoft offered a 25% discount on the cost of SQL Server processor core licenses for SQL Server 2012 and SQL Server 2014, if you ran on qualifying AMD Opteron processors with six or more cores.

Even with this 25% license discount, it was not really cost-effective to use AMD Opteron processors for SQL Server usage, because of their extremely poor single-threaded performance. You could easily get more total CPU capacity, better single-threaded CPU performance, and much lower SQL Server licensing costs with an appropriate, modern Intel Xeon E5 or E7 processor during that time frame.

Since Intel had no viable competition from a performance perspective, they had little incentive to continue to innovate at the same pace. Intel became complacent over the past ten years, and ended up opening up a large opportunity for AMD. AMD has capitalized on this with their Zen architecture, and new Zen 2 architecture, using a modular, 7nm manufacturing process from Taiwan Semiconductor Manufacturing Company (TSMC).

Intel Processor Security Vulnerabilities

Adding to Intel’s woes are a series of processor vulnerabilities that have been discovered and publicized over the past 18 months. These include Spectre, Meltdown, Foreshadow and their variants, along with newer exploits such as Zombieload. Generally speaking, modern Intel processors are more vulnerable to these types of attacks than modern AMD processors are.

Older Intel processors are more vulnerable to these exploits, and they suffer more of a performance decrease from existing software and firmware-level fixes. The latest Intel Cascade Lake-SP processors do have hardware-level mitigations for some of the Spectre and Meltdown exploits, which reduces the performance impact compared to previous firmware or software-level mitigation measures.

I wrote a number of blog posts about this back in January 2018, including these:

Microsoft’s current SQL Server specific guidance about this subject is here.

AMD EPYC 7002 Series “Rome” Highlights

The 7nm AMD EPYC 7002 “Rome” processors will have between 8 and 64 physical cores, plus Simultaneous Multi-Threading (SMT), which is the AMD equivalent of Intel Hyper-Threading. They will also have up to 256MB of L3 cache per processor.

AMD claims a 15% Instructions Per Clock (IPC) increase between the desktop Zen+ and Zen 2 generations, and we are likely to see a similar increase between the previous AMD EPYC 7001 “Naples” and the AMD EPYC 7002 series processors.

So far, we don’t know the official base and turbo clock speeds, but there was a recent leak of partial specifications and pricing by a European retailer that listed max boost clock speeds of up to 3.4 GHz. We won’t know the actual single-threaded performance of these processors until they have been released and benchmarked by neutral third-party testers. I am optimistic that they will have higher single-threaded CPU performance than Intel Cascade Lake-SP processors.

These Rome processors will have eight memory channels that will support DDR4-3200 memory, with up to 4TB of RAM per socket. The processor will also support 128 PCIe 4.0 lanes (which have double the bandwidth of PCIe 3.0 lanes). This much memory and I/O bandwidth will make this processor an excellent choice for DW type workloads.

Hardware vendors are quite enthusiastic about Rome, with Dell planning on tripling the number of AMD-based server models it offers by the end of 2019. If the leaked pricing information is accurate, AMD Rome processors will be significantly less expensive than Intel Cascade Lake-SP processors.

Figure 2: AMD EPYC Rome Processor

How is This Relevant for SQL Server?

You might be asking why you should care about all of this as a SQL Server Database professional? There are many reasons! These include your overall server CPU capacity, your single-threaded CPU performance, your memory density and capacity, your total I/O capacity, and your SQL Server 2017/2019 license costs.

I think there are a large number of existing SQL Server instances out there running on older versions of SQL Server, on older versions of Windows Server, perhaps on older versions of a hypervisor, running mainly on older generations of Intel Xeon processors. Many organizations have been keeping their legacy environments running for a number of years, waiting for a worthwhile set of reasons to finally do a complete data platform refresh. For many of these organizations, the second half of 2019 into the first half of 2020 will be a window where it will make sense to finally upgrade.

Once you have made the decision to upgrade, you should think about whether you want to run your SQL Server instances on an AMD platform or an Intel platform. Because of the advantages of the AMD Zen 2 architecture, choosing an AMD platform for your new server(s) may be the best choice, from multiple perspectives. These include probable better single-threaded CPU performance, better multi-threaded CPU performance, higher memory density and capacity, higher memory bandwidth, higher I/O bandwidth, better hardware-level security, and lower processor pricing.

The post Is Intel Doomed in the Server CPU Space? appeared first on SQLPerformance.com.

]]>
https://sqlperformance.com/2019/07/sql-performance/is-intel-doomed-in-the-server-cpu-space/feed 3
Common SQL Server Mishaps https://sqlperformance.com/2019/06/sql-performance/common-sql-server-mishaps Fri, 21 Jun 2019 09:00:12 +0000 https://sqlperformance.com/?p=9877 Tim Radney talks about some of the typical SQL Server mishaps he comes across out in the wild, and resources you can use to avoid them in your own environments.

The post Common SQL Server Mishaps appeared first on SQLPerformance.com.

]]>
I’ve been teaching and writing about common SQL Server mistakes for many years. I wrote a blog post about this years ago as well; however, as time has marched on, the guidance has changed a bit. This article will expand on my previous article and point out how these apply to SQL Server, Azure SQL Database, and Azure SQL Managed Instance.

For many years I’ve found users making the same mistakes. I call them mistakes; however, in most cases, they are really just things not being done properly because the people managing the environment don’t know any better. Here are some of the more critical items that anyone installing and supporting SQL Server should know about:

  • Backups
  • DBCC CHECKDB
  • Memory settings
  • Statistics
  • Index maintenance
  • MAXDOP and cost threshold for parallelism
  • SQL Server Agent alerts

Backups

I always check backups first when looking at a new system. Having proper backups to meet recovery objectives is critical. Data loss can be detrimental to an organization. When looking at backups, I check for recovery model and the current history of backups for each database. I usually find a combination of the following:

  • No backup at all – no record of any backup for the database
  • Missing backups – no log backups for a database using the full recovery model
  • No recent backups – last backup was weeks/months/years old

Misconfigured backups are detrimental to an organization when a recovery situation comes up. Working with customers and having to tell them that they’ve lost data is never fun or easy. Having proper backups to meet SLAs should be any organization’s top priority, in addition to making sure copies of those backups are stored in a secondary location offsite.
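A quick way to triage the three backup gaps listed above is to compare each database's recovery model against its most recent full and log backups in msdb. A sketch (note that backup history may have been purged, so a NULL here means no *recorded* backup):

```sql
-- Last full (type 'D') and log (type 'L') backup per database
SELECT d.name,
       d.recovery_model_desc,
       MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full_backup,
       MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
    ON b.database_name = d.name
WHERE d.database_id <> 2 -- tempdb cannot be backed up
GROUP BY d.name, d.recovery_model_desc
ORDER BY d.name;
```

Databases in the full recovery model with no recent last_log_backup are the "missing backups" case described above.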

This situation applies to on-premises SQL Server and IaaS. Azure SQL Database and Azure Managed Instance have managed backups.

DBCC CHECKDB

Unfortunately, database corruption happens. Without regularly checking for corruption, customers can find themselves in a bad place, without the backups needed to recover when that corruption affects the physical data. To check for corruption, DBCC CHECKDB should be run against each database on a regular basis. What I find is very similar to backups:

  • No DBCC CHECKDBs performed at all
  • DBCC CHECKDBs being performed only on select databases
  • DBCC CHECKDBs last performed months or years ago

The worst case is a scheduled job reporting failed DBCC CHECKDBs.

It is never pleasant finding corruption, or having a customer reach out with a corruption issue, when the corruption is in a heap or clustered index and there are no backups from before the corruption occurred. In those cases the corruption is in the actual data, and restoring from a backup taken before the corruption is, in most cases, the only option. In cases where the corruption is in a nonclustered index, rebuilding the index is the fix.

In a few situations, I’ve had to work with customers who have nasty corruption without proper backups where I’ve been able to script out the database and manually copy all the usable data into a newly created database. These costly situations can be easily avoided by running DBCC CHECKDB and having proper backup retention.

I advise customers to run DBCC CHECKDB on-premises, IaaS, Azure SQL Database, and Azure SQL Managed Instance. Azure does a great job checking for physical corruption; however, I feel that consumers need to be checking for logical corruption.
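A minimal form of the check discussed above, run per database (the database name is a placeholder; NO_INFOMSGS and ALL_ERRORMSGS keep the output focused on actual problems):

```sql
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```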

Memory Settings

A default installation of Microsoft SQL Server has the minimum memory value set to 0 and the maximum server memory value set to 2147483647 MB, which is 2 petabytes. Prior to SQL Server 2012, the maximum server memory value only applied to the buffer pool, so customers needed to limit the amount of memory the buffer pool could use to save memory for the operating system and other processes. SQL Server 2012 introduced a memory manager rewrite so that the maximum server memory value applies to all SQL Server memory allocations.

It is highly advisable to set a maximum value for your SQL Server instance. Jonathan Kehayias has written a blog post, How much memory does my SQL Server actually need?, with a formula that helps establish the baseline for the maximum memory value. In cases of a shared SQL Server, I recommend that my clients set the minimum value to 30% of the memory on the server.

In situations with multiple instances or where the server is used for SQL Server, SSIS, SSAS, or SSRS, you need to evaluate how much memory those other systems need and reduce the maximum server memory value to allow adequate memory for the OS and the other services.
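For example, capping an instance at 56 GB on a server dedicated to SQL Server would look like this (the value is illustrative only; derive yours from the formula in the post referenced above):

```sql
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- 57344 MB = 56 GB; example value only
EXEC sys.sp_configure N'max server memory (MB)', 57344;
RECONFIGURE;
```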

This issue is valid for on-premises, IaaS, and partially for Azure SQL Managed Instance. Managed Instance sets a max server memory value based on the deployed tier, however when I tested resizing the environment, the max memory value was not dynamically changed. In that situation, you would need to manually update the value. This issue does not apply to Azure SQL Database.

Statistics

The query optimizer uses statistics to build execution plans. This means SQL Server needs statistics to be up to date so that the query optimizer has a better chance of building a good execution plan. By default, statistics are updated after 20% + 500 rows of data have been modified, which can take a long time to reach on larger tables. Beginning with compatibility level 130, the threshold for statistics updates on large tables has been lowered. For SQL Server 2008 R2 – 2014, you could lower this threshold using trace flag 2371.
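One way to see how much modification churn each statistic has accumulated since its last update is the sys.dm_db_stats_properties function (available from SQL Server 2008 R2 SP2 and 2012 SP1 onward):

```sql
-- Statistics on user tables, most-modified first
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;
```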

I regularly find that customers are not manually updating statistics and even with the lower threshold, I’ve found that manually updating makes an environment more stable.

I recommend that customers use a third-party script to update statistics. Ola Hallengren has published a widely used Maintenance Solution for SQL Server. Part of that process is his Index Optimize procedure, which can take additional parameters to update statistics.

@UpdateStatistics 
    ALL     = update index and column statistics
    INDEX   = update index statistics
    COLUMNS = update column statistics
    NULL    = Do not perform statistics maintenance (this is the default)

@OnlyModifiedStatistics
    Y = Update statistics only if rows have been modified since most recent stats update
    N = Update statistics regardless of whether any rows have been modified

I’ve found that customers who are using third-party products or scripts to perform index maintenance based upon the fragmentation level of the index are not considering that reorganizations do not update statistics the way rebuilds do. Many of these third-party applications have options for updating statistics, just like Ola’s Index Optimize procedure; you just need to turn that option on.
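Using the parameters listed above, a call to Ola's procedure (the object name is dbo.IndexOptimize in his Maintenance Solution) that maintains indexes and also updates all statistics, but only when rows have changed, might look like this:

```sql
EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @UpdateStatistics = 'ALL',
    @OnlyModifiedStatistics = 'Y';
```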

Updating statistics applies to on-premises, IaaS, Azure SQL Database, and Azure SQL Managed Instance.

Index Maintenance

Performing index maintenance by removing fragmentation from your indexes is still important. Some retired documentation from Microsoft stated that index fragmentation can have a negative impact of 13-460%, depending on the size of the environment and the level of fragmentation. While hardware such as intelligent SANs, solid state disks, and other advancements have helped speed things up, wasted space in an index translates to wasted space in the buffer pool, as well as wasted I/O.

Fragmentation occurs through regular operations such as inserts, updates, and deletes. To remediate this, proper index maintenance of rebuilding or reorganizing your indexes is needed. I again turn to Ola Hallengren for his Index Optimize script. Ola’s script provides the ability to specify rebuild or reorganize based on the level of fragmentation and a minimum page count. Many third-party tools offer the same logic. SQL Server Database Maintenance Plans prior to SQL Server 2016 only allowed you to rebuild or reorganize all indexes. Beginning with SQL Server 2016, you can now specify similar logic based on fragmentation levels. Don’t forget those statistics, though, if you are using smart logic based on fragmentation levels.

I like that Ola’s script and third-party tools log to a table. I can then query the table to see if I have any index hot spots where fragmentation is constantly occurring at high levels, troubleshoot why fragmentation is so prevalent, and determine whether anything can be done.

There are exceptions to every rule or best practice. Some patterns of data access lead to constant fragmentation. The cost of constantly rebuilding/reorganizing those tables may not be worth it and can be excluded from maintenance. Those situations should be evaluated on a case by case basis.

This applies to on-premises, IaaS, Azure SQL Database, and Azure SQL Managed Instance.

MAXDOP

I find that max degree of parallelism and cost threshold for parallelism are typically left at the default values on client servers. For MAXDOP, the default value is zero, which means an ‘unlimited’ number of CPUs could be used to execute a parallel region of a query (technically up to 64 processors, unless you enable a trace flag to use more).

A decade ago, when processors had lower core counts, this value was acceptable. Today, with high core density and multi-socket servers, an unlimited number of CPUs for parallelism isn’t so good. Microsoft has given guidance on what values to use for MAXDOP.

If you are on SQL Server 2008 – SQL Server 2014, for a single NUMA node with less than 8 logical processors, keep MAXDOP at or below the number of logical processors. If you have more than 8 logical processors, keep MAXDOP at 8. If you have multiple NUMA nodes with less than 8 logical processors per NUMA node, keep MAXDOP at or below the number of logical processors per NUMA node. Greater than 8, keep MAXDOP at 8.

SQL Server 2016 introduced soft-NUMA nodes. During service startup, if the Database Engine detects more than 8 physical cores per NUMA node or socket, soft-NUMA nodes are created automatically. The engine takes care of placing logical processors from the same physical core into different soft-NUMA nodes. For that reason, we have slightly different guidance for MAXDOP for SQL Server 2016 onwards.

If you are on SQL Server 2016 and up, for a single NUMA node with less than 16 logical processors, keep MAXDOP at or below the number of logical processors. If you have more than 16 logical processors, keep MAXDOP at 16. If you have multiple NUMA nodes with less than 16 logical processors per NUMA node, keep MAXDOP at or below the number of logical processors per NUMA node. Greater than 16, keep MAXDOP at half the number of logical processors per NUMA node with a MAX value of 16.

If you are mostly virtualized on machines with 8 or fewer logical processors with a default MAXDOP, you’re probably OK. If you have big physical hardware with defaults, then you should look at optimizing MAXDOP.

All the figures above are guidelines, not hard truths. Your workloads vary and consideration should be taken when you determine what value is most optimal for your workload.

Configuring MAXDOP applies to on-premises, IaaS, and Azure SQL Managed Instance. However, there is a database scoped configuration that can be applied per database starting with SQL Server 2016, and this applies to Azure SQL Database.
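Both configuration levels can be sketched as follows; the value of 8 is an example only, and should come from the NUMA-based guidance above:

```sql
-- Instance level (on-premises, IaaS, Azure SQL Managed Instance)
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max degree of parallelism', 8; -- example value
RECONFIGURE;

-- Per database (SQL Server 2016+ and Azure SQL Database),
-- run in the context of the target database
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
```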

Cost Threshold for Parallelism

Cost threshold for parallelism has a default value of 5. The history of this number goes back to the early days of SQL Server and the workstation that workload testing was performed on. With modern hardware, the cost estimation of 5 is outdated. Testing has shown that increasing the number from 5 to a higher value will keep shorter-running queries from having a parallel plan. I tend to recommend increasing this value to a higher number after examining the Plan Cache. In many cases I end up starting with a value of 25 and then monitor further and adjust from there, if needed. For more information about tuning cost threshold for parallelism, Jonathan Kehayias wrote: Tuning ‘cost threshold for parallelism’ from the Plan Cache.
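Setting the starting value of 25 mentioned above looks like this (cost threshold for parallelism is also an advanced option):

```sql
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 25;
RECONFIGURE;
```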

This applies to on-premises, IaaS, and Azure SQL Managed Instance.

SQL Server Agent Alerts

Everyone should be leveraging SQL Agent alerts unless they have a third-party application monitoring for the same error conditions. Configuring alerts is easy and free, and having them configured will give you critical information when your servers are having problems.

I wrote an article titled SQL Server Agent Alerts, providing step-by-step instructions on how to create alerts for severity 19-25 errors and error 825. Enabling these alerts is easy: enable database mail, create a mail operator, and then create the alerts. This can be accomplished using the GUI or with T-SQL. I encourage everyone to script out this process using T-SQL and make it part of your standard server build.
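As a sketch of the T-SQL route, creating one alert and its notification might look like the following; the 'DBA Team' operator name is a placeholder, and Database Mail plus the operator must already exist:

```sql
-- Example for severity 19; repeat for severities 20-25 and for error 825
EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 019',
    @severity = 19,
    @delay_between_responses = 60,
    @include_event_description_in = 1;

EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 019',
    @operator_name = N'DBA Team',
    @notification_method = 1; -- 1 = e-mail
```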

This applies to on-premises, IaaS, and Azure SQL Managed Instance.

Summary

As you can see, there are many settings that should be modified from the defaults after installing SQL Server. This is not a comprehensive list; however, it does cover many of the more critical and performance impacting issues I find, and that I have lumped under my "SQL Server mishaps" category.

The post Common SQL Server Mishaps appeared first on SQLPerformance.com.

]]>
Compatibility Levels and Cardinality Estimation Primer https://sqlperformance.com/2019/01/sql-performance/compatibility-levels-and-cardinality-estimation-primer https://sqlperformance.com/2019/01/sql-performance/compatibility-levels-and-cardinality-estimation-primer#comments Tue, 29 Jan 2019 09:00:37 +0000 https://sqlperformance.com/?p=9654 Glenn Berry provides some details about how compatibility levels and the cardinality estimator have changed over the years, and why modern versions of SQL Server continue to be the better option.

The post Compatibility Levels and Cardinality Estimation Primer appeared first on SQLPerformance.com.

]]>
Introduction

Between 1998 and early 2014, SQL Server used one cardinality estimator (CE), but would introduce a new database compatibility level with every new major version of SQL Server (with the exception of SQL Server 2008 R2). The native compatibility levels for SQL Server are shown by major SQL Server version in Table 1:

SQL Server Version           Native Compatibility Level
SQL Server 7.0               70
SQL Server 2000              80
SQL Server 2005              90
SQL Server 2008 / 2008 R2    100
SQL Server 2012              110
SQL Server 2014              120
SQL Server 2016              130
SQL Server 2017              140
SQL Server 2019              150

Table 1: SQL Server Versions and Native Compatibility Levels

Between SQL Server 7.0 and SQL Server 2012, there was no connection between the compatibility level of a database and the cardinality estimator that queries in that database would use. This is because there was just the one cardinality estimator, which received a major update in 1998. The compatibility level of a database was only used for backwards functional compatibility and to enable/disable some new features in each new version of SQL Server (see this Stack Exchange answer for examples of how behavior changed between 80 and 90, probably the most disruptive change). Unlike the file version of a SQL Server database, you can change the compatibility level of a database at any time, to any supported compatibility level, with a simple ALTER DATABASE command.
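For example, checking and then changing one database's compatibility level (the database name is a placeholder):

```sql
-- Current compatibility level of every database
SELECT name, compatibility_level
FROM sys.databases;

-- Change a single database; this can be done at any time,
-- to any level supported by the instance
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 110;
```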

By default, if you created a new database in SQL Server 2012, the compatibility level would be set to 110, but you could change it to an earlier level if you wished. If you restored a database backup from a SQL Server 2008 instance onto a SQL Server 2012 instance, it would upgrade the file version of the database, but would leave the compatibility level where it had been on the SQL Server 2008 instance (unless it was 80, which would get upgraded to 90, the minimum version supported by SQL Server 2012). Besides knowing the fundamental difference between the file version of a database and the compatibility level of a database, most DBAs and developers didn’t have to worry very much about database compatibility levels before SQL Server 2014 was released. In many cases, most databases never had their compatibility levels changed after a migration to a new version of SQL Server. This usually didn’t cause any issues unless you actually needed a new feature or behavior that changed in the latest database compatibility level.

SQL Server 2014 Changes

This old state of affairs changed radically with the release of SQL Server 2014. SQL Server 2014 introduced a “new” cardinality estimator that was enabled by default when a database was in 120 compatibility level. In the classic whitepaper, “Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator,” Joe Sack explains the background and behavior of this change back in April of 2014. In many cases, most of your queries ran faster when using the new cardinality estimator, but it was fairly common to run into some queries that had major performance regressions with the new cardinality estimator. If that happened, SQL Server 2014 did not have that many options for alleviating the performance issues caused by the new CE. Joe’s whitepaper covers those options in great detail, but essentially, you were limited to instance-level trace flags or query-level query hints to control which cardinality estimator was used by the query optimizer, unless you wanted to revert to compatibility level 110 or lower.

SQL Server 2016 Changes

SQL Server 2016 introduced database scoped configuration options, which give you the ability to control some behaviors that were formerly configured at the instance level, using an ALTER DATABASE SCOPED CONFIGURATION command. In SQL Server 2016, these options included MAXDOP, LEGACY_CARDINALITY_ESTIMATION, PARAMETER_SNIFFING, and QUERY_OPTIMIZER_HOTFIXES. There was also a CLEAR PROCEDURE_CACHE option that let you clear the entire plan cache for a single database.

Most relevant in this context are the LEGACY_CARDINALITY_ESTIMATION and QUERY_OPTIMIZER_HOTFIXES database scoped configuration options. LEGACY_CARDINALITY_ESTIMATION enables the legacy CE regardless of the database compatibility level setting. It is equivalent to trace flag 9481, but it only affects the database in question, not the entire instance. It allows you to set the database compatibility level to 130 in order to get a number of functional and performance benefits, but still use the legacy CE database-wide (unless overridden by a query-level query hint).
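As a quick sketch (run in the context of the target database), enabling the legacy CE with a database scoped configuration, checking the current settings, and clearing the per-database plan cache look like this:

```sql
-- Enable the legacy CE for this database only, regardless of compatibility level
-- (equivalent to trace flag 9481, scoped to the current database):
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

-- Review the current database scoped configuration values:
SELECT name, value
FROM sys.database_scoped_configurations;

-- Clear the plan cache for just this database:
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
```

Because these are database-level commands, you can leave the rest of the instance untouched while you experiment with CE behavior on a single database.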

The QUERY_OPTIMIZER_HOTFIXES option is equivalent to trace flag 4199 at the database level. SQL Server 2016 automatically enables all query optimizer hotfixes that were released before SQL Server 2016 RTM when you use the 130 database compatibility level (without requiring trace flag 4199). If you also enable TF 4199 or QUERY_OPTIMIZER_HOTFIXES, you will get the query optimizer hotfixes that were released after SQL Server 2016 RTM as well.

SQL Server 2016 SP1 also introduced the USE HINT query hints that are easier to use, understand, and remember than the older QUERYTRACEON query hints. This gives you even more fine-grained control over optimizer behavior that is related to database compatibility level and the version of the cardinality estimator that is being used. You can query sys.dm_exec_valid_use_hints to get a list of valid USE HINT names for the exact build of SQL Server that you are running.
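For example (the table name below is just a placeholder), you can list the hints your build supports and force the legacy CE for a single query:

```sql
-- List the USE HINT names supported by this build of SQL Server:
SELECT name FROM sys.dm_exec_valid_use_hints;

-- Force the legacy CE for one query only (equivalent to QUERYTRACEON 9481):
SELECT COUNT(*)
FROM dbo.SomeTable  -- placeholder table name
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```

Unlike QUERYTRACEON, USE HINT does not require sysadmin-level permissions, which makes it much more practical for application code.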

SQL Server 2017 Changes

The new adaptive query processing feature was added in SQL Server 2017, and is enabled by default when you are using database compatibility level 140.

Microsoft is trying to move away from the old terminology of “New CE” and “Old CE”, since there are actually changes and fixes to query optimization in each new major version of SQL Server. Because of this, there is no single “New CE” anymore. Instead, Microsoft wants to refer to CE70 (default CE for SQL Server 7.0 through SQL Server 2012), CE120 for SQL Server 2014, CE130 for SQL Server 2016, CE140 for SQL Server 2017, and CE150 for SQL Server 2019. Starting with SQL Server 2017 CU10, you can use the USE HINT functionality to control this with query hints. For example:

/*...query...*/ OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130'));

… would be a valid query hint to force the CE130 cardinality estimator for a particular query.

SQL Server 2019 Changes

SQL Server 2019 is adding even more performance improvements and behavior changes that are enabled by default when a database is using compatibility mode 150. A prime example is scalar UDF inlining. Another example is the intelligent query processing feature, which is a superset of the adaptive query processing in SQL Server 2017.

There are five new USE HINT options, including ways to disable batch mode or disable adaptive memory grant feedback, as shown in Table 2:

DISABLE_BATCH_MODE_ADAPTIVE_JOINS
DISABLE_BATCH_MODE_MEMORY_GRANT_FEEDBACK
DISABLE_INTERLEAVED_EXECUTION_TVF
DISALLOW_BATCH_MODE
QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150

Table 2 : New USE HINT options

And there are also sixteen new database scoped configuration options (as of CTP 2.2) that give you database-level control of more items that are also affected by trace flags or database compatibility level. It gives you more fine-grained control of higher level changes that are enabled by default with database compatibility level 150. These are listed in Table 3:

ACCELERATED_PLAN_FORCING
BATCH_MODE_ADAPTIVE_JOINS
BATCH_MODE_MEMORY_GRANT_FEEDBACK
BATCH_MODE_ON_ROWSTORE
DEFERRED_COMPILATION_TV
ELEVATE_ONLINE
ELEVATE_RESUMABLE
GLOBAL_TEMPORARY_TABLE_AUTO_DROP
INTERLEAVED_EXECUTION_TVF
ISOLATE_SECURITY_POLICY_CARDINALITY
LIGHTWEIGHT_QUERY_PROFILING
OPTIMIZE_FOR_AD_HOC_WORKLOADS
ROW_MODE_MEMORY_GRANT_FEEDBACK
TSQL_SCALAR_UDF_INLINING
XTP_PROCEDURE_EXECUTION_STATISTICS
XTP_QUERY_EXECUTION_STATISTICS

Table 3 : New database scoped configuration options

Conclusion

Migrating to a modern version of SQL Server (meaning SQL Server 2016 or newer) is significantly more complicated than it was with legacy versions of SQL Server. Because of the changes associated with the various database compatibility levels and various cardinality estimator versions, it is actually very important to put some thought, planning, and actual testing into what database compatibility level you want to use on the new version of SQL Server that you are migrating your existing databases to.

Microsoft’s recommended upgrade process is to upgrade to the latest SQL Server version, but keep the source database compatibility level. Then, enable Query Store on each database and collect baseline data on the workload. Next, you set the database compatibility level to the latest version, and then use Query Store to fix performance regressions by forcing the last known good plan.
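That workflow can be sketched in T-SQL roughly as follows; the database name, compatibility level, and query/plan IDs are placeholders you would substitute from your own environment and Query Store reports:

```sql
-- 1. On the new instance, keep the source compatibility level and enable Query Store:
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- 2. After collecting a workload baseline, move to the latest compatibility level:
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 140;

-- 3. For any query that regresses, force the last known good plan
--    (query_id and plan_id come from the Query Store regression reports):
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 43;
```

The key point is the ordering: Query Store must be capturing data before you flip the compatibility level, or you will have no "last known good plan" to fall back on.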

You really want to avoid a haphazard “blind” migration where you are blissfully unaware of how this works and how your workload will react to these changes. Changing the database compatibility level to an appropriate version and using the appropriate database scoped configuration options, along with appropriate query hints where absolutely necessary, is extremely important with modern versions of SQL Server.

The post Compatibility Levels and Cardinality Estimation Primer appeared first on SQLPerformance.com.

End of support for SQL Server 2008 & 2008 R2 https://sqlperformance.com/2018/10/system-configuration/end-of-support-sql-server-2008-r2 https://sqlperformance.com/2018/10/system-configuration/end-of-support-sql-server-2008-r2#comments Fri, 12 Oct 2018 10:00:29 +0000 https://sqlperformance.com/?p=9541 Tim Radney talks about some of the things you can do to prepare for the end of extended support for SQL Server 2008 and SQL Server 2008 R2.

The post End of support for SQL Server 2008 & 2008 R2 appeared first on SQLPerformance.com.

Image credit: Benny Lin (https://www.flickr.com/photos/benny_lin/191393608/)

If you are running SQL Server 2008 or SQL Server 2008 R2, what does July 9th, 2019 mean for you? With both of these versions of SQL Server reaching the end of their support lifecycle together, you will no longer be able to get critical security updates. This can cause serious security and compliance issues for your organization.

When these versions of SQL Server were released, they came with 10 years of support; 5 years of Mainstream Support and 5 years of Extended Support. If your organization still has SQL Server 2008/2008 R2 in production, how is your organization planning to address the risk? For organizations that are heavily regulated, this is a big concern.

You need to choose how you’re going to migrate and where you’re going to migrate to, and then make sure you’re not going to hit any roadblocks along the way.

Migration Assessment Tools

If you are planning an upgrade from SQL Server 2008/2008 R2, Microsoft has made things much easier to test and validate your environment. Numerous tools exist that can assist with migration assessments and even handle migration tasks, and they’re all slightly different. These tools include:

  • Data Migration Assistant
  • Microsoft Assessment and Planning Toolkit
  • Azure Database Migration Service
  • Database Experimentation Assistant

The Data Migration Assistant helps you upgrade to a modern data platform by detecting compatibility issues that can impact functionality on the newer version of SQL Server, and by making recommendations for performance and reliability improvements for the new environment. Your source can be SQL Server 2005 or later, with a target of SQL Server 2012 or later, or Azure SQL Database.

The Microsoft Assessment and Planning Toolkit has been around for many years and is often referred to as the MAP Tool. It’s great for doing an inventory of your current environment to find where SQL Server (and other applications) exist.

The Azure Database Migration Service integrates some of the functionality of existing tools and services to provide customers with a comprehensive solution for migrating to Azure. The tool generates assessment reports that provide recommendations to guide you through any changes required prior to performing a migration. This service currently requires a VPN or Express Route.

Finally, the Database Experimentation Assistant is a new A/B testing solution for SQL Server Upgrades and it’s a tool you should become familiar with. It leverages Distributed Replay to capture a workload and replay it against a target SQL Server. This can be used to test hardware changes or version differences of SQL Server. You can capture workloads from SQL Server 2005 and up.

Migration Options

On-premises upgrade: One of the easiest migration methods is to upgrade to a newer version of SQL Server. In this case, you have SQL Server 2012, 2014, 2016, or 2017 to pick from. I encourage clients to upgrade to the latest version that they can. SQL Server 2012 is already out of Mainstream Support and SQL Server 2014 goes out of Mainstream Support on July 9th, 2019. Upgrading can be very time consuming and costly to organizations due to all the planning and testing involved, so moving to the latest version can increase the time before the next upgrade. There are also numerous performance and functionality improvements in SQL Server 2016 and 2017 that make migrating to SQL Server 2012 or 2014 a very poor choice at this time.

A common approach for on-premises upgrades is to build new and migrate, regardless of a physical or virtual environment. By building new, you can restore your databases and conduct numerous rounds of testing and validation to make sure everything works as expected before moving production.

Upgrade and migrate to an Azure VM: For organizations that are looking to migrate to the cloud, Azure Infrastructure as a Service (IaaS) is a great option. Running SQL Server on an Azure VM is much like on-premises. You specify the size of the VM (number of vCPUs and memory) and configure your storage for your I/O and size requirements. You are still responsible for supporting the OS and SQL Server for configuration and patching. Azure IaaS gives you the ability to easily scale your workloads by scaling the size of your virtual machine up or down as your workload needs change, as well as take advantage of Azure Active Directory integration, threat detection, and many other Azure benefits.

Migrate to Azure SQL Database: Another option you have is to migrate to Azure SQL Database. Azure SQL Database can be thought of as a Database as a Service and is part of Microsoft’s Platform as a Service (PaaS). Azure SQL Database functionality is database scoped, which means certain things such as cross database queries, SQL Server Agent, Database Mail, and more are not available. However, many customers that have applications that utilize a single database have been able to migrate to Azure SQL Database with minimal effort. You can quickly test for compatibility with Azure SQL Database by using the Data Migration Assistant. With Azure SQL Database, you can size your databases by DTU (Database Transaction Units) or vCores individually, or group databases into an Elastic Pool. Azure SQL Database allows you to scale your resources up and down with minimal effort and downtime.

Migrate to Azure SQL Managed Instance: A new option (as of 2018) is to migrate to Azure SQL Managed Instance. This is a new product that is currently generally available as of October 1st for the General-Purpose tier. Managed Instance was built using the instance-level programming model. This means that functionality we are used to with the full version of SQL Server is supported. The goal of Managed Instance is to have 100% surface area compatibility with on-premises. All databases in the instance are on the same server, so cross-database queries are supported, as are Database Mail, SQL Server Agent, Service Broker, and much more. There are two pricing tiers; General Purpose, that includes a non-readable secondary for HA, and Business Critical, that has two non-readable secondaries and a readable secondary. Managed Instance is part of Microsoft’s PaaS offering, so you get all the built-in features and functionality of PaaS.

Move as-is to Azure Virtual Machines: Microsoft is offering three years of Extended Security Updates at no additional charge if you move your SQL 2008/SQL 2008 R2 instances to an Azure VM. The goal is to give you a bit more time to upgrade to a newer version of SQL Server when you are ready.

Pay to Stay: This isn’t a migration option, but you do have an option to purchase up to three years of Extended Security Updates. There are restrictions around this option. You must have active Software Assurance for those instances or Subscription licenses under an Enterprise Agreement. If this applies to you, then this option can buy you more time to plan and migrate off of SQL Server 2008/2008 R2.

Migration Best Practices

When performing any migration or upgrade, there are certain things you need to be aware of. First, you need baselines and I can’t stress this enough. Anytime you make a change to an environment, you need to be able to measure how that change impacts the environment. Knowing key performance metrics for your environment can help you when troubleshooting any perceived impact. You can manually collect these metrics using perfmon and DMVs or invest in a performance monitoring platform. I wrote about both techniques in more detail in a previous post, and right now you can get an extended, 45-day evaluation of SentryOne. Having baseline metrics for things like CPU utilization, memory consumption, disk metrics, and more can quickly let you know if things look better or worse after an upgrade or migration.

You should also note your configuration options within your instance. Many times, I’ve been asked to look at a SQL Server instance after an upgrade or migration and found that most of the default settings are in use. If the old system is still available, I can query it, get the previous non-default values that were in place, and apply those to the new environment to get back to a known configuration. It is always good to review sys.configurations on your production server to consider making similar changes in your new environment (cost threshold for parallelism, max degree of parallelism, optimize for ad hoc workloads, and more). Notice I wrote ‘consider’: if the core count or memory is different on the new server, you need to configure the settings taking the new server’s size into account.
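As a starting point, a query along these lines against the old server surfaces the settings most worth carrying over (the list of option names here is just a sample, not exhaustive):

```sql
-- Capture key instance-level settings on the old server so they can be
-- reviewed (not blindly copied) on the new environment:
SELECT name, value_in_use, description
FROM sys.configurations
WHERE name IN (N'cost threshold for parallelism',
               N'max degree of parallelism',
               N'optimize for ad hoc workloads',
               N'max server memory (MB)')
ORDER BY name;
```

Saving this output alongside your baseline metrics gives you a documented "known configuration" to compare against after the migration.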

What is your backout plan if things go wrong? Do you have proper backups you can go back to? In most cases with an upgrade or migration, you are moving to a new VM or physical server. Your failback may be to move back to the old server. If you have had data change in the new version of SQL Server, your failback is much more complicated. You cannot restore a SQL Server database backup from a newer version of SQL Server to an older version.

Conclusion

If you are still using SQL Server 2008 or SQL Server 2008 R2, you have a few options available to you to stay in compliance after July 9th, 2019. To stay on SQL Server 2008 or SQL Server 2008 R2, you can purchase Extended Security Updates or move to an Azure virtual machine if you qualify. If you can upgrade, you can migrate to a supported version of SQL Server on-premises or on an Azure VM or consider migrating to a managed solution such as Azure SQL Database or Azure SQL Managed Instance.

Perfect Storm for Upgrading to a Modern Version of SQL Server https://sqlperformance.com/2018/09/system-configuration/perfect-storm-for-upgrading-to-a-modern-version-of-sql-server https://sqlperformance.com/2018/09/system-configuration/perfect-storm-for-upgrading-to-a-modern-version-of-sql-server#comments Tue, 04 Sep 2018 09:00:24 +0000 https://sqlperformance.com/?p=9485 Glenn Berry lays out a convincing argument to upgrade to a modern version of SQL Server - considering not only the platform improvements but also changes to Windows, storage, and hardware.

The post Perfect Storm for Upgrading to a Modern Version of SQL Server appeared first on SQLPerformance.com.

A number of ongoing and upcoming developments in software, hardware, and storage technology are shaping up to make the next twelve to eighteen months an ideal time to migrate from a legacy version of SQL Server to a modern version of SQL Server.

Personally, I consider any version of SQL Server prior to SQL Server 2016 to be a legacy version of SQL Server. SQL Server 2014 will fall out of mainstream support on July 9, 2019 (the same date that SQL Server 2008 and SQL Server 2008 R2 will fall out of extended support). SQL Server 2012 fell out of mainstream support on July 11, 2017.

SQL Server Developments

The modern versions of SQL Server are SQL Server 2016, SQL Server 2017, and the upcoming SQL Server vNext. Previously, I wrote about why SQL Server 2017 is a tangibly better upgrade choice than SQL Server 2016 right now. When SQL Server vNext is released (perhaps in Q4 2018) it is likely to have a number of useful new features and enhancements that will make it a superior upgrade choice to SQL Server 2017. Regardless of new features, the next version of SQL Server will be in mainstream support for a longer period than SQL Server 2016 or SQL Server 2017.

Operating System Developments

Microsoft is set to release Windows Server 2019 sometime later in 2018. Microsoft recently held an online Windows Server Summit on June 26, 2018 that had a lot of useful information about this release. There are a number of improvements in Storage Spaces Direct (S2D), including deduplication and compression in ReFS. Another improvement is true two-node quorum for two-node S2D clusters, using a USB thumb drive plugged into a router as the file share witness.

Windows Server 2019 S2D will let you have up to 4PB of raw storage capacity per S2D cluster, which is a 4X improvement over Windows Server 2016. There is also a new PowerShell cmdlet called Get-PhysicalDiskIoReport that exposes much more granular performance information for individual physical disks, letting you manually monitor drive latency or automatically detect drive latency outliers.

Windows Server 2019 fully supports existing NV-DIMM persistent memory, along with Intel Optane 3D XPoint memory and storage. There are also improvements in the free Windows Admin Center management utility that is a great dashboard for hyper-converged infrastructure (HCI) deployments.

VMware vSphere 6.7 supports vSphere Persistent Memory, which will let you take advantage of persistent memory in a VMware virtualized environment.

Hardware Developments

Both Intel and AMD are scheduled to release new generations of server processors during the next three to six months, both of which will work in existing model servers. This will make them actually available more quickly than if they required new server models.

For Intel, it will be the 14nm Intel Xeon Scalable Processor “Cascade Lake-SP” which is rumored to be due in Q4 of 2018. These processors will support “Apache Pass” DIMMs (meaning Intel Optane 3D XPoint persistent memory) and they are socket compatible with current Intel Xeon Scalable Processor “Skylake-SP” processors, which means that they will work in existing server models. More important for many customers is the fact that Cascade Lake-SP will have hardware-level protection for most side-channel attacks (such as Spectre/Meltdown), which will have less performance impact than software-based mitigation techniques.

The Cascade Lake-SP processors will be followed by 14nm “Cooper Lake-SP” in 2019, and then 10nm “Ice Lake-SP” server processors in 2020. Intel’s continued struggles with 10nm manufacturing are definitely going to hurt their competitive position compared to AMD in 2019/2020. I expect AMD will gain significant market share from Intel in the server space during this period.

AMD will have the 2nd generation, 7nm “Zen2” EPYC “Rome” family processors in early-mid 2019. These processors are supposed to be socket compatible with existing server models and have up to 48 cores/ 96 threads per processor. These processors are also rumored to have PCIe 4.0 support, which will give them double the bandwidth per lane compared to PCIe 3.0.

Looking further out, AMD is planning on releasing the 3rd generation 7nm+ “Zen3” EPYC “Milan” family of processors sometime in 2020.

Storage Developments

Intel released its first 3D XPoint storage product, the 375GB Intel Optane DC P4800X SSD in Q1 2017. These use a PCIe 3.0 x4 link along with the NVMe protocol, and they have roughly 10X lower latency and 5-8X better throughput at low queue depths compared to the fastest PCIe NVMe NAND-based SSDs. They also have 2-4X better write endurance than enterprise NAND-based SSDs.

I have had great success using these DC P4800X drives for SQL Server instances with extremely heavy tempdb workloads. They are relatively inexpensive and offer the fastest currently available type of traditional block mode storage, bar none. They are transparent to SQL Server and will work in any system that supports PCIe 3.0 x4 slots as HHHL add-in cards or U.2 connected drives.

What is coming in the very near future is Intel 3D XPoint-based DIMMs (“Apache Pass”) that use a traditional low-latency DDR4 memory interface and form factor. These DIMMs will be available in 128GB, 256GB, and 512GB capacities, and they will fit in DDR4 memory slots. They will be addressable in a lower performance block mode that uses the entire storage stack, or a much higher performance direct access (DAX) mode that is byte addressable and bypasses the storage stack.

Both Windows 10 and Windows Server 2016 already have DAX support, and SQL Server 2016 SP1 has the persistent log buffer feature that lets you leverage a DAX storage volume built on persistent memory to create a small additional 20MB transaction log file, which is used to greatly reduce the latency of writing to the transaction log. My guess is that SQL Server vNext will improve this particular feature.
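Setting that up is essentially a one-statement change. In this sketch, D: is assumed to be a DAX-formatted volume, and the database name and file name are placeholders:

```sql
-- Add a small second log file on a DAX volume to enable the
-- persistent log buffer feature (SQL Server 2016 SP1 and later):
ALTER DATABASE [YourDatabase]
ADD LOG FILE (NAME = daxlog, FILENAME = 'D:\daxlog.pldf', SIZE = 20MB);
```

SQL Server detects that the file lives on a DAX volume and uses it as a byte-addressable log buffer, so commits no longer have to wait on a conventional log flush through the storage stack.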

Windows Server 2019 will have even better support for persistent memory. New two-socket servers with Intel Xeon “Cascade Lake-SP” processors will support up to 6TB of 3D XPoint DIMMs, which can be combined with traditional DDR4 memory in other memory slots.

Conclusion

Over the next few months, we are probably going to see a new release of SQL Server, along with the release of Windows Server 2019, new generations of server processors from both Intel (and a newly competitive AMD), and new high-performance layers in the memory and storage hierarchy.

This convergence of events makes it much easier to build a business and technical case for moving from a legacy version of SQL Server running on a legacy operating system on old, out of warranty hardware and slow storage, to a much better and more flexible data platform that is able to take advantage of all of these new developments to get better performance and scalability.

New Azure SQL Database Standard Tier Sizes https://sqlperformance.com/2017/12/azure/new-standard-tier-sizes https://sqlperformance.com/2017/12/azure/new-standard-tier-sizes#comments Mon, 04 Dec 2017 14:08:56 +0000 https://sqlperformance.com/?p=9132 Tim Radney of SQLskills talks about the various Standard tiers available in Azure SQL Database - what they get you, and how much they cost.

The post New Azure SQL Database Standard Tier Sizes appeared first on SQLPerformance.com.

Azure SQL Database currently has three service tiers to choose from for your workload: Basic, Standard, and Premium. Basic supports only one size of 5 DTUs. Premium starts at 125 DTUs and goes up to 4,000 DTUs. The Premium tier is the top tier; it is built for higher I/O workloads and provides lower latency per I/O, and an order of magnitude more IOPS per DTU, than the Standard tier.

Prior to August 2017, the Standard tier only supported DTU sizes between 15 and 100 DTUs. Currently available in preview are new performance levels and storage add-ons that offer price optimization benefits for CPU-intensive workloads. With those, the Standard tier now supports up to 3,000 DTUs.

At this point, you may be asking yourself what, exactly, is a DTU? A DTU is a Database Transaction Unit and is a mixture of CPU, memory, and data and transaction log I/O. (Andy Mallon, @AMtwo, recently addressed this in "What the heck is a DTU?") You can hit your DTU limit by maxing out CPU, memory, or I/O.

Previously, the Standard tier offered only four levels: 15, 30, 50, and 100 DTUs, with a database size limit of 250GB on standard disk. If you had a database larger than 250GB that did not need more than 100 DTUs for CPU, memory, or I/O, you were stuck paying a Premium price just for database size. With the new changes, you can now have up to a 1TB database in the Standard tier; you just have to pay for the extra storage. During the preview, storage is billed at $0.085/GB. Going from the included 250GB to 1TB adds 774GB of storage, at a cost of $65.79 per month (774 × $0.085).

The new Standard preview DTU sizes support 200, 400, 800, 1,600, and 3,000 DTU options. If you have a SQL Server database workload that is more CPU-bound than I/O, these Standard tier options have the potential to save you a lot of money; however, if your workload is I/O bound, the Premium tier is going to outperform the Standard tier.

I decided to try two different workloads to see how different the Standard and Premium tiers compared to each other. I wanted to create simple and reproducible test so that others can try to validate for themselves. For my first test, I wanted to generate a healthy mix of CPU and I/O. I was hoping that I would be pushing more CPU than I/O, and be able to show that the expanded Standard tier would outperform a Premium tier with the same DTU size. I didn’t exactly get the results I was hoping for.

To set up this demo, I created a table with two GUID columns and a varchar column populated from a GUID, inserted 1 million rows, and then updated the two GUID columns with new IDs. The sample code is below:

-- Test table: two GUID columns plus a varchar column, all defaulted from NEWID()
CREATE TABLE dbo.TestTable
(
  Table_id UNIQUEIDENTIFIER DEFAULT NEWID(),
  Customer_id UNIQUEIDENTIFIER DEFAULT NEWID(),
  Cust_Name VARCHAR(40) DEFAULT CAST(NEWID() AS VARCHAR(40))
);

SET NOCOUNT ON;
GO

-- Insert 1 million rows, one row per batch execution:
INSERT INTO dbo.TestTable DEFAULT VALUES;
GO 1000000

CREATE CLUSTERED INDEX [ClustTestTable] ON [dbo].[TestTable]
(
  [Table_id] ASC,
  [Customer_id] ASC
);

SET STATISTICS TIME ON;

-- Update both GUID key columns, generating heavy data and log I/O:
UPDATE TestTable
  SET Table_id = NEWID(), Customer_id = NEWID();

As I ran through the series of tests, performance steadily improved in the Standard tier until I got to the S12 option where, oddly, CPU and elapsed time increased. I ran the test multiple times and S12 was consistently 54 seconds. It is pretty clear from my first test that the Premium tier outperformed the Standard tier. For example, the S9 and P2 are closest in time, however the DTU size for the Standard tier is 1,600 compared to 250 for the P2. This test is more about I/O capabilities. The chart below shows the size, DTU level, cost, CPU time, elapsed time, and time in seconds for each test:

As the tests were being executed, I observed in the monitor dashboard that data I/O and log I/O percentage were the driving force behind the DTU percentage. The following chart was from a run against an S4 database:

I then decided to try another series of tests that would be more CPU-heavy. For that test I used the following script:

SET STATISTICS TIME ON;

-- CPU-heavy test: a four-way cross join of sys.objects produces a large
-- intermediate result while generating almost no data or log I/O
SELECT SUM(CONVERT(BIGINT, t1.object_id) 
         + CONVERT(BIGINT, t2.object_id) 
         + CONVERT(BIGINT, t3.object_id) 
         + CONVERT(BIGINT, t4.object_id))
  FROM sys.objects t1
  CROSS JOIN sys.objects t2
  CROSS JOIN sys.objects t3
  CROSS JOIN sys.objects t4;

What I observed in the monitor dashboard on this series of tests is that CPU percentage was the sole driver of DTU percentage. As I went through the series of tests in the Standard tier, the test seemed to plateau at roughly 27 seconds, and started at the S4 size. What struck me as odd is that an S4 at 200 DTU took 27 seconds to complete and the corresponding P2 at 250 DTU took 38 seconds; a P4 at 500 DTU was more comparable. If we look at the cost differential for this demo, an S4 during preview only cost $150.01, while a P4 cost $1,860; the S4 provides a cost savings of just over $1,700. Let’s imagine that a P2 at 250 DTUs performed like we had expected; a P2 costs $930 and would still cost $780 more than an S4.

The full results of all tests in the second demo are included in the following chart:

Unlike the first demo, this was 100% CPU-driven. I had tried to include one additional cross join, but the demo then took hours per session instead of minutes. For a future test I’ll try to come up with a few additional scenarios that push a more realistic OLTP workload; one that is higher CPU, and one that is more I/O bound, and then a decent blend of the two.

You can see from the graph below that, on this run against an S4 database, CPU spiked at nearly 50%, while DTU percentage matched exactly:

From the two different workloads that I tested, it is very evident that if you have any significant I/O workload, you will need the Premium tier, but if your workload is mostly CPU-bound without any significant I/O needs, the higher Standard tiers can provide you substantial savings over the Premium tier.

If you are considering a migration to an Azure SQL Database, the DTU calculator is a great place to start to get an idea of a starting point for sizing; however, at the time of writing, the DTU calculator doesn’t take into consideration the expanded Standard tier. What is great about the DTU calculator is that it will break out CPU, IOPs, and log utilization to let you know what the driving factor for the DTU level recommendation is. For example, I ran the last demo on a 4 vCPU, 4GB virtual machine, and the DTU calculator recommended a P2. When I chose to ‘view more details,’ I got the following messages.

Service Tier/Performance Level for CPU – Based solely on CPU utilization, we recommend you migrate your SQL Server workload to Premium – P2. This Service Tier/Performance Level should cover approximately 100.00 % of your CPU utilization.

Service Tier/Performance Level for Iops – Based solely on Iops utilization, we recommend you migrate your SQL Server workload to Basic. This Service Tier/Performance Level should cover approximately 89.92 % of your Iops utilization.

NOTE: There is approximately 10.08 % of your workload that falls into a higher Service Tier/Performance Level. After migrating your database to Azure, you should evaluate your database's performance using the guidance mentioned in the information section above.

Service Tier/Performance Level for Log – Based solely on Log utilization, we recommend you migrate your SQL Server workload to Basic. This Service Tier/Performance Level should cover approximately 100.00 % of your Log utilization.

Since I know this workload is heavily CPU-bound, if I can’t tune the workload to decrease the CPU requirement, I have up to 3,000 DTUs available in Standard tier. Rather than spend $930 per month for a P2 with 250 DTUs, an S4 with 200 DTUs at $150 per month (or an S6 with 400 DTUs at $300.02 per month) would be a much more economical option.

In conclusion, there are tools available to help you determine a good starting point for sizing your Azure SQL Database migrations; however, the absolute best method is to test your workload. Migrating a copy of your production database, capturing a production workload, and replaying that workload against the Azure SQL Database will give you a much better understanding of what DTU size you truly need.

The post New Azure SQL Database Standard Tier Sizes appeared first on SQLPerformance.com.

Intel Xeon Scalable Processors and SQL Server 2017
https://sqlperformance.com/2017/07/sql-performance/intel-xeon-sql-server-2017
Mon, 17 Jul 2017 09:00:38 +0000

Glenn Berry of SQLskills walks through the next iteration of Intel Xeon processors, and how they suit SQL Server 2017.

On July 11, 2017, Intel officially launched their new 14nm Intel Xeon Scalable Processor family (Skylake-SP) of server processors. This family replaces the previous generation of 14nm Intel Xeon E7 v4 and 14nm Intel Xeon E5 v4 (Broadwell-EX and Broadwell-EP) processors.

Branding and Pricing

Intel calls this overall family a new “converged platform” that is segmented into four distinct product lines called Intel Xeon Platinum, Intel Xeon Gold, Intel Xeon Silver, and Intel Xeon Bronze. For SQL Server usage, only the Platinum and Gold lines make sense if you are concerned about getting the best performance for each one of your SQL Server processor core licenses.

Unlike the previous generation of Xeon processors, the new Xeon Platinum processors have up to 28 physical cores and can be used in a server with any socket count, whether it is a two-socket, four-socket, or eight-socket machine. The Xeon Gold processors have up to 22 physical cores and can be used in two-socket or four-socket machines.

Intel is using a new product branding and numbering convention that is detailed in Figure 1. Unlike the new AMD EPYC processors, Intel is doing a lot of marketing- and profitability-driven product segmentation in this lineup, which makes it more difficult to understand and to pick the right processor for your workload.


Figure 1: Intel Xeon Scalable Processor Product Numbering Convention

One prime example of this harmful product segmentation is the “M” SKU processors (processor models with an M suffix), which support 1.5TB of memory per socket as opposed to 768GB of memory per socket for the non-“M” SKU models. Intel charges about a $3,000.00 premium (per processor) for that extra memory support, which is a pretty large increase, especially for the lower cost processors. The eight-core Intel Xeon Gold 6134 processor is $2,214.00 each, while the eight-core Intel Xeon Gold 6134M processor is $5,217.00 each. All of the other specifications (and performance) are identical for those two processors. For SQL Server usage, it may make perfect sense to pay that premium to get twice the memory capacity per socket (especially given your SQL Server license costs and overall memory costs), but I don’t like the fact that Intel is doing some price gouging here.

Architecture Changes

Skylake-SP has a different cache architecture, moving from the shared-distributed model used in Broadwell-EP/EX to a private-local model. How this change will affect SQL Server workloads remains to be seen.

In Broadwell-EP/EX, each physical core had a 256KB private L2 cache, while all of the cores shared a larger L3 cache that could be as large as 60MB (typically 2.5MB/core). All of the lines in the L2 cache for each core were also present in the inclusive, shared L3 cache.

In Skylake-SP, each physical core has a 1MB private L2 cache, while all of the cores share a larger L3 cache that can be as large as 38.5MB (typically 1.375MB/core). Unlike before, lines in each core’s L2 cache may not be present in the non-inclusive, shared L3 cache.

A larger L2 cache increases the hit ratio from the L2 cache, resulting in lower effective memory latency and lowered demand on the L3 cache and the mesh interconnect. L2 cache is typically about 4X faster than L3 cache in Skylake-SP. Figure 2 details the new cache architecture changes in Skylake-SP.
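The effect of a better L2 hit ratio on effective latency can be sketched as a simple weighted average. The hit rates and latencies below are purely illustrative assumptions (reflecting the ~4x L2/L3 gap mentioned above), not measured Skylake-SP numbers:

```python
def effective_latency(l2_hit_rate, l2_ns, l3_ns):
    """Weighted-average latency for loads served from L2 or, on a miss, L3."""
    return l2_hit_rate * l2_ns + (1 - l2_hit_rate) * l3_ns

# Hypothetical numbers only: a ~4x L2/L3 latency gap, and a hit rate
# that improves as the private L2 grows from 256KB to 1MB.
small_l2 = effective_latency(0.70, l2_ns=4.0, l3_ns=16.0)  # 256KB-style L2
large_l2 = effective_latency(0.85, l2_ns=4.0, l3_ns=16.0)  # 1MB-style L2
print(f"{small_l2:.1f} ns -> {large_l2:.1f} ns")  # 7.6 ns -> 5.8 ns
```

Even a modest bump in L2 hit rate lowers the average load latency noticeably, which is the "lowered demand on the L3 cache and the mesh interconnect" effect described above.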


Figure 2: Comparing Cache Architectures

Another new architectural improvement is Intel Ultra Path Interconnect (UPI), which replaces the previous generation Intel Quick Path Interconnect (QPI). Intel UPI is a coherent interconnect for systems containing multiple processors in a single shared address space. Intel Xeon processors that support Intel UPI provide either two or three Intel UPI links for connecting to other Intel Xeon processors using a high-speed, low-latency path to the other CPU sockets. Intel UPI has a maximum data rate of 10.4 GT/s (giga-transfers/second) compared to a maximum data rate of 9.6 GT/s for the Intel QPI used in the fastest Broadwell-EP/EX processors.
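As a rough sketch, a transfer rate converts to per-direction bandwidth by multiplying by the payload width. Assuming UPI moves the same 2 bytes per transfer per direction that QPI did (an assumption for this illustration):

```python
def per_direction_gb_s(transfer_rate_gt_s, bytes_per_transfer=2):
    """GB/s in each direction, assuming a 2-byte payload per transfer (as QPI)."""
    return transfer_rate_gt_s * bytes_per_transfer

qpi = per_direction_gb_s(9.6)    # Broadwell-EP/EX QPI
upi = per_direction_gb_s(10.4)   # Skylake-SP UPI
print(f"QPI: {qpi:.1f} GB/s, UPI: {upi:.1f} GB/s per direction")
```

Under that assumption, the 10.4 GT/s rate works out to roughly 20.8 GB/s per direction per link, versus roughly 19.2 GB/s for QPI.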

Another important new feature in Skylake-SP is the inclusion of AVX-512 support, which allows 512-bit wide vectors to be computed, greatly improving speed compared to older vector instruction sets like AVX2, AVX, and SSE. These instruction sets are typically used for things like compression and encryption.

AVX-512 also has much better power efficiency in terms of GFLOPS/Watt and GFLOPS/GHz compared to the older instruction sets. As a result, Intel does not have to reduce the clock speed of all of the cores when AVX code is running on any of them, and each core can run at a different speed depending on what type of AVX code is running on that core.

SQL Server Hardware Support

In SQL Server 2016, Microsoft introduced support for using the SSE and AVX instructions supported by the CPU to improve the performance of row bucketing in Columnstore indexes and of bulk inserts. They also added hardware support for AES-NI encryption. I wrote about how this new software support for specific hardware mapped to different processor generations here. Hopefully, Microsoft will extend this type of code to cover AVX-512 support in SQL Server 2017.

Another new feature in Skylake-SP is Intel Speed Shift support, which allows the processor cores to change their p-states and c-states much more effectively (which lets the processor cores “throttle up” much more quickly). This feature builds on the Hardware Power Management (HWPM) introduced in Broadwell with a new mode that allows HWPM and the operating system to work together, called native mode. Native mode is supported on Linux kernel 4.10 and in Windows Server 2016.

According to some of the early benchmarks I have seen, these Skylake-SP processors have about a 10% IPC improvement over Broadwell-EP cores running at the same clock speed. Software that takes advantage of specific new features (such as AVX-512 and FMA) could see much higher performance increases.

Regarding SQL Server 2017 OLTP workloads, on June 27, 2017, Lenovo submitted a TPC-E benchmark result for a Lenovo ThinkSystem SR650 two-socket server, with two 28-core Intel Xeon Platinum 8180 processors. The raw score for this system was 6,598.36. Dividing that score by 56 physical cores, we get a score/core of 117.83 (which is a measure of single-threaded CPU performance).

For comparison’s sake, Lenovo submitted a TPC-E benchmark result for a Lenovo System x3650 M5 two-socket server with two 22-core Intel Xeon E5-2699 v4 processors. The raw score for this system was 4,938.14. Dividing that score by 44 physical cores, we get a score/core of 112.23. The Skylake-SP system is about 5% faster for single-threaded performance here, but keep in mind that this is for a pre-release version of SQL Server 2017.
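The score/core arithmetic from the two submissions above can be reproduced in a few lines:

```python
# Raw tpsE scores and physical core counts from the two Lenovo TPC-E
# results discussed above.
def score_per_core(tps_e, cores):
    """Single-threaded performance proxy: raw benchmark score per physical core."""
    return tps_e / cores

skylake   = score_per_core(6598.36, 56)   # 2x Xeon Platinum 8180 system
broadwell = score_per_core(4938.14, 44)   # 2x Xeon E5-2699 v4 system
gain_pct = (skylake / broadwell - 1) * 100
print(f"{skylake:.2f} vs. {broadwell:.2f} score/core ({gain_pct:.1f}% faster)")
```

This confirms the roughly 5% single-threaded advantage for the Skylake-SP system, despite its higher core count.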

If you want an even more detailed view of the specific changes and improvements in the Intel Xeon Scalable Processor family compared to the previous generation Xeon processors, you can read about it here.

The post Intel Xeon Scalable Processors and SQL Server 2017 appeared first on SQLPerformance.com.
