Under the kover of business intelligence — https://sqlkover.com
BI, SQL Server and data, by Koen Verbeeck

SSRS and MDX: Think about Cell Properties (May 4, 2016)


I stumbled across a small MDX performance tuning trick when using SSRS reports (or any other client tool where you can edit the MDX). When you create the MDX query using the designer, it will append some cell properties at the end of the query. Basically it’s just metadata about the cells of the result set.

[Image: mdxtuning03]

An example of what those values can contain:

  • Value: 6360
  • Formatted value: € 6.360,00
  • Format string: \€ #,##0.00;\€ -#,##0.00

Some of them are quite useful, especially the FORMAT_STRING property as you can use it to force SSRS to display the formatting you configured in the cube. Be sure to check Jason Thomas’ excellent blog post on this: Using SSAS formatting in SSRS. FORMATTED_VALUE seems very useful, but since it’s a string value SSRS doesn’t know how to aggregate it, so using it in a tablix might result in an error.

However, you most likely won’t need most of those cell properties. Leaving out the ones you don’t need reduces the amount of data that has to be retrieved from the cube and sent to the SSRS report. Don’t expect a 10-minute query to suddenly run in under 5 seconds, but every little bit counts when making your reports as fast as possible.
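As an illustration of what this looks like in practice — cube, dimension and measure names below are just placeholders, and the first property list is what the SSRS query designer typically generates — the trimmed query simply keeps the two properties the report actually uses:

```mdx
-- As generated by the designer: all cell properties appended.
SELECT NON EMPTY { [Measures].[Sales Amount] } ON COLUMNS,
       NON EMPTY { [Date].[Calendar Year].[Calendar Year].ALLMEMBERS } ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE,
                FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

-- Trimmed: only the value itself and the format string survive.
SELECT NON EMPTY { [Measures].[Sales Amount] } ON COLUMNS,
       NON EMPTY { [Date].[Calendar Year].[Calendar Year].ALLMEMBERS } ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, FORMAT_STRING
```

Note that once you hand-edit the MDX like this, the designer will regenerate the property list if you switch back to design mode, so keep a copy of the trimmed query.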

I tested a simple query using MDX Studio, where you can easily wipe the cache of the cube.

[Image: mdxtuning04]

It also has a nice Perfmon pane showing you some performance metrics about the executed query.

Here’s the query on a cold cache, with all cell properties:

[Image: mdxtuning01_coldcache]

On a cold cache with only two cell properties:

[Image: mdxtuning02_coldcache]

Warm cache with all properties:

[Image: mdxtuning01_warmcache]

Warm cache with two properties:

[Image: mdxtuning02_warmcache]

On a cold cache we got a performance increase of about 14.9%; on a warm cache, about 16.5%. Quite a nice result for just deleting a couple of lines at the end of the MDX query. The more measures you include in your query, the bigger the impact of this trick. If you only have one single measure, you probably won’t notice much difference.

Loading Data into Clustered Columnstore Index with SSIS (April 21, 2016)


I’m using a clustered columnstore index (CCI) on one of my fact tables at a client. The table holds between 8 and 9 million rows and I was wondering how I could load data efficiently into this CCI using SSIS.

I stumbled upon this MSDN blog post: SQL Server 2016 SSIS Data Flow Buffer Auto Sizing capability benefits data loading on Clustered Columnstore tables (catchy title). It explains how you can set the buffer properties of the data flow to try to insert data directly into compressed row groups instead of into the delta store. They fail to achieve this using SSIS 2014, and then they explain how the new AutoAdjustBufferSize property of SSIS 2016 works miracles and everything is loaded directly into compressed row groups. Hint: you want to avoid loading data into the delta store, as it is row storage and you have to wait for the tuple mover to move the data into the CCI in the background.

However, it’s still possible to achieve the same using SSIS 2014 (or earlier). Niko Neugebauer (blog | twitter) shows this in his post Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”). It still depends on the estimated row size, but using these settings you should get better results:

  • Set the DefaultBufferMaxRows property of the data flow to 1,048,576, which is the maximum number of rows for a columnstore row group.
  • Set the DefaultBufferSize to 100MB (104857600 bytes), which is the maximum. If (estimated row size * max number of rows) is bigger than 100MB, you’ll end up with fewer rows in the row group.
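A quick sanity check on that limit: with the buffer pinned at its 100MB maximum, a full row group only fits in a single buffer if the estimated row size stays at or below 100 bytes.

```sql
-- 104,857,600 bytes / 1,048,576 rows = 100 bytes per row.
-- Wider rows mean the buffer fills up before the row group does,
-- so SSIS hands smaller batches to the destination.
SELECT 104857600 / 1048576 AS MaxEstimatedRowSizeInBytes;  -- returns 100
```

You can find the estimated row size of your data flow in the data flow properties window in SSDT, so it’s worth checking it against this threshold before tuning the other settings.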

[Image: CCILoading01]

Set these settings in the OLE DB Destination:

  • Fast Load, obviously
  • Rows Per Batch: 1,048,576. This will set the batch size property of the INSERT BULK command.
  • Maximum Insert Commit Size: 0. Although the default is around 2 billion, it can still force intermediate commits. In my case, I had a commit after 150,000 rows, so I ended up with row group sizes of 150,000 rows. They were directly compressed, but a tad too small. Setting this property to 0 forces a single commit at the end, so you finally get your data loaded directly into the CCI with a row group size of 1,048,576 rows.

[Image: CCILoading04]

While the SSIS package was running, you could see the different row groups that were not yet committed:

[Image: CCILoading02]
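If you want to watch this yourself on SQL Server 2014, a query along these lines against the row group catalog view should do the trick (the fact table name is just an example):

```sql
-- Inspect the state of the row groups while (and after) the load runs.
SELECT
     rg.partition_number
    ,rg.row_group_id
    ,rg.state_description   -- e.g. INVISIBLE / OPEN / CLOSED / COMPRESSED
    ,rg.total_rows
FROM sys.column_store_row_groups AS rg
WHERE rg.object_id = OBJECT_ID(N'dbo.FactSales')  -- hypothetical fact table
ORDER BY rg.partition_number, rg.row_group_id;
```

Row groups that went straight into compression show up as COMPRESSED with (ideally) 1,048,576 rows each; anything sitting in the delta store shows up as OPEN or CLOSED instead.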

For some reason one row group doesn’t want to play ball like the others. When the SSIS package was finished, all of the row groups were set to the Compressed state:

[Image: CCILoading03]

The package was still quite slow, but this was caused by the fact that I enforced a unique constraint on my CCI by using a materialized view (you have to wait until SQL Server 2016 to be able to put a rowstore index on top of a table with a CCI).

Fortunately I have an incremental load mechanism in place, so I don’t need to load 8 million rows every time.

Reblog: SSIS Performance Testing (February 12, 2016)


I had to do some performance testing for an MSSQLTips article and I thought I’d share the framework I used in a blog post.

First of all, we have to log the start and end dates of the package to a table so we can easily calculate how long a package took to finish. This could probably be calculated from the SSIS catalog as well, but I was a bit too busy (read: lazy) to find out how. Anyway, the logging table is created using the following statement:

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[PackageLogging]') AND type in (N'U'))
BEGIN
    CREATE TABLE [dbo].[PackageLogging](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [RunID] [int] NOT NULL,
        [PackageName] [varchar](50) NOT NULL,
        [StartDate] [datetime2](7) NOT NULL,
        [EndDate] [datetime2](7) NULL
    );
END
GO
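For what it’s worth, the SSIS catalog route I was too lazy to pursue could look roughly like this — a sketch against the catalog.executions view in SSISDB (the status filter value is taken from the documented status codes, where 7 means succeeded):

```sql
-- Alternative: read package durations straight from the SSIS catalog.
SELECT
     e.execution_id
    ,e.package_name
    ,[DurationInSeconds] = DATEDIFF(SECOND, e.start_time, e.end_time)
FROM SSISDB.catalog.executions AS e
WHERE e.status = 7  -- 7 = succeeded
ORDER BY e.execution_id;
```

The drawback is that the catalog doesn’t know about the RunID concept used below, which is why a small dedicated logging table is more convenient for this kind of testing.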

The RunID column is populated by a package parameter; I will come back to this later on.

[Image: TestSSISPerf_parameter]

The package starts with an Execute SQL Task to log the start. I use the following SSIS expression to construct the SQL statement on the fly, allowing you to easily copy-paste the logging task between packages.

"INSERT INTO dbo.PackageLogging(RunID, PackageName, StartDate) VALUES (" + (DT_STR,10,1252)@[$Package::RunID] + ",'" + @[System::PackageName] + "',SYSDATETIME());"

At the end of the control flow, there is an Execute SQL Task that updates the EndDate of the previously inserted row. The expression looks like this:

"UPDATE dbo.PackageLogging SET [EndDate] = SYSDATETIME() WHERE RunID = " + (DT_STR,10,1252)@[$Package::RunID] + " AND PackageName = '" + @[System::PackageName] + "';"

The RunID parameter is important to link those two Execute SQL Tasks together. A typical control flow looks like this:

[Image: TestSSISPerf_controllflow]

Logging is the first step, now we have to run the package of course. I created a stored procedure that allows me to easily start a package in the SSIS catalog.

CREATE PROC [dbo].[RunPackage]
    (@RunID         INT
    ,@PackageName   VARCHAR(50)
    ,@FolderName    VARCHAR(50)
    ,@ProjectName   VARCHAR(50)
    ,@Synchronized  BIT = 1 -- run synchronously by default
    )
AS
DECLARE @execution_id BIGINT;
 
EXEC [SSISDB].[catalog].[create_execution]
     @package_name      = @PackageName
    ,@execution_id      = @execution_id OUTPUT
    ,@folder_name       = @FolderName
    ,@project_name      = @ProjectName
    ,@use32bitruntime   = False
    ,@reference_id      = NULL;
 
--SELECT @execution_id;
 
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
     @execution_id
    ,@object_type       = 30
    ,@parameter_name    = N'RunID'
    ,@parameter_value   = @RunID;
 
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
     @execution_id
    ,@object_type       = 50
    ,@parameter_name    = N'SYNCHRONIZED'
    ,@parameter_value   = @Synchronized;
 
EXEC [SSISDB].[catalog].[start_execution] @execution_id;
 
GO

The proc passes the @RunID parameter to the package, as well as other usual suspects, such as the package name, folder name and project name. You can also choose if a package is run synchronously or asynchronously. When run synchronously, the stored procedure doesn’t finish until the package is finished as well.

Using this stored procedure, it is easy to run a package multiple times in a row using a WHILE loop.

DECLARE @RunID INT = 1;
 
WHILE (@RunID <= 10)
BEGIN
    EXEC dbo.RunPackage @RunID, 'myPackage.dtsx', 'myFolder', 'myProject', 1;
    SET @RunID += 1;
END

The package is run synchronously, so that multiple instances do not run at the same time. This eliminates resource contention and gives us a clearer result of the performance of the individual package.

Using the following query, it is easy to build a nice chart in SSRS:

SELECT
     [RunID]
    ,[PackageName]
    ,[Duration] = DATEDIFF(MILLISECOND,StartDate,EndDate) / 1000.0
    ,[Mean] = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY DATEDIFF(MILLISECOND,StartDate,EndDate) / 1000.0)
                    OVER (PARTITION BY PackageName) -- strictly speaking, PERCENTILE_CONT(0.5) returns the median
FROM [dbo].[PackageLogging]
ORDER BY ID;

The result:

[Image: TestSSISPerf_chart]

I used RunID as category, PackageName as the series and the Duration/Mean as the values. I created a custom palette where I forced the Duration measure and the Mean to have the same color.

Book Review – Expert Performance Indexing (February 2, 2016)


[Image: expertperformanceindexing]

A few weeks back, the folks at Apress Publishing sent me a copy of the book Expert Performance Indexing in SQL Server for me to review. Since I know both the authors and since I’m familiar with the spectacular work they do, I was happy to do so. That doesn’t mean I won’t do an honest review though. I can’t be bought! (Well, I can, but considerably more money should be involved 😉 ) Apparently the book is already in its second edition. I haven’t read the first edition, so I can’t compare the two.

About the authors

Jason Strate (blog | @StrateSQL) is a SQL Server MVP with tons of experience with databases and data warehouses (he did a legendary one-month run on SSIS a few years back). He’s a well-known speaker and an active member of the SQL Server community. Grant Fritchey (blog | @GFritchey), aka the Scary DBA, is a longtime SQL Server MVP and a vocal member of the SQL Server community. He’s an active participant in the forums of SQLServerCentral.com and speaks at various events (where our paths have crossed a few times).

If anybody could write a book about SQL Server indexing, it’s these two. You can tell from the book that they have years and years of technical experience, and they try to pass it on to the reader. Very successfully, I must add. Grant also has a book on SQL Server Query Performance Tuning, so this book is a logical next step. And Jason wrote the first edition, of course 🙂

General

I’m pleased to say this is an excellent book. I definitely learned a lot from it. The book goes into real technical detail – which is sometimes hard to follow if you’re not a hardcore DBA but a BI developer instead, like me – and it comes packed with a lot of scripts that you can immediately start using in your environment. The title starts with “Expert” and there’s a good reason for that. The introduction states this book is also for “novices in indexing”, but if this is really your very first book on indexing, you might feel a wee bit overwhelmed after a few chapters. If you do keep up with the pace, you’ll find that this book is an excellent resource for everything there is to know about indexing in SQL Server. Although a bit more info on columnstore indexes would have been nice.

The book is very well written. There are almost no typos or mistakes in the book (kudos to the reviewers and the editors). There were a few issues with hyperlinks to chapters in the PDF ebook, but nothing too serious. Where needed, there are plenty of screenshots to back up the technical explanations. The T-SQL scripts use the Microsoft sample databases (AdventureWorks and Contoso), so it’s easy to try them out on your own machine as well. You can read the book front-to-back like I did, or you can pick any chapter of interest and just start reading. Only at the end is there one chapter – Index Analysis – that depends on the previous chapter, Monitoring Indexes.

Chapter by Chapter

Let’s go over the various chapters:

  • Chapter 1 is a general introduction to all the various kinds of indexes in SQL Server. It uses a great analogy with books inside a library. This chapter should be mandatory reading for everyone working with SQL Server.
  • Chapter 2 talks about index storage fundamentals. The kid gloves are off; this chapter dives into the technical detail. It explains how data is stored inside pages and it demonstrates how you can use DBCC commands or dynamic management views/functions to take a peek into those pages.
  • Chapter 3 explains how you can query the metadata of indexes (for example physical and operational statistics) and it shows how SQL Server uses statistics for the indexes and how you can examine them. This chapter is important, as it lays the foundation for scripts used in later chapters.
  • Chapters 4, 5 and 6 are about XML, spatial and full-text indexes respectively. I skipped those chapters. The reason for this is because I’m pretty sure I’m never going to use them. If someday I would, I know I can read up on it using those chapters. But I couldn’t be bothered reading them right now. The same goes for chapter 7, which is about indexing in-memory OLTP tables. Not exactly a feature I’m going to use soon in my data warehouses, so I decided to skip that chapter as well.
  • Chapter 8 is a great chapter. It takes a few common myths about indexes and debunks them. An example is “Clustered indexes store records in physical order”. If you thought they did, it’s time you read this chapter 🙂 I guess this chapter is the result of a few great slide decks and presentations the authors did over the years. This chapter should also be mandatory reading. Give it to everyone starting with SQL Server. The authors also present a few indexing best practices. They are careful enough to mention that best practices are not carved in stone and that they should be taken with a grain of salt.
  • Index maintenance is the topic of chapter 9, which is mainly about index fragmentation and statistics maintenance. You’ll learn how to find issues and how to resolve them, either with maintenance plans or with custom T-SQL scripts. A great resource for DBAs.
  • Chapter 10 talks about indexing tools, such as the missing index DMOs. The bigger part of the chapter however is used in an attempt to salvage some of the reputation of the Database Tuning Advisor. Lots of screenshots here and even an explanation on how to use the command line interface.
  • Chapter 11 is a very interesting one. It goes over the four big types of indexes (heaps, non-clustered, clustered and columnstore) and presents different scenarios and use cases where you would like to implement them. Row and page compression are discussed as well. A very good chapter and one I will reread soon because there’s a lot of useful information there. There’s only one issue in this chapter: one clustered index pattern talks about “surrogate keys”. The definition is given as “A surrogate key is a column in the data that can uniquely identify one row from all the other.”, which is incorrect in my opinion. A surrogate key is a meaningless key, separate from the source system and typically used in data warehouses. A better term would have been natural key, candidate key or business key. As it stands, it can lead to a bit of confusion for readers familiar with the data warehouse definition of a surrogate key. I’ve reached out to the authors and they acknowledge the use of an incorrect definition, so this will probably get fixed if there’s a third edition 🙂
  • Chapter 12 is another great one. It explains that it still matters how you write your queries. Because even with world-class indexes in place, nothing can shield SQL Server from stupidly written queries. This chapters teaches you how to make sure your queries still use the correct indexes.
  • Chapter 13 is probably the most useful for DBAs. It provides you with a full framework for monitoring indexes. It comes with scripts that read performance counters, dynamic management objects and event traces and store all of that information in a monitoring database for further analysis. Lots of useful scripts here, if you want to know what is going on with your indexes.
  • When you have all the monitoring information on your indexes, chapter 14 tells you how you can analyze that information to take action. This chapter reminded me a bit of sp_blitzindex.
  • The last chapter is a very short one and the only one that is not technical. It talks about indexing methodology. I know it’s important, but the book wouldn’t be worse if it was left out. For the interested readers.

Conclusion

This is an excellent book about indexing in SQL Server. It covers just about everything there is to know in great technical detail. Which is one of the “downsides” of the book: it’s not exactly for beginners. But hey, the title says “Expert Performance Indexing”, not “Indexing for Dummies”, so you could consider the technical detail one of the book’s biggest strengths. The book is a bit more geared towards DBAs, but I guess that’s fair since they deal with indexes the most in their daily activities. There are a lot of scripts in the book which you can immediately use in your environment to monitor, analyze or learn more about the indexes in your databases. My favorite chapters are chapter 8 (myths and best practices), chapter 11 (indexing scenarios) and chapter 12 (how not to mess up your queries).

My only criticism: there isn’t a chapter about columnstore indexes. I feel there can be a lot more said about those indexes. And if XML indexes get a separate chapter, so should they. Maybe in a third edition?

I recommend this book to everyone who wants to get a better understanding of indexing in SQL Server and who wants to take their expertise to the next level.

Performance Tuning with sp_BlitzIndex (December 7, 2015)


Just a small blog post about how I used sp_BlitzIndex for the first time and how awesome that was. sp_BlitzIndex is one of the many free scripts you can get from the great team at Brent Ozar Unlimited.

Recently I went to a client for a one-day data warehouse performance tuning exercise. Because you only have one day, it’s important to quickly find the pain points of the system. I remembered seeing a webinar in which Brent explained the sp_Blitz scripts, so I decided to bring them with me. I couldn’t have made a better choice.

There was a serious indexing problem at the client. They had heard “indexes make reads go faster”, so they slapped a lot of indexes on most of the tables. None of them clustered. I ran the script with its default settings and I quickly got a list of all the problems it could find with the indexes on the data warehouse.
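Running the script really is as simple as pointing it at a database. A typical call — the database name here is just a placeholder, and the parameters are as documented by Brent Ozar Unlimited — looks like this:

```sql
-- Default diagnosis mode for one database:
EXEC dbo.sp_BlitzIndex @DatabaseName = N'MyDataWarehouse';

-- Mode 2 returns the full index inventory with usage statistics:
EXEC dbo.sp_BlitzIndex @DatabaseName = N'MyDataWarehouse', @Mode = 2;
```

The default mode gives you the prioritized list of problems; the inventory mode is handy when you want to dig into a specific table afterwards.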

It gave me an overview of the following items, all of them immediately actionable:

  • duplicate indexes. Remove the offenders immediately.
  • near-duplicate indexes. If, for example, one index has key columns (A,B,C) and another has (A,B), the second one is redundant and can usually be deleted.
  • Heaps. Quite a long list, but the script also has a section on which tables are accessed the most. This allowed us to focus on the more important heaps in the data warehouse.
  • the so-called work-a-holics: indexes which were used a lot. I focused on making these indexes more efficient: could I make a filtered index out of it? Or maybe add some included columns?
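If you ever need to hunt for exact duplicates by hand, a rough sketch of the idea behind that first check — comparing the key column lists of indexes on the same table, ignoring included columns and sort order for simplicity — could look like this:

```sql
-- Find indexes on the same table whose key column lists are identical.
;WITH IndexKeys AS
(
    SELECT
         i.object_id
        ,i.index_id
        ,i.name
        ,KeyColumns = STUFF((
            SELECT ',' + c.name
            FROM sys.index_columns AS ic
            JOIN sys.columns AS c
              ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
              AND ic.index_id  = i.index_id
              AND ic.is_included_column = 0
            ORDER BY ic.key_ordinal
            FOR XML PATH('')), 1, 1, '')
    FROM sys.indexes AS i
    WHERE i.index_id > 0  -- skip heaps
)
SELECT
     TableName      = OBJECT_NAME(a.object_id)
    ,FirstIndex     = a.name
    ,DuplicateIndex = b.name
    ,a.KeyColumns
FROM IndexKeys AS a
JOIN IndexKeys AS b
  ON a.object_id  = b.object_id
 AND a.KeyColumns = b.KeyColumns
 AND a.index_id   < b.index_id;
```

sp_BlitzIndex does all of this (and much more) for you, of course — this is only to show there’s no magic involved.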

Other topics were listed as well, but these were the main ones I focused on.

What’s great is that this script also provides you with the URLs to knowledge articles on the Brent Ozar website. If you don’t understand one of the results, you can immediately look it up and read about it.

By focusing on the results of sp_BlitzIndex script, I could boost performance in just a few hours of work. This near real-time data warehouse is the source for a reporting application used by dozens of people in the field, and you could immediately tell it worked a lot faster. Awesomesauce.

[Image: sp_blitzindex]

Disclaimer: I was honestly really impressed with the results. I did not get paid by Brent for this blog post. 🙂
