<![CDATA[Stefán J. Sigurðarson]]>https://stebet.net/https://stebet.net/favicon.pngStefán J. Sigurðarsonhttps://stebet.net/Ghost 6.10Wed, 18 Mar 2026 19:01:09 GMT60<![CDATA[Autoscaling Azure SQL HyperScale for better cost management]]>https://stebet.net/autoscaling-azure-sql-hyperscale-for-better-cost-management/67bf1a2993ab5b0001e3a3a3Wed, 19 Mar 2025 23:44:40 GMTTroy Hunt has written before about how Have I Been Pwned runs in the cloud, but I wanted to do a little more of a deep dive into how we try to manage our Azure SQL HyperScale cloud costs. We use Azure SQL HyperScale to store all the breach data for Have I Been Pwned, and it currently sits at just over 1 terabyte of data, which is constantly being queried by users through the website or the API.

One of the things we saw early on with Azure SQL HyperScale is that while we're loading data breaches we often need to scale the DB up, and then scale it back down once the data has been loaded. We can scale down because the Have I Been Pwned data is very static and only really changes when we load new data breaches, so we cache things extensively on the Cloudflare edge using their Workers.

Now, you might be wondering why we don't just leave the core count for Azure SQL up there, since it's just a max setting and it should be cheaper if we aren't using all the cores anyway. That's what the docs say, right?

Screenshot from the Azure SQL pricing Overview page

We figured we could just as well keep the core count up, and since we weren't using all of the cores, we'd only be billed for the ones actually in use.

Apparently we were wrong! We couldn't figure out why the bills weren't going down, so we started digging deeper.

We were seeing utilization for a 40 vCore instance in the low 10-20% range, so we expected to be billed for some 4-8 cores, but we were constantly being billed for all 40 cores. How could that be? That's when I started really looking into the billing for HyperScale, and my eyes stopped at this line:

Amount billed: vCore unit price * maximum (minimum vCores, vCores used, minimum memory GB * 1/3, memory GB used * 1/3)

After my brain had processed that single line for a bit, it dawned on us...

Slack messages between me and Troy

Well ain't that sneaky! For a service like ours, where data is always being queried, and since SQL Server by default uses all the memory it can get to cache data, the maximum of those values is ALWAYS going to be "(memory GB used * 1/3)". This was confirmed when we looked at the Memory % Used graphs, which pretty much constantly look like this:

Screenshot from the Azure Portal for the Have I Been Pwned Azure SQL DB Instance

Since our memory utilization is pretty much always maxed, we are always billed for the max vCores regardless of our actual vCore usage.

Again...

If your HyperScale instance is querying a lot of data, you will be billed for the max vCores it is configured for, regardless of your actual vCore usage, because SQL Server's cache will, by design, consume all the available memory.
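To make the billing formula concrete, here is a quick worked example. The numbers are hypothetical, but the shape matches what we were seeing:

```sql
-- Billing formula from the docs:
--   vCore unit price * maximum(min vCores, vCores used,
--                              min memory GB * 1/3, memory GB used * 1/3)
-- Hypothetical 40 vCore instance: CPU is mostly idle, but SQL Server's
-- cache has filled ~120 GB of memory.
DECLARE @minVCores    FLOAT = 8,   -- minimum vCores setting
        @vCoresUsed   FLOAT = 6,   -- actual CPU usage (~15% of 40 cores)
        @minMemoryGB  FLOAT = 24,
        @memoryGBUsed FLOAT = 120; -- the cache keeps this near the max

SELECT BilledVCores = MAX(v)
FROM (VALUES (@minVCores), (@vCoresUsed),
             (@minMemoryGB / 3), (@memoryGBUsed / 3)) AS t(v);
-- memory GB used * 1/3 = 40, so you're billed for all 40 vCores
-- even though only ~6 of them are doing any work.
```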

OK, now that we'd made that discovery, how could we manage the max vCore configuration? Azure SQL does not have anything like Azure App Service autoscaling to scale core counts up and down based on load. That's when I found out that you can scale Azure SQL HyperScale with a SQL command!

-- Set the HyperScale DB to 8 vCores max
ALTER DATABASE <database_name> MODIFY (SERVICE_OBJECTIVE = 'HS_S_Gen5_8');
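You can also check which service objective is currently active from SQL; the stored procedure below uses this same call:

```sql
-- Returns the current service objective, e.g. 'HS_S_Gen5_8'
SELECT DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') AS CurrentServiceObjective;
```

Note that the ALTER DATABASE scale operation completes asynchronously, which is part of why the procedure below locks itself out for a while after issuing one.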

OK, but how do we know when it's safe for us to scale things up and down? Easy! You can also query performance counters for the SQL instance. We decided to write a few procs to query this data and, depending on the state of the DB, determine whether we should scale the database up or down, or do nothing. This is the script we came up with:

CREATE PROCEDURE [dbo].[ScaleDatabase]
  @CoreCount              INT   = NULL, @CpuScaleUpThreshold FLOAT = 60, @CpuScaleDownThreshold FLOAT = 20,
  @WorkerScaleUpThreshold FLOAT = 10, @WorkerScaleDownThreshold FLOAT = 5, @MinutesToEvaluate INT = 5
AS
BEGIN
  -- Let's first check if we are allowed to scale
  DECLARE @IsSet BIT
  EXEC dbo.GetHibpFlag 'ScaleLocked', @IsSet OUTPUT
  IF @IsSet = 1
  BEGIN
    EXEC dbo.LogScalingEvent @Message = 'Scaling is locked. Not doing anything this time.'
    RETURN
  END

  -- Get the current workload
  DECLARE @currentWorkload NVARCHAR(20) = CAST(DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') AS NVARCHAR),
    @nextWorkload          NVARCHAR(20) = NULL, @logMessage NVARCHAR(1000) = NULL

  PRINT 'Current workload: ' + @currentWorkload
  DECLARE @ScaleCommand NVARCHAR(1000)

  -- If the CoreCount is not provided, we will use automatic scaling
  IF @CoreCount IS NULL
  BEGIN
    -- Automatic scaling
    DECLARE @StartDate DATETIME = DATEADD(MINUTE, -@MinutesToEvaluate, GETUTCDATE())
    DECLARE @StartTime DATETIME, @EndTime DATETIME, @AvgCpuPercent FLOAT, @MaxWorkerPercent FLOAT, @ScaleUp BIT,
      @ScaleDown       BIT, @DataPoints INT;

    WITH DbStats (TimeStamp, AvgCpuPercent, MaxWorkerPercent, ScaleUp, ScaleDown)
    AS
    (
      SELECT [TimeStamp] = [end_time], AvgCpuPercent = [avg_cpu_percent], MaxWorkerPercent = [max_worker_percent],
        -- If the CPU OR Worker percentage exceeds the scale-up threshold, flag this data point for scale-up
        ScaleUp = IIF([avg_cpu_percent] > @CpuScaleUpThreshold OR [max_worker_percent] > @WorkerScaleUpThreshold,
                  1.0,
                  0.0),
        -- If the CPU AND Worker percentages are below the scale-down threshold, flag this data point for scale-down
        ScaleDown = IIF([avg_cpu_percent] < @CpuScaleDownThreshold AND [max_worker_percent] < @WorkerScaleDownThreshold,
                    1.0,
                    0.0)
        FROM sys.dm_db_resource_stats s
    )
    SELECT @StartTime = MIN(TimeStamp), @EndTime = MAX(TimeStamp), @AvgCpuPercent = AVG(AvgCpuPercent),
      @MaxWorkerPercent = MAX(MaxWorkerPercent), @ScaleUp = CONVERT(BIT, ROUND(MAX(ScaleUp), 0)),
      @ScaleDown = CONVERT(BIT, ROUND(MIN(ScaleDown), 0)), @DataPoints = COUNT(*)
      FROM DbStats
     WHERE TimeStamp BETWEEN @StartDate AND GETUTCDATE()

    DECLARE @EndDate DATETIME = DATEADD(MINUTE, 10, GETUTCDATE())
    IF @ScaleUp = 1
    BEGIN
      EXEC dbo.GetNextScaleWorkload 1, @nextWorkload OUTPUT
      IF @nextWorkload IS NULL
      BEGIN
        SELECT @logMessage = N'Already at highest scale. Not doing anything this time.'
        EXEC dbo.LogScalingEvent @Message = @logMessage
      END
      ELSE
      BEGIN
        SELECT @logMessage = CONCAT(
                               'Scaling up from ',
                               @currentWorkload,
                               ' to ',
                               @nextWorkload,
                               ' due to high CPU or worker utilization (CPU: ',
                               FORMAT(@AvgCpuPercent, 'N2'),
                               '%, Workers: ',
                               FORMAT(@MaxWorkerPercent, 'N2'),
                               '%)')
        SELECT @ScaleCommand = CONCAT(
                                 'ALTER DATABASE ', DB_NAME(), ' MODIFY (SERVICE_OBJECTIVE = ''', @nextWorkload, ''')')
        EXEC dbo.SetHibpFlag 'ScaleLocked', 1, @EndDate
        EXEC sp_executesql @ScaleCommand
        EXEC dbo.LogScalingEvent @Message = @logMessage
      END
    END
    ELSE IF @ScaleDown = 1
    BEGIN
      EXEC dbo.GetNextScaleWorkload 0, @nextWorkload OUTPUT
      IF @nextWorkload IS NULL
      BEGIN
        SELECT @logMessage = N'Already at lowest scale. Not doing anything this time.'
        EXEC dbo.LogScalingEvent @Message = @logMessage
      END
      ELSE
      BEGIN
        SELECT @logMessage = CONCAT(
                               'Scaling down from ',
                               @currentWorkload,
                               ' to ',
                               @nextWorkload,
                               ' due to low utilization (CPU: ',
                               FORMAT(@AvgCpuPercent, 'N2'),
                               '%, Workers: ',
                               FORMAT(@MaxWorkerPercent, 'N2'),
                               '%)')
        SELECT @ScaleCommand = CONCAT(
                                 'ALTER DATABASE ', DB_NAME(), ' MODIFY (SERVICE_OBJECTIVE = ''', @nextWorkload, ''')')
        EXEC dbo.SetHibpFlag 'ScaleLocked', 1, @EndDate
        EXEC sp_executesql @ScaleCommand
        EXEC dbo.LogScalingEvent @Message = @logMessage
      END
    END
    ELSE
    BEGIN
      SELECT @logMessage = CONCAT(
                             'No scaling action taken. CPU: ',
                             FORMAT(@AvgCpuPercent, 'N2'),
                             '%, Worker: ',
                             FORMAT(@MaxWorkerPercent, 'N2'),
                             '%. Current workload: ',
                             @currentWorkload)
      EXEC dbo.LogScalingEvent @Message = @logMessage
    END
  END
  ELSE
  BEGIN
    -- Manual scaling
    SELECT @nextWorkload = Workload
      FROM dbo.ScaleWorkloads
     WHERE Cores = @CoreCount

    IF @nextWorkload IS NULL
    BEGIN
      EXEC dbo.LogScalingEvent @Message = 'Invalid core count provided. No action taken.'
      RETURN
    END
    ELSE
    BEGIN
      SELECT @logMessage = CONCAT('Manually scaling from ', @currentWorkload, ' to ', @nextWorkload)
      SELECT @ScaleCommand = CONCAT(
                               'ALTER DATABASE ', DB_NAME(), ' MODIFY (SERVICE_OBJECTIVE = ''', @nextWorkload, ''')')
      EXEC sp_executesql @ScaleCommand
      EXEC dbo.LogScalingEvent @Message = @logMessage
    END
  END
END
GO

What this script does is look at the performance data for a given time period and, depending on the provided thresholds, scale the HyperScale database up or down. It also gives the DB time to stabilize after scaling operations by locking further scaling for roughly 10 minutes after each scaling event. All of this is configurable via parameters, so we can iterate toward our optimal configuration.

There was one more problem, however: Azure SQL HyperScale does not support SQL Agent jobs, so there is no built-in way to run this automatically on a schedule. Fortunately, Azure provides a service called Elastic Jobs which can run SQL statements against SQL databases on a schedule. Perfect!
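For reference, setting up such a job looks roughly like the following, run against the Elastic Jobs job database. All the names here, and the one-minute schedule, are illustrative, and the exact parameters may differ from your environment, so treat this as a sketch and check the Elastic Jobs documentation:

```sql
-- Run in the Elastic Jobs *job database*. Server, database and job
-- names below are hypothetical placeholders.
EXEC jobs.sp_add_target_group @target_group_name = 'ScaleTargets';
EXEC jobs.sp_add_target_group_member
     @target_group_name = 'ScaleTargets',
     @target_type       = 'SqlDatabase',
     @server_name       = 'myserver.database.windows.net',
     @database_name     = 'MyHyperScaleDb';

-- Create a job that runs every minute
EXEC jobs.sp_add_job
     @job_name                = 'ScaleDatabaseJob',
     @description             = 'Evaluate and scale the HyperScale DB',
     @enabled                 = 1,
     @schedule_interval_type  = 'Minutes',
     @schedule_interval_count = 1;

-- The job step simply calls the stored procedure above
EXEC jobs.sp_add_jobstep
     @job_name          = 'ScaleDatabaseJob',
     @command           = N'EXEC dbo.ScaleDatabase;',
     @target_group_name = 'ScaleTargets';
```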

So we created an elastic job to run the stored procedure every minute. Here is an example of what the logging output looks like:

Timestamp Message
2025-03-19 07:13:07.413 No scaling action taken. CPU: 24.17%, Worker: 1.55%. Current workload: HS_S_Gen5_18
2025-03-19 07:12:07.640 No scaling action taken. CPU: 24.32%, Worker: 3.11%. Current workload: HS_S_Gen5_18
2025-03-19 07:11:07.290 Scaling is locked. Not doing anything this time.
2025-03-19 07:10:07.270 Scaling is locked. Not doing anything this time.
2025-03-19 07:09:07.350 Scaling is locked. Not doing anything this time.
2025-03-19 07:08:07.377 Scaling is locked. Not doing anything this time.
2025-03-19 07:07:07.427 Scaling is locked. Not doing anything this time.
2025-03-19 07:06:07.500 Scaling is locked. Not doing anything this time.
2025-03-19 07:05:07.090 Scaling is locked. Not doing anything this time.
2025-03-19 07:04:07.150 Scaling is locked. Not doing anything this time.
2025-03-19 07:03:17.707 Scaling is locked. Not doing anything this time.
2025-03-19 07:02:07.127 Scaling down from HS_S_Gen5_20 to HS_S_Gen5_18 due to low utilization (CPU: 17.89%, Workers: 1.26%)
2025-03-19 07:01:06.770 No scaling action taken. CPU: 17.95%, Worker: 1.26%. Current workload: HS_S_Gen5_20
2025-03-19 07:00:06.717 No scaling action taken. CPU: 17.92%, Worker: 1.26%. Current workload: HS_S_Gen5_20

Here we see the procedure decide to automatically scale the max vCores down from 20 to 18 due to low CPU and worker utilization. Success!

This has made our lives a lot easier, although I do wish that Azure SQL HyperScale had a way to do this automatically.

Hope this helps!

]]>
<![CDATA[Fast ManualResetEventSlim checks when waiting is rarely needed]]>https://stebet.net/fast-manualreseteventslim-checks-when-waiting-is-rarely-required/602e47cb045fa600015223e1Mon, 22 Feb 2021 09:19:19 GMTI was looking at some code optimizations for the RabbitMQ .NET client and noticed that in one place on the hot path the Wait() method was being called on a ManualResetEventSlim which was already set most of the time, and thought that might be overkill.

Since the event was already set most of the time, I decided to check whether things could be made faster. I thought of two options: first, calling Wait(0), which attempts the wait with a timeout of zero, and second, simply checking the IsSet property first.

I wrote the following code using BenchmarkDotNet:

public class ManualResetEventSlimChecks
{
    private ManualResetEventSlim _set = new ManualResetEventSlim(true);

    [Benchmark]
    public void CheckSetBit()
    {
        if (!_set.IsSet)
        {
            _set.Wait();
        }
    }

    [Benchmark]
    public void WaitZero()
    {
        if (!_set.Wait(0))
        {
            _set.Wait();
        }
    }

    [Benchmark(Baseline = true)]
    public void Wait()
    {
        _set.Wait();
    }
}

This yielded the following results:


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.21313
Intel Core i7-10700 CPU 2.90GHz, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
  Job-HJNSVS : .NET Framework 4.8 (4.8.4311.0), X64 RyuJIT
  Job-FBCCVP : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT


Method Runtime Mean Error StdDev Ratio Code Size Gen 0 Gen 1 Gen 2 Allocated
CheckSetBit .NET 4.8 0.7993 ns 0.0060 ns 0.0050 ns 0.09 1378 B - - - -
WaitZero .NET 4.8 9.0637 ns 0.0317 ns 0.0296 ns 0.97 1364 B - - - -
Wait .NET 4.8 9.3205 ns 0.0463 ns 0.0434 ns 1.00 1338 B - - - -
CheckSetBit .NET Core 3.1 1.2812 ns 0.0023 ns 0.0019 ns 0.29 65 B - - - -
WaitZero .NET Core 3.1 4.3391 ns 0.0108 ns 0.0096 ns 0.98 1300 B - - - -
Wait .NET Core 3.1 4.4167 ns 0.0163 ns 0.0153 ns 1.00 1274 B - - - -

The lesson here: if you have a ManualResetEventSlim that will be set most of the time and it is being checked in a high-throughput scenario, check the IsSet property first before deciding to call Wait. Even though Wait does the IsSet check itself, the call into Wait is more expensive than simply checking the IsSet bit first.

Hope this helps!

]]>
<![CDATA[Using HaveIBeenPwned, Application Insights and Grafana to detect credential stuffing attacks.]]>https://stebet.net/using-hibp-appinsights-grafana-to-detect-credential-stuffing/5f51f515a684120001c40d4aFri, 04 Sep 2020 09:00:00 GMT

Hi friends. Today I want to share with you a method we use to detect and react to credential stuffing attacks in real-time using the PwnedPasswords API, Application Insights and Grafana.

This all started a few days ago when I noticed, not for the first time, an unusual number of failed login attempts as well as an unusual number of successful logins using pwned passwords. We monitor users logging in with pwned passwords so we can properly notify them to change to a more secure password, but we also monitor a few other metrics, and we post all of them to Application Insights, where we use the Grafana Azure Monitor data source to create dashboards out of these metrics so we can spot anomalies as they occur. Looking at our Grafana dashboards, the credential stuffing attack looked something like this:

Failed login attempts, categorized by type, for example 'username unknown', 'incorrect password' etc.
2FA challenges issued

Obviously, something was off. If you look at the edges of the chart you can see our "normal" levels of failed login attempts, and here we saw a huge spike over a short period. We also saw an increase in issued 2FA challenges, which stems from an unusual number of successful logins on accounts using a pwned password: when we detect a login from an unknown machine, we issue a verification code challenge via email, much like Valve's Steam Guard.

I was looking for ways to detect this reliably without normal EVE server restarts or downtimes polluting the data too much. That's where seasonality kicks in, since our downtimes are always at the same time, 11:00 UTC. After looking around a bit I found an awesome Kusto (the query language used by Application Insights) function called series_decompose_anomalies. You can read up on what it does by clicking the link, but basically it looks at historical data, maps out trends and forecasts, and then scores the actual values against the forecast, giving you an indicator of when you might be looking at anomalous values.

I needed to combine these metrics into one metric, which I called 'high-risk' activity. It consists of, for example, the ratio of pwned passwords among successful logins, the number of failed login attempts, 2FA challenges issued, and more. Using the Application Insights anomaly detection query over this metric, and doing a little fine-tuning, gave me a chart looking like this for the same period:

Our high-risk activity 'index'.
customMetrics
| where (name == "Password.Breached" or (name == "User.Login" and customDimensions["Result"] != "Success") or name == "User.2FAChallenge"
) and $__timeFilter()
| summarize ['Count'] = sum(valueCount) by bin(timestamp, $__interval)
| order by timestamp asc

Simplified version of the Application Insights query for our high-risk index.

Then I added a second query to the Grafana chart, using the anomaly detection, which gives us a score of either 0 (everything is fine) or 1 (anomaly detected). Adding that to the chart gave me this:

Our high-risk activity 'index' with the added 'anomaly' detection line
let highRiskMetrics = materialize(
customMetrics
| where name == "Password.Breached" or (name == "User.Login" and customDimensions["Result"] != "Success") or name == "User.2FAChallenge"
// summarize our high-risk metric on a 1 minute interval
| make-series valueSum = sum(valueCount) on timestamp step 1m
// detect anomalies over our sum value, using a k-value of 4, over buckets amounting to one day, skipping trend analysis
| extend (anomalies, score, baseline) = series_decompose_anomalies(valueSum, 4, toint(1d/1m), 'none')
// expand our new anomalies series
| mv-expand timestamp, anomalies
// project the data for filtering in Grafana
| project timestamp = todatetime(timestamp), anomalies = toint(anomalies));

highRiskMetrics
| where $__timeFilter()
| summarize Anomaly = max(anomalies) by bin(timestamp, $__interval)
| order by timestamp asc

The Application Insights query for the anomaly detection

What this query does is take our high-risk metrics and detect trends and baselines on a one-minute basis, with a seasonality of 24 hours.

Putting this all together, we could then create a Grafana alert which triggers if the anomaly flag stays raised for a given amount of time, giving us near-real-time notifications so we can react as soon as a credential stuffing attack starts.

This is still a work in progress, but initial tests are extremely promising, with few false positives. Looking at historical data, we have even been able to spot smaller credential stuffing attacks that we previously missed, so this is most certainly something we'll keep around and iterate on.

]]>
<![CDATA[Real world example of reducing allocations using Span<T> and Memory<T>]]>https://stebet.net/real-world-example-of-reducing-allocations-using-span-t-and-memory-t/5f32bea0c6bcca0001278d2fFri, 01 May 2020 22:57:26 GMTI have been planning to write this post for a few months now but always seem to have found something new to add. This all started when I decided to look at the RabbitMQ .NET Client source code to see if I could make improvements to the asynchronous logic and add support for System.IO.Pipelines. During that exploration I saw some opportunities for improvements to memory allocations. This was quite a learning project for me, as I had to relearn a lot of things I'd learned before and pick up plenty of new ones as well.

Brief introduction to the AMQP 0.9.1 Protocol

RabbitMQ uses the AMQP 0.9.1 protocol by default. In overly broad terms, every AMQP operation is defined as a command. Before transmission, commands are split into frames of a specified maximum size (often dictated by the server), and these frames are then serialized to bytes for transmission on the wire when the client and server communicate. AMQP uses network byte order (big-endian) representation for its data types, so you have to be careful when serializing them on little-endian systems, such as x86/x64 machines running Windows, to avoid corrupting data and to keep the client interoperable with other operating systems. There are certain nuances to the protocol, and I am oversimplifying things here, but this is the gist of how AMQP works.

Binary Serialization

To send data to and receive it from RabbitMQ, we need to be able to translate it to and from a binary format. This is called serialization, and as mentioned above, AMQP uses network byte order.

Previously the library used a class called NetworkBinaryReader which inherited from BinaryReader and overrode certain methods. BinaryReader is a very old class in the .NET BCL and does not take endianness into account, and the library also assumed that it was running on little-endian systems so quite a lot of overhead went into serializing doubles and floats.

Here is an example of how NetworkBinaryReader previously implemented the ReadDouble method:

public override double ReadDouble()
{
    byte[] bytes = ReadBytes(8);
    byte temp = bytes[0];
    bytes[0] = bytes[7];
    bytes[7] = temp;
    temp = bytes[1];
    bytes[1] = bytes[6];
    bytes[6] = temp;
    temp = bytes[2];
    bytes[2] = bytes[5];
    bytes[5] = temp;
    temp = bytes[3];
    bytes[3] = bytes[4];
    bytes[4] = temp;
    return TemporaryBinaryReader(bytes).ReadDouble();
}

This was changed to get rid of the temporary MemoryStreams (used in the TemporaryBinaryReader method above) and byte arrays to reduce allocations.

Here is an example of how we are now reading Double from a ReadOnlySpan<byte>:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static double ReadDouble(ReadOnlySpan<byte> span)
{
    if (span.Length < 8)
    {
        throw new ArgumentOutOfRangeException(nameof(span), "Insufficient length to decode Double from memory.");
    }

    uint num1 = (uint)((span[0] << 24) | (span[1] << 16) | (span[2] << 8) | span[3]);
    uint num2 = (uint)((span[4] << 24) | (span[5] << 16) | (span[6] << 8) | span[7]);
    ulong val = ((ulong)num1 << 32) | num2;
    return Unsafe.As<ulong, double>(ref val);
}

This resulted in a dramatic decrease in allocations as well as CPU savings. Here is an example of BenchmarkDotNet benchmarks I created for verification.

Double

Method Runtime Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ReadBefore .NET 4.8 150.351 ns 1.2760 ns 0.9962 ns 0.1147 - - 361 B
ReadAfter .NET 4.8 8.116 ns 0.1166 ns 0.1090 ns - - - -
ReadBefore Core 3.1 67.239 ns 0.9648 ns 0.8553 ns 0.0739 - - 232 B
ReadAfter Core 3.1 4.080 ns 0.1625 ns 0.1441 ns - - - -
WriteBefore .NET 4.8 160.946 ns 4.3261 ns 4.0467 ns 0.1249 - - 393 B
WriteAfter .NET 4.8 7.788 ns 0.1187 ns 0.1110 ns - - - -
WriteBefore Core 3.1 84.995 ns 2.0234 ns 2.6311 ns 0.0815 - - 256 B
WriteAfter Core 3.1 3.576 ns 0.0498 ns 0.0465 ns - - - -

Single

Method Runtime Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
ReadBefore .NET 4.8 142.165 ns 2.0497 ns 1.9172 ns 141.090 ns 0.1147 - - 361 B
ReadAfter .NET 4.8 2.322 ns 0.0520 ns 0.0406 ns 2.328 ns - - - -
ReadBefore Core 3.1 65.098 ns 1.3230 ns 1.4705 ns 64.897 ns 0.0739 - - 232 B
ReadAfter Core 3.1 1.772 ns 0.0738 ns 0.0789 ns 1.771 ns - - - -
WriteBefore .NET 4.8 151.188 ns 1.4721 ns 1.2293 ns 150.984 ns 0.1249 - - 393 B
WriteAfter .NET 4.8 3.101 ns 0.0603 ns 0.0564 ns 3.101 ns - - - -
WriteBefore Core 3.1 75.557 ns 0.9762 ns 0.9131 ns 75.431 ns 0.0815 - - 256 B
WriteAfter Core 3.1 1.188 ns 0.0991 ns 0.2697 ns 1.093 ns - - - -

Serializing data and reusing buffers

Serializing data types is one thing. Serializing the AMQP data structures composed of that data is another. Messages are serialized into what's called frames. Frames consist of a frame header, channel number, and frame size, followed by the frame payload (our data), and end with a frame-end marker byte.

This brings up an interesting problem, which stems from the frame size. The frame size appears in the structure before the actual payload does, which makes sense because we need the consumers of the protocol to know how many bytes are in the payload they are about to read. But, since our payload can contain all sorts of data as well as header properties, and data types are serialized differently, it's hard for us to know the size of the serialized payload before we have actually serialized all the data in the message.

The obvious solution: MemoryStream

Previously, this was solved in a way many of us would think of. The client library created a MemoryStream, which was used as the output stream for the NetworkBinaryWriter class that serialized the data. After all the data had been written, it was easy to keep track of how many bytes had been written to the MemoryStream, seek back to the correct header position, and write the payload size. Job done!

This method did its job, and most importantly did it correctly, but it had the drawback that the entire frame had to be buffered as it was being serialized, and as the MemoryStream grew bigger, new backing arrays needed to be allocated, memory copied etc. until the entire frame was correctly serialized and fit into the buffer, and eventually the resulting array was output to the network.

A better solution: MemoryPool<byte>

This was a noticeably big part of where allocations were being made so it became one of the focus points. When I looked at the code I figured, since I already had all the data, that I didn't need to serialize all of it to know how big it would be. I could simply walk through the data structure and calculate the required buffer size. We know a byte is one byte, a float is four bytes, int is four bytes etc. The biggest problem was strings. Given an arbitrary string, how do I know how many bytes it'll occupy when it's encoded using UTF8? Luckily for us, such a method already exists (Encoding.GetMaxByteCount), so that problem was solved as well.

So, what I decided to do was simply calculate the minimum size each frame would need when serialized, utilize System.Buffers.MemoryPool<T> from the System.Memory NuGet package to fetch a pooled byte array of at least that size, then serialize the data directly into that pooled array, write it to the network socket, and eventually return the array to the pool to be reused later. This would allow the client to eliminate the memory churn of serialization, since data would always be serialized into reusable byte arrays.

Now, you may wonder why I didn't use one of the many libraries that provide implementations of MemoryStream that reuse pooled arrays, like Microsoft.IO.RecyclableMemoryStream? The answer is future proofing! Moving to System.Memory gives the library access to Memory<T> and Span<T>, which are a mainstay in .NET Core, and will make the migration to System.IO.Pipelines a lot easier. So, stay tuned because there is even more good stuff coming in later versions :)

Armed with this knowledge and idea, let's take a closer look at some of the improvements made and the impact it had!

Converting strings to UTF8 bytes

One of the first things I noticed was that when strings were being serialized, they were converted directly to a UTF8 byte array. Obviously, this creates a new array to contain the UTF8 bytes. However, strings can also be encoded directly into an existing array at a given offset, and we then know exactly how many bytes were written. This of course fit in extremely well with the pooled array idea.

This is how strings were serialized to a stream before:

public static void WriteShortstr(NetworkBinaryWriter writer, string value)
{
    byte[] bytes = Encoding.UTF8.GetBytes(value);
    writer.Write((ushort) bytes.Length);
    writer.Write(bytes);            
}

And this is how it's now serialized to a Memory<byte>

public static int WriteShortstr(Memory<byte> memory, string val)
{
    int stringBytesNeeded = Encoding.UTF8.GetByteCount(val);
    if (MemoryMarshal.TryGetArray(memory.Slice(1, stringBytesNeeded), out ArraySegment<byte> segment))
    {
        memory.Span[0] = (byte)stringBytesNeeded;
        Encoding.UTF8.GetBytes(val, 0, val.Length, segment.Array, segment.Offset);
        return stringBytesNeeded + 1;
    }

    throw new WireFormattingException("Unable to get array segment from memory.");
}

Now, granted, the new code is a little more verbose, but it is MUCH more efficient. Here is an example benchmark converting the first "Lorem Ipsum" paragraph to UTF8 bytes.

Method Runtime Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated
GetByteArray .NET 4.8 1,115.2 ns 13.68 ns 12.79 ns 1.00 0.2823 - - 891 B
WriteToMemory .NET 4.8 625.6 ns 5.23 ns 4.89 ns 0.56 0.0076 - - 24 B
GetByteArray .NET Core 3.1 234.0 ns 4.11 ns 3.64 ns 1.00 0.2828 - - 888 B
WriteToMemory .NET Core 3.1 157.0 ns 2.36 ns 2.09 ns 0.67 0.0076 - - 24 B

Arrays

As we have seen, the benefits of pooling and reusing arrays on code hot paths can be quite dramatic. This is mostly the result of freeing up our precious CPU cycles from being used by the Garbage Collector, cleaning up all our temporary objects. Having a garbage collector is one of the benefits of a managed language such as C#, but not having to use the Garbage Collector is even better.

I was able to take advantage of pooled arrays in the RabbitMQ client in several places. One of them was in the serialization, as I mentioned above: we could estimate the minimum amount of memory we'd need to serialize a frame, "rent" that amount from our array pool, serialize the data directly into the array instead of using a MemoryStream + BinaryWriter pair, and then send that array directly to the network socket. Then we could return the array to the pool, rinse, and repeat.

But what about the messages received from the server? Could I take advantage of pooled arrays there, too?

YES!

After thinking about it for a while, I realized that the message payloads were being sent and received as byte arrays (byte[]). After talking it over with the RabbitMQ client maintainers, we decided to change the message payloads to be represented as Memory<byte> instead. That allows us to use pooled arrays for received messages, copy the payload into another pooled array before it is sent to the message event handlers, and then return the arrays to the pool once the consumers have processed the messages!

Thankfully this was easy to solve using the MemoryPool<T> type. It allows us to easily rent and return pooled arrays using the IDisposable pattern. To give you an example of how to use it, let's imagine that we need to calculate the SHA1 hash of a UTF8-encoded string, again using the first "Lorem Ipsum" paragraph.

Using the easy and convenient way:

private SHA1 hasher = SHA1.Create();

public byte[] GetByteArray()
{
    // Allocates a new byte[] for the UTF8 bytes and another for the hash on every call.
    return hasher.ComputeHash(Encoding.UTF8.GetBytes(LoremIpsum));
}


Using MemoryPool<byte>:

private SHA1 hasher = SHA1.Create();
// Emulating a reusable pooled array to put the calculated hashes into
private Memory<byte> hashedBytes = new Memory<byte>(new byte[20]);

public Memory<byte> WriteToMemoryWithGetByteCount()
{
    int numBytesNeeded = Encoding.UTF8.GetByteCount(LoremIpsum);
    // Let's use pooled memory to hold the converted UTF8 bytes. This is
    // wrapped in a using statement so that once we have finished calculating the hash,
    // the array is returned to the pool, since IMemoryOwner<byte> implements IDisposable.
    using (IMemoryOwner<byte> memory = MemoryPool<byte>.Shared.Rent(numBytesNeeded))
    {
        if (MemoryMarshal.TryGetArray(memory.Memory.Slice(0, numBytesNeeded), out ArraySegment<byte> segment))
        {
            Encoding.UTF8.GetBytes(LoremIpsum, 0, LoremIpsum.Length, segment.Array, segment.Offset);
            hasher.TryComputeHash(segment, hashedBytes.Span, out int _);
            return hashedBytes;
        }

        throw new InvalidOperationException("Failed to get memory");
    }
}

Now again, the code is more verbose, but it is much more memory-friendly.

|     Method |     Mean |     Error |    StdDev | Ratio |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |---------:|----------:|----------:|------:|-------:|------:|------:|----------:|
|  ByteArray | 2.378 us | 0.0241 us | 0.0226 us |  1.00 | 0.3090 |     - |     - |     984 B |
| MemoryPool | 2.244 us | 0.0219 us | 0.0205 us |  0.94 | 0.0076 |     - |     - |      24 B |

Results!

After going a few rounds on this, finding more places where I could put this into practice, and submitting several PRs, this work was finally released in version 6.0.0 of the RabbitMQ .NET Client NuGet package. I decided to write a small program and run it through the JetBrains dotMemory profiler to see the results.

The program did the following:

  • Set up two connections to RabbitMQ, one to publish messages and another to receive them.
  • Used publisher confirms.
  • Sent 50,000 messages, in batches of 500 messages at a time.
  • Used 512-byte and 16 KB message payloads to test memory usage.
  • Ran until it had received those 50,000 messages back.

Here are the results:

Payload Size (bytes) Memory Allocated Before (MB) Memory Allocated After (MB) Savings
512 470.03 99.43 78.85%
16,384 7,311.36 99.41 98.64%
Before: 50,000 messages sent/received with a 512-byte payload
Before: 50,000 messages sent/received with a 16,384-byte payload
After: 50,000 messages sent/received with a 512-byte payload
After: 50,000 messages sent/received with a 16,384-byte payload

As you can see, the memory usage when sending and receiving the events is now pretty much constant, regardless of the payload size. This is a massive improvement and frees up a lot of valuable CPU cycles and memory churn for other tasks.

What really made me happy with this release was the fact that the API changed only very slightly, and in most cases required little or no change at all for consumers of the package.

This is however not the end of this performance work by any means. There are more improvements in the pipeline (hint hint) that are sure to bring even more gains, for the benefit of everyone, so stay tuned :)

]]>
<![CDATA[.NET/Visual Studio Productivity Tips - Part 1]]>https://stebet.net/net-visual-studio-productivity-tips-part-1/5f32bea0c6bcca0001278d2eSun, 09 Feb 2020 00:14:28 GMT

There are many useful shortcuts and methods in .NET and Visual Studio to make coding and debugging a lot easier. I'm going to share some of my favorites.

CTRL+. (Period) or ALT+Enter

This is without a doubt my favorite! This handy shortcut triggers the "Quick Actions" menu in the text editor, and it was a HUGE productivity booster for me when I discovered and started using it. If you don't know what I mean by "Quick Actions", it's this awesome menu:

.NET/Visual Studio Productivity Tips - Part 1
The "Quick Actions" menu in Visual Studio

Being able to trigger it quickly while writing code and refactoring, without using the mouse, sped up my development a lot.

The [DebuggerDisplay] attribute

Did you know that you can write your own display mechanism for the debugger without overriding ToString()? Yep, it's called the DebuggerDisplay attribute, and it works like this. Let's say you have a standard Point struct.

public struct Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

By default it will display in the debugger like this:

.NET/Visual Studio Productivity Tips - Part 1
Not very useful debugging information...

Not terribly useful, right? Let's make it easier to debug! If we add the System.Diagnostics.DebuggerDisplay attribute we can define how we want the debugger to display our Point.

[System.Diagnostics.DebuggerDisplay("X = {X} Y = {Y}")]
public struct Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

This will make the debugger render our Point struct like this:

.NET/Visual Studio Productivity Tips - Part 1
Useful debugging information!

Much better! We can customize it even further by telling the attribute to use a property to render the display:

[System.Diagnostics.DebuggerDisplay("{RenderDebugger,nq}")]
public struct Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    private string RenderDebugger => $"{X},{Y}";
}

Which makes it render like this:

.NET/Visual Studio Productivity Tips - Part 1

You might be wondering what the ,nq part means. That simply means that the debugger should strip out the quotation marks (nq = no quotes) since our RenderDebugger property returns a string.

What's also awesome is using this on custom collections or lists to render counts, sums, or whatever is applicable for those objects, for a much better experience when debugging our own objects.
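For example, a small custom collection could surface its count like this (OrderBatch is just an illustrative type):

```csharp
using System.Collections.Generic;

// The debugger tooltip will show "Count = 2" instead of the type name.
[System.Diagnostics.DebuggerDisplay("Count = {Count}")]
public class OrderBatch
{
    private readonly List<string> _orders = new List<string>();

    public int Count => _orders.Count;

    public void Add(string order) => _orders.Add(order);
}
```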

Happy debugging!

]]>
<![CDATA[C# 8 - Null coalescing/compound assignment]]>https://stebet.net/c-8-compound-assignment/5f32bea0c6bcca0001278d2dMon, 25 Nov 2019 09:30:29 GMT

Hi friends.

There are a lot of cool new features in C# 8 and one of my favorites is the new Null coalescing assignment (or compound assignment, whichever you prefer) operator.

If you have ever written code like this:

private string _someValue;

public string SomeMethod()
{
    // Let's do an old-school null check and initialize if needed.
    if(_someValue == null)
    {
        _someValue = InitializeMyValue();
    }
    
    return _someValue;
}

private string InitializeMyValue()
{
    // Do some expensive initialization that we only want to do once...
}

You can simplify this code A LOT!

private string _someValue;

public string SomeMethod()
{
    // Using the awesome new ??= operator to do all in one go!
    _someValue ??= InitializeMyValue();
    return _someValue;
}

private string InitializeMyValue()
{
    // Do some expensive initialization that we only want to do once...
}

That is the new ??= operator in action, which takes care of doing the null check and assignment for you in one sweet syntactic sugar-rush! I also find it more readable, but some people might disagree, and that's fine. Just pick whichever you prefer :)

And the best part is that it actually compiles down to the same efficient code! Just take a look at this sharplab.io sample to see what I mean.

You might have solved this with a Lazy<string> or with something like _someValue = _someValue ?? InitializeMyValue(); as well, but that's still more code to write than the new operator, and less efficient too. The Lazy<T> approach has some overhead, and the null-coalescing operator + assignment always performs the assignment even when there is no need to (take a closer look at the ASM part of the SharpLab example above), but here it is for reference:

NullVsCoalescingVsCompound.NullCheckAndAssign()
    L0000: cmp dword [ecx+0x4], 0x0
    L0004: jnz L0014
    L0006: mov eax, [0xeb52038]
    L000c: lea edx, [ecx+0x4]
    L000f: call 0x6fe391d0
    L0014: mov eax, [ecx+0x4]
    L0017: ret

NullVsCoalescingVsCompound.CoalescingOperatorAndAssign()
    L0000: mov eax, [ecx+0x4]
    L0003: test eax, eax
    L0005: jnz L000d
    L0007: mov eax, [0xeb52038]
    L000d: lea edx, [ecx+0x4]
    L0010: call 0x6fe391d0
    L0015: mov eax, [ecx+0x4]
    L0018: ret

NullVsCoalescingVsCompound.CompoundAssignment()
    L0000: cmp dword [ecx+0x4], 0x0
    L0004: jnz L0014
    L0006: mov eax, [0xeb52038]
    L000c: lea edx, [ecx+0x4]
    L000f: call 0x6fe391d0
    L0014: mov eax, [ecx+0x4]
    L0017: ret

NullVsCoalescingVsCompound.LazyInitializer()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: mov ecx, [ecx+0x8]
    L0006: cmp dword [ecx+0x4], 0x0
    L000a: jz L0013
    L000c: call System.Lazy`1[[System.__Canon, System.Private.CoreLib]].CreateValue()
    L0011: jmp L0016
    L0013: mov eax, [ecx+0xc]
    L0016: pop ebp
    L0017: ret

NullVsCoalescingVsCompound.GetValue()
    L0000: mov eax, [0xeb52038]
    L0006: ret

Hope this helps! :)

]]>
<![CDATA[You don't need to be a rocket-scientist to contribute to .NET Core!]]>https://stebet.net/you-dont-need-to-be-a-rocket-scientist-to-contribute-to-net-core/5f32bea0c6bcca0001278d2aMon, 04 Nov 2019 22:33:13 GMT

Sometimes when I'm talking with other .NET developers, the "I don't understand why Microsoft doesn't provide X functionality" or "Why hasn't Microsoft fixed Y yet?" topic comes up. "The documentation for Z really sucks" often comes up as well.

Guess what? You can fix all this yourself! Because .NET Core is open source. You can even contribute documentation since that is open source as well. And in many cases it's not even hard at all.

But why should you? Doesn't Microsoft have people that should be doing this? They do, but those people are just like you and me. They work normal hours, they have prioritized tasks and things simply might not be on their radar. That's why open source is so awesome, because it allows us to take part and fix stuff that we really want fixed if it isn't high on the priority list.

I'm going to demonstrate this with an example of a contribution I did myself a few weeks ago.

I decided to try out the new Microsoft.Data.SqlClient library, whose purpose is to replace the SQL Server client library which is built into .NET Framework and .NET Core to be shipped out-of-band, bringing new features earlier. Adding it to our product was easy, however, when I was trying to gather SQL dependency data for Application Insights, I noticed that I wasn't receiving any. That's when I found this error message in our logs: ERROR: Exception in Command Processing for EventSource Microsoft-AdoNet-SystemData: Event BeginExecute is givien event ID 2 but 1 was passed to WriteEvent.

Ok, obviously something wasn't working, and after googling the error message I found that it was coming from Application Insights. After looking at the code, it was immediately clear what had happened: a simple typo, a typical copy-paste error that we have all made many, many times in our coding careers.

This was the offending code:

[Event(SqlEventSource.EndExecuteEventId, Keywords = Keywords.SqlClient, Task = Tasks.ExecuteCommand, Opcode = EventOpcode.Stop)]
public void BeginExecute(int objectId, string dataSource, string database, string commandText)
{
    // we do not use unsafe code for better performance optization here because optimized helpers make the code unsafe where that would not be the case otherwise. 
    // This introduces the question of partial trust, which is complex in the SQL case (there are a lot of scenarios and SQL has special security support).   
    WriteEvent(SqlEventSource.BeginExecuteEventId, objectId, dataSource, database, commandText);
}

// unfortunately these are not marked as Start/Stop opcodes.  The reason is that we dont want them to participate in 
// the EventSource activity IDs (because they currently don't use tasks and this simply confuses the logic) and 
// because of versioning requirements we don't have ActivityOptions capability (because mscorlib and System.Data version 
// at different rates)  Sigh...
[Event(SqlEventSource.EndExecuteEventId, Keywords = Keywords.SqlClient, Task = Tasks.ExecuteCommand, Opcode = EventOpcode.Stop)]
public void EndExecute(int objectId, int compositeState, int sqlExceptionNumber)
{
    WriteEvent(SqlEventSource.EndExecuteEventId, objectId, compositeState, sqlExceptionNumber);
}

Spotted it pretty quickly, right? It doesn't seem right that BeginExecute would use EndExecuteEventId, and sure enough, if you look at the diff from my changes, it was a copy-paste leftover from the EndExecute event right below. I simply changed it to the correct event ID, verified the fix by compiling the library and running it locally, created a pull request, and within days it was accepted and merged!

And since I'd found this process to be so wonderfully simple, I continued. I decided to fix a long-standing pet peeve of mine: the old System.Data.SqlClient only sent the SQL command text for stored procedures but omitted it for standard queries when running on the old .NET Framework, while it worked fine in .NET Core. So I found that code and fixed it as well. That pull request was also accepted and merged.

// I simply removed this logic!
string commandText = CommandType == CommandType.StoredProcedure ? CommandText : string.Empty;
SqlEventSource.Log.BeginExecute(GetHashCode(), Connection.DataSource, Connection.Database, commandText);

You see?

None of these changes were big, they weren't overly complex, and I was able to fix them myself and submit the changes upstream for others to benefit.

You can even make changes to docs.microsoft.com if you want to add information or code samples, and it's as simple as clicking an "Edit" button to get started.

Is there something you have been dying to fix or wanted to change in .NET? What are you waiting for?! Fork the code and get started. It's easier than you think!

]]>
<![CDATA[Mocking JWT tokens in ASP.NET Core integration tests]]>https://stebet.net/mocking-jwt-tokens-in-asp-net-core-integration-tests/5f32bea0c6bcca0001278d29Wed, 28 Aug 2019 10:41:08 GMTAs we've been migrating services over to .NET Core we needed to mock JWT tokens in ASP.NET Core integration tests. I finally found a way that worked.

The problem is that, by default, the JWT authentication handler in ASP.NET Core tries to communicate with the issuer defined in the JWT token to download the metadata needed to validate the tokens. In our case we didn't want to be dependent on that when running the tests, but we still wanted to make sure our authentication policies worked as intended.

I found out that by setting the Configuration property in the JwtBearerOptions we were able to short-circuit this behavior and make the JWT authentication handler skip the metadata download step.

Here's how!

First of all, we created a static helper class in our integration test project that initializes the required security keys and algorithms to create and sign our mocked JWT tokens. We then, through the standard service configuration, initialize the JWT authentication handler with that same information, making it possible for it to validate our generated JWT tokens.

It's quite simple really!

First, let's create our static helper class:

public static class MockJwtTokens
{
    public static string Issuer { get; } = Guid.NewGuid().ToString();
    public static SecurityKey SecurityKey { get; }
    public static SigningCredentials SigningCredentials { get; }

    private static readonly JwtSecurityTokenHandler s_tokenHandler = new JwtSecurityTokenHandler();
    private static readonly RandomNumberGenerator s_rng = RandomNumberGenerator.Create();
    private static readonly byte[] s_key = new byte[32];

    static MockJwtTokens()
    {
        s_rng.GetBytes(s_key);
        SecurityKey = new SymmetricSecurityKey(s_key) { KeyId = Guid.NewGuid().ToString() };
        SigningCredentials = new SigningCredentials(SecurityKey, SecurityAlgorithms.HmacSha256);
    }

    public static string GenerateJwtToken(IEnumerable<Claim> claims)
    {
        return s_tokenHandler.WriteToken(new JwtSecurityToken(Issuer, null, claims, null, DateTime.UtcNow.AddMinutes(20), SigningCredentials));
    }
}

Here we define our mock JWT issuer and its security key, and create a simple helper method that accepts claims and generates a JWT token with those claims. We set our JWT tokens to be valid for 20 minutes.

Now all we need to do is hook this up to the JWT authentication handler and that was as simple as overriding a few settings when the tests are being initialized.

public class BaseIntegrationTest : WebApplicationFactory<Startup>
{
    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        builder.ConfigureTestServices(ConfigureServices);
        builder.ConfigureLogging((WebHostBuilderContext context, ILoggingBuilder loggingBuilder) =>
        {
            loggingBuilder.ClearProviders();
            loggingBuilder.AddConsole(options => options.IncludeScopes = true);
        });
    }

    protected virtual void ConfigureServices(IServiceCollection services)
    {
        services.Configure<JwtBearerOptions>(JwtBearerDefaults.AuthenticationScheme, options =>
        {
            var config = new OpenIdConnectConfiguration()
            {
                Issuer = MockJwtTokens.Issuer
            };

            config.SigningKeys.Add(MockJwtTokens.SecurityKey);
            options.Configuration = config;
        });
     }
}

We now have a base class that we can have our integration tests inherit from to customize the configuration further. The trick here is to set the Configuration property on the JwtBearerOptions to the values defined in our mock. When that value has been set, the JWT authentication handler will see that it already has the information it needs to validate the JWT tokens we generate, and will not try to download the required metadata. This is documented behavior:

Configuration provided directly by the developer. If provided, then MetadataAddress and the Backchannel properties will not be used. This information should not be updated during request processing.

Hope this helps :)

]]>
<![CDATA[EVE Online account security - Part 2 - Pwned Passwords details, CSP and HSTS]]>In my previous post I showed you how we integrated the Pwned Passwords check from Troy Hunt's https://haveibeenpwned.com into our login pages on the EVE Online SSO, which is used by our game launcher, when logging into our websites and when logging into 3rd party integrations.

]]>
https://stebet.net/eve-online-account-security-part-2-csp-and-hsts/5f32bea0c6bcca0001278d28Mon, 06 May 2019 13:13:25 GMT

In my previous post I showed you how we integrated the Pwned Passwords check from Troy Hunt's https://haveibeenpwned.com into our login pages on the EVE Online SSO, which is used by our game launcher, when logging into our websites and when logging into 3rd party integrations.

Our feedback after the launch was very encouraging, as our users were overall very appreciative of us taking steps towards better security, and this has been noticed by other companies and organizations as well. As an example, Epic Games also started checking new passwords in the same way as of April 16th, 2019, which is awesome, and I'd like to think our implementation at least served as an encouragement towards that :) Well done, Epic Games!

One question I got a lot was how the password checks performed. Troy has covered this on several occasions on his blog (links at the end of the article), so I'll leave the details of his caching to him. What I can say is that the service is absolutely rock-solid and performs great. We do some caching on our end as well, but we hardly see any failed requests, and the response times are usually very good thanks to Cloudflare caching magic and their global CDN. Here you can see the response times for the last three months:

EVE Online account security - Part 2 - Pwned Passwords details, CSP and HSTS
Pwned Passwords response times over the last three months

Not only is the response time great, but just look at that histogram!

EVE Online account security - Part 2 - Pwned Passwords details, CSP and HSTS
Pwned Passwords response time histogram

That's a median response time of 25 milliseconds (we are hitting a local Cloudflare node in London) and a 99th percentile of 710 ms, which, considering everything, is absolutely fantastic.

As for failures....

EVE Online account security - Part 2 - Pwned Passwords details, CSP and HSTS
Pwned Passwords failed requests

That's a total of 2,000 failures out of over 2.02 million requests over the last three months, almost all of them timeouts (we have an aggressive 5-second timeout on our end), and I know for a fact that the big spike in the middle was a proxy misconfiguration on our end. Even so, that's still less than a 0.1% request failure rate! The reliability and consistency of the service is fantastic.

Overall, our Pwned Passwords integration has been quite the success.

But, we decided to do more!

Shortly after the launch, a coworker and I went to NDC Oslo, where I had the pleasure of attending Troy's Hack Yourself First workshop. That workshop, which I highly recommend for every developer, was great. I already knew about most of the topics covered, but hadn't really spent any time digging into them in detail. Troy takes an excellent hands-on approach, with engaging conversations and good examples, and he has the presentation absolutely nailed down.

We also had a few chats (with beers), I got introduced to Scott Helme (who's also an EVE player!) and after a few discussions (and perhaps a few more beers) I knew of two rather easy wins to work on when I got back.

First up...

CSP

CSP stands for Content Security Policy, and it's a way of telling browsers how to treat your site with regards to security. Examples include which script sources are allowed, whether other sites are allowed to host your site in an IFrame, which hosts your site is allowed to communicate with over JavaScript, and plenty more security-related bits.

I won't go into exact details on how we did the CSP implementation, but I will give a shout-out to Report URI, which is an excellent service to help you set up, gather, aggregate and act on these CSP security policies (and more). That service is also run by Scott Helme.

We ran the CSP policies in Report-Only mode at first, while we tweaked them so they wouldn't break anything, and once we were certain we had everything working, we took them out of Report-Only mode and started enforcing them. This did not take very long: we went from publishing the CSP in Report-Only mode on June 21st to enforcing it on July 3rd. That's less than two weeks!

June 21st, CSP policies go live in Report-Only mode.
July 3rd, CSP policies are enforced.

Our security policies now whitelist JavaScript sources, enforce a nonce for our own JS, restrict IFrame usage and in general protect our users by only allowing stuff we approve to run in the browser context.
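A policy in that spirit might look like the following (the directive values here are illustrative, not our actual policy, and the real header is sent as a single line):

```
Content-Security-Policy: default-src 'self';
    script-src 'self' 'nonce-rAnd0m123';
    frame-ancestors 'none';
    report-uri https://example.report-uri.com/r/d/csp/enforce
```

During the Report-Only phase the exact same directives ship under the Content-Security-Policy-Report-Only header name instead, so violations are reported but nothing is blocked.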

And then we went on to the next thing...

HSTS and preloading eveonline.com!

HSTS stands for HTTP Strict Transport Security, and it's a simple HTTP header that tells browsers that the site should only be loaded and run over HTTPS. It's a simple thing to do once you have HTTPS up and running, and it makes sure that the site cannot be modified in any way between our users' browsers and our servers, as well as that all communication is encrypted. Once I was sure that everything was working fine over HTTPS, I went ahead and submitted our https://eveonline.com domain to the HSTS Preload list, where it has since been accepted and is now preloaded on most modern browsers.

EVE Online account security - Part 2 - Pwned Passwords details, CSP and HSTS
eveonline.com is HSTS preloaded!

Preloading means that browsers won't even try to load the site (or any subdomains) over HTTP, even if you type http://eveonline.com in the address bar, and will just immediately redirect to HTTPS. It also means that we cannot easily go back to hosting anything over plain old HTTP, which is also the whole point. We want to make sure our sites are secure and that they are only ever loaded over encrypted channels.
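The header itself is a one-liner. To be accepted onto the preload list, at the time of writing, it must specify a max-age of at least one year and include both the includeSubDomains and preload directives:

```
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
```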

Summary

Making these changes was all rather easy and the implementation went well. There are excellent guides out there on how to do CSP and HSTS (I'll provide links below), so I'd highly recommend it for anyone running a website.

I hope this encourages you to do the same :)

Troy Hunt on Pwned Passwords

CSP

HSTS (and HTTPS)

]]>
<![CDATA[Monitoring GC and memory allocations with .NET Core 2.2 and Application Insights]]>https://stebet.net/monitoring-gc-and-memory-allocations-with-net-core-2-2-and-application-insights/5f32bea0c6bcca0001278d27Wed, 09 Jan 2019 10:35:05 GMT

Introduction

Monitoring GC and memory allocations with .NET Core 2.2 and Application Insights

Well that title was a mouthful, but we have some very interesting stuff in this blog post, at least in my opinion :)

It all started when we were migrating some of our services from .NET Framework to .NET Core. We had been using Application Insights, with its performance counter collection feature, to monitor performance counters on the Windows servers that run our applications. We soon discovered when we moved over to .NET Core that we'd have to do things differently, since performance counters aren't supported in Application Insights for .NET Core.

Getting the data

The things we monitor are for example the number of GC collections by generation, working set, CPU usage etc. and I needed a way to get the same information from within our application, as we like to do this in-process for simplicity reasons (yes, I'm aware that this does have a performance impact, more on that later).

I poked a few people on Twitter for help and soon received this response from David Fowler (@davidfowl).

To my surprise and jubilation it seems that as of .NET Core 2.2, a lot of the .NET Runtime events for things like GC, JIT and more are now being published as an Event Source, and best of all, IT WORKS CROSS PLATFORM!

I started with the simple example at https://medium.com/criteo-labs/c-in-process-clr-event-listeners-with-net-core-2-2-ef4075c14e87 and then adapted it a bit.

What I did was create a couple of IHostedService background tasks, one to monitor the GC events, and one to monitor CPU and memory usage. I then use the Application Insights API and post the metrics to Application Insights using their awesome Metric functionality. This of course could just as well be sending the data to New Relic, Prometheus or whatever floats your metric boats :)

For information about the GC events, check this link: https://docs.microsoft.com/en-us/dotnet/framework/performance/garbage-collection-etw-events

Code

Here is the GC collector and GC event listener:

using System;
using System.Diagnostics.Tracing;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Metrics;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;

namespace CCP.ApplicationInsights.PerformanceCollector
{
    public class GcEventsCollector : IHostedService, IDisposable
    {
        private readonly IOptions<CisSettings> _settings;
        private readonly TelemetryClient _telemetryClient;
        private Timer _timer;
        private Metric _gen0Collections;
        private Metric _gen1Collections;
        private Metric _gen2Collections;
        private Metric _totalMemory;
        private GcEventListener _gcTest;

        public GcEventsCollector(IOptions<CisSettings> settings, TelemetryClient telemetryClient)
        {
            _settings = settings;
            _telemetryClient = telemetryClient;
        }

        public void Dispose()
        {
            _timer?.Dispose();
            _gcTest?.Dispose();
        }

        public Task StartAsync(CancellationToken cancellationToken)
        {
            if (_settings.Value.GcEventsCollector.Enabled)
            {
                const string MetricNamespace = "dotnet.gc";
                _gen0Collections = _telemetryClient.GetMetric(new MetricIdentifier(MetricNamespace, "Gen 0 Collections"));
                _gen1Collections = _telemetryClient.GetMetric(new MetricIdentifier(MetricNamespace, "Gen 1 Collections"));
                _gen2Collections = _telemetryClient.GetMetric(new MetricIdentifier(MetricNamespace, "Gen 2 Collections"));
                _totalMemory = _telemetryClient.GetMetric(new MetricIdentifier(MetricNamespace, "Total Memory"));

                _timer = new Timer(CollectData, null, 0, 5000);
                _gcTest = new GcEventListener(_telemetryClient, _settings.Value.GcEventsCollector.EnableAllocationEvents);
            }

            return Task.CompletedTask;
        }

        private void CollectData(object state)
        {
            _gen0Collections.TrackValue(GC.CollectionCount(0));
            _gen1Collections.TrackValue(GC.CollectionCount(1));
            _gen2Collections.TrackValue(GC.CollectionCount(2));
            _totalMemory.TrackValue(GC.GetTotalMemory(false));
        }

        public Task StopAsync(CancellationToken cancellationToken)
        {
            _timer?.Change(Timeout.Infinite, Timeout.Infinite);
            return Task.CompletedTask;
        }
    }
    
    sealed class GcEventListener : EventListener
    {
        // from https://docs.microsoft.com/en-us/dotnet/framework/performance/garbage-collection-etw-events
        private const int GC_KEYWORD = 0x0000001;
        private readonly TelemetryClient _client;
        private readonly Metric _allocatedMemory;
        private readonly Metric _gen0Size;
        private readonly Metric _gen1Size;
        private readonly Metric _gen2Size;
        private readonly Metric _lohSize;
        private readonly Metric _gen0Promoted;
        private readonly Metric _gen1Promoted;
        private readonly Metric _gen2Survived;
        private readonly Metric _lohSurvived;
        private EventSource _dotNetRuntime;
        private readonly EventLevel _eventLevel;

        public GcEventListener(TelemetryClient client, bool enableAllocationEvents = false)
        {
            _client = client ?? throw new ArgumentNullException(nameof(client));

            const string MetricNamespace = "dotnet.gc";
            _gen0Size = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Gen 0 Heap Size"));
            _gen1Size = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Gen 1 Heap Size"));
            _gen2Size = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Gen 2 Heap Size"));
            _lohSize = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Large Object Heap Size"));
            _gen0Promoted = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Bytes Promoted From Gen 0"));
            _gen1Promoted = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Bytes Promoted From Gen 1"));
            _gen2Survived = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Bytes Survived Gen 2"));
            _allocatedMemory = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Allocated Memory", "Type"));
            _lohSurvived = _client.GetMetric(new MetricIdentifier(MetricNamespace, "Bytes Survived Large Object Heap"));

            _eventLevel = enableAllocationEvents ? EventLevel.Verbose : EventLevel.Informational;
        }

        protected override void OnEventSourceCreated(EventSource eventSource)
        {
            // look for .NET Garbage Collection events
            if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime"))
            {
                _dotNetRuntime = eventSource;
                // EventLevel.Verbose enables the AllocationTick events, but also a heap of other stuff, and will increase the memory allocations of your application since it's a lot of data to digest. EventLevel.Informational is more lightweight and is recommended if you don't need the allocation data.
                EnableEvents(eventSource, _eventLevel, (EventKeywords)GC_KEYWORD);
            }
        }

        // from https://blogs.msdn.microsoft.com/dotnet/2018/12/04/announcing-net-core-2-2/
        // Called whenever an event is written.
        protected override void OnEventWritten(EventWrittenEventArgs eventData)
        {
            switch (eventData.EventName)
            {
                case "GCHeapStats_V1":
                    ProcessHeapStats(eventData);
                    break;
                case "GCAllocationTick_V3":
                    ProcessAllocationEvent(eventData);
                    break;
            }
        }

        private void ProcessAllocationEvent(EventWrittenEventArgs eventData)
        {
            _allocatedMemory.TrackValue((ulong)eventData.Payload[3], (string)eventData.Payload[5]);
        }

        private void ProcessHeapStats(EventWrittenEventArgs eventData)
        {
            _gen0Size.TrackValue((ulong)eventData.Payload[0]);
            _gen0Promoted.TrackValue((ulong)eventData.Payload[1]);
            _gen1Size.TrackValue((ulong)eventData.Payload[2]);
            _gen1Promoted.TrackValue((ulong)eventData.Payload[3]);
            _gen2Size.TrackValue((ulong)eventData.Payload[4]);
            _gen2Survived.TrackValue((ulong)eventData.Payload[5]);
            _lohSize.TrackValue((ulong)eventData.Payload[6]);
            _lohSurvived.TrackValue((ulong)eventData.Payload[7]);
        }
    }
}

And here is the CPU and Memory usage collector:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using CCP.CustomerInformationService;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Metrics;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;

namespace CCP.ApplicationInsights.PerformanceCollector
{
    public class SystemUsageCollector : IHostedService, IDisposable
    {
        private readonly IOptions<CisSettings> _settings;
        private readonly TelemetryClient _telemetryClient;
        private Timer _timer;
        private Metric _totalCpuUsed;
        private Metric _privilegedCpuUsed;
        private Metric _userCpuUsed;
        private Metric _workingSet;
        private Metric _nonPagedSystemMemory;
        private Metric _pagedMemory;
        private Metric _pagedSystemMemory;
        private Metric _privateMemory;
        private Metric _virtualMemoryMemory;
        private readonly Process _process = Process.GetCurrentProcess();
        private DateTime _lastTimeStamp;
        private TimeSpan _lastTotalProcessorTime = TimeSpan.Zero;
        private TimeSpan _lastUserProcessorTime = TimeSpan.Zero;
        private TimeSpan _lastPrivilegedProcessorTime = TimeSpan.Zero;

        public SystemUsageCollector(IOptions<CisSettings> settings, TelemetryClient telemetryClient)
        {
            _settings = settings;
            _telemetryClient = telemetryClient;
            // StartTime is local time; convert it so CollectData can diff against DateTime.UtcNow
            _lastTimeStamp = _process.StartTime.ToUniversalTime();
        }

        public void Dispose()
        {
            _timer?.Dispose();
        }

        public Task StartAsync(CancellationToken cancellationToken)
        {
            if (_settings.Value.SystemUsageCollector.Enabled)
            {
                _totalCpuUsed = _telemetryClient.GetMetric(new MetricIdentifier("system.cpu", "Total % Used"));
                _privilegedCpuUsed = _telemetryClient.GetMetric(new MetricIdentifier("system.cpu", "Privileged % Used"));
                _userCpuUsed = _telemetryClient.GetMetric(new MetricIdentifier("system.cpu", "User % Used"));
                _workingSet = _telemetryClient.GetMetric(new MetricIdentifier("system.memory", "Working Set"));
                _nonPagedSystemMemory = _telemetryClient.GetMetric(new MetricIdentifier("system.memory", "Non-Paged System Memory"));
                _pagedMemory = _telemetryClient.GetMetric(new MetricIdentifier("system.memory", "Paged Memory"));
                _pagedSystemMemory = _telemetryClient.GetMetric(new MetricIdentifier("system.memory", "Paged System Memory"));
                _privateMemory = _telemetryClient.GetMetric(new MetricIdentifier("system.memory", "Private Memory"));
                _virtualMemoryMemory = _telemetryClient.GetMetric(new MetricIdentifier("system.memory", "Virtual Memory"));

                _timer = new Timer(CollectData, null, 1000, 5000);
            }

            return Task.CompletedTask;
        }

        private void CollectData(object state)
        {
            double totalCpuTimeUsed = _process.TotalProcessorTime.TotalMilliseconds - _lastTotalProcessorTime.TotalMilliseconds;
            double privilegedCpuTimeUsed = _process.PrivilegedProcessorTime.TotalMilliseconds - _lastPrivilegedProcessorTime.TotalMilliseconds;
            double userCpuTimeUsed = _process.UserProcessorTime.TotalMilliseconds - _lastUserProcessorTime.TotalMilliseconds;

            _lastTotalProcessorTime = _process.TotalProcessorTime;
            _lastPrivilegedProcessorTime = _process.PrivilegedProcessorTime;
            _lastUserProcessorTime = _process.UserProcessorTime;

            double cpuTimeElapsed = (DateTime.UtcNow - _lastTimeStamp).TotalMilliseconds * Environment.ProcessorCount;
            _lastTimeStamp = DateTime.UtcNow;

            _totalCpuUsed.TrackValue(totalCpuTimeUsed * 100 / cpuTimeElapsed);
            _privilegedCpuUsed.TrackValue(privilegedCpuTimeUsed * 100 / cpuTimeElapsed);
            _userCpuUsed.TrackValue(userCpuTimeUsed * 100 / cpuTimeElapsed);

            _workingSet.TrackValue(_process.WorkingSet64);
            _nonPagedSystemMemory.TrackValue(_process.NonpagedSystemMemorySize64);
            _pagedMemory.TrackValue(_process.PagedMemorySize64);
            _pagedSystemMemory.TrackValue(_process.PagedSystemMemorySize64);
            _privateMemory.TrackValue(_process.PrivateMemorySize64);
            _virtualMemoryMemory.TrackValue(_process.VirtualMemorySize64);
        }

        public Task StopAsync(CancellationToken cancellationToken)
        {
            _timer?.Change(Timeout.Infinite, Timeout.Infinite);
            return Task.CompletedTask;
        }
    }
}
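The CPU numbers in CollectData come from the standard normalisation: the CPU time consumed over the interval, divided by the wall-clock time multiplied by the logical core count. A minimal sketch of that formula (in Python for brevity):

```python
# Process CPU utilisation over an interval, normalised across all logical
# cores: percent = delta_cpu_ms * 100 / (delta_wall_ms * core_count)
def cpu_percent(delta_cpu_ms: float, delta_wall_ms: float, cores: int) -> float:
    return delta_cpu_ms * 100 / (delta_wall_ms * cores)

# e.g. 2000 ms of CPU time over a 5000 ms interval on a 4-core machine
print(cpu_percent(2000, 5000, 4))  # 10.0
```

Without the core count in the denominator, a fully busy thread on a 4-core box would report 100% instead of 25%.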

I then register these services in my ASP.NET Core Startup.cs like so

services.AddHostedService<SystemUsageCollector>();
services.AddHostedService<GcEventsCollector>();

and they will start doing their work once the app boots up. The awesome thing about the GC event source is that if you set the EventLevel to Verbose, as in the example above, you can catch the AllocationTick events. These are events that are triggered approximately every 100,000 bytes allocated, and they include the .NET type that crossed the allocation threshold, so you can get an approximation of which objects are driving allocations in your application. This will increase the number of events quite a bit, which of course adds more allocations of its own, but having this data is worth the impact in our case, as it gives us some pretty valuable insight into the behavior and performance of our application.
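As a rough illustration (in Python, with hypothetical payload values): because AllocationTick is sampled at roughly 100 KB boundaries, summing the reported AllocationAmount64 payloads approximates, rather than exactly counts, the allocated bytes:

```python
# AllocationTick fires roughly once per 100 KB allocated; its
# AllocationAmount64 payload reports (approximately) the bytes accumulated
# since the previous tick, so summing the payloads approximates the total.
tick_amounts = [102_400, 104_000, 101_500]  # hypothetical payload values
approx_total = sum(tick_amounts)
print(approx_total)  # 307900
```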

Querying our data using Application Insights

Gathering the data is one thing, but to make it useful we need to query it. Thankfully, Application Insights makes that really easy :) Here are a few examples of analytics queries we can make.
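One thing worth noting before the queries: Application Insights pre-aggregates custom metrics, so each customMetrics row carries a valueSum and a valueCount for its sampling interval, and the queries below recover the interval mean as valueSum / valueCount. A quick sketch with hypothetical row values:

```python
# Each customMetrics row is pre-aggregated over a sampling interval; the
# mean for the interval is recovered as valueSum / valueCount.
row = {"valueSum": 640.0, "valueCount": 8}  # hypothetical row
avg_value = row["valueSum"] / row["valueCount"]
print(avg_value)  # 80.0
```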

CPU usage by User and Privileged time

customMetrics
| where name in ("User % Used", "Privileged % Used")
| project timestamp, name, avgValue = valueSum / valueCount
| summarize CPU = avg(avgValue) by name, bin(timestamp, 5m)
| render areachart kind=stacked
Monitoring GC and memory allocations with .NET Core 2.2 and Application Insights
CPU usage

GC Heap Size by generation

customMetrics
| where name contains "Heap Size"
| order by cloud_RoleInstance, name, timestamp desc
| project timestamp, name, cloud_RoleInstance, avgValue = valueSum / valueCount
| summarize heapSize = avg(avgValue) by bin(timestamp, 5m), name
| render areachart kind=stacked
Heap Size by generation

GC heap size changes over time (for a single server)

customMetrics
| where name contains "Heap Size" and cloud_RoleInstance == "SERVER_NAME" 
| order by cloud_RoleInstance, name, timestamp desc
| project timestamp, name, cloud_RoleInstance, valueDiff = iff(cloud_RoleInstance == next(cloud_RoleInstance, 1) and name == next(name, 1), (valueSum / valueCount) - (next(valueSum, 1) / next(valueCount, 1)), 0.0)
| summarize heapSizeDiff = avg(valueDiff) by name, cloud_RoleInstance, bin(timestamp, 5m)
| render barchart kind=unstacked

Memory allocated by type (sampled, top 20)

customMetrics
| where name == "Allocated Memory" 
| project valueSum, type = tostring(customDimensions["Type"])
| summarize bytesAllocated = sum(valueSum) by type
| order by bytesAllocated desc
| take 20
List of types
Pie chart

Conclusion

With this new event source in .NET Core 2.2, a whole heap of diagnostics information can now be gathered in-process, helping you understand your application's behavior and troubleshoot issues. If you are at all interested in how your application is performing server-side, this is an excellent source of diagnostics data.

]]>
<![CDATA[EVE Online account security - Part 1 - Pwned Passwords]]>

Sitting in our Reykjavik offices last April, I read one of Troy Hunt's blog posts about Pwned Passwords, mentioning how he had launched V2 of his API, allowing anyone to check their passwords to see if they had been included in known security breaches.

I thought how cool that feature was

]]>
https://stebet.net/eve-online-and-account-security-part-1-pwned-passwords/5f32bea0c6bcca0001278d24Sat, 11 Aug 2018 00:00:00 GMTEVE Online account security - Part 1 - Pwned Passwords

Sitting in our Reykjavik offices last April, I read one of Troy Hunt's blog posts about Pwned Passwords, mentioning how he had launched V2 of his API, allowing anyone to check their passwords to see if they had been included in known security breaches.

I thought how cool that feature was and how useful it would be for increasing a user's sense of security. It could also let people know if their accounts were in danger of being taken over, and then an idea hit me! We could actually do that check for our EVE Online users during login and then prompt them to let them know that they should change their passwords. Aha!

But first, I needed to be sure that we wouldn't be accidentally exposing our users' security credentials to a third-party service, so I read up on how it worked.

Thankfully, it didn't take long to see that there was pretty much zero chance of us leaking our users' credentials, so I spent a Friday afternoon whipping up a quick prototype on my local development build of the SSO.
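For the curious, the privacy property comes from the API's k-anonymity model: the client sends only the first five hex characters of the password's SHA-1 hash and matches the returned suffixes locally. A minimal sketch (in Python; the range-query HTTP call itself is omitted, and the helper name is mine):

```python
import hashlib

# Sketch of the Pwned Passwords k-anonymity check: only the first five hex
# characters of the SHA-1 hash ever leave the client, so the service never
# sees the password, or even its full hash.
def hash_prefix_and_suffix(password: str) -> tuple[str, str]:
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]  # send prefix; match returned suffixes locally

prefix, suffix = hash_prefix_and_suffix("password")
print(prefix)  # 5BAA6
```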

I then posted a small sneak-peek tweet about it to our users.

Apparently Troy found this pretty interesting, so soon after that tweet he got in contact with me and a discussion started between us on an optimal implementation. The discussion eventually involved Junade Ali at Cloudflare as well, since Troy's API is fronted by them. Junade made some tweaks on the CDN side which improved things even further, and he has just released a blog post on some of those improvements in case you are interested in reading up on that.

Over the next few days I worked with our awesome Customer Support people at CCP on finalizing messaging and translations until we then launched the feature on May 2nd.

The launch went incredibly well, and the feature has been very well received. In the next blog posts I will go into more technical details, share some code snippets, statistics and our future plans for better account security. Stay tuned, there's good stuff coming ;)

]]>
<![CDATA[Benchmarking and performance optimizations in C# using BenchmarkDotNet]]>

Hello friends!

Long time, no blog, but now it's time for something fun.

I happened upon the awesome BenchmarkDotNet library the other day, which makes it incredibly easy to write good and solid .NET benchmarks.

You can find a good tutorial on how to get started using

]]>
https://stebet.net/benchmarking-and-performance-optimizations-in-c-using-benchmarkdotnet/5f32bea0c6bcca0001278d12Sat, 04 Jun 2016 10:47:00 GMT

Hello friends!

Long time, no blog, but now it's time for something fun.

I happened upon the awesome BenchmarkDotNet library the other day, which makes it incredibly easy to write good and solid .NET benchmarks.

You can find a good tutorial on how to get started using it by looking at the GitHub page for the project, but I'm just going to dive right into it.

Introduction

In the services we write at CCP Games we often have to do geographical lookups based on the visiting user's IP address. For this we use the IP2Location database, which stores each IP address as one large unsigned integer instead of a dot-delimited string of four bytes. This makes sense from a database perspective: IP address ranges can be stored in a pair of integer columns, which makes it quick to find out whether an IP address falls within a specific range.

This does, however, require an IP address represented as a string to be converted into an integer, which means splitting up the string and combining its elements into an integer. The address is split into four numbers, each of which represents one byte of a 32-bit unsigned integer.
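As a quick sanity check of that conversion (sketched here in Python; the ip_to_int helper is mine, mirroring the arithmetic of the C# helper that follows):

```python
# Hypothetical helper mirroring the conversion: each dotted element is one
# byte of the resulting 32-bit unsigned integer.
def ip_to_int(ip: str) -> int:
    a, b, c, d = (int(x) for x in ip.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d

print(ip_to_int("192.168.1.1"))  # 3232235777
```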

We had written a simple helper class to do this.

Code

public static class IP2LocationHelper1
{
    public static uint IPAddressToInteger(string input)
    {
        uint[] elements = input.Split('.').Select(x => uint.Parse(x)).ToArray();
        return elements[0] * 256U * 256U * 256U + elements[1] * 256U * 256U + elements[2] * 256U + elements[3];
    }
}

Result

Method Median StdDev Scaled Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
IPAddressToInteger_1 784.7456 ns 15.5832 ns 1.00 753.00 - - 94.92

What this tells us is the following: each call takes around 785 nanoseconds to execute and allocates around 95 bytes of memory. That might not seem like a lot, but for a simple IP-address-to-integer conversion I feel it's rather a lot. So we started to experiment.

Watch out for params!

Let's just break things down line by line, which is easy enough since we only have two lines at the moment.

The string.Split() method is actually defined with a params parameter accepting an array of characters as possible delimiters. A params parameter always creates an array, even when provided with just one argument, and creating a new array on every call is obviously expensive, so we'll start by moving the delimiter into a static field.

Code

public static class IP2LocationHelper2
{
    private static readonly char[] Delimiter = new[] { '.' };

    public static uint IPAddressToInteger(string input)
    {
        uint[] elements = input.Split(Delimiter).Select(x => uint.Parse(x)).ToArray();
        return elements[0] * 256U * 256U * 256U + elements[1] * 256U * 256U + elements[2] * 256U + elements[3];
    }
}

Results

Method Median StdDev Scaled Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
IPAddressToInteger_1 784.7456 ns 15.5832 ns 1.00 753.00 - - 94.92
IPAddressToInteger_2 781.3869 ns 15.8058 ns 1.00 737.00 - - 93.85

Small progress. We're not allocating as much, though it's probably within the margin of error.

Using appropriate datatypes

Since an IP address is made up of four bytes (32 bits), there is no reason for us to parse the individual elements as uint. Let's try converting them to byte and see where that takes us.

Code

public static class IP2LocationHelper3
{
    private static readonly char[] Delimiter = new[] { '.' };

    public static uint IPAddressToInteger(string input)
    {
        byte[] elements = input.Split(Delimiter).Select(x => byte.Parse(x)).ToArray();
        return elements[0] * 256U * 256U * 256U + elements[1] * 256U * 256U + elements[2] * 256U + elements[3];
    }
}

Results

Method Median StdDev Scaled Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
IPAddressToInteger_1 784.7456 ns 15.5832 ns 1.00 753.00 - - 94.92
IPAddressToInteger_2 781.3869 ns 15.8058 ns 1.00 737.00 - - 93.85
IPAddressToInteger_3 781.2921 ns 8.0850 ns 1.00 685.71 - - 87.25

Steady improvements. But we have more to go still!

Why all the parsing?

For all four elements we are parsing the string representation into a byte using byte.Parse. This involves a lot of checking that really shouldn't be necessary. Since a byte has only 256 possible decimal string representations, we might as well create a Dictionary<string, byte> mapping the possible strings to their byte values. Let's create the following static field and initialize it in the static constructor.

Code

public static class IP2LocationHelper4
{
    private static readonly char[] Delimiter = new[] { '.' };
    private static readonly Dictionary<string, byte> elementMapper = new Dictionary<string, byte>();

    static IP2LocationHelper4()
    {
        for (uint i = 0; i < 256; i++)
        {
            elementMapper[i.ToString()] = (byte)i;
        }
    }

    public static uint IPAddressToInteger(string input)
    {
        byte[] elements = input.Split(Delimiter).Select(x => elementMapper[x]).ToArray();
        return elements[0] * 256U * 256U * 256U + elements[1] * 256U * 256U + elements[2] * 256U + elements[3];
    }
}

Results

Method Median StdDev Scaled Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
IPAddressToInteger_1 784.7456 ns 15.5832 ns 1.00 753.00 - - 94.92
IPAddressToInteger_2 781.3869 ns 15.8058 ns 1.00 737.00 - - 93.85
IPAddressToInteger_3 781.2921 ns 8.0850 ns 1.00 685.71 - - 87.25
IPAddressToInteger_4 453.6526 ns 4.4243 ns 0.58 664.50 - - 84.33

By caching data and pre-parsing the possible strings, performance has drastically improved, by over 40%. But we're not done yet!

Avoid LINQ in performance critical code

It's well known that although LINQ is a great tool for making complex filtering and composition easier to read, it does come with a performance impact. Let's try replacing the Select() and ToArray() calls with a plain array initializer.

Code

public static class IP2LocationHelper5
{
    private static readonly char[] Delimiter = new[] { '.' };
    private static readonly Dictionary<string, byte> elementMapper = new Dictionary<string, byte>();

    static IP2LocationHelper5()
    {
        for (uint i = 0; i < 256; i++)
        {
            elementMapper[i.ToString()] = (byte)i;
        }
    }

    public static uint IPAddressToInteger(string input)
    {
        string[] stringElements = input.Split(Delimiter);
        byte[] elements = new[] { elementMapper[stringElements[0]], elementMapper[stringElements[1]], elementMapper[stringElements[2]], elementMapper[stringElements[3]] };
        return elements[0] * 256U * 256U * 256U + elements[1] * 256U * 256U + elements[2] * 256U + elements[3];
    }
}

Results

Method Median StdDev Scaled Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
IPAddressToInteger_1 784.7456 ns 15.5832 ns 1.00 753.00 - - 94.92
IPAddressToInteger_2 781.3869 ns 15.8058 ns 1.00 737.00 - - 93.85
IPAddressToInteger_3 781.2921 ns 8.0850 ns 1.00 685.71 - - 87.25
IPAddressToInteger_4 453.6526 ns 4.4243 ns 0.58 664.50 - - 84.33
IPAddressToInteger_5 320.2013 ns 88.6061 ns 0.41 636.77 - - 80.30

OK. That's a pretty big improvement right there! LINQ has a hefty performance hit, and performance has improved by almost 60% now. But is there more we can do? Certainly, let's do some math!

Bit-shifting?

For the first three elements, we are really only shifting their bits around within an integer (multiplying an integer by 256 is just a left shift by 8 bits). Fortunately, C# has the bit-shifting operators << and >> (left and right shift respectively). Let's see if that improves our code in any way.
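To convince ourselves of the equivalence, a quick check (in Python, where the arithmetic is identical):

```python
# Multiplying by a power of 256 is identical to shifting left by a
# multiple of 8 bits.
assert 192 * 256**3 == 192 << 24
assert 168 * 256**2 == 168 << 16
assert 1 * 256 == 1 << 8
print(192 << 24)  # 3221225472
```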

Code

public static class IP2LocationHelper6
{
    private static readonly char[] Delimiter = new[] { '.' };
    private static readonly Dictionary<string, uint> elementMapper = new Dictionary<string, uint>();

    static IP2LocationHelper6()
    {
        for (uint i = 0; i < 256; i++)
        {
            elementMapper[i.ToString()] = i;
        }
    }

    public static uint IPAddressToInteger(string input)
    {
        string[] stringElements = input.Split(Delimiter);
        uint[] elements = new[] { elementMapper[stringElements[0]], elementMapper[stringElements[1]], elementMapper[stringElements[2]], elementMapper[stringElements[3]] };
        return (elements[0] << 24) + (elements[1] << 16) + (elements[2] << 8) + elements[3];
    }
}

Results

Method Median StdDev Scaled Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
IPAddressToInteger_1 784.7456 ns 15.5832 ns 1.00 753.00 - - 94.92
IPAddressToInteger_2 781.3869 ns 15.8058 ns 1.00 737.00 - - 93.85
IPAddressToInteger_3 781.2921 ns 8.0850 ns 1.00 685.71 - - 87.25
IPAddressToInteger_4 453.6526 ns 4.4243 ns 0.58 664.50 - - 84.33
IPAddressToInteger_5 320.2013 ns 88.6061 ns 0.41 636.77 - - 80.30
IPAddressToInteger_6 309.9793 ns 6.2034 ns 0.40 618.12 - - 77.84

Not a lot of change, although allocations have been reduced and we got a minor speedup. Looking at this code made me realize another alternative: just like we created the map to convert strings into bytes, why don't we create four maps that map all possible strings to pre-shifted unsigned integers?

Pre-shifting with more maps

Let's do some more pre-mapping, now with pre-shifted values, so we only have to do additions after splitting up the string elements.

Code

public static class IP2LocationHelper7
{
    private static readonly char[] Delimiter = new[] { '.' };
    private static readonly Dictionary<string, uint> element0Mapper = new Dictionary<string, uint>();
    private static readonly Dictionary<string, uint> element1Mapper = new Dictionary<string, uint>();
    private static readonly Dictionary<string, uint> element2Mapper = new Dictionary<string, uint>();
    private static readonly Dictionary<string, uint> element3Mapper = new Dictionary<string, uint>();

    static IP2LocationHelper7()
    {
        for (uint i = 0; i < 256; i++)
        {
            element0Mapper[i.ToString()] = i << 24;
            element1Mapper[i.ToString()] = i << 16;
            element2Mapper[i.ToString()] = i << 8;
            element3Mapper[i.ToString()] = i;
        }
    }

    public static uint IPAddressToInteger(string input)
    {
        string[] stringElements = input.Split(Delimiter);
        return element0Mapper[stringElements[0]] + element1Mapper[stringElements[1]] + element2Mapper[stringElements[2]] + element3Mapper[stringElements[3]];
    }
}

Results

Method Median StdDev Scaled Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
IPAddressToInteger_1 784.7456 ns 15.5832 ns 1.00 753.00 - - 94.92
IPAddressToInteger_2 781.3869 ns 15.8058 ns 1.00 737.00 - - 93.85
IPAddressToInteger_3 781.2921 ns 8.0850 ns 1.00 685.71 - - 87.25
IPAddressToInteger_4 453.6526 ns 4.4243 ns 0.58 664.50 - - 84.33
IPAddressToInteger_5 320.2013 ns 88.6061 ns 0.41 636.77 - - 80.30
IPAddressToInteger_6 309.9793 ns 6.2034 ns 0.40 618.12 - - 77.84
IPAddressToInteger_7 302.1271 ns 7.0767 ns 0.39 558.55 - - 70.64

Et voila! More allocations removed, and a small speedup along with them. We'll make do with this for now.

Conclusion

The end result is quite something: our method is now almost 60% faster than the original and allocates almost 30% less memory as well, and we still maintained good readability in our code.

Things can be improved even further, as almost all of the remaining allocations take place in the string.Split method, but then I feel we might be crossing the line between readability and optimization. I'll leave it as an exercise for the reader to take that on ;)

This just goes to show that with a little thinking and some research, we can usually find better and more optimized ways of doing things without sacrificing code readability.

Also, I can't recommend BenchmarkDotNet enough, as it is a wonderful tool to help out with performance optimizations like these, by making it easy to show progress and difference between the different implementations.

Hope this helps!

]]>
<![CDATA[Add your own performance counters to Application Insights]]>

I have been trying Application Insights for a while now, and while it is a fantastic tool, I do feel it is missing several important performance counters out-of-the-box for .NET applications. Among them are several counters from the .NET CLR Memory category, which I'm going to use as an

]]>
https://stebet.net/monitor-your-net-garbage-collector-using-application-insights/5f32bea0c6bcca0001278d23Sun, 21 Feb 2016 15:24:00 GMT

I have been trying Application Insights for a while now, and while it is a fantastic tool, I do feel it is missing several important performance counters out-of-the-box for .NET applications. Among them are several counters from the .NET CLR Memory category, which I'm going to use as an example for this blog post.

Application Insights allows you to easily add performance counters to the list of counters it already measures; this is done with some small changes to the ApplicationInsights.config file (for regular ASP.NET web sites).

If you take a look at the ApplicationInsights.config file you'll find the TelemetryModules section near the top and under it is a PerformanceCollectorModule where you can add your own performance counters.

Here is an example of the .NET CLR Memory counters I added (mostly relating to the .NET Garbage Collector):

<Counters>
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\# Bytes in all Heaps" ReportAs="Bytes in all Heaps" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\% Time in GC" ReportAs="Percent of time spent in GC" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\# Gen 0 Collections" ReportAs="Gen Zero Collections" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\# Gen 1 Collections" ReportAs="Gen One Collections" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\# Gen 2 Collections" ReportAs="Gen Two Collections" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\Allocated Bytes/second" ReportAs="Allocated Bytes per Second" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\Gen 0 Promoted Bytes/Sec" ReportAs="Bytes Promoted from Gen Zero per Second" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\Gen 1 Promoted Bytes/Sec" ReportAs="Bytes Promoted from Gen One per Second" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\Gen 1 heap size" ReportAs="Gen One Heap Size" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\Gen 2 heap size" ReportAs="Gen Two Heap Size" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\Promoted Memory from Gen 0" ReportAs="Promoted Memory from Gen Zero" />
  <Add PerformanceCounter="\.NET CLR Memory(??APP_CLR_PROC??)\Promoted Memory from Gen 1" ReportAs="Promoted Memory from Gen One" />
</Counters>

The ??APP_CLR_PROC?? string is a "magic variable" that the Application Insights runtime replaces with the identity of the CLR Process that runs the application to select the correct counters.

After adding these counters we can take a look at them using the Metrics Explorer in the Application Insights hub on the Azure Portal.

Performance Counters visible on the Azure Portal

There is one caveat to remember as well. You need to make sure that the user running the website, for example the IIS Application Pool user, is a member of the local Performance Monitor Users group to be able to read performance counters from the system.

Adding an IIS application pool user to that group can be easily done with PowerShell like this (answer on Stack Overflow from Svein Fidjestøl):

$group = [ADSI]"WinNT://$Env:ComputerName/Performance Monitor Users,group"
$ntAccount = New-Object System.Security.Principal.NTAccount("IIS APPPOOL\ASP.NET v4.0")
$strSID = $ntAccount.Translate([System.Security.Principal.SecurityIdentifier])
$user = [ADSI]"WinNT://$strSID"
$group.Add($user.Path)

If you are running your website as a Windows Azure Web App, you don't need to worry about those permissions.

I hope this helps, and happy performance monitoring!

]]>
<![CDATA[Using Docker Tools on Windows with Hyper-V instead of Virtualbox]]>

Docker is a fantastic tool, and the Docker Toolbox is a must to manage it.

If you install the Docker Toolbox on a Windows machine, the installer automatically installs Oracle Virtualbox to run the Docker virtual machine.

That always annoyed me, since Hyper-V has now been a part of the

]]>
https://stebet.net/installing-docker-tools-on-windows-using-hyper-v/5f32bea0c6bcca0001278d22Fri, 05 Feb 2016 23:23:00 GMT

Docker is a fantastic tool, and the Docker Toolbox is a must to manage it.

If you install the Docker Toolbox on a Windows machine, the installer automatically installs Oracle Virtualbox to run the Docker virtual machine.

That always annoyed me, since Hyper-V has now been a part of the Windows Pro and Enterprise editions since Windows 8.

It is however possible to run the Docker Toolbox using Hyper-V straight out of the box.

First create a Virtual Switch in Hyper-V.

Then, start an Administrative Command Prompt and instead of using the standard documented method to create a Docker VM called default using Virtualbox:
docker-machine create --driver virtualbox default

you simply exchange the virtualbox driver with hyperv like so:
docker-machine create --driver hyperv default

That's all there is to it. If you are running a version of Windows that has Hyper-V, you can safely uninstall Virtualbox and just use Hyper-V instead to manage your Docker VMs.

Hope this helps!

]]>
<![CDATA[Using Application Insights with Ghost 0.7]]>

Hello friends!

I've been taking a look at Microsoft's Application Insights recently and decided as a test to see how hard it would be to integrate with the Ghost blogging platform.

Application Insights is Microsoft's application performance monitoring platform. It is built into Azure

]]>
https://stebet.net/using-application-insights-with-your-ghost-0-7-blog/5f32bea0c6bcca0001278d21Thu, 10 Sep 2015 17:02:00 GMT

Hello friends!

I've been taking a look at Microsoft's Application Insights recently and decided as a test to see how hard it would be to integrate with the Ghost blogging platform.

Application Insights is Microsoft's application performance monitoring platform. It is built into Azure and provides excellent insights into both server-side and client-side performance.

Turns out adding it to Ghost is relatively easy.

Preparation

First of all you need an account on Microsoft's Azure platform. Once you have that created you can log in to the Azure Portal. From there simply click New -> Developer Services -> Application Insights. Give it a name, select "Other (preview)" under Application Type, make sure "Pin to Startboard" is selected and click "Create".

Application Insights Setup

Once that is completed, go to "Home" and find the Application Insights tile. It should have the name you gave it as well as a purple lightbulb icon. Clicking it should open up the Essentials dashboard. Now look for the Instrumentation Key which is a Guid and copy it to your clipboard, you will need it when configuring Ghost.

Application Insights Essentials Dashboard

Ghost server side integration

First of all we need to add the necessary Application Insights node modules to the application. We do that by using the npm package tool. Simply go to the root folder of the website and run npm install applicationinsights.

The next step is to set Application Insights to start up during the Ghost startup. For that you have to edit index.js in Ghost's root directory.

Here is what my index.js looks like:

// # Ghost Startup
// Orchestrates the startup of Ghost when run from command line.
var express,
    ghost,
    parentApp,
    errors,
    appInsights;

// Make sure dependencies are installed and file system permissions are correct.
require('./core/server/utils/startup-check').check();

// Proceed with startup
express = require('express');
ghost = require('./core');
errors = require('./core/server/errors');
appInsights = require("applicationinsights");

// Initializes app insights
appInsights.setup("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx").start();

// Create our parent express app instance.
parentApp = express();

// Call Ghost to get an instance of GhostServer
ghost().then(function (ghostServer) {
    // Mount our Ghost instance on our desired subdirectory path if it exists.
    parentApp.use(ghostServer.config.paths.subdir, ghostServer.rootApp);

    // Let Ghost handle starting our server instance.
    ghostServer.start(parentApp);
}).catch(function (err) {
    errors.logErrorAndExit(err, err.context, err.help);
});

See where I call appInsights.setup? That's where your instrumentation key goes. Calling this method sets up all the necessary server-side interceptions, so you'll start to see server-side information such as request times, number of requests, and failed requests.
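If you'd rather not hard-code the key in index.js, one option is to read it from an environment variable and only start telemetry when a key is present, so local development instances stay quiet. A sketch, assuming a variable name of APPINSIGHTS_KEY (my own choice, not something Ghost defines):

```javascript
// Enable Application Insights only when an instrumentation key is
// present in the environment; returns whether telemetry was started.
// APPINSIGHTS_KEY is an assumed variable name, not defined by Ghost.
function maybeStartAppInsights(env) {
    if (!env.APPINSIGHTS_KEY) {
        return false; // no key configured, skip telemetry entirely
    }
    var appInsights = require("applicationinsights");
    appInsights.setup(env.APPINSIGHTS_KEY).start();
    return true;
}

// In index.js you would call: maybeStartAppInsights(process.env);
console.log(maybeStartAppInsights({})); // false: empty env has no key
```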

Now, let's see what's needed for the client side integration.

Ghost client side integration

Next, log in to your Ghost administration portal and go to Settings -> Code Injection. In the "Blog Header" section, add the following snippet, replacing the "xxxxxx" part with your instrumentation key:

<script type="text/javascript">
    var appInsights=window.appInsights||function(config){
        function s(config){t[config]=function(){var i=arguments;t.queue.push(function(){t[config].apply(t,i)})}}var t={config:config},r=document,f=window,e="script",o=r.createElement(e),i,u;for(o.src=config.url||"//az416426.vo.msecnd.net/scripts/a/ai.0.js",r.getElementsByTagName(e)[0].parentNode.appendChild(o),t.cookie=r.cookie,t.queue=[],i=["Event","Exception","Metric","PageView","Trace"];i.length;)s("track"+i.pop());return config.disableExceptionTracking||(i="onerror",s("_"+i),u=f[i],f[i]=function(config,r,f,e,o){var s=u&&u(config,r,f,e,o);return s!==!0&&t["_"+i](config,r,f,e,o),s}),t
    }({
        instrumentationKey:"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    });
    
    window.appInsights=appInsights;
    appInsights.trackPageView();
</script>
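A nice bonus: the snippet exposes window.appInsights globally and, as you can see in the minified code, queues calls made before ai.0.js finishes loading. That means your theme can also record custom events with trackEvent. Here's a sketch; the stub is only there so the example runs outside a browser, and the event name is purely illustrative:

```javascript
// Use the real window.appInsights in a browser; otherwise fall back to
// a queueing stub that mirrors how the loader snippet buffers calls
// until ai.0.js arrives.
var appInsights = (typeof window !== "undefined" && window.appInsights) || {
    queue: [],
    trackEvent: function (name, properties) {
        this.queue.push({ name: name, properties: properties });
    }
};

// Record a custom event, e.g. when a reader opens the subscribe form
// ("subscribe-opened" is an illustrative event name).
appInsights.trackEvent("subscribe-opened", { page: "/about/" });

console.log(appInsights.queue.length); // 1 telemetry item queued
```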

And that's all there is to it. You should now receive a wealth of metrics from your Ghost application, covering server-side requests as well as browser and session information.

Hope this helps!

]]>