Darling Data https://erikdarling.com SQL Server Consulting, Education, and Training Thu, 19 Mar 2026 14:48:08 +0000

Double Your Core Count With Azure SQL Managed Instance https://erikdarling.com/double-your-core-count-with-azure-sql-managed-instance/ Thu, 19 Mar 2026 14:44:12 +0000

The post Double Your Core Count With Azure SQL Managed Instance appeared first on Darling Data.

I’m going to open with a perhaps controversial statement: “when you buy 4 vCores on the Azure SQL Managed Instance platform, what you’re actually buying is 2 physical cores presented as 4 hyperthreaded cores to SQL Server”. That means that if you have 8 physical cores on your SQL Server machine today then your starting Managed Instance vCore equivalent count could be closer to 16 vCores instead of 8. Perhaps this is already well known to everyone else, but I couldn’t find any (accurate) writing on this topic so I gave it a shot.

The Test Machines

A CPU core is only as good as its performance, so instead of theoretical stuff, I’m simply going to test identical, synthetic workloads on 3 different machines:

  • An Intel i7-9700k processor gaming PC with Windows Home (big regret)
  • An 8 core VMware VM on an Intel Gold 6248R physical host without any special configuration
  • An 8 vCore next-gen general purpose Azure SQL Managed Instance on a Dev/Test subscription in US East

If the Managed Instance has the equivalent of 8 physical cores then I expect to see similar scaling on all three machines. If it effectively only has four physical cores then I expect to see much worse scaling with the Managed Instance compared to the other two machines.
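Before getting into the workload tests, it's worth asking SQL Server what it thinks it's running on. This is a minimal sketch using `sys.dm_os_sys_info`; keep in mind that hypervisors frequently present hyperthreads to the guest as full cores, so these counters reflect what's exposed to SQL Server, not necessarily the physical truth underneath:

```sql
/*
What SQL Server sees for CPU topology. On cloud platforms,
cores_per_socket may count hyperthreads as cores, so treat
these numbers as a hint rather than proof of physical cores.
*/
SELECT
    osi.cpu_count,        /* logical processors visible to SQL Server */
    osi.scheduler_count,  /* visible online schedulers */
    osi.socket_count,
    osi.cores_per_socket,
    osi.numa_node_count
FROM sys.dm_os_sys_info AS osi;
```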

One Big Query

For the first test, I wanted to run a single query with varying MAXDOP on a quiet machine. I wrote a query to burn CPU rather efficiently while minimizing shared state: there’s no I/O, the memory grant is tiny, there’s very little intra-worker communication, and the query uses demand based parallelism to distribute rows between the parallel worker threads. Here is the query that I ran at MAXDOP 1, MAXDOP 4, and at MAXDOP 8:

 WITH vCTE AS (
      SELECT v.v
      FROM
      (VALUES
      (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95)
      ) v(v)
)
SELECT MIN(v0.v + ca.v)
FROM vCTE v0
CROSS APPLY (
      SELECT MIN(v1.v + v2.v + v3.v + v4.v) v
      FROM vCTE v1
      CROSS JOIN vCTE v2
      CROSS JOIN vCTE v3
      CROSS JOIN (SELECT TOP (10) v FROM vCTE) v4
      WHERE v1.v > (v0.v - 96)
) ca
OPTION (NO_PERFORMANCE_SPOOL, MAXDOP 8, USE HINT (N'ENABLE_PARALLEL_PLAN_PREFERENCE'));

as well as the testing results:

(Chart: elapsed times and speedups at each MAXDOP for the three test machines)

The bare metal machine's single-threaded performance is more than twice as fast as the Managed Instance's. Of course, this isn't unexpected. The i7-9700k has a higher clock speed and I'm not paying any virtualization/cloud/managed instance taxes. What I think is more notable is how the query performs as MAXDOP increases. The VMware VM and the bare metal machine both have pretty reasonable scaling. It would be nice to see a speedup of exactly 8.0 at MAXDOP 8, but this is fairly difficult to achieve in practice due to OS overhead, SQL Server overhead, and other factors. However, the Managed Instance acts exactly as I would expect from a 4 physical core/8 virtual core machine. The query can run at DOP 8, but the overall speedup barely exceeds the theorized number of physical cores: 4.

Here is the CPU time used by each tested query:

(Chart: CPU times at each MAXDOP for the three test machines)

Again, the Managed Instance acts as expected from a 4 physical core/8 virtual core machine. DOP can certainly be set above the number of physical cores, but doing so inflates CPU time metrics because both hyperthreads on a core cannot execute at the same time. Note that we do not see the same inflation of CPU time at MAXDOP 8 for the other two test machines.
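If you want to reproduce the elapsed versus CPU time comparison on your own hardware, a minimal harness looks something like this. The query below is only a placeholder; paste in the test query from above and vary the MAXDOP value between runs:

```sql
SET STATISTICS TIME ON;

/* placeholder -- substitute the CPU-burning test query here,
   changing MAXDOP between runs (1, 4, 8) */
SELECT COUNT_BIG(*)
FROM sys.all_objects AS o
OPTION (MAXDOP 1);

SET STATISTICS TIME OFF;

/*
The messages tab reports:
  SQL Server Execution Times:
    CPU time = ... ms, elapsed time = ... ms.
With a real core behind every scheduler, CPU time stays close
to elapsed time multiplied by the DOP. When hyperthreads share
a physical core, CPU time inflates well past that.
*/
```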

Many Small Queries

The second and final test is to run as many small queries as possible within a five minute window. The number of concurrent sessions varies between 1, 4, and 8. As usual, I force each session onto its own scheduler to get the most consistent test results possible. Long time readers of my blog posts with very good memories already know the usual SQLCMD routine. Here is the query that was run, which is very similar to the previous one for highly technical (laziness) reasons:

WITH vCTE AS (	
    SELECT v.v
    FROM
    (VALUES
    (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95)
    ) v(v)
)	
SELECT @Dummy = MIN(v0.v + ca.v)	
FROM vCTE v0	
CROSS APPLY (	
    SELECT MIN(v1.v + v2.v + v3.v) v
    FROM vCTE v1
    CROSS JOIN vCTE v2
    CROSS JOIN (SELECT TOP (8) v FROM vCTE) v3
    WHERE v1.v > (v0.v - 96)
) ca	
OPTION (NO_PERFORMANCE_SPOOL, MAXDOP 1);	

as well as the testing results:

(Chart: queries completed in five minutes at each concurrent session count for the three test machines)

Again, I think that the scaling is what matters here. The Managed Instance cannot exceed four times the throughput of a single session, even with 8 concurrent sessions on all different schedulers. As before, the Managed Instance has the same performance profile as a 4 physical core machine. Both of the other machines scale quite reasonably, achieving almost triple the Managed Instance's throughput during the 8 session test.

Some Theoretical Stuff

Microsoft provides some documentation about the CPU models that power the physical machines that host the Managed Instances. The premium series Managed Instance hardware uses the Intel 8370C (Ice Lake) processor which conveniently is not on Intel’s website. Wikipedia says that this processor has 32 physical cores, 64 logical cores, and supports a maximum of two sockets. That means that the biggest possible Intel 8370C physical host has 64 physical cores and 128 logical cores. Microsoft offers a 128 vCore Managed Instance for premium series CPU counts. It is simply impossible to have 128 physical cores with the Intel 8370C processor.

The same argument can be made for the general purpose series. The Intel E5-2673 v4 (Broadwell) has 20 physical cores, 40 logical cores, and supports a maximum of two sockets. An 80 vCore Managed Instance cannot have 80 physical cores on Intel E5-2673 v4 hardware. It is simply not possible.

I admit that I don’t have any insider knowledge here. Maybe I’m totally wrong about what Microsoft’s virtualization layers are doing with respect to Managed Instance. However, the 8 vCore MI seems to behave exactly as I would expect from a 4 physical core/8 virtual core VM, so that’s what my mental model will be until I see new evidence.

Final Thoughts

Of course, your real production workloads will be more complicated than simply burning CPU. You may have over-provisioned your SQL Server machines or maybe you’ve already exposed hyperthreads to your OS running SQL Server. Those of you in those categories may not see a vCore surprise if you migrate to the Managed Instance platform. However, if you typically think in terms of physical cores and your starting point is a well-sized SQL Server machine, your best bet may be to assume that each MI vCore gives half as much CPU power as you’re used to. Thanks for reading!

Get AI-Ready With Erik: Capacity Planning https://erikdarling.com/get-ai-ready-with-erik-capacity-planning/ Thu, 19 Mar 2026 13:00:09 +0000

The post Get AI-Ready With Erik: Capacity Planning appeared first on Darling Data.

Get AI-Ready With Erik: Capacity Planning

Summary

In this video, I delve into the fascinating world of capacity planning as it pertains to vector embeddings in SQL Server databases. I start by comparing the size of a typical Stack Overflow post table from 2010 to its corresponding embeddings table, revealing just how much space these embeddings can consume—eight and a half gigs compared to six and a quarter gigs for the original data. This comparison sets the stage for an exploration of vector sizes, which depend on the number of dimensions in your chosen model. I also discuss the implications of different models and their embedding sizes, touching on both theoretical maximums and practical considerations. Additionally, I explore how vector indexes work differently from traditional B-tree indexes, emphasizing the importance of columnstore indexes for optimizing space usage. Finally, I provide some practical tips for managing database space when working with large embeddings, including the use of clustered columnstore indexes to achieve significant savings.

Chapters

Full Transcript

Erik Darling here with Darling Data. Boy oh boy, exciting times are upon us. We’re going to talk about a DBA-ish subject here, but who knows, maybe someday a developer will care about this sort of thing as well, and that is capacity planning. Because vectors themselves, the embeddings, can be rather large. And what I’m going to show you is You know what, actually, I’m going to change this on the fly a little bit, because I want to show you actually just how big embeddings can get. And I don’t know why it didn’t occur to me to do this at the beginning. Posties. I’m going to compare the size of the post table to the size of the post embeddings table. Now, this is the Stack Overflow 2010, so this is like not even close to the full-size database. This is like the first like two, three years of Stack’s existence. The post table with the title column, the tags column, the bodies column, body column, not bodies, like last editor, like a whole bunch of string columns and stuff in there. The whole thing, right, is about 6.4 gigs. The post embeddings table with far fewer columns in it, just having the embeddings in there is eight and a half gigs, right? Like, it’s no joke, size-wise.

And, like, one way that you can sort of figure that out is, like, we’re going to take, it’s like 8218. We’re going to take that number, and we’re going to just sort of, like, try to do a rough projection here. Now, remember, post embeddings is about eight and a half gigs, like, full-size up here. If we just sort of look at different row counts and how big they might be, that still doesn’t get us close to, like, we’re still, like, about a gig off for all of those different row counts, right? And, like, part of that is going to be, like, well, like, the post embeddings table has the embeddings column in it, which is the majority of the space, but there are a few other columns in the table.

Nothing huge. It’s, like, ID and, like, post ID and, like, a date time to column or something. So, like, if you look at the table itself, like, those embeddings take up the majority of it, and, you know, every embedding looks like this. It’s, you know, 1024 floaty numbers jammed into a JSON string.
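The size comparison described above can be reproduced with a quick query against the partition stats DMV. The table names here are assumptions based on the Stack Overflow 2010 demo setup discussed in the video; swap in your own:

```sql
/*
Compare the size of the base table to its embeddings table.
used_page_count is in 8 KB pages, so multiply by 8 and divide
by 1024 to get megabytes.
*/
SELECT
    t.name AS table_name,
    SUM(ps.used_page_count) * 8 / 1024. AS used_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t
  ON t.object_id = ps.object_id
WHERE t.name IN (N'Posts', N'PostEmbeddings')
GROUP BY t.name;
```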

So, you know, not kind of storage. If you’re a storage vendor out there, God bless. God bless. I’m in the wrong line of work, probably.

But vector size is going to depend on the number of dimensions that your embeddings are, and the number of dimensions in your embeddings is going to depend on the model you choose. There are all sorts of different models out there that generate different numbers of embeddings.

The max number of dimensions that you can use for a float32 vector in SQL Server is 1,998. That doubles for float16, but float16 is still a preview feature. So, yep, ba-ba-ba-ba-ba-ba-ba-ba-ba.

Here we are back at square one. So, if we look at this and we look at what different models might generate as far as size goes, you know, like 768, you know, theoretically, you know, you know, like, I don’t even know how many gigs that would be.

But this is, you know, like, they get bigger as you involve more embeddings up until you hit, you know, 8,000 actual bytes and when you hit the 1,998 vectors for a float32. But I don’t know, at least no model that I’ve ever seen uses specifically 1,998.

If you’re using, like, OpenAI, you’re going to have, like, the 1,532 or whatever that you see in the majority of the Microsoft demos because, gee, I wonder why they’re all using OpenAI, self-talking, whatever. Anyway, what’s cool about vector indexes, though, sort of, it’s like, it’s not like a B-tree index, right?

Because it’s not like you’re taking all the data and just putting it in order, right? So, it’s like you’re making a graph out of it and you’re sort of connecting lines within that graph. That’s why the vector index create process looks the way it does.

If you go back a few videos to where I showed you the stuff that Microsoft is running when they build a vector index, like, you’re not just, like, taking the data and putting it in, like, B-tree order. It’s not like you’re seeking into, like, a regular B-tree and just being like, oh, this is my entry point. I found this thing. I seek exactly to this primary key vector row.

It’s not like that at all. It builds a graph and the graph structure itself is a lot smaller than the vector embeddings are. Like, you can see the size of the edge table, right, is 286 megs, which is much, much smaller than the size of the embedding data that we have in the post-embeddings table.

If you want to control embedding sizes, like, outside of just which model you choose, right? Like, if you choose a model that’s, like, 1,024 or 1,500, whatever, you know, like, you’re going to have some pretty big vector data.

Like, columnstore works. Like, you can create a, I mean, non-clustered columnstore is, like, it’s okay, but, like, you still have the base table, which is, you know, it can be like a rowstore clustered index or something, and that’s not going to be very well compressed.

So, like, clustered columnstore, you can create that on your table that has the embeddings in it, and columnstore will compress it pretty well. Row and page compression don’t do anything for vectors, right? That is not the type of data that you can actually compress, because it’s, like, the same reason you can’t compress, like, JSON or XML, or, like, you have compressed XML indexes in SQL Server, whatever, but, like, if you just have, like, an Envarkar max or something, like, that doesn’t get compressed.
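The approach described there can be sketched in one statement. The table and index names are assumptions; this also presumes the table doesn't already have a rowstore clustered index (if it does, you'd need to drop or convert it first), and remember the caveat that follows: a table with a vector index can't also have a columnstore index.

```sql
/*
Rebuild the embeddings table with a clustered columnstore
index to get columnstore compression on the vector data.
Row and page compression won't help here; columnstore will.
*/
CREATE CLUSTERED COLUMNSTORE INDEX ccx_PostEmbeddings
ON dbo.PostEmbeddings;
```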

The only thing is you can’t intermix columnstore and vector indexes. If you have a table with a columnstore, like, a clustered columnstore index on it, or I think even non-clustered columnstore, too, you can’t create a vector index on it.

And if you have a vector index on a table, you can’t create a columnstore index on it. They just don’t, they don’t interoperate. That’s not super important today, since vector indexes are still preview only. The only generally available stuff is float32 and vector distance, and a few of the other, like, AI generate chunks and, you know, things like that.

So, you may as well, at this point, just get the space savings and maybe some batch mode in your query plans by creating the clustered columnstore index on it.

Anyway, just a little bit of stuff about the capacity planning there, a couple hints on how you can maybe save some database space by using clustered columnstore.

But anyway, that’s it there. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will buy this course, Get AI Ready with Erik, available today with a coupon code AI Ready, which takes 100 entire dollars off the cost of the course.

It makes it well worth your time and money if your company is planning on doing anything with vectors in SQL Server. All right.

Well, that’s probably good there. Thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Get AI-Ready With Erik: Bad Vector Observability https://erikdarling.com/get-ai-ready-with-erik-bad-vector-observability/ Wed, 18 Mar 2026 13:00:41 +0000

The post Get AI-Ready With Erik: Bad Vector Observability appeared first on Darling Data.

Get AI-Ready With Erik: Bad Vector Observability

Summary

In this video, I delve into the world of bad vectors and embeddings, illustrating what they look like through real-life examples and practical scenarios. Drawing from my experience attempting to transcribe, summarize, and chapterize a large YouTube catalog using local Language Models (LLMs), I highlight how imperfect these models can be, leading to issues such as overly long chapter timestamps that don’t match the actual video length. This example underscores the need for robust validation mechanisms to ensure embeddings are of high quality before they’re stored or used in any pipeline. I also explore techniques like using dot product arguments and vector distance functions to identify problematic vectors, emphasizing the importance of catching these issues early on to maintain the integrity of the data processing workflow.

Chapters

Full Transcript

Erik Darling here with Darling Data. And what I want to show you in this video is a little bit about what bad vectors look like or what bad embeddings look like. So it would be stuff that like, like just, you know, like this can happen for a lot of reasons, right? It can happen during like, like, like while you’re doing the embedding, something weird can happen. There could all sorts of like, I don’t know, you can even have like weird truncated text, truncated text that does it. But let’s just like, just, I’m going to give you like sort of a, a, an example from my real life. Like, I don’t know if you, I forget when I talked about it, but like one of the things that I said I was trying to do was take all my YouTube catalog, uh, and like have it transcribed, summarized, and chaptered by using local LLMs. So like I have this pipeline set up to like download the YouTube video, use one local, one local LLM to make the transcription. And then another LLM to look at the transcription to generate the summary and chapters. Now what’s, what’s, what was really interesting, like a funny thing that happened that I didn’t catch until like, like there was a fair amount of scrutiny going on where I was like, wait a minute, like, like one LLM generates the transcript and then another LLM looks at the transcript and it summarizes it. And the summaries were generally okay. They were a little repetitive, like in this video I delve into or this video I dive into, like whatever, it doesn’t matter.

But what was really interesting was the chapters. The reason the chapters were interesting is because the LLM that looked at the transcript at first, like until I put it in there, it had no idea how long the video was. So I would have like a 10, 12, 15 minute video and the local LLM would start putting chapters at like an hour, hour and a half, two hours out. And I was like, that’s not good. Someone’s going to look at that and think I’m crazy. This is a 15 minute video. And it’s like three hours in, good night. Okay. So like, like, like, like I’ve been going, like, I’ve had to like redo a lot of stuff because of that. Uh, I spent yesterday with my Mac book on drink coasters with, um, like, like, like ice sleeves for pain under it. Cause it was getting hot.

And it was just the whole thing. Anyway, it was like seven hours of reprocessing 700 videos or something. But, uh, like, like, like, like, you know, again, something that we can all agree on. LLMs are currently imperfect. Um, pipelines are also, also somewhat imperfect. Um, you know, computers quite imperfect. So there are a lot of potential reasons why you might need to like deal with and find things that look like this. So like one, one way you can do that is by using the dot, um, dot product argument for the vector distance function. And you can generally use about these numbers to find vectors that would not be vectors or embeddings that would not be good. Like, thankfully I don’t have any of those. Right. So like everything in here is not like messed up, all zeros, very weak, you know, like kind of the same low numbers all across.

Uh, what I want to show you is what happens when we mix like kind of good ones and bad ones together. And so I’m just using some literal values here to like, you know, like show like at least some like, okay stuff and then some bad stuff where it’s like all zeros and whatever, and then a no one finally. And we can use some fancy queries to sort of categorize those and find ones that are not good. Right. So stuff like this, uh, where we have five rows that are very like near zero. Zero magnitude, almost like, like potential zeros, one row that’s okay. And one row that’s no. And we can use another sort of fancy query to find, um, to find ones that like just to get the detail on that, that was an aggregation.

So like when dot distance is zero, that’s probably not a good sign. Um, when it’s like a number like this, probably not a good sign, negative four, not a good sign. Generally you want to see like where it’s okay, like negative one and close to, close to negative one. Um, and then, uh, and this is like, again, this is the difference between itself. I’m not talking about dot product between like two different vectors. I’m talking about like, like when you say dot product between like, like when you compare a vector to itself, right? Or compare an embedding to itself, not when you compare it to something else, comparing it to something else is completely different. Comparing it to itself is what we’re looking at here.
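The self-comparison technique described there can be sketched as a query. Table and column names are assumptions; the idea is that for a normalized embedding, `VECTOR_DISTANCE` with the `'dot'` metric returns roughly -1 when a vector is compared to itself, while all-zero or otherwise degenerate vectors drift far from that:

```sql
/*
Flag suspect embeddings by comparing each one to itself.
A healthy, normalized embedding self-scores close to -1;
near-zero or garbage vectors won't.
*/
SELECT
    e.id,
    VECTOR_DISTANCE('dot', e.embedding, e.embedding) AS self_dot,
    CASE
        WHEN e.embedding IS NULL
            THEN 'null embedding'
        WHEN VECTOR_DISTANCE('dot', e.embedding, e.embedding)
             NOT BETWEEN -1.05 AND -0.95
            THEN 'suspect'
        ELSE 'looks okay'
    END AS verdict
FROM dbo.PostEmbeddings AS e;
```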

And like, like the reason why you would care about this is because you might, you know, you might have like the text of the document, uh, somewhere in your database. Uh, and you might also have the embedding for it. And if you like, if you’re looking at the text document and it’s like, you know, it doesn’t matter if it’s like one line or if it’s like a couple paragraphs or a long document, if it’s all zeros, it’s never going to match to anything, right? Like it’s just not going to come up as being similar to anything. So that’s what you have to be really careful of is like, cause it just makes the LLMs look worse than they actually are.

So I’m going to create a different, slightly different table here called bad embeddings. Again, kind of using the same setup with like, you know, like some okay ones and some not so okay ones. And if we just run a query against this and we look at what comes back, like we get like, of course, like the good match and the great match come up on top, but like then like, you know, like weak matches and noise aren’t too far behind. Right. And so like, again, like, like other videos have talked about being careful about filtering with this and like saying, hey, like vector distance is less than like 0.2 or like 0. whatever.

So you can like sort of get rid of stuff like this. But like, if you were expecting good matches from some of these, you might be pretty surprised when you don’t get them. Right. And that’s going to be based on just like the vectors being, the embeddings being messed up. So like, if we, if we just run a couple of queries like this, we’re going to say vector distance not between minus 1.05 and 0.95.

And then between 1.05 and 0.95, then we’ll see those like two different result sets kind of come in. Right. So like, again, good match and great match ended up in here. The bad ones ended up in here. So this is one way of sort of catching and filtering out like bad embeddings by comparing them to themselves. Right.

Because again, we’re just, this isn’t two different embeddings. This is the same embedding. And we’re just saying, hey, how much do you agree with yourself? Right. How strongly do you reinforce yourself?

So you might be now thinking like, what are ways that we could validate vectors as they come into the database so we can catch this stuff? And, you know, sort of unfortunately, this is, this is a real bummer. Like at least, again, as things currently exist, I’m on CU1 of 2025. Right.

I know it says RTM down like here, but I’m on CU1. RTM doesn’t update to say CU1. Thanks, Microsoft for that making me look dumb. But like, you might think that if we created a computed column to say, look at like each vector and then we could like add a check constraint to say, like, if you’re messed up, I don’t want you on my table.

I mean, it doesn’t work. Right. We can’t persist this because vector distance is non-deterministic. Right. So that’s messed up.

And of course, if we take out the persisted. Right. And we say, OK, well, you know, no persisting. That’s OK. Then we get a different error and we can’t create the check constraint on it specifically because it is not persisted. So this messes us up further.

So kind of you’re stuck a little bit with like a trigger. Right. And I’m just going to give you a simple after insert update trigger where real life would be like, and instead of update trigger and, you know, you would insert it into like insert good rows and you would like into the real table and you would insert bad rows into a logging table and be like, I need to reprocess you.

But one way you can do that is with a trigger and just say, hey, if you’re not between these magic numbers that I care about, you’re going to get out of here. So this one passes and this one fails. So that’s just one way of sort of protecting yourself from bad vectors getting in for whatever reason.
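A minimal sketch of that trigger might look like the following. Names and thresholds are assumptions; as described above, real life would route failing rows to a logging table for reprocessing rather than rejecting the whole insert:

```sql
/*
Bounce bad vectors at the door. Since VECTOR_DISTANCE is
non-deterministic and can't back a persisted computed column
or check constraint, validation has to happen in a trigger.
*/
CREATE OR ALTER TRIGGER dbo.tr_PostEmbeddings_validate
ON dbo.PostEmbeddings
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS
    (
        SELECT 1
        FROM inserted AS i
        WHERE i.embedding IS NULL
        OR    VECTOR_DISTANCE('dot', i.embedding, i.embedding)
              NOT BETWEEN -1.05 AND -0.95
    )
    BEGIN
        /* THROW rolls back the statement's transaction */
        THROW 50001, N'Embedding failed self-distance validation.', 1;
    END;
END;
```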

Again, LLM failures and coding failures, embedding failures, all the all the stuff that can happen. And if you’ve ever dealt with any sort of ETL pipeline before in your life, you’re not probably or any sort of import process, anything like that, you’re going to be no stranger to things like this and having to use some mechanism of like capturing bad stuff and like logging it and like saying, I need to reprocess this and letting the good stuff in.

Anyway, that’s about it there. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in tomorrow’s video. Thank you.

Thank you. Thank you. I hope that at some point you learn to love me. Am I cool yet? Ah, screw it.

All right. Thank you.


Stay In The Loop, Not In The Middle https://erikdarling.com/stay-in-the-loop-not-in-the-middle/ Tue, 17 Mar 2026 19:00:34 +0000

The post Stay In The Loop, Not In The Middle appeared first on Darling Data.


Credits and Debits


This post is inspired by my Humanoid BFFL Joe Sack’s wonderful post: Keep Humans in the Circle.

It’s an attempt to detail my progression using LLMs, up through how I’m using them today to build my free SQL Server Monitoring and Query Plan analysis tools.

While these days I do find LLMs (specifically Claude Code) to be wonderful enablers for my ideas, they still require quite a bit of guidance and QA, and they’re quite capable of (and sometimes seemingly eager to) wreck your day.

Just in the past couple weeks, I’ve had Dear Claude:

  • Drop databases
  • Git checkout and lose a bunch of work
  • Merge to main when it had no business doing so
  • Write a bunch of happy-path tests to make bad code pass
  • Delete inconvenient data to make a query work
Claude Drops a Database

This is along with a bunch of side-annoyances, of course. It’s all in my local environment, so no one’s getting hurt, but it’s still a lot of recovery work for me at times. You should probably not let it go beyond that.

Even Amazon agrees.

Dear Claude has skills and a Claude.md file to give it persistent instructions to do things, but it still has quite a habit of not following them. Like they’re not even there. But they are there, Dear Claude.

Sometimes it can’t figure out how to query DuckDb without being blocked, and other times it will run SQL scripts to manually patch things rather than use the dashboard Installers to make sure those work properly (in my Monitoring tool, there’s an Installer meant to update procedure code and table schema on upgrade).

Manual patches are supposed to be verboten, but…

In other words, it’s a lot like hiring a junior developer who doesn’t always listen to you or know better, and then hiring a brand new junior developer every time the conversation compacts.

Hopefully the new one-million-token context window makes this all a bit less painful. Time will tell.

Conversations Gone Wrong


I’ve been working with SQL Server for a long time. Too long, some might say.

Long enough to have quite particular and occasionally profound opinions about it, and long enough to be skeptical of anything that promises to make SQL Server easier at the level I work at.

Even SQL Prompt, which I like quite a bit, leaves me with rather a bit of work to (un)do at times. Perhaps it needs a .md file to learn from. Perhaps if it doesn’t get one, I’ll make something that does. The world is a fun place these days.

When LLMs showed up and everyone started losing their collective minds over them, I did what any reasonable person would do: I tried to lose my mind over them, too.

What followed was a months-long journey from complete, laughable frustration, to settling into a more productive use-case: Them being my little enablers.

The first thing I tried was the most natural thing in the world: having a conversation. About SQL Server. My beloved.

LLMs are supposed to be good at conversation, after all. This should be a meeting of the minds, right? All their years of training data, all my years of training data, together at last. And me with no one to talk to.

Wrong. Wrong, wrong, wrong. So very, horribly, totally, utterly, and terribly wrong. Good golly.

Whenever the monthly model updates roll out and the usual spate of hucksters talk about the most advanced, doctor-level reasoning capabilities, I laugh until it hurts.

I’d ask something like “what causes parameter sniffing problems with local variables?” and get back this enthusiastic, confident, completely wrong answer. The LLM would happily tell me that local variables cause parameter sniffing.

That’s all backwards, ‘natch. Local variables circumvent parameter sniffing, which is its own problem because you end up with the average density estimate guess instead of a sniffed value. This is often rather an unattractive proposition for skewed data distributions.
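A quick sketch of the difference, with a made-up table, in case you haven’t seen it:

```sql
/* Hypothetical table: dbo.Posts(OwnerUserId int), heavily skewed */
CREATE OR ALTER PROCEDURE dbo.CountPosts
    @OwnerUserId int
AS
BEGIN
    SET NOCOUNT ON;

    /* Sniffed: the estimate comes from the histogram
       for whatever @OwnerUserId was compiled with */
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @OwnerUserId;

    /* Not sniffed: copying to a local variable gets the
       density vector guess (average rows per value) instead */
    DECLARE @LocalUserId int = @OwnerUserId;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @LocalUserId;
END;
```

Same predicate, two very different estimates, and neither one is a free lunch.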

It got worse, too. I’d ask it about topics I know fundamentally quite well, and get back these half-cooked answers that sounded more like someone went to a one hour talk about a subject and came away with a lot of wrong ideas about what was said. Deeply incorrect in important places and unable to explain further.

World’s shortest interview.

Here’s the kind of exchange I’m talking about:

-- Me: Why might a parallel query with a large memory grant
--     still spill to tempdb?
--
-- LLM: A query may spill to tempdb when the memory
--      grant is insufficient for the data volume
--      being processed. This typically happens when
--      statistics are outdated and SQL Server
--      underestimates the number of rows...
--
-- Me: What about when the grant IS large enough
--     based on the estimate, but the distribution
--     is skewed?
--
-- LLM: Great question! When data distribution is
--      skewed, you should update statistics with
--      FULLSCAN to ensure accurate cardinality
--      estimates...

That’s not even wrong in an interesting way. It’s idiot advice like you’d find on LinkedIn. I won’t even dignify it.

It’s wrong in the “I’m going to confidently repeat the first three results from a search engine” way.

The for-realsies answer involves how memory is allocated to individual threads during execution, and how thread memory allocation can cause spills even when total memory is fine.

Remember that all plans start as serial plans, and that’s when the memory grant is assigned. If a parallel plan is chosen, the memory grant gets divided up equally amongst DOP threads. If one thread gets many/all the rows, it’s likely that division of memory will not quite be adequate.
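The arithmetic, with invented numbers:

```sql
/* Illustration only: an 800 MB grant at DOP 8 */
DECLARE
    @grant_kb bigint = 800 * 1024,
    @dop integer = 8;

SELECT
    per_thread_kb = @grant_kb / @dop;
/* Each thread gets ~100 MB. If a skewed repartition sends
   most rows to one thread, that thread's 100 MB can spill
   even though the other 700 MB sits idle on its siblings. */
```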

But the LLM had no idea about any of that. It just kept cheerfully, wantonly suggesting I update my statistics, like that’s the answer to everything.

At least it didn’t tell me to do index maintenance, which is a step up from some people.

Complete waste of time for expert-level SQL Server topics.

I’d rather talk to a cat.

I am allergic to cats.

Getting Organized


One thing I’m terrible at is dull admin work.

Things like building course outlines, writing read me files, and keeping various things up to date as they change along the way.

Documenting things is not my idea of a good time.

I’d often try to use them for that, and walk away sorely disappointed. Much like the above exchange, the LLM would build course content that is unusable.

  • Page splits, the silent killer
  • Logical reads: why your queries can’t have nice things
  • MAXDOP 1 or MAXDOP None: Which is worse?

Okay, cool, let me get in a time machine back to 2008 so any of this will be relevant.

I would sooner die, dear reader.

For the documentation bits, it would just hallucinate things that the code had never done, and would never do. For example, it kept insisting that sp_IndexCleanup checks index fragmentation. Dawg is you [bleeping] kidding me?

For the training outlines, I mostly just binned them all. There were some okay fundamental ideas, and progression paths, but the details were a nightmare. How fast is a logical read, anyway?

For the read me files, there was a lot of manual labor fixing them.

Sure, it was nice to have something that knew markdown, and could make things pretty for me, but having to give all the text a correctiondectomy was quite the opposite of good.

But hey, it gave me a starting place, and I could work from there. That’s more than I started with.

Writing Queries: Sorta Better, But Still Painful


Okay, so it couldn’t talk about SQL Server competently. But could it write SQL?

Can I finally have something bang out a bunch of queries I don’t feel like writing? Would any of them make good demos, or be logically correct?

I figured the bar was lower here. I’m not asking it to understand optimizer internals. I’m asking it to write a SELECT statement that doesn’t error out.

And it kind of could. In the way that a junior developer who just finished a SQL boot camp kind of can. And the way an ORM kind of can.

Sure, it starts with SELECT, and it has mostly the right table names, and it has all the columns (most of them even existed, too), but…

It still seemed to not understand data very well.

-- Write a query to find the top 10 customers
-- by total order amount for orders placed
-- in the last 90 days
--
-- Tables:
-- dbo.Customers (CustomerID, CustomerName, Email)
-- dbo.Orders (OrderID, CustomerID, OrderDate, TotalAmount)

And I’d get back:

SELECT TOP 10
    c.CustomerName,
    c.Email,
    SUM(o.TotalAmount) AS TotalOrderAmount
FROM Customers c
INNER JOIN Orders o
    ON c.CustomerID = o.CustomerID
WHERE DATEDIFF(DAY, o.OrderDate, GETDATE()) < 90
GROUP BY c.CustomerName, c.Email
ORDER BY TotalOrderAmount DESC

Which is… fine. It works. Looks like crap, though.

But look at it. If a person handed that to me, they might lose that hand. And database access.
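For contrast, here’s something closer to what I’d want handed to me: date math on the variable instead of the column, so an index on OrderDate is actually usable.

```sql
DECLARE
    @StartDate datetime2(7) = DATEADD(DAY, -90, SYSDATETIME());

SELECT TOP (10)
    c.CustomerName,
    c.Email,
    TotalOrderAmount = SUM(o.TotalAmount)
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
  ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= @StartDate /* sargable, unlike DATEDIFF on the column */
GROUP BY
    c.CustomerName,
    c.Email
ORDER BY
    TotalOrderAmount DESC;
```

Wrapping DATEDIFF around the column makes the predicate non-sargable; comparing the column directly to a precomputed date doesn’t.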

For more complex queries, it would break down faster than hostage negotiations for an ex-wife.

I asked it to write a running totals query with a proper window function. I do enjoy a proper window function.

-- What I asked for:
-- Running total of sales by month, reset per fiscal year,
-- with a percentage of the annual total so far

-- What I got back:
SELECT
    FiscalYear,
    MonthNum,
    MonthlySales,
    SUM(MonthlySales) OVER (
        PARTITION BY FiscalYear
        ORDER BY MonthNum
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS RunningTotal,
    MonthlySales / SUM(MonthlySales) OVER (
        PARTITION BY FiscalYear
    ) * 100 AS PctOfAnnual
FROM MonthlySalesData
ORDER BY FiscalYear, MonthNum;

Spot the problem?

That PctOfAnnual calculation is the percentage of the full year, not the percentage of the annual total so far.

The denominator needs to be the running total, not the total-total.
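For the record, the corrected version looks something like this (with a 100.0 multiplier up front so integer columns don’t quietly round everything):

```sql
SELECT
    FiscalYear,
    MonthNum,
    MonthlySales,
    RunningTotal =
        SUM(MonthlySales) OVER (
            PARTITION BY FiscalYear
            ORDER BY MonthNum
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ),
    /* denominator is now the running total, not the full year */
    PctOfAnnualSoFar =
        MonthlySales * 100.0 /
        SUM(MonthlySales) OVER (
            PARTITION BY FiscalYear
            ORDER BY MonthNum
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        )
FROM MonthlySalesData
ORDER BY
    FiscalYear,
    MonthNum;
```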

Look, I’m not great at math, and I can see this is a problem. It’s not that I or another human couldn’t or wouldn’t mess this thing up too, but these robots are supposed to be the almighty beings replacing us all in the workforce.

When I pointed this out, it apologized profusely, and gave me back a version where it replaced the denominator with the running total expression, but didn’t take much pride in its efforts generally.

The formatting is an atrocity, and it didn’t really do much for making sure data types were constrained on the expressions. Again, very human laziness on display.

Useful for scaffolding simple queries, but I’d spend almost as long fixing its output as I would writing it myself.

It quite reminded me of this old Dennis the Menace episode:

Uncle Ned asks Helen to plant some bulbs, and she plants all 120 of them upside down.

In many of these scenarios, I played the part of Uncle Ned. I just gave up and waited until next year.

New Stored Procedures: Guts and Bones and Bloody Knuckles


Writing a stored procedure from scratch is tedious work.

  • Error handling
  • Parameter validation
  • Debug modes
  • Version detection
  • Lots of dynamic SQL
  • Working out #temp table definitions

I thought maybe the LLM could handle that part, and I’d fill in the more detailed knowledge.

Build me a building. I’ll do the decorating. I’ve got quite the fabulous eye.

The template it produced wasn’t that bad. The formatting was still rotten garbage, but that’s what SQL Prompt is for.

CREATE OR ALTER PROCEDURE dbo.ProcessCustomerOrder
    @CustomerID int,
    @OrderDate datetime2(7) = NULL,
    @OrderItems dbo.OrderItemType READONLY,
    @NewOrderID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    IF @CustomerID IS NULL
    BEGIN
        RAISERROR(N'@CustomerID cannot be NULL.', 16, 1);
        RETURN;
    END;

    SET @OrderDate = ISNULL(@OrderDate, SYSDATETIME());

    BEGIN TRANSACTION;

    BEGIN TRY
        INSERT dbo.Orders
        (
            CustomerID,
            OrderDate,
            StatusID
        )
        VALUES
        (
            @CustomerID,
            @OrderDate,
            1 -- New
        );

        SET @NewOrderID = SCOPE_IDENTITY();

        INSERT dbo.OrderItems
        (
            OrderID,
            ProductID,
            Quantity,
            UnitPrice
        )
        SELECT
            @NewOrderID,
            oi.ProductID,
            oi.Quantity,
            oi.UnitPrice
        FROM @OrderItems AS oi;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
        BEGIN
            ROLLBACK TRANSACTION;
        END;

        THROW;
    END CATCH;
END;
GO

That skeleton is pretty decent. SET NOCOUNT ON and SET XACT_ABORT ON, proper TRY/CATCH with a transaction check before ROLLBACK, SCOPE_IDENTITY() instead of @@IDENTITY.

It’s clear the training data included some moderate quality SQL Server content. Perhaps it’s the first thing to fully read Erland’s post.

But then I’d ask it to add business logic:

  • Inventory checks
  • Discount tier calculations
  • Shipping cost logic with regional rules

And that’s when the great Pear Shapening would commence. I’m giving some mock examples, because if I just gave examples from my analysis stored procedures, they’d be largely unrelatable to most readers.

For me, there was just too much fixing to make the scaffolding make sense. Reminding it of things, asking it to include or add things, having that break other things.

Again, not saying a human (even me) wouldn’t have these problems, but this is what’s phasing us out?

Ho hum. That’s nice. Close laptop. Go for walk. At least a bartender will still talk to me.

The worst part was when it was almost right. A procedure that works perfectly in testing, but has a little concurrency issue under load, or that produces wrong results when a specific combination of parameters is passed in.

These are the bugs that cost you a lot of confidence in the robots.

What About Existing Code?


This was where things started to shift a bit.

There was still great suffering, and lots of bugs I had to find out about on my own later, but… at least it had a good example to start with, and it kinda figured out my formatting/style preferences. Less running for the Pepto Bismol than before.

Instead of asking the LLM to create something from nothing, I gave it existing code and asked it to make changes.

  • Add a parameter (and keep the @help and readme up to date!)
  • Refactor a query that would otherwise be a lot of tedious text moving
  • Add error handling to a procedure in multiple places

Having existing code as context made a noticeable difference.

The LLM could see the style, the naming conventions, the patterns already in use, and generally try to match them.

-- Me: "Add a @Debug parameter to this procedure that,
-- when set to 1, prints the dynamic SQL instead
-- of executing it"

-- Before (snippet):
SET @sql = N'
SELECT ...'
+ @where_clause;

EXECUTE sys.sp_executesql 
     @sql, 
     @params, 
     @StartDate = @StartDate;

-- What I got back:
-- Added @Debug bit = 0 to the parameter list

IF @Debug = 1
BEGIN
    PRINT @sql;
    RETURN;
END;

EXECUTE sys.sp_executesql 
    @sql, 
    @params, 
    @StartDate = @StartDate;

That’s reasonable. It understood the intent, put the debug check in the right place, and all the other many right places all at once.

But, like, I didn’t want to RETURN there. Or anywhere. There was so much more stored procedure left to run. These types of unforced errors are quite common.

Try-hard robots.
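What I actually wanted was closer to this: print the dynamic SQL, skip the execution, and let the rest of the procedure carry on.

```sql
IF @Debug = 1
BEGIN
    /* show the statement, but don't bail out of the procedure */
    PRINT @sql;
END
ELSE
BEGIN
    EXECUTE sys.sp_executesql
        @sql,
        @params,
        @StartDate = @StartDate;
END;
```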

And the more complex the change, the more it struggled.

I asked it to refactor a 400/500-line procedure to use temp tables instead of table variables, and it made the swap but missed that one of them was a TVP and that kinda wasn’t all that cool to deal with.

Turns out LLMs are about as aware of TVPs as your average SQL developer. At least there’s some consistency in the world, I suppose.

The back-and-forth was still frustrating.

I’d point out a problem, it’d fix that but introduce a new one.

I’d point out the new one, it’d fix that but regress on something else.

Three rounds in, I’m basically re-reviewing the entire procedure from scratch anyway.

Again, playing the part of Uncle Ned. Poor Uncle Ned. All 120 bulbs.

Bug Hunting And Code Reviews


Here’s where things got a bit better.

Instead of asking the LLM to write code, I started asking it to read code and tell me what was wrong with it.

Turns out, LLMs are way better at critique than creation. Which, honestly, tracks.

I am also better at critiques than creation at times. Especially when someone else has to act on the critique.

-- Me: "Review this procedure for performance issues and bugs"

-- The procedure had, among other things:
SELECT
    o.*
FROM dbo.Orders AS o
WHERE o.OrderDate BETWEEN @StartDate AND @EndDate
AND EXISTS
(
    SELECT
        1/0
    FROM dbo.OrderItems AS oi
    WHERE oi.OrderID = o.OrderID
    AND   oi.ProductID IN
          (
              SELECT
                  ProductID 
              FROM @ProductList
          )
)
OPTION (RECOMPILE);

The LLM flagged several things, and they were actually decent:

-- LLM's review points (paraphrased):
--
-- 1. SELECT * pulls unnecessary columns, can prevent
--    covering index usage
--
-- 2. BETWEEN on datetime: if @EndDate is '2024-01-31',
--    rows at exactly midnight are included but nothing
--    after. Use >= @StartDate AND < @EndDatePlusOne
--
-- 3. The table variable @ProductList won't have stats,
--    so the optimizer will estimate 1 row regardless
--    of actual content. Consider a temp table.
--
-- 4. OPTION (RECOMPILE) on every execution: is this
--    intentional? If this runs frequently, you're
--    paying compilation cost every time.

Not all of it is earth-shattering, but it’s the kind of thing a DBA might flag. The BETWEEN on datetime point is one I see people get wrong constantly.
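The half-open range version, for anyone collecting patterns:

```sql
/* Instead of BETWEEN, which misses everything after
   midnight on the last day of the range: */
DECLARE
    @StartDate date = '20240101',
    @EndDate date = '20240131';

SELECT
    o.OrderID
FROM dbo.Orders AS o
WHERE o.OrderDate >= @StartDate
AND   o.OrderDate < DATEADD(DAY, 1, @EndDate);
```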

The table variable statistics point is correct and relevant, though it kinda lost that the recompile hint would get at least table cardinality. It’s nice that it was in the right neighborhood.

It’s sort of like if Helen had planted the bulbs a bit sideways instead of upside down.

I could take each of those points, decide if I agreed, ask to see a proposed fix, and then evaluate the fix. I’m in control. Mostly.

The big difference was that I was the one making decisions. The LLM was providing input, not output. It was a reviewer, not an author.

For actual bugs, it was even better:

-- Me: "This procedure sometimes returns duplicate rows.
--      Can you figure out why?"
-- [insert 200-line procedure here]
--
-- LLM: "The JOIN between #TempOrders and
--       dbo.OrderHistory on OrderID can produce
--       multiple matches when an order has been
--       modified more than once, since OrderHistory
--       keeps all versions. You probably want to
--       filter to only the most recent history record
--       per OrderID, or use a ROW_NUMBER() window
--       function to deduplicate."

That’s exactly the kind of answer I want quickly. It identified the specific join, explained why it produces duplicates, and suggested two reasonable fixes.
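The ROW_NUMBER() flavor of that fix, sketched against those (paraphrased) tables:

```sql
SELECT
    t.OrderID,
    oh.ModifiedDate /* column list is illustrative */
FROM #TempOrders AS t
JOIN
(
    SELECT
        h.OrderID,
        h.ModifiedDate,
        n = ROW_NUMBER() OVER (
                PARTITION BY h.OrderID
                ORDER BY h.ModifiedDate DESC
            )
    FROM dbo.OrderHistory AS h
) AS oh
  ON  oh.OrderID = t.OrderID
  AND oh.n = 1; /* only the most recent history row per order */
```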

I didn’t have to go through six rounds of “no, that’s not it” with more and more all caps and curses being introduced before getting somewhere useful.

The only time I want to go through six rounds without being useful is at happy hour.

The Lightbulb Cometh


For all of the previous stages, I was the bottleneck.

The LLM would produce something, I’d review it, find problems, describe the problems in English, the LLM would try to fix them based on my description, ad nauseam, alpha and omega, prompt without end, amen.

It was like pair programming where my partner can’t see the screen. There are also a variety of cooking competition show similarities here. Probably something like two people trying to cook the same dish without being able to see what the other is doing, or something.

The question that changed how I work with LLMs was: Why am I the middleman, here?

I gave Claude Code access to sqlcmd, connected it to a development database, and instead of asking it to write a query and hand it to me to test, I told it to write the query, run the query, look at the results, and figure out if they were right.

At least as much as it could figure out if the results were right.

This is where the happy-path code tests tend to happen, but you learn to catch those and tell the robot to act more adversarial pretty quickly.

The difference was pretty good.

-- What I used to do:
--
-- Me: "Write a query to find customers with
--      declining order frequency"
-- LLM: [writes query]
-- Me: [runs it] "This returns 0 rows, that
--      can't be right"
-- LLM: "Oh, let me fix the date logic..."
-- Me: [runs it] "Now it returns every customer"
-- LLM: "My apologies, I had the comparison
--       backwards..."
-- [repeat 4 more times]
-- What I do now:
--
-- Me: "Connect to the dev database. Find customers
--      whose order frequency has declined over the
--      last 3 quarters compared to the prior 3.
--      Run it and verify the results make sense."
--
-- Claude Code: [writes query]
-- Claude Code: [executes via sqlcmd]
-- Claude Code: "Got 847 rows. Let me verify.
--   Checking a sample customer... CustomerID 4521
--   had 12 orders in Q1-Q3 2024 and 4 orders in
--   Q4 2024-Q2 2025. That looks correct.
--
--   But I notice the query doesn't account for
--   customers who are brand new in the recent
--   period. Let me add a filter for minimum
--   account age..."
--
-- Claude Code: [modifies and re-executes]
-- Claude Code: "Refined to 612 rows."

It went from six rounds of me squinting at code to one round of it figuring things out on its own.

And it caught a problem I hadn’t even thought to mention (the new customer issue) because it could actually see the data.

This pattern became my default workflow for anything involving existing procedures:

-- Me: "The procedure dbo.CalculateMonthlyRevenue is
--   returning numbers that are about 15% too high
--   for January 2025. Figure out why."
--
-- Claude Code:
--   1. Reads the procedure definition
--   2. Runs it with test parameters
--   3. Breaks the procedure into pieces, runs each
--      CTE/subquery independently
--   4. Compares intermediate results
--   5. Finds that a JOIN is including refunded orders
--   6. Proposes a fix
--   7. Runs the fixed version
--   8. Compares output to a manual calculation
--   9. Reports back with the fix and verification

All of that happens without me doing anything.

I’m not copying and pasting results.

I’m not describing output in English and hoping the LLM interprets it correctly.

It can just look.

Unit testing became way easier too:

-- Me: "Create a test scenario for dbo.ApplyDiscountTiers.
--   Set up test data covering: no discount, single tier,
--   multi-tier, boundary values, and NULL handling.
--   Run each scenario and verify the results match
--   the business rules in the procedure comments."
--
-- Claude Code creates a temp table with test cases,
-- executes the procedure for each one, compares actual
-- vs expected results, and reports:
--
-- "All 12 test cases passed except case 8: when
--  Quantity is exactly 100 (the tier boundary), the
--  procedure applies the higher discount tier.
--
--  The comment says 'orders of 100+ get Tier 2'
--  which is ambiguous: is 100 included in Tier 2
--  or not? Currently it is.
--
--  If that's wrong, line 47 needs >= changed to >."

That’s the kind of crap that would have taken fourteen rounds of back-and-forth to get to in the old workflow.

The LLM found the ambiguity, tested the actual behavior, cross-referenced it with the comments, and asked me the right questions.
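The boundary in question, boiled down (the tier logic here is invented for illustration):

```sql
SELECT
    v.Quantity,
    DiscountTier =
        CASE
            WHEN v.Quantity >= 100 /* change to > to keep 100 in Tier 1 */
            THEN 2
            ELSE 1
        END
FROM
(
    VALUES (99), (100), (101)
) AS v (Quantity);
```

Whether 100 belongs in Tier 1 or Tier 2 is a business decision, not a code decision, which is exactly why the right move was asking me.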

All Good Things


The progression wasn’t about LLMs getting better (though they have, I believe). It was about me figuring out the right way to use them.

Asking an LLM to be an expert in SQL Server internals is asking it to do something it’s bad at.

It’s pattern matching against training data, and there isn’t enough deeply technical SQL Server content out there for it to pattern match well on niche topics. I’d often ask it loaded questions, get a bunch of absolute gobbledygook nonsense wrong answers, point it to a blog post with the right answer, and reap the apologies.

For every accurate article about parameter sniffing, there are fifty Stack Overflow answers that are somewhere between incomplete, wrong, and terribly outdated.

Asking it to write code in isolation is slightly better, but you’re still fighting the fundamental problem: it can’t see what’s happening. It’s writing blind. It doesn’t know your data, your schema, your indexes, your edge cases.

So it makes guesses, and you spend your time correcting those guesses.

The sweet spot turned out to be giving it the ability to close the loop. Write something, test it, see the results, iterate. The same way a human developer works.

The LLM doesn’t need me to be its eyes and hands. It needs a connection string and permission to experiment.

I still do the final QA. I still review what it produces before anything goes out the door. I’m not handing the keys to the castle over.

But the amount of work that gets done before I even look at it is dramatically higher, and the quality of what I’m reviewing is way better, because it’s already been through several rounds of self-correction.

If you’re still arguing with chatbots or manually copy-pasting query results back and forth, try jumping ahead. Give it access to a dev environment. Let it run things. Let it fail and fix its own failures.

Stop being the middleman.

Enablement, and Other Drugs


For years, I’d wanted to build a monitoring tool for SQL Server.

Actually, I’m sort of lying. For years, I’d offered to help every single monitoring tool company make their monitoring tools less crappy.

What’d I get back? Crickets, runarounds, and a whole bunch of frustration.

Well, fine. They don’t care. I do. Perhaps I can take away enough of their money for them to someday care. That’s the goal, anyway. Or maybe they’ll just quit.

Either is fine, really.

So for a while, I started trying to talk developer-friends of mine into helping me build something. They’d all think it was a cool idea until the word “free” came up.

I’m not getting paid for anything, here. I’m probably losing work.

In fact, from what I can tell, the MCP tools I built in are doing a great job for people. I’ve gotten a lot of awesome feedback about them. But they’re also doing my job, so hopefully they’re also doing the jobs of lesser consultants, too. Better get back to the factory.

When I realized that developers were developing things using LLMs, I figured I could build off my SQL Server performance knowledge, and Vision Code myself exactly what I wanted. There was no Vibe Coding here, because I knew exactly what I wanted (pretty much, though that has evolved a bit since the initial process began).

I also, like, know what I’m doing and what I want and what’s right and I’m not just doing a bunch of guesswork to get there.

From the beginning, I knew that I couldn’t skimp on things like security, so I made sure to use Windows Credential Manager from the start. If you don’t trust that, then you should probably uninstall SSMS.

But this introduced quite a new set of pain points with my robot enablers: they can’t run something and look at a GUI.

While they could run a query and validate results, it was me who had to iterate for long periods of time trying to get things visually correct. I’m still working out some things I don’t quite love in that regard, but the list is narrowing, at least. I can take screenshots and show them, but that doesn’t guarantee a good outcome always.

Work Flows


When people open issues, I have Claude review the issue, investigate the code, and come up with reasonable fixes for testing.

  • Does it always work? Nope.
  • Is Claude always right the first time? Nope.
  • Do I often find myself frantically hitting the Escape key to stop it doing something asinine? Yes.

    I don’t believe you, Dave.

But is this something that I could do on my own? Learning WPF? C#? XAML?

Hell no.

The robots have enabled me to see this through, and continue to work on it at a pace that would otherwise be impossible.

And I gotta tell you, for something that’s only been out a couple months, I think it’s competitive-to-better than stuff that’s been around 10 or more years, and those people are charging you a lot of money for the privilege of being annoyed and unhelped.

I’m going to keep at it until I wear out my Max plan.

But it’s not just this stuff. It has also allowed me to work quickly on plenty of other projects.

I can get a lot of stuff done and out into the world that I couldn’t before. Going from nothing to an imperfect something is fine with me. The world is full of imperfect software. At least I’m willing to fix mine.

An imperfect something is something that can be improved. Nothing is still nothing, no matter how much you think about it.

If you’re out there reading this, I’d encourage you to give something like Claude a crack at helping you build something that has always seemed too difficult, tedious, or involved for you to personally get started on.

It’s going to take some time and patience, and you’re going to have to supervise the process and output, but you’ll have something that used to be nothing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The post Stay In The Loop, Not In The Middle appeared first on Darling Data.

]]>
https://erikdarling.com/stay-in-the-loop-not-in-the-middle/feed/ 2
Get AI-Ready With Erik: Vector Query Plans https://erikdarling.com/get-ai-ready-with-erik-vector-query-plans/ https://erikdarling.com/get-ai-ready-with-erik-vector-query-plans/#respond Tue, 17 Mar 2026 13:00:16 +0000 https://erikdarling.com/?p=13024 Get AI-Ready With Erik: Vector Query Plans Summary In this video, I delve into vector material and query plans, specifically addressing common pitfalls when working with vectors in SQL Server. I share insights from the full course on vector search and indexes, which you can access at training.erikdarling.com using the coupon code AIREADY to save […]

The post Get AI-Ready With Erik: Vector Query Plans appeared first on Darling Data.

]]>
Get AI-Ready With Erik: Vector Query Plans

Summary

In this video, I delve into vector material and query plans, specifically addressing common pitfalls when working with vectors in SQL Server. I share insights from the full course on vector search and indexes, which you can access at training.erikdarling.com using the coupon code AIREADY to save $100 off. The key takeaway is to avoid putting vectors into a temp table; instead, use local variables or formal parameters for better performance. I also highlight how query plans can behave unexpectedly when dealing with vector distances and nested loops joins, emphasizing the importance of parallelism and memory grants in SQL Server’s execution strategy.

Full Transcript

Erik Darling here with Darling Data. And we’re going to finish out this week with sort of the last of the vector material. Now, two things about that. One is that this was not all of the content from all of the modules in the class. There are a few that I did not include in this video series. And the other is that this is not nearly, this is like a fifth of the content that’s actually in the full course. If you want to buy the full course, which hopefully some of these videos will persuade you to do. Hopefully I am persuasive in my tactics. You can go to training.erikdarling.com. And if you use the coupon code AIREADY, you can get AI ready with Erik, question mark. And you can get a hundred bucks off the course with that. We’re going to spend some time looking at query plans and solving vector stuff. I know that we’ve sort of peeked at them a little bit, but there’s still some fun stuff to talk about. And there’s, there’s actually one thing in here that I want to hopefully get you to not do when you’re doing that. And I only learned that lesson when trying to make the demos a bit more portable. So the thing that I’m going to try to persuade you to eventually not do is to put your vectors into a temp table and use that. And there’s a good reason for that. Now the first thing here is kind of a funny query plan quirk. I mean, the funny query plan quirk is we very clearly have a where clause on vector distance here. We have a cross join between these two tables here. And when we run this and we look at the execution plan, something kind of funny happens. The vector predicate is applied here at the nested loops join. You can see minus two, ba-da-ba-da, all that stuff there.

But this is the reason why I don’t want you to do this in real life. Everything looks okay here, except we have this distribute streams of the round robin variety. And this is not performance tuning content. I do have performance tuning content that goes deeper into this, if you’re quite interested in that. But the thing that you end up with here is that, you know, after the distribute streams, we have thread one with one row on it. And when we come down here, we look at the properties, all those rows over here end up on thread one.

So if I can convince you to not do one thing in this video, that would be to put your embeddings into a temp table and use that, you are far better off at this point in time, at this juncture, using a local variable or a formal parameter to store your embedding in for which to gauge the vector distance between two things on. Now, coming back to the query plan a little bit, you know, it is, you know, we don’t really have a better way of doing this at the moment. It is a cross join. So we do get like vanilla nested loops here. We don’t get apply nested loops.
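The shape of that advice in code, roughly (a tiny three-dimension vector to keep the literal readable; real embeddings are much wider, and the table/column names here are illustrative):

```sql
/* A variable instead of a temp table for the search embedding */
DECLARE @search vector(3) =
    CAST('[0.10, 0.20, 0.30]' AS vector(3));

SELECT TOP (10)
    p.Id,
    distance = VECTOR_DISTANCE('cosine', p.embedding, @search)
FROM dbo.Posts AS p
ORDER BY
    distance;
```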

We can’t get that with other query forms. But since we’re basically just saying, hey, I want all these rows and all these rows to match up, then I need to figure out which ones have a reasonable vector distance between them, meaning they are reasonably similar. Right. And just for the data that I have in here, I’m asking for a vector distance of less than 0.2. Remember that cosine distance ranges from zero to two, and the closer you are to zero, the better off you are. We want to just preemptively filter out any results that might not be terribly similar.
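Since the whole filter hinges on what that number means, here is the cosine distance math sketched in plain Python. This is the textbook formula, not SQL Server's internal implementation; the vectors and names are made up for illustration:

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity. It ranges from 0
    # (same direction) to 2 (exactly opposite direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0],   # distance 0.0
    "orthogonal": [0.0, 1.0],       # distance 1.0
    "opposite": [-1.0, 0.0],        # distance 2.0
}

# Keep only candidates that are reasonably similar, like a
# WHERE VECTOR_DISTANCE(...) < 0.2 predicate would.
similar = {name: dist for name, vec in candidates.items()
           if (dist := cosine_distance(query, vec)) < 0.2}
```

Anything pointing in nearly the same direction lands near zero, which is why a `< 0.2` filter keeps only the close matches.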

If we have this query without a filter on vector distance, we need to join into the post table in order to filter on anything else. So if we wanted to, say, compare tags in the post table to some search in order to sort of reinforce the vector distance accuracy a little bit, just by like, like, it’s not obvious from this, but the question, the embedding that I’m searching for is about Git. And so it helps, it helps a bit to look for Git in the tags of the post table, right?

So we can do that here. And of course, we get a bit more of a complicated execution plan. This one does not have the weird row imbalance stuff, because there are copious repartition streams throughout this plan that help rebalance things. So even over here, we get a pretty good row spread coming out of the post table.

I’m not even, I’m not saying that parallelism is terribly useful here. Like, for row numbers this low, almost like, it’s kind of dumb, but it makes a good comparison. For row numbers this low, it’s like, why are we even going parallel?

But, like, just because of all that, like, the stuff that ends up over here is usually also pretty well spread out. Like, I’m not saying it’s perfect. Some threads got no rows and others got, you know, weird numbers of rows. But it’s not like the one before where all the rows ended up on a single thread.

That’s, that’s, like, that’s crappy, right? Get your stuff together, SQL Server. But if we, if we, like, wind all that out and we just run this query where we’re saying, hey, just give me the vector distance order by distance.

This is where, the only thing that I really want to point out here is, I mean, we still get all the rows on one thread, right? So, like, this is still the issue here. But since, like, we’re, we’re saying, like, we want to calculate this distance, this is a runtime calculation.

Right? You don’t know this ahead of time. Based on this, it’s not even really a good computed column you can make, because you don’t know what the vector is going to be.

It’s like, okay, well, you showed up. I got to use you. So, again, the cross-join to a temp table thing is not a particularly good idea. But it’s also particularly not a good idea because all the rows can end up on one thread. It doesn’t happen here, but it is worth warning you about.

With all the rows ending up on a single thread, you know, something like, again, this is not, like, a big performance tuning focused course. But, you know, all plans in SQL Server start off as serial plans. They only graduate to parallel plans under certain circumstances: the estimated cost of the query being above the cost threshold for parallelism.

There’s nothing inhibiting parallelism, stuff like that. And then, finally, the parallel plan has to be cheaper than the serial plan. When SQL Server assigns a memory grant to a query plan, that memory grant is assigned while the query plan is still a serial plan.

And if it graduates to a parallel plan, that memory grant gets divided equally across DOP threads. So, if it was DOP 8, it gets divided by eight. With all the memory getting divided, and one thread having all the rows on it, it’d be very easy for this query to start spilling and maybe showing some signs of performance degradation because of that.
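A toy model of that grant division, with made-up numbers (SQL Server's actual grant accounting is more involved than this):

```python
def per_thread_grant_kb(total_grant_kb, dop):
    # A parallel plan's memory grant is split evenly across DOP threads.
    return total_grant_kb // dop

def thread_spills(rows_on_thread, bytes_per_row, thread_grant_kb):
    # A thread spills once its rows no longer fit in its own share of
    # the grant, regardless of how much its sibling threads have free.
    return rows_on_thread * bytes_per_row > thread_grant_kb * 1024

grant_kb = 8192                            # hypothetical 8 MB grant
share = per_thread_grant_kb(grant_kb, 8)   # 1024 KB per thread at DOP 8

# Even spread: 16,000 rows at 100 bytes each across 8 threads. No spill.
even_spills = thread_spills(16_000 // 8, 100, share)

# Skewed spread: all 16,000 rows land on one thread. That thread spills,
# even though the query's total grant could have held every row.
skewed_spills = thread_spills(16_000, 100, share)
```

The skewed thread spills while seven other threads sit on unused memory, which is exactly the danger of all the rows landing on one thread.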

And you team that up with the fact that vector distance is a runtime calculation, and that’s what we’re sorting by, and there’s no way to index this, like, give that data to SQL Server an order. It can be a pretty dangerous game to play. So, you know, please, you know, avoid the temp table thing for these reasons.

You know, we’re only getting 10 rows up here, but if you go out and get bigger result sets, which probably wouldn’t be too, too common with vector search, depending on what you’re doing, but, you know, like, like, more rows does equal more memory, right? So, we run these two queries, the top 10 one asks for an 1152 KB memory grant, right? And, oops, I didn’t want that.

And the one that asks for the top 1,000 rows or whatever asks for a 700 memory grant. So, the memory grant does grow with this. Even though we’re not including, like, the embedding column in this, which is, you know, a big old thing, there’s absolutely no reason to look at the embeddings. Like, there’s no reason to return the embeddings to a user and be like, here was your array of floats.

Enjoy. There’s no reason for that. So, don’t do that.

But, if we were to add that to the select list, we would get much bigger memory grants out of these. So, generally, the better option is to assign your embedding to a local variable or formal parameter. I’m just using a formal parameter here to make life easy.

And I’ve written this query. Like, I decided to try and get a little bit fancy with stuff. So, like, rather than, you know, put the vector distance up here, I was like, well, why don’t we cross apply to the top one thing in here? And then we can sort that out out here, and we’ll filter out the original post out here.

Just, you know, throw some mud at the wall and see what sticks. This ends up being a fairly similar query plan to the vector search thing up here, right? Because we start with the post embeddings table out here.

We get a very even row distribution here. And then on the inner side, we also get a very even row distribution. And, like, we just avoid the, or rather, not avoid, but we reduce the chance that any single thread will get all the rows.

And the memory grant division for a parallel plan would be less likely to spill. Because you don’t have, like, one thread with all the rows and only an eighth of the memory, while seven threads hold the other seven eighths of the memory but no rows, right? So, just sort of a better way of doing things.

Of course, vector search is, you know, much, much faster, but it is still in preview along with vector indexes. But, you know, all those query plans took somewhere between, like, 200 milliseconds and, like, 1.2 seconds. This, you know, the vector search thing, you know, it is much faster.

There are trade-offs with it, though. You know, like, again, like I said, like, creating vector indexes is incredibly painful. I was working up some different demo stuff last night, and I was, like, I’m going to make a 10 million row table with embeddings in it.

It took me six hours to create a vector index on that thing. And I was, like, I was, like, I’m going to start this and go to bed. So, and then, like, even, like, a million row table took, like, a half hour.

So, you know, it’s a real nightmare at this point. And again, it’s in preview, so you probably shouldn’t be using this stuff, but when it finally gets out of preview, it’ll make your life a lot easier performance tuning vector stuff. There are even trade-offs in here, though, because vector distance is sort of an exact search: it will give you the exact distance between one embedding and another.

With vector search, it’s more approximate, because you’re sort of, like, navigating this graph structure in the index, but we’re going to talk more about that in another video. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. I hope that you will buy my wonderful educational content from me at a discount, which you can get the coupon code for that down in the video description. So, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The post Get AI-Ready With Erik: Vector Query Plans appeared first on Darling Data.

SQL Server Performance Office Hours Episode 52 https://erikdarling.com/sql-server-performance-office-hours-episode-52/ https://erikdarling.com/sql-server-performance-office-hours-episode-52/#respond Mon, 16 Mar 2026 13:00:00 +0000 https://erikdarling.com/?p=13025 SQL Server Performance Office Hours Episode 52 To ask your questions, head over here. Summary In this video, I delve into a series of questions submitted by viewers during our office hours session, offering detailed and thoughtful answers to topics ranging from the nuances of parameter sniffing and memory grants in stored procedures to the […]

The post SQL Server Performance Office Hours Episode 52 appeared first on Darling Data.

SQL Server Performance Office Hours Episode 52


To ask your questions, head over here.

Summary

In this video, I delve into a series of questions submitted by viewers during our office hours session, offering detailed and thoughtful answers to topics ranging from the nuances of parameter sniffing and memory grants in stored procedures to the complexities of index maintenance, query tuning for cloud environments, and the trade-offs involved with using columnstore indexes. I also discuss the impact of reducing CX waits on query performance and stability, as well as why batch mode can sometimes slow down OLTP queries despite making analytical queries faster. Additionally, I explore the challenges of achieving accurate statistics updates in a full scan scenario and touch upon the architectural considerations when implementing columnstore indexes for hot OLTP tables. If you have any questions or topics you’d like to see covered in future office hours sessions, feel free to ask them through the link provided in the video description.

Chapters

Full Transcript

Erik Darling here with Darling Data, and you may have guessed, based on various temporal elements, that it is Monday, and Monday means it is time for office hours, where I answer five, as long as I count correctly, user-submitted questions to my highly advanced Google Sheets document, and do my best to give good, thoughtful answers to those. You can find the link to ask those questions down in the video description.

While you’re on your way to that link, perhaps, to ask a question, you will see all sorts of other mutually beneficial links down there, in which you can hire me for consulting, which is a great idea. You can buy my training, which is, likewise, a fantastic idea. You can become a supporting member of the channel. For as few as $4 a month, you can say, good job, Erik. Thank you for spending hundreds of hours a month recording YouTube videos and coming up with content for us. And then, of course, you can ask that office hours question for free. Amazing. Is there no limit to my generosity? Yes. No. Not sure. And, of course, if you enjoy this content, if you think that someone else in your life may also enjoy this content, please do like, subscribe, and tell a friend, because that’s the nice thing to do. And that is free for all of us.

So, I will be out in the world. You know, it is February, currently, at least. I mean, it’s not February when I’m recording this, but it’s going to be February by the time you see this. So, I don’t know, you can maybe guess when I’m recording this based on that, but I’m not leaving the house for a bit. It just snowed like 52 feet here in New York. It looks disgusting already. It is various shades of not white.

out there in the snow. But as soon as that snow abates a bit, I will be back out joining the world. Baseball will be in the swing of things. It’ll be a marvelous time for all of us. DataTune, Nashville, March 6th and 7th. Data Saturday, Chicago, March 13th and 14th. SQL Day, Poland, May 11th and 13th. Look, I missed the dates on Poland. I gotta fix that. And of course, Data Saturday, Croatia, June 12th and 13th. At all of these events, I will be teaching my advanced T-SQL pre-con. Because I only have roughly 6.5 hours to teach my advanced T-SQL pre-con, and there is much more to say about advanced T-SQL, all attendees will get free access to my Learn T-SQL With Erik course, which encompasses both beginner and advanced material and is far, far longer than the 6.5 hours these training days allow.

So, if you want to get a tremendous amount of bang for your buck or whatever currency these other things use, you should come to that and have a great time. I don’t know. Maybe we can both learn something from each other. But with that out of the way, we have flipped the calendar unofficially to February here. Still winter. Sucks.

Asked ChatGPT to come up with a good, like, shining-y, you know, like, I’ve been stuck in the house all winter vibe. And, you know, aside from this one up here, which has, like, some extra fingers, and that one over there that has a backwards axe, I think, you know, like, the face in the doorway over here is particularly good. I like that one. So, I don’t know. I have some time to think about March. I don’t know what March is going to be yet.

We’ll find out when we get there. But, anyway, let’s answer some questions, right? Let’s do that thing that I promised you I would do. We lowered MAXDOP. We! Like, it took more than one of you? How many people does it take to change MAXDOP? I guess there’s a joke in there. How many DBAs does it take to change MAXDOP? Anyway, to reduce CX waits.

All right. That’s a reason to do it, I guess. And queries now run slower, but look more stable. Is that a net win? Well, of course, your queries run slower. You reduced MAXDOP.

Let’s just theoretically say you reduce MAXDOP from 8 to 4. You now have four fewer cores with which to process your queries. There is some chance that they would get, like, twice as slow, right?

So, if your query runs in two seconds at MAXDOP 8, and you reduce that query to MAXDOP 4, it might run for, like, four seconds instead of two. So, what I’m struggling with a little bit here is when you say, look more stable. I don’t know what look more stable means.
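As a back-of-the-envelope sketch of that idealized scaling (real queries rarely scale this linearly, but it frames the expectation when you halve MAXDOP):

```python
def ideal_runtime(serial_seconds, dop):
    # Idealized linear scaling: perfectly even work distribution and
    # zero coordination overhead. Real parallel queries rarely scale
    # this cleanly, but it sets the rough expectation.
    return serial_seconds / dop

serial = 16.0                         # hypothetical serial runtime
at_dop_8 = ideal_runtime(serial, 8)   # 2.0 seconds
at_dop_4 = ideal_runtime(serial, 4)   # 4.0 seconds: twice as slow
```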

Is this like a tortoise versus the hare situation? Is this slow and steady wins the race? I don’t know what more stable means. You’re going to have to give me some more information about the stability you speak of, because I have no insight into what more stable means to you.

Is it a net win? I don’t know. Are slower queries ever a net win? I suppose if you are hitting THREADPOOL waits or, you know, very, very high CPU percentages constantly, then perhaps it is a net win.

Perhaps your server is just a bit less stressed out, worn out, running out of resources. I don’t know, but golly and gosh, I don’t know if I can call that a net win, because I don’t know what you’re measuring stability with.

I don’t know what criteria you find these queries to be more stable by. So, yada, yada, yada. Batch mode made our analytical queries much faster, but OLTP queries slowed down.

Why does batch mode sometimes hurt mixed workloads? Well, a lot of it, actually, I’ll give you the prime thing that I see when batch mode on rowstore, which I assume is what you’re talking about here, because, you know, OLTP, like, why else would that be a thing?

I’ll give you the primary thing that I see is the optimizer decides perhaps that some facet of one of your OLTP-ish queries would benefit from batch mode. So, it could be an aggregate, but more commonly, it is a join.

And most commonly, what happens with joins is that SQL Server starts choosing adaptive joins, which aren’t a bad thing on the face of them. I’d rather appreciate SQL Server no longer just YOLOing with a hash join or a loop join, regardless of how many rows actually start showing up.

But what it can do is take what is typically a nice, efficient, usually single-threaded nested loops join plan and throw a little bit of a monkey in that wrench by starting the plan off with a batch mode hash join and switching over to a row mode nested loops join if enough rows are not emitted from the outer part of the join.

So, and that is primarily what I see happening, is that, you know, what used to be just a very quick, simple navigational query all of a sudden starts getting weird. And, you know, I’m not saying that you should be scared of batch mode on rowstore, but there may be certain queries that you might have to intervene with and say, please don’t do that.

Like, the easiest way is just to say option loop join for those. But typically, that is what I see. All right.
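To make the adaptive join idea concrete, here is a toy sketch in Python. The row threshold here is invented for illustration; SQL Server derives its real threshold from costing:

```python
def adaptive_join(outer_rows, inner_rows, row_threshold):
    # Toy model of an adaptive join: buffer the outer input, count how
    # many rows actually showed up, then pick a join strategy.
    if len(outer_rows) >= row_threshold:
        # Hash path: one pass to build a hash table, one pass to probe.
        strategy = "hash"
        build = {}
        for key, payload in inner_rows:
            build.setdefault(key, []).append(payload)
        matched = [(op, ip) for key, op in outer_rows
                   for ip in build.get(key, [])]
    else:
        # Nested loops path: look through the inner side per outer row.
        # Cheap when only a handful of outer rows arrive.
        strategy = "loop"
        matched = [(op, ip) for key, op in outer_rows
                   for ikey, ip in inner_rows if ikey == key]
    return strategy, matched

inner = [(1, "a"), (2, "b"), (3, "c")]
few, few_rows = adaptive_join([(1, "x"), (2, "y")], inner, row_threshold=10)
many, many_rows = adaptive_join([(i % 3 + 1, "z") for i in range(30)],
                                inner, row_threshold=10)
```

The point is the deferred decision: the same query can take either path at runtime, which is also how a formerly quick nested loops plan can suddenly start life as a batch mode hash join.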

We tuned queries heavily. I mean, I wasn’t there, so I don’t believe you. But didn’t save much money in Azure. Well, no kidding.

An overpriced resort, that is. Why doesn’t query tuning translate directly into cloud savings? Well, let’s think about that for a moment here. You tuned queries heavily.

And let’s say in the course of your heavy query tuning, you reduce CPU from like 80% to like 30%. Typically, where one starts to realize cost savings in the cloud is when one reduces the size of their cloud hardware. So if you, let’s say, you know, tuned all your queries and made things better in that way, it would have to coincide with some reduction in cloud resources to see money get saved.

It kind of reminds me of the question where it’s like, I got rid of all this data in my database, but the database file is still huge. What happened? It’s like, well, that database doesn’t shrink itself, right?

That Azure instance doesn’t shrink itself. So, you know, what are we going to do here? But anyway, my rates are reasonable and I’m happy to help you save money in the cloud.

Aren’t I nice? We run full scan stats nightly, but estimates are still awful. Why does perfect sampling still produce terrible plans?

Well, it may be a full scan sample, but let’s consider a couple of things about statistics, shall we? First is that, no matter how big of a scan you use, whether it’s full or not, SQL Server still populates the same 200-step histogram with whatever values it finds during that scan, whether it be sampled or full.

It’s not like you produce bigger histograms just because you do a full scan of the stats. You might already have a 200-step histogram. The full scan stat might describe certain portions of that histogram more accurately, but there might still be a lot of information that is not well captured by that.

If that’s the case, you might consider filtered indexes or statistics in order to give the optimizer more information about portions of your table data that is not well represented by the histogram.
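Here is a rough sketch of why step count, not sample size, is the ceiling. The boundary-picking below is deliberately naive compared to SQL Server's actual maxdiff algorithm; the fixed cap is the point, not the boundary choice:

```python
def build_histogram(values, max_steps=200):
    # Toy histogram capped at a fixed step count, loosely imitating
    # the way SQL Server caps statistics histograms no matter how
    # large the sample was. The real algorithm chooses boundaries
    # far more cleverly, but it has the same cap.
    distinct = sorted(set(values))
    if len(distinct) <= max_steps:
        return distinct
    stride = len(distinct) / max_steps
    return [distinct[int(i * stride)] for i in range(max_steps)]

# A fully scanned million-value column still collapses to 200 steps,
# leaving about 5,000 distinct values summarized behind each boundary.
hist = build_histogram(range(1_000_000))
hidden_per_step = 1_000_000 / len(hist)
```

A full scan makes each step's summary more accurate, but it can never add steps, which is why filtered statistics over the poorly described ranges can help.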

The other thing to consider is that you might be writing your queries in stupid ways. You might be doing all sorts of things that may inhibit SQL Server from accurately guessing cardinality for things.

Local variables, table variables, non-SARGable predicates, those are some things that immediately sort of spike the ball on that. There’s all sorts of other things you might be doing. You might just have terribly complex queries, which would benefit from being broken down into smaller chunks.

So you cannot rely on statistics updates for everything. They are one part of the picture. There are many other parts of the picture for you to consider.

All right. I wonder if this is the same person. It sounds pretty similar. All right.

Let’s see what we got here. I don’t know why this square is so much bigger. columnstore fixed our reporting queries but wrecked inserts. Really? columnstore wrecked inserts. That’s interesting.

What architectural trade-offs should we expect when adding columnstore indexes to hot OLTP tables? So my initial surprise here is because I typically don’t see columnstore doing too terribly with inserts.

And the reason for that is that, you know, since you’re talking about OLTP, I would assume that you are talking about the OLTP inserts, which are usually a very, very low number of rows.

Generally one, but, you know, I would say maybe up to 100 or something would be reasonable for an OLTP workload before it starts, you know, getting into different types of, you know, different, you know, families of queries, like analytical stuff.

So I’m a little surprised at that because those inserts typically won’t touch the columnstore itself. Inserts that are too small to qualify for instant compression into the columnstore just get inserted into the delta store, which is sort of a clustered B-tree rowstore leech on your columnstore.

So I’m a little surprised at that. If you had told me that updates or deletes were having a tougher time because of the columnstore, that I would believe because, you know, updates with a columnstore are, you know, pretty much the, you know, you delete the old row and insert the new row, and then the new row gets inserted into the columnstore, but you still have to like decompress the old row and all the other stuff.

So a lot of things get, you know, there’s a lot more going on behind the scenes with updates. Even deletes usually aren’t like that beat up by it. You know, you do like just sort of use the deleted bitmap thing in there and like mark the row as a tombstone row or a ghost record or whatever the one million different things Microsoft has called it is.

So I’m not sure why your inserts were hurt so badly by that. As far as architectural trade-offs go, well, you know, mostly it’s the stuff that I just talked about.

You know, you get your either reporting style queries or your kitchen sink style queries where, you know, like sort of, you know, you don’t know what the where clause and the select list and the order by is going to be.

It’s non-clustered column stores. Marvelous for getting those types of things sorted out. But, you know, as far as architectural trade-offs go, what some people will do, and I am an occasional proponent of, is to sort of, you know, separate your data out a little bit, have your hot data that might be seeing frequent modifications sit in a normal rowstore table.

And then sort of once something is past the point of, you know, modification, like frequent modification, you might like move that to sort of an archival thing that is, that has a columnstore index on it.

That’s probably the most common of them. In extreme cases, some people will even use like an in-memory table for the really hot data. It’s sort of like a shock absorber table.

A few of my clients are in the online gambling space and in the high-demand consumer shopping goods space. Spaces.

I don’t know if that’s, I don’t know how many that actually spans when I think about it. I think maybe it’s just all one space where people buy too much crap. The in-memory OLTP thing is very useful for them.

But if you just have like a generically like hot workload for recent data, like, you know, the last week or month or something, you know, you could keep that in a rowstore table.

You could, you know, move that off in like a big batch to a columnstore table and see most of the benefit of having columnstore for your more historical data and rowstore, which has less insert, update, delete, overhead for the hot data.

It’s just sort of up to you to figure out how long something is sort of useful for. For a lot of people, it’s not a very long stretch. You know, you know, you place an order for something like what happens later, like, you know, like you get like a ship date, right?

Like, and then you get like maybe some tracking notifications or that’s not even like in the table itself. And then you get like a delivery date, which probably updates, you know, the order or something.

But, you know, for the most part, like, you know, most stuff is done within like, an hour to days of it, of its lifetime there. You know, like even things like stock trades, you can only do that once.

I mean, you can place a whole bunch of trades, but it’s not like, like after you make a trade, you’re like, oh, wait a minute. I meant three, no, 12. And then like the day goes by, you’re like, ah, I meant 15.

Like there’s a lot of stuff that, you know, like it’s, it’s, it’s hot time, right? It’s moment in the sun is, is fairly short. But like, even like an Amazon order, which is kind of like what I was just talking through, unless someone like cancels it, but that even that’s going to be, have to be pretty quick, right?

It’s not like, it’s not like months later, someone’s going to be like, I’m still waiting on that cancel, right? It’s kind of silly. Anyway, that is a sort of architectural stuff that immediately comes to mind.

I don’t know more about your situation. So it would be pointless and fruitless to, you know, try to think of sillier situations you might be in.

So if you want more details, well, my rates are reasonable. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in tomorrow’s video where we will do some other SQL Server stuff. All right, thank you.


The post SQL Server Performance Office Hours Episode 52 appeared first on Darling Data.

Get AI-Ready With Erik: Generating Chunks https://erikdarling.com/get-ai-ready-with-erik-generating-chunks/ https://erikdarling.com/get-ai-ready-with-erik-generating-chunks/#respond Fri, 13 Mar 2026 13:00:55 +0000 https://erikdarling.com/?p=13004 Get AI-Ready With Erik: Generating Chunks Summary In this video, I delve into the intricacies of managing long content when using embedding models like those from MXBAI and OpenAI ADA002. With token limits being a critical factor, I explain how these limits can be reached quickly in scenarios such as Stack Overflow posts, where question […]

The post Get AI-Ready With Erik: Generating Chunks appeared first on Darling Data.

Get AI-Ready With Erik: Generating Chunks

Summary

In this video, I delve into the intricacies of managing long content when using embedding models like those from MXBAI and OpenAI ADA002. With token limits being a critical factor, I explain how these limits can be reached quickly in scenarios such as Stack Overflow posts, where question bodies and answer bodies can easily exceed the 512-token limit for some models or the more generous 8191-token limit of others. To address this challenge, I introduce Microsoft’s chunking method, detailing its components like source text, chunk type (fixed), size, overlap, and how they interact to preserve context while respecting token constraints. 

Chapters

  • *00:00:00* – Introduction to Chunks and Token Limits
  • *00:03:45* – Stack Overflow Database Considerations
  • *00:07:28* – Example Text Chunking
  • *00:10:09* – Overlap in Chunks
  • *00:12:04* – Conclusion

Full Transcript

Erik Darling here with Darling Data, and we’re going to finish out this week talking about a subject that is near and dear to my heart, a subject that I have a lot of experience with, which is dealing with long content. Because, you know, I have written some pretty long blog posts in my life. I’ve written some extensive training material in my life, and I’ve got to manage a lot of long content day to day. So, we’re going to do that.

This is, of course, all videos, all snippets, tidbits, tiny chunks, right, near shadows of the full course material from my class, course, whatever you want to call it, my correspondence course. You can become a locksmith, repair guns, crack safes, get AI ready with Erik. Get AI Ready With Erik, where you can learn all of these things in full, gruesome, gory detail. And, you know, so we’ve got that going for us. And my green screen is acting a little funny in the background, but not so funny that I’m willing to stop.

So, embedding models, as you either know now or will know in the near future, have things called token limits. If you’ve ever used an LLM to any degree, you know, if you’re using like a web interface, you might see something like: this conversation is done. That’s enough. We’re over here. We have reached our context. And this is token driven. And if you’ve ever used a more professional grade LLM product, like say, Claude Code or something like that, you may have noticed it slowly counting up tokens as it does things.

Where it’s like, I read this file. I’m thinking about it. And you just see the number of tokens rack up and up. And then you hit a certain amount of tokens and they charge you more money. So, embedding models have token limits. The one that we use for the course, mxbai-embed-large, has a 512-token limit. OpenAI ada-002 has an 8191-token limit. And there are some other ones that have much smaller token limits, like the MiniLM, blah, blah, blah.

Stack Overflow content, for a lot of things, will fit just fine into the 512 token limit. Titles are usually fine because they’re less than 300 characters. But bodies are often long. They are often 1,000 to 10,000 or more characters. Some long content, right? Some pretty verbose answerers on Stack Overflow. And that’s even just for like question bodies. And of course, answer bodies, which is probably more in line with what I was just talking about, can be very long.

You’ll have code examples. You’ll have all sorts of stuff in there. You know, quotes from documentation that are extensive. Things like that. That may exhaust the token limit for your model. You don’t want an exhausted model, I can tell you that much.

And what’s, I think what’s especially, you know, maybe surprising to some of you out there is that you, if you exceed the limit of tokens, then the model just stops, right? It just silently truncates the rest of it. It just leaves a whole bunch of stuff out. Which, you know, can be not great if you’ve got a very long answer that’s full of very good information and, like, very important details.
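A sketch of that silent truncation, using a deliberately crude whitespace tokenizer (real embedding models count subword tokens via BPE or WordPiece, so these numbers are illustrative only):

```python
def naive_tokenize(text):
    # Crude whitespace tokenizer. Real models use subword tokenizers,
    # so actual counts differ; this just demonstrates the behavior.
    return text.split()

def embed_input(text, token_limit=512):
    # Mimic a model that silently drops everything past its token
    # limit instead of raising an error.
    tokens = naive_tokenize(text)
    kept = tokens[:token_limit]
    dropped = len(tokens) - len(kept)
    return " ".join(kept), dropped

long_answer = "word " * 1000   # a verbose 1,000-token answer
kept_text, dropped_tokens = embed_input(long_answer)
# 488 tokens fall off the end with no warning at all
```

Everything past the limit contributes nothing to the embedding, which is why a long answer whose good details come late ends up poorly represented.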

And the first, you know, like, let’s say, I don’t know, 1,000 characters or so is just sort of like, you know, preamble. You probably ain’t training stuff right, right? You’re not going to have very good context for things. So, Microsoft has given us a way to deal with long content in which we can generate chunks.

I don’t know who names these things. Just, like, like, obvious barf references. Why? Why? Why are you going to call it chunks?

What are you doing over there? Ah, generating chunks. Feeling okay? No. So, generating chunks requires three inputs, but there is an optional fourth input to your chunk generator. There is the source from which you wish to generate chunks.

There is chunk type, which currently, I believe now, you can only do fixed. And then there is chunk size, which is the number of characters with which you wish to chunk. Then you have this other thing called overlap.

And overlap can be interesting because if you’re dealing with something that’s, like, paragraphs, like, and you generate a chunk of, like, let’s say, I don’t know, like, 300 characters, but the paragraph itself is, like, 600 characters, the next chunk you generate is going to be just, like, the second half of the paragraph.

And so you might want to have your chunks overlap a little bit so that, like, you retain some context from, like, the first half of the paragraph into the second half of the paragraph, right?

So you kind of want to blend things a little bit, right? Now, you’re not going to be doing this line by line when you generate chunks, or else you might actually generate chunks.

But think about a situation where you’re reading through a paragraph and you go: that’s an important sentence.

And then the next sentence kind of carries on with it. That’s the kind of contextual stuff you want to carry across, so that when you generate embeddings, they retain more context across the chunks you produce.

AI_GENERATE_CHUNKS returns a table-valued result with the chunk, the text fragment that was carved out, which is what would feed into the AI_GENERATE_EMBEDDINGS function.

It also returns the chunk order, so you can see, I guess, whether it was breakfast, lunch, or dinner. And then you have the chunk offset, which is the position in the source.

So maybe that was, like, an afternoon snack. And then you have the chunk length, which is the number of characters in the chunk. So just a very basic example would be something like this, where I have this text column that is just a string that I’m selecting in here.

We’re going to just CROSS APPLY AI_GENERATE_CHUNKS here, with our source pointing to this string, and for chunk type, well, we can only say FIXED here.

But then we’re just going to use a chunk size of 50 characters. That’s not a best practice; that’s not what you should do. You have to figure that stuff out for your own data.

This is just the example I’m using based on the piece of text that I have there that made for a reasonable demo. So that’s what we’ve got, right?

Again, 50 might be great, right? Like, cost threshold for parallelism might be amazing at 50. I don’t know. Anyway, here’s what we get back. And here’s what we see in here.
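The demo query described above looks roughly like this. It’s a sketch: the sample string is made up, and the exact parameter spelling for AI_GENERATE_CHUNKS may differ between preview builds, so check the current documentation.

```sql
/* Sketch: fixed-size chunking. The sample text is illustrative. */
DECLARE @text nvarchar(max) =
    N'Stack Overflow bodies are often far longer than an embedding model''s token limit, so we split them into smaller pieces first.';

SELECT
    c.chunk_order,
    c.chunk_offset,
    c.chunk_length,
    c.chunk
FROM AI_GENERATE_CHUNKS
(
    source = @text,
    chunk_type = FIXED,
    chunk_size = 50
) AS c;
```

Each row comes back with the fragment itself plus its order, offset, and length, so you can line the chunks back up against the source.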

Now, there are some things you have to consider with this. One of them is very specific to the Stack Overflow database: the body column in the posts table has HTML formatting in it.

It’s sort of Markdown-y formatting, kind of, but there are a lot of brackets and tags in there. It’s not like an embedding model is going to consider those things important, but they are characters that do count against your token limit.

So, depending on the cleanliness of your data and how many tokens your embedding model allows, you might have to seriously think about cleaning out that nonsense: BR, H1, H3 tags, stuff like that.

But this is what we get back. We get back four lines. We see the order of the chunks. And, like, we see each of these chunks on one line, and they’re all 50 characters, right?

But notice also that there are big spaces in it. The spaces contribute to this too: any character, whether it’s useful or not, counts toward the chunk size.

If you have empty lines in your stuff, you’ll get empty lines back from generating chunks. Like: I am a chunk full of whitespace that will do no one any good whatsoever.

So sometimes you might want to pre-filter garbage from here and just say: well, if you don’t have any useful characters in you, I’m hacking you out, because you’re no good, right?

So just this simple WHERE: the chunk has to contain something A to Z or 0 to 9. Anything that actually has that in it, we keep. Anything else, go away. You’re not words anymore.
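That filter might look something like this sketch. The table and column names are stand-ins, and the LIKE pattern only checks for basic Latin letters and digits, per the caveat that follows.

```sql
/* Sketch: throw away chunks with no letters or digits in them. */
SELECT
    c.chunk_order,
    c.chunk
FROM dbo.Posts AS p
CROSS APPLY AI_GENERATE_CHUNKS
(
    source = p.Body,
    chunk_type = FIXED,
    chunk_size = 300
) AS c
WHERE c.chunk LIKE N'%[A-Za-z0-9]%'; /* at least one useful character */
```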

Like, I don’t know, it’s, well, I mean, I say that from a very, like, you know, you know, like, anglo-centric point of view where these are my letters. If you can do that with your letters, right?

Like, lesson learned, make sure your letters are in there, your alphabet, whatever language you’re doing this in, make sure you’re represented. That’s mine, so that’s what I’m doing. But overlap is there to prevent losing context at chunk boundaries.

So each chunk will include some text from the previous chunk. If you set overlap to 10, each chunk overlaps the previous one by 10% of the chunk size. So we do all this and say overlap equals 10, right?

And when we run this, we get back five rows, and the chunks only share a sliver of text at their boundaries. There’s not a lot of overlap there.

If we crank the overlap up to, say, 25, then the results change a bit (and, actually, go away, Redgate). And then if we do 50, we get far different results.

Now we get back seven rows of stuff. And, actually, it’s probably more, probably a little bit more illustrative to just run these all together, right? And just kind of see how things change across these.

And you can kind of see that these don’t line up in the exact same way. The alignment is not the same across a 10% overlap and a 25% overlap. And a 50% overlap actually causes more rows to be produced, because we end up with more 50-character chunks, since each one preserves more from the previous chunk.
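Running the variants side by side, as described, might be sketched like this, assuming the overlap parameter accepts a column reference from CROSS APPLY; if it doesn’t in your build, run three separate queries instead.

```sql
/* Sketch: compare chunk boundaries at 10%, 25%, and 50% overlap. */
DECLARE @text nvarchar(max) =
    N'Overlap carries a slice of the previous chunk into the next one, so context at the boundary is not lost.';

SELECT
    o.overlap_pct,
    c.chunk_order,
    c.chunk
FROM (VALUES (10), (25), (50)) AS o (overlap_pct)
CROSS APPLY AI_GENERATE_CHUNKS
(
    source = @text,
    chunk_type = FIXED,
    chunk_size = 50,
    overlap = o.overlap_pct
) AS c
ORDER BY
    o.overlap_pct,
    c.chunk_order;
```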

So if you have long content (and what counts as long will vary with the size of your text and the token limit of the embedding model you’re using), you may need to consider generating chunks. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I’ll see you next week. Another one. All right.

Goodbye. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The post Get AI-Ready With Erik: Generating Chunks appeared first on Darling Data.

]]>
https://erikdarling.com/get-ai-ready-with-erik-generating-chunks/feed/ 0
Get AI-Ready With Erik: Embedding Freshness https://erikdarling.com/get-ai-ready-with-erik-embedding-freshness/ https://erikdarling.com/get-ai-ready-with-erik-embedding-freshness/#respond Thu, 12 Mar 2026 13:00:50 +0000 https://erikdarling.com/?p=13005 Get AI-Ready With Erik: Embedding Freshness Summary In this video, I delve into the topic of keeping embeddings fresh in SQL Server databases, a critical aspect often overlooked due to its complexity and lack of straightforward solutions from Microsoft. I explore why regenerating all embeddings can be impractical and discuss more efficient methods like using […]

The post Get AI-Ready With Erik: Embedding Freshness appeared first on Darling Data.

]]>
Get AI-Ready With Erik: Embedding Freshness

Summary

In this video, I delve into the topic of keeping embeddings fresh in SQL Server databases, a critical aspect often overlooked due to its complexity and lack of straightforward solutions from Microsoft. I explore why regenerating all embeddings can be impractical and discuss more efficient methods like using queue tables and triggers to handle incremental updates. By walking through the setup process and providing examples, including why SQL Server’s current limitations prevent a computed column from generating embeddings for you, I aim to equip you with practical strategies to avoid embedding drift and ensure your data remains relevant and accurate. Whether you’re dealing with user profiles, content management systems, or any application that relies on embeddings, this video offers valuable insights into maintaining the integrity of your data.

Chapters

Full Transcript

Erik Darling here with Darling Data and more AI embedding goodness for you fine folks out there in the world. We’re going to talk in this video about embedding freshness or keeping embeddings fresh in SQL Server because right now there is not a terribly good story with doing that. So embeddings of course get stale if the source text is edited, new records are added, or records are deleted. Your options are to regenerate all of them, which can be slow and expensive, or you could use some facility inside of SQL Server to track changes and update incrementally, which is probably the smarter thing to do. We are not going to do the stupid thing because the whole point of this course is to help you avoid the stupid thing, right? Don’t do the dumb thing. If you can take anything away from all of this, don’t do the dumb thing. Let Microsoft be your cautionary tale. Don’t do the dumb thing.

So again, coming back to our dear friends at Microsoft, you would think that they would make this easy, but no. Apparently they didn’t want to make this easy, right? And I don’t know. I kind of don’t blame them on this one, but it would be nice if this were a little more convenient. So like if I wanted to create a table and I wanted to say, hey, I want a computed column. You can already see this squiggle.

Now, to be fair here: SSMS puts red squiggles under a lot of stuff that’s completely valid. It just has no idea what’s going on most of the time, and is lost. But this is a legitimate red squiggle; it’s there for a reason this time, at least. We cannot create this table, and the error messages are rather amusing.

It will tell you that the database ‘model’ doesn’t exist and to make sure the name is entered correctly. I think I get this one because my SQL instance is case sensitive, but it’s still funny, right?

That might make you freak out. But we’ve also got “incorrect syntax near the keyword use.” So we cannot create a computed column that will just generate this stuff for us. SQL Server is like: nah. No.

And who knows? Maybe we’ll get this someday. Maybe it becomes a preview feature too; we don’t know. Who could foresee the future when there’s no communication between the PMs who manage things and the end users who care about them? Why would we ever want to know these things?
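For the record, the attempt that fails looks something like this sketch. The model name is a placeholder for whatever you would register with CREATE EXTERNAL MODEL.

```sql
/* Sketch: a computed column that embeds itself. This does NOT work
   today; it fails with the errors described above. */
CREATE TABLE dbo.users_to_embed
(
    id integer NOT NULL PRIMARY KEY,
    about_me nvarchar(2000) NOT NULL,
    embedding AS
        AI_GENERATE_EMBEDDINGS(about_me USE MODEL ollama_embeddings)
);
```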

But what you would generally want to do here, something that is very natural to most people who have had to manage databases, is create a queue table, which we could do here.

And we can set this up in a way that gives us something to work off of, where we can tell what action we need to take on a row, and just process through the queue table and do our updates. I’m going to give you an example trigger for inserts. In real life, you would also want other triggers to handle updates and deletes.

You would not want to put everything into one trigger because life gets real weird and complicated when you do that, right? All that sort of checking the inserted and deleted tables, uh, make separate triggers. What’s wrong with you?

But you would do something like this, where you start with the normal things: the standard, canonical trigger header. You bail out if ROWCOUNT_BIG() is zero, and you make sure no other sessions have interfered with important SET options at the session level.

And then you would insert into your queue table the relevant information from whatever got inserted into the posts table. So data goes into the posts table.
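A minimal sketch of that queue table and insert trigger, with illustrative names (separate update and delete triggers would follow the same shape):

```sql
CREATE TABLE dbo.embedding_queue
(
    queue_id bigint IDENTITY PRIMARY KEY,
    post_id integer NOT NULL,
    action_type char(1) NOT NULL, /* I, U, or D */
    queued_at datetime2(7) NOT NULL DEFAULT SYSDATETIME(),
    processed_at datetime2(7) NULL
);
GO

CREATE TRIGGER dbo.posts_queue_inserts
ON dbo.Posts
AFTER INSERT
AS
BEGIN
    /* canonical trigger header: bail early, pin session options */
    IF ROWCOUNT_BIG() = 0 RETURN;
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    INSERT dbo.embedding_queue (post_id, action_type)
    SELECT i.Id, 'I'
    FROM Inserted AS i;
END;
```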

We put that row into the queue table, and then something else works the queue. From here, you would probably want a separate stored procedure that works off the queue table.

And you would want to loop over it, because again, generating embeddings is not free. It’s an external call.

It’s outside the database. SQL Server does not generate embeddings on its own; you have to call out to a separate service. It could be a local service on your VM, with a mix of Ollama and something that helps SQL Server talk to Ollama, because Ollama only speaks HTTP, and SQL Server requires HTTPS for external calls.

So you need something that sits between Ollama and SQL Server to deal with the HTTP-to-HTTPS translation. And again, it’s not free. It’s not happening within the SQL engine.

It’s outside SQL. And if you were to use OpenAI for this and had to make an internet call, God bless. You know, I used to make fun of people, because I’ve worked with lots of clients who have had to install software made by people who hate them.

I made a little joke, I think in the last video, about being “a software developer who loves databases.” You’d have people with a trigger on a table that would use xp_cmdshell to call curl, or call an executable that would go do something on the internet.

And you’re like: oh, of course it sucks. What’s the matter with your face?

Why would you do that? But here, if you just had a stored procedure, working row at a time or in batches, you would want to find any pending items. If it’s a delete, it’s easy enough to just delete the row.

I only had an insert trigger up there, but again, you would want update and delete triggers to handle other modifications to the table. And if MERGE fires, God help you. Then you would want to delete and reinsert any rows that were marked as an update by the trigger.

And then if it was an insert, you would just stick the new row in. You could also use LastActivityDate to find changed posts, any posts that were altered. You wouldn’t need triggers for that, but you would have to poll the table constantly for work.
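A stored procedure working the queue might be sketched like this. The post_embeddings table and the model name are assumptions; real code would also need error handling, retries, and batching.

```sql
/* Sketch: process queued rows one at a time. */
DECLARE
    @queue_id bigint,
    @post_id integer,
    @action char(1);

WHILE 1 = 1
BEGIN
    SELECT TOP (1)
        @queue_id = eq.queue_id,
        @post_id = eq.post_id,
        @action = eq.action_type
    FROM dbo.embedding_queue AS eq WITH (READPAST, UPDLOCK, ROWLOCK)
    WHERE eq.processed_at IS NULL
    ORDER BY eq.queue_id;

    IF @@ROWCOUNT = 0 BREAK;

    IF @action IN ('U', 'D') /* updates: delete, then reinsert below */
        DELETE pe
        FROM dbo.post_embeddings AS pe
        WHERE pe.post_id = @post_id;

    IF @action IN ('I', 'U') /* the external call: not free! */
        INSERT dbo.post_embeddings (post_id, embedding)
        SELECT
            p.Id,
            AI_GENERATE_EMBEDDINGS(p.Body USE MODEL ollama_embeddings)
        FROM dbo.Posts AS p
        WHERE p.Id = @post_id;

    UPDATE eq
    SET eq.processed_at = SYSDATETIME()
    FROM dbo.embedding_queue AS eq
    WHERE eq.queue_id = @queue_id;
END;
```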

The trade-offs there: on the pro side, polling is a bit of a simpler setup, since triggers, especially under concurrency, can be tough to get right. On the con side, it’s possible you might miss some changes, and parallelizing the process, like having multiple workers all looking at the queue table for work, can get complicated.

So a lot of locking hints and other things go into making that work as flawlessly as possible. But then you also have the overhead of polling the table, and then, when you find work to do, going and doing the work. The basic idea behind either approach is the same: you find posts without embeddings that need them.

You find modified posts that need embeddings and re-embed them. And if you delete a row, it doesn’t matter, because that row never shows up when you poll. There’s nothing to do.
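The polling alternative, sketched (the watermark would be persisted somewhere between runs; names are illustrative):

```sql
/* Sketch: find posts that need (re)embedding without triggers. */
DECLARE @watermark datetime = '20260310';

SELECT
    p.Id,
    p.Body
FROM dbo.Posts AS p
LEFT JOIN dbo.post_embeddings AS pe
  ON pe.post_id = p.Id
WHERE pe.post_id IS NULL                /* never embedded */
   OR p.LastActivityDate > @watermark;  /* changed since last run */
```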

It’s just gone from the table. But those are some ideas around how to keep embeddings fresh and up to date in your database, because one thing you don’t want is embedding drift.

Say, like our example in the last video, the About Me column in the users table: someone writes “I’m a software developer who loves databases” one day, and the next day they’re like, “I’m a hot yoga instructor.”

But your embedding would still say “I’m a software developer who loves databases.” And so the headhunters from the Stack Overflow job board would be like, let me get you this job, and: sorry, I’m just all hot yoga.

I’m just all hot yoga now, right? You want some hot yoga? I got you. But this, and they’re like, but the embedding said you, you were a software developer who loves databases. And everyone’s just walks away sad and confused and no one actually does hot yoga.

And well, what can you do? Anyway, that’s enough here. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something, and I will see you in tomorrow’s video: the final video of the week, our Friday, our gal Friday. And we’ll do that, I guess.

Thank you for watching.

 


The post Get AI-Ready With Erik: Embedding Freshness appeared first on Darling Data.

]]>
https://erikdarling.com/get-ai-ready-with-erik-embedding-freshness/feed/ 0
Get AI-Ready With Erik: Generating Embeddings https://erikdarling.com/get-ai-ready-with-erik-generating-embeddings/ https://erikdarling.com/get-ai-ready-with-erik-generating-embeddings/#respond Wed, 11 Mar 2026 13:00:34 +0000 https://erikdarling.com/?p=13006 Get AI-Ready With Erik: Generating Embeddings Summary In this video, I delve into the intricacies of generating embeddings in SQL Server 2025, Azure SQL Database, and Azure Managed Instance using T-SQL’s AI generate embeddings function. However, I caution viewers against performing these operations directly within SQL Server due to potential inefficiencies and external dependency issues. […]

The post Get AI-Ready With Erik: Generating Embeddings appeared first on Darling Data.

]]>
Get AI-Ready With Erik: Generating Embeddings

Summary

In this video, I delve into the intricacies of generating embeddings in SQL Server 2025, Azure SQL Database, and Azure Managed Instance using T-SQL’s AI generate embeddings function. However, I caution viewers against performing these operations directly within SQL Server due to potential inefficiencies and external dependency issues. The demonstration involves creating a `users_to_embed` table with an ID column and a vector data type embedding column, illustrating the process of generating embeddings for user profiles that include their “About Me” descriptions. While this method works, it highlights the limitations and challenges of using SQL Server for such tasks, suggesting that Python might be a more suitable environment for handling these operations efficiently.

Chapters

  • *00:00:00* – Introduction to AI Generate Embeddings in SQL Server
  • *00:03:15* – Generating Embeddings for User Profiles
  • *00:07:24* – Finding Similar Users Using Vector Distance
  • *00:10:10* – Best Practices and Considerations
  • *00:11:43* – Conclusion and Next Steps

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we’re going to talk a bit more about generating embeddings in SQL Server 2025, Azure SQL Database, and Azure Managed Instance. To do so, we are going to continue to invoke our T-SQL friend AI generate embeddings. But this time, what I want to do is kind of show you why you might just want to stick with Python on this. Do it outside of SQL Server for the love of God. So, I’m actually not going to drop in, recreate this table because I’ve already got it loaded up and you will, in short order, see why I did not decide to do this all from scratch.

We’d be wasting some time. So, this would be our sort of demonstration table, right? We would have all the things that we would need to be AI ready, like yours truly.

We would have an ID column, which is an integer, right? Because if we wanted to be forward-thinking about someday having a vector index on this table, if they ever become not horrible, if they ever finish that damn thing, instead of the Homer Simpson car of features we have now. So we would want to have that up there, but who knows if that restriction will be relaxed when the vector indexes stop being horrible, right?

Like, maybe we’ll be able to use a bigint. Like, hmm, you got Fabric. Anyway, this is the table we would use, and we would have our wonderful embedding column here, which is a vector data type that accepts exactly 1,024 dimensions and currently uses the float32 storage format rather than the float16 preview storage format.
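The table described above would look something like this sketch:

```sql
CREATE TABLE dbo.users_to_embed
(
    id integer NOT NULL PRIMARY KEY, /* integer: a current vector index requirement */
    about_me nvarchar(2000) NOT NULL,
    embedding vector(1024) NULL      /* float32 storage by default */
);
```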

Now, in real life, you know, if I were running through this, I would want to stick maybe about 50 users into the table who have an about me column, which is a reasonable length, right? Because we’re going to be using the about me column for this one, right? For various reasons.

So, like, let’s say, you know, like, there’s lots of stuff in the Stack Overflow database that one could wish to embed, right? We’ve talked about sort of like the post table where you have titles and then you have bodies and, you know, stuff like tags, you know, you probably wouldn’t want to have, you probably wouldn’t want to do that. But then, like, you know, in the comments table, there’s the comment text column and, oh, screw comments.

But then, like, in the users table, you know, like, location, probably not, website, probably not. But about me would actually make a pretty reasonable embedding because what you could do is take the things that someone has written about themselves, right? The kind view that someone has taken on their Stack Overflow profile.

And it’s like, I am the best coder in the world. And you could search for people who might be good at, I don’t know, answering a question or maybe like even, you know, that Stack Overflow job board, which saved the company, thankfully. You know, hey, thank God we have that job board.

And, you know, you could even find maybe people who would be good matches for a particular job if you were to embed their About Me profile and, you know, do some semantic searching on it. Because you could do some similarity search between things in the job posting and what people are out there in the world. Right?

It’s like everything except the salary. But we could stick a bunch of users in here and we could, you know, just to keep, like, the embedding stuff reasonable. We would say that, you know, we want people who have an About Me that is somewhere between, like, one hundred and two thousand characters.

Beyond that, like, what the hell are you rambling on about? It’s like, let me just, like, paste, like, book verses in there. Out of your mind.

So we’ve got this users to embed table. And this users to embed table. And I realize that this sounds a little, you know, after dark. But trust me, it’s really just vector stuff.

We’ve got 50 users in here that we wish to embed. And, like yesterday, the way we generate embeddings in T-SQL is by calling the AI_GENERATE_EMBEDDINGS function, passing in a string that we wish to vectorize.

Right? In this case, I am a software developer who loves databases. I’ve never met one. Right?

They all seem to hate databases. It sounds like the passion. It’s like, oh, what can I do to this thing today and then complain about later? Right? Like, JSON. Vector stuff.

Yeah. Anyway. If we run this, we will get back our embedding JSON. Which, again, starting at line two and scrolling on down. All these numbers apparently mean that you’re a software developer who loves databases.

And so we get 1,024 of these numbers back. It’s fantastic news for us because our column is a vector 1,024. Isn’t that, isn’t it amazing the way this stuff just lines right up in a demo?

Right? How could it be better? But, so if we wanted to update just one user, right? Just one.

Let’s just show you that it works. We could set their embedding column by passing in a column: we say we want to generate an embedding for the About Me column, and it gets put into the demo user embeddings table, just for the first user.

Right? So just, just one of them. And we run this. Right? And it’s, again, not the slowest thing in the world, uh, for, for one row.
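That single-row update is roughly this sketch. The model name is an assumption for whatever you registered with CREATE EXTERNAL MODEL; 22656 is Jon Skeet’s Stack Overflow user id.

```sql
/* Sketch: embed one user's About Me via the external model. */
UPDATE u
SET u.embedding =
        AI_GENERATE_EMBEDDINGS(u.about_me USE MODEL ollama_embeddings)
FROM dbo.users_to_embed AS u
WHERE u.id = 22656;
```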

But trust me about doing the rest of them: most people who use SQL Server are pretty comfortable with the idea that once modifications go past a certain row count, you’ll probably want to batch things up.

You’ll probably want to think about that a little bit earlier if you are generating embeddings with T-SQL, because remember, what this thing calls out to is not inside of SQL Server.

SQL Server does not generate embeddings. Right? SQL Server does not have any facility within it to generate an embedding with. Okay.

Uh, this is still an external model. You still have to call outside of the database, have something do something, then send it back to the database.

And then you get to do the update. So I’m pretty lucky here in that this is a VM, and all of the processing that I’m doing takes place in the VM.

But if you look over here, we still have to call out to Ollama. Ollama has to do something and send it back.

Ollama has to talk to this other thing because Ollama (not Obama, Ollama) right now only speaks HTTP, and the external endpoints require HTTPS. So you need a separate thing that has HTTPS and can talk to HTTP.

So right now there are steps to take. And if you were using OpenAI, you’re talking about an internet call, reaching out to the internet.

So you probably want to think real carefully about how many rows at a time you’re going to do this for, because if I were to update all 50 of these rows at once, it would be much slower.
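One hedge against that: batch the updates so each loop iteration covers a small, bounded number of external calls. A sketch, with a made-up batch size:

```sql
/* Sketch: embed NULL rows a few at a time instead of all at once. */
WHILE 1 = 1
BEGIN
    UPDATE TOP (10) u
    SET u.embedding =
            AI_GENERATE_EMBEDDINGS(u.about_me USE MODEL ollama_embeddings)
    FROM dbo.users_to_embed AS u
    WHERE u.embedding IS NULL;

    IF @@ROWCOUNT = 0 BREAK; /* nothing left to embed */
END;
```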

I’m just not patient enough for that. But if we look at what we updated the table with, well, looking at the embedding itself isn’t all that interesting.

That’s just the number of bytes that ended up in the table for the embedding; that’s what a 1,024-dimension vector measures out to, byte-wise. This one just happened to be for Jon Skeet.

Hey, Jon Skeet. How you doing? But then, if we wanted to find similar users, or maybe we’re a headhunter with this Stack Overflow job board, like: I’m going to go find some nerds.

You could do this. You could write a query that says: I want to find a database performance tuning expert, and do all this stuff.

And you can even call AI_GENERATE_EMBEDDINGS inline. I don’t particularly recommend it, but you could do this, where you generate an embedding directly in here.

Rather than pre-vectorizing the string and turning it into an embedding ahead of time, you can turn it into an embedding on the fly with AI_GENERATE_EMBEDDINGS, right in the query.

And then you can just say: I want the vector distance between the embedding that’s in the table and the string that I’m sending in. You can do this.

You’re allowed to, but think hard about doing things like this; it’s maybe not how you want to do things. Still, we can run this query against our very small 50-row table and get back some people.
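The headhunter query is approximately this sketch. Pre-generating the search embedding into a variable keeps the external call out of the per-row work; the model name is an assumption.

```sql
/* Sketch: cosine similarity search over stored embeddings. */
DECLARE @search vector(1024) =
    AI_GENERATE_EMBEDDINGS(
        N'database performance tuning expert'
        USE MODEL ollama_embeddings);

SELECT TOP (5)
    u.id,
    distance = VECTOR_DISTANCE('cosine', u.embedding, @search)
FROM dbo.users_to_embed AS u
WHERE u.embedding IS NOT NULL
ORDER BY distance; /* smaller distance = more similar */
```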

I’ve talked to this guy before, GBN. He’s a hoot. The rest of these people, I don’t know.

So I couldn’t tell you if they’re actually database experts, but yeah.

He is actually a satanic sysadmin, I guess. SQL Server developer Moe. All right. Well, just choosing the first hundred characters of someone’s About Me is maybe not always a winner. But anyway, that’s a little bit about how you can use AI_GENERATE_EMBEDDINGS inside SQL Server.

Again, SQL Server does not generate embeddings on its own. These are all external calls. So think very, very carefully about, uh, how and where you use them in your queries.

If you need to generate a lot of embeddings, I would still probably do that external to SQL Server and then put the data in, rather than have SQL Server call out, generate embeddings, and bring them back in. In the context of a query, on smallish tables, you can do this on the fly if you want. But some testing I’ve done on larger queries shows that even though this is just a scalar value, it’s still not a good time.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in tomorrow’s video.

As always, as usual: we’ve got two more unnamed tabs up here of things to talk about, so we’ll go and do that. Anyway, thank you for watching.

Goodbye.


The post Get AI-Ready With Erik: Generating Embeddings appeared first on Darling Data.

]]>
https://erikdarling.com/get-ai-ready-with-erik-generating-embeddings/feed/ 0
Get AI-Ready With Erik: Creating External Models https://erikdarling.com/get-ai-ready-with-erik-creating-external-models/ https://erikdarling.com/get-ai-ready-with-erik-creating-external-models/#respond Tue, 10 Mar 2026 13:00:51 +0000 https://erikdarling.com/?p=13007 Get AI-Ready With Erik: Creating External Models Summary In this video, I delve into creating and using external models to generate embeddings directly within SQL Server 2025, Azure SQL Database, and Azure Managed Instance. While there’s minimal coverage in the full course “Get AI Ready with Erik,” which you can access using a special coupon […]

The post Get AI-Ready With Erik: Creating External Models appeared first on Darling Data.

]]>
Get AI-Ready With Erik: Creating External Models

Summary

In this video, I delve into creating and using external models to generate embeddings directly within SQL Server 2025, Azure SQL Database, and Azure SQL Managed Instance. This video covers the basics; the full course, “Get AI Ready with Erik,” which you can access using a special coupon code found in the video description, provides deeper coverage, including the Python scripts needed for more efficient embedding generation outside of SQL Server. Still, leveraging native SQL Server capabilities is incredibly convenient once you set up your environment properly. I cover essential steps like enabling the external REST endpoint and creating an external model backed by Ollama or another provider. By walking through these processes, you can generate embeddings directly in T-SQL, which integrates seamlessly with transactions and triggers, making deployment simpler and more integrated than external methods.

Chapters

  • *00:00:00* – Introduction
  • *00:01:27* – External Models in SQL Server 2025
  • *00:03:06* – Benefits of Internal AI Embeddings
  • *00:04:22* – SP Configure Setting Overview
  • *00:05:37* – Creating an External Model
  • *00:08:17* – Generating Embeddings with T-SQL

Full Transcript

Erik Darling here with Darling Data, and we are going to talk about creating and using external models to generate embeddings in SQL Server 2025, Azure SQL Database, and Azure SQL Managed Instance. This is something I am glad we are able to do, but it takes a little bit more work than you might be into. Up until now, at least in these videos, there’s been minimal coverage compared to the full course, Get AI Ready with Erik, which you can use this wonderful coupon code for; the coupon code is down in the video description. The full course covers the Python scripts and all the other stuff you need in order to generate embeddings externally and then get them into SQL Server, which, to be honest, is much more efficient, because there are all sorts of things you can do external to SQL Server that are a lot faster than the things you can do inside of SQL Server. However, doing things inside of SQL Server has many conveniences, so long as you are able to prep your environment to accommodate such conveniences. So, SQL Server 2025 can generate embeddings natively. We have two new pieces of syntax: CREATE EXTERNAL MODEL, which allows us to register a provider, and the AI_GENERATE_EMBEDDINGS function, which allows us to generate vectors natively in T-SQL.

Cool. Both of those things sound just dandy, don’t they? So, there are some benefits here. Like, you don’t have to do any Pythoning, which, honestly, if you’re doing this in SQL Server, you’d probably just have Claude Code write you some Python to do the other stuff anyway. You don’t actually have to know Python. You’re going to have to spend some time making sure the robots get stuff right, but the robots are good at stuff like Python and C#, because those languages are a lot easier for robots to deal with. SQL, they’re terrible with. There’s a lot more debugging with SQL than there is with other stuff. And they can fix the other stuff: if you’re like, hey, robot, run this script, and the script hits an error, they’ll be like, oh, well, I can fix that.

With SQL, you hit an error, and they’re like: dog, you’re on your own. But you can do some stuff with AI_GENERATE_EMBEDDINGS: it works in triggers, it works in stored procedures, it’s integrated with transactions, and the deployment is a bit simpler. In the full course, I teach you how to do all this. There are some external dependencies that you will need. I think the biggest thing is that Ollama is only set up to serve HTTP requests, not HTTPS, and this stuff requires HTTPS. So you need another program to sit between SQL Server’s calls and Ollama to get past the HTTPS requirement there.

In the full course, I teach you how to do all that. In these short videos, you don’t get the full thing, because I still need people to buy the course so I can support the banality of my existence. So, we’ve got the new sp_configure setting called 'external rest endpoint enabled', and it will tell us whether the feature is enabled or not. We have, of course, enabled this, because we’ve done this all before. If you need to change that, you can use this command here. I’m not going to type it for you, but you can do it.
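For reference, flipping that switch is a quick sp_configure dance. This is just a sketch; the option name 'external rest endpoint enabled' is what the current documentation calls it, so verify it against your build:

```sql
/*
Enable outbound REST calls from the engine.
Requires sysadmin-level permissions.
*/
EXECUTE sys.sp_configure
    @configname = 'external rest endpoint enabled',
    @configvalue = 1;

RECONFIGURE WITH OVERRIDE;

/* Check that the change took effect */
SELECT
    c.name,
    c.value_in_use
FROM sys.configurations AS c
WHERE c.name = 'external rest endpoint enabled';
```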

And then you will also need to create an external model. I am not going to drop and recreate mine, because I like mine just fine, but this is generally what it will look like if you were doing this locally. If you’re using something like OpenAI, there’s other stuff to it. But, you know, stop giving Sam Altman money.

My sometimes friend Andy Pavlo was posting pictures on Twitter about going to the OpenAI offices. And for a company that lost $9 billion or something last year... like, I could lose $9 billion, too, if people bet on me like that. But don’t use OpenAI. Stop giving that man money.

But this is generally what it will look like if we wanted to do something locally. For the purposes of this course, I’m using this model; you won’t use this exact one, you’ll do something else. I’m using Ollama to do all the processing, and I have a little local web proxy set up to do all this for me.
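If you want to see the general shape of it, here’s a sketch of a locally hosted setup. The external model name, the endpoint URL, and the embedding model name here are all placeholders, not the ones from my environment; swap in whatever your HTTPS-fronting proxy and embedding model actually are:

```sql
/*
Register a local Ollama-backed embeddings provider.
The LOCATION must be HTTPS, which is why a proxy
sits in front of Ollama's plain-HTTP endpoint.
*/
CREATE EXTERNAL MODEL ollama_embeddings
WITH
(
    LOCATION = 'https://localhost:443/api/embed', /* placeholder URL */
    API_FORMAT = 'Ollama',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'mxbai-embed-large' /* placeholder model name */
);
```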

All right. So this is the location where I send stuff to get a response back in vector form: a vector data type, or rather, an embedding. You can see what models you have registered if you run a fancy query like this, and you will see that I already did all this stuff. Man, I did that a while ago, huh? I guess so.
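The fancy query is just a peek at the catalog view for registered models, something along these lines. This assumes the sys.external_models view that ships with SQL Server 2025; the column list is from documentation, so double-check it against your instance:

```sql
/* List every external model registered in this database */
SELECT
    em.name,
    em.location,
    em.api_format,
    em.model_type_desc,
    em.model
FROM sys.external_models AS em;
```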

But the end result, once you have done all of this, is that you can generate embeddings in SQL Server. Now, if we go over here (you can just ignore that, because I had to do something else stupid), we’ve got some other stuff running over here. And if we run this query... I’ve got to select the whole thing, I suppose.

If we run this and we go look at these... well, normally you’d see a little bit more action in here. That was kind of boring; I apologize. But we get back a string. So we said: hey, SQL Server, I want you to generate some AI embeddings for me.

And I want to know what the embedding for “how do I optimize SQL Server performance” would look like, and I want you to use the model that I set up with CREATE EXTERNAL MODEL. That’s this thing up here. And now I get back this, right?
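The call itself is a one-liner using the USE MODEL clause. The model name here matches the hypothetical ollama_embeddings registration sketched above, so substitute your own:

```sql
/* Generate an embedding for a piece of text, inline in T-SQL */
SELECT
    embedding =
        AI_GENERATE_EMBEDDINGS
        (
            N'How do I optimize SQL Server performance?'
            USE MODEL ollama_embeddings
        );
```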

And this, apparently, is the secret to the universe: a long list of floats. Look at all these magnificent floaty beasts, these creatures of fanciful existence. We get back 1,024 of them, although the numbers don’t start on line one up here.

They start on line two, because what I tell the model I chose to do is generate a 1024-dimension vector embedding. So that’s what I get back.

That’s why the output runs down to line 1,025 here, even though the actual numbers start on line two. And this is what we can use to put into our embedding columns in SQL Server. I define those as vector, right now as float32, because that’s what’s generally available.

Float16 is still a preview feature. But I have my vector column set up as float32 with a 1024-dimension capacity. Capacity, or exactitude, really, because I can’t put in less and I can’t put in more.
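A table with a matching embedding column might look like this. The dbo.Documents table is a made-up example, but the vector(1024) declaration is the part that has to line up exactly with what the model returns:

```sql
CREATE TABLE dbo.Documents
(
    document_id integer IDENTITY(1, 1) NOT NULL
        PRIMARY KEY,
    content nvarchar(max) NOT NULL,
    /* Dimension count must match the model's output exactly */
    embedding vector(1024) NOT NULL
);
```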

So this is the string that I get back, and it’s all set up to do the thing I want it to do. From there, we can do all sorts of other fun stuff, but additional fun stuff is going to have to wait until tomorrow.

So this is a basic overview of how we generate an embedding directly with T-SQL. Thank you for watching. I hope you enjoyed yourselves.

I hope you learned something, and I will see you in tomorrow’s video, where we will continue on this path. For those of you who are getting sick of this path: I don’t have that many more videos on the subject left to release, so your time is almost done.

All right. Thank you for watching.


The post Get AI-Ready With Erik: Creating External Models appeared first on Darling Data.

]]>
https://erikdarling.com/get-ai-ready-with-erik-creating-external-models/feed/ 0