[dbsp] Clear rkyv scratch space from one serialization to the next. #5919

Merged

blp merged 4 commits into main from scratch on Mar 27, 2026
Conversation

@blp
Member

@blp blp commented Mar 25, 2026

A customer reported large and growing memory use that showed up in heap profiles attributed to rkyv serialization in the storage file writer. Only some of this made sense, in particular the part written to FBufs, which consists of data blocks that will end up in the cache. The rest was not more specifically attributed.

Some investigation showed the possibility that data is accumulating in our per-thread scratch space cache. This cache should get emptied for every use, but that depends on the rkyv serialization implementations being correct both in the common case and the error case. DBSP hits the error case frequently in practice (once per data block), because it uses errors to avoid going over block size limits. Perhaps the rkyv built-in implementations handle errors correctly regarding scratch space; I don't know whether DBSP implementations of serializers do.

If this is the problem, this commit will avoid it, by clearing the scratch space every time that we use it. The commit is larger than otherwise necessary because rkyv's HeapScratch and FallbackScratch don't provide any way to clear themselves, so this has to copy in their implementations.
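The idea of the fix can be sketched without rkyv. Below is a hypothetical, minimal bump-style scratch space (the names, size, and API are illustrative stand-ins, not DBSP's or rkyv's actual types) whose `clear` is O(1): it just resets the allocation position, so the backing buffer is reused without reallocating and without leaking space from a failed serialization:

```rust
// Illustrative sketch, not the actual DBSP code: a bump-allocated scratch
// buffer. A failed serialization can leave `pos` advanced; `clear` resets
// it before the next use, which is the behavior this PR adds.
struct HeapScratch {
    buf: Vec<u8>, // backing storage, allocated once
    pos: usize,   // bump pointer into `buf`
}

impl HeapScratch {
    fn new(capacity: usize) -> Self {
        Self { buf: vec![0; capacity], pos: 0 }
    }

    /// Reserve `len` bytes, or fail if the buffer is exhausted.
    /// (A fallback scratch would route the failure to a heap allocator.)
    fn push(&mut self, len: usize) -> Option<usize> {
        if self.pos + len > self.buf.len() {
            return None;
        }
        let start = self.pos;
        self.pos += len;
        Some(start)
    }

    /// Resets the scratch space to its initial state; O(1), no deallocation.
    fn clear(&mut self) {
        self.pos = 0;
    }
}

fn main() {
    let mut s = HeapScratch::new(16);
    assert_eq!(s.push(10), Some(0));
    assert_eq!(s.push(10), None); // exhausted mid-serialization
    s.clear();                    // reset before the next serialization
    assert_eq!(s.push(10), Some(0));
    println!("ok");
}
```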

Describe Manual Test Plan

I tested this with the unit tests.

@blp blp requested a review from ryzhyk March 25, 2026 19:00
@blp blp self-assigned this Mar 25, 2026
@blp blp added bug Something isn't working DBSP core Related to the core DBSP library storage Persistence for internal state in DBSP operators rust Pull requests that update Rust code user-reported Reported by a user or customer labels Mar 25, 2026
@ryzhyk ryzhyk requested a review from gz March 25, 2026 19:06
@gz
Contributor

gz commented Mar 25, 2026

I have a general question about this code, because I was looking at it a few weeks ago:

why do we always need to use this (general form of the) serializer with the scratch space? E.g., we see this thing show up in profiles a lot on macOS, because TLS is more expensive there, and it does all this manipulation of taking it out of the TLS and putting it back in (which seems to cause some copying overhead too). From what I can tell, if the type is relatively simple (like an integer), the relative overhead of doing that is big, because the function that uses it gets invoked a lot.

Background is that we tried to take this out of the inner loops (the TLS variable) and have, e.g., a serializer per writer that we pass down into the relevant functions, but it's also very painful to deal with.

@gz
Contributor

gz commented Mar 25, 2026

because it uses errors to avoid going over block size limits. Perhaps the rkyv built-in implementations handle errors correctly regarding scratch space; I don't know whether DBSP implementations of serializers do.

I didn't understand this part; wouldn't we be able to check our own serializers? How many do we have?

}

fn cleared(mut self) -> Self {
    self.main.clear();
Contributor

does clear reset to SCRATCH_SIZE?

Member Author

It resets the position:

    /// Resets the scratch space to its initial state.
    pub fn clear(&mut self) {
        self.pos = 0;
    }

/// This is the amount of space we allocate as base scratch space for rkyv
/// serialization. If more is needed for a particular serialization, then we
/// fall back to [AllocScratch].
pub const SCRATCH_SIZE: usize = 65536;
Contributor

should we move this to the top of the file?

/// This is the amount of space we allocate as base scratch space for rkyv
/// serialization. If more is needed for a particular serialization, then we
/// fall back to [AllocScratch].
pub const SCRATCH_SIZE: usize = 65536;
Contributor

Suggested change
pub const SCRATCH_SIZE: usize = 65536;
pub const SCRATCH_SIZE: usize = 65_536;

let mut serializer = Serializer::new(serializer, SCRATCH.take().unwrap(), Default::default());
let mut serializer = Serializer::new(
    serializer,
    SCRATCH.take().unwrap().cleared(),
Contributor

is cleared() expensive to call? this function is quite hot in general

Member Author

It should not be; it really just sets a few variables.

@blp
Member Author

blp commented Mar 25, 2026

because it uses errors to avoid going over block size limits. Perhaps the rkyv built-in implementations handle errors correctly regarding scratch space; I don't know whether DBSP implementations of serializers do.

I didn't understand this part; wouldn't we be able to check our own serializers?

We can check every serializer, obviously, but bugs can creep in.

This is a kind of manual memory management, which is always problematic.

How many do we have?

About 150 (not all of them handwritten).

@blp
Member Author

blp commented Mar 25, 2026

I have a general question about this code, because I was looking at it a few weeks ago:

why do we always need to use this (general form of the) serializer with the scratch space? E.g., we see this thing show up in profiles a lot on macOS, because TLS is more expensive there, and it does all this manipulation of taking it out of the TLS and putting it back in (which seems to cause some copying overhead too). From what I can tell, if the type is relatively simple (like an integer), the relative overhead of doing that is big, because the function that uses it gets invoked a lot.

Background is that we tried to take this out of the inner loops (the TLS variable) and have, e.g., a serializer per writer that we pass down into the relevant functions, but it's also very painful to deal with.

I'll see what I can do for performance. I was focusing on correctness here.

@blp
Member Author

blp commented Mar 25, 2026

I have a general question about this code because I was looking at it a few weeks ago:
why do we always need to use this (general form) of the serializer with the scratch space.

This might have been rhetorical, but I am not sure. Here is the answer. We need the scratch space and the SharedSerializeMap because:

  • Many common types need scratch space to implement Serialize, e.g. see Vec.
  • Some types we use need a SharedSerializeMap, e.g. Arc.
  • If we choose a type for our Serializer that doesn't implement those traits, then our SerializeDyn trait won't be implemented for them, because it is auto-implemented for types that implement ArchivedDBData, which in turn requires the type to implement Serialize<Serializer>.
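The trait-bound chain above can be illustrated with a dependency-free sketch. The trait and type names below are stand-ins, not rkyv's real API; the point is that Vec-like types only implement serialization for serializers that provide scratch space, Arc-like types only for serializers that provide a shared-pointer registry, so a stripped-down serializer type would lose those impls:

```rust
// Illustrative sketch of the bound structure, with made-up trait names.
use std::sync::Arc;

trait ScratchSpace {
    fn scratch(&mut self, len: usize);
}
trait SharedRegistry {
    fn register(&mut self, addr: usize);
}

trait SerializeWith<S> {
    fn serialize(&self, s: &mut S);
}

// A Vec-like type demands scratch space from any serializer it works with.
impl<S: ScratchSpace> SerializeWith<S> for Vec<u32> {
    fn serialize(&self, s: &mut S) {
        s.scratch(self.len() * 4);
    }
}

// An Arc-like type demands a registry, so each shared target is written once.
impl<S: SharedRegistry> SerializeWith<S> for Arc<u32> {
    fn serialize(&self, s: &mut S) {
        s.register(Arc::as_ptr(self) as usize);
    }
}

// Only a serializer implementing *both* traits can serialize both types.
#[derive(Default)]
struct FullSerializer {
    scratch_used: usize,
    seen: Vec<usize>,
}
impl ScratchSpace for FullSerializer {
    fn scratch(&mut self, len: usize) {
        self.scratch_used += len;
    }
}
impl SharedRegistry for FullSerializer {
    fn register(&mut self, addr: usize) {
        if !self.seen.contains(&addr) {
            self.seen.push(addr);
        }
    }
}

fn main() {
    let mut s = FullSerializer::default();
    vec![1u32, 2, 3].serialize(&mut s);
    let shared = Arc::new(7u32);
    shared.serialize(&mut s);
    Arc::clone(&shared).serialize(&mut s); // same target, registered once
    assert_eq!(s.scratch_used, 12);
    assert_eq!(s.seen.len(), 1);
}
```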

@blp
Member Author

blp commented Mar 26, 2026

OK, I've got a much bigger change that I'll submit tomorrow that should eliminate a lot of overhead.

You like higher-order lifetimes, right? It's got higher-order lifetimes.

@mihaibudiu
Contributor

Oh boy, a new Rust feature!

@mihaibudiu
Contributor

Hm, I think we already use this in a few places, it's not really new.

@blp
Member Author

blp commented Mar 26, 2026

They're not all that rare, but I do notice whenever I add them.
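For readers unfamiliar with the term: a higher-order (higher-ranked) lifetime bound says a closure or function must work for every lifetime, not one fixed lifetime chosen by the caller. A minimal standalone illustration (not code from this PR):

```rust
// `F` must accept a `&str` of *any* lifetime (`for<'a>`), which lets us
// call it on a reference to a local that `apply_to_local` itself owns.
fn apply_to_local<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> usize,
{
    let local = String::from("scratch");
    f(&local) // valid because `f` is polymorphic over the borrow's lifetime
}

fn main() {
    assert_eq!(apply_to_local(|s| s.len()), 7);
    println!("ok");
}
```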

impl ScratchSpace for DbspScratch {
    #[inline]
    unsafe fn push_scratch(&mut self, layout: Layout) -> Result<NonNull<[u8]>, Self::Error> {
        unsafe {

Missing // SAFETY: comment on this unsafe block. Hard rule: every unsafe block needs a comment explaining what invariant makes it safe. Here: explain why it is valid to call push_scratch on self.main and self.fallback without further preconditions — e.g., that the caller holds the invariants required by the ScratchSpace contract.


    #[inline]
    unsafe fn pop_scratch(&mut self, ptr: NonNull<u8>, layout: Layout) -> Result<(), Self::Error> {
        unsafe {

Missing // SAFETY: comment. For pop_scratch, the key invariant is that ptr must have been returned by a prior push_scratch call on the same scratch space (or, in the fallback case, that we are routing to the right scratch implementation). Please document that here.
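The shape of comment the reviewer is asking for looks like this. The example is illustrative only (a made-up function on slices, not the PR's code); the pattern is that the comment states the invariant that makes the `unsafe` operation sound:

```rust
// Illustrative only: a `// SAFETY:` comment documenting why an unchecked
// operation cannot misbehave at this call site.
fn first_byte(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: we just checked that `v` is non-empty, so index 0 is in
    // bounds and `get_unchecked(0)` cannot read past the slice.
    Some(unsafe { *v.get_unchecked(0) })
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(&[]), None);
    println!("ok");
}
```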

@blp blp requested a review from gz March 26, 2026 18:02
@blp
Member Author

blp commented Mar 26, 2026

@gz I asked for a re-review because I added a big commit that should fix the performance issues you mentioned.

blp added 2 commits March 26, 2026 15:52
A customer reported large and growing memory use that showed up in heap
profiles attributed to rkyv serialization in the storage file writer.  Only
some of this made sense, in particular the part written to FBufs, which
consists of data blocks that will end up in the cache.  The rest was not
more specifically attributed.
specifically attributed.

Some investigation showed the possibility that data is accumulating in
our per-thread scratch space cache.  This cache should get emptied for
every use, but that depends on the rkyv serialization implementations
being correct both in the common case and the error case.  DBSP hits the
error case frequently in practice (once per data block), because it uses
errors to avoid going over block size limits.  Perhaps the rkyv built-in
implementations handle errors correctly regarding scratch space; I don't
know whether DBSP implementations of serializers do.

If this is the problem, this commit will avoid it, by clearing the scratch
space every time that we use it.  It is larger than otherwise necessary
because rkyv's `HeapScratch` and `FallbackScratch` don't provide any way
to clear themselves, so this has to copy in their implementations.

Signed-off-by: Ben Pfaff <[email protected]>
Contributor

@gz gz left a comment

awesome, it makes sense... I tried something similar two weeks ago but the way you did it is much more elegant :)

fn serialize_with_flush<B, K, V, R>((batch, flush): (B, bool)) -> Vec<u8>
let mut serializer_inner = None;
fn serialize_with_flush<B, K, V, R>(
    (batch, flush): (B, bool),
Contributor

I didn't know this syntax, cool

Member Author

Yeah, you can pattern match in function arguments and it's occasionally useful!
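A tiny standalone example of the syntax discussed here (names are made up, not the PR's actual function): a tuple pattern in the argument list destructures the parameter directly, so the body never names the tuple itself.

```rust
// Destructuring a tuple directly in a function's argument list.
fn describe_pair((key, flush): (u32, bool)) -> String {
    format!("{key}:{flush}")
}

fn main() {
    assert_eq!(describe_pair((7, true)), "7:true");
    println!("ok");
}
```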

rkyv serializers can be lightweight but serializing some types requires
additional data, in the form of a ScratchSpace and SharedSerializeRegistry.
These types are somewhat big (about 100 bytes) and the ScratchSpace
implementation we use also allocates 64 KiB on the heap, so it's very
wasteful to create and destroy them for serializing, say, a single i32.
So, until now, we mainly kept one of them in a thread-local cache.

This has at least two problems:

- As implemented, we were still copying the 100-byte structure in and
  out of the cache whenever we used it, which is wasteful.

- I'm told that Mac OS has slow thread-locals.

This commit changes the implementation in a couple of ways.  The big
change is that it changes our serializer from an rkyv
CompositeSerializer that includes the Serializer, the ScratchSpace, and
the SharedSerializeRegistry inline, to a custom struct that merely
references them.  Thus, there is no need to copy the 100-byte structure
in and out of the cache; we just use a reference to it.  This does
mean that our serializer now has a generic lifetime parameter, which we
have to pass around a bit, but it's not too bad.

The other change is to use the cache less often.  Instead of using the
cache almost everywhere, this commit changes many of the important code
paths, such as the storage file writer, to allocate their own serializer
data and use it directly instead of through the cache.  The hope is that
this will reduce overhead further, especially on OSes where
thread-locals are slow.

Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: feldera-bot <[email protected]>
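The design change the commit message describes can be sketched in miniature. Everything below is a hypothetical stand-in (the real types live in DBSP): instead of moving a ~100-byte serializer state in and out of a thread-local cache, the serializer borrows it, so only a reference is passed around, at the cost of the lifetime parameter mentioned above:

```rust
// Hypothetical sketch: a serializer that borrows its scratch state
// instead of owning it, so nothing is copied in or out of a cache.
struct ScratchState {
    buf: Vec<u8>, // stand-in for the 64 KiB scratch allocation
    pos: usize,
}

// `'a` is the generic lifetime parameter that now has to be passed around.
struct BorrowingSerializer<'a> {
    scratch: &'a mut ScratchState,
    out: Vec<u8>,
}

impl<'a> BorrowingSerializer<'a> {
    fn new(scratch: &'a mut ScratchState) -> Self {
        scratch.pos = 0; // cleared on every use, per the first commit
        Self { scratch, out: Vec::new() }
    }

    fn write(&mut self, bytes: &[u8]) {
        self.out.extend_from_slice(bytes);
    }
}

fn main() {
    let mut state = ScratchState { buf: vec![0; 65_536], pos: 0 };
    // Each serialization borrows the same state; the 64 KiB buffer is
    // allocated once and never moved.
    for _ in 0..2 {
        let mut s = BorrowingSerializer::new(&mut state);
        s.write(b"row");
        assert_eq!(s.out, b"row".to_vec());
    }
    assert_eq!(state.buf.len(), 65_536);
    assert_eq!(state.pos, 0);
    println!("ok");
}
```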
@blp blp added this pull request to the merge queue Mar 26, 2026
@mythical-fred mythical-fred left a comment

All my blockers addressed. LGTM.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 27, 2026
@blp
Member Author

blp commented Mar 27, 2026

I don't see how this change could prevent a pipeline from starting so I filed https://github.com/feldera/cloud/issues/1580 for the merge failure. Requeuing.

@blp blp added this pull request to the merge queue Mar 27, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 27, 2026
@blp
Member Author

blp commented Mar 27, 2026

I requeued this because the CI failure was due to:

/usr/bin/docker pull thekevjames/gcloud-pubsub-emulator:e852273e07
  Error response from daemon: Head "https://registry-1.docker.io/v2/thekevjames/gcloud-pubsub-emulator/manifests/e852273e07": unauthorized: incorrect username or password
  Warning: Docker pull failed with exit code 1, back off 6.504 seconds before retry.
  /usr/bin/docker pull thekevjames/gcloud-pubsub-emulator:e852273e07
  Error response from daemon: Head "https://registry-1.docker.io/v2/thekevjames/gcloud-pubsub-emulator/manifests/e852273e07": unauthorized: incorrect username or password
  Warning: Docker pull failed with exit code 1, back off 6.817 seconds before retry.
  /usr/bin/docker pull thekevjames/gcloud-pubsub-emulator:e852273e07
  Error response from daemon: Head "https://registry-1.docker.io/v2/thekevjames/gcloud-pubsub-emulator/manifests/e852273e07": unauthorized: incorrect username or password
  Error: Docker pull failed with exit code 1

@blp blp added this pull request to the merge queue Mar 27, 2026
Merged via the queue into main with commit 51a642a Mar 27, 2026
1 check passed
@blp blp deleted the scratch branch March 27, 2026 19:54