Drop format #31

Open
gwil wants to merge 83 commits from drop_format into persistent-store
Owner

Adds an implementation of Willow's Drop Format to willow25

  • Adds two new functions, import_drop and export_drop
  • Adds a DropFormatStore trait to be implemented by stores to be used with this API
  • Implements DropFormatStore on MemoryStore and PersistentStore

Some of the things we had to do to first make this implementation possible:

Adds an implementation of Willow's [Drop Format](https://willowprotocol.org/specs/drop-format/index.html#willow_drop_format) to `willow25` - Adds two new functions, `import_drop` and `export_drop` - Adds a `DropFormatStore` trait to be implemented by stores to be used with this API - Implements `DropFormatStore` on `MemoryStore` and `PersistentStore` --- Some of the things we had to do to first make this implementation possible: - [x] `export_drop` - [x] need [`default_entry`](https://willowprotocol.org/misc-definitions/index.html#default_entry), [`default_authorisation_token`](https://willowprotocol.org/specs/drop-format/index.html#sl_default_authorisation_token) equivalents (see #21) - [x] need a way to determine if a `Store` holds any given `Entry`'s complete payload - [x] `import_drop` - [x] also need [`default_entry`](https://willowprotocol.org/misc-definitions/index.html#default_entry), [`default_authorisation_token`](https://willowprotocol.org/specs/drop-format/index.html#sl_default_authorisation_token) equivalents (see #21) - [x] need a new equivalent to the [`Limit`](https://docs.rs/ufotofu/0.7.1/ufotofu/producer/struct.Limit.html) producer ufotofu used to have (https://codeberg.org/worm-blossom/ufotofu/issues/15) - [x] `Store` trait needs a way to pipe in payloads for existing entries - [x] ... and at a specific offset
gwil added 13 commits 2026-02-24 18:46:40 +01:00

Re limit adaptor, see worm-blossom/ufotofu#15

Re limit adaptor, see https://codeberg.org/worm-blossom/ufotofu/issues/15
gwil added 52 commits 2026-03-23 14:21:45 +01:00
Reviewed-on: #34
gwil changed target branch from main to persistent-store 2026-03-23 14:22:18 +01:00
gwil changed title from WIP: Drop format to Drop format 2026-03-23 14:22:30 +01:00
gwil force-pushed drop_format from c2102b41a2 to c2102b41a2 2026-03-23 14:31:31 +01:00 Compare
gwil force-pushed drop_format from 3f2020696b to acb2a6434b 2026-03-23 15:17:33 +01:00 Compare
AljoschaMeyer left a comment

I'll complete the review after sports =S

I'll complete the review after sports =S
@ -0,0 +15,4 @@
/// A trait for describing Willow stores compatible with the [`export_drop`] and ['import_drop`].
///
/// This trait requires the store is able to count the number of ['Entry`] included by an [`Area`].
pub trait DropFormatStore: Store + PayloadPrefixStore + Clone {

We decided against spec-specific store traits, instead naming them by what functionality they offer. So this should be moved into its own file, be called CountingStore (or something similar), and the PayloadPrefixStore supertrait should be removed.

We decided against spec-specific store traits, instead naming them by what functionality they offer. So this should be moved into its own file, be called `CountingStore` (or something similar), and the `PayloadPrefixStore` supertrait should be removed.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +24,4 @@
) -> impl Future<Output = Result<usize, Self::InternalError>>;
}
#[derive(Debug)]

Missing docs.

Missing docs.
@ -0,0 +26,4 @@
#[derive(Debug)]
pub enum ExportDropError<ConsumerError, StoreErr> {
EmptyDrop,

Why is this an error? The spec allows empty drops.

Why is this an error? The spec allows empty drops.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +25,4 @@
}
#[derive(Debug)]
pub enum ExportDropError<ConsumerError, StoreErr> {

First StoreError, second ConsumerError (that order is how it is everywhere else; and please use "Error" instead of "Err" (or at the very least not both within the same type =D))

First `StoreError`, second `ConsumerError` (that order is how it is everywhere else; and please use "Error" instead of "Err" (or at the very least not both within the *same* type =D))

Name this StoreError, like in every other error type.

Name this `StoreError`, like in every other error type.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +28,4 @@
pub enum ExportDropError<ConsumerError, StoreErr> {
EmptyDrop,
StoreErr(StoreErr),
ConsumerProblem(ConsumerError),

Name this ConsumerError, like in every other error type.

Name this `ConsumerError`, like in every other error type.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +49,4 @@
// https://willowprotocol.org/specs/drop-format/index.html#willow_drop_format
// TODO: Re-order the areas in a way which would give the highest probability of an efficiently relatively encoded entries.
// TODO: De-duplicate entries from overlapping areas.

Add a note that both of these are optional optimisations, not missing functionality, please. To anyone without our context, these comments as they stand are misleading.

Add a note that both of these are optional optimisations, not missing functionality, please. To anyone without our context, these comments as they stand are misleading.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +69,4 @@
return Err(ExportDropError::EmptyDrop);
}
compact_u64::cu64_encode_standalone(entries_count as u64, consumer)

Import instead of using a fully qualified name, please.

Import instead of using a fully qualified name, please.
AljoschaMeyer marked this conversation as resolved
AljoschaMeyer left a comment

Good luck =S

Good luck =S
@ -0,0 +6,4 @@
storage::{PayloadPrefixStore, Store},
};
use bab_rs::generic::storage::verifiable_streaming::SliceStreamingOptions;
use ufotofu::{

Replace all the ufotofu imports with

use ufotofu::prelude::*;
use ufotofu::codec_prelude::*;
Replace all the ufotofu imports with ```rs use ufotofu::prelude::*; use ufotofu::codec_prelude::*; ```
AljoschaMeyer marked this conversation as resolved
@ -0,0 +15,4 @@
/// A trait for describing Willow stores compatible with the [`export_drop`] and ['import_drop`].
///
/// This trait requires the store is able to count the number of ['Entry`] included by an [`Area`].
pub trait DropFormatStore: Store + PayloadPrefixStore + Clone {

Hmm, I wonder if we should simply not have this trait at all (for now). producing a drop has to iterate through all entries individually anyway when encoding. So iterating through all of them first to count them will only slow things down by a factor of 2. Could be worth the reduced code surface for now; once we have a proper SummarisableStore that can do both counting and fingerprints then we could also update the Drop format implementation to use that for counting.

Just a thought, I don't have a particularly strong opinion on this.

Hmm, I wonder if we should simply not have this trait at all (for now). producing a drop has to iterate through all entries individually anyway when encoding. So iterating through all of them first to count them will only slow things down by a factor of 2. Could be worth the reduced code surface for now; once we have a proper `SummarisableStore` that can do both counting and fingerprints then we could also update the Drop format implementation to use that for counting. Just a thought, I don't have a particularly strong opinion on this.
Author
Owner

IIRC the reason we needed this is because the count has to be encoded before the entries are.

If you waited to determine the count of the entries at the end of iteration, you'd either have to put all entries in memory first so that you can encode them, or iterate a second time through the area.

IIRC the reason we needed this is because the count has to be encoded before the entries are. If you waited to determine the count of the entries at the end of iteration, you'd either have to put all entries in memory first so that you can encode them, or iterate a second time through the area.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +24,4 @@
) -> impl Future<Output = Result<usize, Self::InternalError>>;
}
#[derive(Debug)]

Also derive Clone and Eq.

Also derive Clone and Eq.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +69,4 @@
return Err(ExportDropError::EmptyDrop);
}
compact_u64::cu64_encode_standalone(entries_count as u64, consumer)

import

import
AljoschaMeyer marked this conversation as resolved
@ -0,0 +78,4 @@
// We can do this because these are the default values.
let mut prev_authorised = unsafe {
PossiblyAuthorisedEntry::new(entry_minus_one, auth_minus_one)

Could you factor this out into a function of the defaults module?

Could you factor this out into a function of the `defaults` module?
Author
Owner

Later. Left a TODO comment.

Later. Left a TODO comment.
@ -0,0 +110,4 @@
.map_err(ExportDropError::ConsumerProblem)?;
}
compact_u64::write_tag(&mut header, 2, 0, u64::from(authorised.timestamp()));

import

import
AljoschaMeyer marked this conversation as resolved
@ -0,0 +128,4 @@
}
consumer
.consume(ufotofu::prelude::Either::Left(header))

import Left directly (simply use Either::*) somehwere after the ufotofu prelude import.

import `Left` directly (simply `use Either::*`) somehwere after the ufotofu prelude import.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +154,4 @@
.await
.map_err(ExportDropError::ConsumerProblem)?;
compact_u64::cu64_encode(u64::from(authorised.timestamp()), 2, consumer)

import (I'll stop marking these, just do a search for compact_u64 =S)

import (I'll stop marking these, just do a search for `compact_u64` =S)
AljoschaMeyer marked this conversation as resolved
@ -0,0 +178,4 @@
.map_err(ExportDropError::ConsumerProblem)?;
if include_payloads && header & 0b0000_0001 == 0b0000_0001 {
// TODO: If the entry had its complete payload, consume it here.

Why only complete payloads? The drop format can also handle partial payloads.

Why only complete payloads? The drop format can also handle partial payloads.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +193,4 @@
crate::storage::GetPayloadSliceError::ConsumerError(err) => ExportDropError::ConsumerProblem(err),
crate::storage::GetPayloadSliceError::NoSuchEntry => panic!("The store did not have the payload for an entry this implementation was pretty sure it should have."),
})?;
}

This needs to check whether the full payload is available, or whether the consumer got closed after less than payload_length many bytes where written into it. store.get_payload_slice returns how many bytes were written into the consumer.

This needs to check whether the full payload is available, or whether the `consumer` got closed after less than payload_length many bytes where written into it. `store.get_payload_slice` returns how many bytes were written into the consumer.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +202,4 @@
Ok(())
}
#[derive(Debug)]

Derive more stuff (Clone and Eq at the very least - not having equality on types is weird).

Derive more stuff (Clone and Eq at the very least - not having equality on types is weird).
AljoschaMeyer marked this conversation as resolved
@ -0,0 +203,4 @@
}
#[derive(Debug)]
pub enum ImportDropError<ProducerError, StoreError> {

Swap order of generics.

Swap order of generics.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +205,4 @@
#[derive(Debug)]
pub enum ImportDropError<ProducerError, StoreError> {
UnexpectedEndOfInput,
ProducerProblem(ProducerError),

Rename to ProducerError for consistency.

Rename to `ProducerError` for consistency.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +207,4 @@
UnexpectedEndOfInput,
ProducerProblem(ProducerError),
BadEntry,
Store(StoreError),

Rename to StoreError

Rename to `StoreError`
AljoschaMeyer marked this conversation as resolved
@ -0,0 +184,4 @@
authorised.namespace_id(),
&authorised,
None,
consumer,

get_payload_slice closes the consumer, after which you are not allowed to continue using it.

There should probably be a ufotofu adaptor that "shields" an inner consumer from closing, for situations such as this. I'll think about this a bit more.

`get_payload_slice` closes the consumer, after which you are not allowed to continue using it. There should probably be a ufotofu adaptor that "shields" an inner consumer from closing, for situations such as this. I'll think about this a bit more.

See https://discord.com/channels/780542716940517407/1363925804475744587/1486296443588313128 for more discussion; perhaps instead of the adaptor the Store trait should simply change to not close the consumer...

See https://discord.com/channels/780542716940517407/1363925804475744587/1486296443588313128 for more discussion; perhaps instead of the adaptor the Store trait should simply change to *not* close the consumer...
AljoschaMeyer marked this conversation as resolved
@ -0,0 +230,4 @@
/// If `ignore_nonzero_offset_partial_payload_slices` is `true`, then payload slices at non-zero offsets will be ignored, otherwise an error will be returned. As there is no known existing implementation for encoding drops which encodes payload slices at non-zero offsets, you probably want this to be `true`.
pub async fn import_drop<Store, P>(
store: &mut Store,
producer: &mut P,

The docs should give me some information about p.

The docs should give me *some* information about `p`.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +246,4 @@
// We can do this because these are the default values.
let mut prev_authorised = unsafe {
PossiblyAuthorisedEntry::new(entry_minus_one, auth_minus_one)

Reuse the function from the defaults module here that doesn't exist yet.

Reuse the function from the `defaults` module here that doesn't exist yet.
Author
Owner

Later. Left a TODO comment.

Later. Left a TODO comment.
@ -0,0 +227,4 @@
/// Decode [`Entry`]s (and possibly their corresponding payloads) from a [drop] and insert them into a [`Store`].
///
/// If `ignore_nonzero_offset_partial_payload_slices` is `true`, then payload slices at non-zero offsets will be ignored, otherwise an error will be returned. As there is no known existing implementation for encoding drops which encodes payload slices at non-zero offsets, you probably want this to be `true`.

That second sentence reads more like an argument for setting this to false to me. But also, I'd remove it completely, because it will require updating in the future.

That second sentence reads more like an argument for setting this to `false` to me. But also, I'd remove it completely, because it will require updating in the future.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +235,4 @@
) -> Result<(), ImportDropError<P::Error, Store::InternalError>>
where
Store: DropFormatStore,
P: BulkProducer<Item = u8, Final = ()>,

Do we need Final = () for anything?

Do we need `Final = ()` for anything?
Author
Owner

PayloadPrefixStore.append_to_payload_prefix requires it.

`PayloadPrefixStore.append_to_payload_prefix` requires it.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +339,4 @@
.await
.map_err(ImportDropError::Store)?;
// If we have more than zero bytes, skip ahead by this much.

I'm a bit scared of race conditions happening here, because available_len might be outdated by the time the rest of the logic happens.

I think it is better to simply write all the payload bytes from the drop directly into the store. The store will check validity and reject invalid stuff, so nothing can go wrong with that approach. (also, it simplifies things)

I'm a bit scared of race conditions happening here, because `available_len` might be outdated by the time the rest of the logic happens. I think it is better to simply write all the payload bytes from the drop directly into the store. The store will check validity and reject invalid stuff, so nothing can go wrong with that approach. (also, it simplifies things)
AljoschaMeyer marked this conversation as resolved
@ -0,0 +504,4 @@
prev_authorised = decoded_authed_entry
}
todo!()

hmm

hmm
Author
Owner

Just "you forgot to change this to Ok(())" is sufficient, thank you

Just "you forgot to change this to `Ok(())`" is sufficient, thank you
AljoschaMeyer marked this conversation as resolved
@ -0,0 +368,4 @@
)
.await
.map_err(|err| match err {
crate::storage::AppendToPayloadPrefixError::ProducerError(err) => match err

I really don't like your Rust editor extension for using fully qualified names instead of importing stuff...

I really don't like your Rust editor extension for using fully qualified names instead of importing stuff...
Author
Owner

This is actually what the Rust LSP does, it will be like this in all editors. But I will remember that qualified names are a no-go in future.

This is actually what the Rust LSP does, it will be like this in all editors. But I will remember that qualified names are a no-go in future.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +439,4 @@
)
.await
.map_err(|err| match err {
crate::storage::AppendToPayloadPrefixError::ProducerError(err) => match err

Had the exact (?) same code a couple of lines earlier as well. Could this move into a From implementation and then disappear via the ? operator?

Had the exact (?) same code a couple of lines earlier as well. Could this move into a `From` implementation and then disappear via the `?` operator?
@ -0,0 +400,4 @@
let mut limited = producer.to_limit(slice_length as usize);
if !entry_was_inserted {
// We might already have some of the payload bytes!

Again, better not to do this whole optimisation I think.

Again, better not to do this whole optimisation I think.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +463,4 @@
})?;
}
0b0000_0011 => {
// TODO: Our store traits don't yet have a way of ingesting partial slices at offsets.

Is this still a todo?

Is this still a todo?
Author
Owner

Our intention is to eventually support partial payload slices at non-zero offsets, so I'd say yes.

Our intention is to eventually support partial payload slices at non-zero offsets, so I'd say yes.
AljoschaMeyer marked this conversation as resolved
@ -0,0 +466,4 @@
// TODO: Our store traits don't yet have a way of ingesting partial slices at offsets.
// So if it was asked for, error here.
// Or just skip ahead to the next entry without ingesting any of the slices!
if ignore_nonzero_offset_partial_payload_slices {

Missing negation?

Missing negation?
AljoschaMeyer marked this conversation as resolved
gwil added 17 commits 2026-03-25 11:29:14 +01:00
@ -193,0 +220,4 @@
/// A trait for describing Willow stores which can count all the of entries in a given [`Area`].
pub trait AreaCountStore: Store {
/// Return the count of [`Entry`] [included](https://willowprotocol.org/specs/grouping-entries/index.html#area_include) by a given [`Area`] within a given [namespace](https://willowprotocol.org/specs/data-model/index.html#namespace)

``/// Returns [...] .` (third-person "s", and "." at end of sentence)

``/// Return*s* [...] .` (third-person "s", and "." at end of sentence)
@ -193,0 +225,4 @@
&self,
namespace_id: &NamespaceId,
area: &Area,
) -> impl Future<Output = Result<usize, Self::InternalError>>;

Should be an async fn instead of manually returning an impl Future

Should be an `async fn` instead of manually returning an `impl Future`
@ -0,0 +22,4 @@
StoreError(StoreError),
/// The consumer consuming the encoded bytes of the drop returned an error.
ConsumerError(ConsumerError),
/// The given consumer consumed a different number of bytes than was expected.

This comment feels a bit misleading, it makes it sound like there is a problem with the consumer. When actually the problem is that export_drop wrote invalid data. How about something like

/// The [`export_drop`] function wrote an invalid encoding into the given consumer, because the store contents were modified concurrent to exporting the drop. You can simply try again, and you might want to make sure to not modify the store while attempting to export a drop.
EncodingError
This comment feels a bit misleading, it makes it sound like there is a problem with the consumer. When actually the problem is that `export_drop` wrote invalid data. How about something like ```rs /// The [`export_drop`] function wrote an invalid encoding into the given consumer, because the store contents were modified concurrent to exporting the drop. You can simply try again, and you might want to make sure to not modify the store while attempting to export a drop. EncodingError ```
@ -0,0 +105,4 @@
write_tag(&mut header, 2, 0, u64::from(authorised.timestamp()));
write_tag(&mut header, 2, 2, authorised.payload_length());
let mut bytes_available = 0;

Unless I'm missing something, this value is never written into the encoding. That seems wrong, given that it exists in order to detect mismatches between the payload slice length we claim upfront to encode and the slice we actually encode. Do we always promise to encode the full payload, or does this implementation also support incomplete prefixes? If the latter, than it is almost certainly buggy. If the former, then there should be no need to use bytes_available (as it is being looked up right now), simply check whether the number of written bytes matches the payload length instead.

Unless I'm missing something, this value is never written into the encoding. That seems wrong, given that it exists in order to detect mismatches between the payload slice length we claim upfront to encode and the slice we actually encode. Do we always promise to encode the *full* payload, or does this implementation also support incomplete prefixes? If the latter, than it is almost certainly buggy. If the former, then there should be no need to use `bytes_available` (as it is being looked up right now), simply check whether the number of written bytes matches the payload length instead.
@ -0,0 +196,4 @@
if bytes_consumed != bytes_available {
return Err(ExportDropError::ConsumedBytesMismatch);
}
}

There is another missing validity check: we need to emit an error if the number of entries we ended up encoding is unequal to entries_count (which is the number we put into the encoding as the number of entries that will follow).

There is another missing validity check: we need to emit an error if the number of entries we ended up encoding is unequal to `entries_count` (which is the number we put into the encoding as the number of entries that will follow).
This pull request has changes conflicting with the target branch.
  • Cargo.lock
  • willow25/CHANGELOG.md
  • willow25/Cargo.toml
  • willow25/src/defaults.rs
  • willow25/src/lib.rs
  • willow25/src/storage/memory_store.rs
  • willow25/src/storage/mod.rs
  • willow25/src/storage/persistent_store.rs
  • willow25/src/storage/store.rs
View command line instructions

Manual merge helper

Use this merge commit message when completing the merge manually.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin drop_format:drop_format
git switch drop_format

Merge

Merge the changes and update on Forgejo.

Warning: The "Autodetect manual merge" setting is not enabled for this repository, you will have to mark this pull request as manually merged afterwards.

git switch persistent-store
git merge --no-ff drop_format
git switch drop_format
git rebase persistent-store
git switch persistent-store
git merge --ff-only drop_format
git switch drop_format
git rebase persistent-store
git switch persistent-store
git merge --no-ff drop_format
git switch persistent-store
git merge --squash drop_format
git switch persistent-store
git merge --ff-only drop_format
git switch persistent-store
git merge drop_format
git push origin persistent-store
Sign in to join this conversation.
No description provided.