Drop format #31
Reference: worm-blossom/willow_rs!31
Adds an implementation of Willow's Drop Format to willow25:

- `import_drop` and `export_drop`
- a `DropFormatStore` trait to be implemented by stores to be used with this API
- `DropFormatStore` implemented on `MemoryStore` and `PersistentStore`

Some of the things we had to do to first make this implementation possible:

- `export_drop` needs `default_entry` / `default_authorisation_token` equivalents (see #21)
- `export_drop` assumes the `Store` holds any given `Entry`'s complete payload
- `import_drop` needs `default_entry` / `default_authorisation_token` equivalents (see #21)
- `import_drop` needs the `Limit` producer ufotofu used to have (worm-blossom/ufotofu#15)
- the `Store` trait needs a way to pipe in payloads for existing entries

Re limit adaptor, see worm-blossom/ufotofu#15
- Changed title from "WIP: Drop format" to "Drop format"
- Force-pushed from c2102b41a2 to c2102b41a2
- Force-pushed from 3f2020696b to acb2a6434b

I'll complete the review after sports =S
@ -0,0 +15,4 @@
```rust
/// A trait for describing Willow stores compatible with the [`export_drop`] and [`import_drop`].
///
/// This trait requires the store is able to count the number of [`Entry`] included by an [`Area`].
pub trait DropFormatStore: Store + PayloadPrefixStore + Clone {
```
We decided against spec-specific store traits, instead naming them by what functionality they offer. So this should be moved into its own file, be called `CountingStore` (or something similar), and the `PayloadPrefixStore` supertrait should be removed.

@ -0,0 +24,4 @@
```rust
    ) -> impl Future<Output = Result<usize, Self::InternalError>>;
}

#[derive(Debug)]
```
Missing docs.
@ -0,0 +26,4 @@
```rust
#[derive(Debug)]
pub enum ExportDropError<ConsumerError, StoreErr> {
    EmptyDrop,
```
Why is this an error? The spec allows empty drops.
@ -0,0 +25,4 @@
```rust
}

#[derive(Debug)]
pub enum ExportDropError<ConsumerError, StoreErr> {
```
First `StoreError`, second `ConsumerError` (that order is how it is everywhere else; and please use "Error" instead of "Err" (or at the very least not both within the same type =D)).

Name this `StoreError`, like in every other error type.

@ -0,0 +28,4 @@
```rust
pub enum ExportDropError<ConsumerError, StoreErr> {
    EmptyDrop,
    StoreErr(StoreErr),
    ConsumerProblem(ConsumerError),
```
Name this `ConsumerError`, like in every other error type.

@ -0,0 +49,4 @@
```rust
// https://willowprotocol.org/specs/drop-format/index.html#willow_drop_format
// TODO: Re-order the areas in a way which would give the highest probability of efficiently relatively encoded entries.
// TODO: De-duplicate entries from overlapping areas.
```
Add a note that both of these are optional optimisations, not missing functionality, please. To anyone without our context, these comments as they stand are misleading.
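Put together, the error-type conventions requested above (store error first, consumer error second, variants named `…Error`, plus `Clone` and `Eq` derives) might look roughly like this sketch; the names and variants here are illustrative, not the crate's actual definitions:

```rust
// Hypothetical sketch of the requested convention: store error generic
// first, consumer error second, variant names ending in "Error", and
// equality derived so the type is comparable in tests.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExportDropError<StoreError, ConsumerError> {
    /// The store failed while reading entries or payloads.
    StoreError(StoreError),
    /// The consumer consuming the encoded drop bytes failed.
    ConsumerError(ConsumerError),
}
```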
@ -0,0 +69,4 @@
```rust
        return Err(ExportDropError::EmptyDrop);
    }

    compact_u64::cu64_encode_standalone(entries_count as u64, consumer)
```
Import instead of using a fully qualified name, please.
Good luck =S
@ -0,0 +6,4 @@
```rust
    storage::{PayloadPrefixStore, Store},
};
use bab_rs::generic::storage::verifiable_streaming::SliceStreamingOptions;
use ufotofu::{
```
Replace all the ufotofu imports with
@ -0,0 +15,4 @@
```rust
/// A trait for describing Willow stores compatible with the [`export_drop`] and [`import_drop`].
///
/// This trait requires the store is able to count the number of [`Entry`] included by an [`Area`].
pub trait DropFormatStore: Store + PayloadPrefixStore + Clone {
```
Hmm, I wonder if we should simply not have this trait at all (for now). Producing a drop has to iterate through all entries individually anyway when encoding, so iterating through all of them first to count them will only slow things down by a factor of 2. Could be worth the reduced code surface for now; once we have a proper `SummarisableStore` that can do both counting and fingerprints, then we could also update the Drop format implementation to use that for counting.

Just a thought, I don't have a particularly strong opinion on this.
IIRC the reason we needed this is because the count has to be encoded before the entries are.
If you waited to determine the count of the entries at the end of iteration, you'd either have to put all entries in memory first so that you can encode them, or iterate a second time through the area.
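The two-pass trade-off described in this thread can be sketched as follows. Stand-ins are used throughout: a plain little-endian `u64` instead of `compact_u64`, and `Vec<u8>` blobs instead of real Willow entries; the function name is ours, not the crate's:

```rust
// Why the count is needed upfront: the encoding starts with the number
// of entries, so without a store that can count cheaply we must either
// iterate twice (as here) or buffer every entry in memory first.
fn encode_drop<I>(entries: I) -> Vec<u8>
where
    I: Iterator<Item = Vec<u8>> + Clone,
{
    let mut out = Vec::new();
    // First pass: count the entries and write the count before any entry bytes.
    let count = entries.clone().count() as u64;
    out.extend_from_slice(&count.to_le_bytes());
    // Second pass: encode the entries themselves.
    for entry in entries {
        out.extend_from_slice(&entry);
    }
    out
}
```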
@ -0,0 +24,4 @@
```rust
    ) -> impl Future<Output = Result<usize, Self::InternalError>>;
}

#[derive(Debug)]
```
Also derive Clone and Eq.

@ -0,0 +69,4 @@
```rust
        return Err(ExportDropError::EmptyDrop);
    }

    compact_u64::cu64_encode_standalone(entries_count as u64, consumer)
```
import
@ -0,0 +78,4 @@
```rust
    // We can do this because these are the default values.
    let mut prev_authorised = unsafe {
        PossiblyAuthorisedEntry::new(entry_minus_one, auth_minus_one)
```
Could you factor this out into a function of the `defaults` module?

Later. Left a TODO comment.
@ -0,0 +110,4 @@
```rust
            .map_err(ExportDropError::ConsumerProblem)?;
    }

    compact_u64::write_tag(&mut header, 2, 0, u64::from(authorised.timestamp()));
```
import

@ -0,0 +128,4 @@
```rust
    }

    consumer.consume(ufotofu::prelude::Either::Left(header))
```
import `Left` directly (simply `use Either::*`) somewhere after the ufotofu prelude import.

@ -0,0 +154,4 @@
```rust
        .await
        .map_err(ExportDropError::ConsumerProblem)?;

    compact_u64::cu64_encode(u64::from(authorised.timestamp()), 2, consumer)
```
import (I'll stop marking these, just do a search for `compact_u64` =S)

@ -0,0 +178,4 @@
```rust
        .map_err(ExportDropError::ConsumerProblem)?;

    if include_payloads && header & 0b0000_0001 == 0b0000_0001 {
        // TODO: If the entry had its complete payload, consume it here.
```
Why only complete payloads? The drop format can also handle partial payloads.
@ -0,0 +193,4 @@
```rust
        crate::storage::GetPayloadSliceError::ConsumerError(err) => ExportDropError::ConsumerProblem(err),
        crate::storage::GetPayloadSliceError::NoSuchEntry => panic!("The store did not have the payload for an entry this implementation was pretty sure it should have."),
    })?;
}
```
This needs to check whether the full payload is available, or whether the consumer got closed after fewer than `payload_length` many bytes were written into it. `store.get_payload_slice` returns how many bytes were written into the consumer.

@ -0,0 +202,4 @@
```rust
    Ok(())
}

#[derive(Debug)]
```
Derive more stuff (Clone and Eq at the very least - not having equality on types is weird).
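The full-payload check asked for above could take roughly this shape. All names here are hypothetical stand-ins: the real code gets `bytes_written` back from `store.get_payload_slice` and would return the crate's actual error type:

```rust
// A short write must become an error instead of silently producing a
// truncated drop: compare what the consumer accepted against what the
// entry's payload_length claims.
#[derive(Debug, PartialEq, Eq)]
enum PayloadWriteError {
    /// Fewer bytes were written than the entry's payload_length claims.
    Incomplete { expected: u64, written: u64 },
}

fn check_full_payload_written(
    payload_length: u64,
    bytes_written: u64,
) -> Result<(), PayloadWriteError> {
    if bytes_written < payload_length {
        return Err(PayloadWriteError::Incomplete {
            expected: payload_length,
            written: bytes_written,
        });
    }
    Ok(())
}
```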
@ -0,0 +203,4 @@
```rust
}

#[derive(Debug)]
pub enum ImportDropError<ProducerError, StoreError> {
```
Swap order of generics.

@ -0,0 +205,4 @@
```rust
#[derive(Debug)]
pub enum ImportDropError<ProducerError, StoreError> {
    UnexpectedEndOfInput,
    ProducerProblem(ProducerError),
```
Rename to `ProducerError` for consistency.

@ -0,0 +207,4 @@
```rust
    UnexpectedEndOfInput,
    ProducerProblem(ProducerError),
    BadEntry,
    Store(StoreError),
```
Rename to `StoreError`

@ -0,0 +184,4 @@
```rust
            authorised.namespace_id(),
            &authorised,
            None,
            consumer,
```
`get_payload_slice` closes the consumer, after which you are not allowed to continue using it.

There should probably be a ufotofu adaptor that "shields" an inner consumer from closing, for situations such as this. I'll think about this a bit more.

See https://discord.com/channels/780542716940517407/1363925804475744587/1486296443588313128 for more discussion; perhaps instead of the adaptor the Store trait should simply change to not close the consumer...
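The "shield" adaptor idea could look something like the sketch below, written against a made-up minimal `Consumer` trait (the real ufotofu API differs, e.g. it is async and fallible). The wrapper forwards items but swallows `close`, so an API that insists on closing whatever consumer it is handed cannot close the inner one:

```rust
// Minimal stand-in for a ufotofu-style consumer.
trait Consumer {
    fn consume(&mut self, item: u8);
    fn close(&mut self);
}

// A simple concrete consumer for demonstration.
struct VecConsumer {
    data: Vec<u8>,
    closed: bool,
}

impl Consumer for VecConsumer {
    fn consume(&mut self, item: u8) {
        assert!(!self.closed, "consumed after close");
        self.data.push(item);
    }
    fn close(&mut self) {
        self.closed = true;
    }
}

/// Forwards `consume`, ignores `close`: only the owner of the inner
/// consumer gets to close it.
struct Shield<'a, C: Consumer>(&'a mut C);

impl<C: Consumer> Consumer for Shield<'_, C> {
    fn consume(&mut self, item: u8) {
        self.0.consume(item);
    }
    fn close(&mut self) {
        // Deliberately a no-op.
    }
}
```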
@ -0,0 +230,4 @@
```rust
/// If `ignore_nonzero_offset_partial_payload_slices` is `true`, then payload slices at non-zero offsets will be ignored, otherwise an error will be returned. As there is no known existing implementation for encoding drops which encodes payload slices at non-zero offsets, you probably want this to be `true`.
pub async fn import_drop<Store, P>(
    store: &mut Store,
    producer: &mut P,
```
The docs should give me some information about `P`.

@ -0,0 +246,4 @@
```rust
    // We can do this because these are the default values.
    let mut prev_authorised = unsafe {
        PossiblyAuthorisedEntry::new(entry_minus_one, auth_minus_one)
```
Reuse the function from the `defaults` module here that doesn't exist yet.

Later. Left a TODO comment.
@ -0,0 +227,4 @@
```rust
/// Decode [`Entry`]s (and possibly their corresponding payloads) from a [drop] and insert them into a [`Store`].
///
/// If `ignore_nonzero_offset_partial_payload_slices` is `true`, then payload slices at non-zero offsets will be ignored, otherwise an error will be returned. As there is no known existing implementation for encoding drops which encodes payload slices at non-zero offsets, you probably want this to be `true`.
```
That second sentence reads more like an argument for setting this to `false` to me. But also, I'd remove it completely, because it will require updating in the future.

@ -0,0 +235,4 @@
```rust
) -> Result<(), ImportDropError<P::Error, Store::InternalError>>
where
    Store: DropFormatStore,
    P: BulkProducer<Item = u8, Final = ()>,
```
Do we need `Final = ()` for anything?

`PayloadPrefixStore.append_to_payload_prefix` requires it.

@ -0,0 +339,4 @@
```rust
        .await
        .map_err(ImportDropError::Store)?;

    // If we have more than zero bytes, skip ahead by this much.
```
I'm a bit scared of race conditions happening here, because `available_len` might be outdated by the time the rest of the logic happens.

I think it is better to simply write all the payload bytes from the drop directly into the store. The store will check validity and reject invalid stuff, so nothing can go wrong with that approach. (Also, it simplifies things.)
@ -0,0 +504,4 @@
```rust
        prev_authorised = decoded_authed_entry
    }

    todo!()
```
hmm

Just "you forgot to change this to `Ok(())`" is sufficient, thank you

@ -0,0 +368,4 @@
```rust
    )
    .await
    .map_err(|err| match err {
        crate::storage::AppendToPayloadPrefixError::ProducerError(err) => match err
```
I really don't like your Rust editor extension for using fully qualified names instead of importing stuff...

This is actually what the Rust LSP does, it will be like this in all editors. But I will remember that qualified names are a no-go in future.
@ -0,0 +439,4 @@
```rust
    )
    .await
    .map_err(|err| match err {
        crate::storage::AppendToPayloadPrefixError::ProducerError(err) => match err
```
Had the exact (?) same code a couple of lines earlier as well. Could this move into a `From` implementation and then disappear via the `?` operator?

@ -0,0 +400,4 @@
```rust
    let mut limited = producer.to_limit(slice_length as usize);

    if !entry_was_inserted {
        // We might already have some of the payload bytes!
```
Again, better not to do this whole optimisation I think.
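The `From`-plus-`?` refactor suggested a couple of comments up can be sketched like this, with illustrative error types (not the crate's real ones, and synchronous rather than async for brevity):

```rust
// Inner error type, standing in for AppendToPayloadPrefixError.
#[derive(Debug, PartialEq, Eq)]
enum AppendError {
    Producer(String),
    Store(String),
}

// Outer error type, standing in for ImportDropError.
#[derive(Debug, PartialEq, Eq)]
enum ImportError {
    ProducerError(String),
    StoreError(String),
}

// The conversion lives in one place...
impl From<AppendError> for ImportError {
    fn from(err: AppendError) -> Self {
        match err {
            AppendError::Producer(e) => ImportError::ProducerError(e),
            AppendError::Store(e) => ImportError::StoreError(e),
        }
    }
}

fn append(fail: bool) -> Result<(), AppendError> {
    if fail {
        Err(AppendError::Store("no space".to_string()))
    } else {
        Ok(())
    }
}

fn import(fail: bool) -> Result<(), ImportError> {
    // ...so the repeated `map_err(|err| match err { ... })` blocks
    // collapse into `?`, which applies the From impl implicitly.
    append(fail)?;
    Ok(())
}
```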
@ -0,0 +463,4 @@
```rust
        })?;
    }
    0b0000_0011 => {
        // TODO: Our store traits don't yet have a way of ingesting partial slices at offsets.
```
Is this still a todo?

Our intention is to eventually support partial payload slices at non-zero offsets, so I'd say yes.

@ -0,0 +466,4 @@
```rust
        // TODO: Our store traits don't yet have a way of ingesting partial slices at offsets.
        // So if it was asked for, error here.
        // Or just skip ahead to the next entry without ingesting any of the slices!
        if ignore_nonzero_offset_partial_payload_slices {
```
Missing negation?
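The suspected inversion: as the diff stands, the error branch fires exactly when the flag asks to *ignore* such slices. With stand-in types, the negated condition the reviewer expects would look like:

```rust
// Errors only when ignoring was NOT requested; otherwise skips the
// slice. Types and return values are illustrative stand-ins.
fn handle_nonzero_offset_slice(
    ignore_nonzero_offset_partial_payload_slices: bool,
) -> Result<&'static str, &'static str> {
    if !ignore_nonzero_offset_partial_payload_slices {
        // The caller asked for an error on slices the store traits
        // cannot ingest yet.
        return Err("partial payload slice at non-zero offset");
    }
    // Otherwise skip ahead to the next entry without ingesting the slice.
    Ok("skipped")
}
```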
@ -193,0 +220,4 @@
```rust
/// A trait for describing Willow stores which can count all of the entries in a given [`Area`].
pub trait AreaCountStore: Store {
    /// Return the count of [`Entry`] [included](https://willowprotocol.org/specs/grouping-entries/index.html#area_include) by a given [`Area`] within a given [namespace](https://willowprotocol.org/specs/data-model/index.html#namespace)
```
`/// Returns [...] .` (third-person "s", and "." at end of sentence)

@ -193,0 +225,4 @@
```rust
        &self,
        namespace_id: &NamespaceId,
        area: &Area,
    ) -> impl Future<Output = Result<usize, Self::InternalError>>;
```
Should be an `async fn` instead of manually returning an `impl Future`.

@ -0,0 +22,4 @@
```rust
    StoreError(StoreError),
    /// The consumer consuming the encoded bytes of the drop returned an error.
    ConsumerError(ConsumerError),
    /// The given consumer consumed a different number of bytes than was expected.
```
This comment feels a bit misleading, it makes it sound like there is a problem with the consumer. When actually the problem is that `export_drop` wrote invalid data. How about something like

@ -0,0 +105,4 @@
```rust
    write_tag(&mut header, 2, 0, u64::from(authorised.timestamp()));
    write_tag(&mut header, 2, 2, authorised.payload_length());

    let mut bytes_available = 0;
```
Unless I'm missing something, this value is never written into the encoding. That seems wrong, given that it exists in order to detect mismatches between the payload slice length we claim upfront to encode and the slice we actually encode. Do we always promise to encode the full payload, or does this implementation also support incomplete prefixes? If the latter, then it is almost certainly buggy. If the former, then there should be no need to use `bytes_available` (as it is being looked up right now), simply check whether the number of written bytes matches the payload length instead.

@ -0,0 +196,4 @@
```rust
    if bytes_consumed != bytes_available {
        return Err(ExportDropError::ConsumedBytesMismatch);
    }
}
```
There is another missing validity check: we need to emit an error if the number of entries we ended up encoding is unequal to `entries_count` (which is the number we put into the encoding as the number of entries that will follow).