
feat: Refactor apply_block() locks #1901

Open

sergerad wants to merge 50 commits into next from sergerad-store-lock-free

Conversation

Collaborator

@sergerad sergerad commented Apr 8, 2026

Lock-free Store state

Problem

On next, the store's apply_block uses a two-phase commit with oneshot channels to synchronize DB and in-memory updates. It acquires a Mutex to prevent concurrent writes, then coordinates between a DB update task and the main apply path using channels (acquired_allowed, inform_acquire_done). During the commit phase, it acquires RwLock::write() on inner (which holds the nullifier tree, account tree, and blockchain) and a separate RwLock::write() on forest, blocking all readers for the duration of the DB commit and in-memory mutations.

This means:

  • Write throughput is limited by read latency: the writer must wait for all in-flight reads to release their RwLock read guards before it can acquire the write lock. On Testnet we have observed store.Rpc/GetAccount with a P99 latency of ~4.4s across 5.6M calls over a 3-day period, which can slow the write path significantly.
  • Read latency spikes during writes: readers block on the write lock while the DB transaction commits and in-memory state is updated.
  • Complex coordination logic: the two-phase channel handshake (acquired_allowed / inform_acquire_done) adds error-prone complexity and extra ConcurrentWrite error paths for root staleness checks.

Solution

Replace the RwLock/Mutex/channel coordination with a single ArcSwap<InMemoryState> and a dedicated writer task serialized by an mpsc channel.

Readers call snapshot() which returns Arc<InMemoryState> via ArcSwap::load_full() — a wait-free atomic refcount increment with no data cloning and no locks. The returned Arc is a frozen view: readers are completely unaffected if the writer publishes new state while they hold it.
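The reader path above can be sketched with std types only. This is a minimal sketch, not the PR's code: `RwLock<Arc<_>>` stands in for `ArcSwap` (which removes even that brief lock on the pointer swap), and `InMemoryState` is reduced to just a `block_num` field:

```rust
use std::sync::{Arc, RwLock};

// Stand-in for the real InMemoryState, which also holds the trees and MMR.
struct InMemoryState {
    block_num: u64,
}

// Std-only approximation of ArcSwap<InMemoryState>: the lock guards only a
// pointer clone, never the data, so the critical section is a refcount bump.
struct State {
    current: RwLock<Arc<InMemoryState>>,
}

impl State {
    // Analogous to ArcSwap::load_full(): cheap Arc clone, no data copied;
    // the caller gets a frozen view that later publishes cannot mutate.
    fn snapshot(&self) -> Arc<InMemoryState> {
        Arc::clone(&self.current.read().unwrap())
    }

    // Analogous to ArcSwap::store(): readers holding older Arcs are
    // unaffected; old state drops when its last Arc goes away.
    fn publish(&self, next: InMemoryState) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}

fn main() {
    let state = State {
        current: RwLock::new(Arc::new(InMemoryState { block_num: 10 })),
    };
    let snap = state.snapshot();
    state.publish(InMemoryState { block_num: 11 });
    assert_eq!(snap.block_num, 10); // the earlier snapshot is frozen at block 10
    assert_eq!(state.snapshot().block_num, 11); // a new load sees block 11
}
```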

The writer is a dedicated tokio::spawned task that owns the writable trees directly as local variables — no Arc, no locks, no interior mutability. It receives blocks via an mpsc channel (one at a time), validates against the current snapshot, commits to DB with no locks held, applies mutations to its owned trees, then builds a new InMemoryState with snapshot-backed read-only copies of the trees and atomically publishes via ArcSwap::store().
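The writer loop's shape can be sketched as follows, again with std only: `std::thread` plus `std::sync::mpsc` stand in for the tokio task and channel, a `RwLock<Arc<_>>` slot stands in for `ArcSwap`, and a single counter stands in for the writable trees (the `Block` struct is hypothetical):

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

struct InMemoryState {
    block_num: u64,
}

// Hypothetical block payload; the real writer receives full blocks.
struct Block {
    number: u64,
}

fn spawn_writer(
    rx: mpsc::Receiver<Block>,
    slot: Arc<RwLock<Arc<InMemoryState>>>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // The writer owns its writable state as a plain local variable;
        // here one counter stands in for the writable trees.
        let mut chain_tip = slot.read().unwrap().block_num;
        for block in rx {
            // 1. Validate against the current state (no locks held).
            assert_eq!(block.number, chain_tip + 1);
            // 2. Commit to DB (elided); no locks are held during the commit.
            // 3. Apply mutations to the owned state.
            chain_tip = block.number;
            // 4. Publish a fresh snapshot; readers holding old Arcs are unaffected.
            *slot.write().unwrap() = Arc::new(InMemoryState { block_num: chain_tip });
        }
    })
}

fn main() {
    let published = Arc::new(RwLock::new(Arc::new(InMemoryState { block_num: 0 })));
    let (tx, rx) = mpsc::channel();
    let writer = spawn_writer(rx, Arc::clone(&published));
    for n in 1..=3 {
        tx.send(Block { number: n }).unwrap();
    }
    drop(tx); // closing the channel ends the writer loop
    writer.join().unwrap();
    assert_eq!(published.read().unwrap().block_num, 3);
}
```

The channel serializes writes without a mutex: only one block is in flight at a time, and backpressure comes for free from the bounded (in the tokio case) channel.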

Both the nullifier tree and account tree are RocksDB-backed. The writer holds writable LargeSmt<RocksDbStorage> instances. After applying mutations (which write to RocksDB), the writer creates LargeSmt<RocksDbSnapshotStorage> instances for the new InMemoryState — these use native RocksDB snapshots for point-in-time consistent reads with zero data copying.
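The writable-vs-snapshot storage split can be sketched in miniature. This is a hypothetical shape, not the upstream LargeSmt API: a `HashMap` plus an explicit clone stands in for RocksDB (a real RocksDB snapshot shares data with the writable handle rather than copying it), and the trait names mirror the split described below:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Read side of the storage contract.
trait SmtStorageReader {
    fn get(&self, key: u64) -> Option<u64>;
}

// Write side extends the read side; only writable backends implement it.
trait SmtStorage: SmtStorageReader {
    fn put(&mut self, key: u64, value: u64);
}

// Stand-in for the writable RocksDbStorage backend.
#[derive(Default)]
struct WritableStorage {
    map: HashMap<u64, u64>,
}

impl SmtStorageReader for WritableStorage {
    fn get(&self, key: u64) -> Option<u64> {
        self.map.get(&key).copied()
    }
}

impl SmtStorage for WritableStorage {
    fn put(&mut self, key: u64, value: u64) {
        self.map.insert(key, value);
    }
}

// Stand-in for RocksDbSnapshotStorage: read-only, cloneable, point-in-time.
#[derive(Clone)]
struct SnapshotStorage {
    frozen: Arc<HashMap<u64, u64>>,
}

impl SmtStorageReader for SnapshotStorage {
    fn get(&self, key: u64) -> Option<u64> {
        self.frozen.get(&key).copied()
    }
}

fn main() {
    let mut writable = WritableStorage::default();
    writable.put(1, 10);
    // Take a point-in-time view (a copy here; zero-copy with real RocksDB snapshots).
    let snap = SnapshotStorage { frozen: Arc::new(writable.map.clone()) };
    writable.put(2, 20); // later writes are invisible to the snapshot
    assert_eq!(snap.get(1), Some(10));
    assert_eq!(snap.get(2), None);
    assert_eq!(writable.get(2), Some(20));
}
```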

Changes

  • InMemoryState: New struct holding block_num, nullifier tree, account tree, blockchain MMR, and forest. Lives behind ArcSwap on State.
  • Scoped<T>: New wrapper pairing query results with the BlockNumber of the snapshot that produced them, so callers always know which block the data corresponds to. Applied to get_transaction_inputs, sync_nullifiers, sync_notes, sync_account_vault, sync_account_storage_maps, sync_transactions.
  • Writer task: Replaced the two-phase commit (apply_block.rs) with a dedicated mpsc-based writer loop (writer.rs). Owns writable trees as local variables — no locks held during DB commit.
  • RocksDbSnapshotStorage: New read-only, cloneable storage backend using native RocksDB snapshots. Used for the in-memory snapshot trees so readers don't block the writer.
  • Trait bound relaxation: SmtStorage split into SmtStorageReader (read) and SmtStorage (read+write). Read-only snapshot trees only need SmtStorageReader. AccountTreeWithHistory and loader functions relaxed accordingly.
  • Clone derives added to AccountTreeWithHistory, AccountStateForest, and upstream LargeSmt/LargeSmtForest (via patches) to support snapshot construction.
  • Removed: RwLock<InnerState>, RwLock<AccountStateForest>, Mutex<()> write lock, apply_block.rs, all .await on snapshot calls.
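The Scoped<T> wrapper from the list above can be sketched as a plain generic pair. This is a hypothetical minimal shape (field and method names are illustrative, not taken from the PR):

```rust
// A query result tagged with the block number of the snapshot that produced
// it, so callers always know which block the data corresponds to.
struct Scoped<T> {
    block_num: u64,
    inner: T,
}

impl<T> Scoped<T> {
    fn new(block_num: u64, inner: T) -> Self {
        Self { block_num, inner }
    }

    // Block number of the snapshot this result was read from.
    fn block_num(&self) -> u64 {
        self.block_num
    }

    // Consume the wrapper and return the underlying result.
    fn into_inner(self) -> T {
        self.inner
    }
}

fn main() {
    // e.g. a sync_nullifiers-style query might return Scoped<Vec<u64>> in spirit.
    let result = Scoped::new(42, vec![1u64, 2, 3]);
    assert_eq!(result.block_num(), 42);
    assert_eq!(result.into_inner(), vec![1, 2, 3]);
}
```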

Further changes required

We will want to do the following either in this PR or followups:

  • Replace LargeSmtForest<ForestInMemoryBackend> with a RocksDb backend.
  • Dynamic in-memory cache size for LargeSmt so that we can reduce memory footprint of NullifierTree, AccountTree, and AccountStateForest in InMemoryState.
  • Observability into the number of live Arc<InMemoryState> instances at any one time.

Performance characteristics

|                             | next                                               | This PR                                                                               |
| --------------------------- | -------------------------------------------------- | ------------------------------------------------------------------------------------- |
| Reader cost                 | RwLock read guard (may block on writer)            | Atomic refcount bump (wait-free)                                                       |
| Writer blocked by readers   | Yes (waits for read guards to drain)               | No                                                                                     |
| Locks held during DB commit | RwLock write lock on inner                         | None                                                                                   |
| Write coordination          | Two-phase channel handshake + root staleness check | Single writer task via mpsc                                                            |
| Deep clone per block        | None                                               | Once (writer clones InMemoryState — trees use RocksDB snapshots, no tree data copied)  |
| Unsafe code                 | None                                               | None                                                                                   |

@sergerad sergerad added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Apr 8, 2026
@sergerad sergerad requested review from bobbinth and kkovaacs April 14, 2026 03:38
@sergerad sergerad marked this pull request as ready for review April 14, 2026 03:38
Contributor

@kkovaacs kkovaacs left a comment


I've added some comments, but I think the overall concept is solid.

