turbo-tasks-backend: stability fixes for task cancellation and error handling #92254
Merged
Conversation
When a task fails partway through execution (before creating all cells from its previous run), `cell_counters` only reflects the partially-executed state. Updating `cell_type_max_index` from these partial counters removes entries for cell types not yet encountered, causing "Cell no longer exists" errors for tasks that still hold cell dependencies from the previous run.

This mirrors the existing behavior of `task_execution_completed_cleanup`, which already skips removal of cell data when a task errors. Now `cell_type_max_index` is kept consistent with the preserved cell data.

The bug manifested with `serialization = "hash"` types (e.g. `FileContent`) where cell data is transient: readers fall back to `cell_type_max_index` to determine whether to schedule recomputation, so a stale `None` there caused a hard "no longer exists" error instead of a retry.

Co-Authored-By: Claude <[email protected]>
…led tasks

Three fixes for issues that occur when tasks are cancelled during shutdown with filesystem caching enabled:

1. Notify in_progress_cells on cancellation: task_execution_canceled now drains and notifies all InProgressCellState events, preventing stop_and_wait from hanging on foreground jobs that will never complete.
2. Bail early for cancelled tasks in try_read_task_cell: when a task is in Canceled state, bail before calling listen_to_cell instead of after, avoiding the creation of pointless listeners.
3. Mark cancelled tasks as session-dependent dirty: prevents cache poisoning where "was canceled" errors get persisted as task output and break subsequent builds. The session-dependent dirty flag ensures the task is re-executed in the next session, which invalidates dependents and corrects the stale errors.

Co-Authored-By: Claude <[email protected]>
Tests Passed
Deduplicate the "read old dirty state, apply new state, propagate via ComputeDirtyAndCleanUpdate" pattern that was duplicated between task_execution_canceled and task_execution_completed_finish. The new update_dirty_state default method on TaskGuard handles both transitions (to SessionDependent or to None) and returns the aggregation job + result for callers that need post-processing (e.g. all_clean_event). Co-Authored-By: Claude <[email protected]>
Merging this PR will degrade performance by 3.71%
Stats from current PR: ✅ No significant changes detected
- Hoist `set_in_progress(Canceled)` + `drop(task)` out of if/else branches in `task_execution_canceled` to reduce duplication
- Move `ComputeDirtyAndCleanUpdate` import from function body to module level in `operation/mod.rs`
- Add cross-reference comments between the `cell_type_max_index` skip in `task_execution_completed_prepare` and the cell data skip in `task_execution_completed_cleanup`

Co-Authored-By: Claude <[email protected]>
The clean transition notification is a direct consequence of the dirty state update. Moving it into the helper ensures any future caller that transitions a task to clean will fire the event. This also simplifies the return type from (Option<Job>, Result) to just Option<Job>. Co-Authored-By: Claude <[email protected]>
- Collapse nested if blocks into `if a && let Some(x) = expr` (collapsible_if)
- Remove unnecessary let binding before return (let_and_return)

Co-Authored-By: Claude <[email protected]>
lukesandberg reviewed Apr 2, 2026

lukesandberg approved these changes Apr 2, 2026

sokra added a commit that referenced this pull request Apr 3, 2026
…handling (#92254)

### What?

Bug fixes and a refactoring in `turbo-tasks-backend` targeting stability issues that surface when filesystem caching is enabled:

1. **Preserve `cell_type_max_index` on task error** — when a task fails partway through execution, `cell_counters` only reflects the partially-executed state. Previously, `cell_type_max_index` was updated from these incomplete counters, which removed entries for cell types not yet encountered. This caused `"Cell no longer exists"` hard errors for tasks that still held dependencies on those cells. The fix skips the `cell_type_max_index` update on error, keeping it consistent with the preserved cell data (which already wasn't cleared on error). This bug manifested specifically with `serialization = "hash"` cell types (e.g. `FileContent`), where cell data is transient and readers fall back to `cell_type_max_index` to decide whether to schedule recomputation.
2. **Fix shutdown hang and cache poisoning for cancelled tasks** — three related fixes for tasks cancelled during shutdown:
   - `task_execution_canceled` now drains and notifies all `InProgressCellState` events, preventing `stop_and_wait` from hanging on foreground jobs waiting on cells that will never be filled.
   - `try_read_task_cell` bails early (before calling `listen_to_cell`) when a task is in `Canceled` state, avoiding pointless listener registrations that would never resolve.
   - Cancelled tasks are marked as session-dependent dirty, preventing cache poisoning where `"was canceled"` errors get persisted as task output and break subsequent builds. The session-dependent dirty flag causes the task to re-execute in the next session, invalidating stale dependents.
3. **Extract `update_dirty_state` helper on `TaskGuard`** — the "read old dirty state → apply new state → propagate via `ComputeDirtyAndCleanUpdate`" pattern was duplicated between `task_execution_canceled` and `task_execution_completed_finish`. The new `update_dirty_state` default method on `TaskGuard` handles both transitions (to `SessionDependent` or to `None`) and returns the aggregation job + `ComputeDirtyAndCleanUpdateResult` for callers that need post-processing (e.g. firing the `all_clean_event`).

### Why?

These bugs caused observable failures when using Turbopack with filesystem caching (`--cache` / persistent cache):

- `"Cell no longer exists"` panics/errors on incremental rebuilds after a task error.
- Hangs on `stop_and_wait` during dev server shutdown.
- Stale `"was canceled"` errors persisted in the cache breaking subsequent builds until the cache is cleared.

### How?

Changes are in `turbopack/crates/turbo-tasks-backend/src/backend/`:

**`mod.rs`:**

- Guard the `cell_type_max_index` update block inside `if result.is_ok()` to skip it on error, with a cross-reference comment to `task_execution_completed_cleanup` (which similarly skips cell data removal on error — the two must stay in sync).
- Move the `is_cancelled` bail in `try_read_task_cell` before the `listen_to_cell` call to avoid inserting phantom `InProgressCellState` events that would never be notified.
- In `task_execution_canceled`: switch to `TaskDataCategory::All` (needed for dirty state metadata access), notify all pending in-progress cell events, and mark the task as `SessionDependent` dirty via the new helper.
- In `task_execution_completed_finish`: replace ~77 lines of inline dirty state logic with a call to `task.update_dirty_state(new_dirtyness)`, preserving the `all_clean_event` post-processing and the `dirty_changed` variable under `#[cfg(feature = "verify_determinism")]`.

**`operation/mod.rs`:**

- Add `update_dirty_state` default method on `TaskGuard` trait (~60 lines), co-located with the existing `dirty_state()` reader. Takes `Option<Dirtyness>`, applies the transition, builds `ComputeDirtyAndCleanUpdate`, and returns `(Option<AggregationUpdateJob>, ComputeDirtyAndCleanUpdateResult)`.
- Add `ComputeDirtyAndCleanUpdateResult` to the public re-exports.

---------

Co-authored-by: Tobias Koppers <[email protected]>
Co-authored-by: Claude <[email protected]>
eps1lon pushed a commit that referenced this pull request Apr 7, 2026
eps1lon pushed a commit that referenced this pull request Apr 7, 2026
sokra added a commit that referenced this pull request Apr 7, 2026
#92108)

### What?

Re-lands #91576 ("turbo-tasks: add hashed cell mode for hash-based change detection without cell data"), which was reverted in #92103 due to a `FATAL` crash in the `filesystem-cache` test suite. Includes a bug fix on top: in `task_execution_completed_prepare`, skip updating `cell_type_max_index` when the task completed with an error. Also adds a `CellHash = [u8; 16]` type alias (requested in review) used throughout the hash pipeline.

### Why?

**The original feature** (`serialization = "hash"` on `FileContent` and `Code`) stores a hash of the cell data instead of the full serialized value. On session restore, the hash is used to detect whether cell content has changed without needing the full data in memory. This avoids a large persistent cache size increase.

**The bug** that caused the revert: when a task fails partway through re-execution (before recreating all the cells from its previous run), `cell_counters` only reflects the partially-executed state. The old code used those partial counters to update `cell_type_max_index`, removing entries for cell types that were not yet created at the point of failure. This caused downstream tasks that still held cell dependencies from the previous successful run to hit a hard "Cell no longer exists" error.

**Concrete failure path** in the `filesystem-cache rename app page` test:

1. `get_app_page_entry` runs for `/remove-me/page`, creating two `FileContent` cells (indices 0 and 1). `cell_type_max_index[FileContent] = 2` is persisted.
2. The folder is renamed (`app/remove-me` → `app/add-me`), dirtying the task.
3. On re-execution, `get_app_page_entry` fails at `config.await?` (the loader tree errors because the directory is gone) — before any `FileContent::cell()` calls.
4. `cell_counters` has no `FileContent` entry → old code removed `cell_type_max_index[FileContent]`.
5. The `parse` task tries to read `FileContent` cell 1 from `get_app_page_entry` → `cell_type_max_index` is `None` → **"Cell no longer exists" panic → FATAL error**.

**Why it didn't crash before** `serialization = "hash"`: `FileContent` was previously serializable, so `parse` read stale cell data directly from `persistent_cell_data`, which `task_execution_completed_cleanup` already preserves on error. With `serialization = "hash"`, data is transient — readers fall back to `cell_type_max_index` for range validation, where a stale `None` caused the crash.

### How?

#### Core feature: `serialization = "hash"` cell mode

- New `SerializationMode::Hash` variant in `turbo-tasks-macros` — marks a value type as non-serializable but stores a `DeterministicHash` of the cell data for change detection.
- `VcCellHashedCompareMode<T>` cell mode: compares values via `PartialEq` when available, falls back to hash comparison when transient data has been evicted.
- `hashed_compare_and_update` / `hashed_compare_and_update_with_shared_reference` on `CurrentCellRef` compute and pass content hashes through the update pipeline.
- Backend `update_cell` uses hash-based comparison to skip invalidation when the old cell data is unavailable but the hash matches.
- `cell_data_hash: AutoMap<CellId, CellHash>` field in task storage persists hashes across sessions.
- Stale `cell_data_hash` entries are cleaned up in `task_execution_completed_cleanup` alongside cell data removal.
- `CellHash = [u8; 16]` type alias keeps alignment at 1 byte to avoid padding growth in `AutoMap`/`LazyField` enum variants.
- Hash bytes use little-endian encoding (`to_le_bytes`) for cross-platform cache portability.

#### Bug fix: preserve `cell_type_max_index` on task error

In `task_execution_completed_prepare`, guard the `cell_type_max_index` update block with `if result.is_ok()`. This mirrors the existing `task_execution_completed_cleanup` behavior that already skips cell data removal when `is_error` is true, keeping `cell_type_max_index` consistent with the preserved transient cell data.

#### Applied to `FileContent` and `Code`

- `FileContent` uses `serialization = "hash"` — full content is persisted via a separate `PersistedFileContent` type when needed (e.g., in `DiskFileSystem::write`).
- `Code` uses `serialization = "hash"` with `Arc<Vec<Mapping>>` for cheap cloning. `Code::cell_persisted()` creates a `PersistedCode` cell directly and returns `Vc<Code>` via `PersistedCode::to_code()`, avoiding an intermediate hash-mode cell.

#### Other improvements

- `DeterministicHash` impls for `SmallVec` and `()`.
- `Xxh3Hash128Hasher::finish_bytes()` method returning `[u8; 16]`.
- `hash = "manual"` option on `#[turbo_tasks::value]` to opt out of auto-deriving `DeterministicHash`.

**Note:** The shutdown hang and cache poisoning fixes that were previously on this branch have been merged separately via #92254.

### Test plan

- [x] `test/e2e/filesystem-cache/filesystem-cache.test.ts` passes (all 17 tests)
- [x] New `turbopack/crates/turbo-tasks-backend/tests/hashed_cell_mode.rs` integration test verifies hash-based change detection: value changes trigger invalidation, equal values (same hash) do not
- [x] `cargo check` passes for `turbo-tasks`, `turbo-tasks-backend`, `turbo-tasks-fs`, `turbopack-core`, `turbopack-ecmascript`
- [x] CI green (attempt 2)

---------

Co-authored-by: Tobias Koppers <[email protected]>
Co-authored-by: Claude <[email protected]>
lukesandberg pushed a commit that referenced this pull request Apr 7, 2026