
turbo-tasks-backend: stability fixes for task cancellation and error handling#92254

Merged
sokra merged 7 commits into canary from sokra/stability-fixes
Apr 3, 2026

Conversation

@sokra
Member

@sokra sokra commented Apr 2, 2026

What?

Bug fixes and a refactoring in turbo-tasks-backend targeting stability issues that surface when filesystem caching is enabled:

  1. Preserve cell_type_max_index on task error — when a task fails partway through execution, cell_counters only reflects the partially-executed state. Previously, cell_type_max_index was updated from these incomplete counters, which removed entries for cell types not yet encountered. This caused "Cell no longer exists" hard errors for tasks that still held dependencies on those cells. The fix skips the cell_type_max_index update on error, keeping it consistent with the preserved cell data (which already wasn't cleared on error).

    This bug manifested specifically with serialization = "hash" cell types (e.g. FileContent), where cell data is transient and readers fall back to cell_type_max_index to decide whether to schedule recomputation.

  2. Fix shutdown hang and cache poisoning for cancelled tasks — three related fixes for tasks cancelled during shutdown:

    • task_execution_canceled now drains and notifies all InProgressCellState events, preventing stop_and_wait from hanging on foreground jobs waiting on cells that will never be filled.
    • try_read_task_cell bails early (before calling listen_to_cell) when a task is in Canceled state, avoiding pointless listener registrations that would never resolve.
    • Cancelled tasks are marked as session-dependent dirty, preventing cache poisoning where "was canceled" errors get persisted as task output and break subsequent builds. The session-dependent dirty flag causes the task to re-execute in the next session, invalidating stale dependents.
  3. Extract update_dirty_state helper on TaskGuard — the "read old dirty state → apply new state → propagate via ComputeDirtyAndCleanUpdate" pattern was duplicated between task_execution_canceled and task_execution_completed_finish. The new update_dirty_state default method on TaskGuard handles both transitions (to SessionDependent or to None) and returns the aggregation job + ComputeDirtyAndCleanUpdateResult for callers that need post-processing (e.g. firing the all_clean_event).
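A rough sketch of item 1 — `TaskState`, `CellTypeId`, and the maps are simplified stand-ins for the backend's real data structures, not its actual API — where the completion path only rewrites `cell_type_max_index` when execution succeeded:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the backend's cell type identifier.
type CellTypeId = u32;

struct TaskState {
    // Per cell type: one past the highest cell index from the last
    // successful run (persisted across sessions).
    cell_type_max_index: HashMap<CellTypeId, u32>,
}

impl TaskState {
    /// Fold the freshly counted cells into `cell_type_max_index` only on
    /// success. On error, `cell_counters` reflects a partial execution,
    /// so updating from it would drop entries for cell types the failed
    /// run never reached.
    fn finish_execution(
        &mut self,
        result_is_ok: bool,
        cell_counters: &HashMap<CellTypeId, u32>,
    ) {
        if result_is_ok {
            self.cell_type_max_index = cell_counters.clone();
        }
        // On error: keep the old map, consistent with the cell data that
        // task_execution_completed_cleanup already preserves on error.
    }
}
```

On error the old map survives, so readers that fall back to `cell_type_max_index` still see the range from the last successful run.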

Why?

These bugs caused observable failures when using Turbopack with filesystem caching (--cache / persistent cache):

  • "Cell no longer exists" panics/errors on incremental rebuilds after a task error.
  • Hangs on stop_and_wait during dev server shutdown.
  • Stale "was canceled" errors persisted in the cache breaking subsequent builds until the cache is cleared.

How?

Changes are in turbopack/crates/turbo-tasks-backend/src/backend/:

mod.rs:

  • Guard the cell_type_max_index update block inside if result.is_ok() to skip it on error, with a cross-reference comment to task_execution_completed_cleanup (which similarly skips cell data removal on error — the two must stay in sync).
  • Move the is_cancelled bail in try_read_task_cell before the listen_to_cell call to avoid inserting phantom InProgressCellState events that would never be notified.
  • In task_execution_canceled: switch to TaskDataCategory::All (needed for dirty state metadata access), notify all pending in-progress cell events, and mark the task as SessionDependent dirty via the new helper.
  • In task_execution_completed_finish: replace ~77 lines of inline dirty state logic with a call to task.update_dirty_state(new_dirtyness), preserving the all_clean_event post-processing and the dirty_changed variable under #[cfg(feature = "verify_determinism")].
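The reordering in `try_read_task_cell` can be sketched like this (all names are illustrative toy types, not the backend's real signatures):

```rust
// Simplified stand-in for the task's in-progress state.
#[derive(PartialEq)]
enum InProgressState {
    Running,
    Canceled,
}

// Toy listener registry standing in for InProgressCellState events.
struct Listeners(Vec<&'static str>);

fn try_read_task_cell(
    state: &InProgressState,
    listeners: &mut Listeners,
) -> Result<(), &'static str> {
    // Bail BEFORE registering a listener: a cancelled task will never
    // notify it, so registering would create a phantom listener that
    // never resolves.
    if *state == InProgressState::Canceled {
        return Err("task was canceled");
    }
    // Only now is it safe to register interest in the cell
    // (listen_to_cell in the real code).
    listeners.0.push("cell-listener");
    Ok(())
}
```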

operation/mod.rs:

  • Add update_dirty_state default method on TaskGuard trait (~60 lines), co-located with the existing dirty_state() reader. Takes Option<Dirtyness>, applies the transition, builds ComputeDirtyAndCleanUpdate, and returns (Option<AggregationUpdateJob>, ComputeDirtyAndCleanUpdateResult).
  • Add ComputeDirtyAndCleanUpdateResult to the public re-exports.
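A rough shape for the new default method, using simplified stand-in types (the real method builds the full `ComputeDirtyAndCleanUpdate` aggregation payload rather than a bare marker job):

```rust
// Simplified stand-ins for the backend's types.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Dirtyness {
    SessionDependent,
}

struct AggregationUpdateJob;

struct ComputeDirtyAndCleanUpdateResult {
    dirty_changed: bool,
}

trait TaskGuard {
    fn dirty_state(&self) -> Option<Dirtyness>;
    fn set_dirty_state(&mut self, new: Option<Dirtyness>);

    /// Read the old dirty state, apply the new one, and return the
    /// aggregation job plus a result for callers that need
    /// post-processing (e.g. firing all_clean_event on a clean
    /// transition).
    fn update_dirty_state(
        &mut self,
        new: Option<Dirtyness>,
    ) -> (Option<AggregationUpdateJob>, ComputeDirtyAndCleanUpdateResult) {
        let old = self.dirty_state();
        let dirty_changed = old != new;
        self.set_dirty_state(new);
        // Only propagate through the aggregation tree when something
        // actually changed.
        let job = dirty_changed.then_some(AggregationUpdateJob);
        (job, ComputeDirtyAndCleanUpdateResult { dirty_changed })
    }
}
```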

sokra and others added 2 commits April 2, 2026 11:28
When a task fails partway through execution (before creating all cells
from its previous run), cell_counters only reflects the partially-executed
state. Updating cell_type_max_index from these partial counters removes
entries for cell types not yet encountered, causing "Cell no longer exists"
errors for tasks that still hold cell dependencies from the previous run.

This mirrors the existing behavior of task_execution_completed_cleanup,
which already skips removal of cell data when a task errors. Now
cell_type_max_index is kept consistent with the preserved cell data.

The bug manifested with serialization = "hash" types (e.g. FileContent)
where cell data is transient: readers fall back to cell_type_max_index
to determine whether to schedule recomputation, so a stale None there
caused a hard "no longer exists" error instead of a retry.

Co-Authored-By: Claude <[email protected]>
…led tasks

Three fixes for issues that occur when tasks are cancelled during shutdown
with filesystem caching enabled:

1. Notify in_progress_cells on cancellation: task_execution_canceled now
   drains and notifies all InProgressCellState events, preventing
   stop_and_wait from hanging on foreground jobs that will never complete.

2. Bail early for cancelled tasks in try_read_task_cell: when a task is
   in Canceled state, bail before calling listen_to_cell instead of after,
   avoiding the creation of pointless listeners.

3. Mark cancelled tasks as session-dependent dirty: prevents cache
   poisoning where "was canceled" errors get persisted as task output
   and break subsequent builds. The session-dependent dirty flag ensures
   the task is re-executed in the next session, which invalidates
   dependents and corrects the stale errors.

Co-Authored-By: Claude <[email protected]>
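Point 1 above — waking every waiter before dropping the in-progress state — can be modeled minimally (`Event` and `InProgressCellState` are simplified stand-ins for the backend's types):

```rust
use std::{cell::Cell, rc::Rc};

// Stand-in for the backend's event type; waiters observe the shared flag.
struct Event {
    notified: Rc<Cell<bool>>,
}

impl Event {
    fn notify(&self) {
        self.notified.set(true);
    }
}

struct InProgressCellState {
    event: Event,
}

/// On cancellation, drain every in-progress cell state and wake its
/// waiters, so stop_and_wait cannot hang on cells that will never be
/// filled by the cancelled task.
fn task_execution_canceled(in_progress_cells: &mut Vec<InProgressCellState>) {
    for cell in in_progress_cells.drain(..) {
        cell.event.notify();
    }
}
```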
@nextjs-bot nextjs-bot added labels created-by: Turbopack team (PRs by the Turbopack team) and Turbopack (Related to Turbopack with Next.js) Apr 2, 2026
@sokra sokra requested review from bgw and lukesandberg and removed request for bgw April 2, 2026 11:32
@nextjs-bot
Collaborator

nextjs-bot commented Apr 2, 2026

Tests Passed

Deduplicate the "read old dirty state, apply new state, propagate via
ComputeDirtyAndCleanUpdate" pattern that was duplicated between
task_execution_canceled and task_execution_completed_finish. The new
update_dirty_state default method on TaskGuard handles both transitions
(to SessionDependent or to None) and returns the aggregation job +
result for callers that need post-processing (e.g. all_clean_event).

Co-Authored-By: Claude <[email protected]>
@codspeed-hq

codspeed-hq bot commented Apr 2, 2026

Merging this PR will degrade performance by 3.71%

❌ 3 regressed benchmarks
✅ 14 untouched benchmarks
⏩ 3 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Simulation | app-page-turbo.runtime.prod.js[full] | 620.3 ms | 643.9 ms | -3.67% |
| Simulation | packages-bundle.js[full] | 953.7 ms | 987.8 ms | -3.45% |
| Simulation | react-dom-client.development.js[full] | 391.2 ms | 406.3 ms | -3.71% |

Comparing sokra/stability-fixes (34bde7e) with canary (e464ca3)

Open in CodSpeed

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

Deduplicate the "read old dirty state, apply new state, propagate via
ComputeDirtyAndCleanUpdate" pattern that was duplicated between
task_execution_canceled and task_execution_completed_finish. The new
update_dirty_state default method on TaskGuard handles both transitions
(to SessionDependent or to None) and returns the aggregation job +
result for callers that need post-processing (e.g. all_clean_event).

Co-Authored-By: Claude <[email protected]>
@sokra sokra marked this pull request as ready for review April 2, 2026 12:59
@nextjs-bot
Collaborator

nextjs-bot commented Apr 2, 2026

Stats from current PR

✅ No significant changes detected

📊 All Metrics
📖 Metrics Glossary

Dev Server Metrics:

  • Listen = TCP port starts accepting connections
  • First Request = HTTP server returns successful response
  • Cold = Fresh build (no cache)
  • Warm = With cached build artifacts

Build Metrics:

  • Fresh = Clean build (no .next directory)
  • Cached = With existing .next directory

Change Thresholds:

  • Time: changes that are both < 50ms and < 10%, or under 2%, are insignificant
  • Size: Changes < 1KB AND < 1% are insignificant
  • All other changes are flagged to catch regressions

⚡ Dev Server

| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Cold (Listen) | 455ms | 456ms | | ▁▁▁▁▁ |
| Cold (Ready in log) | 439ms | 439ms | | ▁▁▁▁▁ |
| Cold (First Request) | 1.148s | 1.148s | | ▂▁▁▁▁ |
| Warm (Listen) | 457ms | 457ms | | ▁▁▁▁▁ |
| Warm (Ready in log) | 448ms | 443ms | | ▁▁▁▁▁ |
| Warm (First Request) | 350ms | 347ms | | ▁▁▁▁▁ |
📦 Dev Server (Webpack) (Legacy)


| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Cold (Listen) | 455ms | 455ms | | ▁▁█▁▁ |
| Cold (Ready in log) | 436ms | 436ms | | ▁▂█▃▁ |
| Cold (First Request) | 1.926s | 1.934s | | ▂▃█▅▃ |
| Warm (Listen) | 455ms | 455ms | | ▁▁█▁▁ |
| Warm (Ready in log) | 434ms | 435ms | | ▁▂█▄▁ |
| Warm (First Request) | 1.938s | 1.946s | | ▁▂█▄▂ |

⚡ Production Builds

| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Fresh Build | 3.865s | 3.785s | | ▁▁▁▁▁ |
| Cached Build | 3.846s | 3.768s | | ▁▁▁▁▁ |
📦 Production Builds (Webpack) (Legacy)


| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Fresh Build | 14.287s | 14.244s | | ▁▂█▄▁ |
| Cached Build | 14.385s | 14.381s | | ▁▂█▃▁ |
| node_modules Size | 487 MB | 487 MB | | █████ |
📦 Bundle Sizes


⚡ Turbopack

Client

Main Bundles
Canary PR Change
0-s9xnn97hhz_.js gzip 155 B N/A -
02fkg8wfh0iju.js gzip 9.19 kB N/A -
050zwt5xh_0tx.js gzip 10.4 kB N/A -
06rvbj82bhyo0.js gzip 13 kB N/A -
087fzjd-gvlzv.js gzip 450 B N/A -
0cz1d0mv5g_q7.js gzip 39.4 kB 39.4 kB
0orypefv0sg4h.js gzip 169 B N/A -
0ppxcl_z43mad.js gzip 8.52 kB N/A -
0zuhf5vf20u7o.js gzip 157 B N/A -
147u02q840nds.js gzip 65.7 kB N/A -
19oha6-znmkcv.js gzip 8.55 kB N/A -
1cwbxstui2kn9.js gzip 163 B N/A -
1elt1qium-r2m.css gzip 115 B 115 B
1ib_961smaomg.js gzip 155 B N/A -
1l1cjhqjxo9jx.js gzip 153 B N/A -
1uy-8i-dql9e6.js gzip 160 B N/A -
2_5rjb7lqxntf.js gzip 221 B 221 B
219prxwxgaalc.js gzip 7.61 kB N/A -
26elcgxnn9zjd.js gzip 8.52 kB N/A -
27tj4drv-jgkz.js gzip 157 B N/A -
2900hudr6gvm0.js gzip 2.28 kB N/A -
2cb31yqfdo74p.js gzip 160 B N/A -
2igqq9sm6suh7.js gzip 70.8 kB N/A -
2lv2js3kmdeho.js gzip 8.48 kB N/A -
2rehygrd36hqv.js gzip 8.58 kB N/A -
2srwswih0m9_h.js gzip 13.3 kB N/A -
3-p9p9mheqhzx.js gzip 8.55 kB N/A -
31030bryqpolg.js gzip 8.53 kB N/A -
31dx5nmrzzuy7.js gzip 225 B N/A -
31yrzpqu3z4sc.js gzip 157 B N/A -
34f2yfxqlulqr.js gzip 157 B N/A -
3925v09gtu-5k.js gzip 49 kB N/A -
39x4zj5mjb4d_.js gzip 9.77 kB N/A -
3g-b30oggx_m7.js gzip 158 B N/A -
3k-48b78ys_vy.js gzip 10.1 kB N/A -
3m7-5rfj0avoz.js gzip 12.9 kB N/A -
3uqce_6sa526g.js gzip 8.47 kB N/A -
3yurjqk-sjs3y.js gzip 1.46 kB N/A -
3zok5oe1_3hw6.js gzip 158 B N/A -
40ybjx9c192n0.js gzip 13.8 kB N/A -
421vzwdt9j1b_.js gzip 5.62 kB N/A -
turbopack-01..9w7z.js gzip 4.18 kB N/A -
turbopack-0a..890a.js gzip 4.18 kB N/A -
turbopack-1_..imuk.js gzip 4.16 kB N/A -
turbopack-13..7s_i.js gzip 4.18 kB N/A -
turbopack-19..bgmw.js gzip 4.18 kB N/A -
turbopack-1j..23e9.js gzip 4.18 kB N/A -
turbopack-2e..uvoo.js gzip 4.18 kB N/A -
turbopack-2i..mit_.js gzip 4.17 kB N/A -
turbopack-2n..-pmd.js gzip 4.18 kB N/A -
turbopack-33..lk1-.js gzip 4.18 kB N/A -
turbopack-3j..2tlj.js gzip 4.18 kB N/A -
turbopack-3u..d9sv.js gzip 4.19 kB N/A -
turbopack-3w..3__b.js gzip 4.18 kB N/A -
turbopack-3z..qvs8.js gzip 4.18 kB N/A -
03dgzoo-qf3sm.js gzip N/A 9.19 kB -
05tx5f25dlivn.js gzip N/A 8.53 kB -
0c7ez6p2qc57f.js gzip N/A 5.62 kB -
0cphuvbg0dzz9.js gzip N/A 157 B -
0duvj3qk5pvgn.js gzip N/A 13.8 kB -
0ewvr7ewqz8nc.js gzip N/A 159 B -
0m-34rm9w_wpm.js gzip N/A 7.6 kB -
0qnwuk92m8i7o.js gzip N/A 10.4 kB -
0r4wrn6n0ue2m.js gzip N/A 8.55 kB -
0rp0fodtbt_6m.js gzip N/A 8.52 kB -
0sfck-km4dl1k.js gzip N/A 8.47 kB -
0x0xuhmxzwkp8.js gzip N/A 8.47 kB -
0xfzafjhtuxd7.js gzip N/A 156 B -
1-wdvgxnzicj7.js gzip N/A 1.46 kB -
11u6nxujb2eg4.js gzip N/A 450 B -
12ai2vqxa656z.js gzip N/A 169 B -
1cmq9a6x44ja1.js gzip N/A 155 B -
1ephae-4kx1h5.js gzip N/A 151 B -
1jv-o1_s-zmua.js gzip N/A 49 kB -
1yp_bv93so5bs.js gzip N/A 156 B -
215vkmyjdlsbw.js gzip N/A 158 B -
2dtcubu8afy2l.js gzip N/A 154 B -
2h1wymo-skeep.js gzip N/A 155 B -
2k9ax08cjl2id.js gzip N/A 12.9 kB -
2lms6k76q5-6m.js gzip N/A 13.3 kB -
2qx4twi9i3xus.js gzip N/A 2.28 kB -
2srnqic6tvxxd.js gzip N/A 8.52 kB -
2twb36c6z0tl-.js gzip N/A 65.7 kB -
2u8halqgb3qy0.js gzip N/A 162 B -
30l7m4nayp73a.js gzip N/A 8.55 kB -
38rr7d3kfutni.js gzip N/A 13 kB -
3h_ecpiaatwgc.js gzip N/A 10.1 kB -
3itjxofjxnpib.js gzip N/A 70.8 kB -
3ity0aahajapd.js gzip N/A 225 B -
3kunsrmuwqgu9.js gzip N/A 154 B -
3wrhpuc-j1aw9.js gzip N/A 9.77 kB -
41wnj6hox-zd9.js gzip N/A 156 B -
43mlw9dy_8f02.js gzip N/A 8.58 kB -
turbopack-0y..e-xt.js gzip N/A 4.18 kB -
turbopack-13..sbve.js gzip N/A 4.18 kB -
turbopack-1h..efwi.js gzip N/A 4.18 kB -
turbopack-1m..21n4.js gzip N/A 4.18 kB -
turbopack-2d..e9pt.js gzip N/A 4.19 kB -
turbopack-2u..2svc.js gzip N/A 4.18 kB -
turbopack-2v..1sl3.js gzip N/A 4.17 kB -
turbopack-2w..n231.js gzip N/A 4.18 kB -
turbopack-2z..mki1.js gzip N/A 4.18 kB -
turbopack-34..l40h.js gzip N/A 4.18 kB -
turbopack-3i..bws0.js gzip N/A 4.18 kB -
turbopack-3z..x6_r.js gzip N/A 4.16 kB -
turbopack-40..ejeg.js gzip N/A 4.18 kB -
turbopack-42..8s9b.js gzip N/A 4.18 kB -
Total 464 kB 464 kB ✅ -53 B

Server

Middleware
Canary PR Change
middleware-b..fest.js gzip 721 B 716 B
Total 721 B 716 B ✅ -5 B
Build Details
Build Manifests
Canary PR Change
_buildManifest.js gzip 435 B 432 B
Total 435 B 432 B ✅ -3 B

📦 Webpack

Client

Main Bundles
Canary PR Change
5528-HASH.js gzip 5.54 kB N/A -
6280-HASH.js gzip 60.7 kB N/A -
6335.HASH.js gzip 169 B N/A -
912-HASH.js gzip 4.59 kB N/A -
e8aec2e4-HASH.js gzip 62.8 kB N/A -
framework-HASH.js gzip 59.7 kB 59.7 kB
main-app-HASH.js gzip 256 B 253 B 🟢 3 B (-1%)
main-HASH.js gzip 39.3 kB 39.2 kB
webpack-HASH.js gzip 1.68 kB 1.68 kB
262-HASH.js gzip N/A 4.59 kB -
2889.HASH.js gzip N/A 169 B -
5602-HASH.js gzip N/A 5.55 kB -
6948ada0-HASH.js gzip N/A 62.8 kB -
9544-HASH.js gzip N/A 61.4 kB -
Total 235 kB 235 kB ⚠️ +647 B
Polyfills
Canary PR Change
polyfills-HASH.js gzip 39.4 kB 39.4 kB
Total 39.4 kB 39.4 kB
Pages
Canary PR Change
_app-HASH.js gzip 194 B 194 B
_error-HASH.js gzip 183 B 180 B 🟢 3 B (-2%)
css-HASH.js gzip 331 B 330 B
dynamic-HASH.js gzip 1.81 kB 1.81 kB
edge-ssr-HASH.js gzip 256 B 256 B
head-HASH.js gzip 351 B 352 B
hooks-HASH.js gzip 384 B 383 B
image-HASH.js gzip 580 B 581 B
index-HASH.js gzip 260 B 260 B
link-HASH.js gzip 2.51 kB 2.51 kB
routerDirect..HASH.js gzip 320 B 319 B
script-HASH.js gzip 386 B 386 B
withRouter-HASH.js gzip 315 B 315 B
1afbb74e6ecf..834.css gzip 106 B 106 B
Total 7.98 kB 7.98 kB ✅ -1 B

Server

Edge SSR
Canary PR Change
edge-ssr.js gzip 125 kB 126 kB
page.js gzip 272 kB 272 kB
Total 398 kB 398 kB ⚠️ +178 B
Middleware
Canary PR Change
middleware-b..fest.js gzip 618 B 615 B
middleware-r..fest.js gzip 156 B 155 B
middleware.js gzip 44.2 kB 44.1 kB
edge-runtime..pack.js gzip 842 B 842 B
Total 45.8 kB 45.7 kB ✅ -112 B
Build Details
Build Manifests
Canary PR Change
_buildManifest.js gzip 715 B 718 B
Total 715 B 718 B ⚠️ +3 B
Build Cache
Canary PR Change
0.pack gzip 4.36 MB 4.35 MB 🟢 5.87 kB (0%)
index.pack gzip 111 kB 110 kB
index.pack.old gzip 111 kB 111 kB
Total 4.58 MB 4.57 MB ✅ -5.94 kB

🔄 Shared (bundler-independent)

Runtimes
Canary PR Change
app-page-exp...dev.js gzip 341 kB 341 kB
app-page-exp..prod.js gzip 189 kB 189 kB
app-page-tur...dev.js gzip 341 kB 341 kB
app-page-tur..prod.js gzip 189 kB 189 kB
app-page-tur...dev.js gzip 337 kB 337 kB
app-page-tur..prod.js gzip 187 kB 187 kB
app-page.run...dev.js gzip 338 kB 338 kB
app-page.run..prod.js gzip 187 kB 187 kB
app-route-ex...dev.js gzip 76.6 kB 76.6 kB
app-route-ex..prod.js gzip 52.2 kB 52.2 kB
app-route-tu...dev.js gzip 76.6 kB 76.6 kB
app-route-tu..prod.js gzip 52.2 kB 52.2 kB
app-route-tu...dev.js gzip 76.2 kB 76.2 kB
app-route-tu..prod.js gzip 52 kB 52 kB
app-route.ru...dev.js gzip 76.2 kB 76.2 kB
app-route.ru..prod.js gzip 52 kB 52 kB
dist_client_...dev.js gzip 324 B 324 B
dist_client_...dev.js gzip 326 B 326 B
dist_client_...dev.js gzip 318 B 318 B
dist_client_...dev.js gzip 317 B 317 B
pages-api-tu...dev.js gzip 43.8 kB 43.8 kB
pages-api-tu..prod.js gzip 33.4 kB 33.4 kB
pages-api.ru...dev.js gzip 43.8 kB 43.8 kB
pages-api.ru..prod.js gzip 33.4 kB 33.4 kB
pages-turbo....dev.js gzip 53.2 kB 53.2 kB
pages-turbo...prod.js gzip 39 kB 39 kB
pages.runtim...dev.js gzip 53.2 kB 53.2 kB
pages.runtim..prod.js gzip 39 kB 39 kB
server.runti..prod.js gzip 62.8 kB 62.8 kB
Total 3.03 MB 3.03 MB ✅ -2 B
📝 Changed Files (2 files)

Files with changes:

  • pages-api-tu..time.prod.js
  • pages-turbo...time.prod.js
pages-api-tu..time.prod.js

Diff too large to display

pages-turbo...time.prod.js

Diff too large to display

📎 Tarball URL
https://vercel-packages.vercel.app/next/commits/34bde7e027793efd3cbaa9b7ffcb65750568c61f/next

sokra and others added 3 commits April 2, 2026 13:28
- Hoist set_in_progress(Canceled) + drop(task) out of if/else branches
  in task_execution_canceled to reduce duplication
- Move ComputeDirtyAndCleanUpdate import from function body to module
  level in operation/mod.rs
- Add cross-reference comments between cell_type_max_index skip in
  task_execution_completed_prepare and cell data skip in
  task_execution_completed_cleanup

Co-Authored-By: Claude <[email protected]>
The clean transition notification is a direct consequence of the dirty
state update. Moving it into the helper ensures any future caller that
transitions a task to clean will fire the event. This also simplifies
the return type from (Option<Job>, Result) to just Option<Job>.

Co-Authored-By: Claude <[email protected]>
- Collapse nested if blocks into `if a && let Some(x) = expr` (collapsible_if)
- Remove unnecessary let binding before return (let_and_return)

Co-Authored-By: Claude <[email protected]>
Comment thread turbopack/crates/turbo-tasks-backend/src/backend/operation/mod.rs
@sokra sokra merged commit be3ee02 into canary Apr 3, 2026
287 of 310 checks passed
@sokra sokra deleted the sokra/stability-fixes branch April 3, 2026 08:05
sokra added a commit that referenced this pull request Apr 3, 2026
…handling (#92254)

(Commit message matches the PR description above.)
eps1lon pushed a commit that referenced this pull request Apr 7, 2026
…handling (#92254)

(Commit message matches the PR description above.)
eps1lon pushed a commit that referenced this pull request Apr 7, 2026
…handling (#92254)

(Commit message matches the PR description above.)
sokra added a commit that referenced this pull request Apr 7, 2026
#92108)

### What?

Re-lands #91576 ("turbo-tasks: add hashed cell mode for hash-based
change detection without cell data"), which was reverted in #92103 due
to a `FATAL` crash in the `filesystem-cache` test suite.

Includes a bug fix on top: in `task_execution_completed_prepare`, skip
updating `cell_type_max_index` when the task completed with an error.

Also adds a `CellHash = [u8; 16]` type alias (requested in review) used
throughout the hash pipeline.

### Why?

**The original feature** (`serialization = "hash"` on `FileContent` and
`Code`) stores a hash of the cell data instead of the full serialized
value. On session restore, the hash is used to detect whether cell
content has changed without needing the full data in memory. This avoids
a large persistent cache size increase.

**The bug** that caused the revert: When a task fails partway through
re-execution (before recreating all the cells from its previous run),
`cell_counters` only reflects the partially-executed state. The old code
used those partial counters to update `cell_type_max_index`, removing
entries for cell types that were not yet created at the point of
failure. This caused downstream tasks that still held cell dependencies
from the previous successful run to hit a hard "Cell no longer exists"
error.

**Concrete failure path** in `filesystem-cache rename app page` test:

1. `get_app_page_entry` runs for `/remove-me/page`, creating two
`FileContent` cells (indices 0 and 1). `cell_type_max_index[FileContent]
= 2` is persisted.
2. The folder is renamed (`app/remove-me` → `app/add-me`), dirtying the
task.
3. On re-execution, `get_app_page_entry` fails at `config.await?` (the
loader tree errors because the directory is gone) — before any
`FileContent::cell()` calls.
4. `cell_counters` has no `FileContent` entry → old code removed
`cell_type_max_index[FileContent]`.
5. The `parse` task tries to read `FileContent` cell 1 from
`get_app_page_entry` → `cell_type_max_index` is `None` → **"Cell no
longer exists" panic → FATAL error**.

**Why it didn't crash before** `serialization = "hash"`: `FileContent`
was previously serializable, so `parse` read stale cell data directly
from `persistent_cell_data`, which `task_execution_completed_cleanup`
already preserves on error. With `serialization = "hash"`, data is
transient — readers fall back to `cell_type_max_index` for range
validation, where a stale `None` caused the crash.
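
The reader-side range check described above can be sketched like this (hypothetical names, not the real backend code): with transient cells there is no stale data to fall back to, so a missing max-index entry becomes a hard error instead of a recomputation.

```rust
// Illustrative sketch of the fallback path for transient ("hash") cells:
// the reader validates the cell index against `cell_type_max_index` and
// either schedules recomputation or fails hard.

enum CellRead {
    Recompute,      // index in range: re-execute the task to refill the cell
    NoLongerExists, // type unknown or index out of range: hard error
}

fn validate_transient_read(index: u32, max_index: Option<u32>) -> CellRead {
    match max_index {
        Some(max) if index < max => CellRead::Recompute,
        // A stale `None` here (the bug: the entry was removed after a
        // partial failed run) turns a valid read into the
        // "Cell no longer exists" FATAL error.
        _ => CellRead::NoLongerExists,
    }
}
```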

### How?

#### Core feature: `serialization = "hash"` cell mode

- New `SerializationMode::Hash` variant in `turbo-tasks-macros` — marks
a value type as non-serializable but stores a `DeterministicHash` of the
cell data for change detection.
- `VcCellHashedCompareMode<T>` cell mode: compares values via
`PartialEq` when available, falls back to hash comparison when transient
data has been evicted.
- `hashed_compare_and_update` /
`hashed_compare_and_update_with_shared_reference` on `CurrentCellRef`
compute and pass content hashes through the update pipeline.
- Backend `update_cell` uses hash-based comparison to skip invalidation
when the old cell data is unavailable but the hash matches.
- `cell_data_hash: AutoMap<CellId, CellHash>` field in task storage
persists hashes across sessions.
- Stale `cell_data_hash` entries are cleaned up in
`task_execution_completed_cleanup` alongside cell data removal.
- `CellHash = [u8; 16]` type alias keeps alignment at 1 byte to avoid
padding growth in `AutoMap`/`LazyField` enum variants.
- Hash bytes use little-endian encoding (`to_le_bytes`) for
cross-platform cache portability.
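
A minimal sketch of the comparison strategy (stand-in types and a toy hash function, not the real `Xxh3Hash128Hasher`): compare full values via `PartialEq` while the transient data is still in memory, and fall back to the persisted 16-byte hash once it has been evicted.

```rust
// Illustrative sketch of hash-based change detection. `CellHash` matches
// the alias added in this PR; `hash_of` is a toy stand-in for
// `Xxh3Hash128Hasher::finish_bytes()`.

type CellHash = [u8; 16];

fn hash_of(data: &[u8]) -> CellHash {
    // Toy FNV-style accumulator over u128, for illustration only.
    let mut h: u128 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h = h.wrapping_mul(0x0100_0000_01b3).wrapping_add(b as u128);
    }
    // Little-endian bytes keep the persisted hash portable across platforms.
    h.to_le_bytes()
}

struct Cell {
    data: Option<Vec<u8>>, // transient; may be evicted across sessions
    hash: CellHash,        // persisted across sessions
}

// Returns true when the update can skip invalidation.
fn unchanged(cell: &Cell, new_data: &[u8]) -> bool {
    match &cell.data {
        // Data still available: precise `PartialEq` comparison.
        Some(old) => old.as_slice() == new_data,
        // Data evicted: compare against the persisted hash instead.
        None => cell.hash == hash_of(new_data),
    }
}
```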

#### Bug fix: preserve `cell_type_max_index` on task error

In `task_execution_completed_prepare`, guard the `cell_type_max_index`
update block with `if result.is_ok()`. This mirrors the existing
`task_execution_completed_cleanup` behavior that already skips cell data
removal when `is_error` is true, keeping `cell_type_max_index`
consistent with the preserved transient cell data.
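
In sketch form (stand-in types; the real code lives in `task_execution_completed_prepare`):

```rust
use std::collections::HashMap;

type CellTypeId = u32;

fn update_max_index_on_completion(
    result_is_ok: bool,
    cell_counters: &HashMap<CellTypeId, u32>,
    cell_type_max_index: &mut HashMap<CellTypeId, u32>,
) {
    // On error, `cell_counters` only reflects the partial execution;
    // rewriting `cell_type_max_index` from it would drop entries for cell
    // types the failed run never reached. Keep the old values, mirroring
    // how `task_execution_completed_cleanup` preserves cell data on error.
    if result_is_ok {
        cell_type_max_index.clear();
        for (&ty, &count) in cell_counters {
            cell_type_max_index.insert(ty, count);
        }
    }
}
```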

#### Applied to `FileContent` and `Code`

- `FileContent` uses `serialization = "hash"` — full content is
persisted via a separate `PersistedFileContent` type when needed (e.g.,
in `DiskFileSystem::write`).
- `Code` uses `serialization = "hash"` with `Arc<Vec<Mapping>>` for
cheap cloning. `Code::cell_persisted()` creates a `PersistedCode` cell
directly and returns `Vc<Code>` via `PersistedCode::to_code()`, avoiding
an intermediate hash-mode cell.

#### Other improvements

- `DeterministicHash` impls for `SmallVec` and `()`.
- `Xxh3Hash128Hasher::finish_bytes()` method returning `[u8; 16]`.
- `hash = "manual"` option on `#[turbo_tasks::value]` to opt out of
auto-deriving `DeterministicHash`.

**Note:** The shutdown hang and cache poisoning fixes that were
previously on this branch have been merged separately via #92254.

### Test plan

- [x] `test/e2e/filesystem-cache/filesystem-cache.test.ts` passes (all
17 tests)
- [x] New
`turbopack/crates/turbo-tasks-backend/tests/hashed_cell_mode.rs`
integration test verifies hash-based change detection: value changes
trigger invalidation; equal values (same hash) do not
- [x] `cargo check` passes for `turbo-tasks`, `turbo-tasks-backend`,
`turbo-tasks-fs`, `turbopack-core`, `turbopack-ecmascript`
- [x] CI green (attempt 2)


---------

Co-authored-by: Tobias Koppers <[email protected]>
Co-authored-by: Claude <[email protected]>
lukesandberg pushed a commit that referenced this pull request Apr 7, 2026
…handling (#92254)

github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2026

Labels

`created-by: Turbopack team` (PRs by the Turbopack team), `locked`, `Turbopack` (Related to Turbopack with Next.js)
