Fix GC TOCTOU race in collect_inner referent traversal #7511
youknowone merged 1 commit into RustPython:main
Conversation
Pre-compute referent pointers once per object in step 3 and reuse them in step 4 (BFS reachability). Previously, gc_get_referent_ptrs() was called independently in both steps. If a dict's write lock state changed between the two calls (e.g., held by another thread during one traversal but not the other), the two traversals could return different results. This caused live objects to be incorrectly classified as unreachable and cleared by GC.
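The fix described above can be sketched with simplified stand-in types: plain `usize` ids replace `NonNull<PyObject>`, and a `referents` closure replaces `gc_get_referent_ptrs()`. The `collect`, `edges`, and `external_refs` names are hypothetical, not the VM's real API; the point is that step 3 snapshots each object's referents once into `referents_map`, and step 4's BFS reuses that snapshot instead of re-traversing objects whose lock state may have changed in between.

```rust
use std::collections::{HashMap, HashSet};

// Simplified model of the two-phase collection. `external_refs` holds each
// candidate's total strong count before internal references are subtracted.
fn collect(
    collecting: &HashSet<usize>,
    referents: impl Fn(usize) -> Vec<usize>,
    external_refs: &HashMap<usize, usize>,
) -> Vec<usize> {
    // Step 3: traverse each candidate ONCE, caching its referent list so
    // step 4 sees exactly the same snapshot (closing the TOCTOU window).
    let mut referents_map: HashMap<usize, Vec<usize>> = HashMap::new();
    let mut gc_refs: HashMap<usize, usize> = external_refs.clone();
    for &ptr in collecting {
        let children = referents(ptr);
        for &child in &children {
            if collecting.contains(&child) {
                if let Some(refs) = gc_refs.get_mut(&child) {
                    *refs = refs.saturating_sub(1);
                }
            }
        }
        referents_map.insert(ptr, children);
    }

    // Step 4: BFS from externally-referenced roots, reusing the cached
    // referent lists instead of re-reading the (possibly mutated) objects.
    let mut reachable: HashSet<usize> = HashSet::new();
    let mut worklist: Vec<usize> = Vec::new();
    for (&ptr, &refs) in &gc_refs {
        if refs > 0 && reachable.insert(ptr) {
            worklist.push(ptr);
        }
    }
    while let Some(ptr) = worklist.pop() {
        for &child in referents_map.get(&ptr).map(|v| v.as_slice()).unwrap_or(&[]) {
            if collecting.contains(&child) && reachable.insert(child) {
                worklist.push(child);
            }
        }
    }

    // Candidates never reached from a root are garbage (sorted for determinism).
    let mut garbage: Vec<usize> = collecting
        .iter()
        .copied()
        .filter(|p| !reachable.contains(p))
        .collect();
    garbage.sort();
    garbage
}

fn main() {
    // Chain 1 -> 2 -> 3 (object 1 has an external reference),
    // plus an unreachable cycle 4 <-> 5.
    let collecting: HashSet<usize> = [1, 2, 3, 4, 5].into_iter().collect();
    let edges: HashMap<usize, Vec<usize>> = HashMap::from([
        (1, vec![2]),
        (2, vec![3]),
        (3, vec![]),
        (4, vec![5]),
        (5, vec![4]),
    ]);
    let external_refs: HashMap<usize, usize> =
        HashMap::from([(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]);
    let garbage = collect(&collecting, |p| edges[&p].clone(), &external_refs);
    println!("{:?}", garbage); // [4, 5] — only the cycle is collected
}
```

Because both steps consult the same cached lists, a concurrent mutation between the phases can no longer make a live object look unreachable.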
📝 Walkthrough: This change fixes a TOCTOU race in the garbage collection algorithm in `crates/vm/src/gc_state.rs` by computing each object's referent pointers once and reusing them across both traversals.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
🧹 Nitpick comments (1)
crates/vm/src/gc_state.rs (1)
465-473: Reduce avoidable cloning in the GC hot path.
At Line 473 and Lines 501-504, referent vectors are cloned multiple times. This adds extra allocations during collection and can be trimmed without changing behavior.
♻️ Proposed refactor (same behavior, fewer allocations)
```diff
  let mut referents_map: std::collections::HashMap<GcPtr, Vec<NonNull<PyObject>>> =
      std::collections::HashMap::new();
  for &ptr in &collecting {
      let obj = unsafe { ptr.0.as_ref() };
      if obj.strong_count() == 0 {
          continue;
      }
      let referent_ptrs = unsafe { obj.gc_get_referent_ptrs() };
-     referents_map.insert(ptr, referent_ptrs.clone());
-     for child_ptr in referent_ptrs {
+     for &child_ptr in &referent_ptrs {
          let gc_ptr = GcPtr(child_ptr);
          if collecting.contains(&gc_ptr)
              && let Some(refs) = gc_refs.get_mut(&gc_ptr)
          {
              *refs = refs.saturating_sub(1);
          }
      }
+     referents_map.insert(ptr, referent_ptrs);
  }
@@
- let referent_ptrs = referents_map
-     .get(&ptr)
-     .cloned()
-     .unwrap_or_else(|| unsafe { obj.gc_get_referent_ptrs() });
- for child_ptr in referent_ptrs {
+ let referent_ptrs = referents_map
+     .get(&ptr)
+     .map(|v| std::borrow::Cow::Borrowed(v.as_slice()))
+     .unwrap_or_else(|| {
+         std::borrow::Cow::Owned(unsafe { obj.gc_get_referent_ptrs() })
+     });
+ for &child_ptr in referent_ptrs.iter() {
      let gc_ptr = GcPtr(child_ptr);
      if collecting.contains(&gc_ptr) && reachable.insert(gc_ptr) {
          worklist.push(gc_ptr);
      }
  }
```
Also applies to: 501-505
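The core of the suggested refactor — moving the `Vec` into the map by value and borrowing it back later via `Cow`, instead of cloning on both insert and lookup — can be shown in miniature. The names below (`compute_referents`, plain `u32` keys) are illustrative stand-ins, not the VM's real types:

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical stand-in for `gc_get_referent_ptrs()`: an allocation we
// want to perform at most once per object.
fn compute_referents() -> Vec<u32> {
    vec![10, 20, 30]
}

fn main() {
    let mut referents_map: HashMap<u32, Vec<u32>> = HashMap::new();
    // Insert by value: the Vec is moved into the map, not cloned.
    referents_map.insert(1, compute_referents());

    // Later lookups borrow the cached Vec when present (no allocation)
    // and only fall back to recomputing on a cache miss.
    for key in [1u32, 2] {
        let referents: Cow<'_, [u32]> = referents_map
            .get(&key)
            .map(|v| Cow::Borrowed(v.as_slice()))
            .unwrap_or_else(|| Cow::Owned(compute_referents()));
        println!("{key}: {} referents", referents.len());
    }
}
```

`Cow` lets both branches yield the same type, so the common iteration code is unchanged whether the data was cached or freshly computed.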
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 465 - 473, The GC hot path currently clones referent vectors unnecessarily: when iterating over collecting you call obj.gc_get_referent_ptrs() and insert referents_map.insert(ptr, referent_ptrs.clone()), then later clone again (lines ~501-505). Change this to move the original Vec<NonNull<PyObject>> into the map (avoid .clone()) and, when later iterating, borrow or drain the stored Vec as needed instead of cloning it (update uses around where referents_map.get/entry is accessed). Specifically modify the code around referents_map, the loop over collecting, and usages of gc_get_referent_ptrs()/referents_map so that the Vec returned by gc_get_referent_ptrs() is inserted by value and reused rather than cloned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: bed67f50-2606-4c55-8469-84d0f8b546cc
📒 Files selected for processing (1)
crates/vm/src/gc_state.rs