Skip to content

Add global type mutex for thread-safe type operations#7416

Open
youknowone wants to merge 2 commits intoRustPython:mainfrom
youknowone:typelock
Open

Add global type mutex for thread-safe type operations#7416
youknowone wants to merge 2 commits intoRustPython:mainfrom
youknowone:typelock

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Mar 13, 2026

Introduce vm.state.type_mutex to serialize type mutation and version-tag assignment, matching CPython's free-threading model.

  • Add with_type_lock() helper and split modified/assign_version_tag into inner (lock-free) and public (lock-acquiring) variants
  • Add version_for_specialization() and lookup_ref_and_version_interned() for atomic type-cache lookups under the lock
  • Wrap all type-mutating paths (bases, annotations, module, type_params, doc, SetAttr) with the type lock
  • Drop old values outside the lock to prevent deadlock from weakref callbacks that may re-enter specialization
  • Fix _ctypes set_attr calls to route through proper SetAttr path
  • Reinitialize type_mutex after fork

Summary by CodeRabbit

  • Bug Fixes

    • Improved thread safety for concurrent type operations by introducing centralized synchronization, reducing potential race conditions in multi-threaded environments.
  • Performance

    • Optimized type attribute caching and version management for faster attribute lookups under concurrent access.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

This PR introduces a global type_mutex to PyGlobalState and refactors type-mutation and specialization paths to serialize version-tag assignment, cache population, and metadata updates under centralized locking. New helper methods (version_for_specialization, lookup_ref_and_version_interned) consolidate version-fetching logic, and frame specialization code is updated to use them.

Changes

Cohort / File(s) Summary
Type Versioning & Locking
crates/vm/src/builtins/type.rs
Added with_type_lock wrapper for synchronized type mutations; split version-tag and cache-invalidation logic into _inner() implementations and public wrappers; introduced version_for_specialization and lookup_ref_and_version_interned for versioned lookups; wrapped __bases__, annotation, and attribute setters to acquire lock and call modified_inner(); refactored SetAttr to drop old values outside the lock.
Specialization Paths
crates/vm/src/frame.rs
Replaced scattered type-version load-and-assign logic with calls to PyType::version_for_specialization(vm) and cls.lookup_ref_and_version_interned() across default-getattro, attribute load, __getitem__ subscript, and CallAllocAndEnterInit specialization paths.
Global State & Infrastructure
crates/vm/src/vm/mod.rs, crates/vm/src/vm/interpreter.rs, crates/vm/src/stdlib/posix.rs
Added type_mutex: PyMutex<()> field to PyGlobalState struct; initialized it in initialize_main_vm; updated post-fork lock reinitialization to call reinit_mutex_after_fork(&vm.state.type_mutex).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰 A mutex hops into place,
Guarding types with grace—
No race condition here!
Version tags hold clear,
Thread-safe caches embrace. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add global type mutex for thread-safe type operations' accurately captures the primary change: introducing vm.state.type_mutex to serialize type mutation and version-tag assignment. It is concise, clear, and directly related to the main objective of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 84.85% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the typelock branch 5 times, most recently from d0cfe84 to 1daba38 Compare March 19, 2026 19:39
@github-actions
Copy link
Contributor

github-actions bot commented Mar 19, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] test: cpython/Lib/test/test_class.py (TODO: 16)
[x] test: cpython/Lib/test/test_genericclass.py
[x] test: cpython/Lib/test/test_subclassinit.py

dependencies:

dependent tests: (no tests depend on class)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone force-pushed the typelock branch 5 times, most recently from 1865733 to da1d518 Compare March 23, 2026 05:06
@youknowone youknowone marked this pull request as ready for review March 23, 2026 09:09
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
extra_tests/custom_text_test_runner.py (1)

391-396: ⚠️ Potential issue | 🟡 Minor

Inconsistent access pattern may cause AttributeError for plain functions.

Line 392-394 still uses the old __func__.__dict__ access pattern while line 395 uses the new helper. For plain functions (without __func__), this will raise AttributeError before the helper is even called.

Proposed fix
         if self.test_types:
-            if "test_type" in getattr(
-                test, test._testMethodName
-            ).__func__.__dict__ and set([s.lower() for s in self.test_types]) == set(
+            if "test_type" in _get_method_dict(test) and set([s.lower() for s in self.test_types]) == set(
                 [s.lower() for s in _get_method_dict(test)["test_type"]]
             ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extra_tests/custom_text_test_runner.py` around lines 391 - 396, The
conditional uses getattr(test, test._testMethodName).__func__.__dict__ which
will raise AttributeError for plain functions; replace that access with the safe
helper used on the next line by calling _get_method_dict(test) consistently:
check for "test_type" in _get_method_dict(test) and compare set([s.lower() for s
in self.test_types]) against set([s.lower() for s in
_get_method_dict(test)["test_type"]]) so you never access __func__ directly
(symbols: self.test_types, test._testMethodName, _get_method_dict, "test_type").
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

7436-7436: Keep the remaining attr specializers on one type snapshot.

These sites now start from version_for_specialization(), but the surrounding specializer still does later MRO/dict/descriptor inspection after that call. That leaves LOAD_ATTR/STORE_ATTR on a looser snapshot than the new __getitem__ / __init__ paths. I’d either switch these to an atomic lookup helper as well or revalidate the version right before publishing the specialized opcode/cache.

Also applies to: 7474-7474, 7694-7694, 7729-7729, 9121-9121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` at line 7436, The attr specializers call
cls.version_for_specialization() early but then perform additional
MRO/dict/descriptor inspection, leaving LOAD_ATTR/STORE_ATTR snapshots looser
than the new __getitem__/__init__ paths; update the attr-specializer flows (the
sites calling version_for_specialization(), e.g. the code around
version_for_specialization at the lines flagged and the logic that publishes the
specialized opcode/cache for LOAD_ATTR/STORE_ATTR) to either perform the whole
lookup atomically via a helper (similar to the __getitem__/__init__ path) or
revalidate/refresh the class version immediately before publishing the
specialized opcode/cache so the snapshot used for specialization is exact;
locate references to version_for_specialization, the LOAD_ATTR/STORE_ATTR
specialization code path, and the publish/emit-cache logic and ensure the
version is checked/locked at the final publish point.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 2607-2629: The attribute mutation currently invalidates versions
and mutates the attributes map inside with_type_lock but applies slot table
updates (update_slot) outside the same lock; move the call to update_slot() into
the same with_type_lock closure (alongside modified_inner() and the
attributes.write() insert/shift_remove) so the tp_version_tag, inline-cache
invalidation, dict mutation, and slot rewrite occur atomically under the type
lock, then return the previous value and drop it outside the lock as before;
ensure both assign and remove paths in the closure call update_slot on the
appropriate slot and preserve the existing error handling (references:
with_type_lock, modified_inner, attributes.write(), update_slot).
- Around line 1427-1443: Compute and validate the new bases/MROs for zelf and
all descendants before mutating the live structures: inside with_type_lock, do
not immediately assign to zelf.bases; instead build the candidate bases and run
PyType::resolve_mro for zelf and every subclass (the same logic as
update_mro_recursively) into a temporary mapping of PyType -> new_mro
(preserving mro[0] where needed) and return any error if resolution fails; only
after all resolve_mro calls succeed, assign *zelf.bases.write() = bases and then
write each cls.mro from the prepared mapping and proceed (this ensures
update_mro_recursively-like validation happens before any live mutation).
- Around line 1436-1439: The loop over cls.subclasses currently unwraps weak
refs and downcasts (calling upgrade().unwrap() and downcast_ref().unwrap()),
which can panic on stale weakrefs; instead, change the loop to skip entries that
fail to upgrade or fail the downcast by using conditional checks (e.g., if let
Some(strong) = subclass.upgrade() { if let Some(subclass_pytype) =
strong.downcast_ref::<Py<PyType>>() { update_mro_recursively(subclass_pytype,
vm)? } } ) so stale weakrefs are ignored rather than unwrapping and crashing;
apply the same non-unwrapping pattern to the other occurrences around
update_mro_recursively (the similar block at lines ~1451-1459).
- Around line 1427-1449: The MRO update only rewrites bases/mro for zelf and
leaves derived metadata (per-class slot tables and __base__) stale on
subclasses; modify update_mro_recursively (inside with_type_lock) so that after
computing and writing each class's mro you also recompute and write its
base/__base__ (the same logic used for the original class), call that class's
init_slots(&vm.ctx) to rebuild its slots, and call modified_inner() to
invalidate caches for that class before recursing into its subclasses; ensure
you perform these steps for zelf as well (not just at the top level) so every
descendant gets updated slot tables and base-chain info.
- Around line 454-458: with_type_lock currently takes a zero-arg closure so
callers (like assign_version_tag via version_for_specialization /
find_name_in_mro) can end up executing code outside the held lock; change
with_type_lock signature to fn with_type_lock<R>(vm: &VirtualMachine, f: impl
FnOnce(&VirtualMachine) -> R) -> R, acquire the lock into _guard as before and
call f(vm) while the guard is in scope, then update all call sites (notably
version_for_specialization, assign_version_tag, find_name_in_mro and the block
around 490-503) to accept the &VirtualMachine parameter so the work runs while
type_mutex is held, preserving the invariant used by modified_inner.

---

Outside diff comments:
In `@extra_tests/custom_text_test_runner.py`:
- Around line 391-396: The conditional uses getattr(test,
test._testMethodName).__func__.__dict__ which will raise AttributeError for
plain functions; replace that access with the safe helper used on the next line
by calling _get_method_dict(test) consistently: check for "test_type" in
_get_method_dict(test) and compare set([s.lower() for s in self.test_types])
against set([s.lower() for s in _get_method_dict(test)["test_type"]]) so you
never access __func__ directly (symbols: self.test_types, test._testMethodName,
_get_method_dict, "test_type").

---

Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Line 7436: The attr specializers call cls.version_for_specialization() early
but then perform additional MRO/dict/descriptor inspection, leaving
LOAD_ATTR/STORE_ATTR snapshots looser than the new __getitem__/__init__ paths;
update the attr-specializer flows (the sites calling
version_for_specialization(), e.g. the code around version_for_specialization at
the lines flagged and the logic that publishes the specialized opcode/cache for
LOAD_ATTR/STORE_ATTR) to either perform the whole lookup atomically via a helper
(similar to the __getitem__/__init__ path) or revalidate/refresh the class
version immediately before publishing the specialized opcode/cache so the
snapshot used for specialization is exact; locate references to
version_for_specialization, the LOAD_ATTR/STORE_ATTR specialization code path,
and the publish/emit-cache logic and ensure the version is checked/locked at the
final publish point.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: a8a615eb-9d3f-4cd2-a1b8-841c4089a979

📥 Commits

Reviewing files that changed from the base of the PR and between 8c01615 and da1d518.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_class.py is excluded by !Lib/**
📒 Files selected for processing (8)
  • .cspell.json
  • .github/workflows/cron-ci.yaml
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/vm/interpreter.rs
  • crates/vm/src/vm/mod.rs
  • extra_tests/custom_text_test_runner.py

Comment on lines +1427 to +1443
Self::with_type_lock(vm, || {
*zelf.bases.write() = bases;
// Recursively update the mros of this class and all subclasses
fn update_mro_recursively(cls: &PyType, vm: &VirtualMachine) -> PyResult<()> {
let mut mro =
PyType::resolve_mro(&cls.bases.read()).map_err(|msg| vm.new_type_error(msg))?;
// Preserve self (mro[0]) when updating MRO
mro.insert(0, cls.mro.read()[0].to_owned());
*cls.mro.write() = mro;
for subclass in cls.subclasses.write().iter() {
let subclass = subclass.upgrade().unwrap();
let subclass: &Py<PyType> = subclass.downcast_ref().unwrap();
update_mro_recursively(subclass, vm)?;
}
Ok(())
}
Ok(())
}
update_mro_recursively(zelf, vm)?;
update_mro_recursively(zelf, vm)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Validate the full __bases__ rewrite before mutating the live hierarchy.

zelf.bases is overwritten before resolve_mro() has succeeded for zelf and every descendant. If any recursive update_mro_recursively() call fails, this setter raises after leaving a partially applied base/MRO graph behind.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1427 - 1443, Compute and
validate the new bases/MROs for zelf and all descendants before mutating the
live structures: inside with_type_lock, do not immediately assign to zelf.bases;
instead build the candidate bases and run PyType::resolve_mro for zelf and every
subclass (the same logic as update_mro_recursively) into a temporary mapping of
PyType -> new_mro (preserving mro[0] where needed) and return any error if
resolution fails; only after all resolve_mro calls succeed, assign
*zelf.bases.write() = bases and then write each cls.mro from the prepared
mapping and proceed (this ensures update_mro_recursively-like validation happens
before any live mutation).

Comment on lines +1427 to +1449
Self::with_type_lock(vm, || {
*zelf.bases.write() = bases;
// Recursively update the mros of this class and all subclasses
fn update_mro_recursively(cls: &PyType, vm: &VirtualMachine) -> PyResult<()> {
let mut mro =
PyType::resolve_mro(&cls.bases.read()).map_err(|msg| vm.new_type_error(msg))?;
// Preserve self (mro[0]) when updating MRO
mro.insert(0, cls.mro.read()[0].to_owned());
*cls.mro.write() = mro;
for subclass in cls.subclasses.write().iter() {
let subclass = subclass.upgrade().unwrap();
let subclass: &Py<PyType> = subclass.downcast_ref().unwrap();
update_mro_recursively(subclass, vm)?;
}
Ok(())
}
Ok(())
}
update_mro_recursively(zelf, vm)?;
update_mro_recursively(zelf, vm)?;

// Invalidate inline caches
zelf.modified();
// Invalidate inline caches
zelf.modified_inner();

// TODO: do any old slots need to be cleaned up first?
zelf.init_slots(&vm.ctx);
// TODO: do any old slots need to be cleaned up first?
zelf.init_slots(&vm.ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rebasing needs to refresh all derived metadata, not just bases/mro.

Only zelf runs init_slots() after the recursive MRO rewrite, and self.base is never recomputed. Descendants can keep slot tables from the old MRO, while __base__ and base-chain walks still reflect the previous best base.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1427 - 1449, The MRO update only
rewrites bases/mro for zelf and leaves derived metadata (per-class slot tables
and __base__) stale on subclasses; modify update_mro_recursively (inside
with_type_lock) so that after computing and writing each class's mro you also
recompute and write its base/__base__ (the same logic used for the original
class), call that class's init_slots(&vm.ctx) to rebuild its slots, and call
modified_inner() to invalidate caches for that class before recursing into its
subclasses; ensure you perform these steps for zelf as well (not just at the top
level) so every descendant gets updated slot tables and base-chain info.

Comment on lines +1436 to +1439
for subclass in cls.subclasses.write().iter() {
let subclass = subclass.upgrade().unwrap();
let subclass: &Py<PyType> = subclass.downcast_ref().unwrap();
update_mro_recursively(subclass, vm)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't unwrap subclass weakrefs during the recursive walk.

subclasses is lazily cleaned, and this branch only appends new weakrefs, so upgrade() can legitimately return None here. Turning that into unwrap() makes a stale weakref crash __bases__ assignment instead of just being skipped.

🔧 Suggested fix
-                for subclass in cls.subclasses.write().iter() {
-                    let subclass = subclass.upgrade().unwrap();
-                    let subclass: &Py<PyType> = subclass.downcast_ref().unwrap();
-                    update_mro_recursively(subclass, vm)?;
-                }
+                let subclasses: Vec<_> = {
+                    let mut subclasses = cls.subclasses.write();
+                    subclasses.retain(|weak| weak.upgrade().is_some());
+                    subclasses.iter().filter_map(|weak| weak.upgrade()).collect()
+                };
+                for subclass in subclasses {
+                    let subclass: &Py<PyType> = subclass.downcast_ref().unwrap();
+                    update_mro_recursively(subclass, vm)?;
+                }

Also applies to: 1451-1459

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1436 - 1439, The loop over
cls.subclasses currently unwraps weak refs and downcasts (calling
upgrade().unwrap() and downcast_ref().unwrap()), which can panic on stale
weakrefs; instead, change the loop to skip entries that fail to upgrade or fail
the downcast by using conditional checks (e.g., if let Some(strong) =
subclass.upgrade() { if let Some(subclass_pytype) =
strong.downcast_ref::<Py<PyType>>() { update_mro_recursively(subclass_pytype,
vm)? } } ) so stale weakrefs are ignored rather than unwrapping and crashing;
apply the same non-unwrapping pattern to the other occurrences around
update_mro_recursively (the similar block at lines ~1451-1459).

Comment on lines +2607 to +2629
// Drop old value OUTSIDE the type lock to avoid deadlock:
// dropping may trigger weakref callbacks → method calls →
// LOAD_ATTR specialization → version_for_specialization → type lock.
let _prev_value = Self::with_type_lock(vm, || {
// Invalidate inline caches before modifying attributes.
// This ensures other threads see the version invalidation before
// any attribute changes, preventing use-after-free of cached descriptors.
zelf.modified_inner();

if let PySetterValue::Assign(value) = value {
Ok(zelf.attributes.write().insert(attr_name, value))
} else {
let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable?
if prev_value.is_none() {
return Err(vm.new_attribute_error(format!(
"type object '{}' has no attribute '{}'",
zelf.name(),
attr_name,
)));
}
Ok(prev_value)
}
}
})?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Keep update_slot() inside the same type-lock transaction.

The dict mutation and version invalidation are published before the slot table is rewritten. Another thread can assign a fresh tp_version_tag or dispatch through the old slot in that gap, so the new version no longer covers the full type state.

🔧 Suggested fix
-        let _prev_value = Self::with_type_lock(vm, || {
+        let _prev_value = Self::with_type_lock(vm, || {
             // Invalidate inline caches before modifying attributes.
             // This ensures other threads see the version invalidation before
             // any attribute changes, preventing use-after-free of cached descriptors.
             zelf.modified_inner();
 
-            if let PySetterValue::Assign(value) = value {
-                Ok(zelf.attributes.write().insert(attr_name, value))
+            let prev_value = if let PySetterValue::Assign(value) = value {
+                zelf.attributes.write().insert(attr_name, value)
             } else {
                 let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable?
                 if prev_value.is_none() {
                     return Err(vm.new_attribute_error(format!(
                         "type object '{}' has no attribute '{}'",
@@
                         attr_name,
                     )));
                 }
-                Ok(prev_value)
-            }
+                prev_value
+            };
+
+            if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
+                if assign {
+                    zelf.update_slot::<true>(attr_name, &vm.ctx);
+                } else {
+                    zelf.update_slot::<false>(attr_name, &vm.ctx);
+                }
+            }
+            Ok(prev_value)
         })?;
-
-        if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
-            if assign {
-                zelf.update_slot::<true>(attr_name, &vm.ctx);
-            } else {
-                zelf.update_slot::<false>(attr_name, &vm.ctx);
-            }
-        }
         Ok(())

Also applies to: 2631-2637

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 2607 - 2629, The attribute
mutation currently invalidates versions and mutates the attributes map inside
with_type_lock but applies slot table updates (update_slot) outside the same
lock; move the call to update_slot() into the same with_type_lock closure
(alongside modified_inner() and the attributes.write() insert/shift_remove) so
the tp_version_tag, inline-cache invalidation, dict mutation, and slot rewrite
occur atomically under the type lock, then return the previous value and drop it
outside the lock as before; ensure both assign and remove paths in the closure
call update_slot on the appropriate slot and preserve the existing error
handling (references: with_type_lock, modified_inner, attributes.write(),
update_slot).

Dropping values inside with_type_lock can trigger weakref callbacks,
which may access attributes (LOAD_ATTR specialization) and re-acquire
the non-reentrant type mutex, causing deadlock.

Return old values from lock closures so they drop after lock release.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/vm/src/frame.rs (2)

7713-7744: ⚠️ Potential issue | 🔴 Critical

The metaclass descriptor check still has a version-skew window.

mcl_attr is inspected before metaclass_version is captured here. If another thread installs a metaclass data descriptor between those two steps, LoadAttrClass* can still be cached against the post-mutation metaclass version while bypassing the new descriptor precedence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7713 - 7744, The metaclass descriptor
check reads mcl_attr before capturing metaclass_version, leaving a race where a
data descriptor could be installed between those steps; to fix, call
mcl.version_for_specialization(_vm) and store metaclass_version (and handle the
zero-version backoff early) before calling mcl.get_attr(attr_name), then only
inspect the attribute's descr_set if metaclass_version is nonzero; reference
mcl.version_for_specialization, metaclass_version, mcl.get_attr, mcl_attr, and
the LoadAttrClass* specialization path when moving the attribute-inspection
after the version capture and preserving the existing adaptive_counter_backoff
behavior when version_for_specialization returns 0.

7433-7439: ⚠️ Potential issue | 🔴 Critical

Capture the type version from the same snapshot as these slot checks.

These blocks still read mutable type behavior (getattro/setattro, __bool__/__len__, tp_new/tp_alloc) before they capture the version they cache against. Under free-threading, another thread can mutate the type in between and make us publish a specialization under the new version while still relying on the old behavior, which can skip a newly-installed __getattribute__, __setattr__, __bool__/__len__, or __new__.

Also applies to: 7477-7490, 8455-8488, 8781-8794, 9106-9112, 9124-9136

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7433 - 7439, The slot-check code reads
mutable slot pointers (e.g., cls.slots.getattro.load() /
cls.slots.setattro.load(), tp_new/tp_alloc, __bool__/__len__) before capturing
the type version, which can race; fix by first taking a single snapshot of the
type version (read and store the version tag field from cls once into a local,
e.g., version_snapshot) and then read the slot pointers and compute is_default_*
flags, finally publish the specialization tagged with that same
version_snapshot; apply the same change to the equivalent blocks that check
cls.slots.setattro, tp_new/tp_alloc, and __bool__/__len__ (the other occurrences
noted at the listed ranges) so all slot-checks and the cached version come from
the same snapshot.
♻️ Duplicate comments (5)
crates/vm/src/builtins/type.rs (5)

2614-2636: ⚠️ Potential issue | 🔴 Critical

Keep the slot rewrite in the same type-lock transaction.

Lines 2617-2636 publish the dict mutation and version invalidation, but Lines 2638-2643 update the slot table after type_mutex is released. Another thread can observe the new attribute state, assign a fresh version tag, or dispatch through the old slot table in that window.

🔒 Suggested fix
         let _prev_value = Self::with_type_lock(vm, || {
             // Invalidate inline caches before modifying attributes.
             // This ensures other threads see the version invalidation before
             // any attribute changes, preventing use-after-free of cached descriptors.
             zelf.modified_inner();
 
-            if let PySetterValue::Assign(value) = value {
-                Ok(zelf.attributes.write().insert(attr_name, value))
+            let prev_value = if let PySetterValue::Assign(value) = value {
+                zelf.attributes.write().insert(attr_name, value)
             } else {
                 let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable?
                 if prev_value.is_none() {
                     return Err(vm.new_attribute_error(format!(
                         "type object '{}' has no attribute '{}'",
@@
                         attr_name,
                     )));
                 }
-                Ok(prev_value)
-            }
+                prev_value
+            };
+
+            if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
+                if assign {
+                    zelf.update_slot::<true>(attr_name, &vm.ctx);
+                } else {
+                    zelf.update_slot::<false>(attr_name, &vm.ctx);
+                }
+            }
+
+            Ok(prev_value)
         })?;
-
-        if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
-            if assign {
-                zelf.update_slot::<true>(attr_name, &vm.ctx);
-            } else {
-                zelf.update_slot::<false>(attr_name, &vm.ctx);
-            }
-        }
         Ok(())

Also applies to: 2638-2643

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 2614 - 2636, The dict mutation,
version invalidation, and the slot-table rewrite must occur inside the same type
lock to avoid a race; move the slot-table update logic into the closure passed
to Self::with_type_lock so modified_inner(), the
attributes.write().insert/shift_remove mutation (handling
PySetterValue::Assign), and the slot table rewrite/update happen before the lock
is released, but keep dropping the previous value outside the lock by returning
it from the closure (as _prev_value) so its destructor runs after with_type_lock
returns.

1434-1450: ⚠️ Potential issue | 🔴 Critical

Validate the full __bases__ rewrite before touching live state.

Line 1435 overwrites zelf.bases before any resolve_mro() call succeeds. If update_mro_recursively() then fails for zelf or any descendant, this setter unwinds after partially mutating the hierarchy.


1437-1456: ⚠️ Potential issue | 🟠 Major

Recompute base and slot tables for every rebased class.

update_mro_recursively() only rewrites mro, and only zelf runs init_slots() on Line 1456. Descendants keep slot tables derived from the old MRO, and self.base / __base__ is never refreshed for any of the rewritten types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1437 - 1456,
update_mro_recursively currently only rewrites each type's mro, but doesn't
refresh per-class base/slot state; update the function so that after computing
and assigning mro it also recomputes and writes the class base/__base__ (e.g.
set cls.base to the appropriate MRO entry) and calls cls.init_slots(&vm.ctx) and
cls.modified_inner() for that class (instead of only calling init_slots/modified
on zelf after recursion). Refer to update_mro_recursively, PyType::resolve_mro,
cls.mro, cls.base, cls.subclasses, modified_inner, and init_slots to locate
where to update base and reinitialize slots for each rewritten subclass.

1443-1446: ⚠️ Potential issue | 🟠 Major

Skip dead subclass weakrefs instead of unwrapping.

subclasses is lazily cleaned. Line 1444's upgrade().unwrap() can legitimately fail and turn __bases__ assignment into a panic instead of just ignoring a stale weakref.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1443 - 1446, The loop currently
calls upgrade().unwrap() (and then downcast_ref().unwrap()) on entries in
cls.subclasses, which can panic on dead weakrefs; change it to safely skip stale
weakrefs by checking the Option/Result: replace the unwrap chain with a
conditional that first does if let Some(strong) = subclass.upgrade() { if let
Some(sub_pytype) = strong.downcast_ref::<Py<PyType>>() {
update_mro_recursively(sub_pytype, vm)?; } } so dead weakrefs or unexpected
types are ignored instead of causing a panic.

496-508: ⚠️ Potential issue | 🔴 Critical

Serialize the public version-tag path too.

assign_version_tag() still just forwards to _inner, and Line 1138 still reaches it from find_name_in_mro() without type_mutex. That leaves the base-before-subclass tagging invariant racing with modified_inner(), so a subclass can still end up with a nonzero tp_version_tag after its base was reset to 0.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 1607-1617: The annotations cache cleanup should run
unconditionally: inside the closure passed to Self::with_type_lock (around
modified_inner, attributes.write(), and the attrs.insert(identifier!(vm,
__annotate_func__), value) call), always remove any existing identifier!(vm,
__annotations_cache__) instead of only calling attrs.swap_remove(...) when
!vm.is_none(&value); i.e., drop the vm.is_none conditional and unconditionally
call attrs.swap_remove(identifier!(vm, __annotations_cache__)) before inserting
the new __annotate_func__ value so assigning None will also clear the cached
__annotations__.

---

Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Around line 7713-7744: The metaclass descriptor check reads mcl_attr before
capturing metaclass_version, leaving a race where a data descriptor could be
installed between those steps; to fix, call mcl.version_for_specialization(_vm)
and store metaclass_version (and handle the zero-version backoff early) before
calling mcl.get_attr(attr_name), then only inspect the attribute's descr_set if
metaclass_version is nonzero; reference mcl.version_for_specialization,
metaclass_version, mcl.get_attr, mcl_attr, and the LoadAttrClass* specialization
path when moving the attribute-inspection after the version capture and
preserving the existing adaptive_counter_backoff behavior when
version_for_specialization returns 0.
- Around line 7433-7439: The slot-check code reads mutable slot pointers (e.g.,
cls.slots.getattro.load() / cls.slots.setattro.load(), tp_new/tp_alloc,
__bool__/__len__) before capturing the type version, which can race; fix by
first taking a single snapshot of the type version (read and store the version
tag field from cls once into a local, e.g., version_snapshot) and then read the
slot pointers and compute is_default_* flags, finally publish the specialization
tagged with that same version_snapshot; apply the same change to the equivalent
blocks that check cls.slots.setattro, tp_new/tp_alloc, and __bool__/__len__ (the
other occurrences noted at the listed ranges) so all slot-checks and the cached
version come from the same snapshot.

---

Duplicate comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 2614-2636: The dict mutation, version invalidation, and the
slot-table rewrite must occur inside the same type lock to avoid a race; move
the slot-table update logic into the closure passed to Self::with_type_lock so
modified_inner(), the attributes.write().insert/shift_remove mutation (handling
PySetterValue::Assign), and the slot table rewrite/update happen before the lock
is released, but keep dropping the previous value outside the lock by returning
it from the closure (as _prev_value) so its destructor runs after with_type_lock
returns.
- Around line 1437-1456: update_mro_recursively currently only rewrites each
type's mro, but doesn't refresh per-class base/slot state; update the function
so that after computing and assigning mro it also recomputes and writes the
class base/__base__ (e.g. set cls.base to the appropriate MRO entry) and calls
cls.init_slots(&vm.ctx) and cls.modified_inner() for that class (instead of only
calling init_slots/modified on zelf after recursion). Refer to
update_mro_recursively, PyType::resolve_mro, cls.mro, cls.base, cls.subclasses,
modified_inner, and init_slots to locate where to update base and reinitialize
slots for each rewritten subclass.
- Around line 1443-1446: The loop currently calls upgrade().unwrap() (and then
downcast_ref().unwrap()) on entries in cls.subclasses, which can panic on dead
weakrefs; change it to safely skip stale weakrefs by checking the Option/Result:
replace the unwrap chain with a conditional that first does if let Some(strong)
= subclass.upgrade() { if let Some(sub_pytype) =
strong.downcast_ref::<Py<PyType>>() { update_mro_recursively(sub_pytype, vm)?; }
} so dead weakrefs or unexpected types are ignored instead of causing a panic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c8c994ce-d646-4364-b6ea-ea561ef96c2b

📥 Commits

Reviewing files that changed from the base of the PR and between da1d518 and f491a45.

📒 Files selected for processing (5)
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/vm/interpreter.rs
  • crates/vm/src/vm/mod.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/vm/src/stdlib/posix.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/vm/src/vm/interpreter.rs
  • crates/vm/src/vm/mod.rs

Comment on lines +1607 to +1617
let _prev_values = Self::with_type_lock(vm, || {
self.modified_inner();
let mut attrs = self.attributes.write();
let removed = if !vm.is_none(&value) {
attrs.swap_remove(identifier!(vm, __annotations_cache__))
} else {
None
};
let prev = attrs.insert(identifier!(vm, __annotate_func__), value);
(removed, prev)
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clear __annotations_cache__ even when __annotate__ becomes None.

Right now the cache is only cleared when the new value is callable. If the class already materialized __annotations__, assigning None here leaves __annotations_cache__ behind, so later reads keep returning the stale dict.

💡 Minimal fix
         let _prev_values = Self::with_type_lock(vm, || {
             self.modified_inner();
             let mut attrs = self.attributes.write();
-            let removed = if !vm.is_none(&value) {
-                attrs.swap_remove(identifier!(vm, __annotations_cache__))
-            } else {
-                None
-            };
+            let removed = attrs.swap_remove(identifier!(vm, __annotations_cache__));
             let prev = attrs.insert(identifier!(vm, __annotate_func__), value);
             (removed, prev)
         });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1607 - 1617, The annotations
cache cleanup should run unconditionally: inside the closure passed to
Self::with_type_lock (around modified_inner, attributes.write(), and the
attrs.insert(identifier!(vm, __annotate_func__), value) call), always remove any
existing identifier!(vm, __annotations_cache__) instead of only calling
attrs.swap_remove(...) when !vm.is_none(&value); i.e., drop the vm.is_none
conditional and unconditionally call attrs.swap_remove(identifier!(vm,
__annotations_cache__)) before inserting the new __annotate_func__ value so
assigning None will also clear the cached __annotations__.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant