📦 Library Dependencies
The following Lib/ modules were modified. Here are their dependencies:
- [x] test: cpython/Lib/test/test_marshal.py (TODO: 7)
  - dependencies:
  - dependent tests: (21 tests)
📝 Walkthrough
Refactors varint/exception-table encoding, rewrites marshal serialization/deserialization with depth/ref tracking and CPython field ordering, reworks oparg enums (MakeFunctionFlag, ComparisonOperator), adds a CodeUnits specialization-disable, and makes small VM API/trait adjustments.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Py as Python code (call site)
    participant VM as VM stdlib marshal
    participant Core as compiler-core marshal
    participant File as file-like object
    Py->>VM: marshal.dumps/dump(args)
    VM->>Core: serialize_value/write_object (allow_code, depth/ref args)
    Core-->>VM: bytes (marshal format, FLAG_REF entries)
    VM->>File: write(bytes) / seek after load
    File-->>VM: read bytes
    VM->>Core: deserialize_value_depth (uses FLAG_REF, MAX_MARSHAL_STACK_DEPTH)
    Core-->>VM: PyObjectRef
    VM-->>Py: return object / bytes
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
170d8fe to 1a0c0ce
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 570-576: The current disable_specialization() only writes
UNREACHABLE_BACKOFF into adaptive_counters but CodeUnits::quicken() later
unconditionally reinitializes cacheable sites, undoing the disable; update the
implementation so that quicken() respects the non-specializable state: either
add a persistent flag on the code object (e.g., non_specializable) set by
disable_specialization() and check that flag in CodeUnits::quicken() before
initializing warmup counters at RESUME, or have quicken() inspect each adaptive
counter and skip reinitialization when the counter already equals
UNREACHABLE_BACKOFF; ensure you reference disable_specialization,
adaptive_counters, UNREACHABLE_BACKOFF, and CodeUnits::quicken() in your changes
so the disable persists across quicken().
In `@crates/compiler-core/src/bytecode/oparg.rs`:
- Around line 385-389: The current encoding mixes bit positions and CPython
power-of-two bitmasks, causing ambiguity in TryFrom<u32> and in
From<MakeFunctionFlag> for u32; change the conversions so TryFrom<u32> decodes
CPython-style masks (treat input as bitmask and match 1,2,4,8,16,32
specifically, returning Err for non-power-of-two or unknown values) and make
From<MakeFunctionFlag> for u32 emit the CPython bitmask (1 << position) rather
than the raw position; if you need to support position-based values keep or add
a distinct TryFrom<u8> (or a helper like from_position) to decode stored
positions, and adjust any uses of the bitflag! macro or internal storage to
consistently treat the enum as representing a position with conversions handling
mask<->position via (1 << position). Ensure unique symbols updated: TryFrom<u32>
impl for MakeFunctionFlag, From<MakeFunctionFlag> for u32, and any helper
from_position/from_mask functions.
In `@crates/compiler-core/src/marshal.rs`:
- Around line 236-254: The current code reconstructs localspluskinds using
split_localplus(...) and build_localspluskinds_from_split(...), which strips the
CO_FAST_HIDDEN bit; instead preserve hidden flags by merging them back from the
original localspluskinds: after computing lp via split_localplus and before
assigning localspluskinds for the CodeObject, iterate the original
localspluskinds and reapply CO_FAST_HIDDEN to the corresponding entries in the
rebuilt localspluskinds (matching by index/name), or simply reuse the original
localspluskinds where possible so serialize_code(...) retains CO_FAST_HIDDEN;
update the block around split_localplus, build_localspluskinds_from_split, and
the localspluskinds assignment to perform this merge.
- Around line 481-518: deserialize_value_depth currently reserves a None
placeholder for flagged containers, but only sets refs[idx] after
deserialize_value_typed returns, so recursive backrefs to a container being
built see None and fail; fix by reserving and inserting a live placeholder value
into refs before calling deserialize_value_typed so TYPE_REF can resolve to the
in-progress container. Concretely, modify deserialize_value_depth to create an
empty/container placeholder via the Bag API (or add a
Bag::make_placeholder/mutable container constructor) and store Some(placeholder)
at refs[idx] prior to calling deserialize_value_typed, then have
deserialize_value_typed fill/populate that placeholder in-place (or accept the
slot index to mutate refs[idx]) instead of replacing it after the fact; ensure
Type::try_from and the slot handling logic in deserialize_value_typed support
in-place population for lists/dicts/sets to allow recursive references to
resolve.
- Around line 279-345: The helpers read_marshal_bytes, read_marshal_str,
read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple are
stripping FLAG_REF and not reserving/referring into the global marshal ref
table, and read_marshal_const_tuple calls deserialize_value in a way that builds
a fresh refs vector for each element; this breaks v3+ streams where TYPE_REF
indices are global. Fix by making these readers handle FLAG_REF correctly: when
reading the element type byte check for FLAG_REF and if present reserve a slot
in the shared refs table before deserializing and use the reserved index for ref
entries instead of stripping the flag; propagate a mutable refs container into
read_marshal_* functions (and into deserialize_value) so tuples use the same
refs vector for all elements; remove any local per-element refs creation in
read_marshal_const_tuple and instead pass the shared refs through each call so
TYPE_REFs decode consistently across the whole code object. Ensure functions
read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple
accept &mut refs (or a context that owns it) and that FLAG_REF handling is
centralized where type_byte is consumed.
In `@crates/compiler-core/src/varint.rs`:
- Around line 62-80: read_varint_with_start (and the other varint reader)
currently returns Some(val) even when the input ends while the continuation bit
is still set; update the loop exit logic to detect an unterminated varint and
return None in that case. Concretely: after the while loop in
read_varint_with_start (and analogously in the other reader), if cont is still
true (meaning we exited because *pos >= data.len()), return None instead of
Some(val); otherwise return Some(val). This ensures malformed/unterminated
varints are treated as failures.
- Around line 35-50: write_varint_be currently only emits up to five 6-bit
chunks and thus drops bits for values >= 1 << 30; change it to emit all 6-bit
groups until the remaining value fits in 6 bits (i.e., use a loop instead of
fixed conditionals) so the top bits are preserved for full u32 range.
Specifically, in write_varint_be replace the fixed if-chain with a loop that
repeatedly computes group = (val >> shift) & 0x3f or repeatedly shifts the
remaining value, pushes 0x40 | group for all non-final groups and pushes the
final (val & 0x3f) without the continuation bit, and then return the number of
bytes written; keep the function signature and return behavior the same.
In `@crates/vm/src/stdlib/marshal.rs`:
- Around line 519-549: The load function currently calls read() with no size
which slurps the rest of the stream; change load to read only as much as the
deserializer needs by feeding a streaming buffer to marshal::deserialize_value
instead of pulling the entire file: replace the
ArgBytesLike::try_from_object(read_res) / buf approach with a small-read loop
(using the file object's read(size) or readinto-like API) that appends bytes to
a Vec<u8> and repeatedly calls deserialize_value on a &[u8] slice when more data
is required, only calling tell/seek after you know how many bytes were consumed;
keep the existing error mapping and check_no_code(&result, vm) logic and
reference the same symbols (load, marshal::deserialize_value, ArgBytesLike,
tell/seek, check_no_code) when making the change.
- Around line 420-473: The code is calling unwrap() on container insertions
which can panic on malformed input; update make_set (remove
set.add(...).unwrap()), make_frozenset (remove
PyFrozenSet::from_iter(...).unwrap()), and make_dict (remove
dict.set_item(...).unwrap()) to handle insertion errors cleanly instead of
panicking: detect failures from set.add, PyFrozenSet::from_iter, and
dict.set_item and return an appropriate Err(marshal::MarshalError) (or convert
the underlying Python error into a MarshalError) so malformed/unhashable
elements produce a controlled error from make_set/make_frozenset/make_dict
rather than causing a Rust panic.
- Around line 552-580: check_no_code currently skips PySlice, allowing code
objects in slice.start/stop/step to bypass allow_code=false; update
check_no_code to detect PySlice (crate::builtins::PySlice) and recursively call
check_no_code on its start, stop, and step components (similar to how
PyTuple/PyList/PyDict are handled) so that slice(start, stop, step) cannot
smuggle code objects through dumps/loads; ensure you use the slice's accessor
methods consistent with other container handling and keep error behavior
identical when a PyCode is found.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 523278ff-f428-42e4-813f-fb5cc5ab398b
⛔ Files ignored due to path filters (1)
Lib/test/test_marshal.py is excluded by !Lib/**
📒 Files selected for processing (7)
crates/compiler-core/src/bytecode.rs
crates/compiler-core/src/bytecode/oparg.rs
crates/compiler-core/src/marshal.rs
crates/compiler-core/src/varint.rs
crates/vm/src/builtins/code.rs
crates/vm/src/stdlib/marshal.rs
crates/vm/src/types/slot.rs
```rust
/// Disable adaptive specialization by setting all counters to unreachable.
/// Used for CPython-compiled bytecode where specialization may not be safe.
pub fn disable_specialization(&self) {
    for counter in self.adaptive_counters.iter() {
        counter.store(UNREACHABLE_BACKOFF, Ordering::Relaxed);
    }
}
```
disable_specialization() is undone by the next quicken().
This only writes UNREACHABLE_BACKOFF into adaptive_counters, but CodeUnits::quicken() later reinitializes every cacheable site unconditionally. A code object marked non-specializable before first execution will still get warmup counters at RESUME.
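A minimal sketch of the flag-based option: the types, counter layout, and concrete counter values below are illustrative stand-ins, not the crate's real definitions; the point is only that `quicken()` consults a persistent flag before reinitializing warmup counters.

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};

// Illustrative stand-in for the real sentinel value.
const UNREACHABLE_BACKOFF: u8 = 0xff;

// Simplified stand-in for the crate's CodeUnits.
struct CodeUnits {
    adaptive_counters: Vec<AtomicU8>,
    non_specializable: AtomicBool,
}

impl CodeUnits {
    fn new(sites: usize) -> Self {
        Self {
            adaptive_counters: (0..sites).map(|_| AtomicU8::new(0)).collect(),
            non_specializable: AtomicBool::new(false),
        }
    }

    fn disable_specialization(&self) {
        // Record the state persistently, not just in the counters.
        self.non_specializable.store(true, Ordering::Relaxed);
        for c in &self.adaptive_counters {
            c.store(UNREACHABLE_BACKOFF, Ordering::Relaxed);
        }
    }

    /// Re-initializes warmup counters, but now skips non-specializable code.
    fn quicken(&self) {
        if self.non_specializable.load(Ordering::Relaxed) {
            return; // keep UNREACHABLE_BACKOFF in place
        }
        for c in &self.adaptive_counters {
            c.store(1, Ordering::Relaxed); // warmup counter
        }
    }
}

fn main() {
    let code = CodeUnits::new(3);
    code.disable_specialization();
    code.quicken(); // must not undo the disable
    assert!(code
        .adaptive_counters
        .iter()
        .all(|c| c.load(Ordering::Relaxed) == UNREACHABLE_BACKOFF));
    println!("disable persists across quicken()");
}
```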
```rust
/// `SET_FUNCTION_ATTRIBUTE` flags.
/// Bitmask: Defaults=0x01, KwOnly=0x02, Annotations=0x04,
/// Closure=0x08, TypeParams=0x10, Annotate=0x20.
/// Stored as bit position (0-5) by `bitflag!` macro.
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
```
Split bit-position decoding from CPython bitmask decoding.
SET_FUNCTION_ATTRIBUTE has two incompatible encodings here, but TryFrom<u32> cannot disambiguate 1, 2, or 4: the direct u8 path maps them to bit positions before the power-of-two branch runs. From<MakeFunctionFlag> for u32 still emits positions as well, so marshaled code can round-trip the wrong flag in both directions.
Also applies to: 410-427
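The requested split can be sketched as follows. The variant set mirrors the doc comment above, but the method names (`from_mask`, `to_mask`) and the error type are placeholders, not the crate's actual API:

```rust
// Store the flag as a bit position internally; convert to/from
// CPython's power-of-two masks explicitly so 1, 2, and 4 are unambiguous.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
#[repr(u8)]
enum MakeFunctionFlag {
    Defaults = 0,
    KwOnly = 1,
    Annotations = 2,
    Closure = 3,
    TypeParams = 4,
    Annotate = 5,
}

impl MakeFunctionFlag {
    /// Decode a CPython-style bitmask (1, 2, 4, 8, 16, 32 only).
    fn from_mask(mask: u32) -> Result<Self, ()> {
        Ok(match mask {
            0x01 => Self::Defaults,
            0x02 => Self::KwOnly,
            0x04 => Self::Annotations,
            0x08 => Self::Closure,
            0x10 => Self::TypeParams,
            0x20 => Self::Annotate,
            _ => return Err(()), // non-power-of-two or unknown
        })
    }

    /// Emit the CPython bitmask from the stored bit position.
    fn to_mask(self) -> u32 {
        1u32 << (self as u8)
    }
}

fn main() {
    // Round-trip: mask -> flag -> mask, for every defined flag.
    for mask in [0x01u32, 0x02, 0x04, 0x08, 0x10, 0x20] {
        let flag = MakeFunctionFlag::from_mask(mask).unwrap();
        assert_eq!(flag.to_mask(), mask);
    }
    // Ambiguous inputs are rejected instead of silently reinterpreted.
    assert!(MakeFunctionFlag::from_mask(3).is_err());
    assert!(MakeFunctionFlag::from_mask(0).is_err());
    println!("mask/position round-trip ok");
}
```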
```rust
/// Read a marshal bytes object (TYPE_STRING = b's').
fn read_marshal_bytes<R: Read>(rdr: &mut R) -> Result<Vec<u8>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    if type_byte != Type::Bytes as u8 {
        return Err(MarshalError::BadType);
    }
    let len = rdr.read_u32()?;
    Ok(rdr.read_slice(len)?.to_vec())
}

/// Read a marshal string object.
fn read_marshal_str<R: Read>(rdr: &mut R) -> Result<alloc::string::String> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let s = match type_byte {
        b'u' | b't' | b'a' | b'A' => {
            let len = rdr.read_u32()?;
            rdr.read_str(len)?
        }
        b'z' | b'Z' => {
            let len = rdr.read_u8()? as u32;
            rdr.read_str(len)?
        }
        _ => return Err(MarshalError::BadType),
    };
    Ok(alloc::string::String::from(s))
}

/// Read a marshal tuple of strings, returning owned Strings.
fn read_marshal_str_vec<R: Read>(rdr: &mut R) -> Result<Vec<alloc::string::String>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let n = match type_byte {
        b'(' => rdr.read_u32()? as usize,
        b')' => rdr.read_u8()? as usize,
        _ => return Err(MarshalError::BadType),
    };
    (0..n).map(|_| read_marshal_str(rdr)).collect()
}

fn read_marshal_name_tuple<R: Read, Bag: ConstantBag>(
    rdr: &mut R,
    bag: &Bag,
) -> Result<Box<[<Bag::Constant as Constant>::Name]>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let n = match type_byte {
        b'(' => rdr.read_u32()? as usize,
        b')' => rdr.read_u8()? as usize,
        _ => return Err(MarshalError::BadType),
    };
    (0..n)
        .map(|_| Ok(bag.make_name(&read_marshal_str(rdr)?)))
        .collect::<Result<Vec<_>>>()
        .map(Vec::into_boxed_slice)
}

/// Read a marshal tuple of constants.
fn read_marshal_const_tuple<R: Read, Bag: ConstantBag>(
    rdr: &mut R,
    bag: Bag,
) -> Result<Constants<Bag::Constant>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let n = match type_byte {
        b'(' => rdr.read_u32()? as usize,
        b')' => rdr.read_u8()? as usize,
        _ => return Err(MarshalError::BadType),
    };
    (0..n).map(|_| deserialize_value(rdr, bag)).collect()
}
```
Code-object field readers are bypassing the marshal ref table.
These helpers strip FLAG_REF without reserving a slot, and read_marshal_const_tuple() deserializes each constant with a fresh refs vector. In v3+ streams, ref indices are global across the whole code object, so valid TYPE_REFs inside co_consts / co_names will misdecode or shift every later reference.
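The threading of one shared refs table can be illustrated with a toy format; the single-byte lengths and the tag set here are invented for the demo and are nothing like real marshal, but the structural point holds: every flagged element, even inside a nested tuple, reserves a slot in the same table.

```rust
const FLAG_REF: u8 = 0x80;

#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Tuple(Vec<Val>),
}

// One shared refs table is passed through every nested read, so a
// flagged element inside a tuple reserves a globally numbered slot.
fn read(data: &[u8], pos: &mut usize, refs: &mut Vec<Option<Val>>) -> Option<Val> {
    let raw = *data.get(*pos)?;
    *pos += 1;
    let flagged = raw & FLAG_REF != 0;
    let ty = raw & !FLAG_REF;
    if ty == b'r' {
        // Backref: index into the shared table.
        let idx = *data.get(*pos)? as usize;
        *pos += 1;
        return refs.get(idx)?.clone();
    }
    // Reserve the slot before reading the body (matches write order).
    let slot = flagged.then(|| {
        refs.push(None);
        refs.len() - 1
    });
    let val = match ty {
        b'i' => {
            let v = *data.get(*pos)? as i64;
            *pos += 1;
            Val::Int(v)
        }
        b'(' => {
            let n = *data.get(*pos)? as usize;
            *pos += 1;
            let mut elems = Vec::with_capacity(n);
            for _ in 0..n {
                elems.push(read(data, pos, refs)?); // same refs, not a fresh Vec
            }
            Val::Tuple(elems)
        }
        _ => return None,
    };
    if let Some(idx) = slot {
        refs[idx] = Some(val.clone());
    }
    Some(val)
}

fn main() {
    // Tuple of two elements: a flagged int 7, then a ref back to slot 0.
    let data = [b'(', 2, b'i' | FLAG_REF, 7, b'r', 0];
    let mut pos = 0;
    let mut refs = Vec::new();
    let v = read(&data, &mut pos, &mut refs).unwrap();
    assert_eq!(v, Val::Tuple(vec![Val::Int(7), Val::Int(7)]));
    println!("shared refs resolve across tuple elements");
}
```

If the tuple reader built a fresh `Vec` per element instead, the `b'r'` lookup would miss the slot reserved by its sibling, which is the failure mode described above.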
```rust
fn deserialize_value_depth<R: Read, Bag: MarshalBag>(
    rdr: &mut R,
    bag: Bag,
    depth: usize,
    refs: &mut Vec<Option<Bag::Value>>,
) -> Result<Bag::Value> {
    if depth == 0 {
        return Err(MarshalError::InvalidBytecode);
    }
    let raw = rdr.read_u8()?;
    let flag = raw & FLAG_REF != 0;
    let type_code = raw & !FLAG_REF;

    // TYPE_REF: return previously stored object
    if type_code == Type::Ref as u8 {
        let idx = rdr.read_u32()? as usize;
        return refs
            .get(idx)
            .and_then(|v| v.clone())
            .ok_or(MarshalError::InvalidBytecode);
    }

    // Reserve ref slot before reading (matches write order)
    let slot = if flag {
        let idx = refs.len();
        refs.push(None);
        Some(idx)
    } else {
        None
    };

    let typ = Type::try_from(type_code)?;
    let value = deserialize_value_typed(rdr, bag, depth, refs, typ)?;

    if let Some(idx) = slot {
        refs[idx] = Some(value.clone());
    }
    Ok(value)
}
```
Reserved refs cannot resolve recursive containers yet.
A flagged container gets a None placeholder here, but the slot is only populated after the whole value is built. If a TYPE_REF points back to that container while its contents are still being read, the lookup hits None and fails with InvalidBytecode. The new stdlib writer already emits those backrefs for recursive lists/dicts/sets, so dumps() and loads() will not round-trip them.
```rust
/// Read a big-endian varint with start marker (0x80).
pub fn read_varint_with_start(data: &[u8], pos: &mut usize) -> Option<u32> {
    if *pos >= data.len() {
        return None;
    }
    let first = data[*pos];
    if first & 0x80 == 0 {
        return None; // Not a start byte
    }
    *pos += 1;

    let mut val = (first & 0x3f) as u32;
    let mut cont = first & 0x40 != 0;
    while cont && *pos < data.len() {
        let b = data[*pos];
        *pos += 1;
        val = (val << 6) | (b & 0x3f) as u32;
        cont = b & 0x40 != 0;
    }
    Some(val)
}
```
Don't accept unterminated varints as complete values.
Both readers return Some(val) even if the buffer ends while the continuation bit is still set on the last byte. That silently turns malformed exception tables into shorter numbers instead of signaling failure.
Also applies to: 83-99
```rust
fn make_list(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    Ok(self.0.ctx.new_list(it.collect()).into())
}

fn make_set(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    let set = PySet::default().into_ref(&self.0.ctx);
    for elem in it {
        set.add(elem, self.0).unwrap()
    }
    Ok(set.into())
}

fn make_frozenset(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    Ok(PyFrozenSet::from_iter(self.0, it)
        .unwrap()
        .to_pyobject(self.0))
}

fn make_dict(
    &self,
    it: impl Iterator<Item = (Self::Value, Self::Value)>,
) -> Result<Self::Value, marshal::MarshalError> {
    let dict = self.0.ctx.new_dict();
    for (k, v) in it {
        dict.set_item(&*k, v, self.0).unwrap()
    }
    Ok(dict.into())
}

fn make_slice(
    &self,
    start: Self::Value,
    stop: Self::Value,
    step: Self::Value,
) -> Result<Self::Value, marshal::MarshalError> {
    use crate::builtins::PySlice;
    let vm = self.0;
    Ok(PySlice {
        start: if vm.is_none(&start) {
            None
        } else {
            Some(start)
        },
        stop,
        step: if vm.is_none(&step) { None } else { Some(step) },
    }
    .into_ref(&vm.ctx)
    .into())
}
```
Don't unwrap() container insertions during loads().
A malformed stream can put an unhashable object inside a set/frozenset or use it as a dict key. These unwrap() calls turn that user-controlled input into a Rust panic instead of a Python exception.
Suggested fix

```diff
 fn make_set(
     &self,
     it: impl Iterator<Item = Self::Value>,
 ) -> Result<Self::Value, marshal::MarshalError> {
     let set = PySet::default().into_ref(&self.0.ctx);
     for elem in it {
-        set.add(elem, self.0).unwrap()
+        set.add(elem, self.0)
+            .map_err(|_| marshal::MarshalError::BadType)?;
     }
     Ok(set.into())
 }
@@
 fn make_frozenset(
     &self,
     it: impl Iterator<Item = Self::Value>,
 ) -> Result<Self::Value, marshal::MarshalError> {
-    Ok(PyFrozenSet::from_iter(self.0, it)
-        .unwrap()
-        .to_pyobject(self.0))
+    Ok(PyFrozenSet::from_iter(self.0, it)
+        .map_err(|_| marshal::MarshalError::BadType)?
+        .to_pyobject(self.0))
 }
@@
 fn make_dict(
     &self,
     it: impl Iterator<Item = (Self::Value, Self::Value)>,
 ) -> Result<Self::Value, marshal::MarshalError> {
     let dict = self.0.ctx.new_dict();
     for (k, v) in it {
-        dict.set_item(&*k, v, self.0).unwrap()
+        dict.set_item(&*k, v, self.0)
+            .map_err(|_| marshal::MarshalError::BadType)?;
     }
     Ok(dict.into())
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
fn make_list(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    Ok(self.0.ctx.new_list(it.collect()).into())
}

fn make_set(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    let set = PySet::default().into_ref(&self.0.ctx);
    for elem in it {
        set.add(elem, self.0)
            .map_err(|_| marshal::MarshalError::BadType)?;
    }
    Ok(set.into())
}

fn make_frozenset(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    Ok(PyFrozenSet::from_iter(self.0, it)
        .map_err(|_| marshal::MarshalError::BadType)?
        .to_pyobject(self.0))
}

fn make_dict(
    &self,
    it: impl Iterator<Item = (Self::Value, Self::Value)>,
) -> Result<Self::Value, marshal::MarshalError> {
    let dict = self.0.ctx.new_dict();
    for (k, v) in it {
        dict.set_item(&*k, v, self.0)
            .map_err(|_| marshal::MarshalError::BadType)?;
    }
    Ok(dict.into())
}

fn make_slice(
    &self,
    start: Self::Value,
    stop: Self::Value,
    step: Self::Value,
) -> Result<Self::Value, marshal::MarshalError> {
    use crate::builtins::PySlice;
    let vm = self.0;
    Ok(PySlice {
        start: if vm.is_none(&start) {
            None
        } else {
            Some(start)
        },
        stop,
        step: if vm.is_none(&step) { None } else { Some(step) },
    }
    .into_ref(&vm.ctx)
    .into())
}
```
```rust
fn load(args: LoadArgs, vm: &VirtualMachine) -> PyResult<PyObjectRef> {
    // Read from file object into a buffer, one object at a time.
    // We read all available data, deserialize one object, then seek
    // back to just after the consumed bytes.
    let tell_before = vm
        .call_method(&args.f, "tell", ())?
        .try_into_value::<i64>(vm)?;
    let read_res = vm.call_method(&args.f, "read", ())?;
    let bytes = ArgBytesLike::try_from_object(vm, read_res)?;
    let buf = bytes.borrow_buf();

    let mut rdr: &[u8] = &buf;
    let len_before = rdr.len();
    let result =
        marshal::deserialize_value(&mut rdr, PyMarshalBag(vm)).map_err(|e| match e {
            marshal::MarshalError::Eof => vm.new_exception_msg(
                vm.ctx.exceptions.eof_error.to_owned(),
                "marshal data too short".into(),
            ),
            _ => vm.new_value_error("bad marshal data"),
        })?;
    let consumed = len_before - rdr.len();

    // Seek file to just after the consumed bytes
    let new_pos = tell_before + consumed as i64;
    vm.call_method(&args.f, "seek", (new_pos,))?;

    if !args.allow_code {
        check_no_code(&result, vm)?;
    }
    Ok(result)
}
```
load() should not read to EOF just to parse one object.
Calling read() with no size slurps the entire remainder of the stream, can block indefinitely on non-EOF sources, and forces tell() / seek() on file-likes that would otherwise be readable. marshal.load() only needs a streaming reader for one value.
```rust
/// Reject subclasses of marshallable types (int, float, complex, tuple, etc.).
/// Recursively check that no code objects are present.
fn check_no_code(obj: &PyObjectRef, vm: &VirtualMachine) -> PyResult<()> {
    if obj.downcast_ref::<PyCode>().is_some() {
        return Err(vm.new_value_error("unmarshalling code objects is disallowed".to_string()));
    }
    if let Some(tup) = obj.downcast_ref::<PyTuple>() {
        for elem in tup.as_slice() {
            check_no_code(elem, vm)?;
        }
    } else if let Some(list) = obj.downcast_ref::<PyList>() {
        for elem in list.borrow_vec().iter() {
            check_no_code(elem, vm)?;
        }
    } else if let Some(set) = obj.downcast_ref::<PySet>() {
        for elem in set.elements() {
            check_no_code(&elem, vm)?;
        }
    } else if let Some(fset) = obj.downcast_ref::<PyFrozenSet>() {
        for elem in fset.elements() {
            check_no_code(&elem, vm)?;
        }
    } else if let Some(dict) = obj.downcast_ref::<PyDict>() {
        for (k, v) in dict.into_iter() {
            check_no_code(&k, vm)?;
            check_no_code(&v, vm)?;
        }
    }
    Ok(())
}
```
allow_code = false is bypassed through slice members.
check_no_code() now walks the common containers, but it never inspects crate::builtins::PySlice even though write_object_depth() serializes start, stop, and step recursively. slice(code_obj, None, None) will still smuggle a code object through both dumps() and loads().
Suggested fix

```diff
     } else if let Some(dict) = obj.downcast_ref::<PyDict>() {
         for (k, v) in dict.into_iter() {
             check_no_code(&k, vm)?;
             check_no_code(&v, vm)?;
         }
+    } else if let Some(slice) = obj.downcast_ref::<crate::builtins::PySlice>() {
+        if let Some(start) = slice.start.as_ref() {
+            check_no_code(start, vm)?;
+        }
+        check_no_code(&slice.stop, vm)?;
+        if let Some(step) = slice.step.as_ref() {
+            check_no_code(step, vm)?;
+        }
     }
     Ok(())
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/stdlib/marshal.rs` around lines 552 - 580, check_no_code
currently skips PySlice, allowing code objects in slice.start/stop/step to
bypass allow_code=false; update check_no_code to detect PySlice
(crate::builtins::PySlice) and recursively call check_no_code on its start,
stop, and step components (similar to how PyTuple/PyList/PyDict are handled) so
that slice(start, stop, step) cannot smuggle code objects through dumps/loads;
ensure you use the slice's accessor methods consistent with other container
handling and keep error behavior identical when a PyCode is found.
Actionable comments posted: 1
♻️ Duplicate comments (3)
crates/compiler-core/src/marshal.rs (3)
921-934: ⚠️ Potential issue | 🟠 Major
Rebuilding `co_localspluskinds` here still drops hidden flags.
This reconstruction only emits LOCAL/CELL/FREE, so metadata already preserved in `code.localspluskinds` (for example `CO_FAST_HIDDEN`) is discarded on the next marshal write. Since line 252 now keeps the original bytes, serialize those bytes back out here instead of regenerating the mask.
💡 Minimal fix

```diff
-    let total_lp = code.varnames.len() + cell_only_names.len() + code.freevars.len();
-    let mut kinds = Vec::with_capacity(total_lp);
-    for v in code.varnames.iter() {
-        let mut k: u8 = CO_FAST_LOCAL;
-        if local_cell_set.contains(v.as_ref()) {
-            k |= CO_FAST_CELL;
-        }
-        kinds.push(k);
-    }
-    kinds.extend(core::iter::repeat_n(CO_FAST_CELL, cell_only_names.len()));
-    kinds.extend(core::iter::repeat_n(CO_FAST_FREE, code.freevars.len()));
     buf.write_u8(Type::Bytes as u8);
-    write_vec(buf, &kinds);
+    debug_assert_eq!(total_lp_count, code.localspluskinds.len());
+    write_vec(buf, &code.localspluskinds);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 921 - 934, The current reconstruction of locals+kind masks (loop over code.varnames, extend repeats for cell_only_names and code.freevars) discards hidden/other flags present in the original code.localspluskinds; instead of regenerating the mask, emit the preserved byte sequence saved earlier (code.localspluskinds) back into the buffer — replace the creation/population of kinds and the write_vec(&kinds) with writing the original code.localspluskinds bytes via the same Type::Bytes/write_vec path so hidden flags (e.g. CO_FAST_HIDDEN) are retained.
502-516: ⚠️ Potential issue | 🔴 Critical
Reserved slots still cannot satisfy backrefs to in-progress containers.
`refs.push(None)` reserves the index, but the slot is only populated after `deserialize_value_typed()` returns. If a list/dict/set/frozenset contains a `TYPE_REF` back to itself while it is being built, the lookup still sees `None` and fails with `InvalidBytecode`, so recursive containers will not round-trip. This needs a live placeholder inserted before descending and then filled in place.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 502 - 516, The reserved-slot approach pushes None then fills it after deserialize_value_typed, which still prevents TYPE_REF lookups during recursive descent; change this to insert a live placeholder into refs (when flag is true) before calling deserialize_value_typed so backrefs can resolve to the in-progress container, then after deserialization replace that placeholder in-place with the final value (avoid using a cloned value for the slot so backrefs point to the same instance). Update the logic around refs, slot, and the call to deserialize_value_typed to create/fill a placeholder per container and ensure the final assignment mutates refs[idx] to the completed value rather than leaving a None or a clone.
278-344: ⚠️ Potential issue | 🔴 Critical
Still decoding code-object fields outside the shared marshal ref table.
These helpers all clear `FLAG_REF` locally, so flagged field values never reserve a global ref slot, and `read_marshal_const_tuple()` recreates `refs` for each element via `deserialize_value(rdr, bag)`. In a valid v3+ stream, `TYPE_REF`s inside `co_consts`, `co_names`, or later code-object fields will then read the wrong slot and desynchronize every following reference. Thread the caller's shared `refs` state through `deserialize_code()` and these helpers instead of consuming the type byte here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 278 - 344, The helpers read_marshal_bytes, read_marshal_str, read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple are clearing FLAG_REF and consuming the type byte locally, so flagged objects never reserve entries in the shared refs table and subsequent TYPE_REFs get desynchronized; thread the caller's shared refs state through deserialize_code and these helper functions (add a &mut refs parameter or equivalent to their signatures), stop stripping FLAG_REF or consuming the type byte inside these helpers (let deserialize_code/read-dispatch handle the initial type/FLAG_REF logic and reserve ref slots), and make deserialize_value use the shared &mut refs (and not recreate refs or take Bag by value) so all co_consts/co_names/code-object fields use the same global refs table.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 606-637: In the Type::Dict branch the code peels raw and calls
deserialize_value_typed(rdr, bag, d, refs, key_type) which bypasses the depth
guard and can underflow when depth == 1; change the key-path to use the same
guarded entry used for values by calling deserialize_value_depth(rdr, bag, d,
refs) (or, if you must keep typed deserialization, explicitly check if d == 0
and reject composite key types like tuples/frozensets) so that depth is not
decremented twice; update the logic around reading FLAG_REF and refs (the
key_slot handling and refs[idx] assignment) to work with the
deserialize_value_depth flow and preserve the existing ref semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 16fbaa0f-e75b-46d4-8cec-fce352c0476f
📒 Files selected for processing (2)
- crates/compiler-core/src/marshal.rs
- crates/compiler-core/src/varint.rs
✅ Files skipped from review due to trivial changes (1)
- crates/compiler-core/src/varint.rs
Unify marshal to a single CPython-compatible format. No separate "cpython_marshal" reader: one format for frozen modules, .pyc files, and the Python-level marshal module.

- ComparisonOperator: `(cmp_index << 5) | mask` matching COMPARE_OP
- MakeFunctionFlag: bit-position matching SET_FUNCTION_ATTRIBUTE
- Exception table varint: big-endian (matching Python/assemble.c)
- Linetable varint: little-endian (unchanged)
- Integer: TYPE_INT (i32) / TYPE_LONG (base-2^15 digits)
- Code objects: CPython field order (argcount, posonlyargcount, ..., co_localsplusnames, co_localspluskinds, ..., co_exceptiontable)
- FLAG_REF / TYPE_REF for object deduplication (version >= 3)
- allow_code keyword argument on dumps/loads/dump/load
- Subclass rejection (int/float/complex/tuple/list/dict/set/frozenset)
- Slice serialization (version >= 5)
- Buffer protocol fallback for memoryview/array
- Recursion depth limit (2000) for both reads and writes
- Streaming load (reads one object, seeks file position)
- TYPE_INT64, TYPE_FLOAT (text), TYPE_COMPLEX (text) for compat

serialize_code writes co_localsplusnames/co_localspluskinds from split varnames/cellvars/freevars. deserialize_code splits them back. Cell variable DEREF indices are translated between flat (wire) and cell-relative (internal) representations in both directions.

Replace bitwise trick with match for new ComparisonOperator values.

21 -> 3 expected failures. Remaining: test_bad_reader (IO layer), test_deterministic_sets (PYTHONHASHSEED), testIntern (string interning).
- Use original localspluskinds from marshal data instead of rebuilding, preserving CO_FAST_HIDDEN and other flags
- Fix write_varint_be to handle values >= 2^30 (add 6th chunk)
- Remove unused build_localspluskinds_from_split
Prevents usize underflow when dict key deserialization path calls deserialize_value_typed with depth=0 on composite types.
🧹 Nitpick comments (2)
crates/compiler-core/src/varint.rs (1)
121-129: Consider adding edge-case test values.
The roundtrip test covers common values but doesn't test the upper boundary. Adding `u32::MAX` and values near `1 << 30` would validate the full 32-bit range handling.
💡 Suggested addition

```diff
-    for &val in &[0u32, 1, 63, 64, 127, 128, 4095, 4096, 1_000_000] {
+    for &val in &[0u32, 1, 63, 64, 127, 128, 4095, 4096, 1_000_000, 1 << 30, u32::MAX] {
```
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/varint.rs` around lines 121 - 129, Update the test_be_varint_roundtrip test to include edge-case u32 values such as u32::MAX and values around 1 << 30 (for example (1 << 30) - 1 and 1 << 30) so the big-endian varint encoder/decoder (write_varint_be and read_varint) is validated across the full 32-bit boundary; add those values to the list iterated in the test and keep the same round-trip assertions (assert_eq!(read_varint(&buf, &mut pos), Some(val)) and assert_eq!(pos, buf.len())).
crates/compiler-core/src/marshal.rs (1)
609-644: Dict deserialization logic is correct but duplicates ref handling.
The key-path (lines 619-638) manually handles `TYPE_REF` and `FLAG_REF` instead of using `deserialize_value_depth`. While functionally correct, this duplicates the ref-handling logic. The depth guard is properly maintained since both keys and values use `d = depth - 1`.
💡 Consider using deserialize_value_depth for keys
This would reduce code duplication:

```rust
loop {
    // Peek type byte for null check
    let raw = rdr.read_u8()?;
    if raw & !FLAG_REF == b'0' {
        break;
    }
    // Rewind or use a different approach to let deserialize_value_depth handle the byte
    // Alternatively, accept the current implementation as-is since it works
}
```

Since rewinding isn't straightforward with the current `Read` trait, the current approach is acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/marshal.rs` around lines 609 - 644, The Dict deserialization duplicates reference-handling logic for keys; refactor the key-path to let deserialize_value_depth handle both normal and TYPE_REF cases by peeking the next type byte instead of consuming it: replace the read_u8() call with a non-consuming peek using BufRead::fill_buf (or wrap rdr in a BufReader) to check for the terminator (raw & !FLAG_REF == b'0'), and then call deserialize_value_depth(rdr, bag, d, refs, key_type) (or the existing deserialize_value_depth overload) to read the key so ref bookkeeping is centralized; update the function signature or wrap the reader as needed so rdr supports BufRead and remove the manual TYPE_REF / FLAG_REF branch in the key handling block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 0cc38ce0-d090-4c65-9b73-110903f04d03
⛔ Files ignored due to path filters (1)
Lib/test/test_marshal.py is excluded by !Lib/**
📒 Files selected for processing (7)
- crates/compiler-core/src/bytecode.rs
- crates/compiler-core/src/bytecode/oparg.rs
- crates/compiler-core/src/marshal.rs
- crates/compiler-core/src/varint.rs
- crates/vm/src/builtins/code.rs
- crates/vm/src/stdlib/marshal.rs
- crates/vm/src/types/slot.rs
✅ Files skipped from review due to trivial changes (1)
- crates/vm/src/stdlib/marshal.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/builtins/code.rs
Summary by CodeRabbit
New Features
Improvements