marshal #7467

Open
youknowone wants to merge 3 commits into RustPython:main from youknowone:marshal

Conversation


@youknowone youknowone commented Mar 19, 2026

Summary by CodeRabbit

  • New Features

    • Marshal APIs now accept an allow_code option and updated load/dump signatures; loading from a file leaves the stream positioned at exactly the number of bytes consumed.
  • Improvements

    • Broader CPython marshal type support, depth limits and reference tracking for safer (de)serialization.
    • Exception-table and varint encodings updated for more robust bytecode layout.
    • Comparison evaluation made more explicit and predictable.
    • Adaptive specialization behavior hardened for stability.


github-actions bot commented Mar 19, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_marshal.py (TODO: 7)

dependencies:

dependent tests: (21 tests)

  • marshal: test_bool test_exceptions test_importlib test_inspect test_marshal test_zipimport
    • importlib._bootstrap_external: test_importlib test_unittest
      • modulefinder: test_importlib test_modulefinder
      • py_compile: test_argparse test_cmd_line_script test_compileall test_importlib test_multiprocessing_main_handling test_py_compile test_pydoc test_runpy
      • pydoc: test_enum
    • pkgutil: test_pkgutil
    • zipimport: test_importlib

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated


coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

Refactors varint and exception-table encoding, rewrites marshal serialization/deserialization with depth/ref tracking and CPython field ordering, reworks the oparg enums (MakeFunctionFlag, ComparisonOperator), adds a CodeUnits specialization-disable hook, and makes small VM API/trait adjustments.

Changes

  • Varint encoding — crates/compiler-core/src/varint.rs: Added big-endian varint writer write_varint_be; write_varint_with_start now uses BE + start bit; read/write semantics updated (BE for exception tables, LE retained for linetable writes).
  • Bytecode core — crates/compiler-core/src/bytecode.rs: Exception-table fields serialized with BE varints; Constant trait now requires Clone; added pub fn disable_specialization(&self) to set adaptive counters to UNREACHABLE_BACKOFF.
  • Bytecode oparg types — crates/compiler-core/src/bytecode/oparg.rs: Renumbered MakeFunctionFlag discriminants and changed decoding to accept either a direct u8 or power-of-two bitmasks; replaced the macro bit-pattern ComparisonOperator with an explicit enum and new encode/decode (TryFrom/From) using a cmp_index << 5 layout.
  • Marshal core (compiler) — crates/compiler-core/src/marshal.rs: Large marshal rewrite: enables many CPython marshal tags; CPython field-order code read/write; adds depth-bounded deserialization, FLAG_REF/ref-table support, split_localplus, linetable_to_locations, new helpers, and extended Dumpable/MarshalBag traits and serialization variants.
  • VM marshal stdlib — crates/vm/src/stdlib/marshal.rs: API switched to arg-structs (dumps/dump/loads/load) with allow_code; removed the Dumpable path and implemented a custom writer with depth limits, reference tracking for marshal ≥ 3, check_no_code/check_exact_type, and file-backed load behavior.
  • VM builtins — crates/vm/src/builtins/code.rs: Added impl From<PyObjectRef> for Literal to allow direct conversion of VM objects into Literal.
  • Comparison semantics — crates/vm/src/types/slot.rs: Rewrote PyComparisonOp::eval_ord from bitmask logic to explicit per-variant comparisons against Ordering.
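The slot.rs change can be illustrated with a small standalone sketch. The enum and method below are simplified stand-ins, not RustPython's actual definitions: the point is explicit per-variant comparison against Ordering in place of bitmask decoding.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for the comparison-op enum; the real
// PyComparisonOp lives in crates/vm/src/types/slot.rs.
#[derive(Copy, Clone)]
enum CmpOp { Lt, Le, Eq, Ne, Gt, Ge }

impl CmpOp {
    // Explicit per-variant evaluation against an Ordering, replacing
    // bitmask tricks with comparisons a reader can check at a glance.
    fn eval_ord(self, ord: Ordering) -> bool {
        match self {
            CmpOp::Lt => ord == Ordering::Less,
            CmpOp::Le => ord != Ordering::Greater,
            CmpOp::Eq => ord == Ordering::Equal,
            CmpOp::Ne => ord != Ordering::Equal,
            CmpOp::Gt => ord == Ordering::Greater,
            CmpOp::Ge => ord != Ordering::Less,
        }
    }
}

fn main() {
    assert!(CmpOp::Le.eval_ord(Ordering::Equal));
    assert!(!CmpOp::Lt.eval_ord(Ordering::Equal));
    assert!(CmpOp::Ne.eval_ord(Ordering::Greater));
    println!("ok");
}
```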

Sequence Diagram(s)

sequenceDiagram
    participant Py as Python code (call site)
    participant VM as VM stdlib marshal
    participant Core as compiler-core marshal
    participant File as file-like object
    Py->>VM: marshal.dumps/dump(args)
    VM->>Core: serialize_value/write_object (allow_code, depth/ref args)
    Core-->>VM: bytes (marshal format, FLAG_REF entries)
    VM->>File: write(bytes) / seek after load
    File-->>VM: read bytes
    VM->>Core: deserialize_value_depth (uses FLAG_REF, MAX_MARSHAL_STACK_DEPTH)
    Core-->>VM: PyObjectRef
    VM-->>Py: return object / bytes

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh
  • coolreader18

Poem

"🐇 I hopped through bytes and tables wide,

Big-endian hops and ref-tables applied.
I muted counters, fixed op flags too,
Marshal threads stitched old and new —
A little rabbit cheers the compile tide!"

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 44.74%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Title check — ❓ Inconclusive: the title 'marshal' is vague and generic, using a single module name without conveying what changes are being made or why. Resolution: use a more descriptive title, such as 'Refactor marshal serialization/deserialization' or 'Update marshal encoding for CPython compatibility'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.


@youknowone youknowone force-pushed the marshal branch 4 times, most recently from 170d8fe to 1a0c0ce on March 20, 2026 at 14:38
@youknowone youknowone closed this Mar 21, 2026
@youknowone youknowone reopened this Mar 21, 2026
@youknowone youknowone marked this pull request as ready for review March 22, 2026 03:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 570-576: The current disable_specialization() only writes
UNREACHABLE_BACKOFF into adaptive_counters but CodeUnits::quicken() later
unconditionally reinitializes cacheable sites, undoing the disable; update the
implementation so that quicken() respects the non-specializable state: either
add a persistent flag on the code object (e.g., non_specializable) set by
disable_specialization() and check that flag in CodeUnits::quicken() before
initializing warmup counters at RESUME, or have quicken() inspect each adaptive
counter and skip reinitialization when the counter already equals
UNREACHABLE_BACKOFF; ensure you reference disable_specialization,
adaptive_counters, UNREACHABLE_BACKOFF, and CodeUnits::quicken() in your changes
so the disable persists across quicken().

In `@crates/compiler-core/src/bytecode/oparg.rs`:
- Around line 385-389: The current encoding mixes bit positions and CPython
power-of-two bitmasks, causing ambiguity in TryFrom<u32> and in
From<MakeFunctionFlag> for u32; change the conversions so TryFrom<u32> decodes
CPython-style masks (treat input as bitmask and match 1,2,4,8,16,32
specifically, returning Err for non-power-of-two or unknown values) and make
From<MakeFunctionFlag> for u32 emit the CPython bitmask (1 << position) rather
than the raw position; if you need to support position-based values keep or add
a distinct TryFrom<u8> (or a helper like from_position) to decode stored
positions, and adjust any uses of the bitflag! macro or internal storage to
consistently treat the enum as representing a position with conversions handling
mask<->position via (1 << position). Ensure unique symbols updated: TryFrom<u32>
impl for MakeFunctionFlag, From<MakeFunctionFlag> for u32, and any helper
from_position/from_mask functions.

In `@crates/compiler-core/src/marshal.rs`:
- Around line 236-254: The current code reconstructs localspluskinds using
split_localplus(...) and build_localspluskinds_from_split(...), which strips the
CO_FAST_HIDDEN bit; instead preserve hidden flags by merging them back from the
original localspluskinds: after computing lp via split_localplus and before
assigning localspluskinds for the CodeObject, iterate the original
localspluskinds and reapply CO_FAST_HIDDEN to the corresponding entries in the
rebuilt localspluskinds (matching by index/name), or simply reuse the original
localspluskinds where possible so serialize_code(...) retains CO_FAST_HIDDEN;
update the block around split_localplus, build_localspluskinds_from_split, and
the localspluskinds assignment to perform this merge.
- Around line 481-518: deserialize_value_depth currently reserves a None
placeholder for flagged containers, but only sets refs[idx] after
deserialize_value_typed returns, so recursive backrefs to a container being
built see None and fail; fix by reserving and inserting a live placeholder value
into refs before calling deserialize_value_typed so TYPE_REF can resolve to the
in-progress container. Concretely, modify deserialize_value_depth to create an
empty/container placeholder via the Bag API (or add a
Bag::make_placeholder/mutable container constructor) and store Some(placeholder)
at refs[idx] prior to calling deserialize_value_typed, then have
deserialize_value_typed fill/populate that placeholder in-place (or accept the
slot index to mutate refs[idx]) instead of replacing it after the fact; ensure
Type::try_from and the slot handling logic in deserialize_value_typed support
in-place population for lists/dicts/sets to allow recursive references to
resolve.
- Around line 279-345: The helpers read_marshal_bytes, read_marshal_str,
read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple are
stripping FLAG_REF and not reserving/referring into the global marshal ref
table, and read_marshal_const_tuple calls deserialize_value in a way that builds
a fresh refs vector for each element; this breaks v3+ streams where TYPE_REF
indices are global. Fix by making these readers handle FLAG_REF correctly: when
reading the element type byte check for FLAG_REF and if present reserve a slot
in the shared refs table before deserializing and use the reserved index for ref
entries instead of stripping the flag; propagate a mutable refs container into
read_marshal_* functions (and into deserialize_value) so tuples use the same
refs vector for all elements; remove any local per-element refs creation in
read_marshal_const_tuple and instead pass the shared refs through each call so
TYPE_REFs decode consistently across the whole code object. Ensure functions
read_marshal_str_vec, read_marshal_name_tuple and read_marshal_const_tuple
accept &mut refs (or a context that owns it) and that FLAG_REF handling is
centralized where type_byte is consumed.

In `@crates/compiler-core/src/varint.rs`:
- Around line 62-80: read_varint_with_start (and the other varint reader)
currently returns Some(val) even when the input ends while the continuation bit
is still set; update the loop exit logic to detect an unterminated varint and
return None in that case. Concretely: after the while loop in
read_varint_with_start (and analogously in the other reader), if cont is still
true (meaning we exited because *pos >= data.len()), return None instead of
Some(val); otherwise return Some(val). This ensures malformed/unterminated
varints are treated as failures.
- Around line 35-50: write_varint_be currently only emits up to five 6-bit
chunks and thus drops bits for values >= 1 << 30; change it to emit all 6-bit
groups until the remaining value fits in 6 bits (i.e., use a loop instead of
fixed conditionals) so the top bits are preserved for full u32 range.
Specifically, in write_varint_be replace the fixed if-chain with a loop that
repeatedly computes group = (val >> shift) & 0x3f or repeatedly shifts the
remaining value, pushes 0x40 | group for all non-final groups and pushes the
final (val & 0x3f) without the continuation bit, and then return the number of
bytes written; keep the function signature and return behavior the same.
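The loop-based writer described above can be sketched as follows. This is an illustrative reconstruction, not the crate's actual write_varint_be: it assumes 6-bit groups emitted most-significant first with a 0x40 continuation bit on every non-final byte (the 0x80 start marker is assumed to be added separately by write_varint_with_start).

```rust
/// Big-endian varint writer: emit every 6-bit group so values
/// >= 1 << 30 keep their top bits. Returns the number of bytes written.
fn write_varint_be(buf: &mut Vec<u8>, val: u32) -> usize {
    // Find the shift of the highest non-zero 6-bit group.
    let mut shift = 0;
    while (val >> shift) >= 0x40 {
        shift += 6;
    }
    let start = buf.len();
    // Non-final groups carry the 0x40 continuation bit.
    while shift > 0 {
        buf.push(0x40 | ((val >> shift) & 0x3f) as u8);
        shift -= 6;
    }
    // Final group without the continuation bit.
    buf.push((val & 0x3f) as u8);
    buf.len() - start
}

fn main() {
    let mut buf = Vec::new();
    let n = write_varint_be(&mut buf, u32::MAX);
    assert_eq!(n, 6); // 32 bits need six 6-bit groups
    // Decode by folding the 6-bit groups back together.
    let decoded = buf.iter().fold(0u32, |acc, &b| (acc << 6) | (b & 0x3f) as u32);
    assert_eq!(decoded, u32::MAX);
    assert!(buf[..n - 1].iter().all(|&b| b & 0x40 != 0));
    assert_eq!(buf[n - 1] & 0x40, 0);
    println!("ok");
}
```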

In `@crates/vm/src/stdlib/marshal.rs`:
- Around line 519-549: The load function currently calls read() with no size
which slurps the rest of the stream; change load to read only as much as the
deserializer needs by feeding a streaming buffer to marshal::deserialize_value
instead of pulling the entire file: replace the
ArgBytesLike::try_from_object(read_res) / buf approach with a small-read loop
(using the file object's read(size) or readinto-like API) that appends bytes to
a Vec<u8> and repeatedly calls deserialize_value on a &[u8] slice when more data
is required, only calling tell/seek after you know how many bytes were consumed;
keep the existing error mapping and check_no_code(&result, vm) logic and
reference the same symbols (load, marshal::deserialize_value, ArgBytesLike,
tell/seek, check_no_code) when making the change.
- Around line 420-473: The code is calling unwrap() on container insertions
which can panic on malformed input; update make_set (remove
set.add(...).unwrap()), make_frozenset (remove
PyFrozenSet::from_iter(...).unwrap()), and make_dict (remove
dict.set_item(...).unwrap()) to handle insertion errors cleanly instead of
panicking: detect failures from set.add, PyFrozenSet::from_iter, and
dict.set_item and return an appropriate Err(marshal::MarshalError) (or convert
the underlying Python error into a MarshalError) so malformed/unhashable
elements produce a controlled error from make_set/make_frozenset/make_dict
rather than causing a Rust panic.
- Around line 552-580: check_no_code currently skips PySlice, allowing code
objects in slice.start/stop/step to bypass allow_code=false; update
check_no_code to detect PySlice (crate::builtins::PySlice) and recursively call
check_no_code on its start, stop, and step components (similar to how
PyTuple/PyList/PyDict are handled) so that slice(start, stop, step) cannot
smuggle code objects through dumps/loads; ensure you use the slice's accessor
methods consistent with other container handling and keep error behavior
identical when a PyCode is found.
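The recursion the comment asks for is easiest to see on a toy object model. The enum and error type below are illustrative only; the real check_no_code walks PyObjectRef payloads (PyTuple/PyList/PyDict/PySlice) via the VM's accessors.

```rust
// Toy object model standing in for PyObjectRef; the Slice variant
// mirrors PySlice's optional start/step and mandatory stop.
enum Obj {
    Code,
    Int(i64),
    Tuple(Vec<Obj>),
    Slice { start: Option<Box<Obj>>, stop: Box<Obj>, step: Option<Box<Obj>> },
}

fn check_no_code(obj: &Obj) -> Result<(), &'static str> {
    match obj {
        Obj::Code => Err("code objects are not allowed when allow_code=false"),
        Obj::Int(_) => Ok(()),
        Obj::Tuple(items) => items.iter().try_for_each(check_no_code),
        // The fix: recurse into all three slice components, so
        // slice(code, ...) cannot smuggle a code object through.
        Obj::Slice { start, stop, step } => {
            if let Some(s) = start { check_no_code(s)?; }
            check_no_code(stop)?;
            if let Some(s) = step { check_no_code(s)?; }
            Ok(())
        }
    }
}

fn main() {
    let smuggle = Obj::Slice {
        start: Some(Box::new(Obj::Code)),
        stop: Box::new(Obj::Int(10)),
        step: None,
    };
    assert!(check_no_code(&smuggle).is_err());
    assert!(check_no_code(&Obj::Tuple(vec![Obj::Int(1), Obj::Int(2)])).is_ok());
    println!("ok");
}
```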

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 523278ff-f428-42e4-813f-fb5cc5ab398b

📥 Commits

Reviewing files that changed from the base of the PR and between dfcb07c and 14d6f73.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_marshal.py is excluded by !Lib/**
📒 Files selected for processing (7)
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/bytecode/oparg.rs
  • crates/compiler-core/src/marshal.rs
  • crates/compiler-core/src/varint.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/stdlib/marshal.rs
  • crates/vm/src/types/slot.rs

Comment on lines +570 to +576
    /// Disable adaptive specialization by setting all counters to unreachable.
    /// Used for CPython-compiled bytecode where specialization may not be safe.
    pub fn disable_specialization(&self) {
        for counter in self.adaptive_counters.iter() {
            counter.store(UNREACHABLE_BACKOFF, Ordering::Relaxed);
        }
    }

⚠️ Potential issue | 🟠 Major

disable_specialization() is undone by the next quicken().

This only writes UNREACHABLE_BACKOFF into adaptive_counters, but CodeUnits::quicken() later reinitializes every cacheable site unconditionally. A code object marked non-specializable before first execution will still get warmup counters at RESUME.


Comment on lines +385 to 389
/// `SET_FUNCTION_ATTRIBUTE` flags.
/// Bitmask: Defaults=0x01, KwOnly=0x02, Annotations=0x04,
/// Closure=0x08, TypeParams=0x10, Annotate=0x20.
/// Stored as bit position (0-5) by `bitflag!` macro.
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]

⚠️ Potential issue | 🔴 Critical

Split bit-position decoding from CPython bitmask decoding.

SET_FUNCTION_ATTRIBUTE has two incompatible encodings here, but TryFrom<u32> cannot disambiguate 1, 2, or 4: the direct u8 path maps them to bit positions before the power-of-two branch runs. From<MakeFunctionFlag> for u32 still emits positions as well, so marshaled code can round-trip the wrong flag in both directions.

Also applies to: 410-427
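One way to implement the mask-only decoding suggested here is sketched below. The enum is a standalone copy for illustration; the real MakeFunctionFlag lives in crates/compiler-core/src/bytecode/oparg.rs and may differ in detail. Decoding accepts only the six exact CPython masks, and encoding emits (1 << position) rather than the raw position.

```rust
// Illustrative copy of the SET_FUNCTION_ATTRIBUTE flag enum, with
// discriminants used as bit positions 0-5.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum MakeFunctionFlag { Defaults, KwOnly, Annotations, Closure, TypeParams, Annotate }

impl TryFrom<u32> for MakeFunctionFlag {
    type Error = ();
    fn try_from(mask: u32) -> Result<Self, ()> {
        // Accept only the exact CPython bitmasks; anything else
        // (including non-power-of-two values) is an error rather than
        // being silently re-interpreted as a bit position.
        Ok(match mask {
            0x01 => Self::Defaults,
            0x02 => Self::KwOnly,
            0x04 => Self::Annotations,
            0x08 => Self::Closure,
            0x10 => Self::TypeParams,
            0x20 => Self::Annotate,
            _ => return Err(()),
        })
    }
}

impl From<MakeFunctionFlag> for u32 {
    fn from(f: MakeFunctionFlag) -> u32 {
        // Emit the CPython bitmask (1 << position), not the position.
        1 << (f as u32)
    }
}

fn main() {
    assert_eq!(MakeFunctionFlag::try_from(0x08), Ok(MakeFunctionFlag::Closure));
    assert!(MakeFunctionFlag::try_from(3).is_err());
    assert_eq!(u32::from(MakeFunctionFlag::Annotate), 0x20);
    println!("ok");
}
```

With this split, the mask-to-flag and flag-to-mask paths round-trip consistently: try_from(u32::from(f)) always returns f.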


Comment on lines +279 to +345
/// Read a marshal bytes object (TYPE_STRING = b's').
fn read_marshal_bytes<R: Read>(rdr: &mut R) -> Result<Vec<u8>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    if type_byte != Type::Bytes as u8 {
        return Err(MarshalError::BadType);
    }
    let len = rdr.read_u32()?;
    Ok(rdr.read_slice(len)?.to_vec())
}

/// Read a marshal string object.
fn read_marshal_str<R: Read>(rdr: &mut R) -> Result<alloc::string::String> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let s = match type_byte {
        b'u' | b't' | b'a' | b'A' => {
            let len = rdr.read_u32()?;
            rdr.read_str(len)?
        }
        b'z' | b'Z' => {
            let len = rdr.read_u8()? as u32;
            rdr.read_str(len)?
        }
        _ => return Err(MarshalError::BadType),
    };
    Ok(alloc::string::String::from(s))
}

/// Read a marshal tuple of strings, returning owned Strings.
fn read_marshal_str_vec<R: Read>(rdr: &mut R) -> Result<Vec<alloc::string::String>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let n = match type_byte {
        b'(' => rdr.read_u32()? as usize,
        b')' => rdr.read_u8()? as usize,
        _ => return Err(MarshalError::BadType),
    };
    (0..n).map(|_| read_marshal_str(rdr)).collect()
}

fn read_marshal_name_tuple<R: Read, Bag: ConstantBag>(
    rdr: &mut R,
    bag: &Bag,
) -> Result<Box<[<Bag::Constant as Constant>::Name]>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let n = match type_byte {
        b'(' => rdr.read_u32()? as usize,
        b')' => rdr.read_u8()? as usize,
        _ => return Err(MarshalError::BadType),
    };
    (0..n)
        .map(|_| Ok(bag.make_name(&read_marshal_str(rdr)?)))
        .collect::<Result<Vec<_>>>()
        .map(Vec::into_boxed_slice)
}

/// Read a marshal tuple of constants.
fn read_marshal_const_tuple<R: Read, Bag: ConstantBag>(
    rdr: &mut R,
    bag: Bag,
) -> Result<Constants<Bag::Constant>> {
    let type_byte = rdr.read_u8()? & !FLAG_REF;
    let n = match type_byte {
        b'(' => rdr.read_u32()? as usize,
        b')' => rdr.read_u8()? as usize,
        _ => return Err(MarshalError::BadType),
    };
    (0..n).map(|_| deserialize_value(rdr, bag)).collect()
}

⚠️ Potential issue | 🔴 Critical

Code-object field readers are bypassing the marshal ref table.

These helpers strip FLAG_REF without reserving a slot, and read_marshal_const_tuple() deserializes each constant with a fresh refs vector. In v3+ streams, ref indices are global across the whole code object, so valid TYPE_REFs inside co_consts / co_names will misdecode or shift every later reference.


Comment on lines +481 to +518
fn deserialize_value_depth<R: Read, Bag: MarshalBag>(
    rdr: &mut R,
    bag: Bag,
    depth: usize,
    refs: &mut Vec<Option<Bag::Value>>,
) -> Result<Bag::Value> {
    if depth == 0 {
        return Err(MarshalError::InvalidBytecode);
    }
    let raw = rdr.read_u8()?;
    let flag = raw & FLAG_REF != 0;
    let type_code = raw & !FLAG_REF;

    // TYPE_REF: return previously stored object
    if type_code == Type::Ref as u8 {
        let idx = rdr.read_u32()? as usize;
        return refs
            .get(idx)
            .and_then(|v| v.clone())
            .ok_or(MarshalError::InvalidBytecode);
    }

    // Reserve ref slot before reading (matches write order)
    let slot = if flag {
        let idx = refs.len();
        refs.push(None);
        Some(idx)
    } else {
        None
    };

    let typ = Type::try_from(type_code)?;
    let value = deserialize_value_typed(rdr, bag, depth, refs, typ)?;

    if let Some(idx) = slot {
        refs[idx] = Some(value.clone());
    }
    Ok(value)
}

⚠️ Potential issue | 🔴 Critical

Reserved refs cannot resolve recursive containers yet.

A flagged container gets a None placeholder here, but the slot is only populated after the whole value is built. If a TYPE_REF points back to that container while its contents are still being read, the lookup hits None and fails with InvalidBytecode. The new stdlib writer already emits those backrefs for recursive lists/dicts/sets, so dumps() and loads() will not round-trip them.
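The fix amounts to parking a live, mutable placeholder in the ref table before recursing into the container's contents. A toy sketch of the pattern (Value, read_list, and the self_ref_at parameter are illustrative stand-ins, not the crate's types):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Toy value type standing in for Bag::Value. The shared-mutable list
// lets a back-reference resolve while the list is still being filled.
#[derive(Clone)]
enum Value {
    Int(i32),
    List(Rc<RefCell<Vec<Value>>>),
}

// Reads a "list" whose element at self_ref_at is a TYPE_REF back to
// the list itself (a recursive container, as marshal can encode).
fn read_list(elems: &[i32], self_ref_at: Option<usize>, refs: &mut Vec<Value>) -> Value {
    let list = Rc::new(RefCell::new(Vec::new()));
    let idx = refs.len();
    refs.push(Value::List(list.clone())); // live placeholder, inserted FIRST
    for (i, &e) in elems.iter().enumerate() {
        if self_ref_at == Some(i) {
            let back = refs[idx].clone(); // resolves mid-construction
            list.borrow_mut().push(back);
        } else {
            list.borrow_mut().push(Value::Int(e));
        }
    }
    Value::List(list)
}

fn main() {
    let mut refs = Vec::new();
    let v = read_list(&[1, 0, 3], Some(1), &mut refs);
    if let Value::List(l) = &v {
        assert_eq!(l.borrow().len(), 3);
        // element 1 is the list itself, not an error
        match &l.borrow()[1] {
            Value::List(inner) => assert!(Rc::ptr_eq(inner, l)),
            _ => panic!("expected back-reference"),
        }
    }
    println!("ok");
}
```

With a None placeholder instead, the clone at the back-reference would fail exactly as the comment describes; the live placeholder makes dumps()/loads() round-trip recursive containers.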


Comment on lines +62 to 80
/// Read a big-endian varint with start marker (0x80).
pub fn read_varint_with_start(data: &[u8], pos: &mut usize) -> Option<u32> {
    if *pos >= data.len() {
        return None;
    }
    let first = data[*pos];
    if first & 0x80 == 0 {
        return None; // Not a start byte
    }
    *pos += 1;

    let mut val = (first & 0x3f) as u32;
    let mut cont = first & 0x40 != 0;
    while cont && *pos < data.len() {
        let b = data[*pos];
        *pos += 1;
        val = (val << 6) | (b & 0x3f) as u32;
        cont = b & 0x40 != 0;
    }
    Some(val)
}

⚠️ Potential issue | 🟠 Major

Don't accept unterminated varints as complete values.

Both readers return Some(val) even if the buffer ends while the continuation bit is still set on the last byte. That silently turns malformed exception tables into shorter numbers instead of signaling failure.

Also applies to: 83-99
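A minimal sketch of a reader with the suggested behavior. This is an illustrative reconstruction, not the crate's code: it assumes 6-bit BE groups, a 0x40 continuation bit, and a 0x80 start marker, and it returns None when the buffer ends while the continuation bit is still set.

```rust
/// BE varint reader that rejects unterminated input instead of
/// silently returning a truncated value.
fn read_varint_with_start(data: &[u8], pos: &mut usize) -> Option<u32> {
    let first = *data.get(*pos)?;
    if first & 0x80 == 0 {
        return None; // not a start byte
    }
    *pos += 1;
    let mut val = (first & 0x3f) as u32;
    let mut cont = first & 0x40 != 0;
    while cont {
        // EOF while cont is set -> None via `?`, not Some(truncated)
        let b = *data.get(*pos)?;
        *pos += 1;
        val = (val << 6) | (b & 0x3f) as u32;
        cont = b & 0x40 != 0;
    }
    Some(val)
}

fn main() {
    // 0xC1 = start + continuation + group 1; 0x02 = final group 2
    let mut pos = 0;
    assert_eq!(read_varint_with_start(&[0xC1, 0x02], &mut pos), Some(66));
    // Same start byte with the stream cut off: must be rejected.
    let mut pos = 0;
    assert_eq!(read_varint_with_start(&[0xC1], &mut pos), None);
    println!("ok");
}
```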


Comment on lines 420 to +473
fn make_list(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    Ok(self.0.ctx.new_list(it.collect()).into())
}

fn make_set(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    let set = PySet::default().into_ref(&self.0.ctx);
    for elem in it {
        set.add(elem, self.0).unwrap()
    }
    Ok(set.into())
}

fn make_frozenset(
    &self,
    it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
    Ok(PyFrozenSet::from_iter(self.0, it)
        .unwrap()
        .to_pyobject(self.0))
}

fn make_dict(
    &self,
    it: impl Iterator<Item = (Self::Value, Self::Value)>,
) -> Result<Self::Value, marshal::MarshalError> {
    let dict = self.0.ctx.new_dict();
    for (k, v) in it {
        dict.set_item(&*k, v, self.0).unwrap()
    }
    Ok(dict.into())
}

fn make_slice(
    &self,
    start: Self::Value,
    stop: Self::Value,
    step: Self::Value,
) -> Result<Self::Value, marshal::MarshalError> {
    use crate::builtins::PySlice;
    let vm = self.0;
    Ok(PySlice {
        start: if vm.is_none(&start) {
            None
        } else {
            Some(start)
        },
        stop,
        step: if vm.is_none(&step) { None } else { Some(step) },
    }
    .into_ref(&vm.ctx)
    .into())
}

⚠️ Potential issue | 🔴 Critical

Don't unwrap() container insertions during loads().

A malformed stream can put an unhashable object inside a set/frozenset or use it as a dict key. These unwrap() calls turn that user-controlled input into a Rust panic instead of a Python exception.

Suggested fix
         fn make_set(
             &self,
             it: impl Iterator<Item = Self::Value>,
         ) -> Result<Self::Value, marshal::MarshalError> {
             let set = PySet::default().into_ref(&self.0.ctx);
             for elem in it {
-                set.add(elem, self.0).unwrap()
+                set.add(elem, self.0)
+                    .map_err(|_| marshal::MarshalError::BadType)?;
             }
             Ok(set.into())
         }
@@
         fn make_frozenset(
             &self,
             it: impl Iterator<Item = Self::Value>,
         ) -> Result<Self::Value, marshal::MarshalError> {
-            Ok(PyFrozenSet::from_iter(self.0, it)
-                .unwrap()
-                .to_pyobject(self.0))
+            Ok(PyFrozenSet::from_iter(self.0, it)
+                .map_err(|_| marshal::MarshalError::BadType)?
+                .to_pyobject(self.0))
         }
@@
         fn make_dict(
             &self,
             it: impl Iterator<Item = (Self::Value, Self::Value)>,
         ) -> Result<Self::Value, marshal::MarshalError> {
             let dict = self.0.ctx.new_dict();
             for (k, v) in it {
-                dict.set_item(&*k, v, self.0).unwrap()
+                dict.set_item(&*k, v, self.0)
+                    .map_err(|_| marshal::MarshalError::BadType)?;
             }
             Ok(dict.into())
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn make_list(
&self,
it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
Ok(self.0.ctx.new_list(it.collect()).into())
}
fn make_set(
&self,
it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
let set = PySet::default().into_ref(&self.0.ctx);
for elem in it {
set.add(elem, self.0)
.map_err(|_| marshal::MarshalError::BadType)?;
}
Ok(set.into())
}
fn make_frozenset(
&self,
it: impl Iterator<Item = Self::Value>,
) -> Result<Self::Value, marshal::MarshalError> {
Ok(PyFrozenSet::from_iter(self.0, it)
.map_err(|_| marshal::MarshalError::BadType)?
.to_pyobject(self.0))
}
fn make_dict(
&self,
it: impl Iterator<Item = (Self::Value, Self::Value)>,
) -> Result<Self::Value, marshal::MarshalError> {
let dict = self.0.ctx.new_dict();
for (k, v) in it {
dict.set_item(&*k, v, self.0)
.map_err(|_| marshal::MarshalError::BadType)?;
}
Ok(dict.into())
}
fn make_slice(
&self,
start: Self::Value,
stop: Self::Value,
step: Self::Value,
) -> Result<Self::Value, marshal::MarshalError> {
use crate::builtins::PySlice;
let vm = self.0;
Ok(PySlice {
start: if vm.is_none(&start) {
None
} else {
Some(start)
},
stop,
step: if vm.is_none(&step) { None } else { Some(step) },
}
.into_ref(&vm.ctx)
.into())
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/marshal.rs` around lines 420 - 473, The code is calling
unwrap() on container insertions which can panic on malformed input; update
make_set (remove set.add(...).unwrap()), make_frozenset (remove
PyFrozenSet::from_iter(...).unwrap()), and make_dict (remove
dict.set_item(...).unwrap()) to handle insertion errors cleanly instead of
panicking: detect failures from set.add, PyFrozenSet::from_iter, and
dict.set_item and return an appropriate Err(marshal::MarshalError) (or convert
the underlying Python error into a MarshalError) so malformed/unhashable
elements produce a controlled error from make_set/make_frozenset/make_dict
rather than causing a Rust panic.

Comment on lines +519 to +549
fn load(args: LoadArgs, vm: &VirtualMachine) -> PyResult<PyObjectRef> {
// Read from file object into a buffer, one object at a time.
// We read all available data, deserialize one object, then seek
// back to just after the consumed bytes.
let tell_before = vm
.call_method(&args.f, "tell", ())?
.try_into_value::<i64>(vm)?;
let read_res = vm.call_method(&args.f, "read", ())?;
let bytes = ArgBytesLike::try_from_object(vm, read_res)?;
let buf = bytes.borrow_buf();

let mut rdr: &[u8] = &buf;
let len_before = rdr.len();
let result =
marshal::deserialize_value(&mut rdr, PyMarshalBag(vm)).map_err(|e| match e {
marshal::MarshalError::Eof => vm.new_exception_msg(
vm.ctx.exceptions.eof_error.to_owned(),
"marshal data too short".into(),
),
_ => vm.new_value_error("bad marshal data"),
})?;
let consumed = len_before - rdr.len();

// Seek file to just after the consumed bytes
let new_pos = tell_before + consumed as i64;
vm.call_method(&args.f, "seek", (new_pos,))?;

if !args.allow_code {
check_no_code(&result, vm)?;
}
Ok(result)
⚠️ Potential issue | 🟠 Major

load() should not read to EOF just to parse one object.

Calling read() with no size slurps the entire remainder of the stream, can block indefinitely on sources that never reach EOF, and forces tell() / seek() support on file-likes that would otherwise only need to be readable. marshal.load() only needs to stream one value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/marshal.rs` around lines 519 - 549, The load function
currently calls read() with no size which slurps the rest of the stream; change
load to read only as much as the deserializer needs by feeding a streaming
buffer to marshal::deserialize_value instead of pulling the entire file: replace
the ArgBytesLike::try_from_object(read_res) / buf approach with a small-read
loop (using the file object's read(size) or readinto-like API) that appends
bytes to a Vec<u8> and repeatedly calls deserialize_value on a &[u8] slice when
more data is required, only calling tell/seek after you know how many bytes were
consumed; keep the existing error mapping and check_no_code(&result, vm) logic
and reference the same symbols (load, marshal::deserialize_value, ArgBytesLike,
tell/seek, check_no_code) when making the change.
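For reference, CPython's marshal.load() consumes exactly one value and leaves the stream positioned just past it, which is the behavior a streaming reader preserves:

```python
import io
import marshal

buf = io.BytesIO()
marshal.dump(1, buf)        # first object
marshal.dump("two", buf)    # second object, back to back
buf.seek(0)

# load() must consume only the first value...
assert marshal.load(buf) == 1
# ...leaving the stream positioned at the second one.
assert marshal.load(buf) == "two"
```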

Comment on lines +552 to +580
/// Recursively check that no code objects are present.
fn check_no_code(obj: &PyObjectRef, vm: &VirtualMachine) -> PyResult<()> {
    if obj.downcast_ref::<PyCode>().is_some() {
        return Err(vm.new_value_error("unmarshalling code objects is disallowed".to_string()));
    }
    if let Some(tup) = obj.downcast_ref::<PyTuple>() {
        for elem in tup.as_slice() {
            check_no_code(elem, vm)?;
        }
    } else if let Some(list) = obj.downcast_ref::<PyList>() {
        for elem in list.borrow_vec().iter() {
            check_no_code(elem, vm)?;
        }
    } else if let Some(set) = obj.downcast_ref::<PySet>() {
        for elem in set.elements() {
            check_no_code(&elem, vm)?;
        }
    } else if let Some(fset) = obj.downcast_ref::<PyFrozenSet>() {
        for elem in fset.elements() {
            check_no_code(&elem, vm)?;
        }
    } else if let Some(dict) = obj.downcast_ref::<PyDict>() {
        for (k, v) in dict.into_iter() {
            check_no_code(&k, vm)?;
            check_no_code(&v, vm)?;
        }
    }
    Ok(())
⚠️ Potential issue | 🟠 Major

allow_code = false is bypassed through slice members.

check_no_code() now walks the common containers, but it never inspects crate::builtins::PySlice even though write_object_depth() serializes start, stop, and step recursively. slice(code_obj, None, None) will still smuggle a code object through both dumps() and loads().

Suggested fix
         } else if let Some(dict) = obj.downcast_ref::<PyDict>() {
             for (k, v) in dict.into_iter() {
                 check_no_code(&k, vm)?;
                 check_no_code(&v, vm)?;
             }
+        } else if let Some(slice) = obj.downcast_ref::<crate::builtins::PySlice>() {
+            if let Some(start) = slice.start.as_ref() {
+                check_no_code(start, vm)?;
+            }
+            check_no_code(&slice.stop, vm)?;
+            if let Some(step) = slice.step.as_ref() {
+                check_no_code(step, vm)?;
+            }
         }
         Ok(())
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/marshal.rs` around lines 552 - 580, check_no_code
currently skips PySlice, allowing code objects in slice.start/stop/step to
bypass allow_code=false; update check_no_code to detect PySlice
(crate::builtins::PySlice) and recursively call check_no_code on its start,
stop, and step components (similar to how PyTuple/PyList/PyDict are handled) so
that slice(start, stop, step) cannot smuggle code objects through dumps/loads;
ensure you use the slice's accessor methods consistent with other container
handling and keep error behavior identical when a PyCode is found.
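For comparison, CPython's allow_code check happens during unmarshalling itself, so code objects are rejected no matter how deeply they are nested in containers. A quick check against the reference behavior (the allow_code keyword exists only on CPython 3.13+, hence the version guard):

```python
import marshal
import sys
import types

code = compile("x = 1", "<demo>", "exec")
payload = marshal.dumps(code)

# Code objects round-trip by default.
assert isinstance(marshal.loads(payload), types.CodeType)

# With allow_code=False (CPython 3.13+), unmarshalling them is rejected.
if sys.version_info >= (3, 13):
    try:
        marshal.loads(payload, allow_code=False)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for code object")
```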

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

♻️ Duplicate comments (3)
crates/compiler-core/src/marshal.rs (3)

921-934: ⚠️ Potential issue | 🟠 Major

Rebuilding co_localspluskinds here still drops hidden flags.

This reconstruction only emits LOCAL/CELL/FREE, so metadata already preserved in code.localspluskinds—for example CO_FAST_HIDDEN—is discarded on the next marshal write. Since Line 252 now keeps the original bytes, serialize those bytes back out here instead of regenerating the mask.

💡 Minimal fix
-    let total_lp = code.varnames.len() + cell_only_names.len() + code.freevars.len();
-    let mut kinds = Vec::with_capacity(total_lp);
-    for v in code.varnames.iter() {
-        let mut k: u8 = CO_FAST_LOCAL;
-        if local_cell_set.contains(v.as_ref()) {
-            k |= CO_FAST_CELL;
-        }
-        kinds.push(k);
-    }
-    kinds.extend(core::iter::repeat_n(CO_FAST_CELL, cell_only_names.len()));
-    kinds.extend(core::iter::repeat_n(CO_FAST_FREE, code.freevars.len()));
     buf.write_u8(Type::Bytes as u8);
-    write_vec(buf, &kinds);
+    debug_assert_eq!(total_lp_count, code.localspluskinds.len());
+    write_vec(buf, &code.localspluskinds);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/marshal.rs` around lines 921 - 934, The current
reconstruction of locals+kind masks (loop over code.varnames, extend repeats for
cell_only_names and code.freevars) discards hidden/other flags present in the
original code.localspluskinds; instead of regenerating the mask, emit the
preserved byte sequence saved earlier (code.localspluskinds) back into the
buffer — replace the creation/population of kinds and the write_vec(&kinds) with
writing the original code.localspluskinds bytes via the same
Type::Bytes/write_vec path so hidden flags (e.g. CO_FAST_HIDDEN) are retained.

502-516: ⚠️ Potential issue | 🔴 Critical

Reserved slots still cannot satisfy backrefs to in-progress containers.

refs.push(None) reserves the index, but the slot is only populated after deserialize_value_typed() returns. If a list/dict/set/frozenset contains a TYPE_REF back to itself while it is being built, the lookup still sees None and fails with InvalidBytecode, so recursive containers will not round-trip. This needs a live placeholder inserted before descending and then filled in place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/marshal.rs` around lines 502 - 516, The
reserved-slot approach pushes None then fills it after deserialize_value_typed,
which still prevents TYPE_REF lookups during recursive descent; change this to
insert a live placeholder into refs (when flag is true) before calling
deserialize_value_typed so backrefs can resolve to the in-progress container,
then after deserialization replace that placeholder in-place with the final
value (avoid using a cloned value for the slot so backrefs point to the same
instance). Update the logic around refs, slot, and the call to
deserialize_value_typed to create/fill a placeholder per container and ensure
the final assignment mutates refs[idx] to the completed value rather than
leaving a None or a clone.
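CPython's marshal (version 3+) resolves exactly this case because the in-progress container is registered in the ref table before its elements are read, so a backref to it lands on the live object:

```python
import marshal

lst = []
lst.append(lst)            # self-referential list

data = marshal.dumps(lst)  # default version >= 3 uses FLAG_REF/TYPE_REF
out = marshal.loads(data)

# The backref resolves to the same (still-in-progress) list object.
assert out[0] is out
```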

278-344: ⚠️ Potential issue | 🔴 Critical

Still decoding code-object fields outside the shared marshal ref table.

These helpers all clear FLAG_REF locally, so flagged field values never reserve a global ref slot, and read_marshal_const_tuple() recreates refs for each element via deserialize_value(rdr, bag). In a valid v3+ stream, TYPE_REFs inside co_consts, co_names, or later code-object fields will then read the wrong slot and desynchronize every following reference. Thread the caller’s shared refs state through deserialize_code() and these helpers instead of consuming the type byte here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/marshal.rs` around lines 278 - 344, The helpers
read_marshal_bytes, read_marshal_str, read_marshal_str_vec,
read_marshal_name_tuple and read_marshal_const_tuple are clearing FLAG_REF and
consuming the type byte locally, so flagged objects never reserve entries in the
shared refs table and subsequent TYPE_REFs get desynchronized; thread the
caller's shared refs state through deserialize_code and these helper functions
(add a &mut refs parameter or equivalent to their signatures), stop stripping
FLAG_REF or consuming the type byte inside these helpers (let
deserialize_code/read-dispatch handle the initial type/FLAG_REF logic and
reserve ref slots), and make deserialize_value use the shared &mut refs (and not
recreate refs or take Bag by value) so all co_consts/co_names/code-object fields
use the same global refs table.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 606-637: In the Type::Dict branch the code peels raw and calls
deserialize_value_typed(rdr, bag, d, refs, key_type) which bypasses the depth
guard and can underflow when depth == 1; change the key-path to use the same
guarded entry used for values by calling deserialize_value_depth(rdr, bag, d,
refs) (or, if you must keep typed deserialization, explicitly check if d == 0
and reject composite key types like tuples/frozensets) so that depth is not
decremented twice; update the logic around reading FLAG_REF and refs (the
key_slot handling and refs[idx] assignment) to work with the
deserialize_value_depth flow and preserve the existing ref semantics.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 16fbaa0f-e75b-46d4-8cec-fce352c0476f

📥 Commits

Reviewing files that changed from the base of the PR and between 14d6f73 and 11d2eaa.

📒 Files selected for processing (2)
  • crates/compiler-core/src/marshal.rs
  • crates/compiler-core/src/varint.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/compiler-core/src/varint.rs

Unify marshal to a single CPython-compatible format. No separate
"cpython_marshal" reader — one format for frozen modules, .pyc
files, and the Python-level marshal module.

- ComparisonOperator: `(cmp_index << 5) | mask` matching COMPARE_OP
- MakeFunctionFlag: bit-position matching SET_FUNCTION_ATTRIBUTE
- Exception table varint: big-endian (matching Python/assemble.c)
- Linetable varint: little-endian (unchanged)
- Integer: TYPE_INT (i32) / TYPE_LONG (base-2^15 digits)
- Code objects: CPython field order (argcount, posonlyargcount, ...,
  co_localsplusnames, co_localspluskinds, ..., co_exceptiontable)

- FLAG_REF / TYPE_REF for object deduplication (version >= 3)
- allow_code keyword argument on dumps/loads/dump/load
- Subclass rejection (int/float/complex/tuple/list/dict/set/frozenset)
- Slice serialization (version >= 5)
- Buffer protocol fallback for memoryview/array
- Recursion depth limit (2000) for both reads and writes
- Streaming load (reads one object, seeks file position)
- TYPE_INT64, TYPE_FLOAT (text), TYPE_COMPLEX (text) for compat
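The TYPE_LONG layout stores the magnitude as 15-bit digits, least significant first, with the sign of the digit count carrying the integer's sign. A minimal sketch of the digit split (the helper name is illustrative, not from the patch):

```python
def long_digits(n: int) -> tuple[int, list[int]]:
    """Split n into (signed digit count, base-2^15 digits, LSB first)."""
    m = abs(n)
    digits = []
    while m:
        digits.append(m & 0x7FFF)  # low 15 bits
        m >>= 15
    count = len(digits) if n >= 0 else -len(digits)
    return count, digits

assert long_digits(1 << 16) == (2, [0, 2])
assert long_digits(-(1 << 15)) == (-2, [0, 1])
```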

serialize_code writes co_localsplusnames/co_localspluskinds from
split varnames/cellvars/freevars. deserialize_code splits them back.
Cell variable DEREF indices are translated between flat (wire) and
cell-relative (internal) representations in both directions.

Replace bitwise trick with match for new ComparisonOperator values.

21 -> 3 expected failures. Remaining: test_bad_reader (IO layer),
test_deterministic_sets (PYTHONHASHSEED), testIntern (string interning).
- Use original localspluskinds from marshal data instead of
  rebuilding, preserving CO_FAST_HIDDEN and other flags
- Fix write_varint_be to handle values >= 2^30 (add 6th chunk)
- Remove unused build_localspluskinds_from_split
Prevents usize underflow when dict key deserialization path calls
deserialize_value_typed with depth=0 on composite types.
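For comparison, CPython caps marshal recursion (MAX_MARSHAL_STACK_DEPTH, 2000) and raises ValueError instead of overflowing the C stack; the limit is easy to hit deliberately:

```python
import marshal

# Build a list nested well past the 2000-deep marshal limit.
root = []
cur = root
for _ in range(3000):
    nxt = []
    cur.append(nxt)
    cur = nxt

raised = False
try:
    marshal.dumps(root)
except ValueError:
    raised = True
assert raised
```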
@coderabbitai coderabbitai bot left a comment
🧹 Nitpick comments (2)
crates/compiler-core/src/varint.rs (1)

121-129: Consider adding edge-case test values.

The roundtrip test covers common values but doesn't test the upper boundary. Adding u32::MAX and values near 1 << 30 would validate the full 32-bit range handling.

💡 Suggested addition
-        for &val in &[0u32, 1, 63, 64, 127, 128, 4095, 4096, 1_000_000] {
+        for &val in &[0u32, 1, 63, 64, 127, 128, 4095, 4096, 1_000_000, 1 << 30, u32::MAX] {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/varint.rs` around lines 121 - 129, Update the
test_be_varint_roundtrip test to include edge-case u32 values such as u32::MAX
and values around 1 << 30 (for example (1 << 30) - 1 and 1 << 30) so the
big-endian varint encoder/decoder (write_varint_be and read_varint) is validated
across the full 32-bit boundary; add those values to the list iterated in the
test and keep the same round-trip assertions (assert_eq!(read_varint(&buf, &mut
pos), Some(val)) and assert_eq!(pos, buf.len())).
crates/compiler-core/src/marshal.rs (1)

609-644: Dict deserialization logic is correct but duplicates ref handling.

The key-path (lines 619-638) manually handles TYPE_REF and FLAG_REF instead of using deserialize_value_depth. While functionally correct, this duplicates the ref-handling logic. The depth guard is properly maintained since both keys and values use d = depth - 1.

💡 Consider using deserialize_value_depth for keys

This would reduce code duplication:

loop {
    // Peek type byte for null check
    let raw = rdr.read_u8()?;
    if raw & !FLAG_REF == b'0' {
        break;
    }
    // Rewind or use a different approach to let deserialize_value_depth handle the byte
    // Alternatively, accept the current implementation as-is since it works
}

Since rewinding isn't straightforward with the current Read trait, the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/marshal.rs` around lines 609 - 644, The Dict
deserialization duplicates reference-handling logic for keys; refactor the
key-path to let deserialize_value_depth handle both normal and TYPE_REF cases by
peeking the next type byte instead of consuming it: replace the read_u8() call
with a non-consuming peek using BufRead::fill_buf (or wrap rdr in a BufReader)
to check for the terminator (raw & !FLAG_REF == b'0'), and then call
deserialize_value_depth(rdr, bag, d, refs, key_type) (or the existing
deserialize_value_depth overload) to read the key so ref bookkeeping is
centralized; update the function signature or wrap the reader as needed so rdr
supports BufRead and remove the manual TYPE_REF / FLAG_REF branch in the key
handling block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0cc38ce0-d090-4c65-9b73-110903f04d03

📥 Commits

Reviewing files that changed from the base of the PR and between 11d2eaa and 76856b4.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_marshal.py is excluded by !Lib/**
📒 Files selected for processing (7)
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/bytecode/oparg.rs
  • crates/compiler-core/src/marshal.rs
  • crates/compiler-core/src/varint.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/stdlib/marshal.rs
  • crates/vm/src/types/slot.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/vm/src/stdlib/marshal.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/code.rs
