Skip to content

fix: evaluate expressions in WHERE, UPDATE, and UPSERT; wire ALTER COLLECTION column ops#26

Merged
farhan-syah merged 6 commits intomainfrom
fix/sql-expression-evaluation
Apr 14, 2026
Merged

fix: evaluate expressions in WHERE, UPDATE, and UPSERT; wire ALTER COLLECTION column ops#26
farhan-syah merged 6 commits intomainfrom
fix/sql-expression-evaluation

Conversation

@farhan-syah
Copy link
Copy Markdown
Contributor

Summary

Fixes three reported silent-correctness bugs (#22, #23, #24) and expands coverage to the full class of failures they represent. The core change is wiring the existing nodedb_query::expr::SqlExpr evaluator into every place the planner previously gave up on non-trivial expressions.

What was broken

What changed

Scan filter path (nodedb-query/src/scan_filter, nodedb/src/control/planner/sql_plan_convert/filter.rs)

  • New FilterOp::Expr variant carrying a SqlExpr payload on ScanFilter.
  • Planner now produces expr_filter(root) for any WHERE expression it cannot reduce to (field, op, value) instead of falling through to match-all. Every fall-through in sql_expr_to_scan_filters now routes through the evaluator.
  • matches_value, matches_binary, and matches_binary_indexed all evaluate FilterOp::Expr against the row via SqlExpr::eval.
  • Planner desugars parser-only variants (NOT, BETWEEN, InList, Like, IsNull) into the evaluator's equivalent AST nodes during sql_expr_to_bridge_expr.
  • Bulk update/delete now treat empty filter_bytes as "match every row" instead of a zerompk decode error.

UPDATE assignment path (bridge::physical_plan::UpdateValue, handlers/point/update.rs, handlers/bulk_dml.rs)

  • New UpdateValue::{Literal(Vec<u8>), Expr(SqlExpr)} enum carrying both pre-encoded literals and unevaluated expressions across the SPSC bridge.
  • PointUpdate.updates and BulkUpdate.updates take Vec<(String, UpdateValue)>.
  • Planner (assignments_to_update_values) converts non-literal RHS via sql_expr_to_bridge_expr; KV FieldSet now errors loudly on non-literal RHS instead of silently dropping fields.
  • UPDATE handler snapshots the current row, evaluates each Expr assignment against that snapshot (PostgreSQL-style single-pass semantics), and emits the result into the doc.
  • Fast binary-merge path still taken when every RHS is literal.
  • check_generated_readonly / needs_recomputation helpers made generic over the payload type so literal and expression updates share one implementation.

UPSERT ON CONFLICT DO UPDATE (plan_insert, SqlPlan::Upsert, DocumentOp::Upsert, handlers/upsert.rs)

  • plan_insert detects ON CONFLICT ... DO UPDATE SET in the AST and reroutes to a new plan_upsert_with_on_conflict path that carries the assignments.
  • UpsertParams, SqlPlan::Upsert, and DocumentOp::Upsert all gain on_conflict_updates: Vec<(String, UpdateValue)>.
  • execute_upsert applies those assignments against the existing row (evaluating expressions against a pre-update snapshot) instead of merging inserted fields.

ALTER COLLECTION column ops (pgwire/ddl/collection/alter/)

  • New router branches for ALTER COLLECTION ADD COLUMN, DROP COLUMN, RENAME COLUMN, and ALTER COLUMN ... TYPE ....
  • ADD COLUMN reuses the existing strict-schema handler. DROP/RENAME/ALTER TYPE are new handlers that mutate StrictSchema, bump version, propose via Raft, and refresh the Data Plane cache. Type changes are restricted to same-discriminant aliases (e.g. INT ↔ BIGINT); cross-type changes error loudly with an explanation rather than silently corrupting existing rows.

Structural hygiene on the fix path
Four pre-existing size violations were on files this fix needed to touch. Each was split into a directory before new code was added:

File Before After
sql_plan_convert/value.rs 509 value/{mod,convert,msgpack_write,assignments,rows,defaults,time_range}.rs (max 152)
nodedb-query/src/expr.rs 625 expr/{mod,types,codec,binary,eval}.rs (max 285)
scan_filter/mod.rs 386 scan_filter/{mod,op,types,like,parse}.rs (max 203)
handlers/point.rs 696 point/{mod,get,put,delete,update,apply_put}.rs (max 265)
collection/alter.rs 442 alter/{mod,add_column,drop_column,rename_column,alter_type,enforcement,materialized_sum}.rs

mod.rs files are pub mod + pub use only per project convention.

Shared evaluator hygiene (nodedb-query/src/expr/eval.rs)

  • eval() and eval_with_old() previously duplicated ~140 lines of identical AST walking with a different column resolver. Extracted a private RowScope struct and a shared eval_scope walker; public methods now just build a scope and delegate. One traversal, one place to fix.
  • Replaced .expect(\"list non-empty\") in the InList desugaring with a .reduce(...).unwrap_or(...) that falls back to the already-computed empty-list sentinel.

Test plan

All three originally-reported bugs have positive regression coverage asserting the correct behavior (not the current broken behavior). The coverage was widened along the same root-cause code paths — filter.rs fall-throughs, value.rs catch-all, router gaps — to catch every docs-advertised sibling of each bug.

  • 24/24 integration tests pass in the three affected test files:
    • nodedb/tests/sql_transactions.rs — ALTER ADD/DROP/RENAME/ALTER TYPE on strict collections
    • nodedb/tests/sql_where_expressions.rs — LOWER/UPPER/NOT/BETWEEN/IN-with-expression/column-arithmetic in WHERE, including DELETE and UPDATE scoping
    • nodedb/tests/sql_update_expressions.rs — column arithmetic, NOW(), UPPER, concat, CASE, LEAST on strict + schemaless, plus UPSERT DO UPDATE SET
  • 2121/2121 `nodedb` lib unit tests pass
  • 331/331 `nodedb-query` unit tests pass
  • `scripts/test.sh` full pgwire SQL suite passes end-to-end — no new errors, only pre-existing DROP-before-CREATE noise
  • `cargo clippy -p nodedb -p nodedb-query -p nodedb-sql --all-targets -- -D warnings` clean
  • `cargo fmt --all` clean

Closes

…dules

Move the monolithic `expr.rs` into an `expr/` directory (eval, codec,
binary, types submodules) so each concern lives in its own file.

Extract `FilterOp` and the `ScanFilter`/`CompareValue` types from
`scan_filter/mod.rs` into `op.rs` and `types.rs` respectively, leaving
`mod.rs` as pure re-exports. Add `like` and `parse` as explicit submodule
declarations now that the types they depend on are in separate files.

`msgpack_scan/filter.rs` updated to import from the new paths.
…ssion evaluation

Introduce `UpdateValue` on the bridge — a tagged union carrying either
pre-encoded msgpack bytes (fast literal path) or a `SqlExpr` that the
Data Plane evaluates against the existing row at apply time.

Planner changes:
- `SqlPlan::Upsert` gains an `on_conflict_updates` field for the SET
  assignments extracted from `ON CONFLICT (...) DO UPDATE SET`.
- `dml::plan_upsert_with_on_conflict` routes `INSERT ... ON CONFLICT DO
  UPDATE SET` through the upsert path, carrying the assignments.
- `assignments_to_update_values` converts planner `SqlExpr` to
  `UpdateValue`, choosing `Literal` for constant RHS and `Expr` for
  anything that must be evaluated per-row.
- `ScanFilter` gains an `expr: Option<SqlExpr>` field so complex WHERE
  clauses (scalar functions, arithmetic, non-literal BETWEEN) that the
  planner cannot reduce to simple triples are shipped verbatim to the
  Data Plane for row-level evaluation.
- `sql_expr_to_bridge_expr` extended to cover `UnaryOp`, `IsNull`,
  `Between`, `Substring`, `Trim`, and `Case` expressions.
- `sql_plan_convert/filter.rs` populates the new `expr` field for
  general-purpose `FilterExpr::Expr` predicates.

Executor changes:
- `bulk_dml` and `upsert` handlers receive `&[(String, UpdateValue)]`
  instead of `&[(String, Vec<u8>)]`; literal arms are identical to the
  previous fast path, expression arms evaluate against a pre-update
  snapshot of the row (matching PostgreSQL semantics — assignments do
  not observe each other).
- Empty `filter_bytes` in `bulk_dml` now means "no WHERE clause" instead
  of a deserialization error.
- `check_generated_readonly` and `needs_recomputation` generified over
  the update-value type so both call sites compile without duplication.

All existing call sites updated to pass `expr: None` on the `ScanFilter`
constructor and supply the required `on_conflict_updates` slice.
value.rs (509 lines) → value/ with assignments, convert, defaults,
msgpack_write, rows, and time_range submodules.

alter.rs (441 lines) → alter/ with add_column, alter_type, drop_column,
enforcement, materialized_sum, and rename_column submodules. DDL router
schema.rs gains explicit routing for ADD COLUMN, DROP COLUMN, and
RENAME COLUMN so requests reach the handlers in the new module layout.

point.rs (696 lines) → point/ with apply_put, delete, get, put, and
update submodules.

All files remain under 200 lines. No behaviour changes.
…TE and WHERE

sql_update_expressions.rs — covers UPDATE statements where the RHS is a
non-literal expression (column arithmetic, scalar functions, NOW()). These
catch the silent corruption bug where expression RHS was serialized as a
debug string instead of being evaluated against the current row.

sql_where_expressions.rs — covers WHERE clauses containing scalar function
calls (LOWER, UPPER, LENGTH, arithmetic). These must be evaluated per-row
by the scan filter, not silently dropped through a match-all fall-through.

sql_transactions.rs — covers basic transaction semantics: COMMIT persists
buffered writes and ROLLBACK discards them.

Existing executor tests updated to pass the new `on_conflict_updates` and
`expr` fields added to the physical plan types.
jieba-rs 0.8 pulled in include-flate → libflate → core2 0.4.0, which
was yanked from crates.io and caused CI to fail with a version-yanked
error during cargo clippy. jieba-rs 0.9 removes the libflate dependency
entirely, eliminating core2 from the transitive graph.
core2 0.4.0 is yanked on crates.io and no 0.4.1+ release exists. The
libflate / libflate_lz77 chain (pulled in transitively via nodedb-fts'
optional jieba feature → include-flate → include-flate-compress) still
requires core2 ^0.4, so any fresh dependency resolution — such as CI
running without a committed lockfile — fails with "version 0.4.0 is
yanked".

Because NodeDB is a library workspace (nodedb-fts and others are
published to crates.io), committing Cargo.lock is not appropriate.
Patching directly to the upstream technocreatives/core2 git repo at the
commit that corresponds to 0.4.0 lets cargo bypass the registry yank
check while building bit-identical code.

Verified: `cargo check --workspace --all-features` and
`cargo clippy --workspace --all-targets --all-features -- -D warnings`
both succeed with a fresh lockfile.

Remove this patch once libflate releases a version that drops the core2
dependency.
@farhan-syah farhan-syah merged commit 6878aad into main Apr 14, 2026
2 checks passed
@farhan-syah farhan-syah deleted the fix/sql-expression-evaluation branch April 14, 2026 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant