fix: evaluate expressions in WHERE, UPDATE, and UPSERT; wire ALTER COLLECTION column ops by farhan-syah · Pull Request #26 · NodeDB-Lab/nodedb

farhan-syah · 2026-04-14T20:22:23Z

Summary

Fixes three reported silent-correctness bugs (#22, #23, #24) and expands coverage to the full class of failures they represent. The core change is wiring the existing nodedb_query::expr::SqlExpr evaluator into every place the planner previously gave up on non-trivial expressions.

What was broken

Scalar functions in WHERE silently return MATCH ALL instead of raising error #22 — Scalar functions in WHERE (LOWER(col) = 'x', qty + 1 = 5, NOT (...), IN (2+1, 7), BETWEEN lo AND hi) silently fell through to FilterOp::MatchAll. DELETE WHERE LOWER(...) would wipe the table; UPDATE WHERE LOWER(...) would touch every row.
UPDATE SET col = col + N (column arithmetic) not evaluated — errors on strict, silently corrupts on schemaless #23 — UPDATE t SET n = n + 1 (and NOW(), UPPER(col), CASE, concat, LEAST(...)) wrote format!(\"{expr:?}\") into the column instead of evaluating. Strict errored on re-encode; schemaless silently stored the debug AST as a string.
ALTER COLLECTION x ADD COLUMN y rejected — router dispatcher missing branch #24 — ALTER COLLECTION x ADD COLUMN y ... (and docs-advertised DROP COLUMN / RENAME COLUMN / ALTER COLUMN TYPE) were rejected at parse time because the pgwire DDL router had no branch for them.

What changed

Scan filter path (nodedb-query/src/scan_filter, nodedb/src/control/planner/sql_plan_convert/filter.rs)

New FilterOp::Expr variant carrying a SqlExpr payload on ScanFilter.
Planner now produces expr_filter(root) for any WHERE expression it cannot reduce to (field, op, value) instead of falling through to match-all. Every fall-through in sql_expr_to_scan_filters now routes through the evaluator.
matches_value, matches_binary, and matches_binary_indexed all evaluate FilterOp::Expr against the row via SqlExpr::eval.
Planner desugars parser-only variants (NOT, BETWEEN, InList, Like, IsNull) into the evaluator's equivalent AST nodes during sql_expr_to_bridge_expr.
Bulk update/delete now treat empty filter_bytes as "match every row" instead of a zerompk decode error.

UPDATE assignment path (bridge::physical_plan::UpdateValue, handlers/point/update.rs, handlers/bulk_dml.rs)

New UpdateValue::{Literal(Vec<u8>), Expr(SqlExpr)} enum carrying both pre-encoded literals and unevaluated expressions across the SPSC bridge.
PointUpdate.updates and BulkUpdate.updates take Vec<(String, UpdateValue)>.
Planner (assignments_to_update_values) converts non-literal RHS via sql_expr_to_bridge_expr; KV FieldSet now errors loudly on non-literal RHS instead of silently dropping fields.
UPDATE handler snapshots the current row, evaluates each Expr assignment against that snapshot (PostgreSQL-style single-pass semantics), and emits the result into the doc.
Fast binary-merge path still taken when every RHS is literal.
check_generated_readonly / needs_recomputation helpers made generic over the payload type so literal and expression updates share one implementation.

UPSERT ON CONFLICT DO UPDATE (plan_insert, SqlPlan::Upsert, DocumentOp::Upsert, handlers/upsert.rs)

plan_insert detects ON CONFLICT ... DO UPDATE SET in the AST and reroutes to a new plan_upsert_with_on_conflict path that carries the assignments.
UpsertParams, SqlPlan::Upsert, and DocumentOp::Upsert all gain on_conflict_updates: Vec<(String, UpdateValue)>.
execute_upsert applies those assignments against the existing row (evaluating expressions against a pre-update snapshot) instead of merging inserted fields.

ALTER COLLECTION column ops (pgwire/ddl/collection/alter/)

New router branches for ALTER COLLECTION ADD COLUMN, DROP COLUMN, RENAME COLUMN, and ALTER COLUMN ... TYPE ....
ADD COLUMN reuses the existing strict-schema handler. DROP/RENAME/ALTER TYPE are new handlers that mutate StrictSchema, bump version, propose via Raft, and refresh the Data Plane cache. Type changes are restricted to same-discriminant aliases (e.g. INT ↔ BIGINT); cross-type changes error loudly with an explanation rather than silently corrupting existing rows.

Structural hygiene on the fix path
Four pre-existing size violations were on files this fix needed to touch. Each was split into a directory before new code was added:

File	Before	After
`sql_plan_convert/value.rs`	509	`value/{mod,convert,msgpack_write,assignments,rows,defaults,time_range}.rs` (max 152)
`nodedb-query/src/expr.rs`	625	`expr/{mod,types,codec,binary,eval}.rs` (max 285)
`scan_filter/mod.rs`	386	`scan_filter/{mod,op,types,like,parse}.rs` (max 203)
`handlers/point.rs`	696	`point/{mod,get,put,delete,update,apply_put}.rs` (max 265)
`collection/alter.rs`	442	`alter/{mod,add_column,drop_column,rename_column,alter_type,enforcement,materialized_sum}.rs`

mod.rs files are pub mod + pub use only per project convention.

Shared evaluator hygiene (nodedb-query/src/expr/eval.rs)

eval() and eval_with_old() previously duplicated ~140 lines of identical AST walking with a different column resolver. Extracted a private RowScope struct and a shared eval_scope walker; public methods now just build a scope and delegate. One traversal, one place to fix.
Replaced .expect(\"list non-empty\") in the InList desugaring with a .reduce(...).unwrap_or(...) that falls back to the already-computed empty-list sentinel.

Test plan

All three originally-reported bugs have positive regression coverage asserting the correct behavior (not the current broken behavior). The coverage was widened along the same root-cause code paths — filter.rs fall-throughs, value.rs catch-all, router gaps — to catch every docs-advertised sibling of each bug.

24/24 integration tests pass in the three affected test files:
- nodedb/tests/sql_transactions.rs — ALTER ADD/DROP/RENAME/ALTER TYPE on strict collections
- nodedb/tests/sql_where_expressions.rs — LOWER/UPPER/NOT/BETWEEN/IN-with-expression/column-arithmetic in WHERE, including DELETE and UPDATE scoping
- nodedb/tests/sql_update_expressions.rs — column arithmetic, NOW(), UPPER, concat, CASE, LEAST on strict + schemaless, plus UPSERT DO UPDATE SET
2121/2121 `nodedb` lib unit tests pass
331/331 `nodedb-query` unit tests pass
`scripts/test.sh` full pgwire SQL suite passes end-to-end — no new errors, only pre-existing DROP-before-CREATE noise
`cargo clippy -p nodedb -p nodedb-query -p nodedb-sql --all-targets -- -D warnings` clean
`cargo fmt --all` clean

Closes

…dules Move the monolithic `expr.rs` into an `expr/` directory (eval, codec, binary, types submodules) so each concern lives in its own file. Extract `FilterOp` and the `ScanFilter`/`CompareValue` types from `scan_filter/mod.rs` into `op.rs` and `types.rs` respectively, leaving `mod.rs` as pure re-exports. Add `like` and `parse` as explicit submodule declarations now that the types they depend on are in separate files. `msgpack_scan/filter.rs` updated to import from the new paths.

…ssion evaluation Introduce `UpdateValue` on the bridge — a tagged union carrying either pre-encoded msgpack bytes (fast literal path) or a `SqlExpr` that the Data Plane evaluates against the existing row at apply time. Planner changes: - `SqlPlan::Upsert` gains an `on_conflict_updates` field for the SET assignments extracted from `ON CONFLICT (...) DO UPDATE SET`. - `dml::plan_upsert_with_on_conflict` routes `INSERT ... ON CONFLICT DO UPDATE SET` through the upsert path, carrying the assignments. - `assignments_to_update_values` converts planner `SqlExpr` to `UpdateValue`, choosing `Literal` for constant RHS and `Expr` for anything that must be evaluated per-row. - `ScanFilter` gains an `expr: Option<SqlExpr>` field so complex WHERE clauses (scalar functions, arithmetic, non-literal BETWEEN) that the planner cannot reduce to simple triples are shipped verbatim to the Data Plane for row-level evaluation. - `sql_expr_to_bridge_expr` extended to cover `UnaryOp`, `IsNull`, `Between`, `Substring`, `Trim`, and `Case` expressions. - `sql_plan_convert/filter.rs` populates the new `expr` field for general-purpose `FilterExpr::Expr` predicates. Executor changes: - `bulk_dml` and `upsert` handlers receive `&[(String, UpdateValue)]` instead of `&[(String, Vec<u8>)]`; literal arms are identical to the previous fast path, expression arms evaluate against a pre-update snapshot of the row (matching PostgreSQL semantics — assignments do not observe each other). - Empty `filter_bytes` in `bulk_dml` now means "no WHERE clause" instead of a deserialization error. - `check_generated_readonly` and `needs_recomputation` generified over the update-value type so both call sites compile without duplication. All existing call sites updated to pass `expr: None` on the `ScanFilter` constructor and supply the required `on_conflict_updates` slice.

value.rs (509 lines) → value/ with assignments, convert, defaults, msgpack_write, rows, and time_range submodules. alter.rs (441 lines) → alter/ with add_column, alter_type, drop_column, enforcement, materialized_sum, and rename_column submodules. DDL router schema.rs gains explicit routing for ADD COLUMN, DROP COLUMN, and RENAME COLUMN so requests reach the handlers in the new module layout. point.rs (696 lines) → point/ with apply_put, delete, get, put, and update submodules. All files remain under 200 lines. No behaviour changes.

…TE and WHERE sql_update_expressions.rs — covers UPDATE statements where the RHS is a non-literal expression (column arithmetic, scalar functions, NOW()). These catch the silent corruption bug where expression RHS was serialized as a debug string instead of being evaluated against the current row. sql_where_expressions.rs — covers WHERE clauses containing scalar function calls (LOWER, UPPER, LENGTH, arithmetic). These must be evaluated per-row by the scan filter, not silently dropped through a match-all fall-through. sql_transactions.rs — covers basic transaction semantics: COMMIT persists buffered writes and ROLLBACK discards them. Existing executor tests updated to pass the new `on_conflict_updates` and `expr` fields added to the physical plan types.

jieba-rs 0.8 pulled in include-flate → libflate → core2 0.4.0, which was yanked from crates.io and caused CI to fail with a version-yanked error during cargo clippy. jieba-rs 0.9 removes the libflate dependency entirely, eliminating core2 from the transitive graph.

core2 0.4.0 is yanked on crates.io and no 0.4.1+ release exists. The libflate / libflate_lz77 chain (pulled in transitively via nodedb-fts' optional jieba feature → include-flate → include-flate-compress) still requires core2 ^0.4, so any fresh dependency resolution — such as CI running without a committed lockfile — fails with "version 0.4.0 is yanked". Because NodeDB is a library workspace (nodedb-fts and others are published to crates.io), committing Cargo.lock is not appropriate. Patching directly to the upstream technocreatives/core2 git repo at the commit that corresponds to 0.4.0 lets cargo bypass the registry yank check while building bit-identical code. Verified: `cargo check --workspace --all-features` and `cargo clippy --workspace --all-targets --all-features -- -D warnings` both succeed with a fresh lockfile. Remove this patch once libflate releases a version that drops the core2 dependency.

farhan-syah added 6 commits April 15, 2026 04:17

farhan-syah merged commit 6878aad into main Apr 14, 2026
2 checks passed

farhan-syah deleted the fix/sql-expression-evaluation branch April 14, 2026 20:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: evaluate expressions in WHERE, UPDATE, and UPSERT; wire ALTER COLLECTION column ops#26

fix: evaluate expressions in WHERE, UPDATE, and UPSERT; wire ALTER COLLECTION column ops#26
farhan-syah merged 6 commits intomainfrom
fix/sql-expression-evaluation

farhan-syah commented Apr 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

farhan-syah commented Apr 14, 2026

Summary

What was broken

What changed

Test plan

Closes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant