Skip to content

[Feature] Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch()#5324

Merged
mengweieric merged 10 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:feature/vector-search-p0-hardening
Apr 14, 2026
Merged

[Feature] Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch()#5324
mengweieric merged 10 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:feature/vector-search-p0-hardening

Conversation

@mengweieric
Copy link
Copy Markdown
Collaborator

@mengweieric mengweieric commented Apr 8, 2026

Summary

Adds validation hardening, query restrictions, and integration tests on top of the
core vectorSearch() execution pipeline in #5320.

What this PR adds

Validation hardening

  • Option mutual exclusivity: exactly one of k, max_distance, min_score
  • k range validation: [1, 10000]
  • LIMIT > k rejection in top-k mode
  • Sort restriction: only ORDER BY _score DESC allowed; rejects non-_score
    fields and _score ASC
  • Radial size policy: caps max_distance/min_score queries at maxResultWindow
    to prevent unbounded result sets

pushDownSort() contract preservation

  • Preserves base OpenSearchIndexScanQueryBuilder.pushDownSort() contract for
    sort.getCount() → limit pushdown. SQL always sets count=0, but PPL or future
    callers may combine sort+limit in one LogicalSort node. Focused test verifies
    LogicalSort(count=7) is correctly pushed down as request size.

Test coverage

  • VectorSearchIndexTest (9 tests): knn JSON generation, options rendering,
    non-numeric option quoting, nested fields
  • VectorSearchIT (16 integration tests): 4 _explain DSL shape verification +
    8 validation error tests + 3 ORDER BY restriction tests + 1 LIMIT success test
  • Additional unit tests across existing test files: multi-sort rejection,
    case-insensitive arg lookup, sort count preservation, radial LIMIT modes,
    resolver argument count edge cases, NaN/Infinity edge cases

Benchmark results (against live OpenSearch + k-NN)

  • Semantic correctness: 10/10 on all 8 test cases (top-k, radial, post-filter, hotels)
  • SQL overhead: +1.4ms to +2.6ms p50 vs native DSL

Test plan

  • ./gradlew spotlessCheck — PASS
  • ./gradlew :opensearch:test :core:test :sql:test — PASS
  • ./gradlew :integ-test:integTest -Dtests.class="*VectorSearchIT" — 16/16 PASS
  • Correctness benchmark: 10/10 semantic match across A1, A2, A4, A5, A6a, A6b, A7-tight, A8

@mengweieric mengweieric added SQL feature skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. labels Apr 8, 2026
@mengweieric mengweieric changed the title Add vectorSearch execution pipeline, validation hardening, and integration tests [Feature] Add vectorSearch execution pipeline, validation hardening, and integration tests Apr 8, 2026
@mengweieric mengweieric changed the title [Feature] Add vectorSearch execution pipeline, validation hardening, and integration tests [Feature] Add vectorSearch validation hardening, and integration tests Apr 8, 2026
@mengweieric mengweieric changed the title [Feature] Add vectorSearch validation hardening, and integration tests Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch() Apr 8, 2026
@mengweieric mengweieric changed the title Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch() [Feature] Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch() Apr 8, 2026
- Enforce exactly one of k, max_distance, or min_score
- Validate k is in [1, 10000] range
- Add 6 tests: mutual exclusivity (3 combos), k too small, k too
  large, k boundary values (1 and 10000)

Signed-off-by: Eric Wei <[email protected]>
VectorSearchQueryBuilder now accepts options map and rejects
pushDownLimit when LIMIT exceeds k. Radial modes (max_distance,
min_score) have no LIMIT restriction.

Signed-off-by: Eric Wei <[email protected]>
- Create VectorSearchIndexTest: 7 tests covering buildKnnQueryJson()
  for top-k, max_distance, min_score, nested fields, multi-element
  and single-element vectors, numeric option rendering
- Add edge case tests to VectorSearchTableFunctionImplementationTest:
  NaN vector component, empty option key/value, negative k, NaN for
  max_distance and min_score (6 new tests)
- Add VectorSearchQueryBuilderTest: min_score radial mode LIMIT,
  pushDownSort delegation to parent (2 new tests)
- Extract buildKnnQueryJson() as package-private for direct testing

Signed-off-by: Eric Wei <[email protected]>
Test too-many (5) and zero arguments paths in
VectorSearchTableFunctionResolver to complement existing
too-few (2) test.

Signed-off-by: Eric Wei <[email protected]>
- Cap radial mode (max_distance/min_score) results at maxResultWindow
  to prevent unbounded result sets
- Reject ORDER BY on non-_score fields and _score ASC in vectorSearch
  since knn results are naturally sorted by _score DESC
- Add 12 integration tests: 4 _explain DSL shape verification tests
  and 8 validation error path tests

Signed-off-by: Eric Wei <[email protected]>
- Add multi-sort expression test: ORDER BY _score DESC, name ASC
  correctly rejects the non-_score field (VectorSearchQueryBuilderTest)
- Add case-insensitive argument name lookup test to verify
  TABLE='x' resolves same as table='x' (Implementation test)
- Add non-numeric option fallback test: verifies string options
  are quoted in JSON output (VectorSearchIndexTest)
- Add 4 integration tests: ORDER BY _score DESC succeeds,
  ORDER BY non-score rejects, ORDER BY _score ASC rejects,
  LIMIT within k succeeds (VectorSearchIT, now 16 tests)

Signed-off-by: Eric Wei <[email protected]>
The base OpenSearchIndexScanQueryBuilder.pushDownSort() pushes
sort.getCount() as a limit when non-zero. Our override validated
_score DESC and returned true, but did not preserve this contract.

SQL always sets count=0, so this was not reachable today, but PPL
or future callers may set a non-zero count to combine sort+limit
in one LogicalSort node. Preserve the behavior defensively.

Add focused test: LogicalSort(count=7) with _score DESC verifies
the count is pushed down as request size.

Signed-off-by: Eric Wei <[email protected]>
@mengweieric mengweieric force-pushed the feature/vector-search-p0-hardening branch from 08ab0c6 to 534b5f4 Compare April 9, 2026 18:46
if (hasK) {
parseIntOption(options, "k");
int k = Integer.parseInt(options.get("k"));
if (k < 1 || k > 10000) {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- Unit test: compound AND predicate survives pushdown into bool.filter
- Integration test: compound WHERE (term + range) produces bool query
- Integration test: radial max_distance with WHERE produces bool query

Signed-off-by: Eric Wei <[email protected]>
@mengweieric mengweieric force-pushed the feature/vector-search-p0-hardening branch from 5b0373c to 0e02488 Compare April 9, 2026 22:36
Copy link
Copy Markdown
Collaborator

@RyanL1997 RyanL1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mengweieric , thanks for the change and I just left a quesiton.

// Preserve the parent's sort.getCount() → limit pushdown contract: SQL always sets count=0,
// but PPL or future callers may set a non-zero count to combine sort+limit in one node.
if (sort.getCount() != 0) {
requestBuilder.pushDownLimit(sort.getCount(), 0);
Copy link
Copy Markdown
Collaborator

@RyanL1997 RyanL1997 Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

pushDownSort calls requestBuilder.pushDownLimit(sort.getCount(), 0) directly, bypassing this.pushDownLimit() (L59-67) which has the LIMIT > k validation.

I understand vectorSearch is SQL-only for P0, and in the SQL path sort.getCount() is always 0 so this is a no-op today. But the comment on L90-91 explicitly anticipates the PPL path ("PPL or future callers may set a non-zero count") — so just want to confirm: if/when PPL does use this path with a non-zero count, should the LIMIT > k guard still apply? Or is it intentionally unchecked here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The sort.getCount() path was not preserving the same LIMIT <= k invariant as pushDownLimit(). I fixed that by routing both paths through the same validation helper and added tests for the sort-count case. Thanks for calling it out.

pushDownSort() called requestBuilder.pushDownLimit() directly, bypassing
the LIMIT > k guard in pushDownLimit(). Extract validateLimitWithinK()
helper and call it from both paths so the invariant holds when PPL or
future callers set a non-zero sort count.

Signed-off-by: Eric Wei <[email protected]>

@Test
public void testExplainTopKProducesKnnQuery() throws IOException {
String explain =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the Explain test to a separate ExplainIT suite so that we can view the results in a yaml or json file directly
ref: CalciteExplainIT

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done. The _explain cases now live in VectorSearchExplainIT, and VectorSearchIT keeps only behavior/validation tests. I kept the current SQL explain assertions for now; happy to follow up with file-backed expected outputs if we want to standardize that too.

Comment thread integ-test/src/test/java/org/opensearch/sql/sql/VectorSearchIT.java Outdated
Move all explainQuery()-based DSL shape tests into a dedicated
VectorSearchExplainIT suite. VectorSearchIT now contains only
validation and error-path tests.

Signed-off-by: Eric Wei <[email protected]>
@mengweieric mengweieric merged commit b7b37d1 into opensearch-project:feature/vector-search-p0 Apr 14, 2026
36 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants