
[Feature] Add efficient filtering (knn.filter) support for vectorSearch() #5331

Merged: mengweieric merged 14 commits into opensearch-project:feature/vector-search-p0 from mengweieric:feature/sql-vector-search-efficient-filtering on Apr 16, 2026

@mengweieric (Collaborator) commented Apr 9, 2026

Summary

Adds a filter_type=post|efficient option to vectorSearch() that controls where the WHERE clause is placed: inside the knn clause (knn.filter) for efficient pre-filtering during the ANN search, or outside it in bool.filter for post-filtering (the default). Also requires an explicit LIMIT for radial search.

What this PR adds

FilterType enum and option parsing

  • New FilterType enum (POST, EFFICIENT) with fromString() validation
  • filter_type added to allowed option keys in VectorSearchTableFunctionImplementation
  • filter_type is stripped from options before knn JSON generation — it's a SQL-layer directive, not a knn parameter
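A minimal sketch of the parsing rules above, assuming a case-insensitive fromString() and a simple comma/equals split of the option string (as in option='k=10,filter_type=efficient'). VectorSearchOptions, parse(), knnParams() and the ALLOWED_KEYS subset are hypothetical names for illustration, not the plugin's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the FilterType enum described in this PR. */
enum FilterType {
  POST,
  EFFICIENT;

  /** Parses "post" / "efficient" (case-insensitive, an assumption); rejects anything else. */
  static FilterType fromString(String value) {
    if (value == null) {
      throw new IllegalArgumentException("filter_type must not be null");
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "post":
        return POST;
      case "efficient":
        return EFFICIENT;
      default:
        throw new IllegalArgumentException(
            "Invalid filter_type '" + value + "', expected one of: post, efficient");
    }
  }
}

/** Hypothetical helper: parse the option string and drop SQL-layer keys before knn JSON generation. */
final class VectorSearchOptions {
  // Assumed subset of allowed keys; the real list lives in VectorSearchTableFunctionImplementation.
  static final Set<String> ALLOWED_KEYS = Set.of("k", "max_distance", "min_score", "filter_type");

  static Map<String, String> parse(String option) {
    Map<String, String> options = new LinkedHashMap<>();
    for (String pair : option.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length != 2 || !ALLOWED_KEYS.contains(kv[0].trim())) {
        throw new IllegalArgumentException("Unsupported option: " + pair);
      }
      options.put(kv[0].trim(), kv[1].trim());
    }
    return options;
  }

  /** Returns only the parameters that belong in the knn clause (filter_type removed). */
  static Map<String, String> knnParams(Map<String, String> options) {
    Map<String, String> knn = new LinkedHashMap<>(options);
    knn.remove("filter_type"); // SQL-layer directive, never sent to OpenSearch
    return knn;
  }
}
```

Keeping filter_type out of knnParams() means the knn clause only ever carries parameters the k-NN plugin itself understands.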

Efficient filter pushdown

  • VectorSearchQueryBuilder.pushDownFilter() branches on filter type:
    • POST (default): knn in bool.must + WHERE in bool.filter (post-filtering)
    • EFFICIENT: rebuilds knn query with WHERE embedded in knn.filter via callback
  • A Function<QueryBuilder, QueryBuilder> callback (rebuildKnnWithFilter) keeps knn JSON serialization in VectorSearchIndex
  • buildKnnQueryJson() now takes an optional filter JSON parameter, so a single method builds both the plain and the filtered knn query
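The two placements can be illustrated with a string-based sketch. It does not use the OpenSearch QueryBuilder API; FilterPlacementSketch and its methods are hypothetical, and the JSON follows the k-NN query DSL shape {"knn": {field: {"vector", "k", "filter"}}}. The rebuildKnnWithFilter function stands in for the callback that, per this PR, keeps serialization in VectorSearchIndex.

```java
import java.util.function.Function;

/** Hypothetical, string-based model of the two filter placements (not the real QueryBuilder API). */
final class FilterPlacementSketch {
  enum FilterType { POST, EFFICIENT }

  /** One builder for both shapes: filterJson is optional (null = plain top-k query). */
  static String buildKnnQueryJson(String field, String vector, int k, String filterJson) {
    String filterPart = filterJson == null ? "" : ",\"filter\":" + filterJson;
    return "{\"knn\":{\"" + field + "\":{\"vector\":" + vector + ",\"k\":" + k + filterPart + "}}}";
  }

  static String pushDownFilter(
      FilterType type,
      String knnJson,
      String whereJson,
      Function<String, String> rebuildKnnWithFilter) {
    if (type == FilterType.EFFICIENT) {
      // EFFICIENT: the WHERE clause goes inside knn.filter and is applied during the ANN search.
      return rebuildKnnWithFilter.apply(whereJson);
    }
    // POST (default): knn produces candidates in bool.must, WHERE filters them in bool.filter.
    return "{\"bool\":{\"must\":[" + knnJson + "],\"filter\":[" + whereJson + "]}}";
  }
}
```

With post-filtering, k candidates are retrieved first and the filter may discard some of them, so fewer than k rows can come back; embedding the filter in knn.filter lets the engine search only among matching documents.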

Build-time validation

  • build() override rejects explicit filter_type when no filter is pushed down (either no WHERE clause at all, or the WHERE clause was not pushdownable)
  • pushDownFilter() catches ScriptQueryUnSupportedException for non-pushdownable conditions:
    • With explicit filter_type: throws a clear error explaining the condition cannot be pushed down
    • Without explicit filter_type: returns false to fall back to in-memory filtering, matching base class behavior
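The validation rules can be sketched as follows. UnsupportedPushdownException is a hypothetical stand-in for ScriptQueryUnSupportedException, the Supplier models translating the WHERE clause to query DSL, and the error messages are illustrative rather than the plugin's actual wording.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for ScriptQueryUnSupportedException. */
class UnsupportedPushdownException extends RuntimeException {
  UnsupportedPushdownException(String message) {
    super(message);
  }
}

/** Models only the validation rules; query construction is omitted. */
final class FilterValidationSketch {
  private final boolean filterTypeExplicit; // true if the user wrote filter_type=...
  private String pushedFilterJson;          // null until a WHERE clause is pushed down

  FilterValidationSketch(boolean filterTypeExplicit) {
    this.filterTypeExplicit = filterTypeExplicit;
  }

  /** Returns true if pushed down, false to let the engine filter in memory. */
  boolean pushDownFilter(Supplier<String> translateWhere) {
    try {
      // Translation throws for conditions that cannot become query DSL (e.g. struct-type fields).
      pushedFilterJson = translateWhere.get();
      return true;
    } catch (UnsupportedPushdownException e) {
      if (filterTypeExplicit) {
        // The user asked for a placement that cannot be honored: fail with a clear message.
        throw new IllegalArgumentException(
            "filter_type was specified, but the WHERE clause cannot be pushed down; "
                + "remove filter_type or simplify the condition");
      }
      return false; // matches the base class: fall back to in-memory filtering
    }
  }

  /** Rejects an explicit filter_type when no filter was pushed down. */
  void build() {
    if (filterTypeExplicit && pushedFilterJson == null) {
      throw new IllegalArgumentException(
          "filter_type requires a WHERE clause that can be pushed down to OpenSearch");
    }
  }
}
```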

Radial search LIMIT requirement

  • Radial search (max_distance or min_score) without an explicit LIMIT clause is rejected at build time with a clear error message
  • Prevents unbounded result sets from radial queries that could silently return up to maxResultWindow rows
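A sketch of the radial rule, assuming the knn options arrive as a map and that a limitPushed flag records whether any LIMIT reached the request. RadialLimitCheck is a hypothetical name and the message text is illustrative.

```java
import java.util.Map;

/** Hypothetical sketch of the radial-search LIMIT rule applied at build time. */
final class RadialLimitCheck {
  /** Radial search is identified by max_distance or min_score instead of k. */
  static boolean isRadial(Map<String, String> knnOptions) {
    return knnOptions.containsKey("max_distance") || knnOptions.containsKey("min_score");
  }

  static void validate(Map<String, String> knnOptions, boolean limitPushed) {
    if (isRadial(knnOptions) && !limitPushed) {
      throw new IllegalArgumentException(
          "Radial vector search (max_distance or min_score) requires an explicit LIMIT clause");
    }
  }
}
```

Top-k queries are unaffected because k already bounds the result size.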

Engine support

  • knn.filter is supported for lucene and faiss engines (HNSW, IVF). Engine compatibility is not validated by the SQL plugin — unsupported engines reject at execution time.

SQL syntax

-- Post-filtering (default, same as omitting filter_type)
SELECT v._id, v._score
FROM vectorSearch(table='my-index', field='embedding',
     vector='[1.0, 2.0, 3.0]', option='k=10,filter_type=post') AS v
WHERE v.city = 'Seattle'
LIMIT 10

-- Efficient pre-filtering (WHERE inside knn.filter)
SELECT v._id, v._score
FROM vectorSearch(table='my-index', field='embedding',
     vector='[1.0, 2.0, 3.0]', option='k=10,filter_type=efficient') AS v
WHERE v.city = 'Seattle'
LIMIT 10

-- Radial search requires LIMIT
SELECT v._id, v._score
FROM vectorSearch(table='my-index', field='embedding',
     vector='[1.0, 2.0, 3.0]', option='max_distance=10.5') AS v
LIMIT 100

Test plan

  • ./gradlew spotlessCheck — PASS
  • ./gradlew :opensearch:test — PASS
  • ./gradlew :integ-test:integTest -Dtests.class="*VectorSearchIT" — PASS
  • ./gradlew :integ-test:integTest -Dtests.class="*VectorSearchExplainIT" — PASS
  • ./gradlew build -x integTest — PASS (full build excluding integration tests)

@mengweieric mengweieric added feature skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. SQL labels Apr 9, 2026
@mengweieric mengweieric force-pushed the feature/sql-vector-search-efficient-filtering branch 2 times, most recently from e795bf6 to bea6607 on April 14, 2026 22:37
Radial search (max_distance or min_score) can return unbounded results.
Add build-time validation that rejects radial queries without an explicit
LIMIT clause, with a clear error message guiding the user.

Signed-off-by: Eric Wei <[email protected]>
@mengweieric mengweieric force-pushed the feature/sql-vector-search-efficient-filtering branch from bea6607 to 1090b36 on April 14, 2026 23:04
pushDownSort with a non-zero sort.getCount() pushes a limit to
requestBuilder directly, bypassing pushDownLimit() and leaving
limitPushed=false. This causes build() to incorrectly reject radial
vector search when the limit arrives via the sort-with-count path
(e.g. PPL sort command). Set limitPushed=true in the sort.getCount()
block alongside the existing requestBuilder.pushDownLimit() call.

Signed-off-by: Eric Wei <[email protected]>
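The fix amounts to keeping the flag in sync on both limit paths. In this hypothetical model, size stands in for the limit pushed to requestBuilder; the class and accessor names are illustrative only.

```java
/** Hypothetical model of the fix: both limit paths must record that a limit was pushed. */
final class LimitTrackingSketch {
  private Integer size;                // stands in for the limit set on requestBuilder
  private boolean limitPushed = false; // consulted by build() for radial search

  void pushDownLimit(int limit) {
    size = limit;
    limitPushed = true;
  }

  /** A non-zero sortCount carries a limit alongside the sort (e.g. from the PPL sort command). */
  void pushDownSort(int sortCount) {
    if (sortCount != 0) {
      size = sortCount;
      limitPushed = true; // previously missing, so build() wrongly rejected radial search
    }
  }

  boolean isLimitPushed() {
    return limitPushed;
  }

  Integer size() {
    return size;
  }
}
```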
pushDownFilter() did not catch ScriptQueryUnSupportedException, so
non-pushdownable filters (e.g. struct-type fields) would propagate a
raw internal exception instead of a clean SQL-layer error.

With explicit filter_type: throw a clear error explaining the WHERE
clause cannot be pushed down for the requested filter placement.
Without explicit filter_type: return false to fall back to in-memory
filtering, matching the base class behavior.

Signed-off-by: Eric Wei <[email protected]>
Reject construction of VectorSearchQueryBuilder with
FilterType.EFFICIENT and a null rebuildKnnWithFilter callback at
construction time instead of deferring to an NPE in pushDownFilter.

Signed-off-by: Eric Wei <[email protected]>
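A sketch of the fail-fast constructor check, with hypothetical names and an assumed IllegalArgumentException (the PR does not state which exception type is thrown).

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical sketch: validate the EFFICIENT-mode callback when the builder is constructed. */
final class KnnQueryBuilderSketch {
  enum FilterType { POST, EFFICIENT }

  private final FilterType filterType;
  private final Function<String, String> rebuildKnnWithFilter; // may be null in POST mode

  KnnQueryBuilderSketch(FilterType filterType, Function<String, String> rebuildKnnWithFilter) {
    this.filterType = Objects.requireNonNull(filterType, "filterType");
    if (filterType == FilterType.EFFICIENT && rebuildKnnWithFilter == null) {
      // Fail here instead of with a NullPointerException later in pushDownFilter().
      throw new IllegalArgumentException(
          "EFFICIENT filter type requires a rebuildKnnWithFilter callback");
    }
    this.rebuildKnnWithFilter = rebuildKnnWithFilter;
  }

  FilterType filterType() {
    return filterType;
  }
}
```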
+ "LIMIT 5");

// Efficient mode: knn rebuilt with filter inside, wrapped in WrapperQueryBuilder
assertTrue("Explain should contain wrapper query:\n" + explain, explain.contains("wrapper"));
Collaborator (review comment):

The test only asserts explain.contains("wrapper"), which would also pass for a top-k query without any filter. It doesn't verify that the filter is actually embedded inside the knn JSON.
Should we add an assertion that the efficient-mode explain output does NOT contain "bool" / "must", or positively verify that the knn JSON contains the filter clause?

Collaborator Author (reply):

Agreed. I strengthened the test. It now asserts efficient mode does not produce the outer bool / must post-filter shape, and it decodes the wrapper query payload to verify the embedded k-NN JSON contains the filter and predicate field.
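One way to implement the strengthened assertion, assuming the explain output contains the wrapper query as unescaped JSON of the form {"wrapper":{"query":"<base64>"}}, with the base64 knn payload described in the follow-up commit. WrapperPayloadCheck and its methods are hypothetical helpers, not the actual integration-test code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical test helper for checking the efficient-mode explain shape. */
final class WrapperPayloadCheck {
  // Assumes unescaped JSON: "wrapper": { "query": "<base64>" }
  private static final Pattern WRAPPER_QUERY =
      Pattern.compile("\"wrapper\"\\s*:\\s*\\{\\s*\"query\"\\s*:\\s*\"([A-Za-z0-9+/=]+)\"");

  /** Extracts and base64-decodes the wrapped query source. */
  static String decodeWrapperQuery(String explain) {
    Matcher m = WRAPPER_QUERY.matcher(explain);
    if (!m.find()) {
      throw new AssertionError("No wrapper query in explain output:\n" + explain);
    }
    return new String(Base64.getDecoder().decode(m.group(1)), StandardCharsets.UTF_8);
  }

  /** Efficient mode: no outer bool/must, and the decoded knn JSON carries the filter. */
  static void assertEfficientShape(String explain, String predicateField) {
    if (explain.contains("\"bool\"") || explain.contains("\"must\"")) {
      throw new AssertionError("Post-filter bool/must shape found:\n" + explain);
    }
    String knnJson = decodeWrapperQuery(explain);
    if (!knnJson.contains("\"filter\"") || !knnJson.contains(predicateField)) {
      throw new AssertionError("Filter not embedded in knn JSON: " + knnJson);
    }
  }
}
```

Decoding matters because the payload is opaque base64 in the explain output, so a plain substring check on "filter" would never see the embedded clause.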

… test, rename constructor comment

- Reword filter_type error message to be user-friendly and actionable
  (no longer leaks internal ScriptQueryUnSupportedException text)
- Strengthen efficient-mode explain IT: assert no bool/must (proves not
  post-filter shape), decode base64 knn payload to verify filter and
  predicate field are embedded inside knn query
- Rename "Backward-compatible constructor" to clarify intent

Signed-off-by: Eric Wei <[email protected]>
@mengweieric mengweieric merged commit 231d477 into opensearch-project:feature/vector-search-p0 Apr 16, 2026
36 checks passed