[Feature] Add efficient filtering (knn.filter) support for vectorSearch() #5331
Merged
mengweieric merged 14 commits into opensearch-project:feature/vector-search-p0 on Apr 16, 2026
Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
…SearchIndex Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
…on in VectorSearchQueryBuilder Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
…fficient mode Signed-off-by: Eric Wei <[email protected]>
…matting Signed-off-by: Eric Wei <[email protected]>
Radial search (max_distance or min_score) can return unbounded results. Add build-time validation that rejects radial queries without an explicit LIMIT clause, with a clear error message guiding the user. Signed-off-by: Eric Wei <[email protected]>
pushDownSort with a non-zero sort.getCount() pushes a limit to requestBuilder directly, bypassing pushDownLimit() and leaving limitPushed=false. This causes build() to incorrectly reject radial vector search when the limit arrives via the sort-with-count path (e.g. PPL sort command). Set limitPushed=true in the sort.getCount() block alongside the existing requestBuilder.pushDownLimit() call. Signed-off-by: Eric Wei <[email protected]>
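The interaction between the radial LIMIT check and the sort-with-count path can be sketched in isolation. This is a simplified model, not the plugin's VectorSearchQueryBuilder: the class name RadialLimitSketch, the int-based signatures, and the exception type and message are assumptions made for illustration.

```java
// Hypothetical, simplified model of the limit tracking described in the commit message.
// Names, signatures, and the exception type are assumptions, not the plugin's code.
final class RadialLimitSketch {
  private final boolean radial; // true when max_distance or min_score is set
  private boolean limitPushed = false;
  private int pushedLimit = -1;

  RadialLimitSketch(boolean radial) {
    this.radial = radial;
  }

  // Path 1: an explicit LIMIT clause.
  void pushDownLimit(int limit) {
    pushedLimit = limit;
    limitPushed = true;
  }

  // Path 2: a sort that carries a count (for example the PPL sort command).
  // A count of 0 is treated here as "no limit attached to the sort".
  void pushDownSort(int sortCount) {
    if (sortCount != 0) {
      pushedLimit = sortCount;
      limitPushed = true; // without this flag, build() would reject a bounded query
    }
  }

  // Build-time validation: a radial query must be bounded by some limit.
  int build() {
    if (radial && !limitPushed) {
      throw new IllegalStateException(
          "Radial vector search (max_distance or min_score) requires an explicit LIMIT");
    }
    return pushedLimit;
  }
}
```

In this model, a radial query bounded only through pushDownSort(10) builds successfully, while one with no limit on either path is rejected.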
pushDownFilter() did not catch ScriptQueryUnSupportedException, so non-pushdownable filters (e.g. struct-type fields) would propagate a raw internal exception instead of a clean SQL-layer error. With explicit filter_type: throw a clear error explaining the WHERE clause cannot be pushed down for the requested filter placement. Without explicit filter_type: return false to fall back to in-memory filtering, matching the base class behavior. Signed-off-by: Eric Wei <[email protected]>
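The two fallback behaviors can be illustrated with a small stand-alone sketch. UnsupportedFilterException, FilterTranslator, and FilterPushdownSketch are hypothetical stand-ins for the plugin's ScriptQueryUnSupportedException and builder types, and the error wording is illustrative rather than the message the plugin emits.

```java
import java.util.Optional;

// Hypothetical stand-in for the internal "cannot translate this condition" exception.
final class UnsupportedFilterException extends RuntimeException {
  UnsupportedFilterException(String message) {
    super(message);
  }
}

// Hypothetical translator from a WHERE condition to query DSL; may throw the exception above.
interface FilterTranslator {
  String toQueryDsl(String whereClause);
}

final class FilterPushdownSketch {
  private final Optional<String> explicitFilterType; // empty when the user did not set filter_type
  private final FilterTranslator translator;
  private String pushedFilter; // stays null until a filter is pushed down

  FilterPushdownSketch(Optional<String> explicitFilterType, FilterTranslator translator) {
    this.explicitFilterType = explicitFilterType;
    this.translator = translator;
  }

  // Returns true when the filter was pushed down, false to request in-memory filtering.
  boolean pushDownFilter(String whereClause) {
    try {
      pushedFilter = translator.toQueryDsl(whereClause);
      return true;
    } catch (UnsupportedFilterException e) {
      if (explicitFilterType.isPresent()) {
        // The user asked for a specific placement, so silently falling back would be misleading.
        throw new IllegalArgumentException(
            "The WHERE clause cannot be pushed down, so filter_type="
                + explicitFilterType.get()
                + " cannot be applied");
      }
      return false; // no explicit filter_type: fall back to in-memory filtering
    }
  }
}
```

The design point is that the internal exception never reaches the caller: it is either converted into a user-facing error or absorbed into a false return value.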
Reject construction of VectorSearchQueryBuilder with FilterType.EFFICIENT and a null rebuildKnnWithFilter callback at construction time instead of deferring to an NPE in pushDownFilter. Signed-off-by: Eric Wei <[email protected]>
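A minimal sketch of that fail-fast check follows. FilterPlacement stands in for the PR's FilterType enum, the callback is typed over String instead of QueryBuilder to keep the example self-contained, and the exception type and message are assumptions.

```java
import java.util.function.Function;

// Hypothetical stand-in for the PR's FilterType enum.
enum FilterPlacement {
  POST,
  EFFICIENT
}

final class EfficientCallbackCheckSketch {
  private final FilterPlacement placement;
  private final Function<String, String> rebuildKnnWithFilter; // may be null for POST

  EfficientCallbackCheckSketch(
      FilterPlacement placement, Function<String, String> rebuildKnnWithFilter) {
    // Fail at construction time instead of deferring to an NPE when the filter is pushed down.
    if (placement == FilterPlacement.EFFICIENT && rebuildKnnWithFilter == null) {
      throw new IllegalArgumentException(
          "EFFICIENT filter placement requires a rebuildKnnWithFilter callback");
    }
    this.placement = placement;
    this.rebuildKnnWithFilter = rebuildKnnWithFilter;
  }

  FilterPlacement placement() {
    return placement;
  }
}
```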
ahkcs
reviewed
Apr 15, 2026
    + "LIMIT 5");

    // Efficient mode: knn rebuilt with filter inside, wrapped in WrapperQueryBuilder
    assertTrue("Explain should contain wrapper query:\n" + explain, explain.contains("wrapper"));
Collaborator
The test only asserts explain.contains("wrapper"), which would also pass for a top-k query without any filter. It doesn't verify the filter is actually embedded inside the knn JSON.
Should we add an assertion that efficient-mode explain does NOT contain "bool" / "must", or positively verify that the knn JSON contains the filter clause?
Collaborator
Author
Agreed. I strengthened the test. It now asserts efficient mode does not produce the outer bool / must post-filter shape, and it decodes the wrapper query payload to verify the embedded k-NN JSON contains the filter and predicate field.
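The strengthened checks can be approximated as below. This is a sketch, not the integration test from the PR: it assumes the explain text contains a wrapper query of the form "wrapper": {"query": "<base64>"} (possibly with escaped quotes), and the helper names decodeWrapperPayload and assertEfficientShape are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers; the real test's structure and explain format may differ.
final class WrapperPayloadSketch {
  // Finds the base64 payload that follows "wrapper" ... "query", tolerating escaped quotes.
  private static final Pattern WRAPPER_QUERY =
      Pattern.compile("wrapper\\W+query\\W+([A-Za-z0-9+/=]+)");

  static String decodeWrapperPayload(String explain) {
    Matcher m = WRAPPER_QUERY.matcher(explain);
    if (!m.find()) {
      throw new AssertionError("No wrapper query payload found in explain output");
    }
    return new String(Base64.getDecoder().decode(m.group(1)), StandardCharsets.UTF_8);
  }

  static void assertEfficientShape(String explain, String predicateField) {
    // Negative check: the outer post-filter shape (bool / must) must be absent.
    if (explain.contains("bool") || explain.contains("must")) {
      throw new AssertionError("Efficient mode should not produce a bool/must post-filter shape");
    }
    // Positive check: the decoded knn JSON must carry the filter and the predicate field.
    String knnJson = decodeWrapperPayload(explain);
    if (!knnJson.contains("filter") || !knnJson.contains(predicateField)) {
      throw new AssertionError("Decoded knn JSON should embed the filter on " + predicateField);
    }
  }
}
```

The negative check alone would still pass for an unfiltered top-k query, which is why the positive check on the decoded payload is the part that proves the filter was embedded.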
dai-chen
reviewed
Apr 16, 2026
dai-chen
reviewed
Apr 16, 2026
… test, rename constructor comment
- Reword filter_type error message to be user-friendly and actionable (no longer leaks internal ScriptQueryUnSupportedException text)
- Strengthen efficient-mode explain IT: assert no bool/must (proves not post-filter shape), decode base64 knn payload to verify filter and predicate field are embedded inside knn query
- Rename "Backward-compatible constructor" to clarify intent
Signed-off-by: Eric Wei <[email protected]>
Signed-off-by: Eric Wei <[email protected]>
ahkcs
approved these changes
Apr 16, 2026
dai-chen
approved these changes
Apr 16, 2026
231d477
into
opensearch-project:feature/vector-search-p0
36 checks passed
Summary
Adds a filter_type=post|efficient option to vectorSearch() so WHERE clauses can be placed inside the knn clause (knn.filter) for efficient pre-filtering during ANN search, or outside as bool.filter for post-filtering (default). Also adds mandatory LIMIT enforcement for radial search.

What this PR adds
FilterType enum and option parsing
- FilterType enum (POST, EFFICIENT) with fromString() validation
- filter_type added to allowed option keys in VectorSearchTableFunctionImplementation
- filter_type is stripped from options before knn JSON generation; it's a SQL-layer directive, not a knn parameter

Efficient filter pushdown
- VectorSearchQueryBuilder.pushDownFilter() branches on filter type:
  - POST (default): knn in bool.must + WHERE in bool.filter (post-filtering)
  - EFFICIENT: rebuilds knn query with WHERE embedded in knn.filter via callback
- Function<QueryBuilder, QueryBuilder> callback keeps JSON serialization in VectorSearchIndex
- buildKnnQueryJson() collapsed to accept optional filter JSON parameter, so there is no duplication

Build-time validation
- build() override rejects explicit filter_type when no filter is pushed down (either no WHERE clause at all, or the WHERE clause was not pushdownable)
- pushDownFilter() catches ScriptQueryUnSupportedException for non-pushdownable conditions:
  - With explicit filter_type: throws a clear error explaining the condition cannot be pushed down
  - Without explicit filter_type: returns false to fall back to in-memory filtering, matching base class behavior

Radial search LIMIT requirement
- Radial search (max_distance or min_score) without an explicit LIMIT clause is rejected at build time with a clear error message
- … maxResultWindow rows

Engine support
knn.filter is supported for lucene and faiss engines (HNSW, IVF). Engine compatibility is not validated by the SQL plugin; unsupported engines reject at execution time.

SQL syntax
Test plan
- ./gradlew spotlessCheck (PASS)
- ./gradlew :opensearch:test (PASS)
- ./gradlew :integ-test:integTest -Dtests.class="*VectorSearchIT" (PASS)
- ./gradlew :integ-test:integTest -Dtests.class="*VectorSearchExplainIT" (PASS)
- ./gradlew build -x integTest (PASS; full build excluding integration tests)
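To make the summary concrete, the sketch below models option parsing and the two query shapes. It is an illustration under stated assumptions, not the code in this PR: the case-insensitive parsing, the error message, the KnnQueryShapeSketch class, and the string-based signatures of buildKnnQueryJson() and buildQuery() are all hypothetical, and the real implementation wraps the knn JSON in a WrapperQueryBuilder rather than concatenating strings.

```java
import java.util.Locale;

// Sketch of the FilterType enum; the parsing rules and message are assumptions.
enum FilterType {
  POST,
  EFFICIENT;

  static FilterType fromString(String value) {
    if (value == null) {
      throw new IllegalArgumentException("filter_type must be one of: post, efficient");
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "post":
        return POST;
      case "efficient":
        return EFFICIENT;
      default:
        throw new IllegalArgumentException(
            "Unsupported filter_type '" + value + "'; expected post or efficient");
    }
  }
}

final class KnnQueryShapeSketch {
  // Builds the knn clause; filterJson is optional (null means no embedded filter).
  static String buildKnnQueryJson(String field, String vectorJson, int k, String filterJson) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"knn\":{\"").append(field).append("\":{\"vector\":").append(vectorJson);
    sb.append(",\"k\":").append(k);
    if (filterJson != null) {
      sb.append(",\"filter\":").append(filterJson); // efficient mode: filter lives in knn.filter
    }
    return sb.append("}}}").toString();
  }

  // POST: knn in bool.must with the WHERE condition in bool.filter.
  // EFFICIENT: the WHERE condition is embedded inside the knn clause.
  static String buildQuery(
      FilterType type, String field, String vectorJson, int k, String whereJson) {
    if (type == FilterType.EFFICIENT) {
      return buildKnnQueryJson(field, vectorJson, k, whereJson);
    }
    return "{\"bool\":{\"must\":["
        + buildKnnQueryJson(field, vectorJson, k, null)
        + "],\"filter\":["
        + whereJson
        + "]}}";
  }
}
```

Keeping the filter as an optional parameter of a single JSON builder mirrors the "no duplication" point above: both placements share the same knn serialization and differ only in where the WHERE condition ends up.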