feat: Add feast rag retriver functionality #5405

Merged
franciscojavierarceo merged 1 commit into feast-dev:master from Fiona-Waters:ragretriever on Jun 24, 2025

@Fiona-Waters (Contributor) commented May 30, 2025

What this PR does / why we need it:

This PR adds a FeastRAGRetriever that inherits from the Hugging Face Transformers RagRetriever class. It integrates the retriever with Feast so that RAG can be performed more easily using the Feature Store.

The following has been added:

rag_retriever.py
This module implements a custom RAG retriever by combining Feast feature store capabilities with transformer-based models. The implementation provides:

  • A custom RAG retriever (FeastRAGRetriever) that supports three search modes:
    • Text-based search
    • Vector-based search
    • Hybrid search
  • Integration with Hugging Face's transformers library and sentence-transformers
  • Configurable document formatting options
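
As a rough illustration of how the three search modes differ, the sketch below maps a mode to the inputs a retriever would hand to its vector store: text search passes the raw string, vector search passes an embedding, and hybrid search passes both. The names `build_search_args`, `SEARCH_MODES`, `query_vector`, and `toy_embed` are hypothetical and are not taken from this PR; the real FeastRAGRetriever may structure this differently.

```python
from typing import Callable, Dict, List

# Hypothetical mode names, chosen to mirror the three modes listed above.
SEARCH_MODES = ("text", "vector", "hybrid")


def build_search_args(
    query: str,
    mode: str,
    embed: Callable[[str], List[float]],
) -> Dict[str, object]:
    """Return the inputs a retriever would pass on for the given mode."""
    if mode not in SEARCH_MODES:
        raise ValueError(f"unknown search mode: {mode!r}")
    args: Dict[str, object] = {}
    if mode in ("text", "hybrid"):
        # Text search matches on the raw query string.
        args["query_string"] = query
    if mode in ("vector", "hybrid"):
        # Vector search needs the query embedded first.
        args["query_vector"] = embed(query)
    return args


def toy_embed(text: str) -> List[float]:
    # Stand-in for a sentence-transformers model; not a real embedding.
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    for mode in SEARCH_MODES:
        print(mode, build_search_args("what is feast", mode, toy_embed))
```

In practice `embed` would be backed by an actual embedding model, and the resulting arguments would be forwarded to the vector store.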

vector_store.py
This module implements a FeastVectorStore class. The FeastVectorStore takes a FeatureStore, a FeatureView, and a list of features, and uses them to query the store via the existing retrieve_online_documents_v2 function.
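
The delegation described above can be sketched as a thin wrapper that stores the three inputs and forwards each query to the store. This is a minimal sketch, not the PR's FeastVectorStore: `SketchVectorStore` and `FakeStore` are hypothetical names, and the keyword arguments passed to `retrieve_online_documents_v2` (`features`, `query`, `query_string`, `top_k`) are assumptions that should be checked against the Feast API before use.

```python
from typing import Any, Dict, List, Optional


class SketchVectorStore:
    """Illustrative wrapper around a feature store; not the PR's class."""

    def __init__(self, store: Any, feature_view: Any, features: List[str]) -> None:
        self.store = store
        self.feature_view = feature_view  # kept for reference; unused in this sketch
        self.features = features

    def query(
        self,
        query_vector: Optional[List[float]] = None,
        query_string: Optional[str] = None,
        top_k: int = 10,
    ) -> Any:
        if query_vector is None and query_string is None:
            raise ValueError("provide query_vector, query_string, or both")
        # Keyword names here are assumptions about retrieve_online_documents_v2.
        return self.store.retrieve_online_documents_v2(
            features=self.features,
            query=query_vector,
            query_string=query_string,
            top_k=top_k,
        )


class FakeStore:
    """Stand-in for a FeatureStore that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def retrieve_online_documents_v2(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"received": kwargs}


if __name__ == "__main__":
    fake = FakeStore()
    vs = SketchVectorStore(fake, "docs_view", ["docs_view:vector", "docs_view:text"])
    print(vs.query(query_string="feast", top_k=3))
```

Swapping `FakeStore` for a real FeatureStore is the only change the wrapper itself would need, assuming the keyword names match.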

setup.py
Adds entries for the RAG dependencies, allowing installation with pip install feast[rag].

__init__.py
Adds entries for the new classes in rag_retriever.py and vector_store.py.

pyproject.toml
The dependencies required for the RAG implementation have been added here.
The same dependencies have also been added to the corresponding requirements.txt files.

feature_store.py
Changes were made to the _validate_vector_features function. The updated function now validates the length of each individual vector within a specified DataFrame column, rather than checking the total number of DataFrame columns. This ensures that every vector matches its expected dimension, improving data integrity and error reporting. The change was made following the addition of unit tests.
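
To make the distinction concrete, here is a minimal sketch of a per-vector length check. It is not the code from feature_store.py: `validate_vector_column` is a hypothetical name, and a plain dict of column lists stands in for a DataFrame so the example has no dependencies.

```python
from typing import Dict, List, Sequence


def validate_vector_column(
    columns: Dict[str, List[Sequence[float]]],
    column: str,
    expected_dim: int,
) -> None:
    """Raise ValueError if any vector in `column` is not `expected_dim` long."""
    for row, vector in enumerate(columns[column]):
        if len(vector) != expected_dim:
            raise ValueError(
                f"row {row} of column {column!r} has length {len(vector)}, "
                f"expected {expected_dim}"
            )


# Two columns, each row holding a 3-dimensional vector. A check based on the
# number of columns (2) would wrongly disagree with the expected dimension (3).
good_table = {
    "title_vector": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "body_vector": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
}

# Same shape, but the second row of body_vector is one element short.
bad_table = {
    "title_vector": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "body_vector": [[1.0, 0.0, 0.0], [0.0, 1.0]],
}

if __name__ == "__main__":
    validate_vector_column(good_table, "body_vector", expected_dim=3)
    try:
        validate_vector_column(bad_table, "body_vector", expected_dim=3)
    except ValueError as err:
        print(err)
```

Reporting the offending row and the observed length is what makes the error actionable, which is presumably the error-reporting improvement the description refers to.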

Unit tests
Unit tests have been included to cover the added functionality, along with some minor changes in the existing example_feature_repo_1.py file.

examples/rag-retriever
An example has been added here, including a feature_repo, a README, a low-level design image, and an example notebook. It currently includes an implementation on Red Hat OpenShift with a remote Milvus instance.

.github/workflows
The change here is a small update suggested by @ntkathole for torch installation and error handling.

Which issue(s) this PR fixes:

Related to #5391. get_top_docs has not been implemented here because the FeastIndex is a dummy index; retrieval is handled by the FeastRAGRetriever through the FeastVectorStore, which uses retrieve_online_documents_v2.

Misc

@Fiona-Waters Fiona-Waters changed the title [WIP] feat: Add feast rag retriver functionality feat: Add feast rag retriver functionality May 30, 2025
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 4 times, most recently from 0abf934 to fc99dd0 Compare May 30, 2025 22:47
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 14 times, most recently from 069fe93 to 6873c84 Compare June 5, 2025 12:28
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 6 times, most recently from ec5ba98 to 40dfde0 Compare June 5, 2025 19:15
@jyejare (Collaborator) left a comment

Great Progress, few comments.

@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 6 times, most recently from 7980963 to 04fd031 Compare June 17, 2025 11:37
@jyejare (Collaborator) commented Jun 17, 2025

LGTM. Please address @ntkathole comments and fix unit tests!

@Fiona-Waters (Contributor, Author) commented

> LGTM. Please address @ntkathole comments and fix unit tests!

Thanks @jyejare, I have addressed all comments now.
The unit tests are still failing:

FAILED sdk/python/tests/unit/infra/online_store/test_dynamodb_online_store.py::test_dynamodb_online_store_update - botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the ListTagsOfResource operation: The security token included in the request is invalid.
FAILED sdk/python/tests/unit/infra/online_store/test_dynamodb_online_store.py::test_dynamodb_online_store_update_tags - botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the ListTagsOfResource operation: The security token included in the request is invalid.

I'm not sure if it is related to changes that I have made. Will look into it further.

@Fiona-Waters Fiona-Waters requested a review from ntkathole June 17, 2025 13:17
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 2 times, most recently from 099ae82 to 6d4c12d Compare June 17, 2025 15:17
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 3 times, most recently from 7b3ad36 to 6a9708d Compare June 18, 2025 16:12
@ntkathole (Member) left a comment

Looks good 👍

@astefanutti left a comment

LGTM, awesome work!

Co-authored-by: Esa Fazal <[email protected]>

Signed-off-by: Fiona Waters <[email protected]>

@franciscojavierarceo (Member) left a comment

This is great! Thank you @Fiona-Waters !! Can we add a link to this in the examples in the documentation? Can be done in a follow up PR.

7 participants