I created an issue for the refactor: Remove unit_primitive references from ActiveCon... (#592907)
cc @maddievn - sorry, I meant to post an explanation for this one but forgot
Explanation:
The unit_primitive parameter here is mainly to make the initialization call from ActiveContext::EmbeddingModel.generate_embeddings work.
But ideally, we should treat the unit_primitive in the Gitlab::Llm::Embeddings::ModelDefinition object as the source of truth. This would follow the approach in the rest of the LLM module, where the source of truth for the unit primitive is within the LLM classes.
In a later refactor, we can remove the unit_primitive references in ActiveContext classes except for the ModelSelector.
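To illustrate the direction described above, here is a minimal sketch, assuming the model definition exposes a unit_primitive reader. ModelDefinitionSketch and EmbeddingModelSketch are hypothetical stand-ins, not the real Gitlab::Llm::Embeddings::ModelDefinition or ActiveContext::EmbeddingModel classes, and the attribute names are assumptions.

```ruby
# Hypothetical stand-in for Gitlab::Llm::Embeddings::ModelDefinition.
# The real class's attributes and constructor are not shown in this thread.
ModelDefinitionSketch = Struct.new(:model_name, :unit_primitive, keyword_init: true)

# Hypothetical stand-in for an ActiveContext embedding model wrapper.
class EmbeddingModelSketch
  def initialize(model_definition:)
    @model_definition = model_definition
  end

  # No unit_primitive parameter: the value is read from the definition,
  # which acts as the single source of truth.
  def unit_primitive
    @model_definition.unit_primitive
  end

  def generate_embeddings(contents)
    # Placeholder for the real LLM call; returns one tagged entry per input.
    contents.map { |content| { content: content, unit_primitive: unit_primitive } }
  end
end

definition = ModelDefinitionSketch.new(
  model_name: 'example-model',
  unit_primitive: 'example_unit_primitive'
)
model = EmbeddingModelSketch.new(model_definition: definition)
result = model.generate_embeddings(['def foo; end'])
```

With this shape, callers never pass unit_primitive themselves, so the value cannot drift between the ActiveContext classes and the LLM classes.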
once we know how customers will be billed, we need to implement it. As with everything in AI-land, it's really hard to follow how things are implemented. Can we get someone with expertise on DAP billing to help out with what needs to be done?
@maddievn - There's an issue for implementing the billing in https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/work_items/1985+
I've written out what I know needs to be done + the unknowns, which are pretty major, and there may be more unknowns I'm not aware of. We definitely need someone who could guide us with this.
Order is relevant.
Note: I've updated the test setups for Collections::Code, References::Code, and Queries::Code so that these specs have no knowledge of the specific LLM class being used (Ai::ActiveContext::Embeddings::Code::VertexText vs ::Gitlab::Llm::Embeddings::CodeEmbeddings). The tests were getting a bit unwieldy and brittle because the specs were tied to those classes.
cc @maddievn
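As a rough sketch of the idea behind the test setup note, assuming the class under test can receive its embeddings generator from outside: CodeCollectionSketch and fake_generator are hypothetical names for illustration, and the real specs may be structured differently.

```ruby
# Hypothetical collection class: it receives its embeddings generator
# instead of naming a concrete LLM class.
class CodeCollectionSketch
  def initialize(embeddings_generator:)
    @embeddings_generator = embeddings_generator
  end

  def embed(contents)
    @embeddings_generator.call(contents)
  end
end

# Test-side fake: any object responding to #call works, so the test
# does not need to know which LLM class is used in production.
fake_generator = ->(contents) { contents.map { [0.1, 0.2, 0.3] } }

collection = CodeCollectionSketch.new(embeddings_generator: fake_generator)
vectors = collection.embed(['a', 'b'])
```

Swapping the LLM class then requires no changes to a test written this way.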
Pam Artiaga (4bb9d9a2) at 10 Mar 06:53
refactor: propagate bad request errors to client
Thanks for this summary @maddievn
I also posted a question in gitlab#536642 (comment 3144609929) about us possibly moving all Duo Self-hosted support (including GitLab-operated models) to post-GA altogether.
A lot of the prep work we've done for Duo Self-hosted support is also a foundation for billing tracking (which imo should be a non-negotiable requirement for GA), so we are not throwing away completed work.
In any case, that question is something for everyone to weigh in on and I'm OK with it either way, but the outcome of that discussion would also influence the GA/post-GA list here.
Pam Artiaga (c41ef31c) at 10 Mar 04:50
Remove llm class references in ActiveContext tests
Pam Artiaga (29a95c1a) at 10 Mar 04:28
Resolve test failures in ActiveContext classes
[ActiveContext SM]: UI for creating connection (#585318) could not be a blocker. This is just the frontend for self-hosted,
@arturoherrero - I thought we had settled on including Duo Self-hosted support for GA? IMO, it would make sense to have the frontend ready for GA, instead of asking customers to configure their vector stores through the console.
Anyway, this does bring up the same question as #592744 (closed) about what we should really include for GA.
The GA requirements I listed above can be broken down into 3 groups: Billing, Duo Self-hosted support, and outstanding fixes/investigations.
Shall we go ahead and drop Duo Self-hosted support from the GA requirements? This does not change the timeline of when we can get it done, but it means we can go to GA earlier if we also make Semantic Code Search available outside of MCP.
WDYT @maddievn @arturoherrero @tgao3701908 @changzhengliu @mnohr
The order is important.
Is the billing/licensing model for semantic search calls resolved?
No, I still have an outstanding question regarding how to bill embeddings requests for indexing for SM customers, see https://gitlab.com/gitlab-org/gitlab/-/work_items/586372#note_3141209402.
Does exposing via REST (outside of Duo) change the billing consideration?
It shouldn't.
cc @dgruzd @tgao3701908 @changzhengliu @maddievn @arturoherrero
Regarding this point:
Unblock compliance-restricted customers without waiting for MCP to reach GA
The MCP Server is not the only reason that Semantic Code Search is still in Beta. I have listed the requirements for GA in #536642 (comment 3133196028), and we have committed to a completion timeline that corresponds with the MCP Server going to GA.
This means that even if we add Semantic Code Search to a REST API, we still need to complete the issues in #536642 (comment 3133196028) to reach GA and thus make it available for compliance-restricted customers.
(There may be some work items we can move to post-GA, but we can continue that discussion in #536642 (comment 3133196028).)
Update:
I had to do a few rebases due to merge conflicts.
There is also a failure for ingest:dry-run, but that looks like a CI resource problem.
Pam Artiaga (473b02f6) at 09 Mar 07:06
Add tests and fix functionalities
Pam Artiaga (bb4c7100) at 09 Mar 04:31
Add MR url for feature flag config
Pam Artiaga (bfc5a134) at 09 Mar 04:30
refactor: add handling for empty response data
... and 7 more commits
Pam Artiaga (e38fface) at 09 Mar 04:29
Introduce generic embeddings llm class