feat(memory): add gemini-embedding-2-preview support #42501

Merged
gumadeiras merged 7 commits into openclaw:main from BillChirico:feat/gemini-embedding-2-preview
Mar 11, 2026

@BillChirico
Contributor

Summary

Adds support for gemini-embedding-2-preview as an embedding model option alongside the existing gemini-embedding-001 default.

Changes

  • src/memory/embedding-model-limits.ts: Add 8192 token limit for gemini-embedding-2-preview
  • src/memory/embeddings-gemini.ts:
    • Add outputDimensionality support (768/1536/3072, default 3072)
    • Add taskType parameter for semantic retrieval optimization
    • Add multimodal part builders (buildInlineDataPart, buildFileDataPart, buildGeminiParts)
    • Add isGeminiEmbedding2Model() and resolveGeminiOutputDimensionality() helpers
  • src/memory/embeddings.ts: Add outputDimensionality and taskType to EmbeddingProviderOptions
  • docs/concepts/memory.md: Document new model option with config example and re-index warning
  • src/memory/embeddings-gemini.test.ts: Comprehensive test coverage (26 tests)
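
A minimal sketch of how the two helpers named above might behave. The function names come from this description; the bodies, the prefix-based model detection, and the decision to throw on unsupported sizes are assumptions for illustration, not the merged implementation.

```typescript
// Hypothetical sketch: the function names come from the PR description,
// but the bodies below are assumptions, not the merged implementation.
const GEMINI_EMBEDDING_2_DIMENSIONS: readonly number[] = [768, 1536, 3072];
const GEMINI_EMBEDDING_2_DEFAULT_DIMENSIONS = 3072;

// Assumption: v2 models are recognised by name prefix, with or without "models/".
function isGeminiEmbedding2Model(model: string): boolean {
  const name = model.trim().replace(/^models\//, "");
  return /^gemini-embedding-2/.test(name);
}

// Returns undefined for non-v2 models so callers can omit the field entirely.
function resolveGeminiOutputDimensionality(
  model: string,
  requested?: number,
): number | undefined {
  if (!isGeminiEmbedding2Model(model)) return undefined;
  if (requested === undefined) return GEMINI_EMBEDDING_2_DEFAULT_DIMENSIONS;
  if (GEMINI_EMBEDDING_2_DIMENSIONS.indexOf(requested) === -1) {
    // Assumption: unsupported sizes are rejected rather than silently rounded.
    throw new Error(`Unsupported outputDimensionality: ${requested}`);
  }
  return requested;
}
```

Returning `undefined` for gemini-embedding-001 keeps the legacy request shape untouched, which matches the backward-compatibility goal stated in this PR.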

Key Features

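|              | gemini-embedding-001 | gemini-embedding-2-preview                     |
|--------------|----------------------|------------------------------------------------|
| Dimensions   | 768                  | 3072 (configurable: 768, 1536, 3072)           |
| Input tokens | 2048                 | 8192                                           |
| Multimodal   |                      | ✅ (text, image, video, audio, PDF)            |
| Task types   |                      | ✅ (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.) |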
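
The input-token figures above could be looked up roughly as follows. This is a sketch only: the constant and function names and the fallback value are hypothetical and do not reflect the actual contents of src/memory/embedding-model-limits.ts.

```typescript
// Hypothetical sketch of a per-model input token limit lookup.
// The two limits are taken from the comparison above; everything else is assumed.
const EMBEDDING_INPUT_TOKEN_LIMITS: Record<string, number> = {
  "gemini-embedding-001": 2048,
  "gemini-embedding-2-preview": 8192,
};

// Assumption: unknown models fall back to the smaller, more conservative limit.
const FALLBACK_INPUT_TOKEN_LIMIT = 2048;

function resolveInputTokenLimit(model: string): number {
  const known = Object.prototype.hasOwnProperty.call(
    EMBEDDING_INPUT_TOKEN_LIMITS,
    model,
  );
  return known ? EMBEDDING_INPUT_TOKEN_LIMITS[model] : FALLBACK_INPUT_TOKEN_LIMIT;
}
```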

Backward Compatibility

  • gemini-embedding-001 behavior unchanged (no outputDimensionality or taskType sent)
  • Default model remains gemini-embedding-001
  • Existing configs work without modification
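
As an illustration of the gating described above, a request builder might add the new fields only for the v2 model. The `model`, `content`, `taskType`, and `outputDimensionality` fields follow the Gemini `embedContent` request shape; the function name, option names, and prefix check are hypothetical and not taken from the PR.

```typescript
// Hypothetical sketch: builds an embedContent-style request body and only
// adds the new fields for gemini-embedding-2 models.
type GeminiTaskType = "RETRIEVAL_QUERY" | "RETRIEVAL_DOCUMENT";

interface EmbedContentBody {
  model: string;
  content: { parts: Array<{ text: string }> };
  taskType?: GeminiTaskType;
  outputDimensionality?: number;
}

interface EmbedOptions {
  taskType?: GeminiTaskType;
  outputDimensionality?: number;
}

function buildEmbedContentBody(
  model: string,
  text: string,
  options: EmbedOptions = {},
): EmbedContentBody {
  const body: EmbedContentBody = {
    model: `models/${model}`,
    content: { parts: [{ text }] },
  };
  // Assumption: v2 models are detected by name prefix.
  if (/^gemini-embedding-2/.test(model)) {
    body.outputDimensionality = options.outputDimensionality ?? 3072;
    if (options.taskType !== undefined) body.taskType = options.taskType;
  }
  // For gemini-embedding-001 the body is returned without the new fields.
  return body;
}
```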

Test Plan

  • model: "gemini-embedding-2-preview" produces valid text embeddings
  • Token limit resolved as 8192 for the new model
  • outputDimensionality defaults to 3072; configurable via options
  • taskType passed appropriately for write vs. search operations
  • Multimodal part builders produce correct shapes
  • Existing gemini-embedding-001 behavior unchanged
  • Unit tests cover all new functionality (26 tests)
  • Lint passes (0 warnings, 0 errors)
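
The write-versus-search split in the test plan can be pictured with a small resolver. This is a sketch under assumptions: the `MemoryOperation` type, the function name, and the rule that an explicitly configured task type takes precedence are all hypothetical.

```typescript
// Hypothetical sketch: choose a Gemini task type from the memory operation.
type MemoryOperation = "write" | "search";
type RetrievalTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

function taskTypeForOperation(
  operation: MemoryOperation,
  configured?: RetrievalTaskType,
): RetrievalTaskType {
  // Assumption: a user-configured task type wins over the per-operation default.
  if (configured !== undefined) return configured;
  // Indexing (write) embeds documents; search embeds the query.
  return operation === "search" ? "RETRIEVAL_QUERY" : "RETRIEVAL_DOCUMENT";
}
```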

Closes #42487

- Add gemini-embedding-2-preview to supported embedding models
- Support outputDimensionality (768/1536/3072, default 3072) for v2 models
- Support taskType parameter for semantic retrieval optimization
- Add multimodal part builders (buildInlineDataPart, buildFileDataPart)
- Set 8192 token limit for gemini-embedding-2-preview
- Maintain backward compatibility for gemini-embedding-001 (no new fields)
- Add comprehensive test coverage (26 tests)

Closes openclaw#42487
@openclaw-barnacle openclaw-barnacle bot added docs, size: L labels Mar 10, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adds gemini-embedding-2-preview as a supported Gemini embedding model, bringing configurable output dimensionality (768/1536/3072), extended task types, multimodal part builders, and an 8192-token input limit. The implementation is well-structured and backward-compatible — existing gemini-embedding-001 configs are unaffected, and outputDimensionality is correctly gated on the v2 model flag in both inline and batch paths.

Key findings:

  • CHANGELOG.md: The gemini-embedding-2-preview entry replaces a Discord/auto-threads entry (feat(discord): add autoArchiveDuration config option #35065) rather than being added alongside it, causing a data-loss issue where a contribution from another author is silently removed from the public changelog.
  • manager-embedding-ops.ts async batch path: taskType is hardcoded to "RETRIEVAL_DOCUMENT" and cannot be overridden by user config, because taskType is not stored on the MemoryIndexManager class (unlike outputDimensionality, which is preserved on GeminiEmbeddingClient). Using the default is correct for indexing, but the inconsistency with the inline path means a user-configured taskType would silently be ignored when remote.batch.enabled = true.
  • The computeProviderKey correctly includes outputDimensionality, so changing dimensions will trigger automatic re-indexing as documented.
  • Test coverage is thorough (26 tests across helpers, model detection, dimension resolution, backward compat, and all new behaviors).
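
To illustrate why including outputDimensionality in the provider key triggers re-indexing, here is a minimal sketch. Only the name `computeProviderKey` appears in the review; the input shape, the JSON-based key format, and the `needsReindex` helper are assumptions and do not describe the real function.

```typescript
// Hypothetical sketch: a provider key that changes whenever the embedding
// space changes, so stored vectors are never compared across dimensions.
interface ProviderKeyInput {
  provider: string;
  model: string;
  outputDimensionality?: number;
}

function computeProviderKeySketch(input: ProviderKeyInput): string {
  // Fixed field order keeps the key stable for identical settings.
  return JSON.stringify([
    input.provider,
    input.model,
    input.outputDimensionality ?? null,
  ]);
}

// Assumption: the index stores the key it was built with and is rebuilt
// when the current settings produce a different key.
function needsReindex(
  storedKey: string | undefined,
  current: ProviderKeyInput,
): boolean {
  return storedKey !== computeProviderKeySketch(current);
}
```

Under this scheme, switching gemini-embedding-2-preview from 3072 to 768 dimensions yields a different key, and the existing index is treated as stale.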

Confidence Score: 2/5

  • Code changes are sound and well-tested, but PR has two integration issues (CHANGELOG data loss and taskType configuration gap) that should be resolved before merge.
  • The embedding implementation itself is solid with comprehensive tests (26 tests), proper backward compatibility, and correct outputDimensionality handling in both code paths. However, the PR introduces two concrete issues: (1) CHANGELOG.md loses a prior contribution (feat(discord): add autoArchiveDuration config option #35065), which is a real data-integrity problem affecting attribution; (2) async batch path silently drops user-configured taskType, creating a configuration inconsistency where the same setting behaves differently depending on remote.batch.enabled. Both are fixable but should be addressed before merge.
  • CHANGELOG.md (must restore the Discord entry for #35065, feat(discord): add autoArchiveDuration config option); src/memory/manager-embedding-ops.ts (address taskType configuration gap in async batch path)

Last reviewed commit: 38ddfb3

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b21f452df8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bf605cd4e

@gumadeiras gumadeiras self-assigned this Mar 11, 2026
@openclaw-barnacle openclaw-barnacle bot added channel: bluebubbles, channel: discord, channel: googlechat, channel: imessage, channel: line, channel: matrix, channel: mattermost, channel: nextcloud-talk, channel: nostr, channel: slack, channel: telegram, channel: tlon, channel: whatsapp-web, channel: zalo, channel: zalouser, app: web-ui, gateway, commands, agents, channel: irc labels Mar 11, 2026
@openclaw-barnacle openclaw-barnacle bot added size: L and removed channel: zalouser, app: web-ui, gateway, commands, channel: irc, extensions: acpx, size: XL labels Mar 11, 2026
@BillChirico BillChirico force-pushed the feat/gemini-embedding-2-preview branch from f829279 to 38ddfb3 Compare March 11, 2026 01:40
@BillChirico
Contributor Author

@greptile-apps review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38ddfb3c98

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ae7eeee37

@BillChirico
Contributor Author

I think I fixed all of the codex merging mess. Let me know if I missed anything. This PR will greatly improve embeddings.

@tristanmanchester
Contributor

This doesn't appear to actually wire multimodal inputs into OpenClaw’s memory pipeline: it just adds some multimodal-shaped helper types and builders in src/memory/embeddings-gemini.ts.

Is this intentional? You might want to specify that this is text-only unless further work is done.

@gumadeiras gumadeiras force-pushed the feat/gemini-embedding-2-preview branch from 8ae7eee to c57b1f8 Compare March 11, 2026 18:14
@gumadeiras
Member

All supported memory files are text (.md) for now, so that is intentional; adding support for non-text files is a bigger change and is coming in a follow-up PR.
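
For context on the helpers under discussion, the part builders might look roughly like this. The builder names come from the PR description, and the `text`, `inlineData`, and `fileData` part shapes follow the Gemini API; the signatures and bodies are assumptions, and nothing in this sketch feeds non-text files into the memory pipeline.

```typescript
// Hypothetical sketch: the builder names come from the PR description,
// the signatures and bodies are assumed.
type GeminiPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } };

// Wraps base64-encoded bytes (for example an image) as an inline part.
function buildInlineDataPart(mimeType: string, base64Data: string): GeminiPart {
  return { inlineData: { mimeType, data: base64Data } };
}

// References a previously uploaded file by URI instead of embedding the bytes.
function buildFileDataPart(mimeType: string, fileUri: string): GeminiPart {
  return { fileData: { mimeType, fileUri } };
}

// Assumption: text comes first, followed by any optional non-text parts.
function buildGeminiParts(text: string, extra: GeminiPart[] = []): GeminiPart[] {
  const parts: GeminiPart[] = [{ text }];
  return parts.concat(extra);
}
```

With no extra parts, `buildGeminiParts` yields a single text part, which is the only shape the current text-only memory files would need.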

@gumadeiras gumadeiras merged commit 60aed95 into openclaw:main Mar 11, 2026
29 checks passed
@gumadeiras
Member

Merged via squash.

Thanks @BillChirico!

dhoman pushed a commit to dhoman/chrono-claw that referenced this pull request Mar 11, 2026
Merged via squash.

Prepared head SHA: c57b1f8
Co-authored-by: BillChirico <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras
hydro13 pushed a commit to hydro13/openclaw that referenced this pull request Mar 12, 2026
Merged via squash.

Prepared head SHA: c57b1f8
Co-authored-by: BillChirico <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras
Ruijie-Ysp pushed a commit to Ruijie-Ysp/clawdbot that referenced this pull request Mar 12, 2026
Merged via squash.

Prepared head SHA: c57b1f8
Co-authored-by: BillChirico <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras
plabzzxx pushed a commit to plabzzxx/openclaw that referenced this pull request Mar 13, 2026
Merged via squash.

Prepared head SHA: c57b1f8
Co-authored-by: BillChirico <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras
Interstellar-code pushed a commit to Interstellar-code/operator1 that referenced this pull request Mar 16, 2026
Merged via squash.

Prepared head SHA: c57b1f8
Co-authored-by: BillChirico <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras

(cherry picked from commit 60aed95)

Labels

agents, docs, size: L

Development

Successfully merging this pull request may close these issues.

feat(memory): add gemini-embedding-2-preview as supported embedding model

3 participants