
VectorSearch.js

A library to perform semantic vector search over millions of vectors in milliseconds, which can even visualize the tokens or embeddings too! It runs entirely client side in the web browser (a custom vector DB layer written on top of IndexedDB) and currently supports Google's EmbeddingGemma model via Web AI libraries, with WebGPU acceleration for speed.

🦾 Because it runs in the browser on YOUR hardware, it's totally private, costs zero dollars to use (other than your own electricity), and has super low latency.

🤖 Powered by WebGPU for speed, it builds upon no fewer than 3 popular Web ML libraries and runtimes to get the best bits of all of them: LiteRT.js, Transformers.js, and TensorFlow.js.

⭐ Give it a star on GitHub if you want me to keep evolving the code or if you have ideas.

Show me a demo that works already

Sure, check out my CodePen demo here!

Here's a screenshot of it in action:

Screenshot of VectorSearch.js in action

Got questions?

Reach out to me over on LinkedIn or follow for updates on related client side Web AI projects.

Usage

import { VectorSearch } from 'https://cdn.jsdelivr.net/gh/jasonmayes/VectorSearch.js@main/VectorSearch-min.js';

// Configuration.
const MODEL_URL = 'model/embeddinggemma-300M_seq1024_mixed-precision.tflite';  // Location of hosted EmbeddingGemma TFLite file.
const TOKENIZER_ID = 'onnx-community/embeddinggemma-300m-ONNX';  // Transformers.js Tokenizer to use.
const SEQ_LENGTH = 1024; // EmbeddingGemma version sequence length.
let vectorSearch = undefined;

// Initialization and usage example.
async function init(statusDomElement) {
  vectorSearch = new VectorSearch(MODEL_URL, TOKENIZER_ID, SEQ_LENGTH);
  await vectorSearch.load('wasm/', statusDomElement); // Location of hosted LiteRT.js Wasm runtime files (see demo).

  await store(['I love Web AI', 'I like cats', 'Dogs are cool too', 'AI rocks', 'Birds can fly', 'Web AI is client side AI', 'Fish can swim', 'Robots are neat', 'JavaScript rocks too!', 'and so on']);
  await find('Likes animals', 0.25);
}

init(); // Optionally pass a DOM element here to receive loading status updates.


// How to store text in the client side vector DB.
async function store(someArrayOfStrings) {
  await vectorSearch.storeTexts(someArrayOfStrings, 'DatabaseNameForThisData');
  // Optionally specify an HTML DOM element to write status updates to:
  // await vectorSearch.storeTexts(someArrayOfStrings, 'DatabaseNameForThisData', STATUS_EL);
}


// Search example.
async function find(queryText, cosineSimilarityThreshold) {
  const {embedding: EMBEDDING_DATA, tokens: TOKENS} = await vectorSearch.getEmbedding(queryText);

  /** Optional: Visualize embeddings and tokens for the search query text.
  vectorSearch.renderTokens(TOKENS, SOME_DOM_ELEMENT);
  await vectorSearch.renderEmbedding(EMBEDDING_DATA, SOME_DOM_ELEMENT_FOR_VISUAL, SOME_DOM_ELEMENT_FOR_TEXT);
  **/

  // Now actually search the vector database.
  const {results: RESULTS, bestScore: BEST_SCORE, bestIndex: BEST_INDEX} = await vectorSearch.search(EMBEDDING_DATA, cosineSimilarityThreshold, 'DatabaseNameForThisData');

  if (RESULTS.length > 0) {  
    const BEST_MATCH_VECTOR = RESULTS[BEST_INDEX].vector;
    const BEST_MATCH_SCORE = RESULTS[BEST_INDEX].score;
    const BEST_MATCH_TEXT = RESULTS[BEST_INDEX].text;
    if (BEST_MATCH_TEXT) {
      console.log(BEST_MATCH_SCORE + ': ' + BEST_MATCH_TEXT);
      // Logs: 0.7519992589950562: I like cats.
    }
  } else {
    console.log('No matches found above threshold.');
  }
}

Performance

I tried to make this as fast as I could. I have tested with 100K vectors on my very old NVIDIA 1070 GPU and it can search those in tens of milliseconds. The largest cost is actually the embedding itself, which takes around 300ms using the EmbeddingGemma model (high quality but large). You may want to swap this out for a leaner embedding model (e.g. all-MiniLM-L6-v2, which Transformers.js also supports) for the ultimate client side embedding speed. If there is enough demand I can add support for that too, so just open a bug.

Currently it is designed to preload the IndexedDB vector DB I wrote (yes, even the vector DB is client side) into GPU memory, so that calculating cosine similarity between your target text and all stored vectors is as fast as possible. That means the first search you perform will be slower, as it has to transfer memory from CPU to GPU for the first time (I suggest doing a dummy vector search on page load to warm up). It also means that searching 100K vectors currently takes roughly the SAME time as searching 1K vectors, thanks to the GPU. I have not yet found the upper bound, but there is obviously a limit here depending on your GPU type, VRAM size, etc. I will later need to refactor to load in chunks to avoid issues with larger client side vector stores.
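To make the cosine similarity scan concrete, here is a minimal CPU reference sketch of the same idea in plain JavaScript. This is NOT the library's implementation (VectorSearch.js does this work on the GPU via TensorFlow.js), and the function names and result shape here are illustrative only, loosely mirroring the search() usage shown earlier.

```javascript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force scan over all stored vectors, keeping those at or
// above the threshold and tracking the best match.
function bruteForceSearch(queryVector, storedVectors, threshold) {
  const results = [];
  let bestScore = -Infinity;
  let bestIndex = -1;
  for (const vector of storedVectors) {
    const score = cosineSimilarity(queryVector, vector);
    if (score >= threshold) {
      if (score > bestScore) {
        bestScore = score;
        bestIndex = results.length;
      }
      results.push({ vector, score });
    }
  }
  return { results, bestScore, bestIndex };
}
```

On the GPU, the whole scan collapses into one matrix-vector multiply over normalized vectors, which is why 100K vectors cost about the same as 1K.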

I have verified this works on Intel integrated GPUs, NVIDIA, AMD, and Apple M-series GPUs in any web browser that supports WebGPU (most of them do now).

Building and serving yourself

To build the minified version of the library from the src folder, just run:

npm run build

Then to serve the demo folder to try it out on your own webserver run:

npm run demo

Please note that currently script.js in the demo/js folder imports the latest version of VectorSearch-min.js from this GitHub repo, so change the import if you modify anything or want to host it somewhere else.

Please also see below for things you need to host yourself to run on your own server.

Things to be aware of before hosting and running yourself

This project depends on a few things that need to be set up to work.

LiteRT.js Wasm files required

See the demo folder in this repo, which contains a "wasm" subfolder with all the WebAssembly files needed for the LiteRT.js runtime. You will need to serve these files yourself to use the library. If you are curious to learn more about these files, see the official LiteRT.js documentation.

By default the library assumes this "wasm" folder exists in the www root at "wasm/".

If your hosted version is not in the same location, update the call to VECTOR_SEARCH.load() to specify the new Wasm folder location on your webserver as follows:

await VECTOR_SEARCH.load('wasm/');

Note that when you call load you can also optionally specify an HTML element to render loading status updates to, like this:

await VECTOR_SEARCH.load('wasm/', STATUS_EL);

EmbeddingGemma model

This repo uses Google's EmbeddingGemma model as the embedding model. Specifically this one: embeddinggemma-300M_seq1024_mixed-precision.tflite

This model is available on HuggingFace, and you must download it yourself manually so that any applicable terms and conditions are accepted. You can then place the downloaded model into the demo/model folder. If you place it somewhere else, update the code in script.js accordingly:

const MODEL_URL = 'model/embeddinggemma-300M_seq1024_mixed-precision.tflite';

For more details see the model card page on HuggingFace.

This is a LiteRT.js-compatible Web AI EmbeddingGemma model in the tflite model format.

Shoutouts

This project was made by Jason Mayes, and is made possible by combining 3 amazing Web AI (client side AI) libraries and runtimes.

Huge Kudos to:

  1. LiteRT.js for running Google's EmbeddingGemma model.
  2. Transformers.js for running the tokenizer.
  3. TensorFlow.js for the WebGPU accelerated mathematics (yes, machine learning libraries can be used to do maths!), along with the pre/post processing of any Tensors that go into or come out of LiteRT.js, for speed.

About

Client side vector search using EmbeddingGemma with Web AI (LiteRT.js, TensorFlow.js, and Transformers.js)
