Inspiration
Ok, so the night before, I was representing Durham University in the 48-hour IndySCC hackathon, which is still ongoing, so I'm competing in both at the same time. In my sleep-deprived state, I thought it would be reasonable to port GPT-2 to run in the browser, entirely on the CPU.
What it does
It runs an implementation of GPT-2 entirely in your browser.
How I built it
I compiled a simple C implementation of GPT-2 with the Emscripten compiler toolchain.
Challenges we ran into
The model was too large to fit in the default amount of memory a WebAssembly process is allowed, and I had to increase the stack size too. I also had to embed the model into the compiled WebAssembly binary, since there is no obvious way to interact with a filesystem from the browser. The more sensible option would be to load the model from JS and export a function that lets JS code supply a new model, but that would have taken time I do not have.
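The fixes above boil down to a few Emscripten settings. A minimal sketch of the build command; `gpt2.c` and `gpt2_124M.bin` are placeholder names, not the project's actual files:

```shell
# Hypothetical build command; gpt2.c and gpt2_124M.bin stand in for the
# project's actual source and model checkpoint.
#
#   ALLOW_MEMORY_GROWTH  lets the wasm heap grow beyond the small default
#   STACK_SIZE           raises the default stack, which is far too small
#                        for the model's activations
#   --embed-file         bakes the checkpoint into the binary, sidestepping
#                        the lack of a real filesystem in the browser
emcc gpt2.c -O2 -o gpt2.html \
  -sALLOW_MEMORY_GROWTH=1 \
  -sSTACK_SIZE=8MB \
  --embed-file gpt2_124M.bin
```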
GPT-2 successfully runs and produces output both outside the browser (using the Node.js runtime) and in the browser in Chrome and Firefox. However, I'm currently struggling with the issue that the GPT-2 implementation blocks the main browser thread. The user is unable to interact with the application while it is computing, and consequently cannot chat with the LLM, which is catastrophic, because the purpose of this chat application is to let the user chat with the LLM. I'm in the process of writing an async API in C using Emscripten's Asyncify, which will let me use worker threads to communicate with the application, leaving the main thread free so the user can interact with the page. This battle is tough and I'm stumbling into errors and runtime failures; I am low on time, but I am determined to keep going.
Accomplishments that I'm proud of
This project has brought shame upon me and my family
What I learned
It's very easy to underestimate how far south things can go.
What's next for Browser LLM
The model currently runs on a single core. I should be able to use pthreads and rewrite the matrix-multiplication routine to use more of the compute power available on the client machine.
Impact
Ahem. Browser LLM brings forth a new wave of private LLMs. Unlike traditional cloud-based LLM services, which require users to upload their data and queries to a third-party server, Browser LLM runs entirely client-side using WebAssembly, on just the CPU. Users can benefit from the power of LLMs without compromising their data privacy or security. Libraries like Browser LLM will offer a friendly interface for developers to write innovative, privacy-aware LLM-enabled applications.
Built With
- c
- emscripten
- gpt
- javascript
- wasm
- web-assembly