Inspiration

Google's recent release of sparse autoencoders for their Gemma series of large language models inspired us to dig in and understand what they were. As a developer, imagine making your LLM happier, or better at storytelling, just by saying so.

What it does

Our project takes Google's newly released sparse autoencoders and makes them operational. We developed steering code that gives effective control over these autoencoders, enabling text transformations like generating rhymes, altering emotional tone, or producing detailed image prompts.
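At its core, this kind of steering amounts to adding a scaled SAE decoder direction to the model's residual-stream activations. Here is a minimal NumPy sketch of that one step; the vectors, the 2304 dimension, and the coefficient are all illustrative stand-ins (in the real pipeline the activation would come from a Gemma forward pass and the direction from a GemmaScope decoder row):

```python
import numpy as np

def steer_activation(activation, feature_direction, coefficient):
    """Add a scaled, normalized SAE decoder direction to an activation."""
    unit = feature_direction / np.linalg.norm(feature_direction)
    return activation + coefficient * unit

# Toy stand-ins for real tensors (2304 is used here as an example width).
rng = np.random.default_rng(0)
activation = rng.normal(size=2304)         # one residual-stream vector
feature_direction = rng.normal(size=2304)  # one SAE decoder row
steered = steer_activation(activation, feature_direction, coefficient=8.0)
```

Turning the coefficient up amplifies the concept that feature represents; turning it negative suppresses it.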

How we built it

  1. We obtained the sparse autoencoders (GemmaScope) recently released by Google.
  2. Upon discovering that the existing code didn't work out of the box, we developed our own steering code.
  3. We created a user-friendly command-line interface to demonstrate the capabilities of these autoencoders. All you need to do is specify the dimension you want to search for (e.g. storytelling, politeness, inspirational).
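The concept lookup in step 3 can be as simple as matching the user's keyword against feature explanations. A toy sketch of that idea, with invented indices and explanation texts (in practice these could come from a source such as Neuronpedia's exported explanations):

```python
# Hypothetical concept-to-feature lookup; names and data are illustrative.
def find_features(concept, explanations):
    """Return (feature_index, explanation) pairs mentioning the concept."""
    needle = concept.lower()
    return [(idx, text) for idx, text in sorted(explanations.items())
            if needle in text.lower()]

explanations = {
    1024: "phrases related to storytelling and narrative structure",
    2048: "polite requests and formal greetings",
    4096: "references to colors in image descriptions",
}
hits = find_features("storytelling", explanations)
```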

Challenges we ran into

  1. Developing steering code from scratch when the provided code didn't function as expected
  2. Understanding and effectively manipulating sparse autoencoders
  3. Defining the evolutionary search space and the evolutionary search itself. Special thanks to Claude for this one.
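The evolutionary search over steering coefficients can be sketched as a simple (1+λ) hill climber: mutate the coefficients, keep the child if it scores higher. Everything below is illustrative; in the real project the fitness function would generate text with the steered model and grade it, whereas here a toy quadratic stands in:

```python
import random

def evolve(fitness, init, generations=50, children=8, sigma=1.0, seed=0):
    """(1+lambda) evolutionary search over {feature_index: coefficient}."""
    rng = random.Random(seed)
    best, best_score = dict(init), fitness(init)
    for _ in range(generations):
        for _ in range(children):
            # Mutate every coefficient with Gaussian noise.
            child = {f: c + rng.gauss(0.0, sigma) for f, c in best.items()}
            score = fitness(child)
            if score > best_score:  # keep only improvements
                best, best_score = child, score
    return best, best_score

# Toy fitness: prefers coefficients near 8.0 (stands in for judging
# steered model output with an LLM grader).
def toy_fitness(candidate):
    return -sum((c - 8.0) ** 2 for c in candidate.values())

best, best_score = evolve(toy_fitness, {1024: 0.0, 2048: 0.0})
```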

Accomplishments that we're proud of

  1. Successfully implementing functional steering code for Google's sparse autoencoders
  2. Rapid development and problem-solving in a cutting-edge AI domain
  3. Combining evolutionary search with pretrained LLM capabilities

What we learned

How sparse autoencoders are trained, and how they can be used to steer model behavior.
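For reference, the classic SAE training objective is reconstruction error plus a sparsity penalty on the feature activations. GemmaScope's SAEs actually use a JumpReLU activation and a different sparsity term; the sketch below uses plain ReLU with L1, with toy dimensions, just to show the core idea:

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=1e-3):
    """Reconstruction + L1 sparsity loss for a vanilla sparse autoencoder."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # sparse feature activations
    x_hat = f @ W_dec + b_dec                # reconstruction of x
    recon = float(np.mean((x - x_hat) ** 2))
    sparsity = l1_coeff * float(np.abs(f).sum())
    return recon + sparsity, f

# Toy sizes: 16-d activations, 64 learned features (illustrative only).
rng = np.random.default_rng(1)
d_model, n_features = 16, 64
x = rng.normal(size=d_model)
W_enc = 0.1 * rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = 0.1 * rng.normal(size=(n_features, d_model))
b_dec = np.zeros(d_model)
loss, features = sae_loss(x, W_enc, b_enc, W_dec, b_dec)
```

The rows of `W_dec` are the decoder directions that steering code adds back into the residual stream.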

What's next for sae-evolver

  1. Bigger search on bigger models
  2. Interface
  3. Extract the weight changes into a LoRA for easy use of useful SAEs
