jamylak/bytepairencoding

Byte Pair Encoding Visualizer

This visualisation uses AI-generated code, tuned for the best visualisation rather than for code quality.

bytepairencoding.mov

Interactive C + Raylib visualizer for byte pair encoding, showing corpus statistics, pair counts, merge selection, vocabulary growth, and how repeated merges form larger learned tokens.

What This Visualisation Shows

  • How a raw corpus becomes symbol sequences
  • How adjacent pair counts determine the next merge
  • How one merge rewrites the corpus and changes the next statistics
  • A multi-panel view of merge history, current vocabulary, and the most useful pairs
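The first two points can be sketched in a few lines of C. This is a minimal illustration, not the code used by the visualizer: the names `bytes_to_tokens`, `best_pair`, `PairStat`, and `MAX_SYMBOLS` are hypothetical, and the dense count table and the tie-break rule are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap: 256 byte values plus up to 256 merged tokens. */
#define MAX_SYMBOLS 512

typedef struct {
    int left;   /* first symbol of the pair, -1 if no pair exists */
    int right;  /* second symbol of the pair, -1 if no pair exists */
    int count;  /* number of adjacent occurrences */
} PairStat;

/* Turn raw text into the initial symbol sequence: one token per byte.
 * Writes at most cap tokens and returns how many were written. */
size_t bytes_to_tokens(const char *text, int *tokens, size_t cap)
{
    size_t n = 0;
    while (n < cap && text[n] != '\0') {
        tokens[n] = (unsigned char)text[n];
        n++;
    }
    return n;
}

/* Count every adjacent (left, right) pair in tokens[0..n) and return the
 * most frequent one. Ties go to the pair with the smallest left id, then
 * the smallest right id; that tie-break is an assumption of this sketch. */
PairStat best_pair(const int *tokens, size_t n)
{
    static int counts[MAX_SYMBOLS][MAX_SYMBOLS];
    memset(counts, 0, sizeof counts);

    for (size_t i = 0; i + 1 < n; i++) {
        assert(tokens[i] >= 0 && tokens[i] < MAX_SYMBOLS);
        assert(tokens[i + 1] >= 0 && tokens[i + 1] < MAX_SYMBOLS);
        counts[tokens[i]][tokens[i + 1]]++;
    }

    PairStat best = { -1, -1, 0 };
    for (int a = 0; a < MAX_SYMBOLS; a++) {
        for (int b = 0; b < MAX_SYMBOLS; b++) {
            if (counts[a][b] > best.count) {
                best = (PairStat){ a, b, counts[a][b] };
            }
        }
    }
    return best;
}
```

For the text `ababc`, the adjacent pairs are `ab`, `ba`, `ab`, and `bc`, so `best_pair` reports `('a', 'b')` with a count of 2. A dense table keeps the sketch short; a hash map keyed on the pair would scale better to a large vocabulary.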

Visual Map

flowchart LR
    A["Raw Corpus"]
    B["Token Sequence"]
    C["Count Adjacent Pairs"]
    D["Pick Best Pair"]
    E["Merge Into New Token"]
    F["Repeat With Updated Corpus"]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> C
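The "Merge Into New Token" step in the flowchart can be sketched as an in-place rewrite of the token sequence. This is an illustrative assumption about how such a step might look, not the repository's implementation; `merge_pair` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Replace each non-overlapping occurrence of (left, right) in tokens[0..n)
 * with new_id, scanning left to right, and return the new length.
 * The rewrite is done in place: the write index never passes the read index. */
size_t merge_pair(int *tokens, size_t n, int left, int right, int new_id)
{
    size_t out = 0;  /* next position to write */
    size_t i = 0;    /* next position to read */

    while (i < n) {
        if (i + 1 < n && tokens[i] == left && tokens[i + 1] == right) {
            tokens[out++] = new_id;  /* the pair collapses into one token */
            i += 2;
        } else {
            tokens[out++] = tokens[i];
            i += 1;
        }
    }

    assert(out <= n);
    return out;
}
```

Merging `('a', 'b')` into a new id 256 turns the sequence `a b a b c` into `256 256 c`, shrinking it from 5 tokens to 3. The "Repeat With Updated Corpus" step then recounts adjacent pairs on this shorter sequence, so pairs that involve the new token (here `256 256` and `256 c`) become candidates for the next merge.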

Controls

  • q: quit
  • Merge stepping and page-specific interactions are exposed in the app UI

Run

make run
