This visualisation uses AI-generated code, tuned for the best visual result rather than for code quality.
bytepairencoding.mov
Interactive C + Raylib visualizer for byte pair encoding, showing corpus statistics, pair counts, merge selection, vocabulary growth, and how repeated merges form larger learned tokens.
- How a raw corpus becomes symbol sequences
- How adjacent pair counts determine the next merge
- How one merge rewrites the corpus and changes the next statistics
- A multi-panel view of merge history, current vocabulary, and the most useful pairs
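
The first two points above (turning a corpus into symbols, then counting adjacent pairs to choose a merge) can be sketched in a few lines of C. This is a minimal illustration and not the visualizer's actual code: the names `PairStat`, `text_to_symbols`, and `most_frequent_pair` are hypothetical, the corpus is assumed to be a plain byte string, and ties are assumed to go to the pair seen first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one adjacent pair and how often it occurs. */
typedef struct {
    int left;
    int right;
    int count;
} PairStat;

/* Turn raw text into a symbol sequence: one symbol per byte (0..255).
   Returns the number of symbols written to out. */
size_t text_to_symbols(const char *text, int *out, size_t cap)
{
    size_t n = 0;
    while (text[n] != '\0') {
        assert(n < cap); /* caller must provide a large enough buffer */
        out[n] = (unsigned char)text[n];
        n++;
    }
    return n;
}

/* Count every adjacent pair and return the most frequent one.
   Overlapping positions are counted, so "aaa" contains (a, a) twice.
   Quadratic time: fine for a demo, too slow for a large corpus.
   Returns count == 0 when the sequence has fewer than two symbols. */
PairStat most_frequent_pair(const int *seq, size_t n)
{
    PairStat best = { -1, -1, 0 };
    for (size_t i = 0; i + 1 < n; i++) {
        int count = 0;
        for (size_t j = 0; j + 1 < n; j++) {
            if (seq[j] == seq[i] && seq[j + 1] == seq[i + 1]) {
                count++;
            }
        }
        /* Strict '>' keeps the earliest pair on ties (an assumption). */
        if (count > best.count) {
            best = (PairStat){ seq[i], seq[i + 1], count };
        }
    }
    return best;
}
```

For the string `aaabdaaabac`, this picks the pair (a, a), which occurs four times when overlapping positions are counted.
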
flowchart LR
A["Raw Corpus"]
B["Token Sequence"]
C["Count Adjacent Pairs"]
D["Pick Best Pair"]
E["Merge Into New Token"]
F["Repeat With Updated Corpus"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> C
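
The loop in the flowchart above can be sketched as follows. This is an assumption-laden illustration rather than the app's implementation: `Merge`, `count_pair`, `merge_pair`, and `bpe_train` are hypothetical names, new token ids are assumed to start at 256 (just past the raw byte values), and the loop is assumed to stop once no pair occurs at least twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one learned merge: (left, right) -> id. */
typedef struct {
    int left;
    int right;
    int id;
} Merge;

/* Count positions where a is immediately followed by b. */
size_t count_pair(const int *seq, size_t n, int a, int b)
{
    size_t c = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (seq[i] == a && seq[i + 1] == b) {
            c++;
        }
    }
    return c;
}

/* Rewrite seq in place, scanning left to right and replacing each
   non-overlapping (left, right) with new_id. Returns the new length. */
size_t merge_pair(int *seq, size_t n, int left, int right, int new_id)
{
    size_t w = 0;
    for (size_t r = 0; r < n; ) {
        if (r + 1 < n && seq[r] == left && seq[r + 1] == right) {
            seq[w++] = new_id;
            r += 2;
        } else {
            seq[w++] = seq[r++];
        }
    }
    return w;
}

/* Repeat count -> pick best -> merge for up to max_merges rounds.
   Updates *n to the final sequence length and returns how many
   merges were learned. */
size_t bpe_train(int *seq, size_t *n, Merge *merges, size_t max_merges)
{
    assert(seq != NULL && n != NULL && merges != NULL);
    size_t learned = 0;
    int next_id = 256; /* assumed: ids 0..255 are the raw bytes */
    while (learned < max_merges) {
        size_t best_count = 0;
        int best_l = -1, best_r = -1;
        for (size_t i = 0; i + 1 < *n; i++) {
            size_t c = count_pair(seq, *n, seq[i], seq[i + 1]);
            if (c > best_count) { /* earliest pair wins ties */
                best_count = c;
                best_l = seq[i];
                best_r = seq[i + 1];
            }
        }
        if (best_count < 2) {
            break; /* nothing repeats, so merging gains nothing */
        }
        merges[learned++] = (Merge){ best_l, best_r, next_id };
        *n = merge_pair(seq, *n, best_l, best_r, next_id);
        next_id++;
    }
    return learned;
}
```

Starting from `aaabdaaabac`, this sketch learns three merges, (a, a) as 256, then (256, a) as 257, then (257, b) as 258, and leaves a five-symbol sequence. Each merge changes the pair counts seen by the next round, which is why the counts are recomputed inside the loop.
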
- q: quit
- Merge stepping and page-specific interactions are exposed in the app UI
make run