(Yes, I know this technically isn't an LLM; for now it's built to be smaller than a typical SLM.)
Building a full MoE (mixture-of-experts), multi-head attention LLM from scratch in Rust. It builds on my 3110 project, this time in a language I actually like, and it complements the graduate LLMs class I'm taking this semester, CS 6784.
The plan is to build a simple single-head attention (SHA) LLM first and then move on to multi-head attention (MHA).
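
For a rough idea of what the single-head attention step (the SHA stage of the plan) could look like, here is a minimal sketch in dependency-free Rust. It is not the project's actual code: the `Matrix` alias and the function names `softmax` and `single_head_attention` are hypothetical, the causal mask is an assumption (typical for decoder-style models), and the inputs are assumed to be already-projected, non-empty Q, K, V matrices with one row per token.

```rust
/// Row-major matrix stored as nested Vecs. Hypothetical type for this sketch;
/// a real implementation would likely use a flat buffer or a tensor crate.
type Matrix = Vec<Vec<f32>>;

/// Numerically stable softmax over one row of scores.
fn softmax(row: &[f32]) -> Vec<f32> {
    // Subtract the row maximum before exponentiating to avoid overflow.
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Scaled dot-product attention for a single head.
/// Assumptions: q, k, v are non-empty and have the same number of rows
/// (one per token); q and k rows share the same width d_k.
/// A causal mask is applied: token i only attends to tokens 0..=i.
fn single_head_attention(q: &Matrix, k: &Matrix, v: &Matrix) -> Matrix {
    let scale = (k[0].len() as f32).sqrt();
    let d_v = v[0].len();

    q.iter()
        .enumerate()
        .map(|(i, q_row)| {
            // Scores of query i against keys 0..=i, scaled by sqrt(d_k).
            let scores: Vec<f32> = k[..=i]
                .iter()
                .map(|k_row| {
                    q_row.iter().zip(k_row).map(|(a, b)| a * b).sum::<f32>() / scale
                })
                .collect();
            let weights = softmax(&scores);

            // Weighted sum of the value rows; zip stops after i + 1 rows,
            // so masked (future) positions contribute nothing.
            let mut out = vec![0.0; d_v];
            for (w, v_row) in weights.iter().zip(v.iter()) {
                for (o, x) in out.iter_mut().zip(v_row) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}
```

Calling `single_head_attention(&q, &k, &v)` returns one output row per token. Under this sketch, moving to MHA would roughly mean running several such heads on separate projections and concatenating their outputs.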