ForgeLLM Runtime — minimal inference runtime.
Provides KV cache management, token sampling, and tokenizer integration for compiled models.
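Of the pieces listed above, token sampling is the most self-contained to illustrate. The sketch below shows two common strategies — greedy argmax and temperature-scaled softmax — as plain Rust functions. This is an illustrative sketch of the technique only; the function names are hypothetical and do not reflect ForgeLLM's actual `sampling` API.

```rust
/// Greedy decoding: pick the token with the highest logit.
/// (Illustrative only; not ForgeLLM's actual API.)
fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

/// Temperature-scaled softmax: converts logits into a sampling
/// distribution. Lower temperature sharpens toward greedy; higher
/// temperature flattens the distribution.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the max logit before exponentiating for numerical stability;
    // a constant shift does not change the softmax result.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let logits = [1.0, 3.0, 2.0];
    println!("greedy pick: token {}", argmax(&logits));
    println!("probs at T=0.7: {:?}", softmax_with_temperature(&logits, 0.7));
}
```

A real sampler would draw from the softmax distribution (often after top-k or top-p truncation) rather than print it; greedy decoding skips the distribution entirely.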
Modules
- chat
- Chat template formatting for instruct models.
- interpreter
- Interpreter — executes IR graphs directly on CPU.
- kernels
- Optimized compute kernels.
- kv_cache
- KV cache for autoregressive transformer generation.
- sampling
- Token sampling strategies for autoregressive generation.
- tokenizer
- Tokenizer wrapper — encode text to token IDs and decode back.
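The idea behind the `kv_cache` module — storing each generated token's key/value vectors so earlier positions are never recomputed during autoregressive decoding — can be sketched with a minimal per-head cache. The `KvCache` type and its methods below are hypothetical illustrations, not ForgeLLM's actual API.

```rust
/// Minimal sketch of a KV cache for one attention head.
/// Keys and values are stored as flattened [seq_len, head_dim] buffers;
/// each decoding step appends one position instead of recomputing all.
/// (Hypothetical type; not ForgeLLM's actual API.)
struct KvCache {
    head_dim: usize,
    keys: Vec<f32>,
    values: Vec<f32>,
}

impl KvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append one token's key and value vectors.
    fn append(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.head_dim);
        assert_eq!(v.len(), self.head_dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);
    }

    /// Number of cached positions (tokens seen so far).
    fn seq_len(&self) -> usize {
        self.keys.len() / self.head_dim
    }
}

fn main() {
    let mut cache = KvCache::new(4);
    cache.append(&[1.0, 0.0, 0.0, 0.0], &[0.5; 4]);
    cache.append(&[0.0, 1.0, 0.0, 0.0], &[0.25; 4]);
    println!("cached positions: {}", cache.seq_len());
}
```

A production cache would hold one such buffer per layer and per head, and typically preallocates up to a maximum sequence length rather than growing on every step.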