Crate forgellm_runtime

ForgeLLM Runtime — minimal inference runtime.

Provides KV cache management, token sampling, and tokenizer integration for compiled models.

Modules

chat
Chat template formatting for instruct models.
interpreter
Interpreter — executes IR graphs directly on CPU.
kernels
Optimized compute kernels.
kv_cache
KV cache for autoregressive transformer generation.
sampling
Token sampling strategies for autoregressive generation.
tokenizer
Tokenizer wrapper — encode text to token IDs and decode back.
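The kv_cache module caches the key/value projections of already-processed tokens so each decoding step only computes attention inputs for the new token. This page does not show the module's actual API, so the sketch below is a minimal, hypothetical illustration of the idea (the `LayerKvCache` type and its methods are invented for this example):

```rust
/// Hypothetical per-layer KV cache: keys/values are stored flattened
/// as [seq_len, num_heads, head_dim]. Not forgellm_runtime's real API.
struct LayerKvCache {
    num_heads: usize,
    head_dim: usize,
    keys: Vec<f32>,
    values: Vec<f32>,
}

impl LayerKvCache {
    fn new(num_heads: usize, head_dim: usize) -> Self {
        Self { num_heads, head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append one token's projected keys/values across all heads.
    fn append(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.num_heads * self.head_dim);
        assert_eq!(v.len(), self.num_heads * self.head_dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);
    }

    /// Number of cached sequence positions.
    fn len(&self) -> usize {
        self.keys.len() / (self.num_heads * self.head_dim)
    }
}

fn main() {
    let mut cache = LayerKvCache::new(2, 4);
    cache.append(&[0.0; 8], &[0.0; 8]); // token 0
    cache.append(&[1.0; 8], &[1.0; 8]); // token 1
    assert_eq!(cache.len(), 2);
    println!("cached positions: {}", cache.len());
}
```

During autoregressive generation, a cache like this is kept per transformer layer and grows by one position per generated token, turning attention over the prefix from recomputation into a lookup.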
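The sampling module covers strategies for picking the next token from the model's logits. A common combination is temperature scaling plus top-k filtering; the function below is an illustrative sketch of that technique (its name and signature are assumptions, not the crate's API), taking the uniform random draw `u` as an explicit argument so the example is deterministic:

```rust
/// Hypothetical top-k sampler sketch: scale logits by temperature, keep the
/// k largest, softmax over the survivors, then invert the CDF at u in [0, 1).
fn sample_top_k(logits: &[f32], k: usize, temperature: f32, u: f32) -> usize {
    // Indices sorted by descending logit.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k);

    // Softmax over the surviving logits, subtracting the max for stability.
    let max = logits[idx[0]] / temperature;
    let exps: Vec<f32> = idx
        .iter()
        .map(|&i| (logits[i] / temperature - max).exp())
        .collect();
    let sum: f32 = exps.iter().sum();

    // Walk the cumulative distribution until it passes u.
    let mut acc = 0.0;
    for (j, e) in exps.iter().enumerate() {
        acc += e / sum;
        if u < acc {
            return idx[j];
        }
    }
    idx[k - 1]
}

fn main() {
    let logits = [1.0, 3.0, 2.0, 0.5];
    // u near 0 lands in the highest-probability bucket: token 1.
    let tok = sample_top_k(&logits, 2, 1.0, 0.0);
    assert_eq!(tok, 1);
    println!("sampled token: {}", tok);
}
```

Lower temperatures sharpen the distribution toward greedy decoding, while smaller k discards the long tail of unlikely tokens before sampling.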
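The chat module formats role-tagged messages into the prompt layout an instruct model expects. The crate's actual template format is not shown here, so the sketch below uses a generic ChatML-style layout purely for illustration; the `Message` type and `format_chat` function are hypothetical:

```rust
/// Hypothetical message type; the real chat module's types may differ.
struct Message {
    role: &'static str,
    content: &'static str,
}

/// Render messages in a generic ChatML-style layout, ending with an open
/// assistant turn so the model continues from there.
fn format_chat(messages: &[Message]) -> String {
    let mut out = String::new();
    for m in messages {
        out.push_str(&format!("<|{}|>\n{}\n", m.role, m.content));
    }
    out.push_str("<|assistant|>\n");
    out
}

fn main() {
    let msgs = [
        Message { role: "system", content: "You are helpful." },
        Message { role: "user", content: "Hello" },
    ];
    let prompt = format_chat(&msgs);
    assert!(prompt.ends_with("<|assistant|>\n"));
    println!("{}", prompt);
}
```

Instruct-tuned models are sensitive to the exact template they were trained with, which is why runtimes centralize this formatting instead of leaving it to callers.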

Functions

version