Finetorch is a Rust-native CLI and library toolkit for practical LLM finetuning on a single GPU. It is designed around lightweight adapter training rather than full pretraining, with clear boundaries between dataset preparation, backend integration, training orchestration, and evaluation.
- Architecture
- Configuration Guide
- Getting Started
- CLI Workflows
- Use Cases
- Backend and Adapter Design
- Changelog
Create a small JSONL dataset:

```sh
mkdir -p data
cat > data/train.jsonl <<'EOF'
{"instruction":"Answer briefly","input":"What is LoRA?","output":"LoRA is a parameter-efficient finetuning method."}
{"prompt":"Complete: Gemma is","completion":"a family of language models."}
EOF
```

Prepare shards:

```sh
cargo run -- prepare-dataset \
  --input data/train.jsonl \
  --output artifacts/dataset
```

Run the scaffolded training flow:

```sh
cargo run -- train --config configs/example_run.toml
```

Evaluate a held-out file:

```sh
cargo run -- eval \
  --config configs/example_run.toml \
  --dataset data/train.jsonl
```

Finetorch is split into four primary layers:
- CLI layer (`src/cli/`)
  - Parses commands and config paths.
  - Orchestrates dataset preparation, training runs, and evaluation jobs.
  - Emits user-facing summaries and output locations.
- Data layer (`src/data/`)
  - Reads JSONL instruction-tuning data.
  - Normalizes mixed schemas into one internal example format.
  - Applies tokenizer selection and tokenization.
  - Produces shard manifests and train/val split directories.
- Model layer (`src/model/`)
  - Defines the `LlmBackend` trait for backend-neutral finetuning.
  - Hosts LoRA and QLoRA configuration structs.
  - Wraps backend-specific loading and adapter persistence.
  - Starts with a `llama_cpp` bridge and leaves room for more backends.
- Training and evaluation layer (`src/train/`, `src/eval/`)
  - Loads config-driven training jobs.
  - Builds optimizer and scheduler state.
  - Runs a lightweight training loop suitable for LoRA/QLoRA adapters.
  - Computes task metrics for small-scale evaluation.
- `finetorch prepare-dataset --input data.jsonl --output dataset/`
  - Read JSONL examples.
  - Normalize records into `{ prompt, completion }` pairs.
  - Tokenize with the selected tokenizer.
  - Shuffle and shard into `train/` and `val/` outputs.
  - Write a dataset manifest for downstream runs.
- `finetorch train --config configs/example_run.toml`
  - Load `run.toml`.
  - Instantiate the selected backend.
  - Load the base model and apply LoRA/QLoRA settings.
  - Run the training loop with optimizer, scheduler, and accumulation settings.
  - Save adapter weights and JSONL training logs.
- `finetorch eval --config configs/example_run.toml --dataset eval.jsonl`
  - Load config and backend.
  - Read evaluation examples.
  - Run forward passes over the dataset.
  - Compute perplexity, exact match, BLEU, and ROUGE-L summaries.
Repository layout:

```
src/
  main.rs
  lib.rs
  config.rs
  cli/
    mod.rs
    prepare.rs
    train.rs
    eval.rs
  data/
    mod.rs
    jsonl.rs
    tokenizer.rs
    sharding.rs
  model/
    mod.rs
    backend.rs
    llama_cpp.rs
    lora.rs
  train/
    mod.rs
    loop.rs
    optimizer.rs
    scheduler.rs
  eval/
    mod.rs
    metrics.rs
configs/
  example_run.toml
docs/
  architecture.md
  configuration.md
  getting-started.md
  cli-workflows.md
  use-cases.md
  backends.md
```
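The train and eval commands are driven by `configs/example_run.toml`. A hypothetical run config might look like the following; every section and key here is an assumption about shape, not the scaffold's actual schema:

```toml
# Illustrative run config -- keys are assumptions, not the real schema.
[model]
backend = "llama_cpp"
base_path = "models/base.gguf"

[lora]
rank = 8
alpha = 16.0
dropout = 0.05
quantized = true   # QLoRA-style quantized base weights

[train]
dataset_dir = "artifacts/dataset"
learning_rate = 2e-4
batch_size = 4
grad_accum_steps = 8
epochs = 3
output_dir = "artifacts/adapter"
```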
Prepare a dataset:

```sh
cargo run -- prepare-dataset \
  --input data/alpaca_like.jsonl \
  --output artifacts/dataset \
  --tokenizer sentencepiece:models/llama-3/tokenizer.model \
  --train-ratio 0.95 \
  --shard-size 2048
```

Run a small finetuning job:

```sh
cargo run -- train --config configs/example_run.toml
```

Evaluate the resulting adapter:

```sh
cargo run -- eval \
  --config configs/example_run.toml \
  --dataset data/eval.jsonl
```

This scaffold focuses on:
- LoRA and QLoRA adapter workflows
- Config-driven orchestration
- Dataset preparation and sharding
- Backend extensibility
This scaffold does not yet implement a production-grade GPU training kernel. It establishes the module boundaries and execution flow needed to add those pieces incrementally.