Pure Go ML framework -- inference, training, and serving. Embed any GGUF model in your Go application with `go build ./...`.

244 tok/s on Gemma 3 1B Q4_K_M (95% memory bandwidth utilization) -- 20% faster than Ollama. Zero cgo. 20 model architectures. Tabular ML and time-series forecasting built in.
```go
m, _ := zerfoo.Load("google/gemma-3-4b") // downloads from HuggingFace
defer m.Close()

response, _ := m.Chat("Explain Go interfaces in one sentence.")
fmt.Println(response)
```

Install with:

```shell
go get github.com/zerfoo/zerfoo
```

`Load` accepts HuggingFace model IDs. Models are downloaded and cached automatically:
```go
// Download by repo ID (defaults to Q4_K_M quantization)
m, err := zerfoo.Load("google/gemma-3-4b")

// Specify a quantization variant
m, err := zerfoo.Load("google/gemma-3-4b/Q8_0")

// Or load a local GGUF file
m, err := zerfoo.Load("./models/gemma-3-1b.gguf")
```

Stream tokens as they are generated via a channel:
```go
m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

ch, err := m.ChatStream(context.Background(), "Tell me a joke.")
if err != nil {
	log.Fatal(err)
}
for tok := range ch {
	if !tok.Done {
		fmt.Print(tok.Text)
	}
}
fmt.Println()
```

Extract L2-normalized embeddings and compute similarity:
```go
m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

embeddings, _ := m.Embed([]string{
	"Go is a statically typed language.",
	"Rust has a borrow checker.",
})
score := embeddings[0].CosineSimilarity(embeddings[1])
fmt.Printf("similarity: %.4f\n", score)
```

Because the embeddings are unit-length, cosine similarity is equivalent to a dot product.

Constrain model output to valid JSON matching a schema:
```go
import "github.com/zerfoo/zerfoo/generate/grammar"

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

schema := grammar.JSONSchema{
	Type: "object",
	Properties: map[string]*grammar.JSONSchema{
		"name": {Type: "string"},
		"age":  {Type: "number"},
	},
	Required: []string{"name", "age"},
}

result, _ := m.Generate(context.Background(),
	"Generate a person named Alice who is 30.",
	zerfoo.WithSchema(schema),
)
fmt.Println(result.Text) // {"name": "Alice", "age": 30}
```

Detect tool/function calls in model output (OpenAI-compatible):
```go
import (
	"encoding/json"

	"github.com/zerfoo/zerfoo/serve"
)

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

tools := []serve.Tool{{
	Type: "function",
	Function: serve.ToolFunction{
		Name:        "get_weather",
		Description: "Get the current weather for a city",
		Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
	},
}}

result, _ := m.Generate(context.Background(),
	"What is the weather in Paris?",
	zerfoo.WithTools(tools...),
)
for _, tc := range result.ToolCalls {
	fmt.Printf("call %s(%s)\n", tc.FunctionName, tc.Arguments)
}
```

| Architecture | Format | Special Features |
|---|---|---|
| Gemma 3 | GGUF Q4_K | Production. CUDA graph capture, 244 tok/s |
| Gemma 3n | GGUF | Mobile-optimized variant |
| Llama 3 | GGUF | RoPE theta=500K |
| Llama 4 | GGUF | Latest generation |
| Mistral | GGUF | Sliding window attention |
| Mixtral | GGUF | Mixture of Experts |
| Qwen 2 | GGUF | Attention bias, RoPE theta=1M |
| Phi 3/4 | GGUF | Partial rotary factor |
| DeepSeek V3 | GGUF | MLA + MoE (batched) |
| Command R | GGUF | Cohere architecture |
| Falcon | GGUF | Multi-query attention |
| RWKV | GGUF | Linear attention |
| Mamba / Mamba 3 | GGUF | State space models (MIMO SSM) |
| Jamba | GGUF | Hybrid Mamba-Transformer |
| Whisper | GGUF | Audio transcription |
| LLaVA | GGUF | Vision-language |
| Qwen-VL | GGUF | Vision-language |
New architectures are auto-detected from GGUF metadata.
| Architecture | Package | Use Case |
|---|---|---|
| MLP / Ensemble | tabular | Baseline tabular prediction |
| FTTransformer | tabular | Attention-based tabular |
| TabNet | tabular | Attentive feature selection |
| SAINT | tabular | Self-attention + inter-sample |
| TabResNet | tabular | Residual tabular networks |
| Architecture | Package | Use Case |
|---|---|---|
| TFT | timeseries | Temporal Fusion Transformer |
| N-BEATS | timeseries | Basis expansion forecasting |
| PatchTST | timeseries | Patch-based transformer |
Train tabular and time-series models with built-in AdamW, learning rate schedulers, and early stopping:

```go
import "github.com/zerfoo/zerfoo/tabular"

model := tabular.NewEnsemble[float32](engine, tabular.EnsembleConfig{
	InputDim:  10,
	OutputDim: 1,
	Models:    3,
})

trainer := tabular.NewTrainer(model, engine, tabular.TrainerConfig{
	LR:     0.001,
	Epochs: 50,
})
trainer.Fit(ctx, trainX, trainY)

predictions, _ := model.Predict(ctx, testX)
```

The CLI is installed with `go install`:

```shell
go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest

zerfoo pull gemma-3-1b-q4         # download a model
zerfoo run gemma-3-1b-q4 "Hello"  # generate text
zerfoo serve gemma-3-1b-q4        # OpenAI-compatible API server
zerfoo train -backend tabular ... # train a tabular model
zerfoo list                       # list cached models
```

See the examples/ directory for runnable programs:
- chat -- interactive chatbot CLI
- rag -- retrieval-augmented generation with embeddings
- json-output -- grammar-guided structured JSON output
- embedding -- embed inference in an HTTP server
- api-server -- standalone API server
- inference -- basic text generation
- streaming -- token streaming
- fine-tuning -- LoRA fine-tuning
- automl -- automated model selection
- timeseries -- time-series forecasting
- distributed-training -- multi-node training
- agentic-tool-use -- function calling agent
- audio-transcription -- Whisper transcription
- Getting Started -- full walkthrough: install, pull a model, run inference via CLI and library
- GPU Setup -- configure CUDA, ROCm, or OpenCL for hardware-accelerated inference
- Benchmarks -- throughput numbers across models and hardware
- Design -- architecture overview and key design decisions
- Blog -- development updates and deep dives
- CONTRIBUTING.md -- how to contribute
Apache 2.0