zerfoo

package module
v1.17.1
Published: Mar 25, 2026 License: Apache-2.0 Imports: 21 Imported by: 0

README

zerfoo

Pure Go ML framework -- inference, training, and serving. Embed any GGUF model in your Go application with go build ./....


244 tok/s on Gemma 3 1B Q4_K_M (95% memory bandwidth utilization) -- 20% faster than Ollama. Zero CGo. 20 model architectures. Tabular ML and time-series forecasting built in.

Quick Start

m, _ := zerfoo.Load("google/gemma-3-4b")  // downloads from HuggingFace
defer m.Close()
response, _ := m.Chat("Explain Go interfaces in one sentence.")
fmt.Println(response)

Installation

go get github.com/zerfoo/zerfoo

HuggingFace Download

Load accepts HuggingFace model IDs. Models are downloaded and cached automatically:

// Download by repo ID (defaults to Q4_K_M quantization)
m, err := zerfoo.Load("google/gemma-3-4b")

// Specify a quantization variant
m, err := zerfoo.Load("google/gemma-3-4b/Q8_0")

// Or load a local GGUF file
m, err := zerfoo.Load("./models/gemma-3-1b.gguf")

Streaming

Stream tokens as they are generated via a channel:

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

ch, err := m.ChatStream(context.Background(), "Tell me a joke.")
if err != nil {
    log.Fatal(err)
}
for tok := range ch {
    if !tok.Done {
        fmt.Print(tok.Text)
    }
}
fmt.Println()

Embeddings

Extract L2-normalized embeddings and compute similarity:

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

embeddings, _ := m.Embed([]string{
    "Go is a statically typed language.",
    "Rust has a borrow checker.",
})
score := embeddings[0].CosineSimilarity(embeddings[1])
fmt.Printf("similarity: %.4f\n", score)

Structured Output

Constrain model output to valid JSON matching a schema:

import "github.com/zerfoo/zerfoo/generate/grammar"

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

schema := grammar.JSONSchema{
    Type: "object",
    Properties: map[string]*grammar.JSONSchema{
        "name": {Type: "string"},
        "age":  {Type: "number"},
    },
    Required: []string{"name", "age"},
}

result, _ := m.Generate(context.Background(),
    "Generate a person named Alice who is 30.",
    zerfoo.WithSchema(schema),
)
fmt.Println(result.Text) // {"name": "Alice", "age": 30}
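Grammar-guided decoding works by masking, at each sampling step, every candidate token that would take the output outside the grammar. A toy illustration of that masking step, using a fixed set of valid completions instead of a real context-free grammar (all names here are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// allowedNext keeps only the candidate tokens for which partial+token
// is still a prefix of at least one accepted string. The real
// implementation walks a context-free grammar derived from the schema
// instead of a finite list.
func allowedNext(partial string, candidates, valid []string) []string {
	var ok []string
	for _, tok := range candidates {
		next := partial + tok
		for _, v := range valid {
			if strings.HasPrefix(v, next) {
				ok = append(ok, tok)
				break
			}
		}
	}
	return ok
}

func main() {
	valid := []string{`{"name":`, `{"age":`}
	candidates := []string{`"name"`, `"city"`, `"age"`}
	// After emitting "{", only keys allowed by the schema survive.
	fmt.Println(allowedNext(`{`, candidates, valid))
}
```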

Tool Calling

Detect tool/function calls in model output (OpenAI-compatible):

import "github.com/zerfoo/zerfoo/serve"

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

tools := []serve.Tool{{
    Type: "function",
    Function: serve.ToolFunction{
        Name:        "get_weather",
        Description: "Get the current weather for a city",
        Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
    },
}}

result, _ := m.Generate(context.Background(),
    "What is the weather in Paris?",
    zerfoo.WithTools(tools...),
)

for _, tc := range result.ToolCalls {
    fmt.Printf("call %s(%s)\n", tc.FunctionName, tc.Arguments)
}

Supported Models

LLM Inference (20 architectures)
| Architecture | Format | Special Features |
| --- | --- | --- |
| Gemma 3 | GGUF | Q4_K production; CUDA graph capture, 244 tok/s |
| Gemma 3n | GGUF | Mobile-optimized variant |
| Llama 3 | GGUF | RoPE theta=500K |
| Llama 4 | GGUF | Latest generation |
| Mistral | GGUF | Sliding window attention |
| Mixtral | GGUF | Mixture of Experts |
| Qwen 2 | GGUF | Attention bias, RoPE theta=1M |
| Phi 3/4 | GGUF | Partial rotary factor |
| DeepSeek V3 | GGUF | MLA + MoE (batched) |
| Command R | GGUF | Cohere architecture |
| Falcon | GGUF | Multi-query attention |
| RWKV | GGUF | Linear attention |
| Mamba / Mamba 3 | GGUF | State space models (MIMO SSM) |
| Jamba | GGUF | Hybrid Mamba-Transformer |
| Whisper | GGUF | Audio transcription |
| LLaVA | GGUF | Vision-language |
| Qwen-VL | GGUF | Vision-language |

New architectures are auto-detected from GGUF metadata.
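Auto-detection can be pictured as a registry lookup keyed on the architecture name read from GGUF metadata (the general.architecture field). A minimal sketch of the pattern, with placeholder builders rather than zerfoo's actual loader:

```go
package main

import "fmt"

// builder constructs the model graph for one architecture family.
type builder func() string

// registry maps a GGUF general.architecture value to the builder
// that knows how to assemble that model. Entries here are placeholders.
var registry = map[string]builder{
	"gemma3":  func() string { return "gemma-3 graph" },
	"llama":   func() string { return "llama graph" },
	"mistral": func() string { return "mistral graph" },
}

// lookup resolves the architecture string read from GGUF metadata.
func lookup(arch string) (builder, error) {
	b, ok := registry[arch]
	if !ok {
		return nil, fmt.Errorf("unsupported architecture %q", arch)
	}
	return b, nil
}

func main() {
	b, err := lookup("gemma3")
	if err != nil {
		panic(err)
	}
	fmt.Println(b()) // gemma-3 graph
}
```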

Tabular ML
| Architecture | Package | Use Case |
| --- | --- | --- |
| MLP / Ensemble | tabular | Baseline tabular prediction |
| FTTransformer | tabular | Attention-based tabular |
| TabNet | tabular | Attentive feature selection |
| SAINT | tabular | Self-attention + inter-sample |
| TabResNet | tabular | Residual tabular networks |

Time-Series Forecasting

| Architecture | Package | Use Case |
| --- | --- | --- |
| TFT | timeseries | Temporal Fusion Transformer |
| N-BEATS | timeseries | Basis expansion forecasting |
| PatchTST | timeseries | Patch-based transformer |

Training

Train tabular and time-series models with built-in AdamW, learning rate schedulers, and early stopping:

import "github.com/zerfoo/zerfoo/tabular"

model := tabular.NewEnsemble[float32](engine, tabular.EnsembleConfig{
    InputDim:  10,
    OutputDim: 1,
    Models:    3,
})
trainer := tabular.NewTrainer(model, engine, tabular.TrainerConfig{
    LR:     0.001,
    Epochs: 50,
})
trainer.Fit(ctx, trainX, trainY)
predictions, _ := model.Predict(ctx, testX)
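Early stopping, mentioned above, halts training once validation loss stops improving for a set number of epochs (the patience). A stdlib-only sketch of the rule, separate from the trainer's actual configuration:

```go
package main

import (
	"fmt"
	"math"
)

// earlyStop returns the index of the epoch at which training stops:
// the first epoch after which validation loss has failed to improve
// for `patience` consecutive epochs, or the last epoch otherwise.
func earlyStop(valLoss []float64, patience int) int {
	best := math.Inf(1)
	bad := 0
	for i, l := range valLoss {
		if l < best {
			best = l
			bad = 0
		} else {
			bad++
			if bad >= patience {
				return i
			}
		}
	}
	return len(valLoss) - 1
}

func main() {
	losses := []float64{1.0, 0.8, 0.7, 0.71, 0.72, 0.73}
	fmt.Println("stop at epoch", earlyStop(losses, 2)) // stop at epoch 4
}
```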

CLI

go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest

zerfoo pull gemma-3-1b-q4          # download a model
zerfoo run gemma-3-1b-q4 "Hello"   # generate text
zerfoo serve gemma-3-1b-q4         # OpenAI-compatible API server
zerfoo train -backend tabular ...  # train a tabular model
zerfoo list                        # list cached models

Examples

See the examples/ directory for runnable programs, and these additional resources:

  • Getting Started -- full walkthrough: install, pull a model, run inference via CLI and library
  • GPU Setup -- configure CUDA, ROCm, or OpenCL for hardware-accelerated inference
  • Benchmarks -- throughput numbers across models and hardware
  • Design -- architecture overview and key design decisions
  • Blog -- development updates and deep dives
  • CONTRIBUTING.md -- how to contribute

License

Apache 2.0

Documentation

Overview

Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewAdamW

func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]

NewAdamW creates a new AdamW optimizer with the given hyperparameters.

Stable.

func NewCPUEngine

func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]

NewCPUEngine creates a new CPU computation engine for the given numeric type.

Stable.

func NewDefaultTrainer

func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]

NewDefaultTrainer creates a new default trainer for the given graph, loss, optimizer, and gradient strategy.

Stable.

func NewFloat32Ops

func NewFloat32Ops() numeric.Arithmetic[float32]

NewFloat32Ops returns the float32 arithmetic operations.

Stable.

func NewGraph

func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]

NewGraph creates a new computation graph builder for the given engine.

Stable.

func NewMSE

func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]

NewMSE creates a new Mean Squared Error loss function.

Stable.

func NewRMSNorm

func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, options ...normalization.RMSNormOption[T]) (*normalization.RMSNorm[T], error)

NewRMSNorm creates a new RMSNorm normalization layer with the given configuration.

Stable.

func NewTensor

func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)

NewTensor creates a new tensor with the given shape and data.

Stable.

func RegisterLayer

func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])

RegisterLayer registers a new layer builder for the given operation type.

Stable.

func UnregisterLayer

func UnregisterLayer(opType string)

UnregisterLayer unregisters the layer builder for the given operation type.

Stable.

Types

type Batch

type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}

Batch represents a training batch of inputs and targets.

Stable.

type Embedding

type Embedding struct {
	Vector []float32
}

Embedding holds a text embedding vector.

Stable.

func (Embedding) CosineSimilarity

func (e Embedding) CosineSimilarity(other Embedding) float32

CosineSimilarity computes the cosine similarity between two embeddings.

Stable.

type Engine

type Engine[T tensor.Numeric] interface {
	compute.Engine[T]
}

Engine represents a computation engine (e.g., CPU or GPU).

Stable.

type GenerateOption

type GenerateOption func(*generateOptions)

GenerateOption configures the behavior of Model.Generate.

Stable.

func WithGenMaxTokens

func WithGenMaxTokens(n int) GenerateOption

WithGenMaxTokens sets the maximum number of tokens to generate.

Stable.

func WithGenTemperature

func WithGenTemperature(t float32) GenerateOption

WithGenTemperature sets the sampling temperature.

Stable.

func WithGenTopP

func WithGenTopP(p float32) GenerateOption

WithGenTopP sets the top-p (nucleus) sampling parameter.

Stable.

func WithSchema

func WithSchema(schema grammar.JSONSchema) GenerateOption

WithSchema enables grammar-guided decoding.

The model's output will be constrained to valid JSON matching the given schema.

Experimental.

func WithToolChoice

func WithToolChoice(choice serve.ToolChoice) GenerateOption

WithToolChoice sets the tool choice mode for tool call detection.

Experimental.

func WithTools

func WithTools(tools ...serve.Tool) GenerateOption

WithTools configures the tools available for tool call detection.

When tools are provided, Model.Generate will attempt to detect tool calls in the model output and populate [GenerateResult.ToolCalls].

Experimental.

type GenerateResult

type GenerateResult struct {
	Text       string
	TokenCount int
	Duration   time.Duration
	ToolCalls  []ToolCall
}

GenerateResult holds the result of a text generation call.

Stable.

type Graph

type Graph[T tensor.Numeric] struct {
	*graph.Graph[T]
}

Graph represents a computation graph.

Stable.

type LayerBuilder

type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)

LayerBuilder is a function that builds a computation graph layer.

Stable.

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model is a loaded language model ready for inference.

A Model is created via Load and used for text generation, embedding, and tool-call detection. Model.Close must be called when the model is no longer needed to release GPU and CPU resources.

Stable.

func Load

func Load(pathOrID string) (*Model, error)

Load loads a model from a file path or HuggingFace model ID.

Paths starting with "/", "./" or "../" are treated as local GGUF files. All other strings are treated as HuggingFace model IDs (e.g. "google/gemma-3-4b" or "google/gemma-3-4b/Q8_0"). If the model is not cached locally it will be downloaded from HuggingFace.

Stable.
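The path-versus-ID rule quoted above amounts to a prefix check. A sketch (illustrative; zerfoo's internal helper is not shown here):

```go
package main

import (
	"fmt"
	"strings"
)

// isLocalPath reports whether a Load argument should be treated as a
// local GGUF file rather than a HuggingFace model ID, per the rule:
// paths starting with "/", "./" or "../" are local files.
func isLocalPath(s string) bool {
	return strings.HasPrefix(s, "/") ||
		strings.HasPrefix(s, "./") ||
		strings.HasPrefix(s, "../")
}

func main() {
	for _, s := range []string{
		"./models/gemma-3-1b.gguf",
		"google/gemma-3-4b",
		"google/gemma-3-4b/Q8_0",
	} {
		fmt.Printf("%-28s local=%v\n", s, isLocalPath(s))
	}
}
```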

func (*Model) Chat

func (m *Model) Chat(prompt string) (string, error)

Chat runs a simple one-shot generation and returns the generated text.

Stable.

func (*Model) ChatStream

func (m *Model) ChatStream(ctx context.Context, prompt string, opts ...GenerateOption) (<-chan StreamToken, error)

ChatStream starts streaming generation and returns a receive-only channel that yields StreamToken values as they are generated. The channel is closed when generation completes or ctx is canceled. The error return is non-nil only if startup fails (e.g. the model is not loaded).

Stable.

func (*Model) Close

func (m *Model) Close() error

Close releases model resources.

Stable.

func (*Model) Embed

func (m *Model) Embed(texts []string) ([]Embedding, error)

Embed returns embeddings for the given texts.

Each input string is tokenized, its token embeddings are looked up from the model's embedding table, mean-pooled, and L2-normalized.

Stable.

func (*Model) Generate

func (m *Model) Generate(ctx context.Context, prompt string, opts ...GenerateOption) (*GenerateResult, error)

Generate runs text generation with the given prompt and options.

Stable.

type Node

type Node[T tensor.Numeric] interface {
	graph.Node[T]
}

Node represents a node in the computation graph.

Stable.

type Numeric

type Numeric tensor.Numeric

Numeric represents a numeric type constraint for tensor elements.

Stable.

type Parameter

type Parameter[T tensor.Numeric] struct {
	*graph.Parameter[T]
}

Parameter represents a trainable parameter in the model.

Stable.

type StreamToken

type StreamToken struct {
	Text string
	Done bool
}

StreamToken represents a token received during streaming generation.

Stable.

type Tensor

type Tensor[T tensor.Numeric] struct {
	*tensor.TensorNumeric[T]
}

Tensor represents a multi-dimensional array.

Stable.

type ToolCall

type ToolCall struct {
	ID           string
	FunctionName string
	Arguments    json.RawMessage
}

ToolCall represents a tool invocation detected in model output.

Experimental.

Directories

Path Synopsis
Package autoopt provides automatic optimization recommendations based on hardware profiling.
Experimental — this package is not yet wired into the main framework.
Package cloud provides a multi-tenant managed inference service for Zerfoo.
cmd
bench command
Command bench runs a standardized benchmark harness for zerfoo models.
bench-compare command
Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report.
bench_batch command
Command bench_batch benchmarks continuous batching vs session pool throughput.
bench_disagg command
Command bench_disagg benchmarks disaggregated vs collocated serving throughput.
bench_mamba command
Command bench_mamba benchmarks Mamba-3 SSM vs Transformer attention decode throughput using synthetic FLOPs-based timing estimates.
bench_prefix command
Command bench_prefix simulates a multi-turn chat workload to measure prefix cache hit rate and TTFT reduction.
bench_spec command
Command bench_spec benchmarks speculative decoding speedup by comparing standalone target model decode against speculative decode (target + draft).
bench_tps command
bench_tps measures tokens-per-second for a local ZMF model.
cli
Package cli provides the command-line interface framework for Zerfoo.
coverage-gate command
Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold.
debug-infer command
deprecation-check command
Package main implements a linter that checks // Deprecated: doc comments for proper replacement guidance and version information.
finetune command
Command finetune runs QLoRA fine-tuning on a GGUF model.
train_distributed command
Command train_distributed launches distributed training using FSDP.
ts_train command
Command ts_train trains a PatchTST time-series signal model on offline feature data.
zerfoo command
zerfoo-edge command
Package main provides a minimal edge/embedded inference binary for Zerfoo.
zerfoo-predict command
zerfoo-tokenize command
Package compliance provides SOC 2 compliance automation tooling including Trust Services Criteria control mapping, evidence collection, policy document generation, and control status tracking.
audit
Package audit provides SOC 2 Type I audit tooling including readiness assessment, evidence collection automation, gap analysis, and report generation.
observation
Package observation implements the SOC 2 Type II observation period framework.
Package config provides file-based configuration loading with validation.
Experimental — this package is not yet wired into the main framework.
Package data provides dataset containers for training batches and normalization.
deploy
aws
Package aws provides AWS Marketplace Metering API integration for Zerfoo.
Package distributed provides multi-node distributed training for the Zerfoo ML framework.
coordinator
Package coordinator provides a distributed training coordinator.
fsdp
Package fsdp implements Fully Sharded Data Parallelism for distributed training.
pb
docs
cookbook/01-basic-text-generation command
Recipe 01: Basic Text Generation
cookbook/02-streaming-chat command
Recipe 02: Streaming Chat
cookbook/03-embedding-similarity command
Recipe 03: Embedding and Cosine Similarity
cookbook/04-openai-server command
Recipe 04: OpenAI-Compatible Server
cookbook/05-custom-sampling command
Recipe 05: Custom Sampling Parameters
cookbook/06-structured-json-output command
Recipe 06: Structured JSON Output
cookbook/07-lora-fine-tuning command
Recipe 07: Fine-Tuning with LoRA
cookbook/08-batch-inference command
Recipe 08: Batch Inference
cookbook/09-speculative-decoding command
Recipe 09: Speculative Decoding
cookbook/10-tool-calling command
Recipe 10: Tool / Function Calling
cookbook/11-rag command
Recipe 11: Retrieval-Augmented Generation (RAG)
cookbook/12-vision-multimodal command
Recipe 12: Vision / Multimodal Inference
examples
agentic-tool-use command
Command agentic-tool-use demonstrates function calling (tool use) with a language model using the zerfoo one-line API.
api-server command
Command api-server demonstrates starting an OpenAI-compatible inference server.
audio-transcription command
Command audio-transcription demonstrates speech-to-text using the Zerfoo OpenAI-compatible API server.
automl command
Command automl demonstrates using the AutoML coordinator to search over hyperparameter configurations with Bayesian optimization and early stopping.
chat command
Command chat demonstrates a simple interactive chatbot using the zerfoo one-line API.
classification command
Command classification demonstrates text classification using grammar-constrained decoding to guarantee a valid JSON response with a category label.
code-completion command
Command code-completion demonstrates using a language model for code completion.
distributed-training command
Command distributed-training demonstrates setting up FSDP distributed training with gradient accumulation using the zerfoo distributed and training packages.
embedding command
Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler.
embedding-search command
Command embedding-search demonstrates semantic search using model embeddings.
fine-tuning command
Command fine-tuning demonstrates parameter-efficient fine-tuning using LoRA (Low-Rank Adaptation) on a tabular model.
inference command
Command inference demonstrates loading a GGUF model and generating text.
json-output command
Command json-output demonstrates grammar-guided decoding with a JSON schema.
langchain-chatbot command
Command langchain-chatbot demonstrates using the Zerfoo LangChain adapter as a drop-in LLM for a simple interactive chatbot loop.
rag command
Command rag demonstrates retrieval-augmented generation using Zerfoo.
streaming command
Command streaming demonstrates streaming chat generation using the zerfoo API.
summarization command
Command summarization demonstrates text summarization using a GGUF language model.
text-embedding command
Command text-embedding demonstrates extracting text embedding vectors from a loaded GGUF model using the inference package.
timeseries command
Command timeseries demonstrates time-series forecasting with the N-BEATS model using the zerfoo timeseries package.
translation command
Command translation demonstrates text translation using a GGUF language model.
vision-analysis command
Command vision-analysis demonstrates multimodal inference with image input.
weaviate-search command
Command weaviate-search demonstrates using the Zerfoo Weaviate adapter to embed a corpus of documents and perform cosine-similarity semantic search without requiring a live Weaviate instance.
Experimental — this package is not yet wired into the main framework.
Package federated provides federated learning interfaces and a FedAvg baseline implementation.
Package generate implements autoregressive text generation for transformer models loaded by the inference package.
agent
Package agent implements the agentic tool-use loop for multi-step reasoning.
grammar
Package grammar converts a subset of JSON Schema into a context-free grammar for constrained decoding.
speculative
Package speculative implements speculative decoding strategies for accelerated generation.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes.
Package inference provides a high-level API for loading GGUF models and running text generation, chat, embedding, and speculative decoding with minimal boilerplate.
multimodal
Package multimodal provides audio preprocessing for audio-language model inference.
parallel
Package parallel provides tensor and pipeline parallelism for distributing inference across multiple GPUs.
sentiment
Package sentiment provides a high-level sentiment classification pipeline that wraps encoder model loading and inference.
timeseries
Package timeseries implements time-series model builders.
timeseries/features
Package features provides a feature store for the Wolf time-series ML platform.
integrations
langchain
Package langchain provides an adapter that makes Zerfoo's OpenAI-compatible HTTP API compatible with LangChain-Go's LLM interface.
weaviate
Package weaviate provides an adapter for generating embeddings via Zerfoo's OpenAI-compatible HTTP API and inserting them into a Weaviate vector database client.
internal
clblast
Package clblast provides Go wrappers for the CLBlast BLAS library.
codegen
Package codegen generates CUDA megakernel source code from compiled computation graphs.
cublas
Package cublas provides low-level purego bindings for the cuBLAS library.
cuda
Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo). (Stability: stable)
cuda/kernels
Package kernels provides Go wrappers for custom CUDA kernels.
cudnn
Package cudnn provides purego bindings for the NVIDIA cuDNN library.
gpuapi
Package gpuapi defines internal interfaces for GPU runtime operations.
hip
Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen. (Stability: alpha)
hip/kernels
Package kernels provides Go wrappers for custom HIP kernels via purego dlopen. (Stability: alpha)
miopen
Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen. (Stability: alpha)
nccl
Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL). (Stability: beta)
opencl
Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
opencl/kernels
Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
rocblas
Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen. (Stability: alpha)
tensorrt
Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo). (Stability: alpha)
workerpool
Package workerpool provides a persistent pool of goroutines that process submitted tasks.
Package workerpool provides a persistent pool of goroutines that process submitted tasks.
xblas
Package xblas provides CPU BLAS wrappers with ARM NEON and AVX2 SIMD assembly.
Package xblas provides CPU BLAS wrappers with ARM NEON and AVX2 SIMD assembly.
Package layers provides neural network layer implementations for the Zerfoo ML framework.
activations
Package activations provides activation function layers.
attention
Package attention provides attention mechanisms for neural networks.
audio
Package audio provides audio-related neural network layers.
components
Package components provides reusable components for neural network layers.
core
Package core provides core neural network layer implementations.
embeddings
Package embeddings provides neural network embedding layers.
gather
Package gather provides the Gather layer for embedding-table lookup.
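The core of an embedding-table gather is simple to state: out[i] = table[ids[i]]. A minimal sketch of that operation (illustrative only, not the package's actual API):

```go
package main

import "fmt"

// gather looks up rows of an embedding table by token index:
// out[i] = table[ids[i]].
func gather(table [][]float32, ids []int) [][]float32 {
	out := make([][]float32, len(ids))
	for i, id := range ids {
		row := make([]float32, len(table[id]))
		copy(row, table[id]) // copy so callers cannot mutate the table
		out[i] = row
	}
	return out
}

func main() {
	table := [][]float32{
		{0.1, 0.2}, // token 0
		{0.3, 0.4}, // token 1
		{0.5, 0.6}, // token 2
	}
	fmt.Println(gather(table, []int{2, 0})) // [[0.5 0.6] [0.1 0.2]]
}
```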
hrm
Package hrm implements the Hierarchical Reasoning Model.
normalization
Package normalization provides normalization layers for neural networks.
recurrent
Package recurrent provides recurrent neural network layers.
reducesum
Package reducesum provides the ReduceSum layer for axis-wise reduction.
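Axis-wise reduction on a 2D tensor means collapsing one dimension by summation: axis 0 sums down columns, axis 1 sums across rows. A sketch of the operation under those conventions (illustrative, not the package's actual API):

```go
package main

import "fmt"

// reduceSum sums a 2D matrix along one axis: axis 0 collapses rows
// (yielding one value per column), axis 1 collapses columns
// (yielding one value per row).
func reduceSum(m [][]float64, axis int) []float64 {
	rows, cols := len(m), len(m[0])
	if axis == 0 {
		out := make([]float64, cols)
		for _, row := range m {
			for j, v := range row {
				out[j] += v
			}
		}
		return out
	}
	out := make([]float64, rows)
	for i, row := range m {
		for _, v := range row {
			out[i] += v
		}
	}
	return out
}

func main() {
	m := [][]float64{{1, 2, 3}, {4, 5, 6}}
	fmt.Println(reduceSum(m, 0)) // [5 7 9]
	fmt.Println(reduceSum(m, 1)) // [6 15]
}
```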
registry
Package registry provides a central registration point for all layer builders.
regularization
Package regularization provides regularization layers for neural networks.
residual
Package residual provides residual connection layers for neural networks.
ssm
Package ssm implements state space model layers.
timeseries
Package timeseries provides time-series specific neural network layers.
transformer
Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks.
transpose
Package transpose provides the Transpose layer for axis permutation.
vision
Package vision provides vision-related neural network layers.
Package marketplace provides a unified abstraction layer for cloud marketplace integrations across AWS, GCP, and Azure.
aws
Package aws provides AWS Marketplace integration for Zerfoo Cloud, including metering, subscription lifecycle management, entitlement verification, and token-based billing.
azure
Package azure provides Azure Marketplace integration for Zerfoo Cloud, including SaaS Fulfillment API v2, Marketplace Metering Service, subscription lifecycle management, and webhook handling.
gcp
Package gcp provides GCP Marketplace integration for Zerfoo Cloud, including Cloud Commerce Partner Procurement API integration, SaaS entitlement management, Service Control API usage metering, and token-based billing.
Experimental — this package is not yet wired into the main framework.
Package mobile provides gomobile-compatible bindings for zerfoo inference.
Package model provides adapter implementations for bridging existing and new model interfaces.
gguf
Package gguf provides GGUF file format parsing and writing.
hrm
Package hrm provides experimental Hierarchical Reasoning Model types.
huggingface
Package huggingface provides HuggingFace model configuration parsing.
Package modelcache provides an LRU model file cache for pre-caching GGUF models on Kubernetes nodes via a DaemonSet.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Package registry provides a model registry with local cache, pull, get, list, and delete operations.
Experimental — this package is not yet wired into the main framework.
Package security implements SOC 2 security controls for the Zerfoo ML framework.
Package serve provides an OpenAI-compatible HTTP API server for model inference.
adaptive
Package adaptive implements an adaptive batch scheduler that dynamically adjusts batch size based on queue depth and latency targets to maximize throughput while meeting latency SLOs.
agent
Package agent adapts the generate/agent agentic loop to the serving layer.
batcher
Package batcher implements a continuous batching scheduler for inference serving.
cloud
Package cloud provides multi-tenant namespace isolation for the serving layer.
disaggregated
Package disaggregated implements disaggregated prefill/decode serving.
disaggregated/proto
Package disaggpb defines the gRPC service contracts for disaggregated prefill/decode serving.
multimodel
Package multimodel provides a ModelManager that loads and unloads models on demand with LRU eviction when GPU memory budget is exceeded.
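LRU eviction of the kind described here can be built on `container/list` from the standard library. A minimal sketch of the policy (illustrative only — the `lruCache`/`Touch` names and string keys are hypothetical, not the package's API):

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache tracks keys by recency: front of the list is most recently
// used, and the back is evicted first when capacity is exceeded.
type lruCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> node holding that key
}

func newLRU(capacity int) *lruCache {
	return &lruCache{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Touch marks key as used, evicting the least-recently-used key if over
// capacity. It returns the evicted key, or "" if nothing was evicted.
func (c *lruCache) Touch(key string) string {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return ""
	}
	c.items[key] = c.order.PushFront(key)
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		evicted := oldest.Value.(string)
		delete(c.items, evicted)
		return evicted
	}
	return ""
}

func main() {
	c := newLRU(2)
	c.Touch("model-a")
	c.Touch("model-b")
	c.Touch("model-a")              // refresh a; b is now oldest
	fmt.Println(c.Touch("model-c")) // model-b
}
```

In a model manager, eviction would additionally unload the model's weights to reclaim GPU memory before admitting the new one.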
operator
Package operator provides a Kubernetes operator for managing ZerfooInferenceService custom resources.
registry
Package registry provides a bbolt-backed model version registry for tracking and A/B testing.
repository
Package repository provides a model repository for storing and managing GGUF model files.
Experimental — this package is not yet wired into the main framework.
Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks.
Package support implements an enterprise support ticketing system with priority routing, SLA tracking, and webhook notifications.
Experimental — this package is not yet wired into the main framework.
Package tabular provides tabular ML model types.
testing
benchmark
Package benchmark provides a standardized benchmark suite for measuring ML model inference performance: tok/s decode, tok/s prefill, memory usage, and time to first token.
compare
Package compare provides a model comparison tool that runs the same prompts through multiple models and compares their performance metrics.
tests
training
Package training contains end-to-end training loop integration tests.
Package timeseries provides time-series forecasting models built on ztensor.
Package training provides adapter implementations for bridging existing and new interfaces.
automl
Package automl provides automated machine learning utilities including Bayesian hyperparameter optimization.
fp8
Package fp8 implements FP8 mixed-precision training support.
lora
Package lora implements LoRA and QLoRA fine-tuning adapters.
loss
Package loss provides various loss functions for neural networks.
nas
Package nas implements neural architecture search using DARTS.
online
Package online implements online learning with drift detection and model rollback.
optimizer
Package optimizer provides various optimization algorithms for neural networks.
scheduler
Package scheduler provides learning rate scheduling strategies for optimizers.
