zerfoo

package module
v1.17.1
Published: Mar 25, 2026 License: Apache-2.0 Imports: 21 Imported by: 0

README

zerfoo

Pure Go ML framework -- inference, training, and serving. Embed any GGUF model in your Go application with go build ./....


244 tok/s on Gemma 3 1B Q4_K_M (95% memory bandwidth utilization) -- 20% faster than Ollama. Zero CGo. 20 model architectures. Tabular ML and time-series forecasting built in.

Quick Start

m, _ := zerfoo.Load("google/gemma-3-4b")  // downloads from HuggingFace
defer m.Close()
response, _ := m.Chat("Explain Go interfaces in one sentence.")
fmt.Println(response)

Installation

go get github.com/zerfoo/zerfoo

HuggingFace Download

Load accepts HuggingFace model IDs. Models are downloaded and cached automatically:

// Download by repo ID (defaults to Q4_K_M quantization)
m, err := zerfoo.Load("google/gemma-3-4b")

// Specify a quantization variant
m, err := zerfoo.Load("google/gemma-3-4b/Q8_0")

// Or load a local GGUF file
m, err := zerfoo.Load("./models/gemma-3-1b.gguf")

Streaming

Stream tokens as they are generated via a channel:

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

ch, err := m.ChatStream(context.Background(), "Tell me a joke.")
if err != nil {
    log.Fatal(err)
}
for tok := range ch {
    if !tok.Done {
        fmt.Print(tok.Text)
    }
}
fmt.Println()

Embeddings

Extract L2-normalized embeddings and compute similarity:

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

embeddings, _ := m.Embed([]string{
    "Go is a statically typed language.",
    "Rust has a borrow checker.",
})
score := embeddings[0].CosineSimilarity(embeddings[1])
fmt.Printf("similarity: %.4f\n", score)

Structured Output

Constrain model output to valid JSON matching a schema:

import "github.com/zerfoo/zerfoo/generate/grammar"

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

schema := grammar.JSONSchema{
    Type: "object",
    Properties: map[string]*grammar.JSONSchema{
        "name": {Type: "string"},
        "age":  {Type: "number"},
    },
    Required: []string{"name", "age"},
}

result, _ := m.Generate(context.Background(),
    "Generate a person named Alice who is 30.",
    zerfoo.WithSchema(schema),
)
fmt.Println(result.Text) // {"name": "Alice", "age": 30}
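Grammar-guided decoding works by masking, at each sampling step, every candidate token that would take the output outside the grammar. A toy illustration of that masking step, using a fixed set of valid completions instead of a real context-free grammar (all names here are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// allowedNext keeps only the candidate tokens for which partial+token
// is still a prefix of at least one accepted string. The real
// implementation walks a context-free grammar derived from the schema
// instead of a finite list.
func allowedNext(partial string, candidates, valid []string) []string {
	var ok []string
	for _, tok := range candidates {
		next := partial + tok
		for _, v := range valid {
			if strings.HasPrefix(v, next) {
				ok = append(ok, tok)
				break
			}
		}
	}
	return ok
}

func main() {
	valid := []string{`{"name":`, `{"age":`}
	candidates := []string{`"name"`, `"city"`, `"age"`}
	// After emitting "{", only keys allowed by the schema survive.
	fmt.Println(allowedNext(`{`, candidates, valid))
}
```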

Tool Calling

Detect tool/function calls in model output (OpenAI-compatible):

import "github.com/zerfoo/zerfoo/serve"

m, _ := zerfoo.Load("google/gemma-3-4b")
defer m.Close()

tools := []serve.Tool{{
    Type: "function",
    Function: serve.ToolFunction{
        Name:        "get_weather",
        Description: "Get the current weather for a city",
        Parameters:  json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
    },
}}

result, _ := m.Generate(context.Background(),
    "What is the weather in Paris?",
    zerfoo.WithTools(tools...),
)

for _, tc := range result.ToolCalls {
    fmt.Printf("call %s(%s)\n", tc.FunctionName, tc.Arguments)
}

Supported Models

LLM Inference (20 architectures)
| Architecture | Format | Special Features |
| --- | --- | --- |
| Gemma 3 | GGUF | Q4_K production; CUDA graph capture, 244 tok/s |
| Gemma 3n | GGUF | Mobile-optimized variant |
| Llama 3 | GGUF | RoPE theta=500K |
| Llama 4 | GGUF | Latest generation |
| Mistral | GGUF | Sliding window attention |
| Mixtral | GGUF | Mixture of Experts |
| Qwen 2 | GGUF | Attention bias, RoPE theta=1M |
| Phi 3/4 | GGUF | Partial rotary factor |
| DeepSeek V3 | GGUF | MLA + MoE (batched) |
| Command R | GGUF | Cohere architecture |
| Falcon | GGUF | Multi-query attention |
| RWKV | GGUF | Linear attention |
| Mamba / Mamba 3 | GGUF | State space models (MIMO SSM) |
| Jamba | GGUF | Hybrid Mamba-Transformer |
| Whisper | GGUF | Audio transcription |
| LLaVA | GGUF | Vision-language |
| Qwen-VL | GGUF | Vision-language |

New architectures are auto-detected from GGUF metadata.
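Auto-detection can be pictured as a registry lookup keyed on the architecture name read from GGUF metadata (the general.architecture field). A minimal sketch of the pattern, with placeholder builders rather than zerfoo's actual loader:

```go
package main

import "fmt"

// builder constructs the model graph for one architecture family.
type builder func() string

// registry maps a GGUF general.architecture value to the builder
// that knows how to assemble that model. Entries here are placeholders.
var registry = map[string]builder{
	"gemma3":  func() string { return "gemma-3 graph" },
	"llama":   func() string { return "llama graph" },
	"mistral": func() string { return "mistral graph" },
}

// lookup resolves the architecture string read from GGUF metadata.
func lookup(arch string) (builder, error) {
	b, ok := registry[arch]
	if !ok {
		return nil, fmt.Errorf("unsupported architecture %q", arch)
	}
	return b, nil
}

func main() {
	b, err := lookup("gemma3")
	if err != nil {
		panic(err)
	}
	fmt.Println(b()) // gemma-3 graph
}
```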

Tabular ML
| Architecture | Package | Use Case |
| --- | --- | --- |
| MLP / Ensemble | tabular | Baseline tabular prediction |
| FTTransformer | tabular | Attention-based tabular |
| TabNet | tabular | Attentive feature selection |
| SAINT | tabular | Self-attention + inter-sample |
| TabResNet | tabular | Residual tabular networks |

Time-Series Forecasting

| Architecture | Package | Use Case |
| --- | --- | --- |
| TFT | timeseries | Temporal Fusion Transformer |
| N-BEATS | timeseries | Basis expansion forecasting |
| PatchTST | timeseries | Patch-based transformer |

Training

Train tabular and time-series models with built-in AdamW, learning rate schedulers, and early stopping:

import "github.com/zerfoo/zerfoo/tabular"

model := tabular.NewEnsemble[float32](engine, tabular.EnsembleConfig{
    InputDim:  10,
    OutputDim: 1,
    Models:    3,
})
trainer := tabular.NewTrainer(model, engine, tabular.TrainerConfig{
    LR:     0.001,
    Epochs: 50,
})
trainer.Fit(ctx, trainX, trainY)
predictions, _ := model.Predict(ctx, testX)
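Early stopping, mentioned above, halts training once validation loss stops improving for a set number of epochs (the patience). A stdlib-only sketch of the rule, separate from the trainer's actual configuration:

```go
package main

import (
	"fmt"
	"math"
)

// earlyStop returns the index of the epoch at which training stops:
// the first epoch after which validation loss has failed to improve
// for `patience` consecutive epochs, or the last epoch otherwise.
func earlyStop(valLoss []float64, patience int) int {
	best := math.Inf(1)
	bad := 0
	for i, l := range valLoss {
		if l < best {
			best = l
			bad = 0
		} else {
			bad++
			if bad >= patience {
				return i
			}
		}
	}
	return len(valLoss) - 1
}

func main() {
	losses := []float64{1.0, 0.8, 0.7, 0.71, 0.72, 0.73}
	fmt.Println("stop at epoch", earlyStop(losses, 2)) // stop at epoch 4
}
```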

CLI

go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest

zerfoo pull gemma-3-1b-q4          # download a model
zerfoo run gemma-3-1b-q4 "Hello"   # generate text
zerfoo serve gemma-3-1b-q4         # OpenAI-compatible API server
zerfoo train -backend tabular ...  # train a tabular model
zerfoo list                        # list cached models

Examples

See the examples/ directory for runnable programs, and these additional resources:

  • Getting Started -- full walkthrough: install, pull a model, run inference via CLI and library
  • GPU Setup -- configure CUDA, ROCm, or OpenCL for hardware-accelerated inference
  • Benchmarks -- throughput numbers across models and hardware
  • Design -- architecture overview and key design decisions
  • Blog -- development updates and deep dives
  • CONTRIBUTING.md -- how to contribute

License

Apache 2.0

Documentation

Overview

Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewAdamW

func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]

NewAdamW creates a new AdamW optimizer with the given hyperparameters.

Stable.

func NewCPUEngine

func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]

NewCPUEngine creates a new CPU computation engine for the given numeric type.

Stable.

func NewDefaultTrainer

func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]

NewDefaultTrainer creates a new default trainer for the given graph, loss, optimizer, and gradient strategy.

Stable.

func NewFloat32Ops

func NewFloat32Ops() numeric.Arithmetic[float32]

NewFloat32Ops returns the float32 arithmetic operations.

Stable.

func NewGraph

func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]

NewGraph creates a new computation graph builder for the given engine.

Stable.

func NewMSE

func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]

NewMSE creates a new Mean Squared Error loss function.

Stable.

func NewRMSNorm

func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, options ...normalization.RMSNormOption[T]) (*normalization.RMSNorm[T], error)

NewRMSNorm creates a new RMSNorm normalization layer with the given configuration.

Stable.

func NewTensor

func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)

NewTensor creates a new tensor with the given shape and data.

Stable.

func RegisterLayer

func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])

RegisterLayer registers a new layer builder for the given operation type.

Stable.

func UnregisterLayer

func UnregisterLayer(opType string)

UnregisterLayer unregisters the layer builder for the given operation type.

Stable.

Types

type Batch

type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}

Batch represents a training batch of inputs and targets.

Stable.

type Embedding

type Embedding struct {
	Vector []float32
}

Embedding holds a text embedding vector.

Stable.

func (Embedding) CosineSimilarity

func (e Embedding) CosineSimilarity(other Embedding) float32

CosineSimilarity computes the cosine similarity between two embeddings.

Stable.

type Engine

type Engine[T tensor.Numeric] interface {
	compute.Engine[T]
}

Engine represents a computation engine (e.g., CPU or GPU).

Stable.

type GenerateOption

type GenerateOption func(*generateOptions)

GenerateOption configures the behavior of Model.Generate.

Stable.

func WithGenMaxTokens

func WithGenMaxTokens(n int) GenerateOption

WithGenMaxTokens sets the maximum number of tokens to generate.

Stable.

func WithGenTemperature

func WithGenTemperature(t float32) GenerateOption

WithGenTemperature sets the sampling temperature.

Stable.

func WithGenTopP

func WithGenTopP(p float32) GenerateOption

WithGenTopP sets the top-p (nucleus) sampling parameter.

Stable.

func WithSchema

func WithSchema(schema grammar.JSONSchema) GenerateOption

WithSchema enables grammar-guided decoding.

The model's output will be constrained to valid JSON matching the given schema.

Experimental.

func WithToolChoice

func WithToolChoice(choice serve.ToolChoice) GenerateOption

WithToolChoice sets the tool choice mode for tool call detection.

Experimental.

func WithTools

func WithTools(tools ...serve.Tool) GenerateOption

WithTools configures the tools available for tool call detection.

When tools are provided, Model.Generate will attempt to detect tool calls in the model output and populate [GenerateResult.ToolCalls].

Experimental.

type GenerateResult

type GenerateResult struct {
	Text       string
	TokenCount int
	Duration   time.Duration
	ToolCalls  []ToolCall
}

GenerateResult holds the result of a text generation call.

Stable.

type Graph

type Graph[T tensor.Numeric] struct {
	*graph.Graph[T]
}

Graph represents a computation graph.

Stable.

type LayerBuilder

type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)

LayerBuilder is a function that builds a computation graph layer.

Stable.

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model is a loaded language model ready for inference.

A Model is created via Load and used for text generation, embedding, and tool-call detection. Model.Close must be called when the model is no longer needed to release GPU and CPU resources.

Stable.

func Load

func Load(pathOrID string) (*Model, error)

Load loads a model from a file path or HuggingFace model ID.

Paths starting with "/", "./" or "../" are treated as local GGUF files. All other strings are treated as HuggingFace model IDs (e.g. "google/gemma-3-4b" or "google/gemma-3-4b/Q8_0"). If the model is not cached locally it will be downloaded from HuggingFace.

Stable.
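The path-versus-ID rule quoted above amounts to a prefix check. A sketch (illustrative; zerfoo's internal helper is not shown here):

```go
package main

import (
	"fmt"
	"strings"
)

// isLocalPath reports whether a Load argument should be treated as a
// local GGUF file rather than a HuggingFace model ID, per the rule:
// paths starting with "/", "./" or "../" are local files.
func isLocalPath(s string) bool {
	return strings.HasPrefix(s, "/") ||
		strings.HasPrefix(s, "./") ||
		strings.HasPrefix(s, "../")
}

func main() {
	for _, s := range []string{
		"./models/gemma-3-1b.gguf",
		"google/gemma-3-4b",
		"google/gemma-3-4b/Q8_0",
	} {
		fmt.Printf("%-28s local=%v\n", s, isLocalPath(s))
	}
}
```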

func (*Model) Chat

func (m *Model) Chat(prompt string) (string, error)

Chat runs a simple one-shot generation and returns the generated text.

Stable.

func (*Model) ChatStream

func (m *Model) ChatStream(ctx context.Context, prompt string, opts ...GenerateOption) (<-chan StreamToken, error)

ChatStream starts streaming generation and returns a receive-only channel that yields StreamToken values as they are generated. The channel is closed when generation completes or ctx is canceled. The error return is non-nil only if startup fails (e.g. the model is not loaded).

Stable.

func (*Model) Close

func (m *Model) Close() error

Close releases model resources.

Stable.

func (*Model) Embed

func (m *Model) Embed(texts []string) ([]Embedding, error)

Embed returns embeddings for the given texts.

Each input string is tokenized, its token embeddings are looked up from the model's embedding table, mean-pooled, and L2-normalized.

Stable.

func (*Model) Generate

func (m *Model) Generate(ctx context.Context, prompt string, opts ...GenerateOption) (*GenerateResult, error)

Generate runs text generation with the given prompt and options.

Stable.

type Node

type Node[T tensor.Numeric] interface {
	graph.Node[T]
}

Node represents a node in the computation graph.

Stable.

type Numeric

type Numeric tensor.Numeric

Numeric represents a numeric type constraint for tensor elements.

Stable.

type Parameter

type Parameter[T tensor.Numeric] struct {
	*graph.Parameter[T]
}

Parameter represents a trainable parameter in the model.

Stable.

type StreamToken

type StreamToken struct {
	Text string
	Done bool
}

StreamToken represents a token received during streaming generation.

Stable.

type Tensor

type Tensor[T tensor.Numeric] struct {
	*tensor.TensorNumeric[T]
}

Tensor represents a multi-dimensional array.

Stable.

type ToolCall

type ToolCall struct {
	ID           string
	FunctionName string
	Arguments    json.RawMessage
}

ToolCall represents a tool invocation detected in model output.

Experimental.

Directories

Path Synopsis
Package autoopt provides automatic optimization recommendations based on hardware profiling.
Experimental — this package is not yet wired into the main framework.
Package cloud provides a multi-tenant managed inference service for Zerfoo.
cmd
bench command
Command bench runs a standardized benchmark harness for zerfoo models.
bench-compare command
Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report.
bench_batch command
Command bench_batch benchmarks continuous batching vs session pool throughput.
bench_disagg command
Command bench_disagg benchmarks disaggregated vs collocated serving throughput.
bench_mamba command
Command bench_mamba benchmarks Mamba-3 SSM vs Transformer attention decode throughput using synthetic FLOPs-based timing estimates.
bench_prefix command
Command bench_prefix simulates a multi-turn chat workload to measure prefix cache hit rate and TTFT reduction.
bench_spec command
Command bench_spec benchmarks speculative decoding speedup by comparing standalone target model decode against speculative decode (target + draft).
bench_tps command
bench_tps measures tokens-per-second for a local ZMF model.
cli
Package cli provides the command-line interface framework for Zerfoo.
coverage-gate command
Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold.
debug-infer command
deprecation-check command
Package main implements a linter that checks // Deprecated: doc comments for proper replacement guidance and version information.
finetune command
Command finetune runs QLoRA fine-tuning on a GGUF model.
train_distributed command
Command train_distributed launches distributed training using FSDP.
ts_train command
Command ts_train trains a PatchTST time-series signal model on offline feature data.
zerfoo command
zerfoo-edge command
Package main provides a minimal edge/embedded inference binary for Zerfoo.
zerfoo-predict command
zerfoo-tokenize command
Package compliance provides SOC 2 compliance automation tooling including Trust Services Criteria control mapping, evidence collection, policy document generation, and control status tracking.
audit
Package audit provides SOC 2 Type I audit tooling including readiness assessment, evidence collection automation, gap analysis, and report generation.
observation
Package observation implements the SOC 2 Type II observation period framework.
Package config provides file-based configuration loading with validation.
Experimental — this package is not yet wired into the main framework.
Package data provides dataset containers for training batches and normalization.
deploy
aws
Package aws provides AWS Marketplace Metering API integration for Zerfoo.
Package distributed provides multi-node distributed training for the Zerfoo ML framework.
coordinator
Package coordinator provides a distributed training coordinator.
fsdp
Package fsdp implements Fully Sharded Data Parallelism for distributed training.
pb
docs
cookbook/01-basic-text-generation command
Recipe 01: Basic Text Generation
cookbook/02-streaming-chat command
Recipe 02: Streaming Chat
cookbook/03-embedding-similarity command
Recipe 03: Embedding and Cosine Similarity
cookbook/04-openai-server command
Recipe 04: OpenAI-Compatible Server
cookbook/05-custom-sampling command
Recipe 05: Custom Sampling Parameters
cookbook/06-structured-json-output command
Recipe 06: Structured JSON Output
cookbook/07-lora-fine-tuning command
Recipe 07: Fine-Tuning with LoRA
cookbook/08-batch-inference command
Recipe 08: Batch Inference
cookbook/09-speculative-decoding command
Recipe 09: Speculative Decoding
cookbook/10-tool-calling command
Recipe 10: Tool / Function Calling
cookbook/11-rag command
Recipe 11: Retrieval-Augmented Generation (RAG)
cookbook/12-vision-multimodal command
Recipe 12: Vision / Multimodal Inference
examples
agentic-tool-use command
Command agentic-tool-use demonstrates function calling (tool use) with a language model using the zerfoo one-line API.
api-server command
Command api-server demonstrates starting an OpenAI-compatible inference server.
audio-transcription command
Command audio-transcription demonstrates speech-to-text using the Zerfoo OpenAI-compatible API server.
automl command
Command automl demonstrates using the AutoML coordinator to search over hyperparameter configurations with Bayesian optimization and early stopping.
chat command
Command chat demonstrates a simple interactive chatbot using the zerfoo one-line API.
classification command
Command classification demonstrates text classification using grammar-constrained decoding to guarantee a valid JSON response with a category label.
code-completion command
Command code-completion demonstrates using a language model for code completion.
distributed-training command
Command distributed-training demonstrates setting up FSDP distributed training with gradient accumulation using the zerfoo distributed and training packages.
embedding command
Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler.
embedding-search command
Command embedding-search demonstrates semantic search using model embeddings.
fine-tuning command
Command fine-tuning demonstrates parameter-efficient fine-tuning using LoRA (Low-Rank Adaptation) on a tabular model.
inference command
Command inference demonstrates loading a GGUF model and generating text.
json-output command
Command json-output demonstrates grammar-guided decoding with a JSON schema.
langchain-chatbot command
Command langchain-chatbot demonstrates using the Zerfoo LangChain adapter as a drop-in LLM for a simple interactive chatbot loop.
rag command
Command rag demonstrates retrieval-augmented generation using Zerfoo.
streaming command
Command streaming demonstrates streaming chat generation using the zerfoo API.
summarization command
Command summarization demonstrates text summarization using a GGUF language model.
text-embedding command
Command text-embedding demonstrates extracting text embedding vectors from a loaded GGUF model using the inference package.
timeseries command
Command timeseries demonstrates time-series forecasting with the N-BEATS model using the zerfoo timeseries package.
translation command
Command translation demonstrates text translation using a GGUF language model.
vision-analysis command
Command vision-analysis demonstrates multimodal inference with image input.
weaviate-search command
Command weaviate-search demonstrates using the Zerfoo Weaviate adapter to embed a corpus of documents and perform cosine-similarity semantic search without requiring a live Weaviate instance.
Experimental — this package is not yet wired into the main framework.
Package federated provides federated learning interfaces and a FedAvg baseline implementation.
Package generate implements autoregressive text generation for transformer models loaded by the inference package.
agent
Package agent implements the agentic tool-use loop for multi-step reasoning.
grammar
Package grammar converts a subset of JSON Schema into a context-free grammar for constrained decoding.
speculative
Package speculative implements speculative decoding strategies for accelerated generation.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes.
Package inference provides a high-level API for loading GGUF models and running text generation, chat, embedding, and speculative decoding with minimal boilerplate.
multimodal
Package multimodal provides audio preprocessing for audio-language model inference.
parallel
Package parallel provides tensor and pipeline parallelism for distributing inference across multiple GPUs.
sentiment
Package sentiment provides a high-level sentiment classification pipeline that wraps encoder model loading and inference.
timeseries
Package timeseries implements time-series model builders.
timeseries/features
Package features provides a feature store for the Wolf time-series ML platform.
integrations
langchain
Package langchain provides an adapter that makes Zerfoo's OpenAI-compatible HTTP API compatible with LangChain-Go's LLM interface.
weaviate
Package weaviate provides an adapter for generating embeddings via Zerfoo's OpenAI-compatible HTTP API and inserting them into a Weaviate vector database client.
internal
clblast
Package clblast provides Go wrappers for the CLBlast BLAS library.
codegen
Package codegen generates CUDA megakernel source code from compiled computation graphs.
cublas
Package cublas provides low-level purego bindings for the cuBLAS library.
cuda
Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo). (Stability: stable)
cuda/kernels
Package kernels provides Go wrappers for custom CUDA kernels.
cudnn
Package cudnn provides purego bindings for the NVIDIA cuDNN library.
gpuapi
Package gpuapi defines internal interfaces for GPU runtime operations.
hip
Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen. (Stability: alpha)
hip/kernels
Package kernels provides Go wrappers for custom HIP kernels via purego dlopen. (Stability: alpha)
miopen
Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen. (Stability: alpha)
nccl
Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL). (Stability: beta)
opencl
Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
opencl/kernels
Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
rocblas
Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen. (Stability: alpha)
tensorrt
Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo). (Stability: alpha)
workerpool
Package workerpool provides a persistent pool of goroutines that process submitted tasks.
Package workerpool provides a persistent pool of goroutines that process submitted tasks.
xblas
Package xblas provides CPU BLAS wrappers with ARM NEON and AVX2 SIMD assembly.
Package xblas provides CPU BLAS wrappers with ARM NEON and AVX2 SIMD assembly.
Package layers provides neural network layer implementations for the Zerfoo ML framework.
activations
Package activations provides activation function layers.
attention
Package attention provides attention mechanisms for neural networks.
audio
Package audio provides audio-related neural network layers.
components
Package components provides reusable components for neural network layers.
core
Package core provides core neural network layer implementations.
embeddings
Package embeddings provides neural network embedding layers.
gather
Package gather provides the Gather layer for embedding-table lookup.
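The core of an embedding-table gather is simple to state: out[i] = table[ids[i]]. A minimal sketch of that operation (illustrative only, not the package's actual API):

```go
package main

import "fmt"

// gather looks up rows of an embedding table by token index:
// out[i] = table[ids[i]].
func gather(table [][]float32, ids []int) [][]float32 {
	out := make([][]float32, len(ids))
	for i, id := range ids {
		row := make([]float32, len(table[id]))
		copy(row, table[id]) // copy so callers cannot mutate the table
		out[i] = row
	}
	return out
}

func main() {
	table := [][]float32{
		{0.1, 0.2}, // token 0
		{0.3, 0.4}, // token 1
		{0.5, 0.6}, // token 2
	}
	fmt.Println(gather(table, []int{2, 0})) // [[0.5 0.6] [0.1 0.2]]
}
```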
hrm
Package hrm implements the Hierarchical Reasoning Model.
normalization
Package normalization provides normalization layers for neural networks.
recurrent
Package recurrent provides recurrent neural network layers.
reducesum
Package reducesum provides the ReduceSum layer for axis-wise reduction.
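Axis-wise reduction on a 2D tensor means collapsing one dimension by summation: axis 0 sums down columns, axis 1 sums across rows. A sketch of the operation under those conventions (illustrative, not the package's actual API):

```go
package main

import "fmt"

// reduceSum sums a 2D matrix along one axis: axis 0 collapses rows
// (yielding one value per column), axis 1 collapses columns
// (yielding one value per row).
func reduceSum(m [][]float64, axis int) []float64 {
	rows, cols := len(m), len(m[0])
	if axis == 0 {
		out := make([]float64, cols)
		for _, row := range m {
			for j, v := range row {
				out[j] += v
			}
		}
		return out
	}
	out := make([]float64, rows)
	for i, row := range m {
		for _, v := range row {
			out[i] += v
		}
	}
	return out
}

func main() {
	m := [][]float64{{1, 2, 3}, {4, 5, 6}}
	fmt.Println(reduceSum(m, 0)) // [5 7 9]
	fmt.Println(reduceSum(m, 1)) // [6 15]
}
```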
registry
Package registry provides a central registration point for all layer builders.
regularization
Package regularization provides regularization layers for neural networks.
residual
Package residual provides residual connection layers for neural networks.
ssm
Package ssm implements state space model layers.
timeseries
Package timeseries provides time-series specific neural network layers.
transformer
Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks.
transpose
Package transpose provides the Transpose layer for axis permutation.
vision
Package vision provides vision-related neural network layers.
Package marketplace provides a unified abstraction layer for cloud marketplace integrations across AWS, GCP, and Azure.
aws
Package aws provides AWS Marketplace integration for Zerfoo Cloud, including metering, subscription lifecycle management, entitlement verification, and token-based billing.
azure
Package azure provides Azure Marketplace integration for Zerfoo Cloud, including SaaS Fulfillment API v2, Marketplace Metering Service, subscription lifecycle management, and webhook handling.
gcp
Package gcp provides GCP Marketplace integration for Zerfoo Cloud, including Cloud Commerce Partner Procurement API integration, SaaS entitlement management, Service Control API usage metering, and token-based billing.
Experimental — this package is not yet wired into the main framework.
Package mobile provides gomobile-compatible bindings for zerfoo inference.
Package model provides adapter implementations for bridging existing and new model interfaces.
gguf
Package gguf provides GGUF file format parsing and writing.
hrm
Package hrm provides experimental Hierarchical Reasoning Model types.
huggingface
Package huggingface provides HuggingFace model configuration parsing.
Package modelcache provides an LRU model file cache for pre-caching GGUF models on Kubernetes nodes via a DaemonSet.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Experimental — this package is not yet wired into the main framework.
Package registry provides a model registry with local cache, pull, get, list, and delete operations.
Experimental — this package is not yet wired into the main framework.
Package security implements SOC 2 security controls for the Zerfoo ML framework.
Package serve provides an OpenAI-compatible HTTP API server for model inference.
adaptive
Package adaptive implements an adaptive batch scheduler that dynamically adjusts batch size based on queue depth and latency targets to maximize throughput while meeting latency SLOs.
agent
Package agent adapts the generate/agent agentic loop to the serving layer.
batcher
Package batcher implements a continuous batching scheduler for inference serving.
cloud
Package cloud provides multi-tenant namespace isolation for the serving layer.
disaggregated
Package disaggregated implements disaggregated prefill/decode serving.
disaggregated/proto
Package disaggpb defines the gRPC service contracts for disaggregated prefill/decode serving.
multimodel
Package multimodel provides a ModelManager that loads and unloads models on demand with LRU eviction when GPU memory budget is exceeded.
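LRU eviction of the kind described here can be built on `container/list` from the standard library. A minimal sketch of the policy (illustrative only — the `lruCache`/`Touch` names and string keys are hypothetical, not the package's API):

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache tracks keys by recency: front of the list is most recently
// used, and the back is evicted first when capacity is exceeded.
type lruCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> node holding that key
}

func newLRU(capacity int) *lruCache {
	return &lruCache{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Touch marks key as used, evicting the least-recently-used key if over
// capacity. It returns the evicted key, or "" if nothing was evicted.
func (c *lruCache) Touch(key string) string {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return ""
	}
	c.items[key] = c.order.PushFront(key)
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		evicted := oldest.Value.(string)
		delete(c.items, evicted)
		return evicted
	}
	return ""
}

func main() {
	c := newLRU(2)
	c.Touch("model-a")
	c.Touch("model-b")
	c.Touch("model-a")              // refresh a; b is now oldest
	fmt.Println(c.Touch("model-c")) // model-b
}
```

In a model manager, eviction would additionally unload the model's weights to reclaim GPU memory before admitting the new one.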
operator
Package operator provides a Kubernetes operator for managing ZerfooInferenceService custom resources.
registry
Package registry provides a bbolt-backed model version registry for tracking and A/B testing.
repository
Package repository provides a model repository for storing and managing GGUF model files.
Experimental — this package is not yet wired into the main framework.
Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks.
Package support implements an enterprise support ticketing system with priority routing, SLA tracking, and webhook notifications.
Experimental — this package is not yet wired into the main framework.
Package tabular provides tabular ML model types.
testing
benchmark
Package benchmark provides a standardized benchmark suite for measuring ML model inference performance: tok/s decode, tok/s prefill, memory usage, and time to first token.
compare
Package compare provides a model comparison tool that runs the same prompts through multiple models and compares their performance metrics.
tests
training
Package training contains end-to-end training loop integration tests.
Package timeseries provides time-series forecasting models built on ztensor.
Package training provides adapter implementations for bridging existing and new interfaces.
automl
Package automl provides automated machine learning utilities including Bayesian hyperparameter optimization.
fp8
Package fp8 implements FP8 mixed-precision training support.
lora
Package lora implements LoRA and QLoRA fine-tuning adapters.
loss
Package loss provides various loss functions for neural networks.
nas
Package nas implements neural architecture search using DARTS.
online
Package online implements online learning with drift detection and model rollback.
optimizer
Package optimizer provides various optimization algorithms for neural networks.
scheduler
Package scheduler provides learning rate scheduling strategies for optimizers.
