HyprStream is agentic cloud infrastructure for applications that learn, build, and run, integrating continuous development, training, integration, and deployment of software and AI/ML models. Its primary feature is an LLM inference and training engine built in Rust on PyTorch, with integrated training capabilities, version control, and secure tool use via microVM containers.
Users can communicate with open-weight and custom LLMs through Hyprstream's OpenAI-compatible API.
Getting started is easy: download the AppImage and it auto-detects your NVIDIA or ROCm GPU. See docs/quickstart.md for a full walkthrough.
- Frontend-ready: Use the included TUI for ease of use and share terminals with collaborators and agents.
- Collaborative: Multi-user, multi-agent interfaces through a high-speed compositing multiplexer.
- LLM Inference & Training: Supporting the dense Qwen3.5 and Qwen3 model architectures.
- Test Time Training: Models train models using MCP tools, test-time-training, and the Muon optimizer.
- Security-minded: Zero-trust cryptographic architecture with ZK stream proxies, Casbin Policy, and OpenID integration.
- Industry-compatible: Providing compatibility with OpenAI's OpenAPI specification.
- Hardware Accelerated: NVIDIA CUDA and AMD ROCm support, universal binary.
- Version Controlled: Manages source and weights with Git, compatible with HuggingFace.
- Systemd Integration - Optional user-level service management for background workers, long-running services, and containers.
- Powered by Torch: Built on the stable PyTorch C++ API (libtorch) using tch-rs.
- Workers - Isolated workload execution using Kata microvms with cloud-hypervisor.
- [Workflows] - Git workflow file support for local continuous integration, deployment, and functions-as-a-service.
- [Metrics] - Structured knowledge engine and time-series aggregation database powered by DuckDB, ADBC, and Flight.
Hyprstream requires git and git-lfs (available in all major Linux distros).
Download the Universal AppImage. We publish AppImages for each CPU/GPU configuration; the Universal image is recommended for ease-of-use and GPU auto-detection.
# Download and install (Universal recommended)
chmod +x hyprstream-v0.3.0-x86_64.AppImage
# Installer Path (v0.4.0+):
./hyprstream-v0.4.0-x86_64.AppImage wizard # add `-y` for autoinstall
# Manual path (< v0.4.0):
./hyprstream-v0.3.0-x86_64.AppImage service install
# Add to PATH
export PATH="$HOME/.local/bin:$PATH"
# Apply policy template (hyprstream is deny-by-default)
hyprstream quick policy apply-template local
hyprstream service start
See docs/quickstart.md for prerequisites, source build, and first-time setup.
NOTE: For CUDA systems, make sure you have installed CUDA Toolkit and set LD_PRELOAD:
systemctl --user set-environment LD_PRELOAD=libtorch_cuda.so && systemctl --user restart hyprstream-model
The installed files will be located in $HOME/.local/hyprstream/ and $HOME/.local/bin/.
# Set LIBTORCH to your libtorch path, or use --features download-libtorch
cargo build --release
See docs/quickstart.md for prerequisites and DEVELOP.md for detailed build instructions.
Hyprstream can run inside containers. See README-Docker.md for Docker/Kubernetes deployment.
Hyprstream supports Qwen3 model inference from Git repositories (HuggingFace, GitHub, etc.).
# Clone a model
hyprstream quick clone https://huggingface.co/Qwen/Qwen3-0.6B
# Clone with a custom name
hyprstream quick clone https://huggingface.co/Qwen/Qwen3-0.6B --name qwen3-small
Worktrees are automatically managed by hyprstream.
# List all cached models
hyprstream quick list
# Get detailed model information (model:branch format)
hyprstream quick info qwen3-small
hyprstream quick info qwen3-small:main
# Basic inference
hyprstream quick infer qwen3-small:main \
--prompt "Explain quantum computing in simple terms"
# With options
hyprstream quick infer qwen3-small:main \
--prompt "Write a Python function to sort a list" \
--temperature 0.7 \
--top-p 0.9 \
--max-tokens 1024
HyprStream provides an OpenAI-compatible API endpoint for easy integration with existing tools and libraries:
# Start API server
hyprstream server --port 6789
# List available models (worktree-based)
curl http://localhost:6789/oai/v1/models
# Example response shows models as model:branch format
# {
# "object": "list",
# "data": [
# {
# "id": "qwen3-small:main",
# "object": "model",
# "created": 1762974327,
# "owned_by": "system driver:overlay2, saved:2.3GB, age:2h cached"
# },
# {
# "id": "qwen3-small:experiment-1",
# "object": "model",
# "created": 1762975000,
# "owned_by": "system driver:overlay2, saved:1.8GB, age:30m"
# }
# ]
# }
# Make chat completions request (OpenAI-compatible)
# NOTE: Models must be referenced with branch (model:branch format)
curl -X POST http://localhost:6789/oai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-small:main",
"messages": [
{"role": "user", "content": "Hello, world!"}
],
"max_tokens": 100,
"temperature": 0.7
}'
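The same request can be assembled from Python's standard library. This is a minimal sketch mirroring the curl example above; `chat_completion_request` is a hypothetical helper name, and only the endpoint path and `model:branch` convention come from this document.

```python
import json
import urllib.request

def chat_completion_request(base_url, model, content,
                            max_tokens=100, temperature=0.7):
    """Build (but do not send) an OpenAI-style chat completions request.

    NOTE: models must be referenced with a branch, e.g. "qwen3-small:main".
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/oai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_completion_request("http://localhost:6789", "qwen3-small:main",
                              "Hello, world!")
# Send with urllib.request.urlopen(req) once the server is running.
```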
# Or use with any OpenAI-compatible client
export OPENAI_API_KEY="dummy"
export OPENAI_BASE_URL="http://localhost:6789/oai/v1"
# Now use any OpenAI client library
# Note: Specify model as "qwen3-small:main" not just "qwen3-small"
HyprStream uses Git worktrees for model management. The /v1/models endpoint lists all worktrees (not base models):
- Format: Models are always shown as `model:branch` (e.g., `qwen3-small:main`)
- Multiple Versions: Each worktree (branch) appears as a separate model
- Metadata: The `owned_by` field includes worktree metadata:
  - Storage driver (e.g., `driver:overlay2`)
  - Space saved via CoW (e.g., `saved:2.3GB`)
  - Worktree age (e.g., `age:2h`)
  - Cache status (`cached` if loaded in memory)
Example: If you have a model qwen3-small with branches main, experiment-1, and training, the API will list three separate entries:
- `qwen3-small:main`
- `qwen3-small:experiment-1`
- `qwen3-small:training`
This allows you to work with multiple versions of the same model simultaneously, each in its own worktree with isolated changes.
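As a sketch of consuming these listings, the `id` and `owned_by` fields from the example response above can be unpacked with a small stdlib-only helper. `parse_model_entry` is an illustrative name, and the `owned_by` layout is assumed to match the `system driver:... saved:... age:... cached` format shown earlier.

```python
def parse_model_entry(entry):
    """Split a /oai/v1/models entry into model, branch, and worktree metadata."""
    model, _, branch = entry["id"].partition(":")
    meta = {}
    tokens = entry.get("owned_by", "").split()
    # First token is the owner ("system"); the rest are key:value pairs,
    # except a bare "cached" flag marking models loaded in memory.
    for tok in tokens[1:]:
        if tok == "cached":
            meta["cached"] = True
        elif ":" in tok:
            key, _, value = tok.partition(":")
            meta[key] = value.rstrip(",")
    return {"model": model, "branch": branch, **meta}

entry = {
    "id": "qwen3-small:main",
    "object": "model",
    "created": 1762974327,
    "owned_by": "system driver:overlay2, saved:2.3GB, age:2h cached",
}
info = parse_model_entry(entry)
```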
HyprStream includes a built-in Model Context Protocol server that exposes inference, model management, and repository operations as tools for AI coding assistants.
1. Configure Claude Code:
claude mcp add --transport http hyprstream http://localhost:6790/mcp
2. Authenticate
Use /mcp, select hyprstream, and select Authenticate or Re-authenticate.
3. Available tools:
Once connected, Claude Code can use hyprstream tools directly:
| Tool | Description |
|---|---|
| `model.load` | Load a model for inference |
| `model.list` | List loaded models |
| `model.status` | Get model status and memory usage |
| `registry.list` | List all cloned repositories |
| `registry.clone` | Clone a model from HuggingFace/GitHub |
| `repo.*` | Branch, worktree, merge, and tag operations |
| `policy.*` | Policy checks and token management |
Configuration:
The MCP server listens on port 6790 by default. To change it, set in your hyprstream config:
[mcp]
host = "127.0.0.1"
http_port = 6790
Or configure via the OAI-compatible API on port 6789 for non-MCP clients.
HyprStream can be configured via environment variables with the HYPRSTREAM_ prefix:
# Server configuration
export HYPRSTREAM_SERVER_HOST=0.0.0.0
export HYPRSTREAM_SERVER_PORT=6789
export HYPRSTREAM_API_KEY=your-api-key
# CORS settings
export HYPRSTREAM_CORS_ENABLED=true
export HYPRSTREAM_CORS_ORIGINS="*"
# Model management
export HYPRSTREAM_PRELOAD_MODELS=model1,model2,model3
export HYPRSTREAM_MAX_CACHED_MODELS=5
export HYPRSTREAM_MODELS_DIR=/custom/models/path
# Performance tuning
export HYPRSTREAM_USE_MMAP=true
export HYPRSTREAM_GENERATION_TIMEOUT=120
Hyprstream implements layered security-in-depth:
| Layer | Technology | Purpose |
|---|---|---|
| Transport | CURVE encryption (TCP) | End-to-end encryption for TCP connections |
| Application | Ed25519 signed envelopes | Request authentication and integrity |
| Authorization | Casbin policy engine | RBAC/ABAC access control |
| Isolation | Kata Containers (optional) | VM-level workload isolation for workers |
All inter-service communication uses ZeroMQ with Cap'n Proto serialization:
- REQ/REP: Synchronous RPC calls (policy checks, model queries)
- PUB/SUB: Event streaming (sandbox lifecycle, training progress)
- XPUB/XSUB: Steerable proxy for event distribution
Every request is wrapped in a SignedEnvelope:
- Ed25519 signature over the request payload
- Nonce for replay protection
- Timestamp for clock skew validation
- Request identity (Local user, API token, Peer, or Anonymous)
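The nonce and timestamp checks described above can be sketched as pure logic. This is illustrative only, not HyprStream's implementation: the Ed25519 signature verification step is assumed to happen first and is out of scope here, and all names (`EnvelopeValidator`, `ReplayError`) are hypothetical.

```python
import time

class ReplayError(Exception):
    pass

class EnvelopeValidator:
    """Illustrative nonce-replay and clock-skew checks for a signed envelope."""

    def __init__(self, max_skew_secs=30.0):
        self.max_skew_secs = max_skew_secs
        self.seen_nonces = set()

    def validate(self, nonce, timestamp, now=None):
        now = time.time() if now is None else now
        # Reject envelopes whose timestamp is too far from local time.
        if abs(now - timestamp) > self.max_skew_secs:
            raise ReplayError("timestamp outside allowed clock skew")
        # Reject nonces we have already accepted (replay protection).
        if nonce in self.seen_nonces:
            raise ReplayError("nonce already used")
        self.seen_nonces.add(nonce)

v = EnvelopeValidator(max_skew_secs=30.0)
v.validate("nonce-1", timestamp=1000.0, now=1005.0)  # within skew, fresh nonce
```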
Services can run in multiple modes:
- Tokio task: In-process async execution
- Dedicated thread: For `!Send` types (e.g., tch-rs tensors)
- Subprocess: Isolated process with systemd or standalone backend
See docs/rpc-architecture.md for detailed RPC infrastructure documentation.
Quick Start:
# View current policy
hyprstream policy show
# Check if a user has permission
hyprstream policy check alice model:qwen3-small infer
# Create an API token
hyprstream policy token create \
--user alice \
--name "dev-token" \
--expires 30d \
--scope "model:*"
# Apply a built-in template -- allow all local users access to all actions on all resources
hyprstream policy apply-template local
Built-in Templates:
- `local` - Full access for local users (default)
- `public-inference` - Anonymous inference access
- `public-read` - Anonymous read-only registry access
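A deny-by-default check over (user, resource, action) triples can be sketched as follows. The rule format here is illustrative glob matching, not Casbin's actual model syntax; the `local_rules` triple merely approximates the allow-all semantics of the `local` template.

```python
from fnmatch import fnmatch

def is_allowed(rules, user, resource, action):
    """Deny-by-default: a request passes only if some rule matches it.

    Each rule is a (user, resource, action) pattern triple; "*" matches
    anything, and resource patterns like "model:*" use glob matching.
    """
    return any(
        fnmatch(user, ru) and fnmatch(resource, rr) and fnmatch(action, ra)
        for ru, rr, ra in rules
    )

# Rough approximation of the built-in "local" template: allow everything.
local_rules = [("*", "*", "*")]

# A narrower policy, e.g. an API token scoped to "model:*" inference.
token_rules = [("alice", "model:*", "infer")]
```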
Worker Resources (experimental):
| Resource | Description |
|---|---|
| `sandbox:*`, `sandbox:{id}` | Pod sandbox (Kata VM) operations |
| `container:*`, `container:{id}` | Container lifecycle within sandboxes |
| `image:*`, `image:{name}` | Image pull/push/list operations |
| `workflow:*`, `workflow:{path}` | Workflow execution (.github/workflows/*.yml) |
| `tool:*`, `tool:{name}` | MCP tool access (`tool:bash`, `tool:read_file`) |
Policy History & Rollback:
# View policy commit history
hyprstream policy history
# Compare draft vs running policy
hyprstream policy diff
# Rollback to previous version
hyprstream policy rollback HEAD~1
REST API Authentication:
# Create a token
hyprstream policy token create --user alice --name "my-token" --expires 1d
# Use with API requests
curl -H "Authorization: Bearer eyJ..." http://localhost:6789/v1/models
See docs/rpc-architecture.md for detailed RPC and service infrastructure documentation.
HyprStream supports OpenTelemetry for distributed tracing, enabled via the otel feature flag.
# Build with otel support
cargo build --features otel --release
# Combine with other features
cargo build --no-default-features --features tch-cuda,otel --release
| Environment Variable | Purpose | Default |
|---|---|---|
| `HYPRSTREAM_OTEL_ENABLE` | Enable/disable telemetry | `false` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP backend endpoint | `http://localhost:4317` |
| `OTEL_SERVICE_NAME` | Service name in traces | `hyprstream` |
| `HYPRSTREAM_LOG_DIR` | File logging directory | None (console only) |
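Resolving these variables with their table defaults might look like the following sketch; `otel_settings` is a hypothetical helper, not part of HyprStream, and only the variable names and defaults come from the table above.

```python
import os

def otel_settings(env=None):
    """Resolve telemetry settings using the documented defaults."""
    env = os.environ if env is None else env
    return {
        "enabled": env.get("HYPRSTREAM_OTEL_ENABLE", "false").lower() == "true",
        "endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        "service_name": env.get("OTEL_SERVICE_NAME", "hyprstream"),
        "log_dir": env.get("HYPRSTREAM_LOG_DIR"),  # None => console only
    }

settings = otel_settings({"HYPRSTREAM_OTEL_ENABLE": "true"})
```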
Local development (stdout exporter):
export HYPRSTREAM_OTEL_ENABLE=true
export RUST_LOG=hyprstream=debug
hyprstream server --port 6789
# Spans printed to console
Production (OTLP to Jaeger/Tempo):
export HYPRSTREAM_OTEL_ENABLE=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
export OTEL_SERVICE_NAME=hyprstream-prod
hyprstream server --port 6789
File logging (separate from OTEL):
export HYPRSTREAM_LOG_DIR=/var/log/hyprstream
hyprstream server --port 6789
# Creates daily-rotated logs at /var/log/hyprstream/hyprstream.log
- OTLP: Used automatically when running the `server` command; sends traces to backends like Jaeger, Tempo, or Datadog
- Stdout: Used for CLI commands; prints spans to console for debugging
If the Universal AppImage is not detecting your GPU, you may override the settings:
# List all available backends
./hyprstream-v0.2.0-x86_64.AppImage --list-backends
# Detect available backends
./hyprstream-v0.2.0-x86_64.AppImage --detect-gpu
# Override backend selection for Universal AppImage:
HYPRSTREAM_BACKEND=cuda130 ./hyprstream-v0.2.0-x86_64.AppImage server
- Operating System: Linux (x86_64, ARM64)
- Inference Service Requirements (optional):
- CPU: Full support (x86_64, ARM64)
- CUDA: NVIDIA host kernel modules (`nvidia-smi` works)
- ROCm: AMDGPU kernel modules and userland (`rocm-smi` works)
- Workers Service Requirements (optional, experimental):
- Nested Virtualization: The host running hyprstream-workers must support and enable nested virtualization; this may require a physical machine, a bare-metal VM, or appropriate QEMU/KVM configuration.
- 8GB+ RAM for inference, 16GB+ for training
- Optional Dependencies:
- `systemd` - For service management and worker process isolation
- `cloud-hypervisor` - For Kata container workers (experimental)
See CONTRIBUTING.md for guidelines.
This project uses a dual-licensing model:
AGPL-3.0 - The end-user experience and crates providing public APIs:
- `hyprstream` (main application)
- `hyprstream-metrics`
- `hyprstream-flight`
See LICENSE-AGPLV3 for details.
MIT - Library crates for broader reuse:
- `git2db` - Git repository management
- `gittorrent` - P2P git transport
- `git-xet-filter` - XET large file storage filter
- `cas-serve` - CAS server for XET over SSH
- `hyprstream-rpc` - RPC infrastructure
- `hyprstream-rpc-derive` - RPC derive macros
See LICENSE-MIT for details.
Built with:
- PyTorch - Deep learning framework
- tch - Rust bindings for PyTorch
- SafeTensors - Efficient tensor serialization
- Git2 - Git operations in Rust
- Tokio - Async runtime
- Casbin - Authorization library for policy engine
- Kata Containers - VM-based container isolation (experimental)
- cloud-hypervisor - Virtual machine monitor (experimental)
