Why did we open-source our inference engine? Read the post

Self-hosted inference
for search & document processing

50x cheaper than managed model APIs
Quality boost from 85+ SOTA models
Data doesn't leave your AWS/GCP
GitHub · 1.5K stars
# Configure (AWS)
module "sie" {
  source = "superlinked/sie/aws"
  region = "us-east-1"
  gpus   = ["a100-40gb", "l4-spot"]
}
# Deploy
> terraform apply
> helm install sie oci://ghcr.io/superlinked/charts/sie-cluster
# Use
> pip install sie-sdk
client.encode("BAAI/bge-m3", Item(text="indemnification"),
    options={"lora": "legal"})
# Configure (Google Cloud)
module "sie" {
  source = "superlinked/sie/google"
  region = "us-central1"
  gpus   = ["a100-40gb", "l4-spot"]
}
# Deploy
> terraform apply
> helm install sie oci://ghcr.io/superlinked/charts/sie-cluster
# Use
> pip install sie-sdk
client.encode("BAAI/bge-m3", Item(text="indemnification"),
    options={"lora": "legal"})
# Run (local Docker)
> docker run -p 8080:8080 ghcr.io/superlinked/sie-server
# Use
> pip install sie-sdk
client.encode("BAAI/bge-m3", Item(text="indemnification"),
    options={"lora": "legal"})

SIE: Superlinked Inference Engine

Run all your search & document-processing inference in one centralized cluster, shared across teams and workloads.

SIE SDKs

Build your apps

> pip install sie-sdk
> npm install @superlinked/sie-sdk

and 5+ framework integrations

Manage models & configurations via SDK

client.list_models()
SIE Cluster

Deploy the cluster

> helm install sie oci://ghcr.io/superlinked/charts/sie-cluster

Observe with cloud-native tools such as Grafana, or with

> sie-top
SIE Infra

Create the infrastructure

module "sie" {
  source = "superlinked/sie/aws"
  region = "us-east-1"
  gpus   = ["a100-40gb", "l4-spot"]
}

Deploy

> terraform apply
SIE Architecture

Plan your self-deployment

SIE deployment architecture

How SIE fits in your stack

See where SIE sits in a typical retrieval pipeline alongside vector databases, orchestration frameworks, and your application layer.
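The shape of that pipeline can be sketched in a few lines of Python. The encoder below is a deterministic toy stand-in, not an actual SIE model call, and the in-memory index is a stand-in for a real vector database; in a deployment, the `embed` step is where the SDK would call a model such as BAAI/bge-m3 on the SIE cluster:

```python
import hashlib
import math

# Toy stand-in for the SIE encode step: a deterministic hash-based
# embedding so the pipeline shape is runnable without a cluster.
def embed(text: str, dim: int = 8) -> list[float]:
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Stand-in vector database: embed documents at ingest time.
index = {doc: embed(doc) for doc in [
    "indemnification clause",
    "termination notice",
    "payment schedule",
]}

# Application layer: embed the query, rank documents by similarity.
def search(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]

print(search("indemnification clause"))
```

Swapping the stand-ins for an SIE encode call and a real vector store leaves the control flow unchanged, which is the point of the diagram above.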

Cost Comparison

Compare across models, GPU types, and cloud providers.

Provider           Cost per 1B tokens   Notes
OpenAI API         $20.00               emb-3-small · $0.02/1M tok
Modal + TEI        $1.30                bge-base on A10G · $1.10/hr
Your Cloud + SIE   $0.50                bge-base on spot A10G · $0.38/hr
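The self-hosted rows in the table follow from the hourly GPU price and a throughput figure. A minimal sketch of that arithmetic (the tokens-per-second values below are illustrative assumptions chosen to be consistent with the table, not measured benchmarks):

```python
def cost_per_1b_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """USD per 1B tokens for a single GPU running at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1e9

# Assumed throughputs consistent with the table above (not benchmarks):
print(round(cost_per_1b_tokens(1.10, 235_000), 2))  # Modal + TEI row
print(round(cost_per_1b_tokens(0.38, 211_000), 2))  # spot A10G + SIE row
```

The same function lets you plug in your own GPU pricing and measured throughput to compare providers for your workload.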
See the deployment documentation.

Self-hosted inference for search & document processing

Cut API costs by 50x, boost quality with 85+ SOTA models, and keep your data in your own cloud.


Contact us

Tell us about your use case and we'll get back to you shortly.