Why did we open-source our inference engine? Read the post

Self-hosted inference
for search & document processing

50x cheaper than managed model APIs
Quality boost from 85+ SOTA models
Data doesn't leave your AWS/GCP
GitHub · 1.5K stars
# Configure (AWS)
module "sie" {
  source = "superlinked/sie/aws"
  region = "us-east-1"
  gpus   = ["a100-40gb", "l4-spot"]
}
# Deploy
> terraform apply
> helm install sie oci://ghcr.io/superlinked/charts/sie-cluster
# Use
> pip install sie-sdk
client.encode("BAAI/bge-m3", Item(text="indemnification"),
    options={"lora": "legal"})
# Configure (Google Cloud)
module "sie" {
  source = "superlinked/sie/google"
  region = "us-central1"
  gpus   = ["a100-40gb", "l4-spot"]
}
# Deploy
> terraform apply
> helm install sie oci://ghcr.io/superlinked/charts/sie-cluster
# Use
> pip install sie-sdk
client.encode("BAAI/bge-m3", Item(text="indemnification"),
    options={"lora": "legal"})
# Run (local Docker)
> docker run -p 8080:8080 ghcr.io/superlinked/sie-server
# Use
> pip install sie-sdk
client.encode("BAAI/bge-m3", Item(text="indemnification"),
    options={"lora": "legal"})

SIE: Superlinked Inference Engine

Run all your search & document-processing inference in one centralized cluster, shared across teams and workloads.

SIE SDKs

Build your apps

> pip install sie-sdk
> npm install @superlinked/sie-sdk

and 5+ framework integrations

Manage models & configurations via SDK

client.list_models()
SIE Cluster

Deploy the cluster

> helm install sie oci://ghcr.io/superlinked/charts/sie-cluster

Observe with cloud-native tools such as Grafana, or with

> sie-top
SIE Infra

Create the infrastructure

module "sie" {
  source = "superlinked/sie/aws"
  region = "us-east-1"
  gpus   = ["a100-40gb", "l4-spot"]
}

Deploy

> terraform apply
SIE Architecture

Plan your self-deployment

SIE deployment architecture

How SIE fits in your stack

See where SIE sits in a typical retrieval pipeline alongside vector databases, orchestration frameworks, and your application layer.
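The shape of that pipeline can be sketched in a few lines of Python. The encoder below is a deterministic toy stand-in, not an actual SIE model call, and the in-memory index is a stand-in for a real vector database; in a deployment, the `embed` step is where the SDK would call a model such as BAAI/bge-m3 on the SIE cluster:

```python
import hashlib
import math

# Toy stand-in for the SIE encode step: a deterministic hash-based
# embedding so the pipeline shape is runnable without a cluster.
def embed(text: str, dim: int = 8) -> list[float]:
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Stand-in vector database: embed documents at ingest time.
index = {doc: embed(doc) for doc in [
    "indemnification clause",
    "termination notice",
    "payment schedule",
]}

# Application layer: embed the query, rank documents by similarity.
def search(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]

print(search("indemnification clause"))
```

Swapping the stand-ins for an SIE encode call and a real vector store leaves the control flow unchanged, which is the point of the diagram above.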

Cost Comparison

Compare across models, GPU types, and cloud providers.

Provider           Cost per 1B tokens   Notes
OpenAI API         $20.00               emb-3-small · $0.02/1M tok
Modal + TEI        $1.30                bge-base on A10G · $1.10/hr
Your Cloud + SIE   $0.50                bge-base on spot A10G · $0.38/hr
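The self-hosted rows in the table follow from the hourly GPU price and a throughput figure. A minimal sketch of that arithmetic (the tokens-per-second values below are illustrative assumptions chosen to be consistent with the table, not measured benchmarks):

```python
def cost_per_1b_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """USD per 1B tokens for a single GPU running at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1e9

# Assumed throughputs consistent with the table above (not benchmarks):
print(round(cost_per_1b_tokens(1.10, 235_000), 2))  # Modal + TEI row
print(round(cost_per_1b_tokens(0.38, 211_000), 2))  # spot A10G + SIE row
```

The same function lets you plug in your own GPU pricing and measured throughput to compare providers for your workload.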
See the deployment documentation.

Self-hosted inference for search & document processing

Cut API costs by 50x, boost quality with 85+ SOTA models, and keep your data in your own cloud.


Contact us

Tell us about your use case and we'll get back to you shortly.