62 models

| Model | Mode | Output | Backbone | Params | NDCG@10 | F1 | AP | Throughput | Latency |
|---|---|---|---|---|---|---|---|---|---|
| vidore/colqwen2.5-v0.2 | Encode | Multi-Vec | Qwen2 | 7.0B | | | | 2 tps | 1.9s |
| vidore/colpali-v1.3-hf | Encode | Multi-Vec | PaliGemma | 3.0B | | | | 6 tps | 619.1ms |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Encode | Dense | Qwen2 | 1.8B | | | | 12.3K tps | 261.1ms |
| NovaSearch/stella_en_1.5B_v5 | Encode | Dense | Qwen2 | 1.5B | | | | 12.8K tps | 265.9ms |
| laion/CLIP-ViT-H-14-laion2B-s32B-b79K | Encode | Dense | CLIP | 986M | | | | 321 tps | 503.8ms |
| google/siglip-so400m-patch14-384 | Encode | Dense | SigLIP | 878M | | | | 355 tps | 488.3ms |
| google/siglip-so400m-patch14-224 | Encode | Dense | SigLIP | 877M | | | | 348 tps | 439.8ms |
| Qwen/Qwen3-Embedding-0.6B | Encode | Dense | Qwen3 | 596M | | | | 20.6K tps | 156.9ms |
| BAAI/bge-m3 | Encode | Dense / Sparse / Multi-Vec | XLM-RoBERTa | 568M | | | | 33.2K tps | 93.4ms |
| BAAI/bge-m3 | Score | Dense / Sparse / Multi-Vec | XLM-RoBERTa | 568M | | | | 2.8K tps | 56.8ms |
| BAAI/bge-reranker-large | Score | Score | XLM-RoBERTa | 560M | | | | 6.6K tps | 41.4ms |
| intfloat/multilingual-e5-large | Encode | Dense | XLM-RoBERTa | 560M | | | | 29.8K tps | 108.6ms |
| intfloat/multilingual-e5-large-instruct | Encode | Dense | XLM-RoBERTa | 560M | | | | 29.4K tps | 106.9ms |
| jinaai/jina-colbert-v2 | Encode | Multi-Vec | XLM-RoBERTa | 559M | | | | 28.5K tps | 105.7ms |
| jinaai/jina-colbert-v2 | Score | Multi-Vec | XLM-RoBERTa | 559M | | | | 1.4K tps | 226.1ms |
| nomic-ai/nomic-embed-text-v2-moe | Encode | Dense | NomicBERT | 475M | | | | 13.0K tps | 149.6ms |
| numind/NuNER_Zero | Extract | Entities | DeBERTa | 449M | | | | | |
| NovaSearch/stella_en_400M_v5 | Encode | Dense | ModernBERT | 435M | | | | 27.1K tps | 115.7ms |
| EmergentMethods/gliner_large_news-v2.1 | Extract | Entities | DeBERTa | 435M | | | | | |
| Ihor/gliner-biomed-large-v1.0 | Extract | Entities | DeBERTa | 435M | | | | | |
| jackboyla/glirel-large-v0 | Extract | Relations | DeBERTa | 435M | | | | | |
| mixedbread-ai/mxbai-rerank-large-v2 | Encode | Score | Qwen2 | 435M | | | | | |
| mixedbread-ai/mxbai-rerank-large-v2 | Score | Score | Qwen2 | 435M | | | | 2.2K tps | 1.4s |
| urchade/gliner_large-v2.1 | Extract | Entities | DeBERTa | 435M | | | | | |
| urchade/gliner_multi-v2.1 | Extract | Entities | DeBERTa | 435M | | | | | |
| urchade/gliner_multi_pii-v1 | Extract | Entities | DeBERTa | 435M | | | | | |
| openai/clip-vit-large-patch14 | Encode | Dense | CLIP | 428M | | | | 706 tps | 298.1ms |
| mixedbread-ai/mxbai-colbert-large-v1 | Encode | Multi-Vec | BERT | 335M | | | | 43.3K tps | 74.9ms |
| mixedbread-ai/mxbai-colbert-large-v1 | Score | Multi-Vec | BERT | 335M | | | | 4.0K tps | 45.6ms |
| intfloat/e5-large-v2 | Encode | Dense | BERT | 335M | | | | 33.2K tps | 86.6ms |
| Alibaba-NLP/gte-multilingual-base | Encode | Dense | ModernBERT | 305M | | | | 55.1K tps | 63.1ms |
| lightonai/GTE-ModernColBERT-v1 | Encode | Multi-Vec | ModernBERT | 305M | | | | 28.0K tps | 103.9ms |
| lightonai/GTE-ModernColBERT-v1 | Score | Multi-Vec | ModernBERT | 305M | | | | 231 tps | 313.4ms |
| google/embeddinggemma-300m | Encode | Dense | Gemma 3 | 303M | | | | 79.6K tps | 55.7ms |
| jinaai/jina-reranker-v2-base-multilingual | Score | Score | XLM-RoBERTa | 278M | | | | 8.3K tps | 32.0ms |
| BAAI/bge-reranker-base | Score | Score | XLM-RoBERTa | 278M | | | | 5.0K tps | 33.2ms |
| IDEA-Research/grounding-dino-base | Extract | Bounding Boxes | Swin | 233M | | | | 0.8 mpix/s | 785.8ms |
| urchade/gliner_medium-v2.1 | Extract | Entities | DeBERTa | 195M | | | | | |
| IDEA-Research/grounding-dino-tiny | Extract | Bounding Boxes | Swin | 172M | | | | 0.9 mpix/s | 532.6ms |
| google/owlv2-base-patch16-ensemble | Extract | Bounding Boxes | CLIP | 155M | | | | 1.0 mpix/s | 954.6ms |
| laion/CLIP-ViT-B-32-laion2B-s34B-b79K | Encode | Dense | CLIP | 151M | | | | 1.2K tps | 178.6ms |
| openai/clip-vit-base-patch32 | Encode | Dense | CLIP | 151M | | | | 651 tps | 319.4ms |
| mixedbread-ai/mxbai-rerank-base-v2 | Encode | Score | Qwen2 | 150M | | | | | |
| mixedbread-ai/mxbai-rerank-base-v2 | Score | Score | Qwen2 | 150M | | | | 7.0K tps | 457.1ms |
| Alibaba-NLP/gte-reranker-modernbert-base | Score | Score | ModernBERT | 150M | | | | 6.2K tps | 41.9ms |
| lightonai/Reason-ModernColBERT | Encode | Multi-Vec | ModernBERT | 149M | | | | 33.0K tps | 82.2ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte | Encode | Sparse | ModernBERT | 137M | | | | 34.2K tps | 93.7ms |
| naver/splade-cocondenser-selfdistil | Encode | Sparse | BERT | 110M | | | | 40.0K tps | 72.4ms |
| naver/splade-v3 | Encode | Sparse | BERT | 110M | | | | 29.6K tps | 83.7ms |
| numind/NuNER_Zero-span | Extract | Entities | DeBERTa | 110M | | | | | |
| intfloat/e5-base-v2 | Encode | Dense | BERT | 109M | | | | 53.2K tps | 57.9ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill | Encode | Sparse | DistilBERT | 67M | | | | 49.1K tps | 63.3ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill | Encode | Sparse | DistilBERT | 67M | | | | 50.1K tps | 60.7ms |
| opensearch-project/opensearch-neural-sparse-encoding-v2-distill | Encode | Sparse | DistilBERT | 67M | | | | 44.2K tps | 63.3ms |
| urchade/gliner_small-v2.1 | Extract | Entities | DeBERTa | 60M | | | | | |
| answerdotai/answerai-colbert-small-v1 | Encode | Multi-Vec | BERT | 33M | | | | 59.1K tps | 47.9ms |
| answerdotai/answerai-colbert-small-v1 | Score | Multi-Vec | BERT | 33M | | | | 1.7K tps | 121.7ms |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Score | Score | BERT | 33M | | | | 8.2K tps | 31.7ms |
| intfloat/e5-small-v2 | Encode | Dense | BERT | 33M | | | | 58.3K tps | 49.7ms |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | Encode | Multi-Vec | ModernBERT | 32M | | | | 45.9K tps | 59.7ms |
| sentence-transformers/all-MiniLM-L6-v2 | Encode | Dense | BERT | 23M | | | | 55.3K tps | 53.3ms |
| rasyosef/splade-mini | Encode | Sparse | BERT | 11M | | | | 56.3K tps | 56.0ms |

Self-hosted inference for search & document processing

Cut API costs by 50x, boost quality with 85+ SOTA models, and keep your data in your own cloud.


Contact us

Tell us about your use case and we'll get back to you shortly.