Embeddings
Embedding models are neural networks that encode information into representative vectors that can be used for tasks like semantic retrieval, clustering, and recommender systems.
zembed-1
zembed-1 is ZeroEntropy’s flagship, state-of-the-art, open-weight, multilingual embedding model.
You can read more about its performance in this blog post.
zembed-1 is the default embedding model used in zsearch, ZeroEntropy’s search engine.
zembed-1 is available in the following ways:
- Calling the models/embed API endpoint, which is available via the Python and Node SDKs.
- Downloading the weights from HuggingFace and self-hosting the model.
- On the AWS Marketplace through SageMaker.
zembed-1 supports the following options:
- Latency mode: Control the trade-off between latency and throughput based on your use case.
- Embedding type: Specify whether you are embedding a query or a passage to take advantage of asymmetrical retrieval.
- Embedding size: Choose an output dimension from the available options: 2560 (default), 1280, 640, 320, 160, 80, or 40.
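Smaller output dimensions reduce storage and search cost, typically at some cost in retrieval accuracy. As a rough sketch of the storage side only, the snippet below estimates the raw size of a vector index for each available dimension. It assumes every component is stored as a 4-byte float32 with no compression or index overhead; that storage format and the helper name `raw_index_bytes` are assumptions for illustration, not something the API prescribes.

```python
# Output dimensions listed for zembed-1 in this section.
EMBEDDING_SIZES = (2560, 1280, 640, 320, 160, 80, 40)

# Assumption: each vector component is stored as a 4-byte float32.
BYTES_PER_COMPONENT = 4


def raw_index_bytes(num_vectors: int, dimension: int) -> int:
    """Return the raw storage, in bytes, for num_vectors embeddings."""
    if dimension not in EMBEDDING_SIZES:
        raise ValueError(f"unsupported embedding size: {dimension}")
    if num_vectors < 0:
        raise ValueError("num_vectors must be non-negative")
    return num_vectors * dimension * BYTES_PER_COMPONENT


if __name__ == "__main__":
    # Print the footprint of one million vectors at each dimension.
    for dim in EMBEDDING_SIZES:
        gigabytes = raw_index_bytes(1_000_000, dim) / 1e9
        print(f"{dim:>5} dims: {gigabytes:.2f} GB per million vectors")
```

Under these assumptions, one million vectors take about 10.24 GB at 2560 dimensions and 0.16 GB at 40 dimensions.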
Rerankers
Rerankers are cross-encoder neural networks that can boost the accuracy of any search system. You can read more about what rerankers are and when they are most useful in this blog post.
zerank-2 and zerank-1
zerank-2 is our flagship, state-of-the-art reranker. You can read more about its performance in this blog post.
zerank-1 and zerank-1-small are our first generation of SOTA rerankers.
All our rerankers are available in the following ways:
- Calling the models/rerank API endpoint, which is available via the Python and Node SDKs.
- Passing the reranker query parameter to top-snippets.
- Downloading the weights from our HuggingFace and self-hosting the models.
- On the AWS Marketplace through SageMaker.
zerank-1-small is also available on Baseten.
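Whichever access path you choose, a reranker is used the same way: it scores each candidate document against the query, and the candidates are reordered by that score. The sketch below shows this second-stage step with the scorer passed in as a function. The names `rerank` and `toy_overlap_score` are hypothetical, and the word-overlap scorer is only a stand-in so the example runs offline; in practice `score_fn` would wrap a call to a zerank model.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_n: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return (original_index, score) pairs sorted by descending score."""
    scored = [(i, score_fn(query, doc)) for i, doc in enumerate(candidates)]
    # Ties are broken by the original position to keep the order stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored if top_n is None else scored[:top_n]


def toy_overlap_score(query: str, document: str) -> float:
    """Stand-in scorer (NOT a real reranker): fraction of query words in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


if __name__ == "__main__":
    first_stage_results = [
        "Billing and invoices overview",
        "How to reset my account password",
        "Password rules",
    ]
    for index, score in rerank("reset my password", first_stage_results, toy_overlap_score):
        print(f"{score:.2f}  {first_stage_results[index]}")
```

Keeping the scorer as a parameter means the same reordering logic applies whether scores come from the hosted endpoint or a self-hosted model.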
We’ve open-sourced zerank-1-small under an Apache 2.0 license, and it is also available through HuggingFace and Baseten.
Our flagship model zerank-2 can be downloaded from HuggingFace under a non-commercial license. To use it in a commercial setting, contact us at [email protected] and we’ll get you a license ASAP!
Using the ZeroEntropy SDK
Using top-snippets
When querying /top-snippets on a ZeroEntropy collection, you can apply a reranker to get a significantly better ranking. Reranker scores are also deterministic and easier to interpret, a further advantage over hybrid search alone.
Rate Limiting and Pricing
Rate limits
On the default latency mode, "fast", each API key is limited to 2,500,000 UTF-8 bytes per minute, for both embedding and reranking.
| Model | Latency Mode | Per-Minute Throughput Limit | Requests per Minute (RPM) |
|---|---|---|---|
| zembed-1 | "fast" | 2,500,000 UTF-8 bytes | 100 |
| zembed-1 | "slow" | 25,000,000 UTF-8 bytes | 100 |
| zerank-2 | "fast" | 2,500,000 UTF-8 bytes | 100 |
| zerank-2 | "slow" | 25,000,000 UTF-8 bytes | 100 |
| zerank-1 | "fast" | 2,500,000 UTF-8 bytes | 100 |
| zerank-1 | "slow" | 25,000,000 UTF-8 bytes | 100 |
| zerank-1-small | "fast" | 2,500,000 UTF-8 bytes | 100 |
| zerank-1-small | "slow" | 25,000,000 UTF-8 bytes | 100 |
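Because the throughput limits are counted in UTF-8 bytes rather than characters, non-ASCII text uses up the budget faster than its character count suggests. The sketch below measures texts in UTF-8 bytes and greedily groups them into windows that each fit the "fast" budget from the table. The helper names `utf8_len` and `pack_into_windows` are hypothetical, and the idea of sending one window per minute is an assumption; this section does not say how the service meters the limit, so treat it as a conservative client-side plan.

```python
from typing import Iterable, List

# Per-minute byte budget for the "fast" latency mode, taken from the table.
FAST_BYTES_PER_MINUTE = 2_500_000


def utf8_len(text: str) -> int:
    """Return the size of text in UTF-8 bytes."""
    return len(text.encode("utf-8"))


def pack_into_windows(
    texts: Iterable[str],
    byte_budget: int = FAST_BYTES_PER_MINUTE,
) -> List[List[str]]:
    """Greedily group texts, in order, so each group fits the byte budget."""
    windows: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        size = utf8_len(text)
        if size > byte_budget:
            raise ValueError("a single text exceeds the byte budget")
        # Start a new window when this text would overflow the current one.
        if current and used + size > byte_budget:
            windows.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        windows.append(current)
    return windows


if __name__ == "__main__":
    print(utf8_len("hello"), utf8_len("héllo"))
    print(pack_into_windows(["aaaa", "bbbb", "cc"], byte_budget=8))
```

The requests-per-minute column still applies on top of this, so each window would also need to be sent in no more than 100 requests.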
Pricing
Our pricing is simple and transparent.
| Model | Price per 1000 Tokens | Price per 1M Tokens |
|---|---|---|
| zembed-1 | $0.000050 | $0.050 |
| zerank-2 | $0.000025 | $0.025 |
| zerank-1 | $0.000025 | $0.025 |
| zerank-1-small | $0.000025 | $0.025 |
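As a quick worked example, the per-million-token prices in the table can be turned into a cost estimate. The sketch below uses `Decimal` to avoid floating-point rounding on small amounts. The function name `estimate_cost` is hypothetical, and the token count is supplied by the caller, since this section does not describe how tokens are counted.

```python
from decimal import Decimal

# Prices in USD per 1M tokens, copied from the pricing table.
PRICE_PER_MILLION_TOKENS = {
    "zembed-1": Decimal("0.050"),
    "zerank-2": Decimal("0.025"),
    "zerank-1": Decimal("0.025"),
    "zerank-1-small": Decimal("0.025"),
}


def estimate_cost(model: str, tokens: int) -> Decimal:
    """Return the estimated cost in USD for processing `tokens` tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    # Raises KeyError for a model that is not in the table.
    price = PRICE_PER_MILLION_TOKENS[model]
    return price * Decimal(tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    print(f"${estimate_cost('zembed-1', 200_000_000):.2f}")
    print(f"${estimate_cost('zerank-2', 40_000_000):.2f}")
```

For example, embedding 200 million tokens with zembed-1 would come to $10.00 at the listed rate.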
Deployment Options
All of our models are open-weight and available through different deployment options. For help choosing the right option for your use case, reach out to our team.
- API and SDKs
- AWS Marketplace
- Azure Marketplace
- Self-Hosted
The API and SDKs are the fastest way to get started: fully managed infrastructure with no deployment overhead.
Your data is never used for model training.
MSA, DPA, and BAA available on request.
See our Trust Portal for SOC 2 Type II, Pentest, and other compliance documentation.
- SDKs: Python | Node
- Authentication: API key, created via the dashboard. All requests are authenticated over TLS. SAML SSO through Okta is available for enterprise customers.
- Regions available: US-East, US-West, Europe.
- Rate limits: See the rate limits table above.
- Latency: We benchmarked our models’ latency in this open-source repository.
- Status Page: Visit our Status Page to monitor uptime.