Inference.net is a full-stack LLM lifecycle management platform for optimizing the models and prompts that power your AI applications. There are four key components to the platform:
  1. Capture and observe production traffic. Use models from any provider.
  2. Create and run evals against production data to understand your application.
  3. Create datasets from observed traffic to train better, task-specific models.
  4. Deploy models behind a stable production endpoint to use in your application.
Together, these four components form a feedback loop that helps you build better, more reliable AI applications on top of models you fully control, rather than relying on closed-source models.

Quickstart

Capture & Observe Traffic

Route an existing OpenAI-compatible app through Inference.net and verify your first observed request.
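
For example, if your app already uses the OpenAI Python SDK, routing usually comes down to swapping the base URL and API key. The sketch below assumes Inference.net exposes an OpenAI-compatible endpoint; the base URL and model id are placeholders, not documented values, so check the model catalog and your dashboard for the real ones.

```python
from openai import OpenAI

# A minimal sketch, assuming an OpenAI-compatible Inference.net endpoint.
# The base URL and model id are placeholders, not documented values.
client = OpenAI(
    base_url="https://api.inference.net/v1",   # hypothetical endpoint
    api_key="YOUR_INFERENCE_API_KEY",          # your Inference.net key, not an OpenAI key
)

# The rest of your application code stays the same; requests now flow
# through Inference.net, where they show up as observed traffic.
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # hypothetical catalog model id
    messages=[{"role": "user", "content": "Hello from my existing app"}],
)
print(response.choices[0].message.content)
```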

API Quickstart

Make your first direct API call when you want to start from the hosted API instead of Observe.
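
If you would rather not pull in an SDK, a first call can be a single HTTP request. This is a sketch assuming a standard OpenAI-compatible `/v1/chat/completions` route; the URL and model id are placeholders.

```python
import os
import requests

# A minimal sketch of a first direct API call, assuming an OpenAI-compatible
# chat completions route. URL and model id are placeholders, not documented values.
resp = requests.post(
    "https://api.inference.net/v1/chat/completions",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",  # hypothetical model id
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```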

Search Models

Browse the model catalog before you pick an API or deployment path.

Meet with Us

Talk to our team if you want help designing your eval, training, or deployment workflow.

Where to start

| If this sounds like you… | Start here | What comes next |
| --- | --- | --- |
| "We already use OpenAI or Anthropic and want visibility first." | /start-here/observe-quickstart | Then create datasets from observed traffic |
| "We want to prototype directly against the API first." | /quickstart | Then choose realtime, background, or batch |
| "We need a release gate before changing models." | /guides/build-a-real-traffic-eval-baseline | Then use the same eval to decide whether to train or deploy |