Inference.net is a full-stack LLM lifecycle management platform for optimizing the models and prompts that power your AI applications. There are four key components to the platform:
  1. Capture and observe production traffic. Use models from any provider.
  2. Create and run evals against production data to understand your application.
  3. Create datasets from observed traffic to train better, task-specific models.
  4. Deploy models behind a stable production endpoint to use in your application.
Together, these four components form a feedback loop that helps you build better, more reliable AI applications on top of models you fully control, rather than relying on closed-source models.

Quickstart

Capture & Observe Traffic

Route an existing OpenAI-compatible app through Inference.net and verify your first observed request.
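
For example, if your app already uses the OpenAI Python SDK, routing usually comes down to swapping the base URL and API key. The sketch below assumes Inference.net exposes an OpenAI-compatible endpoint; the base URL and model id are placeholders, not documented values, so check the model catalog and your dashboard for the real ones.

```python
from openai import OpenAI

# A minimal sketch, assuming an OpenAI-compatible Inference.net endpoint.
# The base URL and model id are placeholders, not documented values.
client = OpenAI(
    base_url="https://api.inference.net/v1",   # hypothetical endpoint
    api_key="YOUR_INFERENCE_API_KEY",          # your Inference.net key, not an OpenAI key
)

# The rest of your application code stays the same; requests now flow
# through Inference.net, where they show up as observed traffic.
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # hypothetical catalog model id
    messages=[{"role": "user", "content": "Hello from my existing app"}],
)
print(response.choices[0].message.content)
```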

API Quickstart

Make your first direct API call when you want to start from the hosted API instead of Observe.
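
If you would rather not pull in an SDK, a first call can be a single HTTP request. This is a sketch assuming a standard OpenAI-compatible `/v1/chat/completions` route; the URL and model id are placeholders.

```python
import os
import requests

# A minimal sketch of a first direct API call, assuming an OpenAI-compatible
# chat completions route. URL and model id are placeholders, not documented values.
resp = requests.post(
    "https://api.inference.net/v1/chat/completions",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",  # hypothetical model id
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```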

Search Models

Browse the model catalog before you pick an API or deployment path.

Meet with Us

Talk to our team if you want help designing your eval, training, or deployment workflow.

Where to start

| If this sounds like you… | Start here | What comes next |
| --- | --- | --- |
| "We already use OpenAI or Anthropic and want visibility first." | /start-here/observe-quickstart | Then create datasets from observed traffic |
| "We want to prototype directly against the API first." | /quickstart | Then choose realtime, background, or batch |
| "We need a release gate before changing models." | /guides/build-a-real-traffic-eval-baseline | Then use the same eval to decide whether to train or deploy |