Serverless AI is a Nebius AI Cloud service for running containerized AI workloads as interactive endpoints or non-interactive jobs. By deploying your workloads in Serverless AI, you can focus on the workloads themselves without worrying about the infrastructure: the service handles resource provisioning and lifecycle management, with usage-based, per-second billing. The service is available in all Nebius AI Cloud regions.

Overview

Read about how Serverless AI works and how to choose between endpoints and jobs

Getting started with jobs

Create your first job that runs nvidia-smi and prints information about the GPUs in use

Getting started with endpoints

Launch a simple endpoint and send authenticated requests to it

Monitoring

Track resource utilization to schedule quota increases and to quickly identify anomalies

Pricing and quotas

Learn what other services Serverless AI uses and how this affects pricing and quotas
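The endpoints card above mentions sending authenticated requests to a deployed endpoint. As a minimal hedged sketch of that pattern, the snippet below builds an HTTP request carrying a bearer token; the endpoint URL, token, and request body are placeholders, not real Nebius values, so consult the getting-started guide for the actual ones.

```python
import urllib.request

# Placeholder values: substitute the endpoint URL and token issued
# for your own Serverless AI deployment.
ENDPOINT_URL = "https://example.endpoint.invalid/v1/infer"
API_TOKEN = "<your-token>"

request = urllib.request.Request(
    ENDPOINT_URL,
    data=b'{"input": "hello"}',
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# The request object now carries the bearer token; sending it would be
# urllib.request.urlopen(request), omitted here because the URL is a placeholder.
print(request.get_header("Authorization"))
```

The same bearer-token header works from any HTTP client, such as curl with `-H "Authorization: Bearer <your-token>"`.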