Overview

Test and compare LLM configurations, prompts, and parameters before deploying to production.

About

Prototype is Future AGI’s pre-production testing environment for AI applications. When you change a prompt, switch models, or adjust how your AI behaves, you need a way to verify the change actually improves things before it reaches real users. Without a structured testing step, teams either ship blind or run informal tests that don’t reflect real usage, and discover problems only after users hit them.

Prototype solves this by letting you run multiple versions of your application side by side against real inputs. Each version is traced and scored automatically using evaluations you define: output quality, tone, safety, factual accuracy, or any custom criteria. Once you have results, the Prototype dashboard shows all versions compared by eval scores, cost, and latency. You use the Choose Winner flow to set how much each metric matters, let the platform rank the versions, and promote the best one to production.
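The Choose Winner flow described above amounts to a weighted ranking: each version has per-metric results (eval scores, cost, latency), and the weights you set decide how much each metric matters. A minimal sketch of that ranking logic, with illustrative names and data rather than the actual Future AGI SDK:

```python
def rank_versions(versions, weights):
    """Return (name, weighted_score) pairs sorted best-first.

    versions: {name: {metric: value}} -- higher is better for every
              metric here, so cost/latency would be pre-normalized
              (e.g. inverted) before ranking.
    weights:  {metric: importance}, normalized below to sum to 1.
    """
    total = sum(weights.values())
    norm = {m: w / total for m, w in weights.items()}
    scored = {
        name: sum(norm[m] * metrics[m] for m in norm)
        for name, metrics in versions.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical results for two prototype versions (not real platform output).
versions = {
    "v1-gpt4":   {"quality": 0.92, "safety": 0.98, "cost_eff": 0.40},
    "v2-claude": {"quality": 0.88, "safety": 0.99, "cost_eff": 0.75},
}
weights = {"quality": 0.5, "safety": 0.3, "cost_eff": 0.2}

winner, score = rank_versions(versions, weights)[0]
```

With these weights, the cheaper version wins despite a slightly lower quality score; shifting more weight onto quality would flip the ranking, which is exactly the trade-off the Choose Winner flow lets you express.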

How Prototype Connects to Other Features

  • Evaluation: Prototype uses the same eval templates as the rest of the platform. Scores from 70+ built-in metrics are calculated automatically per version. Learn more
  • Observability: Every prototype run is traced. After promoting a winner, traces continue in Observe so you can monitor production performance. Learn more
  • Optimization: Use prototype results to identify which prompt to optimize further. Learn more

Getting Started
