🏆 #1 on OpenAI MLE-Bench
AIBuildAI is an AI agent that automatically builds AI models. Given a task, it runs an agent loop that analyzes the problem, designs models, writes code to implement them, trains them, tunes hyperparameters, evaluates model performance, and iteratively improves the models. By automating the model development workflow, AIBuildAI reduces much of the manual effort required to build AI models.
On OpenAI MLE-Bench, AIBuildAI ranked #1, demonstrating strong performance on real-world AI model building tasks.
AIBuildAI requires a Linux x86_64 machine.
curl -L -O https://github.com/aibuildai/AI-Build-AI/releases/latest/download/aibuildai-linux-x86_64-v0.1.0.tar.gz
tar -xzf aibuildai-linux-x86_64-v0.1.0.tar.gz
cd aibuildai-linux-x86_64-v0.1.0
./install.sh
export ANTHROPIC_API_KEY=your-api-key
Example task: predict the enzyme class of a protein from its amino acid sequence (Yu et al., Science 2023).
git clone https://github.com/aibuildai/AI-Build-AI.git && cd AI-Build-AI
aibuildai --task-name protein-ec-prediction \
--data-dir data/protein-ec-prediction \
--playground-dir /path/to/playground \
--model claude-opus-4-6 \
--max-agent-calls 8 \
--run-budget-minutes 60 \
--num-candidates 3 \
--instruction "$(cat tasks/protein-ec-prediction.md)" \
--pipeline-budget-minutes 90 \
--no-form
AIBuildAI takes two key inputs: --data-dir, the path to the training data for the task, and --instruction, a natural-language description of the AI task to solve.
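For reference, the instruction passed via --instruction is just a plain-language task spec. The actual contents of tasks/protein-ec-prediction.md are not shown in this README, so the fields below are purely illustrative:

```markdown
# Task: Protein EC Class Prediction

Given a protein's amino acid sequence, predict its enzyme commission (EC) class.

- Training data: a CSV with `sequence` and `ec_class` columns (hypothetical schema)
- Test input: unlabeled sequences; write predictions to `submission.csv`
- Metric: classification accuracy on the held-out test set
```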
Important:
Run the command directly in your terminal. Do not wrap it in a .sh or .bash script; running it through a script may cause the TUI (Text User Interface) to crash.
After a run completes, the output directory typically looks like this (the exact structure may vary slightly by task):
├── candidate_1/ # Auto-generated training scripts and model checkpoints
├── candidate_2/
├── candidate_3/
├── checkpoint.pth # Best model checkpoint
├── inference.py # Standalone inference script for the final model
├── submission.csv # Test predictions (if test inputs are provided)
└── progress.pdf # Visual progress report
The main outputs of an AIBuildAI run are the model checkpoints and the script inference.py, which runs predictions with the final model on any data.
In the example protein-ec-prediction task, we provide unlabeled test data in the data folder, so AIBuildAI also generates a predicted-label file submission.csv. To evaluate the predictions against ground-truth labels:
python scripts/eval_protein_ec.py \
--labels data/labels/protein-ec-prediction.csv \
--submission /path/to/playground/code/protein-ec-prediction/timestamp/submission.csv
We provide additional task markdowns in the tasks/ folder. You can also write your own task markdown and point --data-dir to your own dataset.
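As a sketch of what such an evaluation computes: the script scripts/eval_protein_ec.py is not shown here, so the column names (`id`, `ec_class`) and the accuracy metric below are assumptions, not the script's actual implementation.

```python
import csv


def accuracy(labels_path: str, submission_path: str) -> float:
    """Fraction of test proteins whose predicted EC class matches the label.

    Assumes both CSVs have 'id' and 'ec_class' columns (hypothetical schema).
    """
    def load(path: str) -> dict:
        with open(path, newline="") as f:
            return {row["id"]: row["ec_class"] for row in csv.DictReader(f)}

    truth = load(labels_path)        # ground-truth labels
    pred = load(submission_path)     # model predictions
    # Count predictions that exactly match the labeled EC class.
    matched = sum(1 for pid, ec in truth.items() if pred.get(pid) == ec)
    return matched / len(truth)
```

A submission that matched one of two labels would score 0.5 under this metric.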
To see all available options, run:
aibuildai -h
Alternatively, you can use the interactive form interface by running AIBuildAI without --no-form:
aibuildai
This will launch a TUI (Text User Interface) where you can fill in the required parameters interactively.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
@misc{zhang2026aibuildai,
title={AIBuildAI: An AI agent that automatically builds AI models},
author={Ruiyi Zhang and Peijia Qin and Qi Cao and Li Zhang and Pengtao Xie},
year={2026}
}
