AlphaApollo

A System for Deep Agentic Reasoning

AlphaApollo is an agentic reasoning framework that integrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a wide range of agentic reasoning paradigms, including tool-integrated reasoning, agentic post-training (multi-turn SFT and reinforcement learning), and agentic self-evolution. AlphaApollo incorporates multiple post-training algorithms such as PPO, GRPO, and DAPO, and provides dataset-backed agentic evaluation pipelines. AlphaApollo also offers flexible and extensible agentic environments and tool-set configurations, allowing users to easily customize, extend, and scale agentic reasoning workflows.

Key Features

Agentic Reasoning

Multi-turn agentic reasoning through an iterative cycle of model reasoning, tool execution, and environment feedback.
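
The cycle described above can be illustrated with a minimal, self-contained sketch. It is not AlphaApollo's implementation: `run_python`, `agentic_rollout`, and `scripted_policy` are hypothetical names, the "model" is a scripted stand-in, and the tool is a toy expression evaluator. Only the `max_steps` cap mirrors the `--env.max_steps` option shown under Demo Programs.

```python
def run_python(code: str) -> str:
    """Toy tool (hypothetical): evaluate one arithmetic expression and return its text."""
    try:
        return str(eval(code, {"__builtins__": {}}, {}))
    except Exception as exc:  # report failures to the model as feedback
        return f"error: {exc}"


def agentic_rollout(policy, question: str, max_steps: int = 4):
    """Alternate model reasoning, tool execution, and environment feedback."""
    history = [("question", question)]
    for _ in range(max_steps):
        kind, content = policy(history)                # model reasoning
        history.append((kind, content))
        if kind == "answer":
            return content, history
        observation = run_python(content)              # tool execution
        history.append(("observation", observation))   # environment feedback
    return None, history                               # step budget exhausted


def scripted_policy(history):
    """Stand-in for a language model: call the tool once, then answer with its output."""
    if history[-1][0] == "observation":
        return "answer", history[-1][1]
    return "tool_call", "9 * 60 // 3 + 24"


answer, history = agentic_rollout(scripted_policy, "How many minutes does the walk take?")
print(answer)  # 204
```

A real policy would be a language model that decides at each turn whether to call a tool or to answer; the loop structure stays the same.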

Agentic Learning

Stable agentic learning via turn-level optimization that decouples model generations from environmental feedback.
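
One common way to separate model generations from environmental feedback during training is a per-token loss mask that keeps model turns and drops tool or environment turns. The sketch below shows that general idea only. It is an assumption about what such decoupling could look like, not a description of AlphaApollo's optimizer, and `build_loss_mask` and `masked_mean` are hypothetical helpers that use whitespace splitting in place of a real tokenizer.

```python
def build_loss_mask(turns):
    """Flatten (role, text) turns into tokens plus a 0/1 mask over model tokens."""
    tokens, mask = [], []
    for role, text in turns:
        pieces = text.split()  # whitespace "tokenizer", for illustration only
        tokens.extend(pieces)
        mask.extend([1 if role == "model" else 0] * len(pieces))
    return tokens, mask


def masked_mean(values, mask):
    """Average per-token values over positions where the mask is 1."""
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


turns = [
    ("model", "compute 2+2"),
    ("environment", "4"),
    ("model", "answer 4"),
]
tokens, mask = build_loss_mask(turns)
per_token_loss = [1.0, 1.0, 10.0, 1.0, 1.0]  # made-up values for the example
print(mask)                               # [1, 1, 0, 1, 1]
print(masked_mean(per_token_loss, mask))  # 1.0
```

With the mask applied, the large loss on the environment token does not contribute to the average, so only text the model itself produced is optimized.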

Agentic Evolution

Multi-round agentic evolution through a propose-judge-update evolutionary loop with long-term memory.
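
A propose-judge-update loop with memory can be sketched in a few lines. This is a toy illustration under stated assumptions, not AlphaApollo's algorithm: `evolve`, `propose`, and `judge` are hypothetical names, the candidates are integers, and the judge is a fixed scoring function rather than a model.

```python
def evolve(propose, judge, rounds: int = 5):
    """Run propose -> judge -> update for a fixed number of rounds."""
    memory = []   # long-term memory: every (candidate, score) seen so far
    best = None
    for _ in range(rounds):
        candidate = propose(memory)            # propose, conditioned on memory
        score = judge(candidate)               # judge the proposal
        memory.append((candidate, score))      # update memory
        if best is None or score > best[1]:
            best = (candidate, score)          # update the incumbent
    return best, memory


def propose(memory):
    """Toy proposer: step one past the best candidate remembered so far."""
    if not memory:
        return 0
    return max(memory, key=lambda item: item[1])[0] + 1


def judge(candidate):
    """Toy judge: candidates closer to 3 score higher."""
    return -abs(candidate - 3)


best, memory = evolve(propose, judge, rounds=5)
print(best)  # (3, 0)
```

Because the proposer reads from memory, later rounds build on earlier ones instead of starting from scratch.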

Demo

Question 1

Every morning Aya goes for a 9-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop.

Question 2

Jen enters a lottery by picking 4 distinct numbers from $S=\{1, 2, 3, \dots, 9, 10\}$. 4 numbers are randomly chosen from $S$. She wins a prize if at least two of her numbers were among the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $m/n$, where $m$ and $n$ are relatively prime positive integers. Find $m+n$.

Question 3

Find the largest possible real part of $(75+117i)z+\frac{96+144i}{z}$ where $z$ is a complex number with $|z|=4$.

Followup 1

Could you write a short Python script using SymPy to solve the system of equations for $s$ and $t$, and then calculate the final answer in minutes?
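
For reference, a script of the kind this follow-up asks for might look like the sketch below. It assumes SymPy is installed and is written by hand, not captured from an AlphaApollo run. The two equations restate Question 1 with walking time in hours and $t$ converted from minutes.

```python
import sympy as sp

s, t = sp.symbols("s t")

# 9 km at s km/h plus t minutes in the shop takes 4 hours.
eq_slow = sp.Eq(9 / s + t / 60, 4)
# 9 km at (s + 2) km/h plus t minutes takes 2 h 24 min = 12/5 hours.
eq_fast = sp.Eq(9 / (s + 2) + t / 60, sp.Rational(12, 5))

solutions = sp.solve([eq_slow, eq_fast], [s, t], dict=True)
# Keep the physically meaningful root (positive walking speed).
sol = next(candidate for candidate in solutions if candidate[s] > 0)

# Total minutes when walking at s + 1/2 km/h, including the coffee stop.
total_minutes = 60 * 9 / (sol[s] + sp.Rational(1, 2)) + sol[t]
print(sol[s], sol[t], total_minutes)  # 5/2 24 204
```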

Followup 2

Could you write a Python script to simulate this lottery 100,000 times and verify if the empirical probability closely matches your analytical result?
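
A minimal sketch of such a check, using only the standard library, is given below. It is a hand-written illustration, not AlphaApollo output. The function names and the fixed seed are choices made for this example. The exact value is computed by counting outcomes with `math.comb`, and the simulation estimates the same conditional probability.

```python
import random
from fractions import Fraction
from math import comb


def exact_conditional_probability() -> Fraction:
    """P(grand prize | prize), counted over draws of 4 numbers out of 10."""
    # Draws sharing exactly k numbers with Jen's ticket: C(4, k) * C(6, 4 - k).
    prize_draws = sum(comb(4, k) * comb(6, 4 - k) for k in range(2, 5))
    grand_draws = comb(4, 4) * comb(6, 0)
    return Fraction(grand_draws, prize_draws)


def simulate(trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(grand prize | prize) by repeated random draws."""
    rng = random.Random(seed)
    numbers = range(1, 11)
    ticket = set(rng.sample(numbers, 4))  # Jen's fixed pick
    prizes = grand_prizes = 0
    for _ in range(trials):
        matches = len(ticket.intersection(rng.sample(numbers, 4)))
        if matches >= 2:
            prizes += 1
            if matches == 4:
                grand_prizes += 1
    return grand_prizes / prizes


exact = exact_conditional_probability()
empirical = simulate()
print(exact, exact.numerator + exact.denominator)  # 1/115 116
print(f"empirical estimate: {empirical:.4f}")
```

Only about 1 draw in 210 is a grand prize, so with 100,000 trials the estimate rests on a few hundred grand-prize events and will fluctuate by several percent around $1/115$.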

Followup 3

Please write a Python program using scipy.optimize or a simple search over $\theta \in [0, 2\pi]$ to numerically find the maximum real part of this expression and verify your analytical maximum.
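
The simple-search variant can be sketched with the standard library alone, as below. This is a hand-written illustration, not AlphaApollo output, and the function names are chosen for this example. Substituting $z = 4e^{i\theta}$ gives a real part of $324\cos\theta - 432\sin\theta$, whose maximum is $\sqrt{324^2 + 432^2} = 540$, so the grid search should land very close to that value.

```python
import cmath
import math


def real_part(theta: float) -> float:
    """Real part of (75+117i)z + (96+144i)/z at z = 4*exp(i*theta)."""
    z = 4 * cmath.exp(1j * theta)
    return ((75 + 117j) * z + (96 + 144j) / z).real


def grid_search_max(samples: int = 100_000):
    """Evaluate the expression on a uniform grid over [0, 2*pi) and keep the best point."""
    thetas = (2 * math.pi * k / samples for k in range(samples))
    best_theta = max(thetas, key=real_part)
    return best_theta, real_part(best_theta)


best_theta, best_value = grid_search_max()
analytic_max = math.hypot(324, 432)  # amplitude of 324*cos(theta) - 432*sin(theta)
print(f"numeric max ~ {best_value:.6f}, analytic max = {analytic_max}")
```

A finer grid or a call to a bounded scalar optimizer would tighten the numeric estimate, but the grid already agrees with the analytic maximum to well below $10^{-3}$.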

Quick Start

Installation

```bash
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo

git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo

bash installation.sh
```

Demo Programs

```bash
# Method 1: workflow entrypoint
# no-tool reasoning
python3 -m alphaapollo.workflows.test \
  --model.path=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.data_source=math-ai/aime24

# tool-integrated reasoning
python3 -m alphaapollo.workflows.test \
  --model.path=Qwen/Qwen2.5-3B-Instruct \
  --preprocess.data_source=math-ai/aime24 \
  --env.informal_math.enable_python_code=true \
  --env.informal_math.enable_local_rag=false \
  --env.max_steps=4

# Method 2: script entrypoint
bash examples/test/run_test_informal_math_no_tool.sh # no-tool reasoning
bash examples/test/run_test_informal_math.sh # tool-integrated reasoning
```