AlphaApollo is an agentic reasoning framework that integrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a wide range of agentic reasoning paradigms, including tool-integrated reasoning, agentic post-training (multi-turn SFT and reinforcement learning), and agentic self-evolution. AlphaApollo incorporates multiple post-training algorithms such as PPO, GRPO, and DAPO, and provides dataset-backed agentic evaluation pipelines. AlphaApollo also offers flexible and extensible agentic environments and tool-set configurations, allowing users to easily customize, extend, and scale agentic reasoning workflows.
Key Features

Agentic Reasoning
Multi-turn agentic reasoning through an iterative cycle of model reasoning, tool execution, and environment feedback.
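The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not AlphaApollo's actual implementation: `scripted_model`, `run_python_tool`, the `<python>` tag format, and the `OBSERVATION:` prefix are all hypothetical names chosen for the example, and the "model" is a hard-coded stand-in for an LLM.

```python
import contextlib
import io
import re


def run_python_tool(code: str) -> str:
    """Hypothetical tool: run a snippet and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # illustration only; a real system would sandbox this
    return buffer.getvalue().strip()


def scripted_model(history: list[str]) -> str:
    """Stand-in for an LLM: request a tool call first, then answer."""
    observations = [m for m in history if m.startswith("OBSERVATION:")]
    if not observations:
        return "<python>print(9 / 3 * 60 + 24)</python>"
    return "ANSWER: " + observations[-1].removeprefix("OBSERVATION:").strip()


def agentic_loop(question: str, model, max_steps: int = 4):
    """Alternate model reasoning, tool execution, and environment feedback."""
    history = [f"QUESTION: {question}"]
    for _ in range(max_steps):
        reply = model(history)                      # model reasoning
        history.append(reply)
        match = re.search(r"<python>(.*?)</python>", reply, re.DOTALL)
        if match is None:                           # no tool call: final answer
            return reply, history
        feedback = run_python_tool(match.group(1))  # tool execution
        history.append("OBSERVATION: " + feedback)  # environment feedback
    return history[-1], history


final, trace = agentic_loop("How many minutes does the walk take?", scripted_model)
print(final)
```

The step budget plays the same role as a cap on the number of turns: the loop stops either when the model answers without calling a tool or when the budget runs out.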

Agentic Learning
Stable agentic learning via turn-level optimization that decouples model generations and environmental feedback.
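One common way to keep model generations and environment feedback apart during training is to mask feedback tokens out of the loss, so that only tokens the model produced contribute to the update. The sketch below shows that idea under stated assumptions; the source does not confirm that AlphaApollo's turn-level optimization works this way, and `Turn`, `build_loss_mask`, and `masked_mean` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str          # "model" for generated text, "env" for tool/environment feedback
    tokens: list[int]  # token ids for this turn


def build_loss_mask(turns: list[Turn]) -> tuple[list[int], list[int]]:
    """Flatten a trajectory and mark which tokens the model generated."""
    token_ids: list[int] = []
    mask: list[int] = []
    for turn in turns:
        token_ids.extend(turn.tokens)
        keep = 1 if turn.role == "model" else 0
        mask.extend([keep] * len(turn.tokens))
    return token_ids, mask


def masked_mean(per_token_loss: list[float], mask: list[int]) -> float:
    """Average the loss over model-generated tokens only."""
    kept = [loss for loss, m in zip(per_token_loss, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


trajectory = [
    Turn("model", [1, 2]),    # model writes a tool call
    Turn("env", [3, 4, 5]),   # tool output returned by the environment
    Turn("model", [6]),       # model writes the final answer
]
ids, mask = build_loss_mask(trajectory)
print(mask)
```

With the mask in place, large per-token losses on environment tokens cannot affect the gradient, which is one reason this kind of separation tends to stabilize multi-turn training.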

Agentic Evolution
Multi-round agentic evolution through a propose-judge-update evolutionary loop with long-term memory.
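A propose-judge-update loop with a persistent memory can be sketched as follows. This is a toy illustration with hypothetical names (`evolve`, `propose`, `judge`); in a real system the proposer and judge would be models, whereas here they are simple numeric functions so the control flow is easy to follow.

```python
def evolve(propose, judge, rounds: int = 5):
    """Run a propose-judge-update loop and return the best entry plus the memory."""
    memory = []  # long-term memory: (candidate, score) pairs kept across rounds
    for _ in range(rounds):
        candidate = propose(memory)         # propose: build on what memory holds
        score = judge(candidate)            # judge: score the new candidate
        memory.append((candidate, score))   # update: store it and re-rank
        memory.sort(key=lambda item: item[1], reverse=True)
    return memory[0], memory


def propose(memory):
    """Toy proposer: start at 0, otherwise step up from the best candidate so far."""
    return memory[0][0] + 1 if memory else 0


def judge(candidate):
    """Toy judge: higher is better, with the optimum at 3."""
    return -(candidate - 3) ** 2


best, memory = evolve(propose, judge)
print(best, len(memory))
```

Because the memory survives across rounds, a later proposal that scores worse than an earlier one does not displace the best entry.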

Demo
Every morning Aya goes for a 9-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of s kilometers per hour, the walk takes her 4 hours, including t minutes spent in the coffee shop. When she walks s + 2 kilometers per hour, the walk takes her 2 hours and 24 minutes, including t minutes spent in the coffee shop. Suppose Aya walks at s + 1/2 kilometers per hour. Find the number of minutes the walk takes her, including the t minutes spent in the coffee shop.

Jen enters a lottery by picking 4 distinct numbers from S = {1, 2, 3, ..., 9, 10}. 4 numbers are randomly chosen from S. She wins a prize if at least two of her numbers were 2 of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is m/n, where m and n are relatively prime positive integers. Find m + n.

Find the largest possible real part of (75 + 117i)z + (96 + 144i)/z, where z is a complex number with |z| = 4.

Could you write a short Python script using SymPy to solve the system of equations for s and t, and then calculate the final answer in minutes?

Could you write a Python script to simulate this lottery 100,000 times and verify if the empirical probability closely matches your analytical result?

Please write a Python program using scipy.optimize or a simple search over z to numerically find the maximum real part of this expression and verify your analytical maximum.
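As a rough idea of what such verification scripts could look like, here is a dependency-free sketch covering all three problems. It is not output from AlphaApollo: it uses only the Python standard library (the quadratic formula in place of SymPy, and a grid search over the circle |z| = 4 in place of scipy.optimize), and the function names are made up for the example.

```python
import cmath
import math
import random
from fractions import Fraction


def walk_minutes() -> float:
    """Walk problem: 9/s + t/60 = 4 and 9/(s + 2) + t/60 = 2.4 (hours)."""
    # Subtracting the equations gives 18 / (s(s + 2)) = 1.6, i.e. s^2 + 2s - 11.25 = 0.
    s = (-2 + math.sqrt(4 + 4 * 11.25)) / 2
    t = (4 - 9 / s) * 60                      # minutes spent in the coffee shop
    return 9 / (s + 0.5) * 60 + t


def lottery_exact() -> Fraction:
    """P(grand prize | prize), by counting draws that share k numbers with Jen's."""
    prize_draws = sum(math.comb(4, k) * math.comb(6, 4 - k) for k in (2, 3, 4))
    return Fraction(1, prize_draws)           # exactly one draw matches all four


def lottery_simulated(trials: int = 100_000, seed: int = 0) -> float:
    """Empirical estimate of the same conditional probability."""
    rng = random.Random(seed)
    mine = {1, 2, 3, 4}
    prizes = grands = 0
    for _ in range(trials):
        hits = len(mine.intersection(rng.sample(range(1, 11), 4)))
        prizes += hits >= 2
        grands += hits == 4
    return grands / prizes


def max_real_part(steps: int = 100_000) -> float:
    """Grid search over z = 4e^{i*theta} for the largest real part."""
    best = -math.inf
    for k in range(steps):
        z = cmath.rect(4, 2 * math.pi * k / steps)
        best = max(best, ((75 + 117j) * z + (96 + 144j) / z).real)
    return best


print(round(walk_minutes()), lottery_exact(), lottery_simulated(), max_real_part())
```

The exact values are 204 minutes, a conditional probability of 1/115 (so m + n = 116), and a maximum real part of 540. The simulated probability only approximates 1/115, since grand prizes are rare events even in 100,000 trials.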

Quick Start
Installation
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh

Demo Programs
# Method 1: workflow entrypoint
# no-tool reasoning
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=math-ai/aime24

# tool-integrated reasoning
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=math-ai/aime24 \
    --env.informal_math.enable_python_code=true \
    --env.informal_math.enable_local_rag=false \
    --env.max_steps=4

# Method 2: script entrypoint
bash examples/test/run_test_informal_math_no_tool.sh  # no-tool reasoning
bash examples/test/run_test_informal_math.sh          # tool-integrated reasoning
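
When launching several runs with different settings, it can be convenient to assemble the Method 1 command from a dictionary of overrides. The helper below is a hypothetical convenience wrapper, not part of AlphaApollo; it only builds the argument list using the module path and flags shown above, and leaves actually running the command to the caller.

```python
import shlex


def build_test_command(overrides: dict[str, object]) -> list[str]:
    """Turn {dotted.key: value} overrides into a workflow command line."""
    cmd = ["python3", "-m", "alphaapollo.workflows.test"]
    for key, value in overrides.items():
        if isinstance(value, bool):
            value = str(value).lower()  # the commands above spell booleans as true/false
        cmd.append(f"--{key}={value}")
    return cmd


tool_run = build_test_command({
    "model.path": "Qwen/Qwen2.5-3B-Instruct",
    "preprocess.data_source": "math-ai/aime24",
    "env.informal_math.enable_python_code": True,
    "env.informal_math.enable_local_rag": False,
    "env.max_steps": 4,
})
print(shlex.join(tool_run))
# To execute it from the repository root: subprocess.run(tool_run, check=True)
```

Passing the command as a list avoids shell quoting problems; `shlex.join` is used only to print a copy-pasteable version.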