Industry-Level Benchmark
SWE-BENCH MOBILE
Evaluating AI coding agents on real-world mobile development tasks from an industry-level iOS codebase.
50 Tasks · 449 Test Cases · 4 Agents · 9 Models
Leaderboard
Top performing agents
| Rank | Agent | Model | Resolved | Test Pass Rate |
|---|---|---|---|---|
| 1 | | Opus 4.5 | 12.0% | 28.1% |
| 2 | | Sonnet 4.5 | 12.0% | 26.7% |
| 3 | | GLM 4.6 | 12.0% | 19.6% |
| 4 | | Sonnet 4.5 | 10.0% | 28.1% |
| 5 | | GLM 4.6 | 10.0% | 26.7% |
Task Categories
50 industry-level mobile development tasks
| Category | Tasks | Avg Pass Rate |
|---|---|---|
| UI Components | 18 | 12.5% |
| Data Management | 10 | 15.3% |
| Gesture & Interaction | 8 | 8.0% |
| Media & Assets | 7 | 9.8% |
| Networking | 4 | 11.2% |
| Other | 3 | 10.5% |
Task details are private. Contact us for research collaboration.
Real-World PRDs
Tasks derived from actual product requirement documents used in mobile app development.
Automated Testing
Comprehensive test suites that validate functionality, not just syntax correctness.
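As a sketch of what validating functionality (rather than syntax) means in practice, the hypothetical XCTest below asserts runtime behavior specified in a PRD. The component, its overflow rule, and all names are invented for illustration; the real SWE-Bench Mobile suites are private.

```swift
import XCTest

// Hypothetical stand-in for a component an agent might implement from a PRD.
// The actual benchmark tasks and tests are private; this only shows the style
// of behavior-level assertion a functional suite relies on.
final class BadgeCounter {
    private(set) var count = 0
    func increment(by n: Int) { count += n }
    // PRD-style rule: counts above 99 render as "99+".
    var displayText: String { count > 99 ? "99+" : String(count) }
}

final class BadgeCounterTests: XCTestCase {
    func testBadgeClampsAtNinetyNinePlus() {
        let badge = BadgeCounter()
        badge.increment(by: 150)
        // Fails for any solution that compiles but ignores the overflow rule.
        XCTAssertEqual(badge.displayText, "99+")
    }
}
```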
Reproducible Results
A standardized evaluation pipeline ensures consistent and comparable results.
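For a concrete reading of how such a pipeline could aggregate scores, here is a minimal Swift sketch. It assumes, as our interpretation rather than a published schema, that the leaderboard's two percentage columns are the share of the 50 tasks fully resolved and the share of the 449 test cases passed; all type and function names are illustrative.

```swift
import Foundation

// Illustrative result record; field names are assumptions, not the benchmark's schema.
struct TaskResult {
    let taskID: String
    let testsPassed: Int
    let testsTotal: Int
    // Assumed definition: a task is resolved only if every one of its tests passes.
    var resolved: Bool { testsPassed == testsTotal }
}

// Aggregates per-task results into the two leaderboard-style metrics:
// resolve rate over tasks and pass rate over individual test cases.
func summarize(_ results: [TaskResult]) -> (resolveRate: Double, testPassRate: Double) {
    let resolvedCount = results.filter(\.resolved).count
    let passed = results.reduce(0) { $0 + $1.testsPassed }
    let total = results.reduce(0) { $0 + $1.testsTotal }
    return (Double(resolvedCount) / Double(results.count),
            Double(passed) / Double(total))
}
```

Under this reading, a 12.0% resolve rate corresponds to 6 of the 50 tasks fully solved, and a 28.1% test pass rate to roughly 126 of the 449 test cases passing.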
Interested in SWE-Bench Mobile?
Contact us for research collaboration or to discuss evaluating your AI coding agent.
We are currently preparing the repo for public release. Please follow our project updates.