Popular repositories

  1. skillsbench Public

    SkillsBench evaluates how well skills work and how effective agents are at using them.

    PDDL 863 225

  2. benchflow Public

    AI benchmark runtime framework for integrating and evaluating AI tasks using Docker-based benchmarks.

    Python 192 15

  3. pokemon-gym Public

    Python 92 8

  4. smolclaw Public

    High-resolution mock environments for testing and improving claw-like agents.

    Python 15 5

  5. jfkarena Public

    TypeScript 7

  6. llm-builds-linux Public

    Python 6 1

Repositories

Showing 10 of 15 repositories
