Researching agents that see, think, and act. Mostly debugging agents that don't.

systems

  • MultiNet 2025
    Benchmarking vision-language-action models across robotic learning tasks.
    [project] [src]
  • Software Control 2025
    Agents that navigate and control arbitrary software environments.
    [src]