Could you please evaluate your harness on Terminal Bench 2.0? It would be interesting to compare the results with Claude Code and OpenCode.