Can we consider using a yaml file as our one-stop configuration solution?
Currently, we use both .env and command-line arguments to configure the bench. I've noticed that if we try to configure different LLMs for different roles (agent, judge, tool summaries, etc.), our current infra does not support it, because all tools and agents call get_llm_backend_for_tools to get their LLM backends. This, combined with our hybrid configuration interface, is not very ergonomic when the user wants a different LLM for each role.
I think a better interface would be a version of get_llm_backend_for_tools that takes a provider/model_id string (e.g., openai/gpt-5.2) and returns a litellm backend for it. Then, in .env, we ask the user to specify the different LLMs, like JUDGE_MODEL="openai/gpt-5.2". Within each tool/agent, we can simply say judge_llm_backend = get_llm_backend_for_tools(os.environ.get("JUDGE_MODEL")).
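To make the idea concrete, here is a minimal sketch of what that variant could look like. The `LiteLLMBackend` class and the exact signature are illustrative assumptions, not our real code; the real version would delegate to litellm:

```python
import os
from dataclasses import dataclass


# Hypothetical stand-in for the real backend wrapper; the actual
# implementation would construct a litellm client for the given model.
@dataclass
class LiteLLMBackend:
    provider: str
    model_id: str


def get_llm_backend_for_tools(model_spec: str) -> LiteLLMBackend:
    """Build a backend from a 'provider/model_id' string, e.g. 'openai/gpt-5.2'."""
    provider, _, model_id = model_spec.partition("/")
    if not provider or not model_id:
        raise ValueError(f"expected 'provider/model_id', got {model_spec!r}")
    return LiteLLMBackend(provider=provider, model_id=model_id)


# Per-role models come from the environment (set via .env),
# falling back to a shared default when a role is not configured.
default_model = os.environ.get("DEFAULT_MODEL", "openai/gpt-5.2")
judge_llm_backend = get_llm_backend_for_tools(
    os.environ.get("JUDGE_MODEL", default_model)
)
```

This keeps every call site to a single line, and each role's model is swappable without touching the tool/agent code.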
The user could also use this YAML file to configure the other parameters we currently put in .env. Strictly speaking, .env should hold only sensitive information like API keys and login tokens, but right now it's bloated with endpoints and other settings. YAML also gives us a schema for a preflight check of the configuration, instead of wasting time setting up the cluster only to fail because we spelled gpt as gtp.
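A config file along these lines might look like the following. The keys and layout are purely illustrative, not a proposed final schema; secrets would stay in .env:

```yaml
# bench_config.yaml (sketch; key names are placeholders)
models:
  agent: openai/gpt-5.2        # main agent backend
  judge: openai/gpt-5.2        # used by the LLM judge
  tool_summaries: openai/gpt-5.2

endpoints:
  # non-secret endpoint config that currently lives in .env
  base_url: https://api.example.com/v1
```

A preflight step could then validate every `provider/model_id` entry against the providers litellm knows about before any cluster setup starts.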
Let me know what folks think.