Inspiration
A/B testing can be repetitive and tedious. AB Agent, powered by Fireworks AI's function-calling model firefunction-v1 and the open-source mixtral-8x7b-instruct model, aims to automate A/B test design and inference.
Having worked as data scientists, we noticed that designing A/B tests and interpreting their results meant reaching for the same tools again and again, such as sample size calculators and t-test functions. We hypothesized that AI agents with access to these tools, in the form of Python functions, could automate this process. Thus, we built AB Agent!
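The "tools as Python functions" idea can be sketched as a registry plus a dispatcher: each function is described to the model with a JSON schema, and when the model replies with a tool call, the matching Python function is run with the model-supplied arguments. This is a minimal sketch, not AB Agent's actual code. It assumes an OpenAI-style `tools` / tool-call format, and the `relative_lift` tool, `TOOLS`, `TOOL_SPECS`, and `dispatch` names are all hypothetical.

```python
import json
from typing import Any, Callable


# Hypothetical example tool (not from AB Agent's code).
def relative_lift(control_mean: float, treatment_mean: float) -> float:
    """Relative change of treatment over control, e.g. 0.1 means +10%."""
    return (treatment_mean - control_mean) / control_mean


# Registry mapping the tool names exposed to the model onto Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"relative_lift": relative_lift}

# JSON-schema description sent to the model (OpenAI-style "tools" format, assumed).
TOOL_SPECS = [
    {
        "type": "function",
        "function": {
            "name": "relative_lift",
            "description": "Relative lift of the treatment mean over the control mean.",
            "parameters": {
                "type": "object",
                "properties": {
                    "control_mean": {"type": "number"},
                    "treatment_mean": {"type": "number"},
                },
                "required": ["control_mean", "treatment_mean"],
            },
        },
    }
]


def dispatch(tool_call: dict) -> Any:
    """Execute one model-issued tool call shaped like
    {"function": {"name": ..., "arguments": "<JSON string>"}}."""
    name = tool_call["function"]["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return TOOLS[name](**args)
```

For example, a tool call naming `relative_lift` with arguments `{"control_mean": 300, "treatment_mean": 330}` dispatches to `relative_lift(300, 330)` and returns `0.1`. Raising on unknown names keeps a hallucinated function name from failing silently.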
Now, non-technical folks from product teams can design and interpret A/B tests using just natural language.
What It Does
AB Agent automates two parts of the A/B testing workflow:
Design:
- Initially, we take a user's natural language instruction to set up an experiment. E.g., "Design an A/B test to test a UI change where the metric to increase is browsing time. We want the minimum effect to be 10% and we want to be 95% confident in the results."
- Then, we pass this through the mixtral-8x7b-instruct model to rephrase the query in more statistical terms, which helps the function-calling model.
- The rephrased query is then passed to firefunction-v1, which ideally calls the sample size calculator to determine the sample size and the other details needed to design the A/B test.
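A sample size calculator of the kind the design step calls can be sketched with the standard normal-approximation formula for comparing two means, $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 / \delta^2$ per group. This is a minimal sketch, not AB Agent's implementation. It assumes a continuous metric with equal variance in both arms, a two-sided test, and 80% power (the example query only fixes 95% confidence and a 10% minimum effect); `sample_size_per_group` and its parameters are hypothetical names.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_mean: float, baseline_std: float,
                          relative_mde: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Users needed in EACH arm to detect a relative lift of `relative_mde`
    in a continuous metric with a two-sided test (normal approximation).
    Assumes both arms share the standard deviation `baseline_std`."""
    delta = baseline_mean * relative_mde            # absolute minimum detectable effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_power) * baseline_std / delta) ** 2
    return ceil(n)                                  # round up to whole users
```

With a hypothetical baseline browsing time of 300 s and a standard deviation of 120 s, `sample_size_per_group(300, 120, 0.10)` returns 252 users per arm. Halving the minimum effect roughly quadruples the requirement, because $n$ scales with $1/\delta^2$.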
Inference:
- To interpret the results of an A/B test, the function-calling model calls a t-test calculator, along with other functions, and uses the results to make a go/no-go decision on whether to implement feature B.
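A go/no-go check of this kind can be sketched as Welch's t statistic on the two groups followed by a simple decision rule. This is a sketch under stated assumptions, not AB Agent's logic. To stay dependency-free it approximates the p-value with the standard normal distribution (reasonable for large samples, but not an exact t-test), and the rule "go only if the difference is significant and in the desired direction" is an assumed policy. `welch_test` and `go_no_go` are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test(control: list[float], treatment: list[float]) -> tuple[float, float]:
    """Welch's t statistic with a two-sided p-value from a normal
    approximation (large-sample assumption; not an exact t distribution)."""
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    if se == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(treatment) - mean(control)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def go_no_go(control: list[float], treatment: list[float], alpha: float = 0.05) -> dict:
    """Hypothetical rule: ship B only if it is significantly BETTER than A
    (assumes the metric is one we want to increase)."""
    t, p = welch_test(control, treatment)
    decision = "go" if p < alpha and t > 0 else "no-go"
    return {"t_stat": t, "p_value": p, "decision": decision}
```

For production use, an exact Welch test (for example SciPy's `ttest_ind` with `equal_var=False`) would replace the normal approximation; the decision rule would stay the same.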
Challenges We Ran Into
AI agents, especially those driven by open-source models, are highly unpredictable. Given more time, we would have dedicated more effort to prompt engineering to stabilize the outputs.
What's Next for AB Agent
Automate more A/B tests!
Slide deck: https://docs.google.com/presentation/d/1XjJ3ju01sKQEU14mc5DJGAM1JGTfrO4-MrNCbLl835s/edit?usp=sharing