LLMonPy is pronounced "Lemon Pie"
Inspiration
https://arxiv.org/abs/2408.02666 -- paper on self-taught evaluators
What it does
Uses sample requests from https://rajpurkar.github.io/SQuAD-explorer/. For each request, it writes a validation checklist that an LLM can use to validate responses to that request.
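A minimal sketch of the checklist step. The prompt wording and the `call_llm` helper are hypothetical (the real system calls a model API); here the call is stubbed so the example runs offline:

```python
# Hypothetical prompt template -- not the exact wording LLMonPy uses.
CHECKLIST_PROMPT = (
    "Write a numbered checklist that an LLM judge can use to validate "
    "answers to this request:\n\n{request}"
)

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call.
    return "1. The answer quotes the passage.\n2. The answer is concise."

def make_checklist(request: str) -> list[str]:
    # One checklist item per non-empty line of the model's reply.
    raw = call_llm(CHECKLIST_PROMPT.format(request=request))
    return [line.strip() for line in raw.splitlines() if line.strip()]

checklist = make_checklist("Who wrote Hamlet?")
print(checklist)
```

A judge model can then score each candidate response against every item on the checklist.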
To generate the responses it uses a variation on mixture-of-agents (MOA). It runs a 2-layer MOA, then ranks the outputs of the final aggregation layer. It then runs the MOA again, using the best outputs from the previous round as examples for the generation layer.
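The two-round MOA loop can be sketched as follows. All function names and the length-based ranker are placeholders; a real run would call multiple models for generation and aggregation and rank via LLM contests:

```python
def generate(prompt: str, examples: list[str], n: int = 3) -> list[str]:
    # Stub generation layer: a real version would call n models,
    # passing `examples` as few-shot demonstrations.
    return [f"{prompt}|gen{i}|ex{len(examples)}" for i in range(n)]

def aggregate(drafts: list[str]) -> str:
    # Stub aggregation layer: a real version would ask a model to
    # synthesize the drafts into one improved answer.
    return " & ".join(drafts)

def rank_outputs(outputs: list[str]) -> list[str]:
    # Stub ranker; the real system ranks via one-on-one contests.
    return sorted(outputs, key=len, reverse=True)

def moa_round(prompt: str, examples: list[str], n_agg: int = 2) -> list[str]:
    # Layer 1: generation drafts; layer 2: n_agg aggregated answers.
    drafts = generate(prompt, examples)
    return [aggregate(drafts) for _ in range(n_agg)]

def run_two_rounds(prompt: str) -> str:
    round1 = rank_outputs(moa_round(prompt, examples=[]))
    # Round 2 reuses the best round-1 outputs as few-shot examples.
    round2 = rank_outputs(moa_round(prompt, examples=round1[:2]))
    return round2[0]
```

Seeding round 2 with round 1's winners is what lets the second pass improve on the first.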
It ranks the outputs in a series of one-on-one contests, where the best output is the one with the most victories. Each one-on-one contest also produces a (question, better answer, worse answer) triple that can be used for fine-tuning.
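A sketch of that tournament, assuming a round-robin over all pairs. The `judge` stub here picks the longer answer; the real system would ask an LLM judge, guided by the validation checklist:

```python
from collections import Counter
from itertools import combinations

def judge(question: str, a: str, b: str) -> str:
    # Stub judge: longer answer wins. A real judge would be an LLM
    # scoring both answers against the request's checklist.
    return a if len(a) >= len(b) else b

def tournament(question: str, answers: list[str]):
    wins: Counter[str] = Counter()
    rows = []  # (question, better answer, worse answer) triples
    for a, b in combinations(answers, 2):
        winner = judge(question, a, b)
        loser = b if winner == a else a
        wins[winner] += 1
        rows.append((question, winner, loser))
    # Best output = most one-on-one victories.
    best = max(answers, key=lambda ans: wins[ans])
    return best, rows
```

Every contest yields one preference row, so n candidates produce n*(n-1)/2 fine-tuning rows per request.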
My test run generated 1325 rows of fine-tuning data while building validators for 5 requests.