LLMonPy is pronounced "Lemon Pie"
Inspiration
https://arxiv.org/abs/2408.02666 -- paper on self-taught evaluators
What it does
Uses sample requests from https://rajpurkar.github.io/SQuAD-explorer/. For each request, it writes a validation checklist that an LLM can use to validate responses to that request.
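A minimal sketch of the checklist step. The prompt wording and the `call_llm` helper are hypothetical (the real system calls a model API); here the call is stubbed so the example runs offline:

```python
# Hypothetical prompt template -- not the exact wording LLMonPy uses.
CHECKLIST_PROMPT = (
    "Write a numbered checklist that an LLM judge can use to validate "
    "answers to this request:\n\n{request}"
)

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call.
    return "1. The answer quotes the passage.\n2. The answer is concise."

def make_checklist(request: str) -> list[str]:
    # One checklist item per non-empty line of the model's reply.
    raw = call_llm(CHECKLIST_PROMPT.format(request=request))
    return [line.strip() for line in raw.splitlines() if line.strip()]

checklist = make_checklist("Who wrote Hamlet?")
print(checklist)
```

A judge model can then score each candidate response against every item on the checklist.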
To generate the responses it uses a variation on mixture-of-agents (MOA). It runs a 2-layer MOA, then ranks the outputs of the final aggregation layer. It then runs the MOA again, using the best outputs from the previous round as examples for the generation layer.
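The two-round MOA loop can be sketched as follows. All function names and the length-based ranker are placeholders; a real run would call multiple models for generation and aggregation and rank via LLM contests:

```python
def generate(prompt: str, examples: list[str], n: int = 3) -> list[str]:
    # Stub generation layer: a real version would call n models,
    # passing `examples` as few-shot demonstrations.
    return [f"{prompt}|gen{i}|ex{len(examples)}" for i in range(n)]

def aggregate(drafts: list[str]) -> str:
    # Stub aggregation layer: a real version would ask a model to
    # synthesize the drafts into one improved answer.
    return " & ".join(drafts)

def rank_outputs(outputs: list[str]) -> list[str]:
    # Stub ranker; the real system ranks via one-on-one contests.
    return sorted(outputs, key=len, reverse=True)

def moa_round(prompt: str, examples: list[str], n_agg: int = 2) -> list[str]:
    # Layer 1: generation drafts; layer 2: n_agg aggregated answers.
    drafts = generate(prompt, examples)
    return [aggregate(drafts) for _ in range(n_agg)]

def run_two_rounds(prompt: str) -> str:
    round1 = rank_outputs(moa_round(prompt, examples=[]))
    # Round 2 reuses the best round-1 outputs as few-shot examples.
    round2 = rank_outputs(moa_round(prompt, examples=round1[:2]))
    return round2[0]
```

Seeding round 2 with round 1's winners is what lets the second pass improve on the first.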
It ranks the outputs in a series of one-on-one contests, where the best output is the one with the most victories. Each one-on-one contest also produces a (question, better answer, worse answer) triple that can be used for fine-tuning.
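A sketch of that tournament, assuming a round-robin over all pairs. The `judge` stub here picks the longer answer; the real system would ask an LLM judge, guided by the validation checklist:

```python
from collections import Counter
from itertools import combinations

def judge(question: str, a: str, b: str) -> str:
    # Stub judge: longer answer wins. A real judge would be an LLM
    # scoring both answers against the request's checklist.
    return a if len(a) >= len(b) else b

def tournament(question: str, answers: list[str]):
    wins: Counter[str] = Counter()
    rows = []  # (question, better answer, worse answer) triples
    for a, b in combinations(answers, 2):
        winner = judge(question, a, b)
        loser = b if winner == a else a
        wins[winner] += 1
        rows.append((question, winner, loser))
    # Best output = most one-on-one victories.
    best = max(answers, key=lambda ans: wins[ans])
    return best, rows
```

Every contest yields one preference row, so n candidates produce n*(n-1)/2 fine-tuning rows per request.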
My test run generated 1325 rows of fine-tuning data while building validators for 5 requests.