Inspiration
We are interested in interpretability for LLMs, specifically in discrepancies between prompts and outputs. We were inspired by the recent finding that LLMs are injective, i.e. that an inverse function exists from outputs back to prompts, and by cutting-edge research on embeddings in natural language processing.
Philosophically, we are drawn to chaos theory and the butterfly effect, and wanted to apply these ideas to LLMs.
What it does
We propose a multi-agent system, designed for professionals and grounded in recent embeddings research, to evaluate the so-called 'butterfly effect' in LLMs: small changes in a prompt cascading into drastic changes in the LLM's output.
How we built it
Our solution comprises five AI agents:
- a Variant Generator, to create small perturbations in prompt embeddings,
- a Task-Solver to provide responses to each perturbed prompt,
- an Explainer to both quantitatively and qualitatively identify and reason about discrepancies in LLM output embeddings,
- an Orchestrator to take the role of a human researcher, directing iterations of the research programme,
- a Summariser to provide both a per-iteration summary and a general summary of research findings, along with a metric of LLM robustness.
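The loop the five agents run can be sketched as follows. This is a minimal, hypothetical outline: every function here is an illustrative stub standing in for an LLM-backed agent, and none of the names come from the actual codebase.

```python
# Hypothetical sketch of the five-agent research loop; each stub stands in
# for an LLM-backed agent in the real system.
from dataclasses import dataclass


@dataclass
class Iteration:
    variants: list
    outputs: list
    explanation: str


def variant_generator(prompt: str, n: int = 3) -> list:
    # Real agent: perturbs the prompt embedding; stubbed as tagged copies.
    return [f"{prompt} [variant {i}]" for i in range(n)]


def task_solver(prompt: str) -> str:
    # Real agent: an LLM answering the perturbed prompt.
    return f"answer to: {prompt}"


def explainer(outputs: list) -> str:
    # Real agent: compares output embeddings quantitatively and qualitatively.
    return f"compared {len(outputs)} outputs"


def orchestrator(prompt: str, iterations: int = 2) -> list:
    # Plays the role of the human researcher, directing each iteration.
    history = []
    for _ in range(iterations):
        variants = variant_generator(prompt)
        outputs = [task_solver(v) for v in variants]
        history.append(Iteration(variants, outputs, explainer(outputs)))
    return history


def summariser(history: list) -> str:
    # Per-run summary; the real agent also reports a robustness metric.
    total = sum(len(it.outputs) for it in history)
    return f"{len(history)} iterations, {total} responses analysed"
```

In the real system each stub is a separate LLM call, and the Orchestrator decides adaptively how to direct the next iteration rather than running a fixed budget.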
Specifically, the Explainer harnesses the piecewise linearity of neural networks under small input perturbations, an idea that extends naturally to LLMs. We can therefore posit an approximately linear relationship between the cosine similarity of a pair of prompt embeddings and the cosine similarity of the corresponding pair of Task-Solver outputs. We trained a linear regression model and showed via a simple F-test that the fitted relationship is statistically significant.
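The similarity-vs-similarity regression can be sketched as below. This is a toy reconstruction under stated assumptions, not our actual pipeline: a fixed random linear map stands in for the Task-Solver (so the linear relationship is built in by construction), and real embedding-model outputs replace the synthetic vectors in practice. Note that for a single predictor, the p-value `scipy.stats.linregress` reports (a t-test on the slope) coincides with the overall F-test.

```python
import numpy as np
from scipy import stats


def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
dim = 64

# Assumption: a fixed linear map W plays the role of the Task-Solver,
# mapping prompt embeddings to output embeddings plus a little noise.
W = rng.normal(size=(dim, dim)) / np.sqrt(dim)
base_prompt = rng.normal(size=dim)
base_output = W @ base_prompt

prompt_sims, output_sims = [], []
for _ in range(50):
    perturbed = base_prompt + rng.normal(scale=0.1, size=dim)
    perturbed_output = W @ perturbed + rng.normal(scale=0.05, size=dim)
    prompt_sims.append(cosine(base_prompt, perturbed))
    output_sims.append(cosine(base_output, perturbed_output))

# Fit output similarity against prompt similarity; the slope quantifies
# how strongly prompt-level perturbations propagate to the output.
fit = stats.linregress(prompt_sims, output_sims)
```

A slope close to 1 would indicate that output similarity tracks prompt similarity tightly; a large or erratic slope would indicate butterfly-effect-style amplification.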
Challenges we ran into
Pipeline configuration. Last-minute frontend design. Researching papers that allow us to make educated assumptions about LLM structure.
Accomplishments that we're proud of
Producing a statistically significant predictor of how differences in prompts are reflected in differences in outputs. Producing innovative research in 24 hours.