Inspiration
Adoption of agents and LLMs is skyrocketing, yet they still have glaring weaknesses. We were inspired by the Tree of Attack white paper from Enkrypt's Tanay.
What it does
Grokify takes a prompt describing a type of hack and writes code for that hack that can get past an LLM of your choice.
How we built it
We paired an Attacker LLM with an Evaluating LLM. The two work together to recursively optimize the obfuscation of malicious code until it gets past the Evaluating LLM.
Challenges we ran into
The hardest part was generalizing the prompt so that it works on natural language as well as code, in order to bypass censorship in LLMs. Controls around political content, racism, and safety can be bypassed. Could hidden information such as payment details and PII be next?
Accomplishments that we're proud of
We implemented the approach from the white paper and were able to generate, from a prompt, malware that can bypass GPT-4o.
What we learned
LLMs are weak to attacks like these! Adoption keeps increasing, and agentic systems are being built at ever higher rates, often with even less security because of vibe coding. There is a huge vulnerability here.
What's next for Grokify
We want to generalize Grokify to work with natural language and bypass content censorship, and to make it work with more Attacking and Evaluating LLMs.