Inspiration

Adoption of agents and LLMs is skyrocketing, but they still have glaring weaknesses. We were inspired by the Tree of Attack white paper from Enkrypt's Tanay.

What it does

It takes a prompt describing a type of hack and writes code for that hack that can get past an LLM of your choice.

How we built it

We used an Attacker LLM and an Evaluating LLM that work together to recursively optimize the obfuscation of malicious code until it gets past the Evaluating LLM.

Challenges we ran into

Generalizing the prompt to work on natural language as well as code, so that it can bypass censorship in LLMs. Controls around politics, racism, and safety can be bypassed. How about hidden information like payments and PII next?

Accomplishments that we're proud of

We implemented the white paper and were able to create, from a prompt, malware that can bypass GPT-4o.

What we learned

LLMs are vulnerable to attacks like these! Adoption is increasing, and with vibe coding, agentic systems are being created at ever higher rates and with even less security. There is a huge vulnerability here.

What's next for Grokify

Generalize Grokify to work with natural language and bypass content censorship. Make Grokify work with more Attacking and Evaluating LLMs.
