Inspiration

Back in middle school and high school, I was the kid everyone warned others about. I spent lunch breaks and weekends crafting spoofed emails, cloning login pages, and “phishing” friends just to see who’d bite. At first it was a game: a challenge to outsmart email filters and social-engineer my way into forgotten passwords or get people to click links. But over time I saw the real impact: friends locked out of accounts, and companies losing billions of dollars to data breaches and targeted blackmail.

That curiosity about deception turned into a drive for defense. This project is my redemption arc: taking the same tactics I once used and turning them into a learning experience that empowers teams to stay one step ahead of real attackers.

With attackers increasingly leveraging AI to craft hyper-personalized lures, organizations can no longer rely on generic, one-size-fits-all templates. While legacy providers like KnowBe4 and Huntress still deploy static, boilerplate campaigns, our platform raises the bar by automating dynamic, context-aware phishing simulations that mirror real-world threats. By harnessing AI for reconnaissance, email generation, and continuous optimization, we empower security teams to train, test, and fortify their defenses against the very tactics hackers use.

What It Does

Our app does the following:

  • Automated Reconnaissance
    Crawls public sources (e.g., LinkedIn, corporate websites, social media) through SERP API and Bright Data to collect target-specific context: names, roles, recent activities, and organizational structure.

  • Contextual Vector Search
    Embeds collected data in PineconeDB for fast, relevance-ranked retrieval, so each email’s pretext is grounded in real information about the target and the template stays relevant to that user.

  • Dynamic Email Generation
    Uses GPT-4 with several master prompts and dynamic placeholders (name, department, project details) to craft multi-paragraph phishing emails that mimic legitimate corporate communications. Users can also clone the format of an existing email.

  • Campaign Orchestration & Tracking
    Provides a Next.js dashboard (Supabase-authenticated) where admins can research specific users and craft phishing lures to test them.

How we built it

We built this app using a variety of APIs. To make sure target websites could be scraped despite anti-bot protections, we used the Bright Data API for sites with heavy bot detection, and the Jina API for easy HTML-to-Markdown conversion so our LLMs could process the content more easily.
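To illustrate what HTML-to-Markdown conversion does for an LLM pipeline, here is a minimal standard-library Python sketch. It is not the external API we relied on: it only handles headings, paragraphs, links, and list items, drops script and style contents, and the names `MarkdownConverter` and `html_to_markdown` are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class MarkdownConverter(HTMLParser):
    """Converts a small subset of HTML (headings, paragraphs, links, lists) to Markdown."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities arrive decoded
        self.parts: list[str] = []
        self.href: Optional[str] = None  # href of the <a> tag we are currently inside
        self.link_text: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("p", "ul", "ol"):
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and self.href is not None:
            text = "".join(self.link_text).strip()
            self.parts.append(f"[{text}]({self.href})")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth:
            return  # drop script and style contents
        text = re.sub(r"\s+", " ", data)  # collapse whitespace runs
        if self.href is not None:
            self.link_text.append(text)
        else:
            self.parts.append(text)


def html_to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    raw = "".join(converter.parts)
    lines = [re.sub(r" {2,}", " ", line).strip() for line in raw.splitlines()]
    out: list[str] = []
    for line in lines:
        if line or (out and out[-1]):  # keep at most one blank line in a row
            out.append(line)
    return "\n".join(out).strip()
```

Markdown keeps the document structure (headings, lists, link targets) while discarding markup noise, which leaves more of the model's context window for actual content.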

We then used LangChain to recursively split the scraped data into chunks, embedded them with OpenAI embeddings, and uploaded them to PineconeDB. Our retrieval chain combines BM25 keyword search with semantic search, a hybrid approach that finds the context needed for each phishing email.
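The hybrid step can be illustrated with a minimal Python sketch. It assumes the two ranked lists are merged with reciprocal rank fusion (RRF), a common fusion method chosen for this illustration, and it takes precomputed embedding vectors (toy 2-D vectors here) instead of calling OpenAI. All function names and the parameters `k1`, `b`, and `k=60` are hypothetical illustrative defaults.

```python
import math
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of the query against each document (the keyword half)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n or 1.0
    df = defaultdict(int)  # number of documents containing each term
    for doc in tokenized:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in tokenize(query):
            tf = doc.count(term)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (the semantic half)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def ranking(scores: list[float]) -> list[int]:
    # Document indices ordered from best to worst score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3, k=60):
    """Merge the BM25 and embedding rankings with reciprocal rank fusion."""
    lexical = ranking(bm25_scores(query, docs))
    semantic = ranking([cosine(query_vec, v) for v in doc_vecs])
    fused = defaultdict(float)
    for order in (lexical, semantic):
        for position, idx in enumerate(order):
            fused[idx] += 1.0 / (k + position + 1)
    best = sorted(fused, key=lambda i: fused[i], reverse=True)[:top_k]
    return [docs[i] for i in best]


if __name__ == "__main__":
    docs = [
        "Jane leads the payments team at Acme",
        "Acme quarterly earnings call",
        "Jane posted about a Rust conference",
    ]
    doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]  # toy stand-ins for real embeddings
    print(hybrid_search("Jane payments team", [1.0, 0.0], docs, doc_vecs, top_k=2))
    # → ['Jane leads the payments team at Acme', 'Jane posted about a Rust conference']
```

BM25 catches exact matches such as names and project titles that embeddings can blur, while the semantic scores catch paraphrases. Because RRF only uses rank positions, the two score scales never need to be normalized against each other.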

With the retrieved context, we dynamically generate phishing emails from a small library of high-quality, previously tested prompts. The generated email is then handed to the user, who can send it to company employees as part of a simulation.

Challenges we ran into

One of the biggest problems we ran into was scraping LinkedIn, which has heavy anti-bot measures. We initially tried using Puppeteer to automate information extraction, but LinkedIn blocked it, so we switched to Bright Data for proxy rotation to get past those measures.

Another big issue was the model overfitting to the context. Whenever we generated an email, the model tried to work all of the supplied context into it, which made the result sound robotic and clunky. To fix this, we leaned more heavily on prompt engineering and used vector search to pass along only the chunks most relevant to the email's pretext.
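A minimal sketch of that fix: rather than handing the model every retrieved chunk, keep only the few most similar ones and cap the total amount of text. The similarity scores are assumed to come from the vector search, and `select_context` and its thresholds (`min_score`, `max_chunks`, `max_chars`) are hypothetical values for illustration.

```python
def select_context(scored_chunks, min_score=0.75, max_chunks=3, max_chars=600):
    """Pick only the most relevant chunks for the prompt instead of all retrieved context.

    scored_chunks: (chunk_text, similarity) pairs, e.g. from a vector search.
    """
    selected: list[str] = []
    used_chars = 0
    for text, score in sorted(scored_chunks, key=lambda pair: pair[1], reverse=True):
        if score < min_score or len(selected) >= max_chunks:
            break  # remaining chunks are below the threshold, or we already have enough
        if used_chars + len(text) > max_chars:
            continue  # too long for the remaining budget; a shorter chunk may still fit
        selected.append(text)
        used_chars += len(text)
    return selected
```

Capping both the number of chunks and their total length gives the model a few strong details to build a pretext around, instead of a pile of facts it feels obliged to mention.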

Accomplishments that we're proud of

We wanted to take our product beyond a prototype and test what we had actually built. So we gathered public information on 20+ randomly chosen SpurHacks attendees from their personal websites, GitHub profiles, and similar sources, and sent them simulated phishing emails.

We ended up with a near-100% open rate and a 41% click rate on our landing page (a temporary Google Sites page explaining the project; for legal reasons, no data was harvested).

What we learned

We discovered first-hand that technical expertise alone is not a silver bullet against phishing. Our AI-driven simulations achieved high click-through rates even among some of the most technical users: fellow hackathon attendees.

We also learned how to handle large amounts of scraped data, using techniques such as vector databases to retrieve the relevant pieces efficiently.

What's next for Fischer

We have an ambitious vision for this app. First, we aim to automate domain discovery and purchasing by identifying domains that closely resemble a business's legitimate email domain, since lookalike domains are a critical factor in why phishing attacks succeed. Second, we plan to integrate company-specific information to tailor simulations and make them more relevant to each business. Third, we intend to build our own email infrastructure to track key engagement metrics such as email opens and link clicks. Lastly, we want to fine-tune language models on real-world phishing data and apply A/B testing to create a specialized phishing LLM that more accurately mirrors the tactics used by modern threat actors.
