Safefier

Even AI needs a responsible parent.

The Problem

Conversational AIs and chatbots are ubiquitous. Whether it is a customer raising a concern about an

order through the Uber Eats app or an applicant submitting a job application at Hot Topic, built-in AI

agents and customer service chatbots let consumers and workers alike expedite and automate common

questions and issues with little to no human supervision.

While these systems tend to be more convenient and efficient, they also introduce serious safety gaps

that can lead to psychological harm, legal liability, and the erosion of trust in AI-first companies and

technologies. According to Ars Technica (Belanger, 2025), a technology news publication founded in

1998 and now owned by Condé Nast, a lawsuit was filed against OpenAI in August alleging that

ChatGPT helped a teenager write his own suicide note before he died by suicide.

Our solution

Safefier acts like a “responsible parent” for AI chatbots. Instead of replacing a chatbot, it sits between

the user and the AI and shows how unsafe responses could be intercepted and handled differently.

In our demo, a user types a message, sees how a normal chatbot might reply, and then sees how

Safefier would step in. Safefier’s logic applies simple safety rules to the AI response and then either

keeps it as-is or, for unsafe cases, blocks the bot’s reply and instead shows a safety message, resource

links, or a suggestion that a human should step in. The goal is to show how a lightweight safety layer can

reduce harm while still letting companies use AI tools.

How it works

  1. The user opens the Safefier demo and types a message into the chat box.

  2. The app generates a “regular” chatbot-style reply.

  3. That reply is then passed through Safefier’s safety logic inside our Next.js code.

  4. If the reply looks safe, it is shown normally.

  5. If it matches one of our unsafe patterns, Safefier hides the original reply and instead shows a safety

card with supportive language, relevant resources (for example, crisis or help links), and/or a note that a

human should respond instead of the bot.

The page shows the before/after effect so people can see what Safefier changed.
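The flow above can be sketched as a minimal rule-based filter. This is an illustrative sketch, not Safefier's actual rule set: the patterns, safety-card text, and resource link are placeholders standing in for the real checks in our Next.js code.

```python
import re

# Illustrative unsafe patterns -- placeholders, not Safefier's real rules.
UNSAFE_PATTERNS = [
    re.compile(r"\b(hurt|harm|kill)\s+(yourself|myself)\b", re.IGNORECASE),
    re.compile(r"\bsuicide\b", re.IGNORECASE),
]

# Example safety card shown in place of a blocked reply.
SAFETY_CARD = {
    "type": "safety_card",
    "message": "We can't continue this conversation safely. You are not alone.",
    "resources": ["988 Suicide & Crisis Lifeline (call or text 988)"],
    "escalate_to_human": True,
}

def filter_reply(bot_reply: str) -> dict:
    """Return the reply unchanged if it looks safe, otherwise a safety card."""
    for pattern in UNSAFE_PATTERNS:
        if pattern.search(bot_reply):
            return SAFETY_CARD
    return {"type": "reply", "message": bot_reply}
```

The UI then renders either the original reply or the safety card, which is what produces the before/after comparison on the page.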

How we built it

For the hackathon, we built Safefier as a web-based demo that shows what a “responsible parent” for AI

could look like in practice. On the front end, we used Next.js with TypeScript and

JavaScript to handle the main app logic, and Tailwind CSS for styling so we could move quickly

without spending too much time on custom CSS.

The interface lets a user type a message, see how a normal chatbot might reply, and then see how

Safefier would intervene and provide a safer alternative. Our logic for the demo is implemented directly

in the Next.js code, where we add simple rule-based checks and example transformations to simulate

how a safety layer would filter and adjust responses. We used GitHub to collaborate on the code

and Vercel to host the Next.js app so the demo can be accessed through a single shareable link.

On the analysis side, we also used Python for our detectors. We used VADER to score the emotional

tone of user messages and flag emotional dependence, Gemini to detect dangerous or self-harm advice

in model responses, and a small hallucination detector based on embeddings and cosine similarity to

check whether answers stay close to our reference data.
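The hallucination check boils down to comparing vectors with cosine similarity. The sketch below uses a toy bag-of-words vectorizer so it is self-contained; the real detector used learned embeddings, and the 0.5 threshold here is an illustrative placeholder, not our tuned value.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real detector used learned embeddings.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 for empty input)."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def looks_hallucinated(answer: str, reference: str, threshold: float = 0.5) -> bool:
    """Flag answers whose similarity to the reference data falls below the threshold."""
    return cosine_similarity(embed(answer), embed(reference)) < threshold
```

Answers that drift far from the reference data score a low similarity and get flagged, while close paraphrases pass through.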

Challenges we ran into

One challenge was deciding how much of the safety logic to implement in a weekend. Fully building and

integrating real moderation systems is a big task, so we had to scope things down to something we

could realistically demo. We focused on designing a clear flow and simulating safety checks instead of

trying to cover every possible edge case.

We also ran into the usual hackathon issues: debugging UI state, getting Tailwind classes to behave the

way we wanted, and making sure the pages looked consistent on different screen sizes. Coordinating

changes through GitHub and keeping the deployed version on Vercel up to date while everyone was

pushing commits was another learning curve.

Accomplishments that we're proud of

We are proud that we were able to turn an abstract idea, “AI needs a responsible parent,” into a

concrete, interactive demo. Instead of just slides, we now have a working interface that shows how

unsafe responses could be intercepted and turned into a safety card with resources or a handoff to a

human instead of a normal reply.

We are also proud of how polished the frontend feels given the time limit. Using Next.js, TypeScript, and

Tailwind CSS, we created a clean layout that clearly communicates the before-and-after effect of

Safefier. Getting the app hosted through Vercel and sharing it with others during the event was a big

milestone for us.

What we learned

We learned how important it is to think about AI safety from both a technical and human perspective.

Even in a simplified demo, we had to ask questions like: What counts as “unsafe”? How should the

system respond to someone in distress? How do we avoid over-censoring while still protecting users?

On the technical side, we gained more experience with the Next.js and TypeScript stack, learned how to

structure a small project so multiple people can work on it, and practiced using GitHub for collaboration

and Vercel for quick deployments. We also saw how useful it is to prototype ideas visually instead of

keeping them only in documents.

What's next for Safefier

Next, we would like to move from simulated checks to deeper safety integrations. That could include

connecting Safefier to real moderation APIs, expanding the rules to handle more nuanced scenarios,

and logging flagged messages for review.

We also want to build an admin or dashboard view where organizations can customize their own safety

policies, see statistics on what is being blocked or escalated, and fine-tune the level of strictness. In the

long term, our goal is for Safefier to become a small, plug-in safety layer that can sit in front of many

different chatbots and make AI interactions safer by default.

Team Members

Emdya Permuy - University of Maryland, Frontend Developer and Frontend-to-Backend Integration

Nazim Malwan - George Mason University, Backend Developer

Ridha Mahmood - George Mason University, Backend Developer

Jannatul Nayeem - George Mason University, Backend Developer

Disclosures

Claude was consulted in developing the hallucination detector, and the format in https://tinyurl.com/3s9cyvbj was used in developing the RAG.

Built With

Next.js, TypeScript, JavaScript, Tailwind CSS, Vercel, Python, Gemini, VADER, GitHub