Aiber: Adaptive Intelligent Boundary for Enterprise Response

Inspiration

As a team with a shared passion for security and privacy, we have watched the rise of enterprise LLMs with a mix of excitement and alarm. We’ve all had that moment of hesitation, finding ourselves asking, "Should I really be pasting this into a public AI?"

For individuals, it is a personal risk. For corporations, it is a catastrophic one. We’ve seen the headlines: consultants at firms like McKinsey have exposed 100 billion tokens of sensitive client data to public models, and federal and military officials have used public tools like ChatGPT for sensitive work. This isn't just a theoretical risk; the OpenAI chat history bug in March 2023 proved that once your data is on someone else's server, you’ve already lost control. Enterprises need a secure gateway: a private, trusted filter that inspects all data before it’s sent to a third-party model. We built Aiber to be that filter, running entirely within an enterprise’s own secure Vultr cloud.

What It Does

Aiber is a cloud-native microservice architecture that protects an enterprise at every level. Our system is designed to meet the Drive Capital Challenge by building the "guardrails for LLMs" that enterprises actually need: ensuring data privacy, improving auditability, and creating a human-in-the-loop (HITL) feedback system.

Our architecture is a two-tier system. The first tier, our Sentry Agent, is the front-door for all user queries. It is a high-performance API hosted on a Vultr GPU instance that inspects every prompt in a two-stage process. First, a DLP (Data Loss Prevention) module (implemented under the hood using Microsoft Presidio) instantly scans for PII. If no PII is found, the prompt is fed to our “Consensus Classifier,” where a BERT-based Small Language Model (SLM) and a fine-tuned Unsloth Llama SLM run in parallel to determine the final classification:

ACCEPT: Harmless. The prompt is passed to the Cloud Supervisor Agent.
BLOCK: Malicious (known injection attack), leaks sensitive company info, or other blatant violations such as spam. The prompt is blocked.
FLAG: Suspicious or novel. The prompt is "quarantined" and sent to our Cloud Supervisor Agent for investigation.

Now this leads us to our second tier, namely the Cloud Supervisor Agent & Retraining layer, which is a collection of Vultr-hosted APIs. FLAG prompts are sent to our Supervisor Agent, which uses a Gemini Judge to get a second opinion. If the prompt is cleared by the judge, it’s sent to the external LLM. That LLM's response is then passed to our Hallucination Module to detect potential falsehoods before it's then sent back to the user. Every FLAG or BLOCK decision is formatted as an "Incident Report" and sent to a MongoDB logging endpoint, populating a dashboard for a human security analyst. This closes the loop with our Adaptive Retraining API. When an analyst confirms a BLOCK verdict, they trigger this endpoint, which uses an Unsloth + LoRA script to perform instant fine-tuning on the Sentry Agent's SLMs. The newly trained adapters are saved, and Aiber adapts.

How We Built It

Our team’s diverse background (a PhD in Mechanistic Interpretability, an MS in Robotics, a BS/MS in ML, and an OMSCS student) allowed us to tackle this full-stack, distributed challenge. We are proud to run our entire application on Vultr, where we built a high-performance, GPU-accelerated microservice architecture. Each component, namely the Sentry Agent, Cloud Supervisor, Logger, and Retraining pipeline, is its own FastAPI endpoint deployed on Vultr GPU instances. Vultr’s GPU power is what makes our system feasible, allowing us to run our entire Sentry Agent (BERT + Unsloth Llama) with low latency. This GPU acceleration is most critical for our Adaptive Immunity Loop; we wrote and deployed a complete retraining script using Unsloth + LoRA as its own API. This endpoint can take a single (prompt, verdict) pair, load the "best" model, fine-tune it in seconds, and save the new adapters. To manage this system, we used MongoDB as our database for logging incidents and the Google Gemini API as the "expert judge" in our cloud agent’s toolbox.

Challenges We Ran Into

Our core challenge was deploying and orchestrating multiple GPU-accelerated models as independent APIs on Vultr. We had to build custom environments with a specific CUDA toolkit (cu124) and manage a complex web of dependencies to get our Unsloth and BERT models running harmoniously. This technical puzzle forced us to solve a key design problem, which was essentially how to stop sensitive data from reaching a third-party while still using powerful models. Our 2-tier, "Flag-to-Investigate" architecture, all hosted within a secure Vultr VPC, was the solution. This was a true team effort, as our diverse backgrounds from PhD to undergrad meant we had to get everyone up to speed on different parts of the stack, from Docker and Vultr deployment to advanced PEFT/LoRA fine-tuning.

Accomplishments That We’re Proud Of

Our biggest win was successfully deploying a fully GPU-accelerated microservice architecture on Vultr; our entire multi-model, multi-API application is deployed and running. We are also incredibly proud of our live "Adaptive Immunity" API. Our Unsloth retraining script isn't just a local file; it's a live API endpoint deployed on a Vultr GPU, proving that fast, single-prompt fine-tuning is a feasible feature for a real-world system. Finally, we successfully built a high-speed "Consensus" classifier that screens for a wide range of threats, from PII and company secrets to malicious injections. Overall, we’re incredibly proud of designing and deploying a robust, end-to-end system that addresses a real-world enterprise problem in a novel way.

What We Learned

We learned that the gap between a "cool idea" and a "working deployment" is a gauntlet of package conflicts, CUDA errors, and environment variables. We learned how to debug complex Docker/Vultr issues and the power of building a distributed system where each component is its own independent, testable API.

What's Next for Aiber

Our modules are deployed, and the next step is scaling. We plan to enhance the retraining/fine-tuning API to pull batches of new prompts from the MongoDB log, rather than just one.

Next, our Hallucination Module was actually just the first step in monitoring the external LLM’s response. We plan to expand this into a full “Egress Agent” API that also checks the response for potential data leaks or other compliance violations (e.g., the external LLM could be using RAG which could inadvertently make this a possibility).

Finally, we plan to implement our original plan to use Solana to create an immutable audit log for the Drive Capital challenge and expand the Hallucination Module to include more Chain-of-Thought (CoT) monitoring and perhaps other advanced methods based on literature in the field (https://doi.org/10.48550/arXiv.2405.19648).