Inspiration
We kept hearing the same thing from small business owners: “I’m spending nights doing the books, chasing invoices, and following up on stuff I already talked about on calls.” The hardest part wasn’t the spreadsheets; it was the messy, human back-and-forth across phone calls, where commitments and exceptions get buried. We wanted an agent that lives in that reality: always listening, always updating the ground truth, and actually taking the next action. Modulate’s Velma Transcribe clicked for us because it’s built for real conversations (overlapping speakers, interruptions, noise), which is exactly what SMBs deal with every day. (Modulate)
What it does
SMB Autonomous FinanceOps is an always-on “Agentic OS” for the back office: it tracks invoices, reconciles payments, follows up with customers and vendors, and keeps a clean audit trail, without someone having to babysit it. It ingests real-time signals (incoming invoices, payment events, and call audio), turns them into structured tasks, and then executes those tasks automatically. When money needs to move, it uses delegated payments with tight constraints (amount caps, merchant scope, expiry) so it can act safely instead of just making suggestions. (Agentic Commerce Protocol)
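To make the three constraints mentioned above (amount caps, merchant scope, expiry) concrete, here is a minimal sketch of a guardrail check in plain Python. It is not the Agentic Commerce Protocol's actual schema or API: `PaymentDelegation`, `authorize`, the field names, and the merchant IDs are all hypothetical and only illustrate the idea that every payment is checked against an explicit grant before it runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PaymentDelegation:
    """Hypothetical guardrails attached to one delegated-payment grant."""
    max_amount_cents: int          # amount cap
    allowed_merchants: frozenset   # merchant scope
    expires_at: datetime           # expiry


def authorize(delegation, merchant, amount_cents, now):
    """Return (allowed, reason) so every decision can be written to the audit trail."""
    if now >= delegation.expires_at:
        return False, "delegation expired"
    if merchant not in delegation.allowed_merchants:
        return False, "merchant out of scope"
    if amount_cents <= 0 or amount_cents > delegation.max_amount_cents:
        return False, "amount outside cap"
    return True, "ok"


# Illustrative values only.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
grant = PaymentDelegation(
    max_amount_cents=50_000,
    allowed_merchants=frozenset({"acme-supplies"}),
    expires_at=now + timedelta(days=7),
)
print(authorize(grant, "acme-supplies", 12_500, now))
print(authorize(grant, "unknown-vendor", 12_500, now))
```

Returning a reason alongside the decision is a design choice for this sketch: a refused payment becomes a loggable exception rather than a silent failure.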
How we built it
We built the system around an event stream: everything that happens becomes an event (invoice received, payment failed, call transcribed, promise-to-pay detected), and every action the agent takes becomes an event too (follow-up sent, payment delegated, payment executed, exception opened). For calls, we used Velma Transcribe to convert messy audio into reliable transcripts and then extracted commitments (“I’ll pay Friday”), blockers (“need a revised invoice”), and escalation signals. (Modulate) For money movement, we wired in Agentic Commerce Protocol delegated payments so the agent can pay bills or reserve funds within guardrails instead of having open-ended access. (Agentic Commerce Protocol) We used Agent Lightning to continuously improve the agent’s decision policy (when to follow up, when to wait, when to escalate, what evidence to request) by learning from outcomes over time. (Microsoft) Finally, we embedded a Lightdash dashboard via JWT iframe embedding so the demo shows the agent’s actions and ROI live (collections velocity, exceptions cleared, and hours saved). (Lightdash)
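The event-stream idea described above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not our production code: the `Event` type, the handler functions, and the payload fields are hypothetical, and the event names are adapted from the examples in the paragraph. The point it shows is that inputs and agent actions share one append-only log.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str
    payload: dict = field(default_factory=dict)


def on_promise_to_pay(event):
    # A detected commitment becomes a scheduled follow-up (itself an event).
    return [Event("follow_up_scheduled",
                  {"invoice": event.payload["invoice"], "on": event.payload["date"]})]


def on_payment_failed(event):
    # A failed payment opens an exception for later resolution.
    return [Event("exception_opened", {"invoice": event.payload["invoice"]})]


HANDLERS = {
    "promise_to_pay_detected": on_promise_to_pay,
    "payment_failed": on_payment_failed,
}


def run(initial_events):
    """Process events in FIFO order; return the full log of inputs and actions."""
    log = []
    queue = deque(initial_events)
    while queue:
        event = queue.popleft()
        log.append(event)
        handler = HANDLERS.get(event.kind)
        if handler is not None:
            queue.extend(handler(event))
    return log


log = run([
    Event("promise_to_pay_detected", {"invoice": "INV-1", "date": "Friday"}),
    Event("payment_failed", {"invoice": "INV-2"}),
])
for entry in log:
    print(entry.kind, entry.payload)
```

Because actions are appended to the same log as inputs, the log doubles as the audit trail and as the raw history a learning loop could later replay.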
Challenges we ran into
The biggest challenge was making it truly autonomous without making it reckless. “Just let it pay bills” is a terrifying sentence, so we leaned hard on delegated payments and built the agent to operate inside explicit limits (and log everything). (Agentic Commerce Protocol) Another challenge was dealing with the reality of calls: people interrupt each other, change their minds mid-sentence, and talk around details instead of stating them cleanly, so we had to be careful about what we treat as a commitment versus a maybe. (Modulate) And honestly, fitting a self-improvement loop into hackathon time was tough. We kept it simple: define the outcomes we care about and train the agent to get better at hitting them, rather than trying to build a perfect “finance brain” overnight. (Microsoft)
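As a toy illustration of the commitment-versus-maybe distinction, the sketch below labels a transcript utterance with a keyword heuristic. It is deliberately simplistic and is not the extraction logic we shipped: the `classify` function, the phrase lists, and the three labels are assumptions made up for this example.

```python
import re

# Hypothetical phrase lists; a real system would need far more coverage.
COMMIT = re.compile(
    r"\b(i'll|i will|we'll|we will)\s+(?:\w+\s+){0,2}(pay|send|wire|settle)\b",
    re.IGNORECASE,
)
HEDGES = re.compile(
    r"\b(maybe|might|probably|hopefully|try to|if)\b",
    re.IGNORECASE,
)


def classify(utterance):
    """Return 'commitment', 'maybe', or 'none' for one transcript utterance."""
    text = utterance.replace("’", "'")  # normalize curly apostrophes
    if not COMMIT.search(text):
        return "none"
    return "maybe" if HEDGES.search(text) else "commitment"


for line in ["I'll pay Friday",
             "I'll probably pay Friday",
             "We will pay if the check clears",
             "Need a revised invoice"]:
    print(f"{line!r}: {classify(line)}")
```

Treating anything hedged as a "maybe" errs on the side of not acting on a weak signal, which fits the "autonomous but not reckless" goal.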
Accomplishments that we're proud of
- The system runs continuously, with no “run agent” button, because everything is driven by real-time events.
- We turned phone calls into structured, actionable finance work using voice transcription built for real conversations. (Modulate)
- We gave the agent a safe way to act on payments using constrained delegation instead of mock actions. (Agentic Commerce Protocol)
- We made the demo legible: the Lightdash “Mission Control” board shows what the agent did and why, in real time. (Lightdash)
What we learned
If the inputs aren’t event-driven and the actions aren’t real, you don’t have an autonomous agent; you have a chatbot with a button. We also learned that voice is the missing piece for SMB ops: the “truth” of what needs to happen next often lives in calls, and if you can reliably extract commitments and blockers, you unlock automation that rule engines never see. (Modulate) And the self-improvement part only works when you define rewards tied to outcomes (collect cash faster, reduce failed payments, cut down on “unnecessary pings”) and then actually let the system learn from its own history. (Microsoft)
A simple objective we used was:
$$ R = \alpha\,C - \beta\,L - \gamma\,U - \delta\,F $$
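The write-up does not spell out what each symbol stands for. One reading consistent with the outcomes listed earlier is: C for cash collected, L for collection latency (days to collect), U for unnecessary pings, and F for failed payments. Under that assumption, the objective can be computed as in the sketch below. The function name, argument names, and default weights are all hypothetical.

```python
def reward(cash_collected, days_to_collect, unnecessary_pings, failed_payments,
           alpha=1.0, beta=0.5, gamma=0.2, delta=2.0):
    """R = alpha*C - beta*L - gamma*U - delta*F, with assumed symbol meanings.

    The weights are placeholders; in practice they would be tuned so the
    four terms are on comparable scales.
    """
    return (alpha * cash_collected
            - beta * days_to_collect
            - gamma * unnecessary_pings
            - delta * failed_payments)


# Illustrative episode: 10 units collected, 4 days, 5 extra pings, 1 failed payment.
r = reward(cash_collected=10, days_to_collect=4, unnecessary_pings=5, failed_payments=1)
print(r)
```

Only the first term is rewarded; the other three are penalties, so a policy that collects quickly with few follow-ups and no failed payments scores highest.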
What's next for SMB Autonomous FinanceOps
Next, we want to expand from “housekeeping autopilot” to “exception killer”: automatically resolving the annoying edge cases (duplicate invoices, partial payments, mismatched remittance notes, vendor disputes) that eat hours. We also want to make the learning loop more personal per business, since different industries and customers respond to different follow-up strategies, while keeping the same hard safety boundaries on payment delegation. (Agentic Commerce Protocol) And we’ll keep doubling down on transparency: richer dashboards and audit trails so a business can trust the agent’s decisions, not just its results. (Lightdash)