Inspiration
As AI agents proliferate across the internet, it is increasingly important that they can interact with web applications reliably. Current approaches, such as scraping the DOM, dumping raw HTML into a prompt, or injecting entire codebases, are brittle, token-heavy, and non-deterministic. A single CSS class rename or div restructure can silently break an agent script overnight. We wanted to build a better contract between AI agents and the web: one that is structured, stable, and requires zero runtime DOM access.
What it does
Agent Native introduces the Agentic Object Model (AOM): the DOM for AI agents. Upload a ZIP of any React source code and our system produces a machine-readable capability manifest describing every interactive element (buttons, inputs, links, form submits) with stable IDs, handler names, permission levels, safety scores, and human-readable descriptions. High-risk actions (checkout, delete, transfer) are automatically flagged needs_review, so agents know to request human review before executing them autonomously.
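A manifest entry might look like the following sketch. The fields mirror the description above, but the exact schema and values here are illustrative:

```javascript
// Illustrative manifest entry for a high-risk checkout action.
// Field names follow the description above; the exact schema is a sketch.
const manifestEntry = {
  action_id: "checkout.place_order",   // stable ID agents script against
  type: "button",
  handler: "handlePlaceOrder",         // handler name found by static analysis
  permission: "authenticated",
  needs_review: true,                  // flagged: requires human review
  description: "Places the order and charges the saved payment method",
};

console.log(manifestEntry.needs_review); // true
```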
On top of the generator, we built an Agent Dashboard embedded in demo apps (Amazon, Instagram clones) that connects to a Groq-powered LLM (Llama 3.3 70B). The agent reads the live AOM registry, translates plain-English tasks ("add AirPods to cart") into deterministic AOM commands (window.AOM.execute('product.1.add_to_cart')), executes them, waits for the DOM to update, and loops until the task is complete — all without ever touching a CSS selector.
How we built it
Frontend: React 19 + Vite 7, React Router 7, Three.js for particle animations
Backend: Node.js + Express — accepts a ZIP upload via Multer, returns a ZIP of .aom.js wrapper files
Parser: Babel Parser + @babel/traverse — fully static AST analysis, no code execution required
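As a rough illustration of what the static pass extracts, here is a toy stand-in that pulls handler names out of JSX source without executing it. The real system walks a full Babel AST via @babel/parser and @babel/traverse; this regex version handles only the simplest `onClick={name}` pattern:

```javascript
// Toy stand-in for the Babel-based static pass: find onClick handler names
// in JSX source text without running it. The real parser uses a full AST and
// also handles spread props, arrow functions, HOCs, etc.
function findClickHandlers(jsxSource) {
  const handlers = [];
  const pattern = /onClick=\{(\w+)\}/g;
  let match;
  while ((match = pattern.exec(jsxSource)) !== null) {
    handlers.push(match[1]);
  }
  return handlers;
}

const source = `
  <button onClick={handleAddToCart}>Add to cart</button>
  <button onClick={handleBuyNow}>Buy now</button>
`;
console.log(findClickHandlers(source)); // [ 'handleAddToCart', 'handleBuyNow' ]
```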
AOM Wrappers: Zero-overhead AOMAction, AOMInput, AOMLink JSX components that register with a global AOMRegistry singleton (window.AOM) on mount and unregister on unmount, keeping the agent's view perfectly in sync with the live UI
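The registry's core can be sketched as a plain singleton. The `execute` method matches the usage shown above; the `register`/`unregister`/`list` names are assumptions:

```javascript
// Minimal sketch of the global AOMRegistry singleton. `execute` matches the
// window.AOM.execute(...) usage above; other method names are illustrative.
class AOMRegistry {
  constructor() {
    this.actions = new Map();
  }
  register(actionId, meta, handler) {
    this.actions.set(actionId, { ...meta, handler });
  }
  unregister(actionId) {
    this.actions.delete(actionId);
  }
  // What the agent reads: the current capability surface, no DOM access needed.
  list() {
    return [...this.actions.keys()];
  }
  execute(actionId, ...args) {
    const entry = this.actions.get(actionId);
    if (!entry) throw new Error(`Unknown AOM action: ${actionId}`);
    return entry.handler(...args);
  }
}

// In the browser this lives at window.AOM; globalThis works in both contexts.
globalThis.AOM = new AOMRegistry();

// A wrapper component registers on mount...
globalThis.AOM.register("product.1.add_to_cart", { type: "button" }, () => "added");
console.log(globalThis.AOM.execute("product.1.add_to_cart")); // added
// ...and unregisters on unmount, keeping the surface in sync with the live UI.
globalThis.AOM.unregister("product.1.add_to_cart");
```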
LLM Integration: Groq API (llama-3.3-70b-versatile) with a multi-step memory loop. The agent fetches current UI state, predicts the next action, executes it, observes the updated state, and re-prompts itself until task_completed: true.
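The observe-act loop can be sketched as below, with the Groq chat-completion call abstracted behind an injected `predictNextAction` function. That function name and its `{ task_completed, action_id }` return shape are assumptions for illustration, not the actual API:

```javascript
// Sketch of the multi-step agent loop. `predictNextAction` stands in for the
// Groq call; its name and { task_completed, action_id } return shape are
// illustrative.
async function runAgentLoop(task, registry, predictNextAction, maxSteps = 10) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    // Re-observe the *current* capability surface, never a stale snapshot.
    const state = registry.list();
    const decision = await predictNextAction({ task, state, history });
    if (decision.task_completed) return history;
    const result = registry.execute(decision.action_id);
    history.push({ action_id: decision.action_id, result });
  }
  throw new Error("Agent exceeded step budget without completing the task");
}
```

The key design point from above is inside the loop: state is re-fetched from the registry on every iteration, so the LLM always reasons over the UI as it exists after the previous action.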
Challenges we ran into
Making Babel AST traversal robust across the wide variety of real-world JSX patterns (spread props, conditionally-rendered handlers, HOCs, anonymous arrow functions)
Designing stable action_id hashing that survives file moves and minor refactors without breaking agent scripts
Building the multi-step agent loop with proper state synchronization: the LLM must re-observe the current AOM state after every action, not act on a stale snapshot
Keeping the AOM manifest strictly deterministic so diffs are meaningful in CI/CD pipelines
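Determinism largely comes down to a canonical serialization, for example recursively sorting keys before stringifying. A minimal sketch (our actual serializer may handle more cases):

```javascript
// Minimal sketch: canonical JSON with recursively sorted keys, so the same
// manifest always serializes byte-identically and CI/CD diffs stay meaningful.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((k) => [k, canonicalize(value[k])])
    );
  }
  return value;
}

// Two manifests with identical content but different key order
// serialize to the same string after canonicalization.
const manifestA = { b: 1, a: { d: 2, c: 3 } };
const manifestB = { a: { c: 3, d: 2 }, b: 1 };
console.log(
  JSON.stringify(canonicalize(manifestA)) === JSON.stringify(canonicalize(manifestB))
); // true
```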
Accomplishments that we're proud of
10x–100x prompt size reduction compared to feeding raw DOM HTML to an LLM
Complete DOM decoupling. Agents never need CSS selectors, XPath, or pixel coordinates
Resilience by design. Changes to styling, div nesting, or virtualized lists no longer break agent scripts as long as action IDs remain stable
The needs_review safety flag automatically catching financial and destructive operations across both demo apps
A fully working end-to-end demo where a natural-language instruction autonomously navigates a multi-page Amazon clone to checkout
What we learned
Static AST analysis is surprisingly powerful for extracting semantic intent from UI code. You can infer a lot about what an element does without ever running it
Determinism is the foundation of reliable agentic systems; the more you can remove probabilistic reasoning from the execution path, the more trustworthy agents become
Designing for agents requires thinking about capability surfaces, not just user flows. The same app can expose very different surfaces depending on how the AOM is authored
LLMs perform significantly better when given a strict, structured menu of possible actions rather than an open-ended HTML dump
What's next for Agent Native
CI/CD integration: publish the AOM manifest as a versioned artifact on every deploy so agents always have an up-to-date capability contract
Built With
- express.js
- groq
- javascript
- jsx
- parser
- react
- three.js
- vite