Inspiration

As AI agents proliferate across the web, it is increasingly important that they can interact with web applications reliably. Current approaches, such as scraping the DOM, dumping raw HTML into a prompt, or injecting entire codebases, are brittle, token-heavy, and non-deterministic. A single CSS class rename or div restructure can silently break an agent script overnight. We wanted to build a better contract between AI agents and the web: one that is structured, stable, and requires zero runtime DOM access.

What it does

Agent Native introduces the Agentic Object Model (AOM): the DOM for AI agents. Upload a ZIP of any React source code and our system produces a machine-readable capability manifest describing every interactive element (buttons, inputs, links, form submits) with stable IDs, handler names, permission levels, safety scores, and human-readable descriptions. High-risk actions (checkout, delete, transfer) are automatically flagged needs_review so agents know to request human approval before executing them autonomously.
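
To make the shape of the manifest concrete, here is a sketch of what two entries might look like. The field names mirror the concepts above (stable ID, handler name, permission, safety score, needs_review), but the exact schema shown is our illustration, not the project's verbatim output.

```javascript
// Hypothetical AOM manifest entries (field names illustrative)
const addToCartEntry = {
  action_id: "product.1.add_to_cart",  // stable ID that survives refactors
  type: "button",
  handler: "handleAddToCart",          // handler name recovered from the source
  description: "Add this product to the shopping cart",
  permission: "user",
  safety_score: 0.2,                   // low risk: easily undone
  needs_review: false
};

const checkoutEntry = {
  action_id: "cart.checkout.submit",
  type: "form_submit",
  handler: "handleCheckout",
  description: "Place the order and charge the saved payment method",
  permission: "user",
  safety_score: 0.9,                   // high risk: financial action
  needs_review: true                   // agent must pause for human approval
};
```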

On top of the generator, we built an Agent Dashboard embedded in demo apps (Amazon, Instagram clones) that connects to a Groq-powered LLM (Llama 3.3 70B). The agent reads the live AOM registry, translates plain-English tasks ("add AirPods to cart") into deterministic AOM commands (window.AOM.execute('product.1.add_to_cart')), executes them, waits for the DOM to update, and loops until the task is complete, all without ever touching a CSS selector.
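
The translation step can be pictured as the LLM emitting a short plan whose every step is a registered AOM action ID rather than a free-form browser action. The command names other than product.1.add_to_cart are hypothetical:

```javascript
// Hypothetical plan for "add AirPods to cart": every step is a registered
// AOM action ID, so execution is deterministic and replayable.
const plan = [
  { command: "search.input.set_value", args: ["AirPods"] },
  { command: "search.form.submit", args: [] },
  { command: "product.1.add_to_cart", args: [] }
];

// Executing a step then reduces to a single registry call, e.g.:
// window.AOM.execute(plan[2].command, ...plan[2].args);
const ids = plan.map((step) => step.command);
```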

How we built it

Frontend: React 19 + Vite 7, React Router 7, Three.js for particle animations

Backend: Node.js + Express — accepts a ZIP upload via Multer, returns a ZIP of .aom.js wrapper files

Parser: Babel Parser + @babel/traverse — fully static AST analysis, no code execution required

AOM Wrappers: Zero-overhead AOMAction, AOMInput, AOMLink JSX components that register with a global AOMRegistry singleton (window.AOM) on mount and unregister on unmount, keeping the agent's view perfectly in sync with the live UI
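
A minimal sketch of the registry contract in plain JavaScript, with the JSX wrapper components elided. The window.AOM name and execute() come from the writeup; the other method names and internals are our assumptions:

```javascript
// Minimal sketch of the AOMRegistry singleton (illustrative internals).
// Wrapper components would call register() on mount and unregister() on
// unmount, keeping this map in sync with the live UI.
const AOM = {
  actions: new Map(),

  register(actionId, meta, handler) {
    this.actions.set(actionId, { ...meta, handler });
  },

  unregister(actionId) {
    this.actions.delete(actionId);
  },

  // Snapshot of the current capability surface, as an agent would read it
  list() {
    return [...this.actions.keys()].sort();
  },

  execute(actionId, ...args) {
    const entry = this.actions.get(actionId);
    if (!entry) throw new Error(`Unknown action: ${actionId}`);
    return entry.handler(...args);
  }
};

// A mounted "Add to cart" button registers itself:
let cartCount = 0;
AOM.register(
  "product.1.add_to_cart",
  { type: "button", needs_review: false },
  () => ++cartCount
);
AOM.execute("product.1.add_to_cart"); // cartCount is now 1
```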

LLM Integration: Groq API (llama-3.3-70b-versatile) with a multi-step memory loop. The agent fetches current UI state, predicts the next action, executes it, observes the updated state, and re-prompts itself until task_completed: true.
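
The memory loop can be sketched as follows, with a scripted predictor standing in for the Groq call. The task_completed and action_id fields mirror the writeup; the rest of the structure is illustrative:

```javascript
// Sketch of the multi-step memory loop. A stub replaces the LLM call so the
// control flow is visible: observe fresh state, predict, execute, repeat.
function runAgentLoop(registry, predictNextAction, maxSteps = 10) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const state = registry.list();                 // re-observe the live UI
    const decision = predictNextAction(state, history);
    if (decision.task_completed) return { done: true, history };
    registry.execute(decision.action_id);          // deterministic execution
    history.push(decision.action_id);
  }
  return { done: false, history };                 // safety cap reached
}

// Tiny stand-in registry and a scripted "LLM" for demonstration
const executed = [];
const registry = {
  list: () => ["search.submit", "product.1.add_to_cart"],
  execute: (id) => executed.push(id)
};
const script = ["search.submit", "product.1.add_to_cart"];
const predict = (state, history) =>
  history.length < script.length
    ? { task_completed: false, action_id: script[history.length] }
    : { task_completed: true };

const result = runAgentLoop(registry, predict);
```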

Challenges we ran into

Making Babel AST traversal robust across the wide variety of real-world JSX patterns (spread props, conditionally-rendered handlers, HOCs, anonymous arrow functions)

Designing stable action_id hashing that survives file moves and minor refactors without breaking agent scripts

Building the multi-step agent loop with proper state synchronization: the LLM has to re-observe the current AOM state after every action, never a stale snapshot

Keeping the AOM manifest strictly deterministic so diffs are meaningful in CI/CD pipelines
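
Determinism here mostly comes down to canonical serialization. A sketch of the idea (our illustration; the project's actual serializer may differ): sort actions by ID and object keys alphabetically so identical source always yields byte-identical output, and CI diffs show only real capability changes.

```javascript
// Canonical manifest serialization: stable ordering of both the action list
// and each entry's keys makes the output byte-for-byte reproducible.
function serializeManifest(actions) {
  const sortKeys = (obj) =>
    Object.fromEntries(Object.entries(obj).sort(([a], [b]) => a.localeCompare(b)));
  const sorted = [...actions]
    .sort((a, b) => a.action_id.localeCompare(b.action_id))
    .map(sortKeys);
  return JSON.stringify(sorted, null, 2);
}

// Input order no longer matters:
const out1 = serializeManifest([
  { type: "button", action_id: "b" },
  { action_id: "a", type: "link" }
]);
const out2 = serializeManifest([
  { action_id: "a", type: "link" },
  { type: "button", action_id: "b" }
]);
// out1 === out2
```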

Accomplishments that we're proud of

10x–100x prompt size reduction compared to feeding raw DOM HTML to an LLM

Complete DOM decoupling. Agents never need CSS selectors, XPath, or pixel coordinates

Resilience by design. Changes to styling, div nesting, or virtualized lists no longer break agent scripts as long as action IDs remain stable

The needs_review safety flag automatically catching financial and destructive operations across both demo apps

A fully working end-to-end demo in which a single natural-language instruction autonomously drives a multi-page Amazon clone all the way through checkout

What we learned

Static AST analysis is surprisingly powerful for extracting semantic intent from UI code. You can infer a lot about what an element does without ever running it

Determinism is the foundation of reliable agentic systems; the more you can remove probabilistic reasoning from the execution path, the more trustworthy agents become

Designing for agents requires thinking about capability surfaces, not just user flows. The same app can expose very different surfaces depending on how the AOM is authored

LLMs perform significantly better when given a strict, structured menu of possible actions rather than an open-ended HTML dump

What's next for Agent Native

CI/CD integration: publish the AOM manifest as a versioned artifact on every deploy so agents always have an up-to-date capability contract
