Skip to content

aadya940/orbit

Repository files navigation

Orbit logo

Autonomous agents are demos. Controlled agents are products.


Watch Orbit in action


The problem

AI agents can use computers now.

But in practice:

  • they loop
  • they click the wrong thing
  • they get stuck on simple steps
  • they're impossible to steer mid-task

Most frameworks either hide everything in a black box, or hand you raw tools with no structure.

Neither works in production.

Orbit

Natural language controls the screen.
Python controls the flow.

Instead of one monolithic agent, Orbit breaks execution into independent steps:

Do · Read · Check · Navigate · Fill

Each step runs its own model, has its own budget, and returns typed output. All steps share context.

Why this matters

  • Use a cheap model for simple clicks, a powerful one for complex reasoning
  • Cap LLM calls per step , nothing runs forever
  • Inject guidance mid-execution when the agent is struggling
  • Extract structured data directly into Pydantic models
  • Toggle planner=False for low-latency direct execution

This turns agents from demos into usable systems.

Key difference

Most agents see pixels.

Orbit sees the UI.

It reads the OS accessibility tree , no screenshots, no DOM hacks. Works across desktop apps and browsers with lower token usage.

Quickstart

pip install orbit-cua
from dotenv import load_dotenv
load_dotenv()

from orbit import Agent
import asyncio

async def main():
    result = await Agent(
        task="Open Chrome and go to Wikipedia",
        llm="gemini-3-pro-preview",
        verbose=True,
    ).run()
    print(result.status)

asyncio.run(main())

Set your API key , Orbit supports any model via LiteLLM:

export GEMINI_API_KEY="your-key"   # or OPENAI_API_KEY / ANTHROPIC_API_KEY

Composable SDK

When you need precision, drop to the SDK:

from dotenv import load_dotenv
load_dotenv()


from orbit import Do, Read, Check, Navigate, session
from pydantic import BaseModel
import asyncio

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

class ProductList(BaseModel):
    products: list[Product]

async def main():
    action_model = "gemini-3-flash-preview"

    async with session() as s:
        await Navigate(
            "https://www.amazon.com/s?k=mechanical+keyboard",
            session=s, llm=action_model, max_steps=30, planner=False,
            extra_info="Avoid bookmark bar links; use direct navigation tools first.",
            verbose=True,
        ).run()

        if await Check(
            "The current page is a Captcha page and `Continue Shopping` button is visible",
            session=s, llm=action_model, max_steps=30, planner=False,
        ).check():
            await Do(
                "Click `Continue Shopping`, then solve the Captcha.",
                session=s, llm=action_model, max_steps=30,
            ).run()

        products = await Read(
            "All search results",
            schema=ProductList,
            session=s, llm=action_model, max_steps=30, verbose=True,
        ).run()

        cheapest = min(products.output.products, key=lambda p: p.price)

        await Do(f"click on '{cheapest.name}'", session=s, llm=action_model, max_steps=30).run()

        if await Check("Add to Cart button is visible", session=s, llm=action_model, max_steps=30).check():
            await Do("click Add to Cart", session=s, llm=action_model, max_steps=30).run()

asyncio.run(main())

The idea

Agents shouldn't be one giant prompt.

They should be composable systems.

Orbit gives you:

  • verbs instead of prompts
  • steps instead of guesswork
  • control instead of hope

Custom actions

Build reusable, domain-specific actions by subclassing BaseActionAgent:

from dotenv import load_dotenv
load_dotenv()

from orbit import BaseActionAgent, Navigate, session
from pydantic import BaseModel
import asyncio

class ProductList(BaseModel):
    products: list[dict]

class ReadTopProducts(BaseActionAgent):
    def __init__(self, category: str, **kw):
        super().__init__(max_steps=12, planner=False, **kw)
        self.category = category

    def task_prompt(self) -> str:
        return (
            f"Read top products for '{self.category}' from the current page. "
            "Extract name, price, and stock status only. Do not click or navigate."
        )

    def output_schema(self):
        return ProductList

async def main():
    async with session() as s:
        await Navigate("https://www.amazon.com/s?k=mechanical+keyboard", session=s).run()
        result = await ReadTopProducts(
            category="mechanical keyboard",
            session=s, llm="gemini-3-flash-preview", verbose=True,
        ).run()
        print(result.output.products[:3])

asyncio.run(main())

Install from source

Build from source (requires Rust)
git clone --recurse-submodules https://github.com/aadya940/orbit.git
cd orbit

cd oculos && cargo build --release && cd ..
mkdir -p orbit/_bin

# Linux/macOS
cp oculos/target/release/oculos orbit/_bin/oculos

# Windows
copy oculos\target\release\oculos.exe orbit\_bin\oculos.exe

pip install .

macOS users: grant accessibility permissions as described here.

Support matrix

OS Architectures
Windows x86-64 (win_amd64)
Linux x86-64 (manylinux)
macOS Intel + Apple Silicon (universal2)
Python 3.10 · 3.11 · 3.12 · 3.13

Safety

No permanent file deletion , destructive operations go to Trash/Recycle Bin. Disk writes require explicit human approval via a configurable callback.

License

Apache 2.0 , Special thanks to OculOS and the open-source packages that make this possible.

Packages

 
 
 

Contributors

Languages