GitHub - aadya940/orbit: Building Blocks to automate desktop workflows end-to-end using AI

Autonomous agents are demos. Controlled agents are products.

The problem

AI agents can use computers now.

But in practice:

they loop
they click the wrong thing
they get stuck on simple steps
they're impossible to steer mid-task

Most frameworks either hide everything in a black box, or hand you raw tools with no structure.

Neither works in production.

Orbit

Natural language controls the screen.
Python controls the flow.

Instead of one monolithic agent, Orbit breaks execution into independent steps:

Do · Read · Check · Navigate · Fill

Each step runs its own model, has its own budget, and returns typed output. All steps share context.

Why this matters

Use a cheap model for simple clicks, a powerful one for complex reasoning
Cap LLM calls per step , nothing runs forever
Inject guidance mid-execution when the agent is struggling
Extract structured data directly into Pydantic models
Toggle planner=False for low-latency direct execution

This turns agents from demos into usable systems.

Key difference

Most agents see pixels.

Orbit sees the UI.

It reads the OS accessibility tree , no screenshots, no DOM hacks. Works across desktop apps and browsers with lower token usage.

Quickstart

pip install orbit-cua

from dotenv import load_dotenv
load_dotenv()

from orbit import Agent
import asyncio

async def main():
    result = await Agent(
        task="Open Chrome and go to Wikipedia",
        llm="gemini-3-pro-preview",
        verbose=True,
    ).run()
    print(result.status)

asyncio.run(main())

Set your API key , Orbit supports any model via LiteLLM:

export GEMINI_API_KEY="your-key"   # or OPENAI_API_KEY / ANTHROPIC_API_KEY

Composable SDK

When you need precision, drop to the SDK:

from dotenv import load_dotenv
load_dotenv()


from orbit import Do, Read, Check, Navigate, session
from pydantic import BaseModel
import asyncio

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

class ProductList(BaseModel):
    products: list[Product]

async def main():
    action_model = "gemini-3-flash-preview"

    async with session() as s:
        await Navigate(
            "https://www.amazon.com/s?k=mechanical+keyboard",
            session=s, llm=action_model, max_steps=30, planner=False,
            extra_info="Avoid bookmark bar links; use direct navigation tools first.",
            verbose=True,
        ).run()

        if await Check(
            "The current page is a Captcha page and `Continue Shopping` button is visible",
            session=s, llm=action_model, max_steps=30, planner=False,
        ).check():
            await Do(
                "Click `Continue Shopping`, then solve the Captcha.",
                session=s, llm=action_model, max_steps=30,
            ).run()

        products = await Read(
            "All search results",
            schema=ProductList,
            session=s, llm=action_model, max_steps=30, verbose=True,
        ).run()

        cheapest = min(products.output.products, key=lambda p: p.price)

        await Do(f"click on '{cheapest.name}'", session=s, llm=action_model, max_steps=30).run()

        if await Check("Add to Cart button is visible", session=s, llm=action_model, max_steps=30).check():
            await Do("click Add to Cart", session=s, llm=action_model, max_steps=30).run()

asyncio.run(main())

The idea

Agents shouldn't be one giant prompt.

They should be composable systems.

Orbit gives you:

verbs instead of prompts
steps instead of guesswork
control instead of hope

Custom actions

Build reusable, domain-specific actions by subclassing BaseActionAgent:

from dotenv import load_dotenv
load_dotenv()

from orbit import BaseActionAgent, Navigate, session
from pydantic import BaseModel
import asyncio

class ProductList(BaseModel):
    products: list[dict]

class ReadTopProducts(BaseActionAgent):
    def __init__(self, category: str, **kw):
        super().__init__(max_steps=12, planner=False, **kw)
        self.category = category

    def task_prompt(self) -> str:
        return (
            f"Read top products for '{self.category}' from the current page. "
            "Extract name, price, and stock status only. Do not click or navigate."
        )

    def output_schema(self):
        return ProductList

async def main():
    async with session() as s:
        await Navigate("https://www.amazon.com/s?k=mechanical+keyboard", session=s).run()
        result = await ReadTopProducts(
            category="mechanical keyboard",
            session=s, llm="gemini-3-flash-preview", verbose=True,
        ).run()
        print(result.output.products[:3])

asyncio.run(main())

Install from source

Build from source (requires Rust)

git clone --recurse-submodules https://github.com/aadya940/orbit.git
cd orbit

cd oculos && cargo build --release && cd ..
mkdir -p orbit/_bin

# Linux/macOS
cp oculos/target/release/oculos orbit/_bin/oculos

# Windows
copy oculos\target\release\oculos.exe orbit\_bin\oculos.exe

pip install .

macOS users: grant accessibility permissions as described here.

Support matrix

OS	Architectures
Windows	x86-64 (`win_amd64`)
Linux	x86-64 (`manylinux`)
macOS	Intel + Apple Silicon (`universal2`)

Python	3.10 · 3.11 · 3.12 · 3.13

Safety

No permanent file deletion , destructive operations go to Trash/Recycle Bin. Disk writes require explicit human approval via a configurable callback.

License

Apache 2.0 , Special thanks to OculOS and the open-source packages that make this possible.

Name		Name	Last commit message	Last commit date
Latest commit History 129 Commits
.github/workflows		.github/workflows
oculos @ 5542eb4		oculos @ 5542eb4
orbit		orbit
.env		.env
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
demo_preview.svg		demo_preview.svg
example.py		example.py
logo.png		logo.png
orbit_demo_new.mp4		orbit_demo_new.mp4
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The problem

Orbit

Why this matters

Key difference

Quickstart

Composable SDK

The idea

Custom actions

Install from source

Support matrix

Safety

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

The problem

Orbit

Why this matters

Key difference

Quickstart

Composable SDK

The idea

Custom actions

Install from source

Support matrix

Safety

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages