Autonomous agents are demos. Controlled agents are products.
AI agents can use computers now.
But in practice:
- they loop
- they click the wrong thing
- they get stuck on simple steps
- they're impossible to steer mid-task
Most frameworks either hide everything in a black box, or hand you raw tools with no structure.
Neither works in production.
Natural language controls the screen.
Python controls the flow.
Instead of one monolithic agent, Orbit breaks execution into independent steps:
Do · Read · Check · Navigate · Fill
Each step runs its own model, has its own budget, and returns typed output. All steps share context.
- Use a cheap model for simple clicks, a powerful one for complex reasoning
- Cap LLM calls per step , nothing runs forever
- Inject guidance mid-execution when the agent is struggling
- Extract structured data directly into Pydantic models
- Toggle
planner=Falsefor low-latency direct execution
This turns agents from demos into usable systems.
Most agents see pixels.
Orbit sees the UI.
It reads the OS accessibility tree , no screenshots, no DOM hacks. Works across desktop apps and browsers with lower token usage.
pip install orbit-cuafrom dotenv import load_dotenv
load_dotenv()
from orbit import Agent
import asyncio
async def main():
result = await Agent(
task="Open Chrome and go to Wikipedia",
llm="gemini-3-pro-preview",
verbose=True,
).run()
print(result.status)
asyncio.run(main())Set your API key , Orbit supports any model via LiteLLM:
export GEMINI_API_KEY="your-key" # or OPENAI_API_KEY / ANTHROPIC_API_KEYWhen you need precision, drop to the SDK:
from dotenv import load_dotenv
load_dotenv()
from orbit import Do, Read, Check, Navigate, session
from pydantic import BaseModel
import asyncio
class Product(BaseModel):
name: str
price: float
in_stock: bool
class ProductList(BaseModel):
products: list[Product]
async def main():
action_model = "gemini-3-flash-preview"
async with session() as s:
await Navigate(
"https://www.amazon.com/s?k=mechanical+keyboard",
session=s, llm=action_model, max_steps=30, planner=False,
extra_info="Avoid bookmark bar links; use direct navigation tools first.",
verbose=True,
).run()
if await Check(
"The current page is a Captcha page and `Continue Shopping` button is visible",
session=s, llm=action_model, max_steps=30, planner=False,
).check():
await Do(
"Click `Continue Shopping`, then solve the Captcha.",
session=s, llm=action_model, max_steps=30,
).run()
products = await Read(
"All search results",
schema=ProductList,
session=s, llm=action_model, max_steps=30, verbose=True,
).run()
cheapest = min(products.output.products, key=lambda p: p.price)
await Do(f"click on '{cheapest.name}'", session=s, llm=action_model, max_steps=30).run()
if await Check("Add to Cart button is visible", session=s, llm=action_model, max_steps=30).check():
await Do("click Add to Cart", session=s, llm=action_model, max_steps=30).run()
asyncio.run(main())Agents shouldn't be one giant prompt.
They should be composable systems.
Orbit gives you:
- verbs instead of prompts
- steps instead of guesswork
- control instead of hope
Build reusable, domain-specific actions by subclassing BaseActionAgent:
from dotenv import load_dotenv
load_dotenv()
from orbit import BaseActionAgent, Navigate, session
from pydantic import BaseModel
import asyncio
class ProductList(BaseModel):
products: list[dict]
class ReadTopProducts(BaseActionAgent):
def __init__(self, category: str, **kw):
super().__init__(max_steps=12, planner=False, **kw)
self.category = category
def task_prompt(self) -> str:
return (
f"Read top products for '{self.category}' from the current page. "
"Extract name, price, and stock status only. Do not click or navigate."
)
def output_schema(self):
return ProductList
async def main():
async with session() as s:
await Navigate("https://www.amazon.com/s?k=mechanical+keyboard", session=s).run()
result = await ReadTopProducts(
category="mechanical keyboard",
session=s, llm="gemini-3-flash-preview", verbose=True,
).run()
print(result.output.products[:3])
asyncio.run(main())Build from source (requires Rust)
git clone --recurse-submodules https://github.com/aadya940/orbit.git
cd orbit
cd oculos && cargo build --release && cd ..
mkdir -p orbit/_bin
# Linux/macOS
cp oculos/target/release/oculos orbit/_bin/oculos
# Windows
copy oculos\target\release\oculos.exe orbit\_bin\oculos.exe
pip install .macOS users: grant accessibility permissions as described here.
| OS | Architectures |
|---|---|
| Windows | x86-64 (win_amd64) |
| Linux | x86-64 (manylinux) |
| macOS | Intel + Apple Silicon (universal2) |
| Python | 3.10 · 3.11 · 3.12 · 3.13 |
|---|
No permanent file deletion , destructive operations go to Trash/Recycle Bin. Disk writes require explicit human approval via a configurable callback.
Apache 2.0 , Special thanks to OculOS and the open-source packages that make this possible.
