AI-powered autonomous browser agent that generates step-by-step mini-tutorials by navigating websites and completing tasks.
Features • Quick Start • Usage • Configuration • Safety
- Autonomous Navigation - Browses websites, clicks, scrolls, and types autonomously
- Tutorial Generation - Creates structured mini-tutorials from completed tasks
- Session Recording - Records full video + screenshots of each step
- Safety System - Prompts for user confirmation on sensitive actions
- Configurable - Custom turn limits, safety rules, and headless mode
- Organized Output - Saves each tutorial in its own folder with all media
pip install -r requirements.txt
playwright install chromium

Get an API key from Vertex AI.
Then, create a .env file:
GOOGLE_AI_API_KEY=your_api_key_here
MODEL_NAME=gemini-3-flash-preview

python main.py "Go to ai.google.dev and explain how to check the docs for Gemma models"

python main.py "<your task>"

# Generate a tutorial about API setup
python main.py "How to set up API keys in Google Cloud Console"
# With more turns for complex tasks
python main.py "Create a complete guide for deploying to Vertex AI" --turns 15
# With custom safety instructions
python main.py "Search for documentation" --safety-instructions "Do not accept cookies"
# Short form for turns
python main.py "Find pricing" -t 10

| Argument | Alias | Type | Default | Description |
|---|---|---|---|---|
| prompt | - | str | required | Task description for the agent |
| --turns | -t | int | 5 | Maximum agent turns/actions |
| --safety-instructions | - | str | None | Additional safety rules |
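The arguments in the table map naturally onto `argparse`. The following is a minimal sketch under that assumption; `build_parser` is a hypothetical name, and the actual parsing code in main.py is not shown in this README.

```python
import argparse

DEFAULT_TURN_LIMIT = 5  # mirrors the default listed in the table above


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser reproducing the documented arguments."""
    parser = argparse.ArgumentParser(
        description="Generate a mini-tutorial with a browser agent."
    )
    # Positional, required: the task description.
    parser.add_argument("prompt", type=str, help="Task description for the agent")
    # Optional turn limit with the short alias -t.
    parser.add_argument(
        "-t", "--turns", type=int, default=DEFAULT_TURN_LIMIT,
        help="Maximum agent turns/actions",
    )
    # Optional free-text safety rules; argparse exposes it as safety_instructions.
    parser.add_argument(
        "--safety-instructions", type=str, default=None,
        help="Additional safety rules",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["Find pricing", "-t", "10"])
print(args.prompt, args.turns, args.safety_instructions)
```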
Each tutorial is saved in its own folder:

tutorials/
└── How_to_set_up_API_keys/
    ├── video.webm # Full session recording
    ├── screenshot_0.png # Initial state
    ├── screenshot_1.png # After turn 1
    ├── screenshot_2.png # After turn 2
    └── result.md # Generated tutorial with media
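As an illustration of the layout above, the sketch below lists the files one would expect after a run. The function name `expected_outputs` and its parameters are hypothetical; only the file names and the `tutorials/` root come from the tree.

```python
from pathlib import Path


def expected_outputs(task_folder: str, turns_taken: int, root: str = "tutorials") -> list[Path]:
    """Return the files a finished run should contain, per the tree above."""
    folder = Path(root) / task_folder
    files = [folder / "video.webm"]
    # screenshot_0.png is the initial state, then one screenshot per turn.
    files += [folder / f"screenshot_{i}.png" for i in range(turns_taken + 1)]
    files.append(folder / "result.md")
    return files


for path in expected_outputs("How_to_set_up_API_keys", turns_taken=2):
    print(path.as_posix())
```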
## Mini-Tutorial: How to set up API keys
### Steps:
1. Navigate to console.cloud.google.com
2. Select or create a project from the dropdown
3. Go to "APIs & Services" in the left menu
4. Click "Credentials" then "Create Credentials"
5. Select "API Key" and copy the generated key
### Notes:
- Restrict your API key to specific APIs
- Never commit keys to version control
---
## Session Recording
### Video

### Screenshots


...

| Variable | Description | Default |
|---|---|---|
| GOOGLE_AI_API_KEY | Your Vertex AI API key | - |
| MODEL_NAME | Gemini model to use | gemini-3-flash-preview |
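A minimal sketch of how these two variables might be read, assuming the `.env` values have already been loaded into the process environment (how the project loads them is not stated here). `load_settings` is a hypothetical helper, not a function from the repository.

```python
import os


def load_settings(env=None) -> dict:
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_AI_API_KEY")
    if not api_key:
        # The table lists no default for the key, so fail early with a clear message.
        raise RuntimeError("GOOGLE_AI_API_KEY is not set; add it to your .env file")
    return {
        "api_key": api_key,
        # Falls back to the default model named in the table.
        "model_name": env.get("MODEL_NAME", "gemini-3-flash-preview"),
    }


# Example with an explicit mapping instead of the real environment.
print(load_settings({"GOOGLE_AI_API_KEY": "your_api_key_here"})["model_name"])
```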
SCREEN_WIDTH = 1440 # Browser viewport width
SCREEN_HEIGHT = 900 # Browser viewport height
DEFAULT_TURN_LIMIT = 5 # Default max turns

Edit main.py to run without a visible browser:

browser_manager = BrowserManager(headless=True, task_name=user_prompt)

The agent will ask for confirmation before:
| Category | Examples |
|---|---|
| Legal | Terms of Service, Privacy Policies, EULAs |
| Verification | CAPTCHAs, anti-bot checks |
| Financial | Purchases, payments, transfers |
| Communication | Emails, messages, posts |
| Sensitive Data | Health, financial, government records |
| Account Access | Logins, saved passwords |
| Data Management | Downloads, file saves |
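The confirmation step could look roughly like the sketch below. It assumes each action arrives tagged with one of the categories from the table; `confirm_action` and the injectable `ask` callback are hypothetical names used for illustration, not the project's actual safety code.

```python
from typing import Callable

# Categories copied from the table above.
SENSITIVE_CATEGORIES = {
    "Legal", "Verification", "Financial", "Communication",
    "Sensitive Data", "Account Access", "Data Management",
}


def confirm_action(category: str, description: str,
                   ask: Callable[[str], str] = input) -> bool:
    """Return True if the action may proceed."""
    if category not in SENSITIVE_CATEGORIES:
        return True  # non-sensitive actions run without a prompt
    answer = ask(f"[{category}] {description} - allow? [y/N] ")
    # Anything other than an explicit yes counts as a denial.
    return answer.strip().lower() in {"y", "yes"}


# Simulated user who declines a purchase.
print(confirm_action("Financial", "Click 'Buy now'", ask=lambda _: "n"))
```

Passing `ask` as a parameter keeps the check testable without a terminal.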
Add via CLI:

python main.py "Task" --safety-instructions "Never click ads. Do not accept cookies."

Or edit gemini_client.py to add permanent rules.
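One plausible way to combine permanent rules with the ones passed on the command line is to append the CLI text to a base instruction string. This is only a sketch: `compose_system_instruction` is a hypothetical helper, and how gemini_client.py actually builds its instructions is not documented here.

```python
from typing import Optional


def compose_system_instruction(base: str, extra: Optional[str] = None) -> str:
    """Append user-supplied safety rules to the permanent ones, if any."""
    if not extra or not extra.strip():
        return base
    return f"{base}\n\nAdditional safety rules from the user:\n{extra.strip()}"


print(compose_system_instruction(
    "Ask for confirmation before sensitive actions.",
    "Never click ads. Do not accept cookies.",
))
```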
flowchart TB
subgraph User
U[User Prompt]
end
subgraph Agent["Agent Core (main.py)"]
AL[Agent Loop]
end
subgraph AI["Vertex AI API"]
GC[Google GenAI SDK]
CU[Computer Use Tool]
end
subgraph Browser["Browser Layer (browser.py)"]
BM[Browser Manager]
AE[Action Executor]
CH[Coordinate Helper]
end
subgraph Output["Output"]
VID[Video Recording]
SCR[Screenshots]
MD[Tutorial Markdown]
end
U --> AL
AL --> GC
GC --> CU
CU --> AL
AL --> AE
AE --> BM
BM --> VID
BM --> SCR
AL --> MD
style Agent fill:#e1f5ff
style AI fill:#fff4e1
style Browser fill:#e8f5e9
style Output fill:#f3e5f5
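The Coordinate Helper in the diagram presumably converts model coordinates into viewport pixels. The sketch below assumes the model reports positions on a normalized 0 to 999 grid, which is how the Gemini Computer Use tool is documented to behave; the function name and the exact rounding are assumptions, not code taken from browser.py.

```python
SCREEN_WIDTH = 1440   # viewport width from config.py
SCREEN_HEIGHT = 900   # viewport height from config.py
GRID = 1000           # assumed size of the model's normalized coordinate grid


def denormalize(x: int, y: int,
                width: int = SCREEN_WIDTH,
                height: int = SCREEN_HEIGHT) -> tuple[int, int]:
    """Map normalized model coordinates to pixel coordinates."""
    return int(x / GRID * width), int(y / GRID * height)


# The centre of the grid lands on the centre of the viewport.
print(denormalize(500, 500))
```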
Gemini/
├── config.py # Configuration constants
├── browser.py # Playwright browser automation
├── gemini_client.py # Google Gemini API client
├── main.py # Entry point & agent loop
├── requirements.txt # Python dependencies
├── .env # Environment variables
└── tutorials/ # Generated tutorials
1. Capture initial screenshot
2. Send screenshot + prompt to Vertex AI
3. Vertex AI responds with function_call(s)
4. Execute actions in browser (with safety checks)
5. Capture new screenshot + results
6. Send results back to Vertex AI
7. Repeat until:
   - Task complete (text response)
   - Turn limit reached
   - User denies safety-critical action
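The steps above can be sketched as a small loop. Everything here is illustrative: `run_agent`, the dictionary shapes, and the `model_step`, `execute`, and `confirm` callbacks are hypothetical stand-ins for the model client, the action executor, and the safety prompt, and none of them call a real API.

```python
from typing import Callable, Optional


def run_agent(prompt: str,
              model_step: Callable[[dict], dict],
              execute: Callable[[dict], str],
              confirm: Callable[[dict], bool],
              turn_limit: int = 5) -> Optional[str]:
    """Return the final tutorial text, or None if the run stops early."""
    # Steps 1-2: the first observation carries the prompt and initial screenshot.
    observation = {"prompt": prompt, "screenshot": "screenshot_0.png"}
    for turn in range(1, turn_limit + 1):
        reply = model_step(observation)          # step 3
        if "text" in reply:
            return reply["text"]                 # task complete
        results = []
        for action in reply["actions"]:          # step 4
            if action.get("sensitive") and not confirm(action):
                return None                      # user denied the action
            results.append(execute(action))
        # Steps 5-6: the new screenshot and results go back to the model.
        observation = {"results": results, "screenshot": f"screenshot_{turn}.png"}
    return None                                  # turn limit reached


# Scripted model: one click, then a final text answer.
replies = iter([{"actions": [{"name": "click"}]}, {"text": "## Mini-Tutorial"}])
print(run_agent("demo", lambda obs: next(replies), lambda a: "ok", lambda a: True))
```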
sequenceDiagram
participant U as User
participant M as main.py
participant G as GeminiClient
participant A as Vertex AI API
participant E as ActionExecutor
participant B as BrowserManager
participant P as Playwright
U->>M: Run task prompt
M->>B: Start browser
B->>P: Launch Chromium
B->>B: Take initial screenshot
M->>G: Send prompt + screenshot
G->>A: Generate content request
A-->>G: function_call(s)
G-->>M: Parsed actions
M->>E: Execute action
E->>P: Click/Type/Navigate
P-->>E: Action result
M->>B: Capture screenshot
B-->>M: Screenshot + URL
M->>G: Send function response
loop Until complete or limit
G->>A: Continue conversation
A-->>G: Next actions or text response
end
G-->>M: Final tutorial text
M->>B: Save result.md
B->>B: Close browser
MIT
Autonomous web browsing agent that generates step-by-step mini-tutorials by navigating websites and completing tasks.
Features • Quick Start • Usage • Configuration • Safety
- Autonomous Navigation - Browses websites, clicks, scrolls, and types automatically
- Tutorial Generation - Creates structured mini-tutorials from completed tasks
- Session Recording - Records full video + screenshots of each step
- Safety System - Asks for confirmation on sensitive actions
- Configurable - Turn limit, safety rules, headless mode
- Organized Output - Saves each tutorial in its own folder with all its content
pip install -r requirements.txt
playwright install chromium

Get an API key from Vertex AI.
Then, create a .env file:
GOOGLE_AI_API_KEY=your_api_key_here
MODEL_NAME=gemini-3-flash-preview

python main.py "Go to ai.google.dev and explain how to check the documentation for the Gemma models."

python main.py "<your task>"

# Generate a tutorial about API setup
python main.py "How to set up API keys in Google Cloud Console"
# With more turns for complex tasks
python main.py "Create a complete guide for deploying to Vertex AI" --turns 15
# With custom safety instructions
python main.py "Search for documentation" --safety-instructions "Do not accept cookies"
# Short form for turns
python main.py "Find pricing" -t 10

| Argument | Alias | Type | Default | Description |
|---|---|---|---|---|
| prompt | - | str | required | Task description for the agent |
| --turns | -t | int | 5 | Maximum turns/actions |
| --safety-instructions | - | str | None | Additional safety rules |
Each tutorial is saved in its own folder:

tutorials/
└── How_to_set_up_API_keys/
    ├── video.webm # Full session recording
    ├── screenshot_0.png # Initial state
    ├── screenshot_1.png # After turn 1
    ├── screenshot_2.png # After turn 2
    └── result.md # Generated tutorial with media
## Mini-Tutorial: How to set up API keys
### Steps:
1. Navigate to console.cloud.google.com
2. Select or create a project
3. Go to "APIs & Services" in the left menu
4. Click "Credentials" then "Create Credentials"
5. Select "API Key" and copy the generated key
### Notes:
- Restrict your API key to specific APIs
- Never commit keys to version control
---
## Session Recording
### Video

### Screenshots


...

| Variable | Description | Default |
|---|---|---|
| GOOGLE_AI_API_KEY | Your Vertex AI API key | - |
| MODEL_NAME | Gemini model to use | gemini-3-flash-preview |

SCREEN_WIDTH = 1440 # Viewport width
SCREEN_HEIGHT = 900 # Viewport height
DEFAULT_TURN_LIMIT = 5 # Default max turns

Edit main.py to run without a visible browser:

browser_manager = BrowserManager(headless=True, task_name=user_prompt)

The agent will ask for confirmation before:
| Category | Examples |
|---|---|
| Legal | Terms of Service, Privacy Policies, EULAs |
| Verification | CAPTCHAs, anti-bot checks |
| Financial | Purchases, payments, transfers |
| Communication | Emails, messages, posts |
| Sensitive Data | Health, financial, government records |
| Account Access | Logins, saved passwords |
| Data Management | Downloads, file saves |
Via CLI:

python main.py "Task" --safety-instructions "Never click ads. Do not accept cookies."

Or edit gemini_client.py to add permanent rules.
Gemini/
├── config.py # Configuration constants
├── browser.py # Browser automation (Playwright)
├── gemini_client.py # Vertex AI API client
├── main.py # Entry point and agent loop
├── requirements.txt # Python dependencies
├── .env # Environment variables (gitignored)
└── tutorials/ # Generated tutorials (gitignored)
1. Capture initial screenshot
2. Send screenshot + prompt to Vertex AI
3. Vertex AI responds with function_call(s)
4. Execute actions in browser (with safety checks)
5. Capture new screenshot + results
6. Send results to Vertex AI
7. Repeat until:
   - Task complete (text response)
   - Turn limit reached
   - User denies a critical action
MIT