Create a draft Gmail to [email protected] and ask him if he is free for lunch this coming Saturday at 1 PM. Congratulate him on the baby; write one paragraph.
Can you open the browser at https://www.google.com/maps/ and answer the corresponding task: Find bus stops in Alanson, MI
Start a 3+2 game on Lichess.
Currently supports local models via Ollama (Llama 3.2-vision, qwen3.5:4b), Gemini, and GPT-4o. The code is highly experimental and will evolve in future commits. Use at your own risk.
The best result currently comes from using Gemini 3.1 Flash-Lite as both planner and finder.
- Quick Start
- Prerequisites
- Installation
- Usage
- Configuration
- Model Recommendations
- Examples
- Troubleshooting
- Contributing
- License
Ollama models: qwen3.5:4b works as a planner (slow, basic navigation only) but is not reliable as a finder (poor UI element detection). Use it with --image-quality=45 for better performance.
Clone the repository and navigate into the project directory:
git clone https://github.com/yourusername/clickclickclick
cd clickclickclick
It is recommended to create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install the dependencies:
pip install -r requirements.txt
Put your model-specific settings in config/models.yaml and export the keys specified in the YAML file.
Install the tool:
pip install <repo-whl>
./click3 run open google.com in browser
Before running any tasks, you need to configure your planner and finder models using the setup command:
python main.py setup
You will be prompted to choose the planner and finder models and provide any necessary API keys.
To execute a task, use the run command. The basic usage is:
pip install <repo-whl>
./click3 run <task-prompt>
- `--platform`: Specifies the platform to use, either `android` or `osx`. Default is `android`.
  python main.py run "example task" --platform=osx
- `--planner-model`: Specifies the planner model to use, either `openai`, `gemini`, or `ollama`. Default is `openai`.
  python main.py run "example task" --planner-model=gemini
- `--finder-model`: Specifies the finder model to use, either `openai`, `gemini`, or `ollama`. Default is `gemini`.
  python main.py run "example task" --finder-model=ollama
- `--image-quality`: Image quality percentage (1-100). Lower values reduce image size for faster processing. Default is `100`.
  python main.py run "example task" --image-quality=45
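The options above could be mirrored with the standard-library `argparse` module. This is only a sketch: the project's actual `main.py` may be built differently (for example with another CLI library), and `build_parser` and `image_quality` are hypothetical names. Only the flags, choices, and defaults are taken from the list above.

```python
import argparse


def image_quality(value: str) -> int:
    """Parse an image quality percentage and enforce the documented 1-100 range."""
    quality = int(value)
    if not 1 <= quality <= 100:
        raise argparse.ArgumentTypeError("image quality must be between 1 and 100")
    return quality


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the documented `setup` and `run` commands."""
    parser = argparse.ArgumentParser(prog="main.py")
    commands = parser.add_subparsers(dest="command", required=True)

    commands.add_parser("setup", help="choose planner and finder models")

    run = commands.add_parser("run", help="execute a task prompt")
    run.add_argument("task_prompt")
    run.add_argument("--platform", choices=["android", "osx"], default="android")
    run.add_argument("--planner-model", choices=["openai", "gemini", "ollama"],
                     default="openai")
    run.add_argument("--finder-model", choices=["openai", "gemini", "ollama"],
                     default="gemini")
    run.add_argument("--image-quality", type=image_quality, default=100)
    return parser


# Unspecified options fall back to the documented defaults.
args = build_parser().parse_args(
    ["run", "Open Reddit", "--planner-model=ollama", "--image-quality=45"]
)
```

With this layout, an unsupported value such as `--platform=ios` is rejected by `argparse` before any model is called.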
python main.py run "Open Google news" --platform=android --planner-model=openai --finder-model=gemini
With a local Ollama model and reduced image quality:
python main.py run "Open Reddit" --platform=android --planner-model=ollama --finder-model=gemini --image-quality=45
The /execute endpoint executes a task based on the provided task prompt, platform, planner model, and finder model. It accepts the following parameters:
- `task_prompt` (string): The prompt for the task that needs to be executed.
- `platform` (string, optional): The platform on which the task is to be executed. Default is "android". Supported platforms: "android", "osx".
- `planner_model` (string, optional): The planner model to be used for planning the task. Default is "openai". Supported models: "openai", "gemini", "ollama".
- `finder_model` (string, optional): The finder model to be used for finding elements to interact with. Default is "gemini". Supported models: "gemini", "openai", "ollama".
- `image_quality` (integer, optional): Image quality percentage (1-100). Lower values reduce processing time. Default is 100.
- 200 OK: `result` (object): The result of the task execution.
- 400 Bad Request: `detail` (string): Description of why the request is invalid (e.g., unsupported platform, unsupported planner model, unsupported finder model).
- 500 Internal Server Error: `detail` (string): Description of the error that occurred during task execution.
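The kind of check that would lead to a 400 Bad Request can be sketched as follows. This is an illustration under assumptions, not the project's code: `ExecuteRequest` and `validate` are hypothetical names, the real server may validate differently, and treating an out-of-range `image_quality` as invalid is an assumption. The supported values and defaults come from the parameter list above.

```python
from dataclasses import dataclass
from typing import List

SUPPORTED_PLATFORMS = {"android", "osx"}
SUPPORTED_MODELS = {"openai", "gemini", "ollama"}


@dataclass
class ExecuteRequest:
    """Hypothetical request model using the documented defaults."""
    task_prompt: str
    platform: str = "android"
    planner_model: str = "openai"
    finder_model: str = "gemini"
    image_quality: int = 100


def validate(req: ExecuteRequest) -> List[str]:
    """Return one message per problem; an empty list means the request is valid."""
    errors = []
    if req.platform not in SUPPORTED_PLATFORMS:
        errors.append(f"unsupported platform: {req.platform}")
    if req.planner_model not in SUPPORTED_MODELS:
        errors.append(f"unsupported planner model: {req.planner_model}")
    if req.finder_model not in SUPPORTED_MODELS:
        errors.append(f"unsupported finder model: {req.finder_model}")
    if not 1 <= req.image_quality <= 100:  # assumption: out of range is invalid
        errors.append(f"image_quality out of range: {req.image_quality}")
    return errors
```

A server could join the returned messages into the `detail` string of a 400 response and proceed to task execution only when the list is empty.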
curl -X POST "http://localhost:8000/execute" -H "Content-Type: application/json" -d '{
"task_prompt": "Open uber app",
"platform": "android",
"planner_model": "gemini",
"finder_model": "openai"
}'
{
"result": {
"status": "success",
"data": {
// actual task execution result
}
}
}
This project requires adb (Android Debug Bridge) to be installed on the local machine where the code runs.
Contributions are welcome! Please open an issue or submit a pull request.
The project is built from three components:
- Planner
- Finder
- Executor
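One possible way these three components could fit together is sketched below. This is a guess at the overall shape, not the project's implementation: the `Step` type, the callable signatures, and the stub functions are all hypothetical, and the real planner may work iteratively from screenshots rather than producing the full plan up front. The only roles taken from this README are that the planner plans the task and the finder locates elements to interact with.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """Hypothetical plan step: an action and a description of its target element."""
    action: str
    target: str


Planner = Callable[[str], List[Step]]              # task prompt -> ordered steps
Finder = Callable[[str], Tuple[int, int]]          # element description -> (x, y)
Executor = Callable[[str, Tuple[int, int]], str]   # action + coordinates -> log line


def run_task(task_prompt: str, planner: Planner, finder: Finder,
             executor: Executor) -> List[str]:
    """Plan the task, locate each target, and execute each action in order."""
    log = []
    for step in planner(task_prompt):
        coords = finder(step.target)
        log.append(executor(step.action, coords))
    return log


# Stub components standing in for real models and a real device.
def fake_planner(task_prompt: str) -> List[Step]:
    return [Step(action="tap", target="Safari icon")]


def fake_finder(target: str) -> Tuple[int, int]:
    return (100, 200)


def fake_executor(action: str, coords: Tuple[int, int]) -> str:
    return f"{action} at {coords}"


result = run_task("Open Safari", fake_planner, fake_finder, fake_executor)
```

Keeping the three roles behind plain callables lets a different model be swapped in for the planner or finder without touching the loop.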
pip install -r requirements.txt
uvicorn api:app --reload
curl -X POST "http://127.0.0.1:8000/execute" -H "Content-Type: application/json" -d '{ "task_prompt": "Open Safari" }'
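The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_execute_request` and `execute_task` are hypothetical helper names, and the base URL assumes the server was started locally with the `uvicorn` command above.

```python
import json
import urllib.request


def build_execute_request(task_prompt: str,
                          base_url: str = "http://127.0.0.1:8000",
                          **options) -> urllib.request.Request:
    """Build a POST request for /execute; extra options become JSON fields."""
    payload = {"task_prompt": task_prompt, **options}
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_task(task_prompt: str, **options) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_execute_request(task_prompt, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `execute_task("Open Safari", platform="osx")` would post the prompt with an explicit platform. A 400 or 500 response raises `urllib.error.HTTPError`.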
Pre-commit:
pre-commit install
pre-commit autoupdate
pre-commit run --all-files
This project is licensed under the MIT License. See the LICENSE file for details.
