GitHub - instavm/clickclickclick: A framework to enable autonomous android and computer use using any LLM (local or remote)

ClickClickClick

A framework to enable autonomous Android and computer use with any LLM (local or remote)

Demos

create a draft gmail to [email protected] and ask him if he is free for lunch on coming saturday at 1PM. Congratulate on the baby - write one para.


Can you open the browser at https://www.google.com/maps/ and answer the corresponding task: Find bus stops in Alanson, MI


start a 3+2 game on lichess


Currently supports local models via Ollama (Llama 3.2-vision, qwen3.5:4b), Gemini, and GPT-4o. The current code is highly experimental and will evolve in future commits. Please use it at your own risk.

The best results currently come from using Gemini 3.1 Flash-Lite as both planner and finder.

Ollama models: qwen3.5:4b works as a planner (slow, basic navigation only) but is not reliable as a finder (poor UI element detection). Use it with --image-quality=45 for better performance.

How to install

Clone the repository and navigate into the project directory:

```
git clone https://github.com/instavm/clickclickclick
cd clickclickclick
```

It is recommended to create a virtual environment:

```
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the dependencies:

```
pip install -r requirements.txt
```

How to use

Put your model-specific settings in config/models.yaml, and export the API keys referenced in that YAML file as environment variables.

As a CLI tool

Install the tool from its wheel file, then run a task:

```
pip install <repo-whl>
./click3 run open google.com in browser
```

Setup

Before running any tasks, you need to configure your planner and finder models using the setup command:

```
python main.py setup
```

You will be prompted to choose the planner and finder models and provide any necessary API keys.

Running Tasks

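To execute a task, use the run command. With the tool installed as described above, the basic usage is:

```
./click3 run <task-prompt>
```

The examples below invoke the same run command from a source checkout via python main.py run.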

Options

--platform: Specifies the platform to use, either android or osx. Default is android.
```
python main.py run "example task" --platform=osx
```
--planner-model: Specifies the planner model to use: one of openai, gemini, or ollama. Default is openai.
```
python main.py run "example task" --planner-model=gemini
```
--finder-model: Specifies the finder model to use: one of openai, gemini, or ollama. Default is gemini.
```
python main.py run "example task" --finder-model=ollama
```
--image-quality: Image quality percentage (1-100). Lower values reduce image size for faster processing. Default is 100.
```
python main.py run "example task" --image-quality=45
```
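To see the options and their defaults in one place, they can be mirrored with Python's standard argparse module. This is a hypothetical sketch that reproduces only the documented flags, choices, and defaults; it is not the project's actual parser, and build_parser and image_quality are illustrative names.

```python
import argparse

# Hypothetical mirror of the documented `run` options; the project's own CLI may differ.
MODELS = ["openai", "gemini", "ollama"]


def image_quality(value: str) -> int:
    """Accept an integer percentage in the documented 1-100 range."""
    quality = int(value)
    if not 1 <= quality <= 100:
        raise argparse.ArgumentTypeError("image quality must be between 1 and 100")
    return quality


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="execute a task")
    run.add_argument("task_prompt", help="natural-language description of the task")
    run.add_argument("--platform", choices=["android", "osx"], default="android")
    run.add_argument("--planner-model", choices=MODELS, default="openai")
    run.add_argument("--finder-model", choices=MODELS, default="gemini")
    run.add_argument("--image-quality", type=image_quality, default=100)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["run", "Open Reddit", "--planner-model=ollama", "--image-quality=45"]
    )
    # Unspecified options fall back to the documented defaults.
    print(args.platform, args.planner_model, args.finder_model, args.image_quality)
```

Run as a script, this prints `android ollama gemini 45`: only the planner and image quality were overridden.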

Example

```
python main.py run "Open Google news" --platform=android --planner-model=openai --finder-model=gemini
```

With a local Ollama model as the planner and reduced image quality:

```
python main.py run "Open Reddit" --platform=android --planner-model=ollama --finder-model=gemini --image-quality=45
```

Use as an API

POST /execute

Description:

This endpoint executes a task based on the provided task prompt, platform, planner model, finder model, and image quality.

Request Body:

- task_prompt (string): The prompt for the task that needs to be executed.
- platform (string, optional): The platform on which the task is to be executed. Default is "android". Supported platforms: "android", "osx".
- planner_model (string, optional): The planner model to be used for planning the task. Default is "openai". Supported models: "openai", "gemini", "ollama".
- finder_model (string, optional): The finder model to be used for finding elements to interact with. Default is "gemini". Supported models: "gemini", "openai", "ollama".
- image_quality (integer, optional): Image quality percentage (1-100). Lower values reduce processing time. Default is 100.
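A client can check these constraints before sending a request instead of waiting for a 400 response. The sketch below is a hypothetical helper (build_execute_payload is not part of the project) that validates values against the supported lists above and omits unset optional fields, assuming the server then applies the documented defaults.

```python
import json
from typing import Optional

# Supported values, as listed in the request-body description.
PLATFORMS = {"android", "osx"}
MODELS = {"openai", "gemini", "ollama"}


def build_execute_payload(task_prompt: str, platform: Optional[str] = None,
                          planner_model: Optional[str] = None,
                          finder_model: Optional[str] = None,
                          image_quality: Optional[int] = None) -> str:
    """Validate arguments and return a JSON body for POST /execute.

    Optional fields left as None are omitted so that the server can apply its
    documented defaults (android, openai, gemini, 100).
    """
    if not task_prompt:
        raise ValueError("task_prompt is required")
    if platform is not None and platform not in PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")
    for name, model in (("planner_model", planner_model), ("finder_model", finder_model)):
        if model is not None and model not in MODELS:
            raise ValueError(f"unsupported {name}: {model}")
    if image_quality is not None and not 1 <= image_quality <= 100:
        raise ValueError("image_quality must be between 1 and 100")

    optional = {"platform": platform, "planner_model": planner_model,
                "finder_model": finder_model, "image_quality": image_quality}
    payload = {"task_prompt": task_prompt}
    # Keep only the optional fields the caller actually set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(payload)
```

For example, build_execute_payload("Open uber app", platform="android", planner_model="gemini") yields {"task_prompt": "Open uber app", "platform": "android", "planner_model": "gemini"}.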

Response:

200 OK:
- result (object): The result of the task execution.
400 Bad Request:
- detail (string): Description of why the request is invalid (e.g., unsupported platform, unsupported planner model, unsupported finder model).
500 Internal Server Error:
- detail (string): Description of the error that occurred during task execution.

Example Request:

```
curl -X POST "http://localhost:8000/execute" -H "Content-Type: application/json" -d '{
  "task_prompt": "Open uber app",
  "platform": "android",
  "planner_model": "gemini",
  "finder_model": "openai"
}'
```
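The same request can be sent from Python using only the standard library. This sketch assumes the server is reachable at http://localhost:8000 as in the curl example; make_execute_request and execute are hypothetical helper names, and the 600-second timeout is an arbitrary choice because a task may take many steps.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # as in the curl example; adjust to your server


def make_execute_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request the curl example sends."""
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(payload: dict, base_url: str = BASE_URL, timeout: float = 600) -> dict:
    """Send the request and return `result`; raise RuntimeError with `detail` on errors."""
    try:
        with urllib.request.urlopen(make_execute_request(payload, base_url), timeout=timeout) as resp:
            return json.load(resp)["result"]
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 400/500; their JSON bodies carry a `detail` string.
        body = err.read().decode("utf-8", errors="replace")
        try:
            detail = json.loads(body).get("detail", body)
        except (ValueError, AttributeError):
            detail = body
        raise RuntimeError(f"HTTP {err.code}: {detail}") from err
```

Calling execute({"task_prompt": "Open uber app", "platform": "android", "planner_model": "gemini", "finder_model": "openai"}) returns the result object on a 200 response; on a 400 or 500 it raises RuntimeError carrying the server's detail message.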

Example Response:

```
{
  "result": {
    "status": "success",
    "data": {
      // actual task execution result
    }
  }
}
```

Prerequisites

This project requires adb (Android Debug Bridge) to be installed on the machine where the code runs.
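A quick pre-flight check can confirm that adb is on PATH and list ready devices. This is a hypothetical helper, not part of the project; it also assumes that for the android platform a device or emulator must be connected and authorized, which the README does not state explicitly.

```python
import shutil
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials whose state is `device` (connected and authorized)."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device lines look like "<serial>\t<state>"; the header and daemon
        # startup messages never have "device" as their second field.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def ready_devices() -> List[str]:
    """Fail if adb is missing from PATH, otherwise return ready device serials."""
    adb = shutil.which("adb")
    if adb is None:
        raise RuntimeError("adb not found on PATH; install Android SDK Platform-Tools")
    result = subprocess.run([adb, "devices"], capture_output=True, text=True, check=True)
    return parse_adb_devices(result.stdout)
```

Devices reported as unauthorized or offline are skipped, since adb cannot drive them until they are accepted or reconnected.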

Project structure

How to contribute

Contributions are welcome! Please open an issue or submit a pull request.

Things to do

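The framework has three components:

- Planner: plans the task
- Finder: finds the UI elements to interact with
- Executor: carries out the resulting actions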
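How these three components might fit together can be sketched as a loop. The interfaces here are assumptions based only on the descriptions of the planner (plans the task) and finder (finds elements to interact with) in the API section; Step, run_task, and the callable signatures are hypothetical and do not reflect the repository's actual classes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One planner decision (hypothetical shape)."""
    action: str                   # e.g. "tap", "type", or "done"
    target: Optional[str] = None  # natural-language description of the UI element
    text: Optional[str] = None    # text to type, if any


Coords = Tuple[int, int]
# Hypothetical component interfaces; the repository's real classes may differ.
Screenshot = Callable[[], bytes]                    # capture the current screen
Planner = Callable[[str, bytes, List[Step]], Step]  # task, screen, history -> next step
Finder = Callable[[bytes, str], Coords]             # screen, target description -> (x, y)
Executor = Callable[[Step, Optional[Coords]], None] # perform the step on the device


def run_task(task: str, screenshot: Screenshot, plan: Planner, find: Finder,
             execute: Executor, max_steps: int = 20) -> List[Step]:
    """Repeat screenshot -> plan -> find -> execute until the planner says done."""
    history: List[Step] = []
    for _ in range(max_steps):
        screen = screenshot()
        step = plan(task, screen, history)
        if step.action == "done":
            return history
        # Only ask the finder for coordinates when the step targets a UI element.
        coords = find(screen, step.target) if step.target else None
        execute(step, coords)
        history.append(step)
    raise RuntimeError(f"task not finished within {max_steps} steps")
```

Passing plain functions keeps the loop testable without a device: a fake planner that returns a tap and then done, paired with a finder that always returns fixed coordinates, exercises one full iteration. The max_steps cap is an illustrative safeguard against a planner that never finishes.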

To run the API server locally:

```
pip install -r requirements.txt
uvicorn api:app --reload
```

Then send a request:

```
curl -X POST "http://127.0.0.1:8000/execute" -H "Content-Type: application/json" -d '{ "task_prompt": "Open Safari" }'
```

Pre-commit:

```
pre-commit install
pre-commit autoupdate
pre-commit run --all-files
```

License

This project is licensed under the MIT License. See the LICENSE file for details.

Made with ❤️ by InstaVM | Follow us for updates!

Repository contents (top level):

- clickclickclick/
- tests/
- .gitignore
- .pre-commit-config.yaml
- LICENSE
- README-raspberry.md
- README.md
- api.py
- interface.py
- main.py
- pyproject.toml
- requirements.txt
