ADK with Multimodal Tool Interaction

⚠️ DISCLAIMER: THIS IS NOT AN OFFICIALLY SUPPORTED GOOGLE PRODUCT. THIS PROJECT IS INTENDED FOR DEMONSTRATION PURPOSES ONLY. IT IS NOT INTENDED FOR USE IN A PRODUCTION ENVIRONMENT.

This demo shows how to implement a multimodal tool interaction flow in ADK, using a creative product marketing agent as the use case. The agent can refer to user-uploaded images and perform the requested edits by referencing each image's artifact identifier, which is supplied as context in the model callback. The tools can also produce multimodal data (images and videos) and save it as artifacts for use in the conversation context.
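The flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code or ADK's API: `ImagePart`, `ArtifactStore`, `annotate_uploads`, `edit_image_tool`, and the marker text format are all hypothetical stand-ins. It shows only the idea that a callback saves each uploaded image as an artifact and exposes its identifier as text, so a tool can later load the image by that identifier and save its output as a new artifact.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ImagePart:
    """Binary image attached to a user message (hypothetical stand-in type)."""
    data: bytes
    mime_type: str = "image/png"


@dataclass
class ArtifactStore:
    """In-memory stand-in for an artifact service (not ADK's real interface)."""
    _items: dict = field(default_factory=dict)

    def save(self, part: ImagePart) -> str:
        # Derive the identifier from the content so identical uploads share an ID.
        artifact_id = "img_" + hashlib.sha256(part.data).hexdigest()[:8]
        self._items[artifact_id] = part
        return artifact_id

    def load(self, artifact_id: str) -> ImagePart:
        return self._items[artifact_id]


def annotate_uploads(message_parts: list, store: ArtifactStore) -> list:
    """Callback step: follow every uploaded image with a text marker naming its artifact ID."""
    annotated = []
    for part in message_parts:
        annotated.append(part)
        if isinstance(part, ImagePart):
            artifact_id = store.save(part)
            annotated.append(f"[Uploaded image artifact: {artifact_id}]")
    return annotated


def edit_image_tool(artifact_id: str, instruction: str, store: ArtifactStore) -> str:
    """Tool step: load the input by ID, apply a placeholder 'edit', save the result."""
    source = store.load(artifact_id)
    # Placeholder transformation; a real tool would call an image model here.
    edited = ImagePart(data=source.data + instruction.encode("utf-8"))
    return store.save(edited)
```

Because the model only ever sees and emits the short identifier, the image bytes do not need to be passed through tool arguments.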

Key Capabilities:

Image Editing: Transform and edit product photos using custom tools
Video Generation: Create professional product marketing video clips from images using Google's Veo 3.1 API
Multimodal Context: Seamlessly reference and work with both image and video artifacts throughout the conversation
Automatic Prompt Enrichment: User prompts are automatically enhanced with professional production quality guidelines for marketing-ready videos
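As a rough illustration of the automatic prompt enrichment capability, the sketch below appends a fixed set of quality guidelines to the user's prompt before it would be sent to a video model. The guideline wording, the `QUALITY_GUIDELINES` constant, and the `enrich_video_prompt` function are assumptions made for this example; the project's actual enrichment text and logic may differ.

```python
# Hypothetical guideline phrases; the project's real wording is not shown in this README.
QUALITY_GUIDELINES = (
    "professional studio lighting",
    "smooth camera motion",
    "sharp focus on the product",
)


def enrich_video_prompt(user_prompt: str, guidelines=QUALITY_GUIDELINES) -> str:
    """Return the user's prompt followed by production-quality guidelines."""
    # Normalize whitespace and a trailing period so the joined text reads cleanly.
    base = user_prompt.strip().rstrip(".")
    if not base:
        raise ValueError("user_prompt must not be empty")
    return f"{base}. Production guidelines: {'; '.join(guidelines)}."


print(enrich_video_prompt("Rotate the sneaker slowly on a white table"))
```

Keeping the guidelines in one constant makes the enrichment easy to review and adjust without touching the tool logic.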

Prerequisites

If you are running this project from your local IDE, log in to gcloud from the CLI with the following command:
```
gcloud auth application-default login
```

Enable the following API:

gcloud services enable aiplatform.googleapis.com

Install uv, then install Python 3.12 and sync the project dependencies:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.12
uv sync --frozen

How to Run

Rename the example_full_agent directory to product_photo_editor.
Copy the product_photo_editor/.env.example file to product_photo_editor/.env and fill in the values.
Rename the example_mcp_server directory to veo_mcp.
Copy the veo_mcp/.env.example file to veo_mcp/.env and fill in the values.
Run the agent using the following command:
```
uv run adk web
```
