EdgeAI - CLIP with ExecuTorch + QNN


On-device CLIP model inference on Android with ExecuTorch and the Qualcomm QNN backend, for zero-shot image classification and vision-language tasks

What's New in v1.4.0

  • 🆕 CLIP Model Support: Full integration of OpenAI's CLIP for vision-language understanding
  • 🖼️ Zero-Shot Classification: Image classification with natural language queries
  • 🚀 Hardware Acceleration: Optimized inference with Qualcomm QNN backend
  • 📸 Camera Integration: Capture images directly from camera for real-time inference
  • 🔧 Comprehensive Documentation: Detailed setup guides and troubleshooting
  • 🎯 Production Ready: Robust error handling and memory management

Overview

EdgeAI is an Android application showcasing on-device CLIP model inference using ExecuTorch with Qualcomm QNN backend. This implementation demonstrates real multimodal AI inference with actual trained models and hardware acceleration for vision-language tasks.

Supported Models

Model | Size   | Use Case                                                         | Status
CLIP  | ~400MB | Zero-shot image classification, image-text matching, visual Q&A  | ✅ Full support

Key Capabilities

  • Real Model Inference: Actual trained CLIP model, not simulations
  • Hardware Acceleration: Qualcomm HTP/DSP via QNN backend
  • Zero-Shot Learning: Classify images without predefined categories
  • Vision-Language Understanding: Match images with natural language descriptions
  • Production Ready: Proper error handling and resource management
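At its core, zero-shot classification with CLIP reduces to comparing one L2-normalized image embedding against the normalized text embeddings of the candidate labels, then softmaxing the scaled cosine similarities. A minimal NumPy sketch of that scoring step (toy random embeddings; the app performs the equivalent computation in C++ over the 512-dimensional encoder outputs):

```python
import numpy as np

def zero_shot_scores(image_emb: np.ndarray, text_embs: np.ndarray,
                     logit_scale: float = 100.0) -> np.ndarray:
    """Turn cosine similarities into per-label probabilities, CLIP-style."""
    # L2-normalize so the dot product equals cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * text_embs @ image_emb   # one logit per candidate label
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    return exp / exp.sum()

# Toy example: 3 candidate labels, 512-dim embeddings
rng = np.random.default_rng(0)
image = rng.normal(size=512)
texts = rng.normal(size=(3, 512))
texts[1] = image + 0.1 * rng.normal(size=512)      # make label 1 "match" the image
probs = zero_shot_scores(image, texts)
print(probs.argmax())                              # -> 1 (the matching label)
```

The `logit_scale` of 100 mirrors CLIP's learned temperature; because the embeddings are normalized, labels never seen during any fine-tuning can be scored simply by encoding their text prompts.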

Features

Core Features

  • 🤖 CLIP Vision-Language Model: OpenAI's CLIP for multimodal understanding
  • 🖼️ Zero-Shot Classification: Classify images using natural language without training
  • 📸 Camera Integration: Capture photos directly from device camera
  • Image-Text Matching: Match images with text descriptions and queries
  • Hardware Acceleration: Qualcomm HTP/DSP acceleration via QNN
  • 📱 Android Native: Optimized for mobile devices
  • 🎯 Real-time Inference: Fast vision-language processing

Technical Features

  • ⚙️ Context Binary Support: compatible with QNN context binary v79 and SoC model 69
  • 🚀 Optimized Performance: ExecuTorch optimizations + QNN acceleration
  • 💾 Efficient Model Loading: External storage for large models
  • Real-time Inference: Fast multimodal response generation
  • 🛠️ Developer Friendly: Clean API and comprehensive documentation

Architecture

High-Level Architecture

+-----------------+     +-------------------+     +-----------------+
|   Android App   |     |    ExecuTorch     |     |   Qualcomm QNN  |
|                 |     |                   |     |                 |
|  +-----------+  | <-> |  +-------------+  | <-> |  +-----------+  |
|  | Kotlin UI |  |     |  | Runtime     |  |     |  | HTP/DSP   |  |
|  | Camera    |  |     |  | (.pte model)|  |     |  | Backend   |  |
|  +-----------+  | <-> |  +-------------+  | <-> |  +-----------+  |
|  +-----------+  |     |  | CLIP        |  |     |  | Context   |  |
|  | JNI Layer |  |     |  | Text/Image  |  |     |  | Binaries  |  |
|  +-----------+  |     |  | Encoders    |  |     |  +-----------+  |
|                 |     |  +-------------+  |     |                 |
+-----------------+     +-------------------+     +-----------------+

Implementation Layers

  1. Android UI Layer: Kotlin-based user interface with camera integration
  2. JNI Bridge: Communication between Kotlin and C++
  3. ExecuTorch Runtime: CLIP model execution and management
  4. QNN Backend: Hardware acceleration layer
  5. Model Layer: CLIP vision and text encoders with real weights

Quick Start

Prerequisites

  • Android Studio Arctic Fox or later
  • Android NDK r25 or later
  • Qualcomm device with HTP/DSP support (e.g., Snapdragon 8 Gen 2/3, Snapdragon 8 Elite)
  • ExecuTorch 0.7.0+
  • QNN SDK v79+

Installation

  1. Clone the repository

    git clone https://github.com/carrycooldude/EdgeAIApp-ExecuTorch.git
    cd EdgeAIApp-ExecuTorch
  2. Download CLIP Model

    # Download the CLIP model using the provided script
    python download_clip_model.py
  3. Build and install

    .\gradlew assembleDebug
    adb install app\build\outputs\apk\debug\app-debug.apk
  4. Grant permissions

    • Allow camera and storage permissions when prompted

Usage

  1. Launch the EdgeAI app on your device
  2. Tap "Take Photo" to capture an image or "Select Image" from gallery
  3. Enter a question or description about the image (e.g., "What is in this image?")
  4. Tap "Analyze Image" to run CLIP inference
  5. View zero-shot classification results and similarity scores!

Documentation

Technical documentation, setup guides, and release notes live in docs/technical/, docs/setup/, and docs/releases/ respectively.

Setup Guide

1. ExecuTorch Setup

# Clone ExecuTorch
git clone https://github.com/pytorch/executorch.git
cd executorch

# Install dependencies
pip install -e .
pip install torch torchvision torchaudio

2. CLIP Model Download

# Use the provided download script
python download_clip_model.py

# Or manually download from Hugging Face
# The CLIP model will be automatically converted to ExecuTorch format

3. Qualcomm QNN Setup

# Download QNN SDK from Qualcomm
# Extract and set environment variables
export QNN_SDK_ROOT=/path/to/qnn-sdk
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/aarch64-android:$LD_LIBRARY_PATH

4. Model Compilation for QNN

# Export CLIP model to ExecuTorch format with QNN backend
python -m executorch.examples.models.clip \
    --export \
    --model_name clip-vit-base-patch32 \
    --backend qnn

Technical Details

Model Specifications

  • Model: OpenAI CLIP (ViT-B/32)
  • Vision Encoder: Vision Transformer Base
  • Text Encoder: Transformer-based text encoder
  • Patch Size: 32x32
  • Image Resolution: 224x224
  • Embedding Dimension: 512
  • Vocabulary Size: 49,408
  • Context Length: 77 tokens
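These numbers fit together: a 224x224 input split into 32x32 patches yields (224/32)² = 49 patch tokens for the vision transformer, and text is truncated or padded to a fixed 77-token context drawn from the 49,408-entry vocabulary. A NumPy-only sketch of the preprocessing shapes (the real pipeline uses CLIP's BPE tokenizer; the normalization constants below are CLIP's published per-channel values, and the token IDs are illustrative):

```python
import numpy as np

# CLIP's published per-channel image normalization constants (RGB)
MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """224x224x3 uint8 image -> 1x3x224x224 float32 tensor for the vision encoder."""
    x = image_hwc_uint8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD                      # channel-wise normalization
    return x.transpose(2, 0, 1)[None]         # HWC -> NCHW, add batch dim

def pad_tokens(token_ids: list, context_length: int = 77) -> np.ndarray:
    """Truncate/pad a token-ID list to CLIP's fixed 77-token context."""
    ids = token_ids[:context_length]
    return np.array(ids + [0] * (context_length - len(ids)), dtype=np.int64)

img = np.zeros((224, 224, 3), dtype=np.uint8)
print(preprocess(img).shape)                  # (1, 3, 224, 224)
print(pad_tokens([49406, 320, 2368, 49407]).shape)  # (77,)
print((224 // 32) ** 2)                       # 49 patch tokens per image
```

Keeping the tensor shapes fixed like this is what lets the exported .pte program run with static memory planning on the QNN backend.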

Hardware Requirements

  • CPU: ARM64-v8a (aarch64)
  • Accelerator: Qualcomm HTP/DSP
  • Context Version: v79
  • SoC Model: 69 (Snapdragon 8 Gen 2/3/Elite)
  • Architecture: aarch64-android
  • Minimum RAM: 4GB
  • Recommended RAM: 6GB+

Performance Metrics

  • Inference Speed: ~100-150ms per image-text pair
  • Memory Usage: ~800MB RAM
  • Model Size: ~400MB
  • Power Efficiency: Optimized for mobile with QNN acceleration

Development

Project Structure

EdgeAI/
|-- app/                          # Android application
|   |-- src/main/
|   |   |-- cpp/                  # Native C++ implementation
|   |   |   |-- executorch_clip_proper.cpp  # CLIP ExecuTorch + QNN integration
|   |   |   |-- CMakeLists.txt    # Build configuration
|   |   |   `-- ...
|   |   |-- java/                 # Kotlin/Java code
|   |   |   |-- MainActivity.kt   # CLIP UI and inference
|   |   |   `-- ml/ExecutorTorchCLIP.kt  # CLIP model wrapper
|   |   `-- assets/               # Model files and resources
|-- docs/                         # Documentation
|   |-- technical/                # Technical documentation
|   |-- setup/                    # Setup guides
|   `-- releases/                 # Release notes
|-- scripts/                      # Build and setup scripts
|-- download_clip_model.py        # CLIP model download script
`-- README.md                     # This file

Building from Source

# Debug build
.\gradlew assembleDebug

# Release build
.\gradlew assembleRelease

# Clean build
.\gradlew clean

Testing

# Run tests
.\gradlew test

# Run Android tests
.\gradlew connectedAndroidTest

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Code Style

  • Follow Android Kotlin style guide
  • Use meaningful variable names
  • Add comments for complex logic
  • Maintain consistent formatting

License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ for the AI community
