On-device CLIP model on Android with ExecuTorch + Qualcomm QNN backend for zero-shot image classification and vision-language tasks
- 🆕 CLIP Model Support: Full integration of OpenAI's CLIP for vision-language understanding
- 🖼️ Zero-Shot Classification: Image classification with natural language queries
- 🚀 Hardware Acceleration: Optimized inference with Qualcomm QNN backend
- 📸 Camera Integration: Capture images directly from camera for real-time inference
- 🔧 Comprehensive Documentation: Detailed setup guides and troubleshooting
- 🎯 Production Ready: Robust error handling and memory management
- Overview
- Features
- Architecture
- Quick Start
- Documentation
- Setup Guide
- Technical Details
- Contributing
- License
EdgeAI is an Android application showcasing on-device CLIP model inference using ExecuTorch with the Qualcomm QNN backend. It runs a real trained model with hardware acceleration, demonstrating practical multimodal vision-language inference on mobile hardware.
| Model | Size | Use Case | Status |
|---|---|---|---|
| CLIP | ~400MB | Zero-shot image classification, Image-text matching, Visual Q&A | ✅ Full support |
- ✅ Real Model Inference: Actual trained CLIP model, not simulations
- ✅ Hardware Acceleration: Qualcomm HTP/DSP via QNN backend
- ✅ Zero-Shot Learning: Classify images without predefined categories
- ✅ Vision-Language Understanding: Match images with natural language descriptions
- ✅ Production Ready: Proper error handling and resource management
- 🤖 CLIP Vision-Language Model: OpenAI's CLIP for multimodal understanding
- 🖼️ Zero-Shot Classification: Classify images using natural language without training
- 📸 Camera Integration: Capture photos directly from device camera
- 🔍 Image-Text Matching: Match images with text descriptions and queries
- ⚡ Hardware Acceleration: Qualcomm HTP/DSP acceleration via QNN
- 📱 Android Native: Optimized for mobile devices
- 🎯 Real-time Inference: Fast vision-language processing
- ⚙️ Context Binary Support: v79/SoC Model-69 compatibility
- 🚀 Optimized Performance: ExecuTorch optimizations + QNN acceleration
- 💾 Efficient Model Loading: External storage for large models
- ⚡ Real-time Inference: Fast multimodal response generation
- 🛠️ Developer Friendly: Clean API and comprehensive documentation
+-----------------+ +-------------------+ +-----------------+
| Android App | | ExecuTorch | | Qualcomm QNN |
| | | | | |
| +-----------+ | <-> | +-------------+ | <-> | +-----------+ |
| | Kotlin UI | | | | Runtime | | | | HTP/DSP | |
| | Camera | | | | (.pte model)| | | | Backend | |
| +-----------+ | <-> | +-------------+ | <-> | +-----------+ |
| +-----------+ | | | CLIP | | | | Context | |
| | JNI Layer | | | | Text/Image | | | | Binaries | |
| +-----------+ | | | Encoders | | | +-----------+ |
| | | +-------------+ | | |
+-----------------+ +-------------------+ +-----------------+
- Android UI Layer: Kotlin-based user interface with camera integration
- JNI Bridge: Communication between Kotlin and C++
- ExecuTorch Runtime: CLIP model execution and management
- QNN Backend: Hardware acceleration layer
- Model Layer: CLIP vision and text encoders with real weights
- Android Studio Arctic Fox or later
- Android NDK r25 or later
- Qualcomm device with HTP/DSP support (e.g., Snapdragon 8 Gen 2/3, Snapdragon 8 Elite)
- ExecuTorch 0.7.0+
- QNN SDK v79+
1. Clone the repository

   git clone https://github.com/carrycooldude/EdgeAIApp-ExecuTorch.git
   cd EdgeAIApp-ExecuTorch

2. Download the CLIP model

   # Download the CLIP model using the provided script
   python download_clip_model.py

3. Build and install

   .\gradlew assembleDebug
   adb install app\build\outputs\apk\debug\app-debug.apk

4. Grant permissions

   - Allow camera and storage permissions when prompted
- Launch the EdgeAI app on your device
- Tap "Take Photo" to capture an image or "Select Image" from gallery
- Enter a question or description about the image (e.g., "What is in this image?")
- Tap "Analyze Image" to run CLIP inference
- View zero-shot classification results and similarity scores!
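The similarity scores shown in the last step follow CLIP's standard zero-shot recipe: both encoders project their input into a shared 512-dimensional space, the embeddings are L2-normalized, and the scaled cosine similarities are passed through a softmax over the candidate labels. A minimal NumPy sketch of that scoring step (the random embeddings here are stand-ins for real encoder outputs, and `logit_scale=100.0` approximates CLIP's learned temperature):

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, logit_scale=100.0):
    """Score one image embedding against N text embeddings, CLIP-style."""
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * text_embs @ image_emb  # shape (N,)
    # Numerically stable softmax over the candidate labels
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Stand-in embeddings (512-dim, matching ViT-B/32's projection size)
rng = np.random.default_rng(0)
image = rng.standard_normal(512)
texts = rng.standard_normal((3, 512))
probs = zero_shot_scores(image, texts)
print(probs)
```

The highest-probability entry corresponds to the text query that best matches the image, which is what the app surfaces as the classification result.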
# Clone ExecuTorch
git clone https://github.com/pytorch/executorch.git
cd executorch
# Install dependencies
pip install -e .
pip install torch torchvision torchaudio

# Use the provided download script
python download_clip_model.py
# Or manually download from Hugging Face
# The CLIP model will be automatically converted to ExecuTorch format

# Download QNN SDK from Qualcomm
# Extract and set environment variables
export QNN_SDK_ROOT=/path/to/qnn-sdk
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/aarch64-android:$LD_LIBRARY_PATH

# Export CLIP model to ExecuTorch format with QNN backend
python -m executorch.examples.models.clip \
--export \
--model_name clip-vit-base-patch32 \
--backend qnn

- Model: OpenAI CLIP (ViT-B/32)
- Vision Encoder: Vision Transformer Base
- Text Encoder: Transformer-based text encoder
- Patch Size: 32x32
- Image Resolution: 224x224
- Embedding Dimension: 512
- Vocabulary Size: 49,408
- Context Length: 77 tokens
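The specs above pin down the vision encoder's input contract: a 224x224 RGB image, normalized with the mean/std published in OpenAI's CLIP preprocessing, split into 32x32 patches. A small NumPy sketch of that normalization, plus the patch arithmetic (the function name and layout are illustrative, not the app's actual API):

```python
import numpy as np

# Normalization constants published with OpenAI's CLIP preprocessing
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073])
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711])

def preprocess(rgb, size=224):
    """Turn an HxWx3 uint8 image (already resized/cropped to `size`)
    into the CHW float32 tensor layout a CLIP vision encoder expects."""
    x = rgb.astype(np.float32) / 255.0
    x = (x - CLIP_MEAN) / CLIP_STD  # per-channel normalization
    return x.transpose(2, 0, 1)    # HWC -> CHW

img = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy black frame
tensor = preprocess(img)
print(tensor.shape)  # (3, 224, 224)

# Patch arithmetic for ViT-B/32: (224 / 32)^2 = 49 patches per image
patches = (224 // 32) ** 2
print(patches)  # 49
```

The 49 patches (plus a class token) form the ViT's input sequence; on the text side, prompts are tokenized into at most 77 tokens from the 49,408-entry vocabulary.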
- CPU: ARM64-v8a (aarch64)
- Accelerator: Qualcomm HTP/DSP
- Context Version: v79
- SoC Model: 69 (Snapdragon 8 Gen 2/3/Elite)
- Architecture: aarch64-android
- Minimum RAM: 4GB
- Recommended RAM: 6GB+
- Inference Speed: ~100-150ms per image-text pair
- Memory Usage: ~800MB RAM
- Model Size: ~400MB
- Power Efficiency: Optimized for mobile via QNN acceleration
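At the quoted 100-150 ms per image-text pair, sustained throughput works out to roughly 7-10 pairs per second. A quick sanity check of that arithmetic:

```python
def throughput(latency_ms):
    """Pairs per second at a given per-pair latency in milliseconds."""
    return 1000.0 / latency_ms

print(round(throughput(100)))  # 10 pairs/s at the fast end
print(round(throughput(150)))  # 7 pairs/s at the slow end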
EdgeAI/
|-- app/ # Android application
| |-- src/main/
| | |-- cpp/ # Native C++ implementation
| | | |-- executorch_clip_proper.cpp # CLIP ExecuTorch + QNN integration
| | | |-- CMakeLists.txt # Build configuration
| | | `-- ...
| | |-- java/ # Kotlin/Java code
| | | |-- MainActivity.kt # CLIP UI and inference
| | | `-- ml/ExecutorTorchCLIP.kt # CLIP model wrapper
| | `-- assets/ # Model files and resources
|-- docs/ # Documentation
| |-- technical/ # Technical documentation
| |-- setup/ # Setup guides
| `-- releases/ # Release notes
|-- scripts/ # Build and setup scripts
|-- download_clip_model.py # CLIP model download script
`-- README.md # This file
# Debug build
.\gradlew assembleDebug
# Release build
.\gradlew assembleRelease
# Clean build
.\gradlew clean

# Run tests
.\gradlew test
# Run Android tests
.\gradlew connectedAndroidTest

We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Follow Android Kotlin style guide
- Use meaningful variable names
- Add comments for complex logic
- Maintain consistent formatting
This project is licensed under the MIT License - see the LICENSE file for details.
- ExecuTorch - PyTorch's mobile inference framework
- Qualcomm AI Stack - AI acceleration platform
- OpenAI CLIP - Contrastive Language-Image Pre-training model
- Android NDK - Native development kit
- 📧 Email: [email protected]
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
Made with ❤️ for the AI community