Train YOLO + VLM with one command. Auto-generates vision-language training data from existing YOLO labels, so no extra labeling is needed.
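As a rough illustration of the idea (not the repository's actual pipeline), the sketch below converts YOLO-format label files into image/prompt/response records that a VLM could be fine-tuned on; the class list, file layout, and prompt wording are assumptions.

```python
# Hypothetical sketch: convert YOLO .txt labels (class x_c y_c w h per line)
# into image/instruction pairs for VLM fine-tuning. Class names, paths, and
# prompt text are illustrative assumptions, not the repo's real pipeline.
import json
from pathlib import Path

CLASS_NAMES = ["crack", "joint", "person"]  # assumed class list

def yolo_label_to_caption(label_path: Path) -> str:
    """Summarise one YOLO label file as a short natural-language caption."""
    counts = {}
    for line in label_path.read_text().splitlines():
        if line.strip():
            name = CLASS_NAMES[int(line.split()[0])]
            counts[name] = counts.get(name, 0) + 1
    if not counts:
        return "The image contains no labeled objects."
    parts = [f"{n} {name}(s)" for name, n in counts.items()]
    return "The image contains " + ", ".join(parts) + "."

def build_vlm_dataset(images_dir: str, labels_dir: str, out_file: str) -> None:
    """Pair each label file with its image and write a JSON instruction dataset."""
    records = []
    for label_path in sorted(Path(labels_dir).glob("*.txt")):
        records.append({
            "image": str(Path(images_dir) / (label_path.stem + ".jpg")),
            "prompt": "Describe the objects in this image.",
            "response": yolo_label_to_caption(label_path),
        })
    Path(out_file).write_text(json.dumps(records, indent=2))
```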
A benchmark suite for lightweight generative multimodal vision-language models, comparing ViLT and SmolVLM in resource-constrained inference environments. Demonstrates CPU-only deployment, model evaluation, and multimodal reasoning over images and text for practical, real-world GenAI engineering.
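For context, a minimal CPU-only inference sketch using the Hugging Face transformers API is shown below; the SmolVLM checkpoint name, image path, and prompt are assumptions rather than details taken from the benchmark itself.

```python
# Minimal sketch of CPU-only SmolVLM inference with Hugging Face transformers;
# the checkpoint, image file, and prompt are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed public checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float32)
model.to("cpu")

image = Image.open("example.jpg")
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is shown in this image?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```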
An AI storyteller built on Qwen2-VL that transforms cultural imagery into authentic narratives through multimodal, progressive fine-tuning.
A text-conditioned segmentation system using CLIPSeg and LoRA to automate industrial quality assurance. The model segments structural cracks and drywall joints from natural-language prompts (e.g., "segment wall crack").
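A minimal sketch of prompt-driven segmentation with a stock CLIPSeg checkpoint follows; the checkpoint ID, image file, and threshold are assumptions, and the LoRA adaptation described above is not reproduced here.

```python
# Minimal sketch of text-prompted segmentation with a stock CLIPSeg checkpoint;
# checkpoint, image path, and threshold are assumptions, and the repo's LoRA
# fine-tuning step is not shown.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("wall.jpg")
prompts = ["segment wall crack", "segment drywall joint"]
inputs = processor(text=prompts, images=[image] * len(prompts),
                   padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits holds one low-resolution mask per prompt; threshold after sigmoid.
masks = torch.sigmoid(outputs.logits) > 0.5
print(masks.shape)  # (num_prompts, H, W)
```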
A Survey on Trends, Challenges and Future Direction in Fire Management using Autonomous Unmanned Aerial Vehicles