A real-time image captioning and visual question answering (VQA) system.
This project combines computer vision and NLP to generate descriptive captions for images and answer user questions about them.
Clone and install dependencies in a fresh virtual environment:
git clone https://github.com/<your-username>/RealTimeVQACaptioning.git
cd RealTimeVQACaptioning
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
pip install -r requirements.txt

- Real-time video/image caption generation
- Visual Question Answering (VQA) module with co-attention mechanism
- Deep learning pipeline based on CNN/ResNet, Transformers, Faster R-CNN/DETR
- Modular code for extensibility and research
- Built using PyTorch and Hugging Face tools
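The co-attention mechanism mentioned above can be sketched in a few lines. This is a simplified, NumPy-only illustration of parallel co-attention (image features attend to the question and vice versa); it omits the learned projection layers a real PyTorch module would have, and all shapes and names here are illustrative assumptions, not the project's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(V, Q):
    """Simplified parallel co-attention.

    V: (n_regions, d) image region features
    Q: (n_words, d)   question word features
    Returns one attended image vector and one attended question vector.
    """
    C = Q @ V.T                    # (n_words, n_regions) affinity matrix
    a_v = softmax(C.max(axis=0))   # attention weights over image regions
    a_q = softmax(C.max(axis=1))   # attention weights over question words
    v_hat = a_v @ V                # attention-weighted image feature
    q_hat = a_q @ Q                # attention-weighted question feature
    return v_hat, q_hat
```

In the full model, `v_hat` and `q_hat` would be fused (e.g., concatenated or summed) and passed to an answer classifier.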
- Computer Vision
- NLP & Deep Learning
- CNN, Transformer, VQA, OpenAI models
- Object Detection, Video Processing
- Input: Accepts image or video streams.
- FeatureExtractor: Extracts visual features using CNN/ResNet backbones.
- ObjectDetector: Detects objects via Faster R-CNN or DETR models.
- CaptionEncoder: Processes extracted features with a Transformer-based encoder.
- CaptionDecoder: Generates natural language captions from encoded features.
- VQAModule: Handles Visual Question Answering by encoding questions, applying co-attention, and predicting answers.
- VideoOverlay: Superimposes generated captions and VQA answers onto the original video or image frames.
- Output: Emits fully annotated frames or a processed video file.
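The stages above compose into a single pass per frame. The skeleton below mirrors the module names from the list; the class bodies are placeholder stubs (the real modules would wrap PyTorch / Hugging Face models), so every return value here is an illustrative stand-in rather than actual project code.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    pixels: bytes  # raw image data (placeholder)

class FeatureExtractor:
    def extract(self, frame):
        return {"features": "cnn_features"}  # stand-in for a ResNet feature map

class ObjectDetector:
    def detect(self, frame):
        return [{"label": "person", "box": (0, 0, 10, 10)}]  # stand-in detections

class CaptionEncoder:
    def encode(self, features):
        return {"encoded": features}  # stand-in Transformer encoding

class CaptionDecoder:
    def decode(self, encoded):
        return "a person in the frame"  # stand-in generated caption

class VQAModule:
    def answer(self, encoded, question):
        return "yes"  # stand-in predicted answer

class VideoOverlay:
    def render(self, frame, caption, answer=None):
        text = caption if answer is None else f"{caption} | Q&A: {answer}"
        return {"frame": frame, "overlay": text}

def run_pipeline(frame, question=None):
    feats = FeatureExtractor().extract(frame)
    _objects = ObjectDetector().detect(frame)  # detections can refine captions
    encoded = CaptionEncoder().encode(feats)
    caption = CaptionDecoder().decode(encoded)
    answer = VQAModule().answer(encoded, question) if question else None
    return VideoOverlay().render(frame, caption, answer)
```

Keeping each stage behind its own class is what makes the pipeline extensible: swapping DETR for Faster R-CNN, for example, only touches `ObjectDetector`.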
This project is licensed under the MIT License. See the LICENSE file for details.
ai-project ・ caption-generation ・ visual-question-answering ・ deep-learning ・ pytorch ・ transformer ・ object-detection ・ nlp ・ video-processing
