Learn how Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks work — individually and together — then deploy to the cloud.
| # | Section | Description |
|---|---|---|
| 1 | What is Computer Vision? | Core concepts & real-world applications |
| 2 | Project Architecture | Repository layout & file map |
| 3 | Installation & Setup | Complete step-by-step guide for Windows & macOS/Linux |
| 4 | Why Virtual Environment? | Understanding venv and why it matters |
| 5 | Running the Code | How to execute each module |
| 6 | Learning Path | Structured order of study |
| 7 | CNN Module | Image classification from scratch |
| 8 | RNN Module | Sequence modeling on image features |
| 9 | LSTM Module | Long-range dependency modeling |
| 10 | Combined Module | Video classification pipeline |
| 11 | Azure Deployment | Cloud hosting & inference API |
| 12 | LinkedIn Guide | Showcase your project professionally |
| 13 | Troubleshooting | Common issues and solutions |
| 14 | FAQ | Common questions answered |
| 15 | Contributing | How to contribute |
Computer Vision (CV) is a field of artificial intelligence that enables machines to interpret and understand visual information from the world — images, videos, and real-time camera feeds.
```mermaid
graph LR
A["📷 Image/Video Input"] --> B["🔍 Preprocessing"]
B --> C["🧠 Deep Learning Model"]
C --> D["📊 Prediction / Classification"]
D --> E["🎯 Action / Decision"]
style A fill:#4CAF50,stroke:#333,color:#fff
style B fill:#2196F3,stroke:#333,color:#fff
style C fill:#FF9800,stroke:#333,color:#fff
style D fill:#9C27B0,stroke:#333,color:#fff
style E fill:#F44336,stroke:#333,color:#fff
```
| Domain | Application | Models Used |
|---|---|---|
| 🏥 Healthcare | X-ray / MRI diagnosis | CNN |
| 🚗 Autonomous Vehicles | Object detection & tracking | CNN + RNN |
| 📹 Surveillance | Activity recognition in video | CNN + LSTM |
| 🛒 Retail | Product recognition | CNN |
| 🎮 Gaming / AR | Gesture & pose estimation | CNN + LSTM |
| 📝 OCR | Handwriting recognition | CNN + RNN + LSTM |
```
computerV/
├── 📄 README.md                 ← You are here
├── 📄 requirements.txt          ← All Python dependencies (libraries needed)
├── 📄 setup.ps1                 ← One-click setup script (Windows PowerShell)
├── 📄 setup.sh                  ← One-click setup script (macOS/Linux Terminal)
├── 📄 .gitignore                ← Files Git should ignore
│
├── 📂 docs/                     ← Detailed concept guides & tutorials
│   ├── 01_introduction_to_computer_vision.md
│   ├── 02_convolutional_neural_networks.md
│   ├── 03_recurrent_neural_networks.md
│   ├── 04_long_short_term_memory.md
│   ├── 05_cnn_rnn_lstm_combined.md
│   ├── 06_azure_deployment_guide.md
│   ├── 07_linkedin_publishing_guide.md
│   └── 08_run_setup_lean.md
│
├── 📂 src/                      ← All Python source code
│   ├── 📂 01_cnn/
│   │   └── cnn_image_classifier.py      ← CNN model training script
│   ├── 📂 02_rnn/
│   │   └── rnn_sequence_model.py        ← RNN model training script
│   ├── 📂 03_lstm/
│   │   └── lstm_model.py                ← LSTM model training script
│   ├── 📂 04_combined/
│   │   └── cnn_rnn_lstm_combined.py     ← Combined architecture script
│   └── 📂 utils/
│       ├── data_loader.py               ← Dataset loading utilities
│       └── visualization.py             ← Plotting and visualization helpers
│
├── 📂 data/                     ← Datasets (auto-downloaded when you run scripts)
├── 📂 models/                   ← Saved model weights (.pth files)
└── 📂 outputs/                  ← Training plots & metrics (generated during training)
```
This section provides complete step-by-step instructions for both Windows and macOS/Linux users.
Before you begin, make sure you have these installed:
| Software | Version | Download Link | Why You Need It |
|---|---|---|---|
| Python | 3.10 or higher | python.org/downloads | Runs the machine learning code |
| Git | Any recent version | git-scm.com/downloads | Downloads the project code |
| VS Code (recommended) | Any recent version | code.visualstudio.com | Best code editor for Python |
💡 Tip for Windows users: When installing Python, make sure to check ✅ "Add Python to PATH" during installation!
Open your terminal (or PowerShell on Windows) and run:
```bash
git clone https://github.com/EricKart/computerV.git
cd computerV
```

What this does:
- `git clone` downloads all the project files from GitHub to your computer
- `cd computerV` moves you into the project folder
The setup script automatically creates a virtual environment and installs all required packages.
Windows (PowerShell):

```powershell
.\setup.ps1
```

⚠️ If you get an error about "execution policy", run this first:

```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
```

Then try `.\setup.ps1` again.
macOS/Linux:

```bash
chmod +x setup.sh
./setup.sh
```

What the setup script does:
- ✅ Creates a `venv/` folder (virtual environment) to isolate project dependencies
- ✅ Upgrades `pip` (Python's package installer) to the latest version
- ✅ Installs all required libraries from `requirements.txt` (PyTorch, NumPy, etc.)
- ✅ Creates necessary folders: `data/`, `models/`, `outputs/`, `logs/`
Before running any Python code, you must activate the virtual environment:
Windows (PowerShell):

```powershell
.\venv\Scripts\Activate.ps1
```

You'll see `(venv)` appear at the beginning of your command line, like this:

```
(venv) PS C:\Users\YourName\computerV>
```

macOS/Linux:

```bash
source venv/bin/activate
```

You'll see `(venv)` appear at the beginning of your command line, like this:

```
(venv) user@computer:~/computerV$
```
💡 Remember: You need to activate the virtual environment every time you open a new terminal window to work on this project!
Run these commands to make sure everything is set up correctly:
```bash
python --version
```

Expected output: `Python 3.10.x` or higher

```bash
python -c "import torch, torchvision, matplotlib, onnx; print('✅ Environment OK - All packages installed!')"
```

Expected output: `✅ Environment OK - All packages installed!`

Now you can run any of the training scripts. Start with the CNN module:

```bash
python src/01_cnn/cnn_image_classifier.py
```

Understanding why we use virtual environments is important for every Python developer.
Imagine you have two Python projects:
- Project A needs `numpy` version 1.20
- Project B needs `numpy` version 1.24
If you install both globally, they will conflict! Only one version can exist at a time.
A virtual environment is like a separate, isolated Python installation for each project.
```
Your Computer
├── Global Python (system-wide)
│
├── Project A/
│   └── venv/          ← Has its own numpy 1.20
│       └── numpy 1.20
│
└── Project B/
    └── venv/          ← Has its own numpy 1.24
        └── numpy 1.24
```
| Benefit | Explanation |
|---|---|
| 🔒 Isolation | Each project has its own packages — no conflicts |
| 📦 Reproducibility | Anyone can recreate the exact same environment |
| 🧹 Clean System | Your global Python stays clean and uncluttered |
| 🔄 Easy Reset | Delete venv/ folder and run setup again to start fresh |
| 👥 Team Collaboration | Everyone on the team uses the same package versions |
- `setup.sh` / `setup.ps1` creates the `venv/` folder
- `requirements.txt` lists all packages and their versions
- Activating (`source venv/bin/activate` or `.\venv\Scripts\Activate.ps1`) tells your terminal to use this project's Python and packages
- Deactivating (`deactivate`) returns you to global Python
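You can confirm which Python your terminal is using from Python itself. This is a small illustrative check, not part of the project's scripts:

```python
import sys

# Quick check: which Python interpreter is this terminal using?
print(sys.prefix)        # points inside your project's venv/ when it is active
print(sys.base_prefix)   # the global Python the venv was created from

# In a virtual environment the two paths differ
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
```

If `in_venv` prints `False`, activate the environment before installing packages or running the training scripts.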
💡 Pro Tip: The `venv/` folder is listed in `.gitignore`, so it's never uploaded to GitHub. Each person creates their own `venv/` using the setup script.
Each script in src/ is a complete, runnable program that:
- Downloads the CIFAR-10 dataset (10 categories of 32×32 images)
- Builds a neural network model (CNN, RNN, LSTM, or Combined)
- Trains the model for several epochs
- Evaluates accuracy on test data
- Saves the trained model to `models/`
- Generates visualization plots in `outputs/`
Make sure your virtual environment is activated (you should see `(venv)` in your terminal), then:
```bash
# Module 1: CNN (Convolutional Neural Network)
python src/01_cnn/cnn_image_classifier.py

# Module 2: RNN (Recurrent Neural Network)
python src/02_rnn/rnn_sequence_model.py

# Module 3: LSTM (Long Short-Term Memory)
python src/03_lstm/lstm_model.py

# Module 4: Combined CNN + LSTM
python src/04_combined/cnn_rnn_lstm_combined.py
```

When you run a script, you'll see:
- Dataset download (first time only) — downloads ~170 MB of images
- Training progress — shows loss and accuracy for each epoch
- Final results — test accuracy and best model saved
Example output:
```
============================================================
MODULE 1: CNN IMAGE CLASSIFIER
Dataset : CIFAR-10 | Model : CifarCNN
============================================================
Epoch [01/15] Train Loss: 1.4532 Acc: 47.23% │ Test Loss: 1.1234 Acc: 60.12% │ 23.5s
Epoch [02/15] Train Loss: 1.0123 Acc: 63.45% │ Test Loss: 0.9876 Acc: 65.78% │ 22.1s
...
🏆 Best Test Accuracy: 78.45%
```
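The scripts pick the fastest available hardware automatically. The standard PyTorch pattern for that looks like this (a sketch of the common idiom, not a line copied from the repo):

```python
import torch

# Standard PyTorch pattern: prefer the GPU when one is available,
# otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")

# Both the model and every input batch must be moved to the same device
batch = torch.randn(8, 3, 32, 32).to(device)
print(batch.device.type)
```

This is why the project runs everywhere: on a machine without CUDA, `device` is simply `cpu` and training proceeds (more slowly) with no code changes.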
After training, check these folders:
| Folder | Contents |
|---|---|
| `outputs/` | Training curves, confusion matrices, sample predictions (PNG images) |
| `models/` | Saved model weights (`.pth` files) and ONNX exports |
| `data/` | Downloaded CIFAR-10 dataset |
Follow this order for the best learning experience:
```mermaid
graph TD
A["📖 1. Introduction to CV"] --> B["🔲 2. CNN – See Patterns"]
B --> C["🔄 3. RNN – Understand Sequences"]
C --> D["🧠 4. LSTM – Remember Long-Term"]
D --> E["🔗 5. CNN+RNN+LSTM Combined"]
E --> F["☁️ 6. Deploy to Azure"]
F --> G["📢 7. Publish on LinkedIn"]
style A fill:#E8F5E9,stroke:#4CAF50,color:#1B5E20
style B fill:#E3F2FD,stroke:#2196F3,color:#0D47A1
style C fill:#FFF3E0,stroke:#FF9800,color:#E65100
style D fill:#F3E5F5,stroke:#9C27B0,color:#4A148C
style E fill:#FCE4EC,stroke:#E91E63,color:#880E4F
style F fill:#E0F7FA,stroke:#00BCD4,color:#006064
style G fill:#FFF9C4,stroke:#FFEB3B,color:#F57F17
```
| Step | Document | Time | What You'll Learn |
|---|---|---|---|
| 1 | Introduction to CV | 15 min | Pixels, color spaces, image processing basics |
| 2 | CNN Deep Dive | 30 min | Convolutions, pooling, feature maps, architectures |
| 3 | RNN Deep Dive | 25 min | Sequential data, hidden states, backprop through time |
| 4 | LSTM Deep Dive | 25 min | Gates, cell state, vanishing gradient solution |
| 5 | Combined Architecture | 30 min | Feature extraction + temporal modeling |
| 6 | Azure Deployment | 45 min | Model hosting, REST API, Microsoft Foundry |
| 7 | LinkedIn Guide | 15 min | Project showcase, post templates |
Goal: Classify images from the CIFAR-10 dataset into 10 categories.
```mermaid
graph LR
A["Input Image\n32×32×3"] --> B["Conv Layer 1\n+ ReLU"]
B --> C["MaxPool"]
C --> D["Conv Layer 2\n+ ReLU"]
D --> E["MaxPool"]
E --> F["Flatten"]
F --> G["Fully Connected"]
G --> H["Output\n10 classes"]
style A fill:#4CAF50,stroke:#333,color:#fff
style B fill:#2196F3,stroke:#333,color:#fff
style C fill:#03A9F4,stroke:#333,color:#fff
style D fill:#2196F3,stroke:#333,color:#fff
style E fill:#03A9F4,stroke:#333,color:#fff
style F fill:#FF9800,stroke:#333,color:#fff
style G fill:#9C27B0,stroke:#333,color:#fff
style H fill:#F44336,stroke:#333,color:#fff
```
Key Concepts:
- Convolution — Slides a small filter across the image to detect edges, textures, shapes
- Pooling — Reduces spatial dimensions (downsampling) while retaining important features
- ReLU — Non-linear activation: `f(x) = max(0, x)`
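The conv → pool → conv → pool → flatten → fully-connected pipeline in the diagram can be sketched in a few lines of PyTorch. The layer sizes here are illustrative, not the exact `CifarCNN` defined in `src/01_cnn/cnn_image_classifier.py`:

```python
import torch
import torch.nn as nn

class TinyCifarCNN(nn.Module):
    """Minimal CNN matching the diagram: conv -> pool -> conv -> pool -> FC."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 32x32x3  -> 32x32x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 16x16x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 8x8x32
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)        # flatten everything except the batch dim
        return self.classifier(x)

model = TinyCifarCNN()
batch = torch.randn(4, 3, 32, 32)      # 4 fake CIFAR-10 images
logits = model(batch)
print(logits.shape)                    # torch.Size([4, 10])
```

Each `MaxPool2d(2)` halves the spatial resolution, which is why the final linear layer sees an 8×8 feature map.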
📂 Code: src/01_cnn/cnn_image_classifier.py
📖 Docs: docs/02_convolutional_neural_networks.md
```bash
python src/01_cnn/cnn_image_classifier.py
```

Goal: Process sequential image features to understand temporal/spatial patterns.
```mermaid
graph LR
A["x₁"] --> B["RNN Cell"]
B --> C["h₁"]
C --> D["RNN Cell"]
A2["x₂"] --> D
D --> E["h₂"]
E --> F["RNN Cell"]
A3["x₃"] --> F
F --> G["h₃"]
G --> H["Output"]
style B fill:#FF9800,stroke:#333,color:#fff
style D fill:#FF9800,stroke:#333,color:#fff
style F fill:#FF9800,stroke:#333,color:#fff
style H fill:#F44336,stroke:#333,color:#fff
```
Key Concepts:
- Hidden State — Memory that carries information from previous time steps
- Recurrence — Same weights applied at every time step
- Backpropagation Through Time (BPTT) — Training across sequences
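These ideas map directly onto PyTorch's `nn.RNN`. The sizes below (32-dim inputs, 64-dim hidden state) are illustrative, not the values used in `src/02_rnn/rnn_sequence_model.py`:

```python
import torch
import torch.nn as nn

# A single-layer RNN: the same weights are reused at every time step.
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)

x = torch.randn(4, 10, 32)    # batch of 4 sequences, 10 time steps each
out, h_n = rnn(x)

print(out.shape)              # torch.Size([4, 10, 64]) - hidden state at every step
print(h_n.shape)              # torch.Size([1, 4, 64])  - final hidden state only

# For a single-layer, unidirectional RNN, the last time step of `out`
# is exactly the final hidden state h_n:
print(torch.allclose(out[:, -1], h_n[0]))
```

Calling `rnn(x)` runs the whole recurrence at once; during training, PyTorch's autograd performs backpropagation through time across those 10 steps automatically.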
📂 Code: src/02_rnn/rnn_sequence_model.py
📖 Docs: docs/03_recurrent_neural_networks.md
```bash
python src/02_rnn/rnn_sequence_model.py
```

Goal: Solve the vanishing gradient problem and model long-range dependencies.
```mermaid
graph TD
subgraph LSTM Cell
A["Forget Gate 🚪"] --> D["Cell State"]
B["Input Gate 🚪"] --> D
D --> E["Output Gate 🚪"]
E --> F["Hidden State"]
end
style A fill:#F44336,stroke:#333,color:#fff
style B fill:#4CAF50,stroke:#333,color:#fff
style D fill:#2196F3,stroke:#333,color:#fff
style E fill:#FF9800,stroke:#333,color:#fff
style F fill:#9C27B0,stroke:#333,color:#fff
```
Key Concepts:
- Forget Gate — Decides what information to discard from cell state
- Input Gate — Decides what new information to store
- Output Gate — Decides what to output based on cell state
- Cell State — Long-term memory highway
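In PyTorch, the gates live inside `nn.LSTM`; what you see from the outside is the pair of memories the gates maintain. Sizes here are illustrative, not the ones used in `src/03_lstm/lstm_model.py`:

```python
import torch
import torch.nn as nn

# One LSTM layer: 32-dim inputs, 64-dim hidden and cell state.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

x = torch.randn(4, 10, 32)            # 4 sequences of 10 time steps
out, (h_n, c_n) = lstm(x)

print(out.shape)   # torch.Size([4, 10, 64])
print(h_n.shape)   # torch.Size([1, 4, 64]) - final hidden state (short-term)
print(c_n.shape)   # torch.Size([1, 4, 64]) - final cell state (long-term memory)
```

Unlike the plain RNN, the LSTM returns two states: `h_n` is the gated output at the last step, while `c_n` is the cell state "highway" that lets gradients flow across long sequences without vanishing.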
📂 Code: src/03_lstm/lstm_model.py
📖 Docs: docs/04_long_short_term_memory.md
```bash
python src/03_lstm/lstm_model.py
```

Goal: Build a video classification pipeline — CNN extracts spatial features per frame, LSTM captures temporal patterns across frames.
```mermaid
graph TD
subgraph "Per-Frame Feature Extraction (CNN)"
A["Frame 1 🖼️"] --> B["CNN"]
A2["Frame 2 🖼️"] --> B2["CNN"]
A3["Frame 3 🖼️"] --> B3["CNN"]
AN["Frame N 🖼️"] --> BN["CNN"]
end
B --> C["Feature Vector 1"]
B2 --> C2["Feature Vector 2"]
B3 --> C3["Feature Vector 3"]
BN --> CN["Feature Vector N"]
subgraph "Temporal Modeling (LSTM)"
C --> D["LSTM"]
C2 --> D
C3 --> D
CN --> D
D --> E["Final Hidden State"]
end
E --> F["🎯 Video Class Prediction"]
style B fill:#2196F3,stroke:#333,color:#fff
style B2 fill:#2196F3,stroke:#333,color:#fff
style B3 fill:#2196F3,stroke:#333,color:#fff
style BN fill:#2196F3,stroke:#333,color:#fff
style D fill:#FF9800,stroke:#333,color:#fff
style F fill:#F44336,stroke:#333,color:#fff
```
📂 Code: src/04_combined/cnn_rnn_lstm_combined.py
📖 Docs: docs/05_cnn_rnn_lstm_combined.md
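The diagram's two stages can be combined into one module: run the CNN over every frame, then feed the resulting feature vectors into an LSTM. This sketch uses illustrative sizes and is not the exact model in `src/04_combined/cnn_rnn_lstm_combined.py`:

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Sketch of the pipeline: a CNN encodes each frame, an LSTM models the sequence."""
    def __init__(self, feat_dim=128, hidden=64, num_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-frame feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, video):                          # video: (B, T, 3, 32, 32)
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))          # treat frames as one big batch
        feats = feats.view(B, T, -1)                   # back to (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)                 # h_n: (1, B, hidden)
        return self.head(h_n[-1])                      # one prediction per video

model = CNNLSTMClassifier()
clip = torch.randn(2, 8, 3, 32, 32)    # 2 fake clips of 8 frames each
print(model(clip).shape)               # torch.Size([2, 10])
```

The `flatten(0, 1)` / `view(B, T, -1)` trick lets a single CNN process all frames of all clips in one pass before the LSTM restores the time dimension.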
```bash
python src/04_combined/cnn_rnn_lstm_combined.py
```

Take your trained model from local to production in the cloud.
```mermaid
graph LR
A["Trained Model\n(.pth)"] --> B["Export to ONNX"]
B --> C["Azure ML\nWorkspace"]
C --> D["Deploy as\nREST API"]
D --> E["Consume from\nAny App"]
style A fill:#4CAF50,stroke:#333,color:#fff
style B fill:#2196F3,stroke:#333,color:#fff
style C fill:#0078D4,stroke:#333,color:#fff
style D fill:#FF9800,stroke:#333,color:#fff
style E fill:#9C27B0,stroke:#333,color:#fff
```
📖 Full Guide: docs/06_azure_deployment_guide.md
Showcase your completed project to recruiters and the tech community.
📖 Full Guide: docs/07_linkedin_publishing_guide.md
| Problem | Cause | Solution |
|---|---|---|
| `python` command not found | Python not installed or not in PATH | Install Python 3.10+ and check "Add to PATH" during installation. Restart your terminal. |
| `ModuleNotFoundError: No module named 'torch'` | Virtual environment not activated | Run `source venv/bin/activate` (Mac/Linux) or `.\venv\Scripts\Activate.ps1` (Windows) |
| PowerShell script execution blocked | Windows security policy | Run `Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass` first |
| `pip install` fails with permission error | Trying to install globally | Make sure venv is activated (you should see `(venv)` in your terminal) |
| Training is very slow | Running on CPU | This is normal without a GPU. Reduce `EPOCHS` in the script to speed up. |
| CUDA out of memory | GPU memory full | Reduce `BATCH_SIZE` in the script (e.g., 64 → 32 → 16) |
| macOS: `python3-venv` not found | venv module not installed | Install with `brew install python3` or use the Python installer from python.org |
If something is broken and you want to start over:
```bash
# Delete the virtual environment
rm -rf venv          # macOS/Linux
rmdir /s /q venv     # Windows

# Run setup again
./setup.sh           # macOS/Linux
.\setup.ps1          # Windows
```

**Q: Do I need a GPU?**

No. All scripts detect whether a GPU is available and fall back to CPU. Training will be slower on CPU but fully functional. The CIFAR-10 dataset is small enough for CPU training.

**Q: Which Python version should I use?**

Python 3.10 or later. We recommend 3.11 for best compatibility.

**Q: The setup script fails — what do I do?**

- Make sure `python` is in your PATH: `python --version`
- On Windows, you may need to allow script execution: `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned`
- On Linux/Mac, ensure you have `python3-venv`: `sudo apt install python3-venv` (Ubuntu/Debian)
**Q: Can I use TensorFlow instead of PyTorch?**

This project uses PyTorch throughout. The concepts are framework-agnostic — once you understand them here, translating to TensorFlow/Keras is straightforward.

**Q: What is the `venv/` folder and why is it so large?**

The `venv/` folder contains your project's virtual environment with all installed packages (PyTorch, NumPy, etc.). It can be 2-5 GB depending on your system. This is normal! Each student creates their own `venv/` locally — it's never uploaded to GitHub.

**Q: How do I update to the latest code from the instructor?**
```bash
# Pull the latest changes (--ff-only prevents merge commits)
git pull --ff-only

# Re-run setup to install any new dependencies
./setup.sh           # macOS/Linux
.\setup.ps1          # Windows
```

💡 If `git pull` fails, you may have local changes. Run `git stash` first to save your changes, then `git pull --ff-only`, then `git stash pop` to restore your changes.
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-addition`
3. Commit your changes: `git commit -m 'Add amazing addition'`
4. Push to the branch: `git push origin feature/amazing-addition`
5. Open a Pull Request
This project is licensed under the MIT License — see the LICENSE file for details.
Built with ❤️ for the student community
Star ⭐ this repo if you find it helpful!