Learn how Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks work — individually and together — then deploy to the cloud.
| # | Section | Description |
|---|---|---|
| 1 | What is Computer Vision? | Core concepts & real-world applications |
| 2 | Project Architecture | Repository layout & file map |
| 3 | Installation & Setup | Complete step-by-step guide for Windows & macOS/Linux |
| 4 | Why Virtual Environment? | Understanding venv and why it matters |
| 5 | Running the Code | How to execute each module |
| 6 | Learning Path | Structured order of study |
| 7 | CNN Module | Image classification from scratch |
| 8 | RNN Module | Sequence modeling on image features |
| 9 | LSTM Module | Long-range dependency modeling |
| 10 | Combined Module | Video classification pipeline |
| 11 | Azure Deployment | Cloud hosting & inference API |
| 12 | LinkedIn Guide | Showcase your project professionally |
| 13 | Troubleshooting | Common issues and solutions |
| 14 | FAQ | Common questions answered |
| 15 | Contributing | How to contribute |
Computer Vision (CV) is a field of artificial intelligence that enables machines to interpret and understand visual information from the world — images, videos, and real-time camera feeds.
```mermaid
graph LR
A["📷 Image/Video Input"] --> B["🔍 Preprocessing"]
B --> C["🧠 Deep Learning Model"]
C --> D["📊 Prediction / Classification"]
D --> E["🎯 Action / Decision"]
style A fill:#4CAF50,stroke:#333,color:#fff
style B fill:#2196F3,stroke:#333,color:#fff
style C fill:#FF9800,stroke:#333,color:#fff
style D fill:#9C27B0,stroke:#333,color:#fff
style E fill:#F44336,stroke:#333,color:#fff
```
| Domain | Application | Models Used |
|---|---|---|
| 🏥 Healthcare | X-ray / MRI diagnosis | CNN |
| 🚗 Autonomous Vehicles | Object detection & tracking | CNN + RNN |
| 📹 Surveillance | Activity recognition in video | CNN + LSTM |
| 🛒 Retail | Product recognition | CNN |
| 🎮 Gaming / AR | Gesture & pose estimation | CNN + LSTM |
| 📝 OCR | Handwriting recognition | CNN + RNN + LSTM |
```
computerV/
├── 📄 README.md                 ← You are here
├── 📄 requirements.txt          ← All Python dependencies (libraries needed)
├── 📄 setup.ps1                 ← One-click setup script (Windows PowerShell)
├── 📄 setup.sh                  ← One-click setup script (macOS/Linux Terminal)
├── 📄 .gitignore                ← Files Git should ignore
│
├── 📂 docs/                     ← Detailed concept guides & tutorials
│   ├── 01_introduction_to_computer_vision.md
│   ├── 02_convolutional_neural_networks.md
│   ├── 03_recurrent_neural_networks.md
│   ├── 04_long_short_term_memory.md
│   ├── 05_cnn_rnn_lstm_combined.md
│   ├── 06_azure_deployment_guide.md
│   ├── 07_linkedin_publishing_guide.md
│   └── 08_run_setup_lean.md
│
├── 📂 src/                      ← All Python source code
│   ├── 📂 01_cnn/
│   │   └── cnn_image_classifier.py      ← CNN model training script
│   ├── 📂 02_rnn/
│   │   └── rnn_sequence_model.py        ← RNN model training script
│   ├── 📂 03_lstm/
│   │   └── lstm_model.py                ← LSTM model training script
│   ├── 📂 04_combined/
│   │   └── cnn_rnn_lstm_combined.py     ← Combined architecture script
│   └── 📂 utils/
│       ├── data_loader.py               ← Dataset loading utilities
│       └── visualization.py             ← Plotting and visualization helpers
│
├── 📂 data/                     ← Datasets (auto-downloaded when you run scripts)
├── 📂 models/                   ← Saved model weights (.pth files)
└── 📂 outputs/                  ← Training plots & metrics (generated during training)
```
This section provides complete step-by-step instructions for both Windows and macOS/Linux users.
Before you begin, make sure you have these installed:
| Software | Version | Download Link | Why You Need It |
|---|---|---|---|
| Python | 3.10 or higher | python.org/downloads | Runs the machine learning code |
| Git | Any recent version | git-scm.com/downloads | Downloads the project code |
| VS Code (recommended) | Any recent version | code.visualstudio.com | Best code editor for Python |
💡 Tip for Windows users: When installing Python, make sure to check ✅ "Add Python to PATH" during installation!
Open your terminal (or PowerShell on Windows) and run:
```bash
git clone https://github.com/EricKart/computerV.git
cd computerV
```

What this does:
- `git clone` downloads all the project files from GitHub to your computer
- `cd computerV` moves you into the project folder
The setup script automatically creates a virtual environment and installs all required packages.
Windows (PowerShell):

```powershell
.\setup.ps1
```

⚠️ If you get an error about "execution policy", run this first:

```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
```

Then try `.\setup.ps1` again.
macOS/Linux:

```bash
chmod +x setup.sh
./setup.sh
```

What the setup script does:
- ✅ Creates a `venv/` folder (virtual environment) to isolate project dependencies
- ✅ Upgrades `pip` (Python's package installer) to the latest version
- ✅ Installs all required libraries from `requirements.txt` (PyTorch, NumPy, etc.)
- ✅ Creates necessary folders: `data/`, `models/`, `outputs/`, `logs/`
Before running any Python code, you must activate the virtual environment:
Windows (PowerShell):

```powershell
.\venv\Scripts\Activate.ps1
```

You'll see `(venv)` appear at the beginning of your command line, like this:

```
(venv) PS C:\Users\YourName\computerV>
```

macOS/Linux:

```bash
source venv/bin/activate
```

You'll see `(venv)` appear at the beginning of your command line, like this:

```
(venv) user@computer:~/computerV$
```
💡 Remember: You need to activate the virtual environment every time you open a new terminal window to work on this project!
Run these commands to make sure everything is set up correctly:
```bash
python --version
```

Expected output: `Python 3.10.x` or higher

```bash
python -c "import torch, torchvision, matplotlib, onnx; print('✅ Environment OK - All packages installed!')"
```

Expected output: `✅ Environment OK - All packages installed!`

Now you can run any of the training scripts. Start with the CNN module:

```bash
python src/01_cnn/cnn_image_classifier.py
```

Understanding why we use virtual environments is important for every Python developer.
Imagine you have two Python projects:
- Project A needs `numpy` version 1.20
- Project B needs `numpy` version 1.24
If you install both globally, they will conflict! Only one version can exist at a time.
A virtual environment is like a separate, isolated Python installation for each project.
```
Your Computer
├── Global Python (system-wide)
│
├── Project A/
│   └── venv/          ← Has its own numpy 1.20
│       └── numpy 1.20
│
└── Project B/
    └── venv/          ← Has its own numpy 1.24
        └── numpy 1.24
```
| Benefit | Explanation |
|---|---|
| 🔒 Isolation | Each project has its own packages — no conflicts |
| 📦 Reproducibility | Anyone can recreate the exact same environment |
| 🧹 Clean System | Your global Python stays clean and uncluttered |
| 🔄 Easy Reset | Delete venv/ folder and run setup again to start fresh |
| 👥 Team Collaboration | Everyone on the team uses the same package versions |
- `setup.sh` / `setup.ps1` creates the `venv/` folder
- `requirements.txt` lists all packages and their versions
- Activating (`source venv/bin/activate` or `.\venv\Scripts\Activate.ps1`) tells your terminal to use this project's Python and packages
- Deactivating (`deactivate`) returns you to global Python
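You can confirm which Python your terminal is using from Python itself. This is a small illustrative check, not part of the project's scripts:

```python
import sys

# Quick check: which Python interpreter is this terminal using?
print(sys.prefix)        # points inside your project's venv/ when it is active
print(sys.base_prefix)   # the global Python the venv was created from

# In a virtual environment the two paths differ
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
```

If `in_venv` prints `False`, activate the environment before installing packages or running the training scripts.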
💡 Pro Tip: The `venv/` folder is listed in `.gitignore`, so it's never uploaded to GitHub. Each person creates their own `venv/` using the setup script.
Each script in src/ is a complete, runnable program that:
- Downloads the CIFAR-10 dataset (10 categories of 32×32 images)
- Builds a neural network model (CNN, RNN, LSTM, or Combined)
- Trains the model for several epochs
- Evaluates accuracy on test data
- Saves the trained model to `models/`
- Generates visualization plots in `outputs/`
Make sure your virtual environment is activated (you should see `(venv)` in your terminal), then:
```bash
# Module 1: CNN (Convolutional Neural Network)
python src/01_cnn/cnn_image_classifier.py

# Module 2: RNN (Recurrent Neural Network)
python src/02_rnn/rnn_sequence_model.py

# Module 3: LSTM (Long Short-Term Memory)
python src/03_lstm/lstm_model.py

# Module 4: Combined CNN + LSTM
python src/04_combined/cnn_rnn_lstm_combined.py
```

When you run a script, you'll see:
- Dataset download (first time only) — downloads ~170 MB of images
- Training progress — shows loss and accuracy for each epoch
- Final results — test accuracy and best model saved
Example output:
```
============================================================
MODULE 1: CNN IMAGE CLASSIFIER
Dataset : CIFAR-10 | Model : CifarCNN
============================================================
Epoch [01/15] Train Loss: 1.4532 Acc: 47.23% │ Test Loss: 1.1234 Acc: 60.12% │ 23.5s
Epoch [02/15] Train Loss: 1.0123 Acc: 63.45% │ Test Loss: 0.9876 Acc: 65.78% │ 22.1s
...
🏆 Best Test Accuracy: 78.45%
```
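The scripts pick the fastest available hardware automatically. The standard PyTorch pattern for that looks like this (a sketch of the common idiom, not a line copied from the repo):

```python
import torch

# Standard PyTorch pattern: prefer the GPU when one is available,
# otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")

# Both the model and every input batch must be moved to the same device
batch = torch.randn(8, 3, 32, 32).to(device)
print(batch.device.type)
```

This is why the project runs everywhere: on a machine without CUDA, `device` is simply `cpu` and training proceeds (more slowly) with no code changes.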
After training, check these folders:
| Folder | Contents |
|---|---|
| `outputs/` | Training curves, confusion matrices, sample predictions (PNG images) |
| `models/` | Saved model weights (`.pth` files) and ONNX exports |
| `data/` | Downloaded CIFAR-10 dataset |
Follow this order for the best learning experience:
```mermaid
graph TD
A["📖 1. Introduction to CV"] --> B["🔲 2. CNN – See Patterns"]
B --> C["🔄 3. RNN – Understand Sequences"]
C --> D["🧠 4. LSTM – Remember Long-Term"]
D --> E["🔗 5. CNN+RNN+LSTM Combined"]
E --> F["☁️ 6. Deploy to Azure"]
F --> G["📢 7. Publish on LinkedIn"]
style A fill:#E8F5E9,stroke:#4CAF50,color:#1B5E20
style B fill:#E3F2FD,stroke:#2196F3,color:#0D47A1
style C fill:#FFF3E0,stroke:#FF9800,color:#E65100
style D fill:#F3E5F5,stroke:#9C27B0,color:#4A148C
style E fill:#FCE4EC,stroke:#E91E63,color:#880E4F
style F fill:#E0F7FA,stroke:#00BCD4,color:#006064
style G fill:#FFF9C4,stroke:#FFEB3B,color:#F57F17
```
| Step | Document | Time | What You'll Learn |
|---|---|---|---|
| 1 | Introduction to CV | 15 min | Pixels, color spaces, image processing basics |
| 2 | CNN Deep Dive | 30 min | Convolutions, pooling, feature maps, architectures |
| 3 | RNN Deep Dive | 25 min | Sequential data, hidden states, backprop through time |
| 4 | LSTM Deep Dive | 25 min | Gates, cell state, vanishing gradient solution |
| 5 | Combined Architecture | 30 min | Feature extraction + temporal modeling |
| 6 | Azure Deployment | 45 min | Model hosting, REST API, Microsoft Foundry |
| 7 | LinkedIn Guide | 15 min | Project showcase, post templates |
Goal: Classify images from the CIFAR-10 dataset into 10 categories.
```mermaid
graph LR
A["Input Image\n32×32×3"] --> B["Conv Layer 1\n+ ReLU"]
B --> C["MaxPool"]
C --> D["Conv Layer 2\n+ ReLU"]
D --> E["MaxPool"]
E --> F["Flatten"]
F --> G["Fully Connected"]
G --> H["Output\n10 classes"]
style A fill:#4CAF50,stroke:#333,color:#fff
style B fill:#2196F3,stroke:#333,color:#fff
style C fill:#03A9F4,stroke:#333,color:#fff
style D fill:#2196F3,stroke:#333,color:#fff
style E fill:#03A9F4,stroke:#333,color:#fff
style F fill:#FF9800,stroke:#333,color:#fff
style G fill:#9C27B0,stroke:#333,color:#fff
style H fill:#F44336,stroke:#333,color:#fff
```
Key Concepts:
- Convolution — Slides a small filter across the image to detect edges, textures, shapes
- Pooling — Reduces spatial dimensions (downsampling) while retaining important features
- ReLU — Non-linear activation: `f(x) = max(0, x)`
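The conv → pool → conv → pool → flatten → fully-connected pipeline in the diagram can be sketched in a few lines of PyTorch. The layer sizes here are illustrative, not the exact `CifarCNN` defined in `src/01_cnn/cnn_image_classifier.py`:

```python
import torch
import torch.nn as nn

class TinyCifarCNN(nn.Module):
    """Minimal CNN matching the diagram: conv -> pool -> conv -> pool -> FC."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 32x32x3  -> 32x32x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 16x16x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 8x8x32
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)        # flatten everything except the batch dim
        return self.classifier(x)

model = TinyCifarCNN()
batch = torch.randn(4, 3, 32, 32)      # 4 fake CIFAR-10 images
logits = model(batch)
print(logits.shape)                    # torch.Size([4, 10])
```

Each `MaxPool2d(2)` halves the spatial resolution, which is why the final linear layer sees an 8×8 feature map.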
📂 Code: src/01_cnn/cnn_image_classifier.py
📖 Docs: docs/02_convolutional_neural_networks.md
```bash
python src/01_cnn/cnn_image_classifier.py
```

Goal: Process sequential image features to understand temporal/spatial patterns.
```mermaid
graph LR
A["x₁"] --> B["RNN Cell"]
B --> C["h₁"]
C --> D["RNN Cell"]
A2["x₂"] --> D
D --> E["h₂"]
E --> F["RNN Cell"]
A3["x₃"] --> F
F --> G["h₃"]
G --> H["Output"]
style B fill:#FF9800,stroke:#333,color:#fff
style D fill:#FF9800,stroke:#333,color:#fff
style F fill:#FF9800,stroke:#333,color:#fff
style H fill:#F44336,stroke:#333,color:#fff
```
Key Concepts:
- Hidden State — Memory that carries information from previous time steps
- Recurrence — Same weights applied at every time step
- Backpropagation Through Time (BPTT) — Training across sequences
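These ideas map directly onto PyTorch's `nn.RNN`. The sizes below (32-dim inputs, 64-dim hidden state) are illustrative, not the values used in `src/02_rnn/rnn_sequence_model.py`:

```python
import torch
import torch.nn as nn

# A single-layer RNN: the same weights are reused at every time step.
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)

x = torch.randn(4, 10, 32)    # batch of 4 sequences, 10 time steps each
out, h_n = rnn(x)

print(out.shape)              # torch.Size([4, 10, 64]) - hidden state at every step
print(h_n.shape)              # torch.Size([1, 4, 64])  - final hidden state only

# For a single-layer, unidirectional RNN, the last time step of `out`
# is exactly the final hidden state h_n:
print(torch.allclose(out[:, -1], h_n[0]))
```

Calling `rnn(x)` runs the whole recurrence at once; during training, PyTorch's autograd performs backpropagation through time across those 10 steps automatically.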
📂 Code: src/02_rnn/rnn_sequence_model.py
📖 Docs: docs/03_recurrent_neural_networks.md
```bash
python src/02_rnn/rnn_sequence_model.py
```

Goal: Solve the vanishing gradient problem and model long-range dependencies.
```mermaid
graph TD
subgraph LSTM Cell
A["Forget Gate 🚪"] --> D["Cell State"]
B["Input Gate 🚪"] --> D
D --> E["Output Gate 🚪"]
E --> F["Hidden State"]
end
style A fill:#F44336,stroke:#333,color:#fff
style B fill:#4CAF50,stroke:#333,color:#fff
style D fill:#2196F3,stroke:#333,color:#fff
style E fill:#FF9800,stroke:#333,color:#fff
style F fill:#9C27B0,stroke:#333,color:#fff
```
Key Concepts:
- Forget Gate — Decides what information to discard from cell state
- Input Gate — Decides what new information to store
- Output Gate — Decides what to output based on cell state
- Cell State — Long-term memory highway
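In PyTorch, the gates live inside `nn.LSTM`; what you see from the outside is the pair of memories the gates maintain. Sizes here are illustrative, not the ones used in `src/03_lstm/lstm_model.py`:

```python
import torch
import torch.nn as nn

# One LSTM layer: 32-dim inputs, 64-dim hidden and cell state.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

x = torch.randn(4, 10, 32)            # 4 sequences of 10 time steps
out, (h_n, c_n) = lstm(x)

print(out.shape)   # torch.Size([4, 10, 64])
print(h_n.shape)   # torch.Size([1, 4, 64]) - final hidden state (short-term)
print(c_n.shape)   # torch.Size([1, 4, 64]) - final cell state (long-term memory)
```

Unlike the plain RNN, the LSTM returns two states: `h_n` is the gated output at the last step, while `c_n` is the cell state "highway" that lets gradients flow across long sequences without vanishing.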
📂 Code: src/03_lstm/lstm_model.py
📖 Docs: docs/04_long_short_term_memory.md
```bash
python src/03_lstm/lstm_model.py
```

Goal: Build a video classification pipeline — CNN extracts spatial features per frame, LSTM captures temporal patterns across frames.
```mermaid
graph TD
subgraph "Per-Frame Feature Extraction (CNN)"
A["Frame 1 🖼️"] --> B["CNN"]
A2["Frame 2 🖼️"] --> B2["CNN"]
A3["Frame 3 🖼️"] --> B3["CNN"]
AN["Frame N 🖼️"] --> BN["CNN"]
end
B --> C["Feature Vector 1"]
B2 --> C2["Feature Vector 2"]
B3 --> C3["Feature Vector 3"]
BN --> CN["Feature Vector N"]
subgraph "Temporal Modeling (LSTM)"
C --> D["LSTM"]
C2 --> D
C3 --> D
CN --> D
D --> E["Final Hidden State"]
end
E --> F["🎯 Video Class Prediction"]
style B fill:#2196F3,stroke:#333,color:#fff
style B2 fill:#2196F3,stroke:#333,color:#fff
style B3 fill:#2196F3,stroke:#333,color:#fff
style BN fill:#2196F3,stroke:#333,color:#fff
style D fill:#FF9800,stroke:#333,color:#fff
style F fill:#F44336,stroke:#333,color:#fff
```
📂 Code: src/04_combined/cnn_rnn_lstm_combined.py
📖 Docs: docs/05_cnn_rnn_lstm_combined.md
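The diagram's two stages can be combined into one module: run the CNN over every frame, then feed the resulting feature vectors into an LSTM. This sketch uses illustrative sizes and is not the exact model in `src/04_combined/cnn_rnn_lstm_combined.py`:

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Sketch of the pipeline: a CNN encodes each frame, an LSTM models the sequence."""
    def __init__(self, feat_dim=128, hidden=64, num_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-frame feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, video):                          # video: (B, T, 3, 32, 32)
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))          # treat frames as one big batch
        feats = feats.view(B, T, -1)                   # back to (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)                 # h_n: (1, B, hidden)
        return self.head(h_n[-1])                      # one prediction per video

model = CNNLSTMClassifier()
clip = torch.randn(2, 8, 3, 32, 32)    # 2 fake clips of 8 frames each
print(model(clip).shape)               # torch.Size([2, 10])
```

The `flatten(0, 1)` / `view(B, T, -1)` trick lets a single CNN process all frames of all clips in one pass before the LSTM restores the time dimension.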
```bash
python src/04_combined/cnn_rnn_lstm_combined.py
```

Take your trained model from local to production in the cloud.
```mermaid
graph LR
A["Trained Model\n(.pth)"] --> B["Export to ONNX"]
B --> C["Azure ML\nWorkspace"]
C --> D["Deploy as\nREST API"]
D --> E["Consume from\nAny App"]
style A fill:#4CAF50,stroke:#333,color:#fff
style B fill:#2196F3,stroke:#333,color:#fff
style C fill:#0078D4,stroke:#333,color:#fff
style D fill:#FF9800,stroke:#333,color:#fff
style E fill:#9C27B0,stroke:#333,color:#fff
```
📖 Full Guide: docs/06_azure_deployment_guide.md
Showcase your completed project to recruiters and the tech community.
📖 Full Guide: docs/07_linkedin_publishing_guide.md
| Problem | Cause | Solution |
|---|---|---|
| `python` command not found | Python not installed or not in PATH | Install Python 3.10+ and check "Add to PATH" during installation. Restart your terminal. |
| `ModuleNotFoundError: No module named 'torch'` | Virtual environment not activated | Run `source venv/bin/activate` (Mac/Linux) or `.\venv\Scripts\Activate.ps1` (Windows) |
| PowerShell script execution blocked | Windows security policy | Run `Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass` first |
| `pip install` fails with permission error | Trying to install globally | Make sure venv is activated (you should see `(venv)` in your terminal) |
| Training is very slow | Running on CPU | This is normal without a GPU. Reduce `EPOCHS` in the script to speed up. |
| CUDA out of memory | GPU memory full | Reduce `BATCH_SIZE` in the script (e.g., 64 → 32 → 16) |
| macOS: `python3-venv` not found | venv module not installed | Install with `brew install python3` or use the Python installer from python.org |
If something is broken and you want to start over:
```bash
# Delete the virtual environment
rm -rf venv          # macOS/Linux
rmdir /s /q venv     # Windows

# Run setup again
./setup.sh           # macOS/Linux
.\setup.ps1          # Windows
```

**Q: Do I need a GPU?**

No. All scripts detect whether a GPU is available and fall back to CPU. Training will be slower on CPU but fully functional. The CIFAR-10 dataset is small enough for CPU training.

**Q: Which Python version should I use?**

Python 3.10 or later. We recommend 3.11 for best compatibility.

**Q: The setup script fails — what do I do?**

- Make sure `python` is in your PATH: `python --version`
- On Windows, you may need to allow script execution: `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned`
- On Linux/Mac, ensure you have `python3-venv`: `sudo apt install python3-venv` (Ubuntu/Debian)
**Q: Can I use TensorFlow instead of PyTorch?**

This project uses PyTorch throughout. The concepts are framework-agnostic — once you understand them here, translating to TensorFlow/Keras is straightforward.

**Q: What is the `venv/` folder and why is it so large?**

The `venv/` folder contains your project's virtual environment with all installed packages (PyTorch, NumPy, etc.). It can be 2-5 GB depending on your system. This is normal! Each student creates their own `venv/` locally — it's never uploaded to GitHub.

**Q: How do I update to the latest code from the instructor?**
```bash
# Pull the latest changes (--ff-only prevents merge commits)
git pull --ff-only

# Re-run setup to install any new dependencies
./setup.sh           # macOS/Linux
.\setup.ps1          # Windows
```

💡 If `git pull` fails, you may have local changes. Run `git stash` first to save your changes, then `git pull --ff-only`, then `git stash pop` to restore your changes.
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-addition`
3. Commit your changes: `git commit -m 'Add amazing addition'`
4. Push to the branch: `git push origin feature/amazing-addition`
5. Open a Pull Request
This project is licensed under the MIT License — see the LICENSE file for details.
Built with ❤️ for the student community
Star ⭐ this repo if you find it helpful!