PalmStation

A (definitely not PlayStation-inspired) gesture-controlled console environment where you can play classic games!

Overview

A small Python & AI project that demonstrates gesture recognition using Google's MediaPipe Hands and OpenCV for real-time webcam capture!

*This is a student project built to further learn data manipulation with NumPy & Pandas, to practice and understand how artificial intelligence actually works with neural networks (MLPs & CNNs), and to explore how Transfer Learning can adapt existing models to better suit our final classes and needs.

The original model was trained in a Google Colab notebook (Trained Colab Model Attempt); however, this build uses Google's pre-built MediaPipe model, while also allowing you, the user, to collect and save your very own dataset!*

As the name suggests, this is supposed to be a console that will, hopefully down the line, offer a plethora of games. For now, however, you may try two playable demos: a gesture-controlled Snake game and a gesture-controlled Rock–Paper–Scissors game!

The optional small transfer‑learning pipeline (collect → train → use) mentioned above allows you to improve gesture accuracy by training a classifier on MediaPipe landmarks using data you can collect on your own!

Features

  • Real-time hand landmark detection using MediaPipe Hands
  • Gesture-controlled Snake (index or thumb gestures) with smoothing and classifier support for better playing comfort
  • Gesture-controlled Rock–Paper–Scissors with stable-hold confirmation, confidence score, and result screens
  • Optional data collection and training pipeline (collect_data.py → train_classifier.py)
  • Easy to run in a Python virtual environment (Windows / macOS / Linux)
(Place the demo gif/video here)

Installation

Prerequisites

  • Python 3.11 (recommended) — keep venv per project
  • Webcam for real-time detection
  • Windows users: install Visual C++ Redistributable if builds fail

Setup (create & activate venv then install)

# create venv (one-time; use python3.11 -m venv .venv on macOS/Linux)
py -3.11 -m venv .venv
# activate on Windows (PowerShell)
.\.venv\Scripts\Activate.ps1
# activate on macOS/Linux
source .venv/bin/activate
# upgrade pip and install requirements
python -m pip install --upgrade pip
pip install -r requirements.txt

requirements.txt should include:

mediapipe==0.10.21
opencv-python
numpy
scikit-learn
pandas
joblib

If you have trouble installing mediapipe on your system, check your Python version (we recommend 3.11) and whether a wheel is available for your platform.
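With the venv active, a quick sanity check of both versions:

python --version
python -c "import mediapipe; print(mediapipe.__version__)"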

Usage

Command to Play Gesture Snake:

python snake_gesture.py

Command to Play Rock–Paper–Gesture:

python rock_paper_gesture.py

Collect your own Data:

Collect landmark data (for transfer learning)

python collect_data.py
# Keys: r=right, l=left, u=up, d=down, s=save, q=quit
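For reference, a minimal sketch of what such a collection loop might look like; the CSV filename, column layout, and confidence threshold here are illustrative assumptions rather than the exact contents of collect_data.py:

import cv2
import mediapipe as mp
import pandas as pd

KEY_TO_LABEL = {ord("r"): "right", ord("l"): "left", ord("u"): "up", ord("d"): "down"}

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.6)
cap = cv2.VideoCapture(0)
rows = []  # each row: 42 landmark coordinates + label

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)  # mirror for natural interaction
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cv2.imshow("collect", frame)

    key = cv2.waitKey(1) & 0xFF
    if key in KEY_TO_LABEL and result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        # flatten the 21 landmarks into 42 normalized x/y features
        features = [c for p in lm for c in (p.x, p.y)]
        rows.append(features + [KEY_TO_LABEL[key]])
    elif key == ord("s"):
        cols = [f"{axis}{i}" for i in range(21) for axis in ("x", "y")] + ["label"]
        pd.DataFrame(rows, columns=cols).to_csv("gesture_data.csv", index=False)
    elif key == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()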

Train Gesture Classifier

python train_classifier.py
# produces models/gesture_model.joblib and models/label_encoder.joblib
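And a minimal sketch of the training side, assuming the CSV layout from the collection sketch above (the real train_classifier.py may differ in details such as the Pipeline wrapper and hyperparameters):

import pandas as pd
import joblib
from pathlib import Path
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("gesture_data.csv")            # 42 features + a "label" column
X = df.drop(columns=["label"]).to_numpy()
le = LabelEncoder()
y = le.fit_transform(df["label"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
model = RandomForestClassifier(n_estimators=200)
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")

Path("models").mkdir(exist_ok=True)
joblib.dump(model, "models/gesture_model.joblib")
joblib.dump(le, "models/label_encoder.joblib")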

Running Snake (with classifier if available)

snake_gesture.py will automatically load models/gesture_model.joblib if present and use it; otherwise it falls back to the angle+thumb heuristics.
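That load-or-fallback behaviour can be sketched roughly like this (the heuristic_gesture stub is a placeholder standing in for the actual angle + thumb rules):

from pathlib import Path
import joblib

MODEL_PATH = Path("models/gesture_model.joblib")
ENCODER_PATH = Path("models/label_encoder.joblib")

def heuristic_gesture(features):
    """Placeholder for the angle + thumb heuristic rules."""
    ...

if MODEL_PATH.exists() and ENCODER_PATH.exists():
    model = joblib.load(MODEL_PATH)
    encoder = joblib.load(ENCODER_PATH)
    def predict(features):  # features: 42 flattened x/y landmark values
        return encoder.inverse_transform(model.predict([features]))[0]
else:
    predict = heuristic_gesture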

How it works

  • The system uses MediaPipe Hands to extract 21 hand landmarks per detected hand (normalized x, y coordinates). These landmarks are used directly for heuristic rules (finger extended / folded) or flattened into a feature vector for training a small classifier (RandomForest by default).
  • Gesture smoothing uses an exponential moving average (EMA) on angles, plus short voting buffers, to avoid flicker and accidental flips; this was done mainly to let players navigate more smoothly with their thumbs as well (see the sketch after this list).
  • The RPS game implements a stable-hold confirmation (the player must hold the same pose for a short duration) and a visible result + countdown flow to prioritize the user experience.
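As an illustration of the smoothing idea above, here is a minimal sketch combining an EMA on the steering angle with a short majority-vote buffer over discrete labels; the smoothing factor and buffer length are illustrative assumptions, not the exact values used in the scripts:

from collections import Counter, deque

class GestureSmoother:
    def __init__(self, alpha=0.4, vote_len=5):
        self.alpha = alpha                   # EMA weight for the newest angle
        self.ema_angle = None
        self.votes = deque(maxlen=vote_len)  # recent discrete gesture labels

    def smooth_angle(self, angle):
        # exponential moving average: blend the new angle with history
        if self.ema_angle is None:
            self.ema_angle = angle
        else:
            self.ema_angle = self.alpha * angle + (1 - self.alpha) * self.ema_angle
        return self.ema_angle

    def vote(self, label):
        # majority vote over the last few frames suppresses one-frame flips
        self.votes.append(label)
        return Counter(self.votes).most_common(1)[0][0]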

Model information

  • Pretrained model: Google's MediaPipe Hand Landmarker (pretrained hand landmark model). The project uses MediaPipe’s prebuilt models for landmark extraction.
  • Custom classifier (optional): RandomForest pipeline trained on flattened MediaPipe landmark X/Y coordinates (42 features). Saved with joblib.

Performance

  • Accuracy: original accuracy figures for the Kaggle Rock Paper Scissors dataset are listed in the Results table below.
  • Accuracy on custom data will obviously vary per user.
  • FPS / latency: depends on CPU and camera; MediaPipe Hands runs in real time on modern laptops (tens of FPS).

See the Training Models used in my Colab Notebook:

Results

3 Training Models Created:

| Experiment | Train Batch Size | Test Batch Size | Parameters | Num Conv Layers | Padding Used | Learning Rate | Epochs | Final Train Acc | Final Val Acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model 1 | 64 | 16 | 4,200,811 | 2 | 0 | 0.001 | 10 | 98.40% | 98.26% |
| Model 2 | 128 | 32 | 2,417,547 | 3 | 1 | 0.1 | 15 | 34.28% | 34.28% |
| Model 3 | 64 | 16 | 4,794,739 | 3 | 1 | 0.01 | 20 | 98.97% | 99.32% |

Challenges & Solutions

  • Python version compatibility: newer Python versions (3.12+) pose issues when using models, as the libraries are usually built against older Python releases.
  • Mediapipe / wheel compatibility: fixed by using Python 3.11 and the mediapipe 0.10.21 wheel on Windows.
  • Gesture jitter & misclassification: solved with EMA smoothing, majority voting, and an optional transfer-learning classifier trained on user-collected landmarks.
  • Thumb vs index pointing: resolved with a simple heuristic that compares normalized distances relative to palm size (thumb & index selection logic; see the sketch after this list).
  • Datasets used: both datasets were relatively small, which kept training quick but came at the cost of accuracy and high loss. This was particularly true of the Rock Paper Scissors dataset, whose images show palms on a greenscreen, whereas a webcam would usually not record just a palm against a greenscreen. It was therefore better and more efficient to use the pre-trained MediaPipe model.
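For the thumb-vs-index point above, a sketch of this kind of palm-size-normalized comparison; the landmark indices follow MediaPipe's hand model, while the 0.6 threshold is only an illustrative assumption:

import math

def dist(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)

def pointing_finger(lm):
    """lm: list of 21 MediaPipe hand landmarks."""
    palm = dist(lm[0], lm[9])               # wrist -> middle-finger MCP, ~palm size
    thumb_ext = dist(lm[4], lm[2]) / palm   # thumb tip vs thumb MCP, normalized
    index_ext = dist(lm[8], lm[5]) / palm   # index tip vs index MCP, normalized
    return "thumb" if thumb_ext > index_ext and thumb_ext > 0.6 else "index"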

Future improvements

  • Add better visuals, menus, sound effects, a high-scores screen, and other UI-related features.
  • Add a small CNN or lightweight neural network trained on image crops (or landmark sequences) for even better robustness.
  • Perhaps export a web demo (WebRTC) using TensorFlow.js + MediaPipe in the browser.

Troubleshooting

  • If the mediapipe import shows a yellow squiggle in VS Code but the script runs, reselect the .venv interpreter and reload the window. Ensure the terminal is activated with .\.venv\Scripts\Activate.ps1 on Windows.
  • If the OpenCV camera doesn't open, try different camera indices (cv2.VideoCapture(1)), close other apps using the camera (Zoom/Discord), and check Windows camera privacy settings; the probe sketch below can help.
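A small probe loop along these lines can help find a working camera index:

import cv2

for idx in range(4):
    cap = cv2.VideoCapture(idx)
    ok = cap.isOpened() and cap.read()[0]   # opened and returned a frame
    cap.release()
    print(f"camera index {idx}: {'works' if ok else 'unavailable'}")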

Acknowledgments

Kaggle datasets used:

