# LSTM Gesture Learner

An interactive, deep learning-based application for dynamic gesture recognition. LSTM Gesture Learner uses Long Short-Term Memory (LSTM) neural networks to recognize continuous hand gestures. Its standout feature is the ability to learn user-taught gestures on the fly, translating them into actions or Text-to-Speech (TTS) feedback.
## Features

- Dynamic Gesture Recognition: Captures spatial and temporal sequences for fluid, real-time gesture tracking.
- Custom Gesture Learning: Teach the model custom gestures tailored to your specific needs.
- MediaPipe Integration: Fast, precise, and robust hand-landmark detection.
- LSTM Neural Networks: Employs an LSTM-based architecture specialized for recognizing temporal sequences.
- Text-to-Speech (TTS) Feedback: Real-time auditory translation of recognized gestures (ideal for sign language).
- Interactive UI: A friendly visual interface built for seamless data collection, training, and testing.
## Project Structure

```
lstm-gesture-learner/
├── data/             # Raw coordinate datasets and user-collected landmark sequences
├── models/           # Saved, trained LSTM model weights (e.g., .h5 files)
├── src/              # Core scripts (data extraction, model architecture, UI logic)
├── app.py            # Main application GUI / interface entry point
├── main.py           # Backend testing, CLI data collection, or training
├── requirements.txt  # Python dependencies required to run the project
├── .gitignore        # Ignored files
└── LICENSE           # MIT License
```
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/dhanush-alla/lstm-gesture-learner.git
   cd lstm-gesture-learner
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

To launch the user interface and start using or teaching the model:

```bash
python app.py
```

(Alternatively, run `python main.py` if operating via a CLI pipeline.)
### Teaching a New Gesture

1. Select the Record/Add Gesture option in the app.
2. Enter a label for your new gesture.
3. Perform the gesture smoothly in front of the camera. The system will record the sequential landmarks.
4. Trigger the Train function to update the LSTM weights with the newly collected data.
### Recognizing Gestures

1. Switch to Prediction/Recognition mode.
2. Perform a trained gesture. The application will track your hand, predict the sequence using the LSTM model, and output the result both visually and audibly via Text-to-Speech.
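Frame-by-frame model outputs tend to flicker between labels mid-gesture. A common way to stabilize real-time recognition — an assumption for illustration, not necessarily what this app implements — is to emit a label only when it dominates a sliding window of recent predictions:

```python
from collections import Counter, deque

class PredictionSmoother:
    """Emit a gesture label only when it accounts for at least `min_ratio`
    of the last `window` per-frame predictions; otherwise return None."""

    def __init__(self, window=15, min_ratio=0.8):
        self.history = deque(maxlen=window)
        self.min_ratio = min_ratio

    def update(self, label):
        self.history.append(label)
        top, count = Counter(self.history).most_common(1)[0]
        full = len(self.history) == self.history.maxlen
        if full and count / len(self.history) >= self.min_ratio:
            return top   # stable prediction: safe to display / speak
        return None      # not confident yet
```

The stable label returned here is what would then be drawn on screen or passed to the TTS engine.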
## How It Works

- Feature Extraction: OpenCV accesses the webcam, and MediaPipe isolates the 21 3D hand landmarks frame by frame.
- Sequence Padding & Formatting: The extracted landmarks are flattened and stored as a sequence of frames representing the motion over time.
- LSTM Inference: The time-series data is fed into the LSTM deep learning model, which uses the context linking previous and current frames to predict the final action.
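The padding and formatting step exists because an LSTM expects a fixed input shape of (batch, timesteps, features), while recordings vary in length. A sketch of that step, assuming an illustrative sequence length of 30 frames (the repository's actual value may differ):

```python
import numpy as np

SEQ_LEN = 30      # timesteps fed to the LSTM (assumed value)
N_FEATURES = 63   # 21 landmarks x (x, y, z)

def pad_or_truncate(seq, seq_len=SEQ_LEN):
    """Force a (num_frames, 63) landmark sequence to exactly `seq_len`
    frames: truncate long recordings, zero-pad short ones at the end."""
    seq = np.asarray(seq, dtype=np.float32)
    if len(seq) >= seq_len:
        return seq[:seq_len]
    pad = np.zeros((seq_len - len(seq), seq.shape[1]), dtype=np.float32)
    return np.concatenate([seq, pad], axis=0)

def to_batch(sequences):
    """Stack variable-length sequences into an LSTM-ready array of shape
    (batch, SEQ_LEN, N_FEATURES)."""
    return np.stack([pad_or_truncate(s) for s in sequences])
```

The resulting array can be passed directly to a Keras LSTM model for training or inference.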
## Tech Stack

- OpenCV (computer vision)
- MediaPipe (hand tracking/landmarks)
- TensorFlow/Keras (LSTM model building and training)
- pyttsx3/gTTS (text-to-speech synthesis)
- NumPy & Pandas (data manipulation)
## License

This project is licensed under the MIT License.

## Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

Built by Dhanush Alla.


