BitVision
BitVision is a Python-based computer vision app that lets users record actions (3D poses) on video using MediaPipe and map them to keyboard inputs. A random forest classifier is trained to associate the recorded pose data with actions. During live capture, each video frame is classified by the random forest model, and the key presses mapped to the detected action are generated.
Project Setup
- Clone the GitHub repo.
- Set up a virtual environment and install dependencies according to INSTALLATIONS.md.
- Install MediaPipe. Follow the instructions at https://developers.google.com/mediapipe/solutions/setup_python.
Video Recording and Data Generation
- On line 70 of DataGenerator.py, set the output data file for the specific action to be recorded.
- In a terminal, navigate to the /predict directory and run python DataGenerator.py {action_class}. This will start the webcam and begin recording MediaPipe data for the specified action.
- The output data .csv file will be stored in the /train/training_data directory.
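The recording step above can be sketched roughly as follows. This is a minimal illustration, not the contents of DataGenerator.py: the helper names `landmarks_to_row` and `append_row` are hypothetical, and the row layout (x, y, z, visibility per landmark, no header row) is an assumption. The sketch only relies on each landmark exposing `x`, `y`, `z`, and `visibility` attributes, as MediaPipe pose landmarks do, so it runs without a webcam.

```python
import csv
from pathlib import Path


def landmarks_to_row(landmarks):
    """Flatten pose landmarks into one feature row: [x0, y0, z0, v0, x1, ...].

    `landmarks` is any iterable of objects with x, y, z and visibility
    attributes. The column order is an assumption made for this sketch.
    """
    row = []
    for lm in landmarks:
        row.extend([lm.x, lm.y, lm.z, lm.visibility])
    return row


def append_row(csv_path, row):
    """Append one feature row to the CSV file for a single action.

    Hypothetical helper: creates the parent directory if needed and
    writes no header row.
    """
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("a", newline="") as f:
        csv.writer(f).writerow(row)
```

In a capture loop, each processed frame that contains a detected pose would contribute one row, so a full MediaPipe pose of 33 landmarks yields 132 values per row under this layout.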
Random Forest Model Training
- Go to the /train directory.
- Run python ModelGenerator.py {model_name}. This will append the action (training data file name) as the class type for the associated data and then concatenate all training data into one file.
- The trained model will be saved to /models as a pickled model, {model_name}.pkl.
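A rough sketch of the training step, under stated assumptions: each CSV file in the training data directory holds headerless numeric rows, and the file name (without extension) is used as the class label. The function names `load_labeled_rows` and `train_and_save` and the hyperparameters are hypothetical and not taken from ModelGenerator.py. The sketch also combines the data in memory rather than writing a concatenated file.

```python
import csv
import pickle
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier


def load_labeled_rows(data_dir):
    """Read every CSV in data_dir and label each row with the file's stem."""
    features, labels = [], []
    for csv_file in sorted(Path(data_dir).glob("*.csv")):
        with csv_file.open(newline="") as f:
            for row in csv.reader(f):
                if not row:  # skip blank lines
                    continue
                features.append([float(value) for value in row])
                labels.append(csv_file.stem)  # e.g. "jump.csv" -> "jump"
    return features, labels


def train_and_save(data_dir, model_path, n_estimators=100):
    """Fit a random forest on all labeled rows and pickle it to model_path."""
    features, labels = load_labeled_rows(data_dir)
    # random_state is fixed only to make this sketch reproducible.
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(features, labels)
    model_path = Path(model_path)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

Deriving the label from the file name means adding a new action only requires recording a new CSV file and retraining.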
Live Video Capture and Keyboard Input Generation
- Go to the project root directory.
- Run python main.py. This will start the webcam video capture and begin generating key inputs according to the trained model and controller inputs specified in the Controller module's control_scheme.
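The per-frame logic described above might look something like the sketch below. It assumes control_scheme is a dictionary mapping an action name to a list of key names, which is a guess at its shape rather than the Controller module's actual structure. `keys_for_action` and `handle_frame` are hypothetical names, and `press` is a caller-supplied callback standing in for whatever key-emulation library the project uses.

```python
def keys_for_action(action, control_scheme):
    """Return the keys mapped to an action, or an empty list if unmapped."""
    return list(control_scheme.get(action, []))


def handle_frame(model, feature_row, control_scheme, press):
    """Classify one frame's feature row and press the mapped keys.

    `model` is any object with a scikit-learn style predict() method;
    `press` is a callable that takes a single key name.
    """
    action = model.predict([feature_row])[0]
    keys = keys_for_action(action, control_scheme)
    for key in keys:
        press(key)
    return action, keys
```

Passing the key-press function in as an argument keeps the mapping logic testable without sending real keyboard events.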
Built With
- computer-vision
- emulator
- machine-learning
- mediapipe
- opencv
- python
- scikit-learn