Inspiration
Progress in computer vision and image understanding opens new opportunities in entertainment. A simple web camera or phone can deliver a fun and engaging experience.
What do we do
We take the video stream from the user's camera, analyze it to detect the user, crop them out of their environment, and place them into the music video. We also transcribe the song into subtitles. All in real time.
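The crop-and-place step above boils down to alpha compositing: once a segmentation model produces a per-pixel "person" mask for a frame, the masked pixels are blended over the music-video background. A minimal sketch of that compositing step (the mask here is hand-made for illustration; in practice it would come from the segmentation model):

```python
import numpy as np

def composite_person(frame, background, person_mask):
    """Overlay the segmented person from `frame` onto `background`.

    frame, background: HxWx3 uint8 images of the same size.
    person_mask: HxW float array in [0, 1], 1.0 where the person is detected.
    """
    alpha = person_mask[..., None]  # broadcast the mask over the color channels
    blended = alpha * frame.astype(np.float64) + (1.0 - alpha) * background.astype(np.float64)
    return blended.astype(np.uint8)

# Tiny demo: a 2x2 frame whose left column is "person".
frame = np.full((2, 2, 3), 200, dtype=np.uint8)       # camera frame
background = np.zeros((2, 2, 3), dtype=np.uint8)      # music-video frame
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])                         # 1 = person pixel
out = composite_person(frame, background, mask)
```

Using a float mask rather than a hard binary one lets soft segmentation edges blend smoothly, which reduces visible cut-out halos around the user.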
Technology
We use existing neural network models for semantic segmentation and pose estimation, and process the video stream from the user's camera on the server. We run the inference in a Docker container deployed on Amazon Web Services.
Future
What's next for kARaoke: We will work on improving the precision of the computer vision algorithms and the post-processing to make the final video more realistic and of higher quality. We will also work on further neural network models to enable feedback from, and interaction with, the characters on the screen.