Inspiration

Progress in computer vision and image understanding opens new opportunities in entertainment. A simple web camera or phone can deliver a fun and engaging experience.

What we do

We take the video stream from the user's camera, analyze it to detect the user, crop them out of their environment, and place them inside music videos. We also transcribe the song and display it as subtitles. All in real time.
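The core compositing step can be sketched with NumPy: given a per-pixel person mask from the segmentation model, each output pixel is taken from the camera frame where the mask is 1 and from the music-video background where it is 0. This is a minimal illustration, not our production code; the function name and toy data are assumptions.

```python
import numpy as np

def composite(frame, mask, background):
    """Blend a camera frame onto a music-video background using a
    per-pixel person mask (1 = person, 0 = background)."""
    # Add a channel axis so the 2-D mask broadcasts over RGB channels.
    mask3 = mask[..., None].astype(frame.dtype)
    return frame * mask3 + background * (1 - mask3)

# Toy 2x2 RGB example: the one masked pixel comes from the frame,
# the remaining pixels come from the background.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.full((2, 2, 3), 50, dtype=np.uint8)
mask = np.array([[1, 0],
                 [0, 0]], dtype=np.uint8)
out = composite(frame, mask, background)
```

In the real pipeline the mask comes from the segmentation network per frame, and the same broadcast trick applies unchanged at full resolution.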

Technology

We use existing neural network models for semantic segmentation and pose estimation, and process the video stream from the user's camera on the server. We run inference in a Docker container deployed on Amazon Web Services.
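As a rough sketch of how such an inference service could be containerized, the image might be built like this. Every file name and dependency here is an illustrative assumption, not the project's actual configuration:

```dockerfile
# Hypothetical sketch: a minimal image for a server-side
# video-segmentation service. File names are placeholders.
FROM python:3.10-slim

WORKDIR /app

# Inference dependencies (requirements.txt is assumed to list
# the segmentation and pose-estimation libraries in use).
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# server.py is a placeholder name for the video-processing service.
COPY server.py .

EXPOSE 8000
CMD ["python", "server.py"]
```

An image built this way can be pushed to a registry and run on an AWS compute service that supports containers.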

Future

What's next for kARaoke: we will work on improving the precision of the computer vision algorithms and the post-processing to make the final video more realistic and of higher quality. We will also work on further neural network models so the characters on screen can give feedback to and interact with the user.

Presentation link: Google Drive
