Inspiration
We were inspired by years of playing Just Dance on the Wii. It was a great way to combine fun and exercise, but we noticed two main issues. The Wii only scored you based on the position of your Wii remote, and there was no room for customization if you wanted to add your own songs and dances. To try and fix this, we decided to build our own AI dance coach that can teach you any dance in the world, and actually give you real advice that you can use to improve!
What it does
Dance CV lets the user upload an .mp4 of any solo dance they want to learn. It then uses AI to chunk the dance into easily learnable movements so the user can spend time perfecting each move. From there, the user just needs to turn on their webcam and dance away! Computer vision determines how closely the student's moves match the reference, and the app provides a score along with specific feedback for each attempt. Voice dictation also lets the user restart a chunk or move on to the next one seamlessly. No need to walk back to your computer each time you want to redo a section; simply say "Restart the song!"
How we built it
- Frontend: React, Vite, TypeScript
- Styling: Tailwind CSS
- AI & ML:
  - Google Gemini API (video analysis & voice control)
  - MediaPipe Pose (real-time pose detection)
- UI Components: Radix UI, Lucide React
Challenges we ran into
Although we are proud of what we were able to achieve, it wasn't smooth sailing all the way. Here are some challenges that we encountered while building Dance CV.
- We initially started coding the backend with OpenCV in Python, but we realized it would be slow and difficult to stream the marked-up webcam feed frame by frame through an API to our frontend. To avoid this, we migrated everything to run in TypeScript. This was quite annoying, but a necessary change that made it much easier to integrate the frontend and backend seamlessly.
- Just when we thought we had a solid scoring system, we realized the webcam feed was running at about 1 frame per second on both of our machines, so it wasn't just one slow laptop. Digging into the code, we found that some of the React hooks we were using to communicate between the frontend and backend were the culprit: we had state hooks updating the joint angles whenever the person moved, and because we called them every few frames, each call triggered a re-render and slowed the app to a halt. Our solution was to switch these to useRef variables, which we could update continuously without causing a re-render. This massively sped up the app and eliminated the awful lag we were experiencing.
- We noticed that the user could sometimes score highly just by hiding certain joints from the camera view. After diving through our algorithm, we realized the issue actually originated earlier, in the pose estimation code, where hidden joints simply weren't being counted against the score.
- The voice dictation was another feature that was really difficult to implement. The documentation for Gemini's voice capabilities was sparse, so we had to dig into the actual source code to find the function signatures we needed to get things working.
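The useRef fix above can be sketched in a framework-free way. This is a simplified illustration, not our actual component code: the mini "hooks" below are hypothetical stand-ins that mimic why per-frame state updates thrash the renderer while mutating a ref does not.

```typescript
// Simplified illustration of the perf bug: a state-style update schedules a
// re-render on every pose frame, while mutating a ref-style box never does.
// useStateLike/useRefLike are hypothetical stand-ins for React's real hooks.

let renderCount = 0;
const render = () => { renderCount++; }; // stands in for a React re-render

// useState-like: every setter call triggers a render
function useStateLike<T>(initial: T): [() => T, (v: T) => void] {
  let value = initial;
  return [() => value, (v: T) => { value = v; render(); }];
}

// useRef-like: a plain mutable box; updating it never renders
function useRefLike<T>(initial: T): { current: T } {
  return { current: initial };
}

const [getAngle, setAngle] = useStateLike(0);
const angleRef = useRefLike(0);

// Simulate 60 pose-detection frames arriving in one second
for (let frame = 0; frame < 60; frame++) {
  setAngle(frame * 3);          // state path: one render per frame
  angleRef.current = frame * 3; // ref path: value updated, zero renders
}

console.log(renderCount);      // 60 renders from the state path alone
console.log(angleRef.current); // 177, kept up to date with no renders
```

The trade-off is that ref updates don't refresh the UI on their own, which was fine here: the canvas overlay was redrawn on its own animation loop anyway, so the per-frame joint angles never needed to flow through React's render cycle.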
Accomplishments that we're proud of
- We are proud of the overall design of the app. We think it is very intuitive to use, and the voice controls make it easy to keep practicing even while you are away from your computer, learning how to dance.
- We are proud of the chunking feature. We think it was a very innovative use of the Gemini API, and it works quite well too! From our personal experiences, it feels a lot easier to learn dances move by move, so we think it was an important feature that really improves the user experience.
- We are proud of our scoring algorithm. The algorithm compares your body’s joint angles from the camera to the tutorial video a few times per second, automatically aligning them to the closest matching video frame to account for timing delays. It measures how far each joint is off on average over the whole dance and converts that error into a percentage score, with separate scores for arms and legs. The algorithm also includes a large penalty for having missing joints, so the user cannot get a high score by simply hiding from the camera.
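The scoring idea above can be sketched as follows. This is a simplified sketch with assumed names (`JointAngles`, `MISSING_JOINT_PENALTY_DEG`, the 0.5 s alignment window, and the error-to-score mapping are illustrative, not our exact implementation): each live sample is compared against the closest-in-time reference frames, hidden joints are charged a large error, and the mean error becomes a percentage.

```typescript
// Simplified sketch of the scoring approach: for each sampled webcam frame,
// find the best-matching reference frame within a small time window (to absorb
// timing delays), compare joint angles, and turn mean error into a percentage.

type JointAngles = Record<string, number | null>; // null = joint hidden from camera

const MISSING_JOINT_PENALTY_DEG = 90; // large error charged for a hidden joint

// Mean per-joint error (degrees) between a live frame and one reference frame
function frameError(live: JointAngles, ref: JointAngles): number {
  const joints = Object.keys(ref);
  let total = 0;
  for (const j of joints) {
    const a = live[j];
    // Hidden joints get a big penalty so hiding from the camera can't inflate the score
    total += a === null || a === undefined
      ? MISSING_JOINT_PENALTY_DEG
      : Math.abs(a - (ref[j] as number));
  }
  return total / joints.length;
}

// Align a live sample at time t to the nearest-matching reference frame
function alignedError(
  live: JointAngles, t: number,
  refFrames: { t: number; angles: JointAngles }[], windowSec = 0.5,
): number {
  const candidates = refFrames.filter(f => Math.abs(f.t - t) <= windowSec);
  const errs = candidates.map(f => frameError(live, f.angles));
  return Math.min(...errs); // best match absorbs small timing offsets
}

// Map mean error over the dance to a 0-100 score
function toScore(meanErrorDeg: number, maxErrorDeg = 90): number {
  return Math.max(0, 100 * (1 - meanErrorDeg / maxErrorDeg));
}

// Example: elbow matched exactly, knee off by 9 degrees, sample 0.1 s late
const refFrames = [{ t: 1.0, angles: { elbow: 90, knee: 120 } }];
console.log(alignedError({ elbow: 90, knee: 129 }, 1.1, refFrames)); // 4.5
console.log(toScore(4.5)); // ~95
```

Running the same pipeline separately over arm joints and leg joints gives the split arm/leg scores described above.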
What we learned
- It was the first time either of us had integrated a real-time computer vision component into a full-stack app, so we learned a lot from that. Specifically, we learned how to efficiently mark up the video feed without causing major lag, and how to create a fair scoring system that cannot be exploited by simply hiding certain joints from the camera.
- We also learned how to work highly efficiently as a duo, splitting up tasks in a way that maximized parallel output and minimized merge conflicts.
What's next for Dance CV
Some features we really wanted to have but didn't have time to implement include the following:
- The ability to merge chunks together so the user can customize the sections they want to work on
- The ability to continuously loop a chunk for endless practice
- The ability to do multi-person dances
- A login system so that users can save their favourite dances and track their progress
Built With
- gemini
- mediapipe
- opencv
- react
- tailwind
- typescript
- vite
