Inspiration
The idea behind Take-A-Note was to build an AI-powered application that combines speech-to-text conversion, natural language processing (NLP), and transformer models, so that a recorded video can be turned into notes and study materials with minimal effort. We chose Flask as the backbone: a lightweight API layer that ties together transcription, summarization, and the generation of audio, slides, and quizzes.
What it does
- The user uploads a video recording.
- A transcript is created from the video file (speech-to-text API).
- The transcript is corrected for misspellings and for incomplete sentences caused by poor audio.
- From the transcript, a detailed summary is generated in .txt and .docx format. The summary includes:
  - Example problems (where possible)
  - Key terms and their definitions (autocomplete transcript)
- From the summary .txt file, the app generates:
  - An audio version (TTS API)
  - A slide presentation (Slides API)
  - A quiz or example problems (GPT API)
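The steps above form a linear pipeline: transcribe, clean up, summarize, then build every derived output from the summary. The following is a minimal sketch of that orchestration, assuming each AI step is wrapped in a plain function. The names `run_pipeline`, `transcribe`, `correct`, `summarize`, and `derivers` are hypothetical stand-ins: in the real app they would call Whisper, a GPT model, a TTS API, and a slides API. Here they are replaced by stubs so the control flow runs without any API keys.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NoteResult:
    """Everything the pipeline produces for one uploaded video."""
    transcript: str
    summary: str
    derived: Dict[str, str] = field(default_factory=dict)


def run_pipeline(
    video_path: str,
    transcribe: Callable[[str], str],           # hypothetical: speech-to-text (e.g. Whisper)
    correct: Callable[[str], str],              # hypothetical: fix misspellings / broken sentences
    summarize: Callable[[str], str],            # hypothetical: GPT summary with definitions, examples
    derivers: Dict[str, Callable[[str], str]],  # hypothetical: TTS, slides, quiz, keyed by name
) -> NoteResult:
    transcript = correct(transcribe(video_path))
    summary = summarize(transcript)
    # Every derived artifact is built from the summary, not the raw transcript.
    derived = {name: make(summary) for name, make in derivers.items()}
    return NoteResult(transcript=transcript, summary=summary, derived=derived)


# Stub implementations so the flow can be exercised offline.
def fake_transcribe(path: str) -> str:
    return "todays lecture covers derivatives"


def fake_correct(text: str) -> str:
    return text.replace("todays", "Today's").rstrip(".") + "."


def fake_summarize(text: str) -> str:
    return "Summary: " + text


if __name__ == "__main__":
    result = run_pipeline(
        "lecture.mp4",
        fake_transcribe,
        fake_correct,
        fake_summarize,
        {"quiz": lambda s: "Q1. " + s, "slides": lambda s: "Slide 1: " + s},
    )
    print(result.summary)
```

Passing the steps in as functions keeps the Flask route thin and lets each AI call be swapped or mocked independently.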
How we built it
We built this project using the following tech stack:
- Flask – Web framework for API creation.
- OpenAI API – For AI-generated responses.
- Whisper – OpenAI’s speech-to-text model.
- Hugging Face Transformers – Pre-trained models for NLP tasks.
- PyTorch (torch) – Deep learning computations.
- dotenv – For managing environment variables securely.
- pypandoc – Converts the summary into other document formats (e.g. .docx).
- logging & json – Structured data handling and application logging.
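Because the stack relies on Python's `logging` and `json` modules, one common way to combine them is a formatter that writes each log record as a single JSON line, which makes multi-step AI calls easier to trace. This is a minimal, standard-library-only sketch; the logger name `take_a_note` and the optional `step` field are illustrative assumptions, not taken from the project.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured field, passed via extra={"step": ...} (illustrative).
        step = getattr(record, "step", None)
        if step is not None:
            payload["step"] = step
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name: str = "take_a_note") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = get_logger()
    log.info("transcription finished", extra={"step": "whisper"})
```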
Challenges we ran into
- API Rate Limits: OpenAI's API enforces rate limits, so we had to optimize how and how often we sent requests.
- Audio Processing Accuracy: Whisper’s transcription sometimes struggled with noisy input.
- Optimizing Response Speed: Ensuring fast API responses while handling longer videos.
- Limited Video Processing: The Whisper API caps uploaded files at 25 MB, which limits how long a video we can process in a single request.
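Two of these challenges can be handled with small, dependency-free helpers. This is a sketch, assuming the 25 MB cap applies per uploaded file and that the audio is encoded at a constant bitrate; `plan_chunks` and `call_with_backoff` are hypothetical names, not part of the project or of the OpenAI SDK. The first computes how many seconds of audio fit under the limit and splits a recording into ranges; the second retries a call with exponential backoff and random jitter, a common way to cope with rate limiting.

```python
import math
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")

# Assumption: "25 MB" interpreted as 25 MiB; lower it if uploads are rejected.
UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024


def plan_chunks(duration_s: int, bitrate_kbps: int,
                limit_bytes: int = UPLOAD_LIMIT_BYTES,
                safety: float = 0.9) -> List[Tuple[int, int]]:
    """Split [0, duration_s) into (start, end) second ranges that each fit under the limit."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    # The safety margin leaves room for container/header overhead.
    max_chunk_s = math.floor(limit_bytes * safety / bytes_per_second)
    if max_chunk_s <= 0:
        raise ValueError("bitrate too high for the upload limit")
    return [(start, min(start + max_chunk_s, duration_s))
            for start in range(0, duration_s, max_chunk_s)]


def call_with_backoff(fn: Callable[[], T], *, retries: int = 5,
                      base_delay: float = 1.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on retry_on with exponential backoff and full jitter."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Wait a random time in [0, base_delay * 2**attempt].
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(plan_chunks(3600, 64))  # one hour of 64 kbps audio
```

In practice `retry_on` would be narrowed to the SDK's rate-limit error (for example `openai.RateLimitError` in the v1 Python SDK), and each range from `plan_chunks` would be cut out and transcribed as a separate request.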
Accomplishments that we're proud of
- Successfully integrated multiple AI models into a single Flask application.
- Implemented secure API key management to prevent leaks.
- Achieved fast and accurate speech-to-text transcription.
- Built a scalable and modular codebase for future enhancements.
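Secure key management in a Flask app usually means keeping secrets out of source code: they live in a local `.env` file (excluded from version control) or in the deployment environment, and are read at startup. The sketch below uses python-dotenv's `load_dotenv`, which copies `KEY=value` pairs into `os.environ` without overriding variables that are already set. The helpers `require_env` and `redact` are hypothetical, added to show failing fast on a missing key and masking it before it reaches the logs; `OPENAI_API_KEY` is the variable name the OpenAI SDK reads by default.

```python
import os

try:
    # python-dotenv: loads KEY=value pairs from a local .env file into os.environ.
    from dotenv import load_dotenv
except ImportError:  # keep the sketch runnable without the package installed
    load_dotenv = None


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent or blank."""


def require_env(name: str) -> str:
    """Return a non-empty environment variable or fail fast (hypothetical helper)."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"{name} is not set; add it to .env or the environment")
    return value


def redact(secret: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    if load_dotenv is not None:
        load_dotenv()  # does not override variables already set in the environment
    try:
        key = require_env("OPENAI_API_KEY")
        print("Loaded key", redact(key))
    except MissingSecretError as err:
        print(err)
```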
What we learned
- How to efficiently use OpenAI’s API for natural language generation.
- Best practices for handling sensitive environment variables in a Flask app.
- Improving speech recognition accuracy by pre-processing audio files.
- The importance of logging and debugging when dealing with AI models.
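One way to pre-process a video before transcription is to extract only the audio with ffmpeg, downmix it to mono, and resample it to 16 kHz. Whisper itself works on 16 kHz mono audio, so this discards nothing the model uses while shrinking the upload considerably. This is a minimal sketch, assuming ffmpeg is installed on `PATH`; the function names, the 64k default bitrate, and the optional `loudnorm` (EBU R128 loudness normalization) filter are illustrative choices, not the project's confirmed settings.

```python
import shutil
import subprocess
from typing import List


def build_ffmpeg_cmd(src: str, dst: str, *, bitrate: str = "64k",
                     normalize: bool = True) -> List[str]:
    """Build an ffmpeg argv that extracts speech-friendly audio from a video."""
    cmd = [
        "ffmpeg", "-y",    # overwrite dst if it already exists
        "-i", src,
        "-vn",             # drop the video stream
        "-ac", "1",        # downmix to mono
        "-ar", "16000",    # resample to 16 kHz
        "-b:a", bitrate,   # target audio bitrate (controls file size)
    ]
    if normalize:
        cmd += ["-af", "loudnorm"]  # even out quiet and loud passages
    cmd.append(dst)
    return cmd


def extract_audio(src: str, dst: str, **kwargs) -> str:
    """Run ffmpeg and return the output path; raises if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst, **kwargs), check=True, capture_output=True)
    return dst


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_cmd("lecture.mp4", "lecture.mp3")))
```

At 64 kbps, a little over 50 minutes of audio fits under a 25 MB cap, so this step alone covers many recordings without chunking.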
What's next for Take-A-Note
Fine-tuning and cloud deployment.