Take-A-Note

Inspiration

The idea behind Take-A-Note was to build an AI-powered application that integrates natural language processing (NLP), speech-to-text conversion, and transformer models to create a seamless user experience. We wanted to leverage Flask as the backbone for a lightweight yet powerful API that supports various AI-driven tasks.

What it does

The user uploads a video recording, and the app works through these steps:
- Create a transcript from the video file (speech-to-text API).
  - Correct the transcript for misspellings and for sentences left incomplete by poor audio.
- From the transcript, create a detailed summary in .txt and .docx format. The summary should include:
  - Example problems (if possible)
  - Key terms and their definitions (autocomplete transcript)
- From the summary .txt file, create:
  - an audio file (TTS API)
  - a slide presentation (Slides API)
  - a quiz or example problems (GPT API)
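
The text stages above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: `NoteBundle`, `process_recording`, and the three stage callables are hypothetical names, and each stage is passed in as a plain function so the real Whisper and GPT calls can be swapped in or stubbed out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class NoteBundle:
    """Text artifacts produced from one uploaded recording (hypothetical type)."""
    transcript: str
    summary: str
    summary_path: Path


def process_recording(
    video_path: Path,
    out_dir: Path,
    transcribe: Callable[[Path], str],
    clean: Callable[[str], str],
    summarize: Callable[[str], str],
) -> NoteBundle:
    """Run the text stages in order: transcribe, clean up, summarize, save."""
    raw = transcribe(video_path)     # speech-to-text stage
    transcript = clean(raw)          # fix misspellings and incomplete sentences
    summary = summarize(transcript)  # detailed summary with definitions

    # Save the summary as .txt; the audio, slides, and quiz steps
    # would each read this file as their input.
    out_dir.mkdir(parents=True, exist_ok=True)
    summary_path = out_dir / f"{video_path.stem}_summary.txt"
    summary_path.write_text(summary, encoding="utf-8")
    return NoteBundle(transcript, summary, summary_path)
```

Passing the stages as arguments keeps the ordering logic testable without network access: a test can supply simple lambdas, while the Flask route would supply functions that call the real APIs.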

How we built it

We built this project using the following tech stack:

- Flask – Web framework for API creation.
- OpenAI API – For AI-generated responses.
- Whisper – OpenAI’s speech-to-text model.
- Transformers – NLP tasks using pre-trained models.
- Torch – Deep learning computations.
- dotenv – For managing environment variables securely.
- pypandoc – Converts documents into different formats.
- Logging & JSON – Handles structured data processing and logging.

Challenges we ran into

- API Rate Limits: OpenAI’s API enforces rate limits, which required us to optimize our requests.
- Audio Processing Accuracy: Whisper’s transcription sometimes struggled with noisy input.
- Optimizing Response Speed: Keeping API responses fast while handling longer videos.
- Limited Video Processing: Uploads to Whisper are limited to 25 MB, which restricts the size of the videos that can be processed.
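
Two of these challenges lend themselves to small guard functions. The sketch below is one possible approach, not the code used in the project: `check_upload_size` rejects files over the 25 MB limit before any upload is attempted, and `call_with_backoff` retries a failing call with exponentially growing, jittered delays. Both function names and all parameters are hypothetical, and the exception type to retry on depends on the client library in use.

```python
import os
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # the 25 MB limit noted above


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if the file is larger than the allowed upload size."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay each time and add jitter so that
            # parallel requests do not all retry at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

In a request handler, `check_upload_size` would run right after the upload is saved to disk, and the transcription or summary call would be wrapped as `call_with_backoff(lambda: ..., retry_on=(SomeRateLimitError,))`, where `SomeRateLimitError` stands in for whatever rate-limit exception the client raises.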