Inspiration

The idea behind Take-A-Note was to build an AI-powered application that integrates natural language processing (NLP), speech-to-text conversion, and transformer models to create a seamless user experience. We wanted to leverage Flask as the backbone for a lightweight yet powerful API that supports various AI-driven tasks.

What it does

  1. User uploads video recordings
    • Create a transcript from the video file (Speech to text API)
      • Ensure the transcript is corrected for misspellings and for incomplete sentences caused by poor audio.
    • From the transcript, create a detailed summary in .txt and .docx format. The summary should include:
      • Example problems (if possible)
      • Key terms and their definitions (autocomplete transcript)
    • From the summary .txt file, create:
      • an audio file (TTS API)
      • a slide presentation (Slides API)
      • a quiz or example problems (GPT API)
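
The flow above can be sketched as a small orchestration function. This is only an illustration of the ordering of the steps, not the project's actual code: the `NoteBundle` type, the `build_notes` function, and every stage parameter (`transcribe`, `clean`, `summarize`, `synthesize`, `make_slides`, `make_quiz`) are hypothetical names, and the real stages would wrap the speech-to-text, TTS, Slides, and GPT APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NoteBundle:
    """Everything produced from one uploaded video (hypothetical container)."""
    transcript: str
    summary: str
    audio: bytes
    slides: list[str]
    quiz: list[str]


def build_notes(
    video_path: str,
    transcribe: Callable[[str], str],
    clean: Callable[[str], str],
    summarize: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    make_slides: Callable[[str], list[str]],
    make_quiz: Callable[[str], list[str]],
) -> NoteBundle:
    # Step 1: speech-to-text, then repair misspellings and broken sentences.
    transcript = clean(transcribe(video_path))
    # Step 2: the summary is the single input to every downstream artifact.
    summary = summarize(transcript)
    # Step 3: audio, slides, and quiz are all derived from the summary.
    return NoteBundle(
        transcript=transcript,
        summary=summary,
        audio=synthesize(summary),
        slides=make_slides(summary),
        quiz=make_quiz(summary),
    )
```

Passing the stages in as callables keeps each API behind its own function, so a stage can be swapped or stubbed out in tests without touching the rest of the flow.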

How we built it

We built this project using the following tech stack:

  • Flask – Web framework for API creation.
  • OpenAI API – For AI-generated responses.
  • Whisper – OpenAI’s speech-to-text model.
  • Transformers – NLP tasks using pre-trained models.
  • Torch – Deep learning computations.
  • dotenv – For managing environment variables securely.
  • pypandoc – Converts documents into different formats.
  • Logging & JSON – Handles structured data processing and logging.

Challenges we ran into

  • API Rate Limits: OpenAI’s API has rate restrictions that required optimization of requests.
  • Audio Processing Accuracy: Whisper’s transcription sometimes struggled with noisy input.
  • Optimizing Response Speed: Ensuring fast API responses while handling longer videos.
  • Limited Video Processing: Whisper accepts uploads of at most 25 MB, which capped the size of the video files we could process.
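
Two of these challenges lend themselves to a short sketch: retrying rate-limited requests with exponential backoff, and estimating how many pieces a large file must be split into to stay under the upload limit. Both helpers below are hypothetical and use only the standard library. `call_with_backoff` and `chunks_needed` are illustrative names, treating 25 MB as 25 × 1024 × 1024 bytes is an assumption, and real media would have to be split by duration with an audio tool rather than by raw bytes.

```python
import math
import random
import time

# Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def chunks_needed(file_size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Estimate how many pieces a file needs so each piece fits under the limit."""
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    return math.ceil(file_size_bytes / limit)


def call_with_backoff(fn, retries=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait exponentially longer and try again."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

In practice `retry_on` would be narrowed to whatever rate-limit exception the API client raises, so that genuine bugs are not retried.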

Accomplishments that we're proud of

  • Successfully integrated multiple AI models into a single Flask application.
  • Implemented secure API key management to prevent leaks.
  • Achieved fast and accurate speech-to-text transcription.
  • Built a scalable and modular codebase for future enhancements.

What we learned

  • How to efficiently use OpenAI’s API for natural language generation.
  • Best practices for handling sensitive environment variables in a Flask app.
  • Improving speech recognition accuracy by pre-processing audio files.
  • The importance of logging and debugging when dealing with AI models.
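
As a rough illustration of the environment-variable and logging points, the sketch below reads an API key from the environment, fails loudly if it is missing, and logs only a masked form of it. The helper names `get_api_key` and `mask_secret`, the logger name, and the default variable name `OPENAI_API_KEY` are assumptions made for the example. With dotenv, the `.env` file would be loaded into the environment before this runs.

```python
import logging
import os

logger = logging.getLogger("take_a_note")  # hypothetical logger name


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a key never appears whole in logs."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read a key from the environment and fail early if it is absent."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to the environment or a .env file")
    # Log that the key was found, but only in masked form.
    logger.info("Loaded %s (%s)", name, mask_secret(key))
    return key
```

Failing at startup when the key is missing tends to be easier to debug than a failed API call deep inside a request handler.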

What's next for Take-A-Note

Fine-tuning the models and deploying the application to the cloud.
