AI Meeting Summarizer

An intelligent web application that transcribes meeting audio and generates structured, action-oriented summaries, complete with interactive AI chat capabilities.

View Live Demo »


This project was engineered to exceed the requirements of the hiring process for Unthinkable Solutions. It transforms raw meeting audio into a clear, actionable, and searchable knowledge base, demonstrating a deep understanding of full-stack AI application development.


✨ Core Features

This isn't just a summarizer; it's a complete meeting intelligence platform.

| Feature | Description | Benefit |
| --- | --- | --- |
| High-Accuracy Transcription | Industry-leading speech-to-text powered by OpenAI's Whisper. | Eliminates manual note-taking and creates a perfect text record. |
| Multi-Model Quality Control | Users can choose from multiple Whisper models (tiny, base, small, medium) to balance transcription speed against accuracy. | Provides flexibility for quick checks or mission-critical record-keeping. |
| AI-Powered Summary | Generates a structured output with a Summary, Key Decisions, and Action Items using a highly-detailed prompt for Google Gemini. | Instantly understand meeting outcomes and required actions. |
| Advanced AI Interrogation | Ask custom questions (custom prompting) about a single meeting, or perform cross-analysis by chatting with the entire meeting history. | Unlocks deeper insights and turns your archive into a searchable knowledge base. |
| Persistent History | Every summary and transcript is automatically saved to a local SQLite database, creating a permanent and reviewable meeting log. | Never lose track of past decisions or action items. |
| Professional UI | A clean, intuitive, and responsive interface built with Streamlit, featuring tabs and organized layouts for a seamless user experience. | Easy to use for both technical and non-technical users. |
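
As a rough illustration of the "chat with the entire meeting history" feature, the sketch below combines saved summaries and a user question into a single prompt. The `MeetingRecord` shape, the `build_history_prompt` helper, and the `max_chars` budget are all hypothetical names introduced for this example; the project's actual implementation may work differently.

```python
from dataclasses import dataclass


@dataclass
class MeetingRecord:
    """Hypothetical shape of one saved meeting; the real schema may differ."""
    timestamp: str
    summary: str
    transcript: str


def build_history_prompt(records: list[MeetingRecord], question: str,
                         max_chars: int = 8000) -> str:
    """Combine saved meeting summaries and a user question into one prompt.

    Summaries (not full transcripts) are used to keep the prompt small.
    Records are added in order until the next one would exceed max_chars.
    """
    parts = []
    used = 0
    for rec in records:
        block = f"[Meeting on {rec.timestamp}]\n{rec.summary}\n"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Answer the question using only the meeting history below.\n\n"
        f"{context}\n"
        f"Question: {question}"
    )
```

The resulting string would then be sent to the LLM; sending summaries rather than transcripts is one possible way to stay within a model's context limit.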

🏗️ Architecture & Workflow

The application follows a logical, robust data processing pipeline designed for efficiency and clarity.

  1. Upload: The user uploads an audio file via the Streamlit frontend.
  2. Transcribe: The audio is processed by the selected Whisper model to generate an accurate text transcript.
  3. Analyze: The transcript is sent to Google Gemini with a sophisticated, multi-part prompt that commands it to perform a detailed analysis.
  4. Store: The resulting summary and the full transcript are saved to the SQLite database with a timestamp.
  5. Display & Interact: The structured summary is presented in a clean, tabbed interface, and the user can now ask custom questions about the meeting.
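
The five steps above can be sketched as a single function that wires the stages together. This is a minimal sketch, not the project's code: `run_pipeline` and its `transcribe`, `summarize`, and `store` parameters are hypothetical names, and the stages are passed in as callables so the flow can be exercised without Whisper, Gemini, or a database.

```python
from datetime import datetime, timezone
from typing import Callable


def run_pipeline(
    audio_bytes: bytes,
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], dict],
    store: Callable[[str, dict, str], None],
) -> dict:
    """Run upload -> transcribe -> analyze -> store and return the result.

    The three callables stand in for the Whisper call, the Gemini call,
    and the SQLite write described in the steps above.
    """
    transcript = transcribe(audio_bytes)              # step 2: speech to text
    summary = summarize(transcript)                   # step 3: LLM analysis
    timestamp = datetime.now(timezone.utc).isoformat()
    store(transcript, summary, timestamp)             # step 4: persist
    # Step 5: the caller (the UI layer) displays this dictionary.
    return {"transcript": transcript, "summary": summary, "timestamp": timestamp}
```

Passing the stages as arguments also makes it straightforward to swap in stubs during testing, for example `run_pipeline(b"...", lambda a: "hello", lambda t: {"summary": t}, lambda t, s, ts: None)`.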

💡 Technology Choices & Rationale

Every component was chosen to meet professional standards for quality, efficiency, and scalability. This directly addresses the Technical Expectations of the project.

| Component | Technology | Why It Was Chosen |
| --- | --- | --- |
| Frontend Framework | Streamlit | For rapid development of a beautiful, interactive data science application with pure Python, enabling a focus on core logic over complex web development. |
| ASR Engine | OpenAI Whisper | Selected for its state-of-the-art transcription accuracy across a wide range of accents and audio qualities, directly addressing the accuracy requirement. |
| LLM Engine | Google Gemini | Chosen for its advanced reasoning capabilities and reliable, structured JSON output, which is essential for a stable application backend. |
| Data Persistence | SQLite | Provides a zero-configuration, serverless, and robust SQL database. It perfectly fulfills the requirement for a backend that can store and process data. |
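
To make the SQLite choice concrete, here is a minimal sketch of timestamped persistence using Python's built-in `sqlite3` module. The `meetings` table, its column names, and the helper functions are assumptions made for illustration; the project's real schema is not shown in this README.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) meetings table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS meetings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT NOT NULL,
            transcript TEXT NOT NULL,
            summary_json TEXT NOT NULL
        )
        """
    )
    conn.commit()


def save_meeting(conn: sqlite3.Connection, transcript: str, summary: dict) -> int:
    """Store one transcript and its summary with a UTC timestamp; return the row id."""
    cur = conn.execute(
        "INSERT INTO meetings (created_at, transcript, summary_json) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), transcript, json.dumps(summary)),
    )
    conn.commit()
    return cur.lastrowid


def load_history(conn: sqlite3.Connection) -> list[dict]:
    """Return all saved meetings, newest first."""
    rows = conn.execute(
        "SELECT id, created_at, transcript, summary_json FROM meetings ORDER BY id DESC"
    ).fetchall()
    return [
        {"id": r[0], "created_at": r[1], "transcript": r[2], "summary": json.loads(r[3])}
        for r in rows
    ]
```

Storing the summary as a JSON string keeps the sketch to one table; a normalized design with separate tables for decisions and action items would be an equally reasonable choice.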

✅ Fulfilling the Evaluation Focus

This project was built from the ground up to excel in the specific areas of evaluation.

1. Transcription Accuracy & Summary Quality

  • Solution: The application pairs Whisper for speech recognition with Gemini for summarization, so transcription and summary generation are each handled by a model chosen for accuracy. The user-selectable Whisper model size (tiny to medium) exposes the engineering trade-off between transcription speed and accuracy.
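
A minimal sketch of how a user-selected model size might be validated and passed to Whisper is shown below. `choose_model` and `transcribe_file` are hypothetical helpers; `whisper.load_model` and `model.transcribe` come from the `openai-whisper` package, which is imported lazily here so the validation logic can run without it. Whether the project calls Whisper in exactly this way is an assumption.

```python
# Model sizes offered in the UI, ordered from fastest to most accurate.
WHISPER_MODELS = ("tiny", "base", "small", "medium")


def choose_model(name: str) -> str:
    """Normalize a user-selected model name and reject unsupported sizes."""
    normalized = name.strip().lower()
    if normalized not in WHISPER_MODELS:
        raise ValueError(
            f"unsupported model {name!r}; expected one of {', '.join(WHISPER_MODELS)}"
        )
    return normalized


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with the selected Whisper model."""
    import whisper  # from the openai-whisper package; imported only when needed

    model = whisper.load_model(choose_model(model_name))
    result = model.transcribe(path)
    return result["text"]
```

Smaller models load and run faster, while larger ones generally produce fewer transcription errors, which is the trade-off the selector exposes.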

2. LLM Prompt Effectiveness

  • Solution: The project's "secret sauce" is its highly detailed, multi-part system prompt. It goes far beyond a simple request, acting as a comprehensive set of rules for the AI. These rules are designed to keep the output consistent, high-quality, and structured.

    The prompt's core instructions:
    
    You are an expert meeting summarizer and analyst with advanced skills in understanding structured and unstructured spoken-language transcripts. Your role is to transform the following meeting transcript into a clear, insight-rich summary.
    
    ---
    ### PRIMARY ANALYTICAL INSTRUCTIONS
    Your goal is to extract maximum meaning, clarity, and actionable insight from the transcript. You will analyze the content based on the following rules before formulating your final response.
    
    **1. Understanding and Precision:**
    - Read through the entire transcript before summarizing to grasp full context.
    - Identify speaker roles and topic shifts.
    - Merge fragmented or interrupted speech into coherent ideas.
    
    **2. Summarization Style:**
    - Be concise, objective, and factual. Paraphrase in a professional tone.
    - Limit the main summary to 6–10 sentences but ensure completeness.
    
    **3. Decisions Extraction:**
    - Include only confirmed decisions, not suggestions.
    - Start each decision with a strong action verb (e.g., Approved, Finalized, Agreed).
    - If a decision is conditional, mark it as *Pending confirmation*.
    
    **4. Action Items Extraction:**
    - Each action item must include an Owner, a Task, and a Deadline.
    - If the transcript lacks an owner or deadline, infer logically from context and mark any missing info as **(TBD)**.
    - Action items must start with a verb (e.g., Prepare, Review, Submit).
    
    **5. Handling Edge Cases:**
    - Ignore conversational fillers ("uh", "you know") and off-topic digressions.
    - Merge discussions on the same topic that occur at different times into one cohesive point.
    - If critical information is missing due to poor audio, mention "Information incomplete in transcript."
    
    ---
    ### CRITICAL OUTPUT REQUIREMENT
    
    After performing the detailed analysis above, your entire output **MUST BE A SINGLE, VALID JSON OBJECT** and nothing else.
    
    - Do **NOT** use Markdown headings (like '### Summary').
    - Do **NOT** add any introductory text, explanations, or closing remarks.
    - The response must start with `{` and end with `}`.
    
    The JSON object must have the following exact structure:
    {
        "summary": "A concise but comprehensive paragraph overview of the entire meeting.",
        "key_decisions": [
            "Decision 1...",
            "Decision 2..."
        ],
        "action_items": [
            {
                "owner": "Person or Team responsible",
                "task": "The specific action to be taken.",
                "deadline": "The deadline for the task, or (TBD)."
            }
        ]
    }
    ---
    
    Now, analyze the following transcript and provide the JSON output:
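
Because the prompt demands a single JSON object with a fixed structure, the backend can check the response before using it. The following is a minimal sketch of such a check, assuming the model's reply arrives as a plain string; `parse_summary` is a hypothetical helper, not a function confirmed to exist in the project.

```python
import json

# Keys every action item must carry, per the prompt's output structure.
ACTION_ITEM_KEYS = {"owner", "task", "deadline"}


def parse_summary(raw: str) -> dict:
    """Parse and validate the model's reply against the prompt's JSON structure.

    Text outside the outermost braces is ignored, as a defensive measure in
    case the model adds extra text despite the instructions.
    Raises ValueError if the reply is not usable.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model response")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError

    if not isinstance(data.get("summary"), str):
        raise ValueError("'summary' must be a string")

    decisions = data.get("key_decisions")
    if not isinstance(decisions, list) or not all(isinstance(d, str) for d in decisions):
        raise ValueError("'key_decisions' must be a list of strings")

    items = data.get("action_items")
    if not isinstance(items, list):
        raise ValueError("'action_items' must be a list")
    for item in items:
        if not isinstance(item, dict) or not ACTION_ITEM_KEYS.issubset(item):
            raise ValueError("each action item needs owner, task, and deadline")
    return data
```

A caller could catch `ValueError` and either retry the request or show the raw transcript with an error message, depending on how strict the application needs to be.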
    
    

3. Code Structure

  • Solution: The codebase is logically partitioned into app.py for the user interface and logic.py for all backend processing (database, AI calls). This separation of concerns is a professional best practice that makes the code clean, scalable, and easy to maintain.

🚀 Getting Started Locally

  1. Clone the repository:

    git clone https://github.com/sanjayjr8/ai-meeting-summarizer.git
    cd ai-meeting-summarizer
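  2. Create and activate a virtual environment (the activation command shown is for Windows; on macOS/Linux, run source venv/bin/activate instead):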

    python -m venv venv
    .\venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up your API key:

    • Create a folder: .streamlit
    • Inside it, create a file: secrets.toml
    • Add your key: GEMINI_API_KEY = "YOUR_KEY_HERE"
  5. Run the app:

    streamlit run app.py
