FocusNote

Made by Brian Hui, Eric Kwon and Abbe Azale

An intelligent meeting assistant that automatically detects calls, transcribes conversations in real-time, and generates AI-powered summaries, meeting minutes, and action items.

Overview

FocusNote is a desktop application that monitors your computer for active calls on Discord, Zoom, or Microsoft Teams. When a call is detected, it automatically:

Records system audio and microphone input
Transcribes speech in real-time using Whisper
Generates meeting summaries, formal minutes, and action items using Gemini AI
Saves all outputs organized by date and time

Features

Automatic Call Detection: Monitors Discord, Zoom, and Teams for active calls
Real-time Transcription: Uses Whisper AI for accurate speech-to-text
Smart Audio Capture: Records both system audio and microphone on macOS and Windows
AI-Powered Analysis:
- Concise meeting summaries
- Formal meeting minutes
- Actionable items with context
User-Friendly UI: Clean PyQt6 interface with live status updates
Organized Output: All transcripts and AI outputs saved with timestamps

Architecture

FocusNote consists of three main components:

Desktop App (DesktopApp/): PyQt6 GUI application that handles call detection and audio recording
Transcription Server (DesktopApp/src/transcription/): Whisper-based real-time speech-to-text service
Meeting Microservice (MeetingAssistant/): Gemini AI service for generating summaries and action items

Prerequisites

Python: 3.11 or higher
Operating System: macOS or Windows

ffmpeg: Required for macOS audio capture

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Gemini API Key: Required for AI features
- Get one at Google AI Studio

Installation

Running FocusNote

Quick Start (Recommended)

The startup scripts install or update dependencies and launch all three components in separate terminal windows. Clone the repository and configure the API key first (steps 1 and 2 below).

macOS/Linux:

cd DesktopApp
bash scripts/start-all.sh

Or make it executable first:

chmod +x scripts/start-all.sh
./scripts/start-all.sh

Windows:

cd DesktopApp
scripts\start-all.bat

The script will:

Check and install/update all Python dependencies
Open three terminal windows:
- Transcription Server - Whisper AI (port 17483)
- Meeting Microservice - Gemini AI (port 8888)
- Desktop App - FocusNote UI

Note: The first run may take a few moments to install dependencies. Subsequent runs will be faster as pip only updates changed packages.

1. Clone the Repository

git clone git@github.com:kyulyeon/focusnote.git
cd focusnote

2. Configure API Key

Create a .env file in the MeetingAssistant directory:

GEMINI_API_KEY=your_api_key_here
PORT=8888

Important: Never commit your .env file or API key to version control!
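
For illustration, here is a minimal sketch of how a service could read these two settings. The helpers parse_env and load_settings are hypothetical and are not taken from the Meeting Microservice, which may use a library such as python-dotenv instead. The fallback to port 8888 is an assumption based on the example above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def load_settings(text: str) -> tuple[str, int]:
    """Return (api_key, port); raise if the API key is missing."""
    settings = parse_env(text)
    api_key = settings.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("GEMINI_API_KEY not configured")
    port = int(settings.get("PORT", "8888"))  # assumed default port
    return api_key, port
```

Calling load_settings with the contents of MeetingAssistant/.env returns the key and port, or raises the same "GEMINI_API_KEY not configured" message described under Troubleshooting.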

3. (Optional) Manual Dependency Installation

The startup script automatically installs dependencies, but if you prefer to install them manually:

Desktop App:

cd DesktopApp
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

Meeting Microservice:

cd MeetingAssistant
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

Manual Start (Alternative)

If you prefer to start components individually:

Terminal 1 - Transcription Server:

cd DesktopApp/src/transcription
python server.py

Terminal 2 - Meeting Microservice:

cd MeetingAssistant
python meeting_microservice.py

Terminal 3 - Desktop App:

cd DesktopApp/src
python main.py

Usage

Start the application using one of the methods above
Join a call on Discord, Zoom, or Teams
FocusNote automatically detects the call and starts recording
View live transcription in the console
When the call ends, FocusNote automatically:
- Stops recording
- Sends transcript to AI service
- Generates summary, minutes, and action items
- Saves all outputs to DesktopApp/meeting_output/

Output Files

All meeting data is saved in DesktopApp/meeting_output/ organized by timestamp:

DesktopApp/meeting_output/
├── 2025-11-09T14:30:45/
│   ├── meeting_summary.txt      # AI-generated summary
│   ├── action_items.txt         # Extracted action items
│   └── meeting_minutes.txt      # Formal meeting minutes
└── meeting_recordings/
    └── meeting_discord_20251109_143045.wav
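
The naming pattern in the tree above can be reproduced with the standard library. This is a sketch inferred from the example names only; the function output_paths and its arguments are hypothetical, not part of the FocusNote code.

```python
from datetime import datetime
from pathlib import Path


def output_paths(base: Path, app: str, started: datetime) -> dict[str, Path]:
    """Build the output file paths for one meeting, following the tree above."""
    # Folder name such as 2025-11-09T14:30:45
    folder = base / started.isoformat(timespec="seconds")
    # Recording name such as meeting_discord_20251109_143045.wav
    stamp = started.strftime("%Y%m%d_%H%M%S")
    recording = base / "meeting_recordings" / f"meeting_{app}_{stamp}.wav"
    return {
        "summary": folder / "meeting_summary.txt",
        "action_items": folder / "action_items.txt",
        "minutes": folder / "meeting_minutes.txt",
        "recording": recording,
    }
```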

Project Structure

focusnote/
├── DesktopApp/                     # Main desktop application
│   ├── src/
│   │   ├── main.py                 # Application entry point
│   │   ├── ui/                     # PyQt6 user interface
│   │   ├── audio/                  # Audio capture logic
│   │   ├── detection/              # Call detection (Discord, Zoom, Teams)
│   │   ├── transcription/          # Whisper transcription server
│   │   │   ├── server.py           # Transcription WebSocket server
│   │   │   └── websocket_client.py # Client for real-time transcription
│   │   └── api/                    # Microservice communication
│   ├── scripts/
│   │   ├── start-all.sh            # macOS/Linux startup script
│   │   └── start-all.bat           # Windows startup script
│   ├── meeting_output/             # AI-generated outputs (created automatically)
│   ├── meeting_recordings/         # Audio recordings (created automatically)
│   ├── requirements.txt            # Python dependencies
│   └── README.md
│
├── MeetingAssistant/               # AI microservice
│   ├── meeting_microservice.py     # FastAPI service
│   ├── test_service.py             # Test script
│   ├── requirements.txt            # Python dependencies
│   └── README.md
│
└── README.md                       # This file

API Endpoints

The Meeting Microservice exposes the following endpoints:

Generate Summary

POST http://localhost:8888/summary

Generate Minutes

POST http://localhost:8888/minutes

Extract Action Items

POST http://localhost:8888/action-items

Health Check

GET http://localhost:8888/health

Request format:

{
  "transcript": "Meeting transcript text...",
  "meeting_title": "Optional title",
  "meeting_date": "Optional date",
  "participants": ["Optional", "list"]
}
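
As a sketch of how a client could call these endpoints from Python, the snippet below builds the JSON body shown above and posts it with the standard library. The helper names build_request and post are hypothetical, and the response is returned as raw text because the response format is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # default port from the .env example


def build_request(endpoint: str, transcript: str, **optional) -> urllib.request.Request:
    """Create a POST request for /summary, /minutes, or /action-items."""
    # Optional fields: meeting_title, meeting_date, participants.
    payload = {"transcript": transcript, **optional}
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(endpoint: str, transcript: str, **optional) -> str:
    """Send the request and return the raw response body as text."""
    request = build_request(endpoint, transcript, **optional)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

For example, post("summary", transcript_text, meeting_title="Weekly sync") would request a summary, provided the microservice is running.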

Development

Running Tests

Desktop App:

cd DesktopApp
pip install -r requirements-dev.txt
pytest

Meeting Microservice:

cd MeetingAssistant
python test_service.py

Testing Audio Only

cd DesktopApp
python src/detection/detect_test.py --test

Troubleshooting

"GEMINI_API_KEY not configured"

Ensure .env file exists in MeetingAssistant/
Verify your API key is correct
Restart the microservice after creating/updating .env

"Address already in use"

Port 8888 or 17483 is already in use by another process
Change PORT in MeetingAssistant/.env (for the meeting microservice) or stop the conflicting process
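
To find out which port is taken before starting the services, a quick check like the following can help. It is a generic sketch, not part of FocusNote; port_in_use is a hypothetical helper that only tests whether something accepts TCP connections on the local machine.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (8888, 17483):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```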

Audio sounds fast/high-pitched

This has been fixed in the latest version
Sample rates are now properly matched (48kHz)
Mono mic audio is converted to stereo for mixing
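
Two streams can only be mixed sample by sample when they share the same sample rate and channel count, which is why the mono microphone signal is expanded to stereo. The sketch below shows one way to do that conversion, assuming 16-bit signed PCM; the function mono_to_stereo is hypothetical and the actual FocusNote implementation may differ.

```python
from array import array


def mono_to_stereo(mono_pcm: bytes) -> bytes:
    """Duplicate each 16-bit mono sample into a left and a right channel."""
    mono = array("h")
    mono.frombytes(mono_pcm)  # raises ValueError on an odd byte count
    # Allocate twice as many samples, initially silent.
    stereo = array("h", bytes(len(mono_pcm) * 2))
    stereo[0::2] = mono  # left channel
    stereo[1::2] = mono  # right channel
    return stereo.tobytes()
```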

Call not detected

Ensure Discord, Zoom, or Teams is actually in a call
Check that CPU usage is above the detection threshold (it rises while audio is actively being transmitted)
Wait about 3 seconds: a call is reported only after 3 consecutive detections
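
The consecutive-detection rule can be pictured as a simple streak counter. The sketch below is an illustration only: the class CallDetector, the default threshold value, and the assumption of one CPU sample per second are hypothetical and not taken from the FocusNote source.

```python
class CallDetector:
    """Report a call only after several consecutive above-threshold samples."""

    def __init__(self, cpu_threshold: float = 1.0, required_hits: int = 3):
        self.cpu_threshold = cpu_threshold  # hypothetical value, in percent
        self.required_hits = required_hits  # 3 consecutive detections
        self._streak = 0

    def update(self, cpu_percent: float) -> bool:
        """Feed one CPU sample; return True while a call is considered active."""
        if cpu_percent > self.cpu_threshold:
            self._streak += 1
        else:
            self._streak = 0  # a single quiet sample resets the streak
        return self._streak >= self.required_hits
```

Requiring a streak avoids triggering a recording on a brief CPU spike, at the cost of a short delay before detection.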

No system audio on macOS

Install ffmpeg: brew install ffmpeg
Ensure microphone permissions are granted in System Settings (System Preferences on older macOS versions)

Connection errors

Verify all three components are running
Check the transcription server is on port 17483
Check the meeting microservice is on port 8888
Verify internet connection for Gemini API
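
A small script can confirm that the meeting microservice is reachable by calling the health endpoint listed under API Endpoints. This is a sketch: is_healthy is a hypothetical helper, and it assumes the endpoint answers with HTTP 200 when the service is up.

```python
import http.client
import urllib.request


def is_healthy(url: str = "http://localhost:8888/health", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors, and bad replies.
        return False


if __name__ == "__main__":
    print("meeting microservice reachable:", is_healthy())
```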

Platform-Specific Notes

macOS

Uses ffmpeg for system audio capture
Requires microphone permissions
Audio is captured at 48kHz stereo

Windows

Uses PyAudioWPatch for loopback audio
May require running with administrator privileges
Supports WASAPI loopback

Security & Privacy

Recording and transcription run locally; only AI generation uses a remote service
Audio recordings stay on your machine
Only transcripts are sent to the Gemini API
API keys are stored in .env files (git-ignored)
No other data is collected or transmitted to third parties

Requirements

Python 3.11+
PyQt6
PyAudio (Windows: PyAudioWPatch)
Whisper (pywhispercpp)
FastAPI
Google Generative AI SDK
ffmpeg (macOS)
