# FocusNote

Made by Brian Hui, Eric Kwon, and Abbe Azale.

An intelligent meeting assistant that automatically detects calls, transcribes conversations in real time, and generates AI-powered summaries, meeting minutes, and action items.
## Overview

FocusNote is a desktop application that monitors your computer for active calls on Discord, Zoom, or Microsoft Teams. When a call is detected, it automatically:
- Records system audio and microphone input
- Transcribes speech in real-time using Whisper
- Generates meeting summaries, formal minutes, and action items using Gemini AI
- Saves all outputs organized by date and time
## Features

- Automatic Call Detection: Monitors Discord, Zoom, and Teams for active calls
- Real-time Transcription: Uses Whisper AI for accurate speech-to-text
- Smart Audio Capture: Records both system audio and microphone on macOS and Windows
- AI-Powered Analysis:
  - Concise meeting summaries
  - Formal meeting minutes
  - Actionable items with context
- User-Friendly UI: Clean PyQt6 interface with live status updates
- Organized Output: All transcripts and AI outputs saved with timestamps
## Architecture

FocusNote consists of three main components:

- Desktop App (`DesktopApp/`): PyQt6 GUI application that handles call detection and audio recording
- Transcription Server (`DesktopApp/src/transcription/`): Whisper-based real-time speech-to-text service
- Meeting Microservice (`MeetingAssistant/`): Gemini AI service for generating summaries and action items
## Prerequisites

- Python: 3.11 or higher
- Operating System: macOS or Windows
- ffmpeg: Required for macOS audio capture

  ```bash
  # macOS
  brew install ffmpeg

  # Windows: download from https://ffmpeg.org/download.html
  ```

- Gemini API Key: Required for AI features (get one at Google AI Studio)
## Quick Start

We provide startup scripts that automatically install/update dependencies and launch all three components in separate terminal windows.

macOS/Linux:

```bash
cd DesktopApp
bash scripts/start-all.sh
```

Or make it executable first:

```bash
chmod +x scripts/start-all.sh
./scripts/start-all.sh
```

Windows:

```bat
cd DesktopApp
scripts\start-all.bat
```

The script will:
- Check and install/update all Python dependencies
- Open three terminal windows:
  - Transcription Server: Whisper AI (port 17483)
  - Meeting Microservice: Gemini AI (port 8888)
  - Desktop App: FocusNote UI
Note: The first run may take a few moments to install dependencies. Subsequent runs will be faster as pip only updates changed packages.
## Setup

Clone the repository:

```bash
git clone git@github.com:kyulyeon/focusnote.git
cd focusnote
```

Create a `.env` file in the `MeetingAssistant` directory:

```bash
GEMINI_API_KEY=your_api_key_here
PORT=8888
```

Important: Never commit your `.env` file or API key to version control!
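The exact loading code lives in `meeting_microservice.py`; as a rough sketch (the `load_settings` helper name is hypothetical, not FocusNote's API), the service might read these values from the environment like this:

```python
import os

# Hypothetical sketch of how the microservice might read its settings.
# GEMINI_API_KEY has no default on purpose: failing fast at startup beats
# a confusing downstream authentication error from the Gemini API.
def load_settings() -> dict:
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; check MeetingAssistant/.env")
    port = int(os.environ.get("PORT", "8888"))  # default matches this README
    return {"api_key": api_key, "port": port}
```

A `.env` file is not loaded automatically by Python; the service presumably uses a loader such as `python-dotenv` or exports the variables via its startup script.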
## Manual Installation

The startup script installs dependencies automatically, but you can also install them manually.

Desktop App:

```bash
cd DesktopApp
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Meeting Microservice:

```bash
cd MeetingAssistant
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

## Running Components Individually

If you prefer to start components individually:
Terminal 1 - Transcription Server:

```bash
cd DesktopApp/src/transcription
python server.py
```

Terminal 2 - Meeting Microservice:

```bash
cd MeetingAssistant
python meeting_microservice.py
```

Terminal 3 - Desktop App:

```bash
cd DesktopApp/src
python main.py
```

## Usage

- Start the application using one of the methods above
- Join a call on Discord, Zoom, or Teams
- FocusNote automatically detects the call and starts recording
- View live transcription in the console
- When the call ends, FocusNote automatically:
  - Stops recording
  - Sends the transcript to the AI service
  - Generates a summary, minutes, and action items
  - Saves all outputs to `DesktopApp/meeting_output/`
## Output

All meeting data is saved in `DesktopApp/meeting_output/`, organized by timestamp:

```
DesktopApp/meeting_output/
├── 2025-11-09T14:30:45/
│   ├── meeting_summary.txt     # AI-generated summary
│   ├── action_items.txt        # Extracted action items
│   └── meeting_minutes.txt     # Formal meeting minutes
└── meeting_recordings/
    └── meeting_discord_20251109_143045.wav
```
## Project Structure

```
focusnote/
├── DesktopApp/                      # Main desktop application
│   ├── src/
│   │   ├── main.py                  # Application entry point
│   │   ├── ui/                      # PyQt6 user interface
│   │   ├── audio/                   # Audio capture logic
│   │   ├── detection/               # Call detection (Discord, Zoom, Teams)
│   │   ├── transcription/           # Whisper transcription server
│   │   │   ├── server.py            # Transcription WebSocket server
│   │   │   └── websocket_client.py  # Client for real-time transcription
│   │   └── api/                     # Microservice communication
│   ├── scripts/
│   │   ├── start-all.sh             # macOS/Linux startup script
│   │   └── start-all.bat            # Windows startup script
│   ├── meeting_output/              # AI-generated outputs (created automatically)
│   ├── meeting_recordings/          # Audio recordings (created automatically)
│   ├── requirements.txt             # Python dependencies
│   └── README.md
│
├── MeetingAssistant/                # AI microservice
│   ├── meeting_microservice.py      # FastAPI service
│   ├── test_service.py              # Test script
│   ├── requirements.txt             # Python dependencies
│   └── README.md
│
└── README.md                        # This file
```
## API Endpoints

The Meeting Microservice exposes the following endpoints:

- `POST http://localhost:8888/summary`
- `POST http://localhost:8888/minutes`
- `POST http://localhost:8888/action-items`
- `GET http://localhost:8888/health`

Request format:

```json
{
  "transcript": "Meeting transcript text...",
  "meeting_title": "Optional title",
  "meeting_date": "Optional date",
  "participants": ["Optional", "list"]
}
```

## Testing

Desktop App:
```bash
cd DesktopApp
pip install -r requirements-dev.txt
pytest
```

Meeting Microservice:

```bash
cd MeetingAssistant
python test_service.py
```

Call detection test:

```bash
cd DesktopApp
python src/detection/detect_test.py --test
```

## Troubleshooting

Gemini API errors:

- Ensure the `.env` file exists in `MeetingAssistant/`
- Verify your API key is correct
- Restart the microservice after creating/updating `.env`
Port already in use:

- Port 8888 or 17483 is occupied by another process
- Change `PORT` in `.env` or kill the conflicting process
Audio mixing issues:

- These have been fixed in the latest version
- Sample rates are now properly matched (48 kHz)
- Mono mic audio is converted to stereo for mixing
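The mono-to-stereo step above can be sketched as follows. This is a simplified illustration, not FocusNote's mixer: real capture works on 48 kHz int16 buffers, while plain lists of floats keep the example dependency-free.

```python
# Duplicate each mono mic sample into left/right channels so it can be
# mixed with the interleaved stereo system stream sample-by-sample.
def mono_to_stereo(mono: list[float]) -> list[float]:
    stereo = []
    for s in mono:
        stereo.extend((s, s))  # interleaved L, R
    return stereo

def mix(system_stereo: list[float], mic_mono: list[float]) -> list[float]:
    mic_stereo = mono_to_stereo(mic_mono)
    # Average the two streams so the mix cannot exceed either input's range.
    return [(a + b) / 2 for a, b in zip(system_stereo, mic_stereo)]
```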
Calls not detected:

- Ensure Discord/Zoom/Teams is actually in a call
- Check that the app's CPU usage is above the detection threshold (i.e., it is actively transmitting audio)
- Wait for 3 consecutive detections (about 3 seconds)
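The "3 consecutive detections" rule is a debounce: with one poll per second, a call is only confirmed after the detector reports active three times in a row, which filters out brief CPU spikes. A minimal sketch (class and parameter names are illustrative, not FocusNote's):

```python
# Hypothetical debouncer: confirm a call only after `required` consecutive
# positive detection samples; any negative sample resets the streak.
class CallDebouncer:
    def __init__(self, required: int = 3):
        self.required = required
        self.streak = 0

    def update(self, detected: bool) -> bool:
        """Feed one per-second sample; return True once the call is confirmed."""
        self.streak = self.streak + 1 if detected else 0
        return self.streak >= self.required
```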
macOS audio not recording:

- Install ffmpeg: `brew install ffmpeg`
- Ensure microphone permissions are granted in System Preferences
No AI outputs generated:

- Verify all three components are running
- Check that the transcription server is on port 17483
- Check that the meeting microservice is on port 8888
- Verify internet connectivity for the Gemini API
## Platform Notes

macOS:

- Uses ffmpeg for system audio capture
- Requires microphone permissions
- Audio is captured at 48 kHz stereo
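For illustration, a 48 kHz stereo capture via ffmpeg's macOS `avfoundation` backend could be assembled like this. The `:0` device index and output path are assumptions (list real devices with `ffmpeg -f avfoundation -list_devices true -i ""`), and this sketch only builds the argument list rather than invoking ffmpeg:

```python
# Hypothetical helper: assemble an ffmpeg command for macOS audio capture.
def ffmpeg_capture_cmd(device: str = ":0",
                       out: str = "meeting_recordings/capture.wav") -> list[str]:
    return [
        "ffmpeg",
        "-f", "avfoundation",  # macOS capture backend
        "-i", device,          # ":0" = audio device 0, no video
        "-ar", "48000",        # 48 kHz sample rate, matching the note above
        "-ac", "2",            # stereo
        out,
    ]
```

The list form is ready to pass to `subprocess.run` without shell quoting.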
Windows:

- Uses PyAudioWPatch for WASAPI loopback audio capture
- May require running with administrator privileges
## Privacy

- All processing happens locally except AI generation
- Audio recordings stay on your machine
- Only transcripts are sent to the Gemini API
- API keys are stored in `.env` files (git-ignored)
- No data is collected or transmitted to third parties
## Dependencies

- Python 3.11+
- PyQt6
- PyAudio (Windows: PyAudioWPatch)
- Whisper (pywhispercpp)
- FastAPI
- Google Generative AI SDK
- ffmpeg (macOS)