Modern local AI assistant for video understanding and document generation. Auralink pairs a sleek Next.js UI with a Rust (Tauri) app and Python micro-agents over gRPC for transcription, vision analysis, and content generation (PDF/PPT).
> [!NOTE]
> Development has stopped indefinitely!

> [!WARNING]
> Auralink is currently in the early stages of development and is not yet ready for daily use!
- Chat Workflow: Ask questions about a video or request outcomes (e.g., “Create a PowerPoint”).
- Local Agents:
  - Transcription (audio → text).
  - Vision (objects, graphs/plots, captioning).
  - Generation (summary, PDF, PowerPoint).
- Fast UI: Smooth, bottom-up chronological chat with optimistic updates and typing animation.
- Local persistence: SQLite (via rusqlite) stores files and chat history locally.
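The persistence layer on the Rust side uses rusqlite; the same idea can be sketched with Python's stdlib `sqlite3`. Note that the table and column names below are illustrative assumptions, not the app's actual schema:

```python
import sqlite3


def open_chat_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local chat database. Schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               file_id INTEGER NOT NULL,
               role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
               content TEXT NOT NULL,
               created_at TEXT NOT NULL DEFAULT (datetime('now'))
           )"""
    )
    return conn


def save_message(conn: sqlite3.Connection, file_id: int, role: str, content: str) -> None:
    """Persist one chat message for a given file."""
    conn.execute(
        "INSERT INTO messages (file_id, role, content) VALUES (?, ?, ?)",
        (file_id, role, content),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, file_id: int) -> list[tuple[str, str]]:
    """Return (role, content) pairs in chronological (insertion) order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE file_id = ? ORDER BY id",
        (file_id,),
    )
    return list(rows)
```

Ordering by the autoincrement `id` is what gives the bottom-up chronological chat its stable message order.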
```
Next.js (UI) ──invoke──▶ Tauri (Rust) ──gRPC──▶ Python agents
- Chat UI         - Commands          - transcription_server.py :50051
- Upload UI       - SQLite DB         - vision_server.py        :50052
- File list       - ffmpeg/thumbs     - generation_server.py    :50053
```
- UI: `src/` (Next.js 14, React 18, Tailwind). Chat components in `src/components/chat/*`.
- Desktop shell: `src-tauri/` (Rust, Tauri 2). Exposes commands with `#[tauri::command]` in `src-tauri/src/lib.rs`.
- Agents: `backend/mcp/*.py` (Python, gRPC servers). Protos in `proto/audio_service.proto`. Python stubs generated to `backend/generated/`.
- User submits a prompt in `ChatInput.tsx`.
- Frontend invokes `send_message` (Tauri) → `src-tauri/src/lib.rs`.
- Rust persists the user message, scores intent, and calls agents via gRPC as needed.
- Agent responses are post-processed into friendly text and saved as assistant messages.
- Frontend invalidates the messages query; the UI displays messages in chronological order, so assistant responses appear immediately after the user's prompt.
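The intent-scoring step above can be sketched as a keyword heuristic. The real scoring lives in the Rust side and its keywords and weights are not shown in this README, so everything below is an assumption:

```python
def score_intent(prompt: str) -> str:
    """Route a prompt to an agent by counting keyword hits.

    Keyword lists are illustrative assumptions, not the app's real rules.
    """
    keywords = {
        "generation": ["powerpoint", "ppt", "pdf", "summary", "summarize"],
        "transcription": ["transcribe", "transcript", "audio", "what was said"],
        "vision": ["object", "graph", "plot", "caption", "what is shown"],
    }
    text = prompt.lower()
    scores = {
        intent: sum(1 for kw in kws if kw in text)
        for intent, kws in keywords.items()
    }
    best = max(scores, key=scores.get)
    # No keyword hit at all: treat it as plain chat, no agent call needed.
    return best if scores[best] > 0 else "chat"
```

A prompt like "Create a PowerPoint" would route to the generation agent, while an unrelated greeting falls through to plain chat.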
- Transcription: `backend/mcp/transcription_server.py` (gRPC :50051)
- Vision: `backend/mcp/vision_server.py` (gRPC :50052)
- Generation: `backend/mcp/generation_server.py` (gRPC :50053)
Rust auto-generates the Python gRPC stubs at app startup (best effort) and launches the agents. It also waits for their ports to become ready, which reduces initial transport errors.
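The port-readiness wait can be sketched as a simple connect-and-retry loop. The actual Rust implementation is not shown here, so the timeout and polling interval below are assumptions:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # something is listening, e.g. a gRPC agent
        except OSError:
            time.sleep(0.2)  # agent still starting; retry shortly
    return False
```

Waiting on all three agent ports (50051–50053) before issuing the first RPC is what avoids "transport unavailable" errors during the agents' model-loading window.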
- Users can upload or register video files and see them listed on the dashboard.
- Users can chat about a selected file; messages are persisted locally.
- The system can:
  - Transcribe audio from the video.
  - Detect objects and identify graphs from a representative frame.
  - Generate summaries of the chat/file.
  - Produce a PDF and/or a PowerPoint.
- When generating a file (PDF/PPT), the assistant message includes a clickable `file://` hyperlink with the full local path for quick access.
- Access the Colab notebook to see the full training pipeline →
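Building a `file://` link for the assistant message is straightforward with `pathlib`. The helper name and message wording below are illustrative, not the app's exact strings:

```python
from pathlib import Path


def generation_reply(output_path: str, kind: str = "PowerPoint") -> str:
    """Format an assistant message with a clickable file:// link.

    Helper name and wording are illustrative assumptions.
    """
    # resolve() yields an absolute path; as_uri() percent-encodes it
    # into a well-formed file:// URI that chat markdown can render.
    uri = Path(output_path).resolve().as_uri()
    return f"{kind} generated. [Open file]({uri})"
```

`as_uri()` handles spaces and non-ASCII characters in paths, which a hand-built `"file://" + path` string would get wrong.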
- UI: Next.js 14, React 18, Tailwind, React Query, React Markdown
- Desktop: Tauri 2 (Rust), rusqlite, tonic (gRPC client)
- Agents: Python 3, gRPC, Whisper, OpenVINO/ONNX Runtime, MoviePy, NumPy
- Media tooling: ffmpeg (required on host)
- Node.js 18+ and pnpm/npm
- Rust toolchain and Tauri prerequisites (see Tauri docs for your OS)
- Python 3.11+
```bash
npm install
```

Install Python deps for agents:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
```

This starts Next.js in dev mode and runs the Tauri app, which launches the agents:

```bash
npm run tauri dev
```

Alternatively, you can run the web dev server only (without the desktop shell):

```bash
npm run dev
```

Note: local agents and Tauri commands are expected; web-only mode is limited.
- Open the app and upload a video.
- Open the chat for a file and ask a question or request an action.
- For generation tasks (PDF/PPT), the assistant will reply with a link like:

  `PowerPoint generated. [Open file](file:///path/to/output.pptx)`

  Clicking the link opens the local file in your OS. The raw path is also shown for reference.
- `src/app/*` – Next.js routes and pages
- `src/components/chat/*` – Chat UI components
- `src-tauri/src/lib.rs` – Tauri commands, agent orchestration, thumbnails, DB access
- `src-tauri/src/db.rs` – SQLite schema and queries
- `src-tauri/src/grpc_client.rs` – gRPC client calls to agents
- `backend/mcp/*.py` – Python agent servers
- `proto/audio_service.proto` – Protobuf definitions for services
- Initial chat responses are hardcoded so the UI stays responsive while the Python agents load their models into memory, which takes roughly 10–30 seconds at startup.
- Model inference runs locally and may be slower on CPU-only systems. OpenVINO optimization is used where available to improve performance.
- Model Limitations: The project uses pre-trained models that are not fine-tuned on MP4-specific content:
  - Transcription: OpenAI Whisper `base` model for speech-to-text.
  - Vision: DETR (object detection), BLIP (captioning), and TrOCR (OCR) for visual analysis.
  - Generation: FLAN-T5-small for text summarization.
- If time permits, these models can be fine-tuned on domain-specific MP4 video content to improve accuracy and contextual understanding for video-based queries.
- Ensure `ffmpeg` is installed and accessible on PATH if thumbnails fail.
- If agents don't start, verify the Python env and that the packages in `backend/requirements.txt` are installed; check console logs for `[agent stdout]`/`[agent stderr]`.
- Port conflicts (50051–50053) will prevent connections; free or change ports as needed.
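A quick way to diagnose the port-conflict case is to probe the three agent ports from the architecture above:

```python
import socket

# Agent ports from the architecture diagram; names are for reporting only.
AGENT_PORTS = {
    50051: "transcription_server.py",
    50052: "vision_server.py",
    50053: "generation_server.py",
}


def probe_ports(host: str = "127.0.0.1") -> dict[int, bool]:
    """Return {port: True} if something is already listening on that port."""
    status = {}
    for port in AGENT_PORTS:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                status[port] = True   # occupied: the agent is up, or another process holds it
        except OSError:
            status[port] = False      # free: the agent can bind it
    return status
```

A port reported as occupied before the app starts indicates a conflict; one still free long after startup suggests the corresponding agent failed to launch.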
Proprietary. All rights reserved.
