Modern local AI assistant for video understanding and document generation. Auralink pairs a sleek Next.js UI with a Rust (Tauri) app and Python micro-agents over gRPC for transcription, vision analysis, and content generation (PDF/PPT).
> [!NOTE]
> Development has stopped indefinitely!

> [!WARNING]
> Auralink is currently in the early stages of development and is not yet ready for daily use!
- Chat Workflow: Ask questions about a video or request outcomes (e.g., “Create a PowerPoint”).
- Local Agents:
  - Transcription (audio → text).
  - Vision (objects, graphs/plots, captioning).
  - Generation (summary, PDF, PowerPoint).
- Fast UI: Smooth, bottom-up chronological chat with optimistic updates and typing animation.
- Local persistence: SQLite (via rusqlite) stores files and chat history locally.
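The persistence layer on the Rust side uses rusqlite; the same idea can be sketched with Python's stdlib `sqlite3`. Note that the table and column names below are illustrative assumptions, not the app's actual schema:

```python
import sqlite3


def open_chat_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local chat database. Schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               file_id INTEGER NOT NULL,
               role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
               content TEXT NOT NULL,
               created_at TEXT NOT NULL DEFAULT (datetime('now'))
           )"""
    )
    return conn


def save_message(conn: sqlite3.Connection, file_id: int, role: str, content: str) -> None:
    """Persist one chat message for a given file."""
    conn.execute(
        "INSERT INTO messages (file_id, role, content) VALUES (?, ?, ?)",
        (file_id, role, content),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, file_id: int) -> list[tuple[str, str]]:
    """Return (role, content) pairs in chronological (insertion) order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE file_id = ? ORDER BY id",
        (file_id,),
    )
    return list(rows)
```

Ordering by the autoincrement `id` is what gives the bottom-up chronological chat its stable message order.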
```
Next.js (UI) ──invoke──▶ Tauri (Rust) ──gRPC──▶ Python agents
- Chat UI         - Commands          - transcription_server.py :50051
- Upload UI       - SQLite DB         - vision_server.py        :50052
- File list       - ffmpeg/thumbs     - generation_server.py    :50053
```
- UI: `src/` (Next.js 14, React 18, Tailwind). Chat components in `src/components/chat/*`.
- Desktop shell: `src-tauri/` (Rust, Tauri 2). Exposes commands with `#[tauri::command]` in `src-tauri/src/lib.rs`.
- Agents: `backend/mcp/*.py` (Python, gRPC servers). Protos in `proto/audio_service.proto`. Python stubs generated to `backend/generated/`.
- User submits a prompt in `ChatInput.tsx`.
- Frontend invokes `send_message` (Tauri) → `src-tauri/src/lib.rs`.
- Rust persists the user message, scores intent, and calls agents via gRPC as needed.
- Agent responses are post-processed into friendly text and saved as assistant messages.
- Frontend invalidates the messages query; the UI displays messages in chronological order, so assistant responses appear immediately after the user's prompt.
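The intent-scoring step above can be sketched as a keyword heuristic. The real scoring lives in the Rust side and its keywords and weights are not shown in this README, so everything below is an assumption:

```python
def score_intent(prompt: str) -> str:
    """Route a prompt to an agent by counting keyword hits.

    Keyword lists are illustrative assumptions, not the app's real rules.
    """
    keywords = {
        "generation": ["powerpoint", "ppt", "pdf", "summary", "summarize"],
        "transcription": ["transcribe", "transcript", "audio", "what was said"],
        "vision": ["object", "graph", "plot", "caption", "what is shown"],
    }
    text = prompt.lower()
    scores = {
        intent: sum(1 for kw in kws if kw in text)
        for intent, kws in keywords.items()
    }
    best = max(scores, key=scores.get)
    # No keyword hit at all: treat it as plain chat, no agent call needed.
    return best if scores[best] > 0 else "chat"
```

A prompt like "Create a PowerPoint" would route to the generation agent, while an unrelated greeting falls through to plain chat.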
- Transcription: `backend/mcp/transcription_server.py` (gRPC :50051)
- Vision: `backend/mcp/vision_server.py` (gRPC :50052)
- Generation: `backend/mcp/generation_server.py` (gRPC :50053)
Rust auto-generates the Python gRPC stubs at app startup (best effort) and launches the agents. It also waits for their ports to become ready, which reduces initial transport errors.
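The port-readiness wait can be sketched as a simple connect-and-retry loop. The actual Rust implementation is not shown here, so the timeout and polling interval below are assumptions:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # something is listening, e.g. a gRPC agent
        except OSError:
            time.sleep(0.2)  # agent still starting; retry shortly
    return False
```

Waiting on all three agent ports (50051–50053) before issuing the first RPC is what avoids "transport unavailable" errors during the agents' model-loading window.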
- Users can upload or register video files and see them listed on the dashboard.
- Users can chat about a selected file; messages are persisted locally.
- The system can:
  - Transcribe audio from the video.
  - Detect objects and identify graphs from a representative frame.
  - Generate summaries of the chat/file.
  - Produce a PDF and/or a PowerPoint.
- When generating a file (PDF/PPT), the assistant message includes a clickable `file://` hyperlink with the full local path for quick access.
- Access the Colab notebook to see the full training pipeline →
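Building a `file://` link for the assistant message is straightforward with `pathlib`. The helper name and message wording below are illustrative, not the app's exact strings:

```python
from pathlib import Path


def generation_reply(output_path: str, kind: str = "PowerPoint") -> str:
    """Format an assistant message with a clickable file:// link.

    Helper name and wording are illustrative assumptions.
    """
    # resolve() yields an absolute path; as_uri() percent-encodes it
    # into a well-formed file:// URI that chat markdown can render.
    uri = Path(output_path).resolve().as_uri()
    return f"{kind} generated. [Open file]({uri})"
```

`as_uri()` handles spaces and non-ASCII characters in paths, which a hand-built `"file://" + path` string would get wrong.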
- UI: Next.js 14, React 18, Tailwind, React Query, React Markdown
- Desktop: Tauri 2 (Rust), rusqlite, tonic (gRPC client)
- Agents: Python 3, gRPC, Whisper, OpenVINO/ONNX Runtime, MoviePy, NumPy
- Media tooling: ffmpeg (required on host)
- Node.js 18+ and pnpm/npm
- Rust toolchain and Tauri prerequisites (see Tauri docs for your OS)
- Python 3.11+
```bash
npm install
```

Install Python deps for agents:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
```

This starts Next.js in dev mode and runs the Tauri app, which launches the agents:

```bash
npm run tauri dev
```

Alternatively, you can run the web dev server only (without the desktop shell):

```bash
npm run dev
```

Note: local agents and Tauri commands are expected; web-only mode is limited.
- Open the app and upload a video.
- Open the chat for a file and ask a question or request an action.
- For generation tasks (PDF/PPT), the assistant will reply with a link like:

  `PowerPoint generated. [Open file](file:///path/to/output.pptx)`

  Clicking the link opens the local file in your OS. The raw path is also shown for reference.
- `src/app/*` – Next.js routes and pages
- `src/components/chat/*` – Chat UI components
- `src-tauri/src/lib.rs` – Tauri commands, agent orchestration, thumbnails, DB access
- `src-tauri/src/db.rs` – SQLite schema and queries
- `src-tauri/src/grpc_client.rs` – gRPC client calls to agents
- `backend/mcp/*.py` – Python agent servers
- `proto/audio_service.proto` – Protobuf definitions for services
- Initial chat responses are hardcoded so the UI stays responsive while the Python agents load their models into memory, which takes roughly 10–30 seconds at startup.
- Model inference runs locally and may be slower on CPU-only systems. OpenVINO optimization is used where available to improve performance.
- Model Limitations: The project uses pre-trained models that are not fine-tuned on MP4-specific content:
  - Transcription: OpenAI Whisper `base` model for speech-to-text.
  - Vision: DETR (object detection), BLIP (captioning), and TrOCR (OCR) for visual analysis.
  - Generation: FLAN-T5-small for text summarization.
- If time permits, these models can be fine-tuned on domain-specific MP4 video content to improve accuracy and contextual understanding for video-based queries.
- Ensure `ffmpeg` is installed and accessible on PATH if thumbnails fail.
- If agents don't start, verify the Python env and that the packages in `backend/requirements.txt` are installed; check console logs for `[agent stdout]`/`[agent stderr]`.
- Port conflicts (50051–50053) will prevent connections; free or change ports as needed.
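A quick way to diagnose the port-conflict case is to probe the three agent ports from the architecture above:

```python
import socket

# Agent ports from the architecture diagram; names are for reporting only.
AGENT_PORTS = {
    50051: "transcription_server.py",
    50052: "vision_server.py",
    50053: "generation_server.py",
}


def probe_ports(host: str = "127.0.0.1") -> dict[int, bool]:
    """Return {port: True} if something is already listening on that port."""
    status = {}
    for port in AGENT_PORTS:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                status[port] = True   # occupied: the agent is up, or another process holds it
        except OSError:
            status[port] = False      # free: the agent can bind it
    return status
```

A port reported as occupied before the app starts indicates a conflict; one still free long after startup suggests the corresponding agent failed to launch.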
Proprietary. All rights reserved.
