ARCHITECTURAL_GUARDRAILS.md

Architectural Guardrails

Mandukya AI — Smart Video Lesson Companion
A locally-run knowledge extraction pipeline for educational videos.

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Pipeline (Backend) | Python 3.10+ | Core processing: download, transcribe, index, search |
| Audio Transcription | Whisper.cpp | Local audio-to-text transcription |
| Video Processing | FFmpeg / ffmpeg-python | Frame extraction, slide detection |
| Knowledge Graph | Cognee + Kùzu + LanceDB | Cognee-managed relational, graph (Kùzu), and vector (LanceDB) stores |
| LLM / Reasoning | Ollama | Local LLM for insights, diagrams, slide generation |
| Frontend | React 18 + React Three Fiber + Three.js | Split-Helix 3D UI |
| UI Components | Lucide React | Icon library |
| Presentation | Reveal.js | Generated slideshow output |
| Package Management | pip (Python), npm (Frontend) | Dependency management |
| Task Runner | Make | Unified CLI (`make setup`, `make download`, etc.) |

Architectural Principles

1. Local-First, Privacy-First

No cloud dependencies for core functionality. All processing runs locally.
Ollama, Whisper.cpp, and Cognee all operate on the user's machine.
Never introduce cloud API calls without explicit approval in an Architecture Decision Record (ADR).
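
One way to back this principle with a check is to reject any configured service URL that does not point at the local machine. The sketch below is illustrative only: `is_local_endpoint` and `assert_local` are hypothetical helper names, not part of the existing codebase, and treating only `localhost` and loopback IPs as "local" is an assumption.

```python
import ipaddress
from urllib.parse import urlparse

# Hostnames treated as local in this sketch (an assumption, not project policy).
LOCAL_HOSTNAMES = {"localhost"}


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host.lower() in LOCAL_HOSTNAMES:
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal and not a known local name: treat as remote.
        return False


def assert_local(url: str) -> None:
    """Raise instead of letting a non-local endpoint slip into the pipeline."""
    if not is_local_endpoint(url):
        raise ValueError(f"Non-local endpoint rejected (requires an ADR): {url}")
```

A call such as `assert_local("http://localhost:11434")` passes silently (11434 is Ollama's default port), while a hosted API URL raises `ValueError`.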

2. Modular Pipeline Architecture

The system follows a four-stage pipeline:

Perception → Memory → Reasoning → Presentation

Each stage is independently testable and communicates via file-based artifacts (transcripts, frames, slides) and database records (Cognee knowledge graph).

| Stage | Input | Output |
|---|---|---|
| Perception | YouTube URL / Video file | Transcripts (`.txt`), Frames (`.png`) |
| Memory | Transcripts | Knowledge Graph (RDBMS + Vector + Graph DB) |
| Reasoning | Knowledge Graph + Context | Insights, Mermaid diagrams, Slide content |
| Presentation | Slide content | Reveal.js HTML slides |
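
The stage contract can be sketched as a list of independent steps, each reading from one directory and writing to another. This is a minimal illustration, not the actual orchestration in process_pipeline.py: `Stage` and `run_pipeline` are hypothetical names, and the sketch models only file-based handoffs (the Memory stage, which writes to the knowledge graph, would need a different output type).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List

# A stage function reads artifacts from input_dir and writes to output_dir.
StageFn = Callable[[Path, Path], None]


@dataclass(frozen=True)
class Stage:
    name: str
    input_dir: Path
    output_dir: Path
    run: StageFn


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and return the names of the stages that completed."""
    completed = []
    for stage in stages:
        if not stage.input_dir.is_dir():
            # Fail with a clear error rather than silently skipping the stage.
            raise FileNotFoundError(
                f"Stage '{stage.name}' is missing its input directory: {stage.input_dir}"
            )
        stage.output_dir.mkdir(parents=True, exist_ok=True)
        stage.run(stage.input_dir, stage.output_dir)
        completed.append(stage.name)
    return completed
```

Because each `Stage` depends only on its input and output directories, a test can run one stage against a temporary directory without running the stages before it.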

3. File-Based Communication

Pipeline stages communicate through files on disk, not in-memory state.
transcripts/ — Audio transcription outputs
downloads/ — Downloaded video/audio files
slides/ — Generated Reveal.js presentations
audio/ — Extracted audio tracks
diagrams/ — Generated Mermaid/visual diagrams
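
Since the next stage may read a directory while the previous one is still writing, it helps to publish each artifact atomically. The following is a sketch under that assumption; `write_artifact` and `read_artifacts` are hypothetical helpers, not existing pipeline functions.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def write_artifact(directory: Path, name: str, text: str) -> Path:
    """Write text to directory/name via a temp file and an atomic rename."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / name
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial temp file before re-raising.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_artifacts(directory: Path, pattern: str = "*.txt") -> Dict[str, str]:
    """Return {filename: contents} for every artifact matching the pattern."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(directory.glob(pattern))
    }
```

Writing the same artifact name twice overwrites the earlier file instead of creating a second copy, which keeps re-runs from duplicating output.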

4. Frontend-Backend Separation

Backend: Python pipeline scripts (process_pipeline.py, indexer.py, search.py, etc.)
Frontend: React SPA in frontend/ directory
Communication: File system + database queries (no REST API layer currently)
The frontend reads from the same data stores the pipeline writes to.

Directory Structure

video_analysis/
├── pipeline/              # Pipeline modules
├── transcripts/           # Generated transcripts
├── downloads/             # Downloaded media
├── slides/                # Generated Reveal.js slides
├── audio/                 # Extracted audio files
├── diagrams/              # Generated diagrams
├── frontend/              # React SPA (Split-Helix UI)
│   ├── src/
│   ├── public/
│   └── package.json
├── docs/                  # Documentation
│   ├── architecture/      # Architecture diagrams
│   └── adr/               # Architecture Decision Records
├── tests/                 # Test suite
├── process_pipeline.py    # Main pipeline orchestrator
├── indexer.py             # Knowledge graph indexing
├── search.py              # Semantic search
├── downloader.py          # YouTube download
├── slide.py               # Slide generation
├── generate_slide.py      # Slide generation helper
├── cognee_setup.py        # Cognee configuration
├── cognee_indexer.py      # Cognee indexing logic
├── setup_wizard.py        # Interactive setup
├── requirements.txt       # Python dependencies
├── Makefile               # Task runner
└── ARCHITECTURAL_GUARDRAILS.md  # This file

Guardrails

Pipeline Guardrails

No blocking UI calls: Pipeline scripts must be non-interactive (except setup_wizard.py).
Idempotent operations: Re-running a stage should not duplicate data.
Graceful degradation: If Ollama is unavailable, the pipeline should fail with a clear error rather than silently skipping the step.
Logging: Write structured logs to pipeline.log. Do not use print() statements in production code.
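
The logging and Ollama guardrails above could be sketched as follows. This is a minimal illustration with assumptions: the logger name `pipeline`, the log line format, and the names `configure_logging`, `require_ollama`, and `OllamaUnavailableError` are hypothetical, and the check assumes that a plain HTTP GET on Ollama's base URL (by default `http://localhost:11434`) succeeds when the server is running.

```python
import logging
import urllib.request


def configure_logging(log_path: str = "pipeline.log") -> logging.Logger:
    """Return a logger that writes timestamped records to log_path."""
    logger = logging.getLogger("pipeline")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


class OllamaUnavailableError(RuntimeError):
    """Raised when the local Ollama server cannot be reached."""


def require_ollama(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> None:
    """Fail fast with a clear error if Ollama does not answer at base_url."""
    try:
        # urlopen raises for connection failures, timeouts, and HTTP error codes.
        with urllib.request.urlopen(base_url, timeout=timeout):
            pass
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        raise OllamaUnavailableError(
            f"Ollama is not reachable at {base_url}: {exc}"
        ) from exc
```

Calling `require_ollama()` at the start of the Reasoning stage turns a missing server into an immediate, explicit failure instead of an empty result later in the run.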

Frontend Guardrails

No backend server required: The frontend reads directly from the file system and the database.
React 18 only: Do not upgrade the React major version without an ADR.
Three.js for 3D: All 3D visualization uses React Three Fiber + Three.js.
No external API calls: The frontend must not call external services.

Data Guardrails

Cognee is the source of truth: Knowledge graph data lives in Cognee-managed databases.
Files are artifacts: Transcripts, slides, and diagrams are pipeline outputs. They are not inputs, except where a later pipeline stage consumes them.
No hardcoded paths: Use .env configuration for all file paths.
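
A sketch of how the "no hardcoded paths" rule might look in code, assuming the values from .env have already been loaded into the process environment. The variable names (`DATA_ROOT`, `TRANSCRIPTS_DIR`, and so on) and the `PipelinePaths` / `load_paths` names are hypothetical; the project's actual .env keys may differ.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelinePaths:
    transcripts: Path
    downloads: Path
    slides: Path
    audio: Path
    diagrams: Path


def load_paths(env: Optional[Mapping[str, str]] = None) -> PipelinePaths:
    """Build artifact paths from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    root = Path(env.get("DATA_ROOT", "."))

    def resolve(variable: str, default_name: str) -> Path:
        # An explicit per-directory variable wins over the DATA_ROOT default.
        override = env.get(variable)
        return Path(override) if override else root / default_name

    return PipelinePaths(
        transcripts=resolve("TRANSCRIPTS_DIR", "transcripts"),
        downloads=resolve("DOWNLOADS_DIR", "downloads"),
        slides=resolve("SLIDES_DIR", "slides"),
        audio=resolve("AUDIO_DIR", "audio"),
        diagrams=resolve("DIAGRAMS_DIR", "diagrams"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test without touching the real environment.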

Dependency Guardrails

Local-first dependencies: Prefer local tools (Whisper.cpp, Ollama) over cloud APIs.
Pin major versions: requirements.txt and package.json should constrain every dependency to a major version.
New dependency requires ADR: Any new external service or major library needs an Architecture Decision Record.
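
The pinning rule could be enforced with a small check over requirements.txt. The sketch below uses a hypothetical helper, `unpinned_requirements`, and only flags lines with no version constraint at all. It does not verify that an existing constraint actually bounds the major version.

```python
import re
from typing import List

# Any PEP 440 style comparison operator counts as a constraint in this sketch.
_CONSTRAINT = re.compile(r"(===|==|~=|!=|<=|>=|<|>)")


def unpinned_requirements(lines: List[str]) -> List[str]:
    """Return the requirements that carry no version constraint."""
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e)
            continue
        # Ignore environment markers such as '; python_version < "3.11"'.
        requirement = line.split(";", 1)[0].strip()
        if not _CONSTRAINT.search(requirement):
            missing.append(requirement)
    return missing
```

Run against the lines of requirements.txt, a non-empty result can fail a `make` target or a test so that unconstrained dependencies are caught before they are merged.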

Running the System

# Setup (one-time)
make setup

# Download a lesson
make download URL="https://www.youtube.com/playlist?list=..."

# Run the pipeline
make run

# Build knowledge graph
make index

# Search
make search QUERY="your query"

# Frontend
cd frontend && npm start

Decision Records

All architectural decisions are tracked in docs/adr/.
See docs/adr/TEMPLATE.md for the format.

C4 Diagrams

System architecture diagrams are in docs/architecture/diagrams/ in Mermaid format. Start with system-context.mmd for the C4 System Context view.
