AI-powered YouTube video dubbing pipeline. Automatically transcribes (Whisper), translates (Google), and generates neural dubbing (Edge-TTS) with smart audio-video synchronization and background music preservation.
A real-time Speech-to-Speech translation pipeline (ASR ➡️ NMT ➡️ TTS) using OpenAI Whisper, MarianMT, and gTTS. Features a Flask backend and a responsive web UI for low-latency multilingual communication and audio synthesis.
Dictator – Supercharge Cursor Chat with voice-to-text, custom AI prompts, and workflow automation. Speak your ideas, inject templates instantly, and code faster with AI-powered assistance.
Listening Between the Lines: An explainable multimodal framework for MCI detection from spontaneous speech. Leverages Selective State Space Models (Mamba) and Gated Fusion to integrate linguistic disfluencies and eGeMAPS biomarkers across multi-corpus benchmarks (Pitt, ADReSS, TAUKADIAL).
A real-time audio streaming POC featuring Voice Activity Detection (VAD), Faster-Whisper ASR, NLLB-200 translation, and Piper TTS. Built with FastAPI and React to demonstrate a low-latency, end-to-end speech-to-speech pipeline.
A specialized AI-powered educational tool for mastering mental arithmetic. Features local LLM integration (Llama 3), real-time voice transcription (Whisper), and an interactive Canvas-based Soroban abacus.