A web application for real-time speech recognition and translation using AssemblyAI and DeepL APIs.
- Real-time Speech Recognition: Speech-to-text with AssemblyAI WebSocket API v3
- Instant Translation: Text translation using DeepL API
- Dual Audio Capture: Supports both microphone input and system audio (tab audio)
- Live Transcription: Real-time interim and final transcript display
- Text Selection Translation: Translate any selected text on the page
- Interactive Result Navigation: Click-to-scroll linking between speech recognition and translation results
- Note Panel: Save original/translation pairs and download them as a file
- High Accuracy: >91% speech recognition accuracy with AssemblyAI on clear English speech
- Low Latency: ~300ms recognition latency for real-time performance
- Audio Processing: AudioContext capture converted to 16 kHz mono PCM16, the format streamed to AssemblyAI
- Reconnection Logic: Automatic WebSocket reconnection with exponential backoff
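
The PCM16 path above means every Web Audio buffer (Float32 samples) has to be turned into 16-bit integers before it is sent. A minimal sketch of such a converter, assuming samples nominally in [-1, 1]; the name `convertFloat32ToInt16` matches the helper called in the audio-processing snippet further down, but this body is illustrative rather than the project's code:

```javascript
// Convert Web Audio Float32 samples (nominally in [-1, 1]) to 16-bit signed PCM.
// Illustrative sketch: the name matches the helper called in this README's audio
// snippet, but this body is an assumption, not the project's implementation.
function convertFloat32ToInt16(float32Array) {
  const int16 = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    // The negative and positive halves of Int16 differ by one (-32768..32767).
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  // Int16Array uses platform byte order, which is little-endian on mainstream hardware.
  return int16.buffer;
}
```

Passing the returned `ArrayBuffer` to `WebSocket.send()` transmits the raw bytes as a single binary frame.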
- Static HTML/CSS/JavaScript (no build process)
- ES6 module-based component architecture
- Responsive UI with Tailwind CSS
- Interactive features such as click-to-navigate between linked results
- Netlify Functions: Serverless backend for API key management
- AssemblyAI: Real-time speech recognition via WebSocket streaming
- DeepL: Translation services
- Netlify: Hosting and serverless functions
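
```javascript
// AudioContext-based audio processing; `stream` (MediaStream) and `websocket` are created earlier
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1); // deprecated; AudioWorklet is its successor
source.connect(processor);
processor.connect(audioContext.destination); // Chromium only fires onaudioprocess when connected
processor.onaudioprocess = (event) => {
  if (websocket.readyState !== WebSocket.OPEN) return; // drop audio while reconnecting
  // Float32 to PCM16 conversion for AssemblyAI
  const pcm16Buffer = convertFloat32ToInt16(event.inputBuffer.getChannelData(0));
  websocket.send(pcm16Buffer);
};
```

```javascript
// Data-linking system between recognition and translation results
// (recognitionResult / translationResult are the two result elements being paired)
const resultId = `result_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
recognitionResult.setAttribute('data-result-id', resultId);
translationResult.setAttribute('data-translation-for', resultId);

// Smart scroll positioning with responsive layout detection:
// differing top offsets mean the panels are stacked (vertical/mobile layout)
const isVerticalLayout = finalResultsTop !== translationContainerTop;
const scrollBehavior = isVerticalLayout ? 'center' : 'position-sync';
```

- Primary: Tab audio capture using `getDisplayMedia()` (Chrome/Edge)
- Fallback: Microphone capture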
- Echo Cancellation: Disabled so that audio played through the speakers is not filtered out of the captured signal
- Connection Management: Automatic reconnection with backoff strategy
- Message Handling: Real-time processing of interim and final transcripts
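
The automatic reconnection with exponential backoff described above can be sketched as follows. The delay values, the `connect` callback, and both function names are hypothetical; this README does not document the app's actual parameters:

```javascript
// Exponential backoff: wait baseMs * 2^attempt before retrying, capped at maxMs.
// Hypothetical defaults; the app's real values are not documented in this README.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry `connect` (any function returning a Promise that resolves once the socket
// is usable) until it succeeds or maxAttempts is reached. `sleep` is injectable
// so the loop can be tested without real timers.
async function connectWithBackoff(connect, {
  maxAttempts = 5,
  baseMs = 1000,
  maxMs = 30000,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // give up after the last attempt
      await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
}
```

In an app like this one, `connect` would plausibly wrap fetching a fresh token and opening a new WebSocket. Adding random jitter to each delay is a common refinement so that many clients do not reconnect in lockstep.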
- Modern web browser with WebRTC support
- HTTPS hosting (required for microphone access)
- AssemblyAI API key
- DeepL API key
Create environment variables in your hosting platform:

```bash
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
DEEPL_API_KEY=your_deepl_api_key
```

```bash
npm install      # install dependencies
npm run dev      # run locally
npm run deploy   # deploy
```

- Connect the repository to Netlify
- Configure the environment variables in the Netlify dashboard
- Netlify deploys automatically on every push to the `main` branch
- Chrome/Edge: Full feature support including tab audio capture
- Firefox: Microphone-only audio capture
- Safari: Basic functionality with limitations

Required browser APIs:
- MediaDevices API (`getUserMedia()`, `getDisplayMedia()`)
- WebSocket API
- Web Audio API (`AudioContext`)
- Fetch API
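
A feature check over the APIs listed above can decide which capture path to offer. A minimal sketch; `detectCapabilities` and its result keys are hypothetical names, and the `env` parameter exists only so the check can be exercised outside a browser:

```javascript
// Report which of the browser APIs listed above are present.
// Hypothetical helper; `env` defaults to globalThis and can be swapped for a stub in tests.
function detectCapabilities(env = globalThis) {
  const mediaDevices = env.navigator?.mediaDevices;
  return {
    microphone: typeof mediaDevices?.getUserMedia === 'function',
    // Presence of getDisplayMedia does not guarantee tab audio: Firefox exposes the
    // method, yet audio capture there is limited to the microphone.
    displayMedia: typeof mediaDevices?.getDisplayMedia === 'function',
    webSocket: typeof env.WebSocket === 'function',
    webAudio: typeof (env.AudioContext ?? env.webkitAudioContext) === 'function',
    fetch: typeof env.fetch === 'function',
  };
}
```

A result with `displayMedia: false` would point the app at the microphone fallback.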

Speech recognition:
- Accuracy: >91% for clear English speech
- Latency: ~300ms from speech to transcript
- Sample Rate: 16 kHz mono audio
- Format: PCM16 (16-bit signed little-endian), which the AssemblyAI streaming API accepts

Translation:
- Speed: <500ms for typical phrases
- Quality: DeepL neural machine translation
- Languages: any language pair DeepL supports

Resource usage:
- Memory: <50MB typical usage
- CPU: <5% on modern devices
- Network: ~256 kbps upstream for the raw audio stream (16,000 samples/s × 16 bits)
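
The streaming figures can be cross-checked against the stream parameters: 16,000 samples/s × 16 bits × 1 channel is 256,000 bit/s (256 kbps) of raw PCM16, and the 4096-sample `ScriptProcessorNode` buffer used in the audio snippet holds 4096 / 16,000 s = 256 ms of audio per send, which is part of the end-to-end latency budget. A small hypothetical helper (`streamStats`) that derives these numbers:

```javascript
// Derive bandwidth and buffering figures from the audio stream parameters.
// Defaults mirror this README: 16 kHz, mono, 16-bit PCM, 4096-sample ScriptProcessor buffer.
function streamStats({ sampleRate = 16000, channels = 1, bitsPerSample = 16, bufferSize = 4096 } = {}) {
  const bitsPerSecond = sampleRate * channels * bitsPerSample;
  return {
    kbps: bitsPerSecond / 1000, // raw audio payload, excluding WebSocket framing
    bytesPerChunk: bufferSize * channels * (bitsPerSample / 8), // size of each send()
    chunkMs: (bufferSize * 1000) / sampleRate, // audio buffered before each send()
  };
}

const stats = streamStats(); // kbps 256, bytesPerChunk 8192, chunkMs 256
```

A smaller buffer (for example 1024 samples, 64 ms) would shorten this buffering delay at the cost of more frequent, smaller WebSocket messages.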
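
```javascript
// Secure token generation via Netlify function (the API key stays server-side)
const response = await fetch('/.netlify/functions/assemblyai-token');
if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
const { wsUrl } = await response.json();
const websocket = new WebSocket(wsUrl);
```

```javascript
// CORS-free translation via proxy function; `text` holds the transcript to translate
const response = await fetch('/.netlify/functions/deepl-translate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text, targetLang: 'KO' })
});
if (!response.ok) throw new Error(`Translation request failed: ${response.status}`);
```

- Server-side token generation prevents client-side API key exposure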
- Environment variables keep the API keys out of the source code
- Netlify Functions run the key-bearing requests in a server-side environment

HTTPS:
- Required for microphone access via `getUserMedia()`, which browsers only expose in secure contexts
- Pages served over HTTPS must use encrypted `wss://` WebSocket connections
- Protects against man-in-the-middle attacks
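
The client snippet above only shows the browser side of the translation proxy. A minimal sketch of what the `deepl-translate` function could look like, assuming Netlify's classic `handler(event)` signature, the DeepL API Free endpoint, and the `DEEPL_API_KEY` variable from the setup section; `createTranslateHandler` and the `translatedText` response field are hypothetical:

```javascript
// Hypothetical sketch of netlify/functions/deepl-translate.js (not the project's actual code).
// Assumes the DeepL API Free endpoint; DeepL API Pro keys use https://api.deepl.com instead.
const DEEPL_URL = 'https://api-free.deepl.com/v2/translate';

// Factory so `fetchImpl` and `apiKey` can be injected in tests.
function createTranslateHandler(fetchImpl = globalThis.fetch, apiKey = process.env.DEEPL_API_KEY) {
  return async (event) => {
    if (event.httpMethod !== 'POST') {
      return { statusCode: 405, body: 'Method Not Allowed' };
    }
    let payload;
    try {
      payload = JSON.parse(event.body || '{}');
    } catch {
      return { statusCode: 400, body: JSON.stringify({ error: 'Invalid JSON body' }) };
    }
    const { text, targetLang } = payload;
    if (!text || !targetLang) {
      return { statusCode: 400, body: JSON.stringify({ error: 'text and targetLang are required' }) };
    }
    // The DeepL key is attached here, on the server; the browser never sees it.
    const res = await fetchImpl(DEEPL_URL, {
      method: 'POST',
      headers: {
        Authorization: `DeepL-Auth-Key ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text: [text], target_lang: targetLang }),
    });
    if (!res.ok) {
      return { statusCode: 502, body: JSON.stringify({ error: `DeepL responded with ${res.status}` }) };
    }
    const data = await res.json();
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      // `translatedText` is a hypothetical field name for the client to read.
      body: JSON.stringify({ translatedText: data.translations[0].text }),
    };
  };
}

// In the real function file, Netlify's classic signature would be:
// exports.handler = createTranslateHandler();
```

Because the browser only ever calls the same-origin function, no CORS configuration on DeepL's side is needed.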

```text
assets/
├── js/
│   ├── modules/                  # Feature modules
│   │   ├── ui.js                 # UI management and navigation
│   │   ├── realtimeTranslation.js # Translation handling
│   │   ├── speechRecognition.js  # AssemblyAI integration
│   │   ├── notePanel.js          # Note management system
│   │   ├── noteStorage.js        # Local storage operations
│   │   └── noteInteraction.js    # Hover interactions
│   ├── utils/                    # Utility functions
│   └── app.js                    # Main application entry
├── css/
│   ├── styles.css                # Core styling
│   ├── highlight-effects.css     # Interactive animations
│   └── note-panel.css            # Note panel styling
└── ...
netlify/functions/                # Serverless backend
docs/                             # Documentation and analysis
```

- Note Management System: Hover-to-save functionality with sliding panel interface
- Local Storage: Persistent note storage with search, export (JSON/TXT), and management features
- Enhanced User Experience: Pin button overlays and smooth animations
- Interactive Result Navigation: Click-to-scroll linking between speech recognition and translation results
- Visual Feedback: Animations and hover effects for better interaction
- Responsive Navigation: Adjusts scroll position based on desktop/mobile layout
- Data Linking: Unique ID system connecting recognition results to translations
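
The persistent note storage and JSON/TXT export described above can be sketched with plain `localStorage` and `Blob` calls. The note shape, the storage key, and all four function names are assumptions for illustration; the project's `noteStorage.js` may differ:

```javascript
// Hypothetical note shape: { original, translation, createdAt } with an ISO timestamp.
const STORAGE_KEY = 'translationNotes'; // hypothetical localStorage key

// `storage` defaults to window.localStorage but accepts any getItem/setItem object.
function loadNotes(storage = globalThis.localStorage) {
  try {
    return JSON.parse(storage.getItem(STORAGE_KEY)) ?? [];
  } catch {
    return []; // a corrupted entry should not break the note panel
  }
}

function saveNotes(notes, storage = globalThis.localStorage) {
  storage.setItem(STORAGE_KEY, JSON.stringify(notes));
}

// Serialize notes for download as JSON or plain text.
function exportNotes(notes, format = 'json') {
  if (format === 'json') return JSON.stringify(notes, null, 2);
  if (format === 'txt') {
    return notes
      .map((n, i) => `[${i + 1}] ${n.createdAt}\n${n.original}\n${n.translation}`)
      .join('\n\n');
  }
  throw new Error(`Unsupported export format: ${format}`);
}

// Browser-only: offer the exported text as a file download.
function downloadNotes(notes, format = 'json') {
  const type = format === 'json' ? 'application/json' : 'text/plain';
  const url = URL.createObjectURL(new Blob([exportNotes(notes, format)], { type }));
  const link = document.createElement('a');
  link.href = url;
  link.download = `notes.${format}`;
  link.click();
  setTimeout(() => URL.revokeObjectURL(url), 0); // release the object URL after the click
}
```

Keeping serialization (`exportNotes`) separate from the DOM-dependent download step lets the export format be tested without a browser.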
- AssemblyAI API integration with dual audio capture
- AudioContext-based audio processing
- Enhanced error handling and reconnection logic
- Production-ready performance optimizations
- Initial release with Web Speech API
- Basic real-time translation functionality
- DeepL API integration via Netlify Functions
