Real-time Translator

English | 한글

A web application for real-time speech recognition and translation using AssemblyAI and DeepL APIs.

Features

Core Functionality

Real-time Speech Recognition: Speech-to-text with AssemblyAI WebSocket API v3
Instant Translation: Text translation using DeepL API
Dual Audio Capture: Supports both microphone input and system audio (tab audio)
Live Transcription: Real-time interim and final transcript display
Text Selection Translation: Translate any selected text on the page
Interactive Result Navigation: Click-to-scroll linking between speech recognition and translation results
Note Panel: Save original/translation pairs and download them as a file

Technical Highlights

High Accuracy: >91% speech recognition accuracy for clear English speech with AssemblyAI
Low Latency: ~300ms recognition latency for real-time performance
Audio Processing: AudioContext + PCM16 format for optimal compatibility
Reconnection Logic: Automatic WebSocket reconnection with exponential backoff

Architecture

Frontend

Static HTML/CSS/JavaScript (no build process)
ES6 module-based component architecture
Responsive UI with Tailwind CSS
Click-to-scroll navigation between recognition and translation results

Backend Services

Netlify Functions: Serverless backend for API key management

APIs and Services

AssemblyAI: Real-time speech recognition via WebSocket streaming
DeepL: Translation services
Netlify: Hosting and serverless functions

Technical Implementation

Speech Recognition

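// AudioContext-based audio processing (16 kHz mono)
const audioContext = new AudioContext({ sampleRate: 16000 });
// Note: createScriptProcessor() is deprecated in favor of AudioWorklet
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (event) => {
    // Float32 to PCM16 conversion for AssemblyAI
    const audioData = event.inputBuffer.getChannelData(0);
    const pcm16Buffer = convertFloat32ToInt16(audioData);
    websocket.send(pcm16Buffer);
};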
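
The snippet above calls convertFloat32ToInt16 without showing its body. The following is a minimal sketch of what such a helper might look like, assuming the input is a Float32Array of samples in the range [-1, 1] as produced by the Web Audio API. The clamping and scaling choices are assumptions and not necessarily what the project does.

```javascript
// Hypothetical helper: convert Web Audio Float32 samples to 16-bit signed PCM.
// The returned ArrayBuffer uses the platform's byte order (little-endian on
// the hardware browsers commonly run on).
function convertFloat32ToInt16(float32Samples) {
    const pcm16 = new Int16Array(float32Samples.length);
    for (let i = 0; i < float32Samples.length; i++) {
        // Clamp first so out-of-range samples do not wrap around
        const s = Math.max(-1, Math.min(1, float32Samples[i]));
        // Scale asymmetrically: -1 maps to -32768, +1 maps to 32767
        pcm16[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    }
    return pcm16.buffer; // ArrayBuffer, suitable for websocket.send()
}
```

Clamping before scaling matters because samples slightly above 1.0 would otherwise overflow the 16-bit range and produce audible clicks.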

Interactive Result Navigation

// Data-linking system between recognition and translation results
const resultId = `result_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
recognitionResult.setAttribute('data-result-id', resultId);
translationResult.setAttribute('data-translation-for', resultId);

// Smart scroll positioning with responsive layout detection
const isVerticalLayout = finalResultsTop !== translationContainerTop;
const scrollBehavior = isVerticalLayout ? 'center' : 'position-sync';
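
To illustrate how the two data attributes above could drive click-to-scroll, here is a small sketch that looks up the counterpart of a clicked element and scrolls it into view. The function names findLinkedElement and scrollToLinked are hypothetical, and the sketch always centers the target rather than reproducing the project's 'position-sync' mode.

```javascript
// Hypothetical helper: given a clicked result, find its linked counterpart.
// `root` is any object with querySelector (normally `document`).
function findLinkedElement(root, clicked) {
    const ownId = clicked.getAttribute('data-result-id');
    if (ownId) {
        // Clicked a recognition result: look for its translation
        return root.querySelector(`[data-translation-for="${ownId}"]`);
    }
    const targetId = clicked.getAttribute('data-translation-for');
    // Clicked a translation: look for the recognition result it belongs to
    return targetId ? root.querySelector(`[data-result-id="${targetId}"]`) : null;
}

// Hypothetical helper: scroll the linked element to the middle of its container.
function scrollToLinked(root, clicked) {
    const linked = findLinkedElement(root, clicked);
    if (linked) {
        linked.scrollIntoView({ behavior: 'smooth', block: 'center' });
    }
    return linked;
}
```

In a page this might be wired up with a single delegated click listener that passes document and the clicked result element to scrollToLinked.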

Dual Audio Strategy

Primary: Tab audio capture using getDisplayMedia() (Chrome/Edge)
Fallback: Microphone capture
Echo Cancellation: Disabled to allow speaker audio recognition
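
The strategy above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the function name captureAudioStream and its return shape are assumptions, and the media devices object is passed in as a parameter (normally navigator.mediaDevices) so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: try tab audio first, then fall back to the microphone.
async function captureAudioStream(mediaDevices) {
    if (typeof mediaDevices.getDisplayMedia === 'function') {
        try {
            // Tab audio is only delivered if the user opts in to sharing it
            const stream = await mediaDevices.getDisplayMedia({ video: true, audio: true });
            if (stream.getAudioTracks().length > 0) {
                return { stream, source: 'tab' };
            }
            // No audio track was shared: release the capture and fall back
            stream.getTracks().forEach((track) => track.stop());
        } catch (err) {
            // User cancelled the picker or capture is unsupported: fall through
        }
    }
    const stream = await mediaDevices.getUserMedia({
        // Echo cancellation is disabled so audio played through speakers is kept
        audio: { echoCancellation: false, channelCount: 1 },
    });
    return { stream, source: 'microphone' };
}
```

Checking for an audio track after getDisplayMedia() resolves is worthwhile because the promise can succeed with video only when the user leaves the audio option unticked.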

WebSocket Integration

Connection Management: Automatic reconnection with backoff strategy
Message Handling: Real-time processing of interim and final transcripts
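
As a rough illustration of reconnection with exponential backoff, the sketch below doubles the wait after each failed attempt up to a cap. All names and default values (backoffDelay, connectWithBackoff, a 1 s base, a 30 s cap, 5 attempts) are assumptions chosen for the example, not the project's actual settings.

```javascript
// Hypothetical helper: delay before reconnect attempt N (0-based), capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: open a socket and reopen it with growing delays on close.
// `createSocket` returns a WebSocket-like object; `schedule` defaults to setTimeout.
function connectWithBackoff(createSocket, onMessage, maxAttempts = 5, schedule = setTimeout) {
    let attempt = 0;
    const open = () => {
        const socket = createSocket();
        socket.onopen = () => { attempt = 0; }; // a successful connection resets the counter
        socket.onmessage = onMessage;
        socket.onclose = () => {
            if (attempt >= maxAttempts) return; // give up after too many failures
            const delay = backoffDelay(attempt);
            attempt += 1;
            schedule(open, delay);
        };
    };
    open();
}
```

With these defaults the waits are 1 s, 2 s, 4 s, 8 s and 16 s. A production version would usually add random jitter and skip reconnection when the close was requested by the user.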

Setup and Installation

Prerequisites

Modern web browser with WebRTC support
HTTPS hosting (browsers require a secure context for microphone access; localhost also qualifies)
AssemblyAI API key
DeepL API key

Environment Configuration

Set the following environment variables on your hosting platform:

ASSEMBLYAI_API_KEY=your_assemblyai_api_key
DEEPL_API_KEY=your_deepl_api_key

Local Development

npm install
npm run dev
npm run deploy

Netlify Deployment

Connect repository to Netlify
Configure environment variables in Netlify dashboard
Deploy automatically on push to main branch

Browser Compatibility

Supported Browsers

Chrome/Edge: Full feature support including tab audio capture
Firefox: Microphone-only audio capture
Safari: Basic functionality with limitations

Required APIs

WebRTC MediaDevices API
WebSocket API
AudioContext/Web Audio API
Fetch API

Performance Characteristics

Speech Recognition

Accuracy: >91% for clear English speech
Latency: ~300ms from speech to transcript
Sample Rate: 16kHz mono audio
Format: PCM16 for optimal AssemblyAI compatibility

Translation

Speed: <500ms for typical phrases
Quality: Professional-grade DeepL translation
Languages: Supports all DeepL language pairs

Resource Usage

Memory: <50MB typical usage
CPU: <5% on modern devices
Network: ~256 kbps upstream for uncompressed 16 kHz mono PCM16 audio (16,000 samples/s × 16 bits)

API Integration

AssemblyAI WebSocket

// Secure token generation via Netlify function
const response = await fetch('/.netlify/functions/assemblyai-token');
const { wsUrl } = await response.json();
const websocket = new WebSocket(wsUrl);
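
For context, the server side of this exchange might look roughly like the sketch below. It is not the project's actual function: the factory name createTokenHandler is hypothetical, and the AssemblyAI endpoint, query parameters, and response field reflect my understanding of the v3 streaming API and should be checked against the current AssemblyAI documentation. The only part taken from the client code above is the { wsUrl } response shape.

```javascript
// Hypothetical sketch of netlify/functions/assemblyai-token.
// `fetchImpl` is injected so the handler can be exercised without network access.
function createTokenHandler(apiKey, fetchImpl) {
    return async function handler() {
        if (!apiKey) {
            return { statusCode: 500, body: JSON.stringify({ error: 'ASSEMBLYAI_API_KEY is not set' }) };
        }
        // Assumed endpoint: exchanges the long-lived API key for a short-lived token
        const res = await fetchImpl(
            'https://streaming.assemblyai.com/v3/token?expires_in_seconds=60',
            { headers: { Authorization: apiKey } }
        );
        if (!res.ok) {
            return { statusCode: 502, body: JSON.stringify({ error: 'Token request failed' }) };
        }
        const { token } = await res.json();
        // Only the temporary token reaches the browser, never the API key
        const params = new URLSearchParams({ sample_rate: '16000', token });
        return {
            statusCode: 200,
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ wsUrl: `wss://streaming.assemblyai.com/v3/ws?${params}` }),
        };
    };
}

// In the function file, this would be exported along the lines of:
// exports.handler = createTokenHandler(process.env.ASSEMBLYAI_API_KEY, fetch);
```

Keeping the token lifetime short limits the damage if a WebSocket URL leaks, since the token is only needed to open the connection.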

DeepL Translation

// CORS-free translation via proxy function
const response = await fetch('/.netlify/functions/deepl-translate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, targetLang: 'KO' })
});
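
A proxy function matching the request above could be sketched as follows. This is an assumption-laden outline rather than the project's code: createTranslateHandler, the json helper, and the { translation } response shape are hypothetical, and the free-tier DeepL host is used here (paid plans use api.deepl.com instead).

```javascript
// Hypothetical sketch of netlify/functions/deepl-translate.
// `fetchImpl` is injected so the handler can be exercised without network access.
function createTranslateHandler(apiKey, fetchImpl) {
    return async function handler(event) {
        let payload;
        try {
            payload = JSON.parse(event.body || '{}');
        } catch (err) {
            return json(400, { error: 'Body must be valid JSON' });
        }
        const { text, targetLang } = payload;
        if (typeof text !== 'string' || !text.trim() || !targetLang) {
            return json(400, { error: 'text and targetLang are required' });
        }
        const res = await fetchImpl('https://api-free.deepl.com/v2/translate', {
            method: 'POST',
            headers: {
                Authorization: `DeepL-Auth-Key ${apiKey}`,
                'Content-Type': 'application/json',
            },
            // DeepL expects an array of strings and a snake_case target_lang
            body: JSON.stringify({ text: [text], target_lang: targetLang }),
        });
        if (!res.ok) {
            return json(502, { error: `DeepL responded with ${res.status}` });
        }
        const data = await res.json();
        return json(200, { translation: data.translations[0].text });
    };
}

// Small helper to build a Netlify-style JSON response
function json(statusCode, payload) {
    return {
        statusCode,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
    };
}

// In the function file, this would be exported along the lines of:
// exports.handler = createTranslateHandler(process.env.DEEPL_API_KEY, fetch);
```

Because the browser only talks to the same-origin function, no CORS configuration is needed and the DeepL key stays on the server.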

Security Considerations

API Key Protection

Server-side token generation prevents client-side API key exposure
Environment variable configuration for secure key management
Netlify Functions provide a secure execution environment

HTTPS Requirement

Required for microphone access via getUserMedia()
Ensures secure WebSocket connections
Protects against man-in-the-middle attacks

Code Structure

assets/
├── js/
│   ├── modules/           # Feature modules
│   │   ├── ui.js         # UI management and navigation
│   │   ├── realtimeTranslation.js  # Translation handling
│   │   ├── speechRecognition.js   # AssemblyAI integration
│   │   ├── notePanel.js  # Note management system
│   │   ├── noteStorage.js # Local storage operations
│   │   └── noteInteraction.js # Hover interactions
│   ├── utils/            # Utility functions
│   └── app.js           # Main application entry
├── css/
│   ├── styles.css        # Core styling
│   ├── highlight-effects.css  # Interactive animations
│   └── note-panel.css    # Note panel styling
└── ...

netlify/functions/    # Serverless backend
docs/                # Documentation and analysis

Version History

v2.2 (Current)

Note Management System: Hover-to-save functionality with sliding panel interface
Local Storage: Persistent note storage with search, export (JSON/TXT), and management features
Enhanced User Experience: Pin button overlays and smooth animations

v2.1

Interactive Result Navigation: Click-to-scroll linking between speech recognition and translation results
Visual Feedback: Animations and hover effects for better interaction
Responsive Navigation: Adjusts scroll position based on desktop/mobile layout
Data Linking: Unique ID system connecting recognition results to translations

v2.0

AssemblyAI API integration with dual audio capture
AudioContext-based audio processing
Enhanced error handling and reconnection logic
Production-ready performance optimizations

v1.0

Initial release with Web Speech API
Basic real-time translation functionality
DeepL API integration via Netlify Functions
