Hands-free browsing powered by head tracking and Chrome Built-in AI
Nutshell makes the web accessible to everyone by combining computer vision-based head tracking with Chrome's Built-in AI to enable completely hands-free browsing and instant link summaries.
Nutshell showcases the power of Chrome's Built-in AI APIs by bringing sophisticated, privacy-first accessibility directly into the browser. This project demonstrates real-world application of on-device AI to solve critical accessibility challenges.
- Streaming summarization with
summarizeStreaming()for real-time updates - Configurable summary types:
key-points,tl;dr,teaser,headline - Markdown-formatted output with adjustable length
- 100% on-device processing with Gemini Nano
Implementation Highlight:
const summarizer = await ai.summarizer.create({
type: 'key-points',
format: 'markdown',
length: 'medium',
sharedContext: 'This is an article from a webpage.',
outputLanguage: 'en'
});
const stream = summarizer.summarizeStreaming(processedText);
for await (const chunk of stream) {
// Real-time UI updates as summary generates
updateTooltip(chunk);
}- Specialized content processing for YouTube
- Custom system prompts for context-aware summarization
- Streaming responses with
promptStreaming()
Example - YouTube Video Summaries:
const session = await ai.languageModel.create({
expectedOutputs: [{ type: 'text', languages: ['en'] }]
});
const prompt = `Analyze this YouTube video and create a structured summary:
TRANSCRIPT: ${captionText}
DESCRIPTION: ${videoDescription}
Provide: Main theme, key points (3-5 bullets), important timestamps`;
const stream = session.promptStreaming(prompt);Privacy by Design:
- ✅ Zero data sent to external servers
- ✅ Camera feed processed locally with Human.js
- ✅ AI runs entirely in browser with Gemini Nano
- ✅ Perfect for users with disabilities who need privacy-respecting tools
Accessibility at Scale:
- ✅ No expensive hardware ($0 vs. $10,000+ for eye-gaze systems)
- ✅ No cloud API costs
- ✅ Instant responses (no network latency)
- ✅ Works offline after model download
- ✅ Democratizes assistive technology
See Nutshell in action - browse Wikipedia entirely hands-free and get AI-powered summaries with just head movements!
For millions of people with mobility impairments, ALS, cerebral palsy, RSI, or temporary disabilities, using a traditional mouse and keyboard is painful, difficult, or impossible. Existing assistive technologies often:
- Cost $10,000+ for eye-gaze systems
- Require specialized hardware and setup
- Send data to cloud servers (privacy concerns)
- Don't work in web browsers
Meanwhile, browsing the web means clicking countless links just to preview content—creating friction for everyone, especially users relying on alternative input methods.
Nutshell solves both problems with two powerful features:
Head Tracking:
- Look left/right → cursor moves horizontally
- Tilt up/down → cursor moves vertically
- Uses One-Euro filter for smooth, jitter-free movement
- Personalized calibration adapts to your natural range
Mouth-Open Clicking:
- Open mouth → triggers click
- Calibrated to your facial structure
- 800ms cooldown prevents accidental double-clicks
- Real-time visual feedback
Dwell-Based Interaction:
- Hover on links → automatic activation
- Visual progress indicator (growing ring)
- Magnetic snapping to nearby clickables (45px radius)
- Configurable timing (300-1500ms)
Smart Navigation Zones:
- Look top/bottom → auto-scroll (180px zones)
- Look left edge → browser back (80px zone)
- Look right edge → browser forward (80px zone)
- Colored visual feedback shows active zones
Chrome's Built-in AI generates instant summaries for:
- 📄 Web articles - Clean, concise key points
- 🎥 YouTube videos - Summarized from captions + description
Special Feature: YouTube Caption Extraction
- Intercepts XHR requests for captions
- Supports JSON3 (new) and XML (legacy) formats
- Combines transcript + description for better context
- All processed on-device by Gemini Nano
No cloud, no data collection, just pure private accessibility.
- 🎯 Head tracking cursor control (no hands required)
- 👄 Mouth-open clicking with calibration
- ⏱️ Dwell activation (hover to click)
- 🧲 Magnetic snapping helps target links
- 📜 Auto-scrolling zones (top/bottom)
- ⬅️➡️ Browser navigation zones (left/right edges)
- 🤖 On-device AI (Gemini Nano via Chrome)
- 📺 YouTube caption extraction & summarization
- ⚡ Real-time streaming updates
- 💾 Smart caching (30-minute retention)
- 🎨 Dual display (tooltip + side panel)
- 🎚️ Adjustable dwell time (300-1500ms)
- 🎯 Head calibration (5-point personalization)
- 👄 Mouth calibration (adaptive thresholds)
- 🎨 Display modes (tooltip, panel, or both)
- ⚙️ API choice (Summarizer or custom Prompt)
- 🔒 100% local processing (no external servers)
- 📷 Webcam-based (any standard camera)
- ⚡ GPU-accelerated tracking (WebGL)
- 🎯 Lightweight (~2MB extension)
-
Chrome Dev or Canary (version 128+)
- Download: Chrome Dev or Chrome Canary
-
Enable Chrome AI Flags:
- Navigate to
chrome://flags/#optimization-guide-on-device-model - Set to "Enabled BypassPerfRequirement"
- Navigate to
chrome://flags/#prompt-api-for-gemini-nano - Set to "Enabled"
- Navigate to
chrome://flags/#summarization-api-for-gemini-nano - Set to "Enabled"
- Restart Chrome
- Navigate to
-
Verify Model Download:
- Open DevTools Console (F12)
- Run:
await ai.summarizer.availability() - Should return
"readily"or"available" - If
"downloadable", wait 5-10 minutes for model download
-
Clone repository:
git clone https://github.com/yourusername/nutshell.git cd nutshell -
Load extension:
- Open
chrome://extensions/ - Enable "Developer mode" (toggle top-right)
- Click "Load unpacked"
- Select the repository folder
- Open
-
Grant permissions:
- Click Nutshell icon in toolbar
- Allow camera access when prompted
- Wait for models to load (~5-10 seconds)
- Open Nutshell side panel (click extension icon)
- Toggle "Enable Head Tracking"
- Grant camera permission
- Wait for face detection models to load
- Click "Calibrate Head Tracking" (or press
Alt+H) - Follow 5-point calibration:
- Look at CENTER → press SPACE
- Look LEFT → press SPACE
- Look RIGHT → press SPACE
- Look UP → press SPACE
- Look DOWN → press SPACE
- Cursor now follows your head! 🎉
- Toggle "Enable Mouth Click"
- Click "Calibrate Mouth Click" (or press
Alt+M) - Keep mouth closed when prompted
- Open mouth wide when prompted
- Test by opening mouth to click
- Navigate - Move head to control cursor
- Preview links - Hover over any link for 600ms
- Click - Open mouth OR dwell on buttons/links
- Scroll - Look at top (scroll up) or bottom (scroll down)
- Navigate - Look at left edge (back) or right edge (forward)
| Shortcut | Action |
|---|---|
Alt+H |
Calibrate head tracking |
Alt+M |
Calibrate mouth clicking |
Esc |
Cancel active summary |
- Human.js - 468-point facial landmark detection
- One-Euro Filter - Jitter elimination (fc=0.4, β=0.0025)
- Adaptive Smoothing - Lerp interpolation (0.06 center, 0.10 edge)
- WebGL Acceleration - GPU-based processing
- Chrome Summarizer API - Key-point extraction, markdown formatting
- Chrome Prompt API - Custom prompting for specialized content
- Gemini Nano - On-device language model
- Streaming Responses - Real-time character-by-character updates
- Readability.js - Mozilla's article extraction
- XHR Interception - YouTube caption capture
- Smart Truncation - Beginning/middle/end preservation for long content
- Manifest V3 - Modern extension architecture
- Side Panel API - Dedicated control interface
- Content Scripts - Page interaction & tooltip rendering
- Background Service Worker - AI processing & job management
Webcam Feed → Human.js (Face Detection) → Facial Landmarks (468 points)
→ Head Pose Estimation (pitch/yaw) → One-Euro Filter (smoothing)
→ Screen Coordinates → Mouse Events → Dwell Detection → Action
Link Hover (600ms) → Fetch Page HTML → Readability.js (Extract Content)
→ Smart Truncation (fit context) → Chrome Summarizer/Prompt API
→ Gemini Nano Processing → Streaming Response → Tooltip Display
Page Load → Inject XHR Interceptor → Monitor Network Requests
→ Capture Caption Response (JSON/XML) → Parse Timestamps & Text
→ Combine with Video Description → Custom Prompt API Call
→ Structured Summary → Display
nutshell/
├── manifest.json # Extension config (MV3)
├── background.js # AI processing, job management
├── content.js # Link detection, tooltips
├── sidepanel.js/.html # Settings UI
│
├── gaze/ # Head tracking system
│ ├── gaze-core.js # Computer vision, Human.js
│ ├── gaze-dwell.js # Dwell detection, interaction
│ ├── gaze-overlay.js # Visual feedback (zones)
│ ├── head-cal.js # Head calibration
│ ├── mouth-cal.js # Mouth click calibration
│ └── human/ # Human.js + TensorFlow models
│
├── youtube/ # YouTube features
│ ├── youtube-caption-handler.js # XHR interception
│ └── youtube-content-bridge.js # Content script bridge
│
├── twitter/ # Twitter/X integration
│ └── twitter-interceptor.js # GraphQL interception
│
├── lib/ # Third-party libraries
│ └── Readability.js # Mozilla content extraction
│
└── icons/ # Extension icons
Enable logging in respective files:
content.js:const DEBUG_ENABLED = truegaze-dwell.js:const DEBUG_DWELL = true
Open DevTools console on any page:
// Check Summarizer API
const summarizerStatus = await ai.summarizer.availability();
console.log('Summarizer:', summarizerStatus);
// Check Prompt API
const promptStatus = await ai.languageModel.availability();
console.log('Prompt API:', promptStatus);
// Test summarization
if (summarizerStatus === 'readily') {
const summarizer = await ai.summarizer.create({
type: 'key-points',
format: 'markdown',
length: 'short'
});
const result = await summarizer.summarize('Your text here...');
console.log(result);
}Check browser console for:
- Frame processing times (target: 30fps)
- AI streaming latency
- Cache hit rates
- Job abort reasons
- ✅ Verify Chrome flags enabled (see Installation)
- ✅ Check API status:
await ai.summarizer.availability() - ✅ Wait for model download (~5 min first time)
- ✅ Restart Chrome after enabling flags
- ✅ Good lighting (front-facing light works best)
- ✅ Camera permissions granted
- ✅ Recalibrate if cursor feels off
- ✅ Keep torso stable, move head not body
- ✅ Recalibrate head tracking
- ✅ Improve lighting conditions
- ✅ Ensure stable seated position
- ✅ Adjust
HEAD_FILTER_MIN_CUTOFFingaze-core.js
- ✅ Recalibrate mouth detection
- ✅ Ensure camera sees mouth clearly
- ✅ Toggle "Enable Mouth Click" on
- ✅ Exaggerate opening during calibration
| Feature | Specification |
|---|---|
| AI Model | Gemini Nano (Chrome Built-in) |
| Face Detection | 468-point facial landmarks (Human.js) |
| Signal Filter | One-Euro (fc=0.4, β=0.0025, d_cutoff=1.0) |
| Smoothing | Adaptive lerp (0.06 center, 0.10 edge) |
| Dwell Time | 600ms default (300-1500ms range) |
| Click Cooldown | 800ms (mouth-open) |
| Snap Radius | 45px magnetic targeting |
| Scroll Zones | 180px top/bottom edges |
| Nav Zones | 80px left/right edges |
| Cache TTL | 30 minutes |
| Max Content | 4000 chars (Summarizer), 3000 chars (Prompt) |
- ♿ Users with mobility impairments (paralysis, ALS, cerebral palsy)
- 🤕 Repetitive strain injury (RSI) prevention/management
- 🩹 Temporary disabilities (broken arm, surgery recovery)
- 🧠 Alternative input for motor control challenges
- 📝 Hands-free research while taking notes
- 🍕 Browse while eating or multitasking
- 💻 Second screen setups
- ⚡ Quick link previews without navigation
- 📚 Wikipedia exploration
- 📄 Academic paper browsing
- 📰 News aggregation
- 🎓 Topic learning with previews
Built with amazing open-source tools:
- Human.js by Vladimir Mandic - Face tracking
- Readability.js by Mozilla - Content extraction
- Chrome Built-in AI by Google - On-device AI with Gemini Nano
- One-Euro Filter by Géry Casiez - Signal smoothing algorithm
MIT License - see LICENSE for details
Built for the Chrome Built-in AI Hackathon! Contributions welcome:
- Fork the repository
- Create feature branch (
git checkout -b feature/amazing-feature) - Commit changes (
git commit -m 'Add amazing feature') - Push branch (
git push origin feature/amazing-feature) - Open Pull Request
- Chrome Built-in AI Documentation
- Gemini Nano Information
- Human.js GitHub
- Web Accessibility Guidelines
Made with ❤️ for the Chrome Built-in AI Hackathon
Empowering digital independence through on-device AI
🥜 In a nutshell: Browse hands-free, understand faster.
