katkes/ConUHacksX

ConUHacksX

Idea

Working with your computer can be tedious and manual. What if it weren't? Our project lets users seamlessly present and use their computer to perform tasks such as going over slides and documents, sending emails, and more!

Features

  • Operate your computer with your camera
  • Operate your computer with your voice
  • Presenter mode
    • Record presenter
    • Generate transcript and subtitles
  • OS control actions
    • Left click, right click, move mouse
    • Drag-and-drop, scroll, click-and-hold, window snapping, app switching, volume/brightness, media keys
  • Macro system
    • User-defined sequences triggered by a gesture or voice intent
  • Gesture personalization
    • Map your own gestures to actions and save presets
  • Accessibility modes
    • Dwell-click, sticky keys, large-pointer mode, reduced motion
  • Collaboration tools
    • Remote slide control, laser pointer overlay, live captions
  • Telemetry & debugging
    • Action logs, latency charts, and a safe replay sandbox for testing
  • Gemini intelligence
    • Intent parsing: convert natural language into a structured action plan
    • Context-aware shortcuts: detect on-screen context and auto-switch profiles
    • Presenter assistant: generate speaker notes, summarize slides, prep Q&A
    • Meeting mode: transcript summaries, action items, slide search by topic
    • Gesture refinement: reduce false positives using vision prompts
    • Error recovery: explain failures and suggest fixes
  • Safety & privacy
    • Pause/kill toggle in the UI and a quick hotkey
    • Local-only mode for users who don’t want cloud calls
    • Clear opt-in for any external API data sharing

Tech Stack

  • Languages
    • Python
    • JavaScript
  • Libraries
    • OpenCV
  • APIs
    • Gemini
  • Supported Operating Systems
    • Mac

Steps

1 - Set up gestures (Gesture Model)

We want to set up detection of the following gestures:

  • Left hand index finger
  • Right hand index finger
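
Detecting an extended index finger is usually a geometric test on hand landmarks. As a sketch (assuming MediaPipe-style landmarks, i.e. 21 normalized points per hand where index 8 is the index fingertip and 6 is the joint below it; the margin value is an illustrative guess, not the project's actual threshold):

```python
# Hedged sketch: detect an extended index finger from MediaPipe-style
# hand landmarks (21 normalized (x, y) points, y grows downward).
# Indices 8 (index fingertip) and 6 (index PIP joint) follow the
# MediaPipe Hands convention; the margin is an illustrative assumption.

INDEX_TIP = 8
INDEX_PIP = 6

def index_finger_extended(landmarks, margin=0.02):
    """landmarks: list of (x, y) tuples in normalized image coordinates."""
    tip_y = landmarks[INDEX_TIP][1]
    pip_y = landmarks[INDEX_PIP][1]
    # Fingertip noticeably above the joint => finger is extended.
    return tip_y < pip_y - margin

# Toy example: fingertip above the PIP joint.
lm = [(0.5, 0.5)] * 21
lm[INDEX_PIP] = (0.5, 0.6)
lm[INDEX_TIP] = (0.5, 0.4)
print(index_finger_extended(lm))  # True
```

Handedness (left vs. right) comes from the hand-tracking library itself, so the same test covers both gestures.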

2 - OS Link

We want to set up operating-system functions that link the gestures to operations. Targeted operations:

  • Left click
  • Right click
  • Moving mouse
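
One way to structure this link is a dispatch table from gesture names to OS callables. The gesture names and event shape below are hypothetical; the real bindings live in the project's OSLink layer:

```python
# Hedged sketch of a gesture -> OS action dispatch table. The action
# names and event fields are assumptions; injecting the OS callables
# keeps the dispatcher testable without moving the real mouse.

def make_dispatcher(left_click, right_click, move_mouse):
    """Wire gesture names to OS-level callables."""
    return {
        "left_index_tap": lambda e: left_click(),
        "right_index_tap": lambda e: right_click(),
        "index_move": lambda e: move_mouse(e["x"], e["y"]),
    }

def handle(dispatcher, event):
    action = dispatcher.get(event["gesture"])
    if action is not None:
        action(event)

# Example with stub callables that just record what happened:
log = []
d = make_dispatcher(
    left_click=lambda: log.append("left"),
    right_click=lambda: log.append("right"),
    move_mouse=lambda x, y: log.append(("move", x, y)),
)
handle(d, {"gesture": "index_move", "x": 120, "y": 45})
print(log)  # [('move', 120, 45)]
```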

3 - UI Setup

This is a desktop application built with Electron. We want a clean, slick UI that's intuitive for users.

Usage

Run the app (window menu):

poetry run conux

Test detection:

Use the window menu and select "Gesture tester".

Controls: C clear, Q quit.

Tips:

  • Left hand: pinch to click/drag. Right hand: swipe left/right.
  • Try pinches and short swipes at different distances from the camera.
  • If detection is too sensitive, tune thresholds in gesture_model/src/cv/gesture_detector.py.
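
A pinch threshold of the kind tuned in gesture_detector.py is typically a distance test between the thumb tip and index fingertip. As a hedged sketch (MediaPipe-style landmark indices 4 and 8; the threshold value here is illustrative, not the project's real constant):

```python
# Hedged sketch of a pinch check: thumb tip (4) and index tip (8)
# closer than a threshold in normalized coordinates counts as a pinch.
# The threshold value is illustrative; the real constants live in
# gesture_model/src/cv/gesture_detector.py.
import math

THUMB_TIP, INDEX_TIP = 4, 8
PINCH_THRESHOLD = 0.05  # raise if pinches are missed, lower if too sensitive

def is_pinch(landmarks, threshold=PINCH_THRESHOLD):
    (x1, y1), (x2, y2) = landmarks[THUMB_TIP], landmarks[INDEX_TIP]
    return math.hypot(x2 - x1, y2 - y1) < threshold

lm = [(0.0, 0.0)] * 21
lm[THUMB_TIP] = (0.10, 0.10)
lm[INDEX_TIP] = (0.12, 0.11)
print(is_pinch(lm))  # True: tips ~0.022 apart
```

Because landmarks are normalized to the frame, the same threshold behaves differently at different distances from the camera, which is why the tips above suggest testing at several ranges.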

Python (gesture_model)

Setup

  • cd gesture_model
  • poetry install

Run

  • poetry run conux --mode all
  • poetry run conux --mode cv
  • poetry run conux --mode input
  • poetry run conux --mode audio
  • poetry run conux --mode summarize
  • poetry run conux --mode summarize_video
  • poetry run conux --mode screen
  • poetry run conux --mode mux
  • poetry run conux --mode capture (screen + audio together, then mux)
  • poetry run conux --mode audio --audio-translate

Configuration (Python)

Central defaults live in gesture_model/src/config.py (see AudioConfig). You can either:

  • Set env vars for one-off runs, or
  • Import and pass a config in code (e.g., audio.streaming.run(config=...)).
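
The env-var-or-code split can be sketched as a dataclass with defaults plus a `from_env` constructor. The field and env names below mirror the README, but the actual `AudioConfig` in gesture_model/src/config.py may differ:

```python
# Hedged sketch of env-var overrides layered on dataclass defaults,
# in the spirit of gesture_model/src/config.py. Field names are
# assumptions based on the env vars documented in this README.
import os
from dataclasses import dataclass

@dataclass
class AudioConfig:
    backend: str = "vosk"
    sentence_gap: float = 1.2
    max_sentence_chars: int = 120

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        return cls(
            backend=env.get("AUDIO_BACKEND", cls.backend),
            sentence_gap=float(env.get("AUDIO_SENTENCE_GAP", cls.sentence_gap)),
            max_sentence_chars=int(env.get("AUDIO_MAX_SENTENCE_CHARS",
                                           cls.max_sentence_chars)),
        )

cfg = AudioConfig.from_env({"AUDIO_BACKEND": "speech_recognition"})
print(cfg.backend, cfg.sentence_gap)  # speech_recognition 1.2
```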

Components & modes

  • cv: camera + gesture detection + OSLink mouse control.
  • audio: live speech-to-text with optional translation + subtitle buffering.
  • screen: screen recording via ffmpeg.
  • mux: mux latest screen recording with saved audio clips.
  • capture: runs audio + screen together, then muxes; sets AUDIO_SAVE=1 and SCREEN_SAVE=1.
  • menu: simple window menu to launch CV/tester tools.

Live subtitles (audio streaming)

Default backend is Vosk with partials enabled; switch to SpeechRecognition with AUDIO_BACKEND=speech_recognition.

Run with Poetry:

AUDIO_BACKEND=vosk poetry run conux --mode audio

Translate audio on the fly:

AUDIO_TRANSLATE=1 AUDIO_TRANSLATE_TARGET=fr AUDIO_TRANSLATE_SOURCE=auto \
AUDIO_BACKEND=vosk poetry run conux --mode audio

Or use the CLI flag:

poetry run conux --mode audio --audio-translate

First run auto-downloads the Vosk model to ~/.cache/conux/vosk. Override with:

  • VOSK_MODEL_CACHE_DIR=/path/to/cache
  • VOSK_MODEL_URL=... (model zip URL)
  • VOSK_MODEL_PATH=/path/to/extracted/model (skip download)

Audio pipeline configuration:

  • AUDIO_BACKEND=vosk|speech_recognition
  • AUDIO_STREAM_PARTIALS=1 (emit partial Vosk hypotheses; default on)
  • AUDIO_TRANSLATE=1, AUDIO_TRANSLATE_TARGET=en, AUDIO_TRANSLATE_SOURCE=auto
  • AUDIO_TRANSLATE_PARTIALS=1
  • AUDIO_SENTENCE_GAP=1.2, AUDIO_MAX_SENTENCE_SECONDS=10, AUDIO_MAX_SENTENCE_CHARS=120
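
The three sentence knobs above likely act as flush triggers for the subtitle buffer: a pause, a time cap, or a length cap ends the current sentence. The exact buffering logic is an assumption; a minimal sketch:

```python
# Hedged sketch of subtitle sentence flushing driven by the three knobs
# above. The pipeline's real buffering logic may differ.

def should_flush(gap_since_last_word, sentence_seconds, sentence_chars,
                 gap=1.2, max_seconds=10.0, max_chars=120):
    """Flush the buffered sentence when any limit is exceeded."""
    return (
        gap_since_last_word >= gap          # speaker paused (AUDIO_SENTENCE_GAP)
        or sentence_seconds >= max_seconds  # AUDIO_MAX_SENTENCE_SECONDS
        or sentence_chars >= max_chars      # AUDIO_MAX_SENTENCE_CHARS
    )

print(should_flush(0.3, 4.0, 40))   # False: still mid-sentence
print(should_flush(1.5, 4.0, 40))   # True: pause exceeded the gap
```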

Audio device selection:

  • AUDIO_DEVICE_INDEX=0 (force a specific mic index)
  • AUDIO_SKIP_DEVICE_SCAN=1 (skip device enumeration)

SpeechRecognition tuning:

  • AUDIO_LISTEN_TIMEOUT=5, AUDIO_PHRASE_TIME_LIMIT=5, AUDIO_QUIET=1
  • AUDIO_DYNAMIC_ENERGY=1, AUDIO_ENERGY_THRESHOLD=...
  • AUDIO_PAUSE_THRESHOLD=..., AUDIO_PHRASE_THRESHOLD=...
  • AUDIO_NON_SPEAKING_DURATION=...

Vosk streaming tuning:

  • VOSK_LOG_LEVEL=-1
  • AUDIO_SAMPLE_RATE=16000, AUDIO_CHUNK_SIZE=8000
  • AUDIO_PRE_ROLL_SECONDS=0.5
  • AUDIO_PREPROCESS=1, AUDIO_TARGET_RMS=3000
  • AUDIO_MAX_GAIN=3.0, AUDIO_GATE_THRESHOLD=400
  • VOSK_MODEL_PATH=/path/to/model
  • VOSK_MODEL_CACHE_DIR=..., VOSK_MODEL_VARIANT=large|small
  • VOSK_MODEL_URL=..., VOSK_MODEL_ZIP_PATH=...
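
The preprocessing knobs (AUDIO_PREPROCESS, AUDIO_TARGET_RMS, AUDIO_MAX_GAIN, AUDIO_GATE_THRESHOLD) most plausibly describe a noise gate plus capped RMS normalization. The project's actual DSP may differ; a minimal sketch under that assumption:

```python
# Hedged sketch of what the preprocessing knobs commonly mean:
# gate near-silent frames, then scale toward a target RMS with a
# capped gain. Values mirror the documented defaults.
import math

def preprocess(samples, target_rms=3000, max_gain=3.0, gate_threshold=400):
    """samples: non-empty iterable of int16 PCM values. Returns a list."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < gate_threshold:
        return [0] * len(samples)           # noise gate: drop the frame
    gain = min(target_rms / rms, max_gain)  # normalize, but cap the boost
    return [max(-32768, min(32767, int(s * gain))) for s in samples]

quiet = [10, -12, 8, -9]  # RMS ~10, well below the gate
print(preprocess(quiet))  # [0, 0, 0, 0]
```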

CLI audio flags (override env defaults):

  • --audio-backend vosk|speech_recognition
  • --audio-stream-partials/--no-audio-stream-partials
  • --audio-translate, --audio-translate-target, --audio-translate-source
  • --audio-translate-partials/--no-audio-translate-partials
  • --audio-sentence-gap, --audio-max-sentence-seconds, --audio-max-sentence-chars
  • --captions (audio/capture modes), --text-mode original|translated

Captions overlay (desktop subtitle bar):

  • --captions to show the overlay in audio or capture mode.
  • --text-mode original|translated selects what the overlay shows.
  • If --text-mode translated is used, translation is enabled automatically.
  • The overlay is a topmost window anchored to the bottom of the desktop.
  • When using --mode capture, the overlay is part of the desktop and will be recorded by screen capture if your OS allows capturing topmost windows.

Vosk required/optional env vars:

  • Required to use Vosk: none (default backend)
  • Optional model control:
    • VOSK_MODEL_PATH=/path/to/model (skip download, point at extracted model)
    • VOSK_MODEL_CACHE_DIR=... (override cache dir for auto-download)
    • VOSK_MODEL_VARIANT=large|small (select default model URL)
    • VOSK_MODEL_URL=... (custom model zip URL)
    • VOSK_MODEL_ZIP_PATH=... (use a pre-downloaded zip)
  • Optional streaming/audio knobs:
    • AUDIO_STREAM_PARTIALS=1 (emit partial hypotheses)
    • AUDIO_SAMPLE_RATE=16000, AUDIO_CHUNK_SIZE=8000
    • AUDIO_PRE_ROLL_SECONDS=0.5
    • AUDIO_PREPROCESS=1, AUDIO_TARGET_RMS=3000
    • AUDIO_MAX_GAIN=3.0, AUDIO_GATE_THRESHOLD=400
    • VOSK_LOG_LEVEL=-1

Required env vars (only when needed)

There are no required env vars for the default poetry run conux --mode all flow. Set these only if your environment needs them:

  • FFMPEG_PATH=/path/to/ffmpeg if ffmpeg is not on PATH (screen/mux).
  • FFPROBE_PATH=/path/to/ffprobe if ffprobe is not on PATH (mux duration detection).
  • VOSK_MODEL_PATH=/path/to/model if you run AUDIO_BACKEND=vosk without allowing the auto-download.
  • SCREEN_INPUT=... if the default screen capture input does not work on your OS.
  • AUDIO_DEVICE_INDEX=... if you need to force a non-default microphone.

Audio saving

Set one of:

  • AUDIO_SAVE=1 (saves to audio_recordings/)
  • AUDIO_SAVE_DIR=/path/to/dir

Transcript saving

Set one of:

  • TRANSCRIPT_SAVE=1 (saves to repo-root transcripts/)
  • TRANSCRIPT_SAVE_DIR=/path/to/dir
  • TRANSCRIPT_SAVE_PATH=/path/to/transcript.txt

Note: transcripts/ is intentionally untracked; do not commit transcript files.

Screen recording (ffmpeg required)

Set one of:

  • SCREEN_SAVE=1 (saves to screen_recordings/)
  • SCREEN_SAVE_DIR=/path/to/dir

Optional:

  • FFMPEG_PATH=ffmpeg
  • SCREEN_FPS=30
  • SCREEN_SIZE=1920x1080
  • SCREEN_INPUT=... (Windows default desktop, macOS auto-detects Capture screen and falls back to 1, Linux default $DISPLAY)
  • SCREEN_QUIET=1 (suppress ffmpeg output)
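
These knobs would typically be assembled into an ffmpeg capture command. The sketch below assumes the macOS avfoundation input device (consistent with the Mac-only support above); the real command construction in the screen module may differ:

```python
# Hedged sketch of turning the env knobs above into an ffmpeg command
# for macOS screen capture (-f avfoundation). Illustrative only.
import os

def build_screen_cmd(out_path, env=None):
    env = os.environ if env is None else env
    ffmpeg = env.get("FFMPEG_PATH", "ffmpeg")
    cmd = [ffmpeg, "-f", "avfoundation",
           "-framerate", env.get("SCREEN_FPS", "30"),
           "-i", env.get("SCREEN_INPUT", "1")]  # "1" = macOS fallback above
    size = env.get("SCREEN_SIZE")
    if size:
        cmd += ["-s", size]
    return cmd + [out_path]

print(build_screen_cmd("screen_recordings/screen_demo.mp4", env={}))
```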

Post-process muxing

Uses the newest screen_*.mp4 in SCREEN_SAVE_DIR (or screen_recordings/) and timestamped audio_*.wav in AUDIO_SAVE_DIR (or audio_recordings/).

Optional:

  • MUX_OUTPUT_DIR=/path/to/dir
  • FFPROBE_PATH=ffprobe
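
"Newest matching file" selection, as the mux step describes, is usually a glob filtered by modification time. A minimal sketch of that lookup:

```python
# Hedged sketch of "pick the newest recording": glob a pattern and take
# the file with the latest modification time, as the mux step does with
# screen_*.mp4 and audio_*.wav.
from pathlib import Path

def newest(directory, pattern):
    files = list(Path(directory).glob(pattern))
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)

# e.g. newest("screen_recordings", "screen_*.mp4")
```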

Gesture + OS control tuning

Gesture thresholds, smoothing, and mouse motion tuning are code-level constants:

  • gesture_model/src/cv/gesture_detector.py (pinch/swipe/click/drag thresholds)
  • gesture_model/src/cv/camera.py (hand tracking confidence, motion gains, accel)

OSLink safety

OSLink exposes a SafetyConfig(paused=False) and pause/kill toggles in code (oslink/api.py); there are no env vars yet.
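
The guard likely works by dropping actions while paused or killed. The `killed` field and `execute` helper below are hypothetical; only `SafetyConfig(paused=False)` is confirmed by the source:

```python
# Hedged sketch of an OSLink-style safety guard: while paused (or after
# kill), incoming OS actions are swallowed. The real SafetyConfig in
# oslink/api.py may carry different state.
from dataclasses import dataclass

@dataclass
class SafetyConfig:
    paused: bool = False
    killed: bool = False  # hypothetical kill flag

def execute(action, safety, log):
    if safety.killed or safety.paused:
        return False          # drop the action
    log.append(action)
    return True

safety = SafetyConfig()
log = []
execute("left_click", safety, log)   # runs
safety.paused = True
execute("right_click", safety, log)  # dropped
print(log)  # ['left_click']
```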

Summaries (Gemini)

Summarize the newest transcript:

GEMINI_API_KEY=... poetry run conux --mode summarize

Summarize with video + transcript context:

GEMINI_API_KEY=... poetry run conux --mode summarize_video

Outputs:

  • Transcript: transcripts/transcript_<timestamp>.txt
  • Summary: summaries/summary_<timestamp>.pdf

Slide decks (Marp)

Generate a PDF slide deck from the newest muxed video + transcript:

poetry run conux --mode deck

One-command capture + deck:

TRANSCRIPT_SAVE=1 AUDIO_SAVE=1 SCREEN_SAVE=1 poetry run conux --mode capture_deck

Outputs:

  • Slide deck: decks/deck_<timestamp>.pdf
  • Markdown source: decks/deck_<timestamp>.md
  • Image assets: decks/deck_<timestamp>_assets/

Dependencies:

  • poetry add google-genai
  • npm i -g @marp-team/marp-cli (or set MARP_PATH="npx @marp-team/marp-cli")

Options:

  • DECK_OUTPUT_PATH=/path/to/deck.pdf
  • DECK_OUTPUT_DIR=/path/to/dir (default decks/)
  • DECK_TITLE="..."
  • DECK_SUBTITLE="..."
  • DECK_MAX_IMAGES=6
  • DECK_SCENE_THRESHOLD=0.25 (scene-change capture)
  • DECK_FRAME_INTERVAL=10 (fallback seconds)
  • DECK_IMAGE_WIDTH=1280
  • DECK_PROMPT="..." (override deck summarization prompt)
  • DECK_AFTER_CAPTURE=1 (run deck generation after --mode capture)
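
The deck's Markdown source is what marp-cli turns into the PDF. As a sketch of assembling it (the `marp: true` front matter and `---` slide separators are Marp conventions; slide content and function names here are illustrative):

```python
# Hedged sketch of building the Marp markdown source that marp-cli
# renders to PDF. Slide content is illustrative; the real deck builder
# fills slides from the Gemini summary and captured frames.

def build_deck(title, subtitle, slides):
    parts = ["---", "marp: true", "---", f"# {title}", f"## {subtitle}"]
    for heading, body in slides:
        parts += ["---", f"## {heading}", body]
    return "\n".join(parts) + "\n"

md = build_deck("ConUHacksX", "Demo recap",
                [("Key moment", "![frame](deck_assets/frame_01.png)")])
print(md.splitlines()[1])  # marp: true
```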

End-to-end workflow (screen + audio to summary)

  1. Capture screen + audio (muxed video) + transcript:
TRANSCRIPT_SAVE=1 AUDIO_SAVE=1 SCREEN_SAVE=1 poetry run conux --mode capture
  2. Summarize using the full video + transcript:
poetry run conux --mode summarize_video

Outputs:

  • Muxed video: screen_recordings/muxed_<timestamp>.mp4
  • Transcript: transcripts/transcript_<timestamp>.txt
  • Summary PDF: summaries/summary_<timestamp>.pdf

Summarization options

Optional:

  • SUMMARY_OUTPUT_PATH=/path/to/summary.pdf (or .txt)
  • SUMMARY_OUTPUT_DIR=/path/to/dir (default summaries/)
  • TRANSCRIPT_PATH=/path/to/transcript.txt (default is newest in transcripts/)
  • VIDEO_PATH=/path/to/muxed.mp4 (used by summarize_video, default is newest in screen_recordings/)
  • SUMMARY_PROMPT="..." (override the summarization instruction)
  • GEMINI_MODEL=gemini-2.0-flash

Dependency:

  • poetry add google-genai reportlab

Examples:

TRANSCRIPT_SAVE=1 poetry run conux --mode audio
GEMINI_API_KEY=... poetry run conux --mode summarize
TRANSCRIPT_PATH=/path/to/file.txt GEMINI_API_KEY=... poetry run conux --mode summarize
VIDEO_PATH=/path/to/muxed.mp4 GEMINI_API_KEY=... poetry run conux --mode summarize_video

About

ConUHacksX submission
