Working with a computer is often tedious and manual. What if it weren't? Our project lets users seamlessly present and operate their computer to do tasks such as going over slides and documents, sending emails, and more!
- Operate your computer through your camera
- Operate your computer through your voice
- Presenter mode
- Record presenter
- Generate transcript and subtitles
- OS control actions
- Left click, right click, move mouse
- Drag-and-drop, scroll, click-and-hold, window snapping, app switching, volume/brightness, media keys
- Macro system
- User-defined sequences triggered by a gesture or voice intent
- Gesture personalization
- Map your own gestures to actions and save presets
- Accessibility modes
- Dwell-click, sticky keys, large-pointer mode, reduced motion
- Collaboration tools
- Remote slide control, laser pointer overlay, live captions
- Telemetry & debugging
- Action logs, latency charts, and a safe replay sandbox for testing
- Gemini intelligence
- Intent parsing: convert natural language into a structured action plan
- Context-aware shortcuts: detect on-screen context and auto-switch profiles
- Presenter assistant: generate speaker notes, summarize slides, prep Q&A
- Meeting mode: transcript summaries, action items, slide search by topic
- Gesture refinement: reduce false positives using vision prompts
- Error recovery: explain failures and suggest fixes
- Safety & privacy
- Pause/kill toggle in the UI and a quick hotkey
- Local-only mode for users who don’t want cloud calls
- Clear opt-in for any external API data sharing
- Languages
- Python
- JavaScript
- Libraries
- OpenCV
- APIs
- Gemini
- Supported Operating Systems
- macOS
We want to set up detection of the following gestures:
- Left hand index finger
- Right hand index finger
We want to set up OS-level functions that link these gestures to operations. Targeted operations:
- Left click
- Right click
- Moving mouse
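One way to link detected gestures to these operations is a dispatch table. The sketch below uses recording stubs in place of real OS calls, and the gesture labels are illustrative, not the detector's actual names:

```python
# Hypothetical sketch: map recognized gestures to targeted operations.
# In the real app the handlers would call into the OS-control layer;
# here they only record what happened so the mapping is easy to test.

actions_log = []

def left_click():
    actions_log.append("left_click")

def right_click():
    actions_log.append("right_click")

def move_mouse(dx, dy):
    actions_log.append(("move", dx, dy))

# Gesture labels below are assumptions for illustration only.
GESTURE_ACTIONS = {
    "left_index_pinch": left_click,
    "right_index_pinch": right_click,
    "index_move": move_mouse,
}

def dispatch(gesture, **kwargs):
    handler = GESTURE_ACTIONS.get(gesture)
    if handler is None:
        return False  # unknown gesture: ignore safely
    handler(**kwargs)
    return True
```

Keeping the mapping in one table makes it easy to add user-defined macros or gesture presets later.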
This is a desktop application built with Electron. We want a clean, slick UI that users find intuitive.
Run the app (window menu):

    poetry run conux

Test detection: use the window menu and select "Gesture tester". Controls: C clear, Q quit.
Tips:
- Left hand: pinch to click/drag. Right hand: swipe left/right.
- Try pinches and short swipes at different distances from the camera.
- If detection is too sensitive, tune thresholds in `gesture_model/src/cv/gesture_detector.py`.
    cd gesture_model
    poetry install

Run modes:

    poetry run conux --mode all
    poetry run conux --mode cv
    poetry run conux --mode input
    poetry run conux --mode audio
    poetry run conux --mode summarize
    poetry run conux --mode summarize_video
    poetry run conux --mode screen
    poetry run conux --mode mux
    poetry run conux --mode capture            # screen + audio together, then mux
    poetry run conux --mode audio --audio-translate
Central defaults live in `gesture_model/src/config.py` (see `AudioConfig`).
You can either:

- Set env vars for one-off runs, or
- Import and pass a config in code (e.g., `audio.streaming.run(config=...)`).
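A sketch of what the env-override pattern might look like; the field names below are assumptions, not the real `AudioConfig` schema:

```python
import os
from dataclasses import dataclass

# Illustrative only: the real AudioConfig in gesture_model/src/config.py
# may use different fields and parsing.

@dataclass
class AudioConfig:
    backend: str = "vosk"
    sentence_gap: float = 1.2

    @classmethod
    def from_env(cls):
        # Env vars win over class defaults for one-off runs.
        return cls(
            backend=os.environ.get("AUDIO_BACKEND", cls.backend),
            sentence_gap=float(os.environ.get("AUDIO_SENTENCE_GAP", cls.sentence_gap)),
        )

# Style 1: one-off override via env var.
os.environ["AUDIO_SENTENCE_GAP"] = "2.0"
cfg = AudioConfig.from_env()

# Style 2: construct directly in code and pass it along,
# e.g. audio.streaming.run(config=AudioConfig(sentence_gap=2.0)).
```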
- `cv`: camera + gesture detection + OSLink mouse control.
- `audio`: live speech-to-text with optional translation + subtitle buffering.
- `screen`: screen recording via ffmpeg.
- `mux`: mux the latest screen recording with saved audio clips.
- `capture`: runs audio + screen together, then muxes; sets `AUDIO_SAVE=1` and `SCREEN_SAVE=1`.
- `menu`: simple window menu to launch CV/tester tools.
The default backend is Vosk with partials enabled; switch to SpeechRecognition with `AUDIO_BACKEND=speech_recognition`.
Run with Poetry:

    AUDIO_BACKEND=vosk poetry run conux --mode audio

Translate audio on the fly:

    AUDIO_TRANSLATE=1 AUDIO_TRANSLATE_TARGET=fr AUDIO_TRANSLATE_SOURCE=auto \
    AUDIO_BACKEND=vosk poetry run conux --mode audio

Or use the CLI flag:

    poetry run conux --mode audio --audio-translate

The first run auto-downloads the Vosk model to `~/.cache/conux/vosk`. Override with:

- `VOSK_MODEL_CACHE_DIR=/path/to/cache`
- `VOSK_MODEL_URL=...` (model zip URL)
- `VOSK_MODEL_PATH=/path/to/extracted/model` (skip download)
Audio pipeline configuration:
- `AUDIO_BACKEND=vosk|speech_recognition`
- `AUDIO_STREAM_PARTIALS=1` (emit partial Vosk hypotheses; default on)
- `AUDIO_TRANSLATE=1`, `AUDIO_TRANSLATE_TARGET=en`, `AUDIO_TRANSLATE_SOURCE=auto`
- `AUDIO_TRANSLATE_PARTIALS=1`
- `AUDIO_SENTENCE_GAP=1.2`, `AUDIO_MAX_SENTENCE_SECONDS=10`, `AUDIO_MAX_SENTENCE_CHARS=120`
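The sentence-buffering knobs (`AUDIO_SENTENCE_GAP`, `AUDIO_MAX_SENTENCE_SECONDS`, `AUDIO_MAX_SENTENCE_CHARS`) can be read as flush conditions. A simplified model of that idea, not the project's actual buffering code:

```python
def should_flush(gap_seconds, sentence_seconds, sentence_chars,
                 max_gap=1.2, max_seconds=10, max_chars=120):
    """Flush the subtitle buffer when any limit is hit.

    Defaults mirror the documented env-var defaults; the real
    implementation may combine these conditions differently.
    """
    return (
        gap_seconds >= max_gap              # silence long enough to end a sentence
        or sentence_seconds >= max_seconds  # sentence has run too long
        or sentence_chars >= max_chars      # sentence too wide for one caption
    )
```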
Audio device selection:
- `AUDIO_DEVICE_INDEX=0` (force a specific mic index)
- `AUDIO_SKIP_DEVICE_SCAN=1` (skip device enumeration)
SpeechRecognition tuning:
- `AUDIO_LISTEN_TIMEOUT=5`, `AUDIO_PHRASE_TIME_LIMIT=5`, `AUDIO_QUIET=1`
- `AUDIO_DYNAMIC_ENERGY=1`, `AUDIO_ENERGY_THRESHOLD=...`
- `AUDIO_PAUSE_THRESHOLD=...`, `AUDIO_PHRASE_THRESHOLD=...`
- `AUDIO_NON_SPEAKING_DURATION=...`
Vosk streaming tuning:
- `VOSK_LOG_LEVEL=-1`
- `AUDIO_SAMPLE_RATE=16000`, `AUDIO_CHUNK_SIZE=8000`
- `AUDIO_PRE_ROLL_SECONDS=0.5`
- `AUDIO_PREPROCESS=1`, `AUDIO_TARGET_RMS=3000`
- `AUDIO_MAX_GAIN=3.0`, `AUDIO_GATE_THRESHOLD=400`
- `VOSK_MODEL_PATH=/path/to/model`
- `VOSK_MODEL_CACHE_DIR=...`, `VOSK_MODEL_VARIANT=large|small`
- `VOSK_MODEL_URL=...`, `VOSK_MODEL_ZIP_PATH=...`
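The `AUDIO_PREPROCESS`, `AUDIO_TARGET_RMS`, `AUDIO_MAX_GAIN`, and `AUDIO_GATE_THRESHOLD` knobs suggest gain normalization with a noise gate. A minimal sketch of that idea, assuming (not confirming) these semantics:

```python
def preprocess(samples, target_rms=3000, max_gain=3.0, gate_threshold=400):
    """Sketch of assumed AUDIO_PREPROCESS behavior: silence chunks below
    the gate, then scale toward TARGET_RMS with gain capped at MAX_GAIN.
    `samples` is a list of int PCM samples; not the project's actual DSP.
    """
    if not samples:
        return samples
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    if rms < gate_threshold:
        return [0] * len(samples)  # below the noise gate: treat as silence
    gain = min(max_gain, target_rms / rms)
    return [int(s * gain) for s in samples]
```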
CLI audio flags (override env defaults):
- `--audio-backend vosk|speech_recognition`
- `--audio-stream-partials` / `--no-audio-stream-partials`
- `--audio-translate`, `--audio-translate-target`, `--audio-translate-source`
- `--audio-translate-partials` / `--no-audio-translate-partials`
- `--audio-sentence-gap`, `--audio-max-sentence-seconds`, `--audio-max-sentence-chars`
- `--captions` (audio/capture modes), `--text-mode original|translated`
Captions overlay (desktop subtitle bar):
- `--captions` shows the overlay in `audio` or `capture` mode.
- `--text-mode original|translated` selects what the overlay shows.
- If `--text-mode translated` is used, translation is enabled automatically.
- The overlay is a topmost window anchored to the bottom of the desktop.
- When using `--mode capture`, the overlay is part of the desktop and will be recorded by screen capture if your OS allows capturing topmost windows.
Vosk required/optional env vars:
- Required to use Vosk: none (default backend)
- Optional model control:
  - `VOSK_MODEL_PATH=/path/to/model` (skip download; point at an extracted model)
  - `VOSK_MODEL_CACHE_DIR=...` (override the cache dir for auto-download)
  - `VOSK_MODEL_VARIANT=large|small` (select the default model URL)
  - `VOSK_MODEL_URL=...` (custom model zip URL)
  - `VOSK_MODEL_ZIP_PATH=...` (use a pre-downloaded zip)
- Optional streaming/audio knobs:
  - `AUDIO_STREAM_PARTIALS=1` (emit partial hypotheses)
  - `AUDIO_SAMPLE_RATE=16000`, `AUDIO_CHUNK_SIZE=8000`
  - `AUDIO_PRE_ROLL_SECONDS=0.5`
  - `AUDIO_PREPROCESS=1`, `AUDIO_TARGET_RMS=3000`
  - `AUDIO_MAX_GAIN=3.0`, `AUDIO_GATE_THRESHOLD=400`
  - `VOSK_LOG_LEVEL=-1`
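From the model-control variables, one plausible resolution order is: explicit model path, then a pre-downloaded zip, then the auto-download cache. A hypothetical sketch; the loader's real precedence may differ:

```python
import os

def resolve_vosk_model(env):
    """Assumed precedence for locating the Vosk model:
    extracted model dir > pre-downloaded zip > cache for auto-download
    (VOSK_MODEL_VARIANT / VOSK_MODEL_URL would pick what gets fetched).
    """
    if env.get("VOSK_MODEL_PATH"):
        return ("extracted", env["VOSK_MODEL_PATH"])
    if env.get("VOSK_MODEL_ZIP_PATH"):
        return ("zip", env["VOSK_MODEL_ZIP_PATH"])
    cache = env.get("VOSK_MODEL_CACHE_DIR",
                    os.path.expanduser("~/.cache/conux/vosk"))
    return ("download", cache)
```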
There are no required env vars for the default `poetry run conux --mode all` flow.
Set these only if your environment needs them:
- `FFMPEG_PATH=/path/to/ffmpeg` if `ffmpeg` is not on `PATH` (screen/mux).
- `FFPROBE_PATH=/path/to/ffprobe` if `ffprobe` is not on `PATH` (mux duration detection).
- `VOSK_MODEL_PATH=/path/to/model` if you run `AUDIO_BACKEND=vosk` without allowing the auto-download.
- `SCREEN_INPUT=...` if the default screen capture input does not work on your OS.
- `AUDIO_DEVICE_INDEX=...` if you need to force a non-default microphone.
Set one of:
- `AUDIO_SAVE=1` (saves to `audio_recordings/`)
- `AUDIO_SAVE_DIR=/path/to/dir`
Set one of:
- `TRANSCRIPT_SAVE=1` (saves to repo-root `transcripts/`)
- `TRANSCRIPT_SAVE_DIR=/path/to/dir`
- `TRANSCRIPT_SAVE_PATH=/path/to/transcript.txt`

Note: `transcripts/` is intentionally untracked; do not commit transcript files.
Set one of:
- `SCREEN_SAVE=1` (saves to `screen_recordings/`)
- `SCREEN_SAVE_DIR=/path/to/dir`
Optional:
- `FFMPEG_PATH=ffmpeg`
- `SCREEN_FPS=30`
- `SCREEN_SIZE=1920x1080`
- `SCREEN_INPUT=...` (Windows default `desktop`; macOS auto-detects `Capture screen` and falls back to `1`; Linux default `$DISPLAY`)
- `SCREEN_QUIET=1` (suppress ffmpeg output)
Uses the newest `screen_*.mp4` in `SCREEN_SAVE_DIR` (or `screen_recordings/`) and the timestamped `audio_*.wav` files in `AUDIO_SAVE_DIR` (or `audio_recordings/`).
Optional:
- `MUX_OUTPUT_DIR=/path/to/dir`
- `FFPROBE_PATH=ffprobe`
Gesture thresholds, smoothing, and mouse motion tuning are code-level constants:
- `gesture_model/src/cv/gesture_detector.py` (pinch/swipe/click/drag thresholds)
- `gesture_model/src/cv/camera.py` (hand-tracking confidence, motion gains, acceleration)
OSLink exposes a `SafetyConfig(paused=False)` and pause/kill toggles in code (`oslink/api.py`); there are no env vars yet.
Summarize the newest transcript:

    GEMINI_API_KEY=... poetry run conux --mode summarize

Summarize with video + transcript context:

    GEMINI_API_KEY=... poetry run conux --mode summarize_video

Outputs:

- Transcript: `transcripts/transcript_<timestamp>.txt`
- Summary: `summaries/summary_<timestamp>.pdf`
Generate a PDF slide deck from the newest muxed video + transcript:

    poetry run conux --mode deck

One-command capture + deck:

    TRANSCRIPT_SAVE=1 AUDIO_SAVE=1 SCREEN_SAVE=1 poetry run conux --mode capture_deck

Outputs:

- Slide deck: `decks/deck_<timestamp>.pdf`
- Markdown source: `decks/deck_<timestamp>.md`
- Image assets: `decks/deck_<timestamp>_assets/`
Dependencies:
- `poetry add google-genai`
- `npm i -g @marp-team/marp-cli` (or set `MARP_PATH="npx @marp-team/marp-cli"`)
Options:
- `DECK_OUTPUT_PATH=/path/to/deck.pdf`
- `DECK_OUTPUT_DIR=/path/to/dir` (default `decks/`)
- `DECK_TITLE="..."`, `DECK_SUBTITLE="..."`
- `DECK_MAX_IMAGES=6`
- `DECK_SCENE_THRESHOLD=0.25` (scene-change capture)
- `DECK_FRAME_INTERVAL=10` (fallback seconds)
- `DECK_IMAGE_WIDTH=1280`
- `DECK_PROMPT="..."` (override the deck summarization prompt)
- `DECK_AFTER_CAPTURE=1` (run deck generation after `--mode capture`)
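As an illustration of the `DECK_FRAME_INTERVAL` fallback, evenly spaced capture timestamps capped at `DECK_MAX_IMAGES` might be computed like this (assumed behavior, not the project's actual frame selection):

```python
def fallback_frame_times(duration_s, interval_s=10, max_images=6):
    """Sketch of the interval fallback used when scene-change capture
    finds nothing: timestamps every `interval_s` seconds, capped at
    `max_images`. Defaults mirror DECK_FRAME_INTERVAL / DECK_MAX_IMAGES.
    """
    times = list(range(0, int(duration_s), interval_s))
    return times[:max_images]
```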
- Capture screen + audio (muxed video) + transcript:

      TRANSCRIPT_SAVE=1 AUDIO_SAVE=1 SCREEN_SAVE=1 poetry run conux --mode capture

- Summarize using the full video + transcript:

      poetry run conux --mode summarize_video

Outputs:

- Muxed video: `screen_recordings/muxed_<timestamp>.mp4`
- Transcript: `transcripts/transcript_<timestamp>.txt`
- Summary PDF: `summaries/summary_<timestamp>.pdf`
Optional:
- `SUMMARY_OUTPUT_PATH=/path/to/summary.pdf` (or `.txt`)
- `SUMMARY_OUTPUT_DIR=/path/to/dir` (default `summaries/`)
- `TRANSCRIPT_PATH=/path/to/transcript.txt` (default is the newest in `transcripts/`)
- `VIDEO_PATH=/path/to/muxed.mp4` (used by `summarize_video`; default is the newest in `screen_recordings/`)
- `SUMMARY_PROMPT="..."` (override the summarization instruction)
- `GEMINI_MODEL=gemini-2.0-flash`
Dependency:
    poetry add google-genai reportlab
Examples:
    TRANSCRIPT_SAVE=1 poetry run conux --mode audio
    GEMINI_API_KEY=... poetry run conux --mode summarize
    TRANSCRIPT_PATH=/path/to/file.txt GEMINI_API_KEY=... poetry run conux --mode summarize
    VIDEO_PATH=/path/to/muxed.mp4 GEMINI_API_KEY=... poetry run conux --mode summarize_video