An autonomous AI desk robot with real-time computer vision, human-like voice, face recognition,
skeleton tracking, document Q&A, live code review, home automation —
and a sarcastic personality. Built from electronic waste for ₹8,000.
B.Tech Final Year Project — Data Science (2023–2027)
About · Features · Architecture · Hardware · ML Models · Installation · Usage · Codex · Demo
J.I.N.X is a multi-modal AI robotic desk companion that sees, hears, speaks, judges, and even reviews your code — all built from recycled electronics, a spare smartphone, a dead ThinkPad, and low-cost microcontrollers.
It combines 10 machine learning models spanning computer vision, audio classification, natural language understanding, network security, and sensor fusion into a single desk-mounted platform with a cyberpunk aesthetic and an attitude problem.
"Born from a dead ThinkPad T61 that couldn't even turn on anymore. Its metal chassis became J.I.N.X's body. A spare phone that couldn't make calls became its eyes, ears, and voice. Total hardware cost: ₹8,000. This project proves that AI isn't about expensive hardware — it's about intelligence."
✧ Real-time face detection + recognition (green = safe / blue = unknown / red = threat)
✧ 33-keypoint full body skeleton + pose estimation overlay
✧ 21-keypoint hand gesture recognition — control everything without touching anything
✧ 468-point face mesh with dramatic scanning visual effect
✧ CNN-trained audio classification (gunshots, sirens, glass breaks, screams)
✧ Offline wake word + full voice command system
✧ Gemini 2.0 Flash conversations with context memory and sarcastic personality
✧ Microsoft Neural TTS — sounds like a real human, not a robot. Free. No API key.
✧ Document Q&A — ask questions about uploaded PDFs by voice or web panel
✧ Code review agent — auto-reviews changed files on save, flags bugs and style issues
✧ Roast mode — scans your face, generates a personalized AI roast, delivers it live
✧ Voice-activated music search and streaming via yt-dlp + mpv
✧ IoT home automation — LED strip + smart device control by voice and gesture
✧ WiFi network monitor — flags unknown devices, detects traffic anomalies
✧ Animated eyes on 2.4" TFT with 12 emotional states (neutral / angry / scanning / roast / boot...)
✧ Pan-tilt servo head — physically turns to follow detected faces
✧ Digital pupils that track face position, synced with servo movement
✧ VL53L0X ToF sensor (downward) — detects desk edges with millimeter accuracy
✧ Dual HC-SR04 ultrasonic + forward VL53L0X obstacle avoidance
✧ 7.4V 18650 2S2P battery pack with BMS protection and voltage monitoring
✧ At <15% battery: eyes go sleepy, LEDs pulse yellow, J.I.N.X says "I'm running on spite"
✧ WS2812B LED strip with 11 animated modes — reacts to mood, threats, music, battery
✧ DFPlayer Mini + 3W speaker for sound effects. Neural TTS plays through phone speaker
✧ Body made from ThinkPad T61 chassis, keyboard keys, RAM sticks, HDD platters
✧ Web control panel at http://LAPTOP_IP:5000 — live feed, modes, LEDs, movement, uploads
✧ Streamlit cyberpunk dashboard at port 8501 — camera, skeleton, network map, alerts
✧ Any device on the same WiFi can control J.I.N.X from a browser — no app needed
MODE 1: BUDDY (Default)
├── Friendly personality, responds to voice commands
├── Answers questions, plays music, controls lights
├── Eyes follow people, head tracks faces
├── Skeleton overlay shows your movements in real-time
└── Returns low-battery warning when needed
MODE 2: SENTINEL (Surveillance)
├── Active scanning — color-coded face + object detection
│ ├── 🟢 GREEN = Known + Safe
│ ├── 🔵 BLUE = Unknown (not in database)
│ └── 🔴 RED = Known + Flagged as Threat
├── Audio anomaly detection (glass break, screams, gunshots)
├── Network device monitoring — flags unknown WiFi devices
├── All events logged with timestamps + screenshots
└── LED strips react to threat level in real-time
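Sentinel's "all events logged" step (the BLACKBOX module) can be sketched with stdlib SQLite. The table schema and column names below are assumptions for illustration, not the real blackbox.py:

```python
import sqlite3
import time

def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the event log database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(ts REAL, source TEXT, level TEXT, detail TEXT)"
    )
    return db

def log_event(db: sqlite3.Connection, source: str, level: str, detail: str) -> None:
    """Record one timestamped event (e.g. a RED face detection)."""
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (time.time(), source, level, detail),
    )
    db.commit()
```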
MODE 3: ROAST
├── Scans person's face → identifies from database
├── Generates personalized comedic roast via Gemini
├── Delivers roast through speaker in human voice
├── Eyes show smug expression, LEDs flash orange party mode
└── Adjustable intensity: light / medium / savage
MODE 4: AGENT
├── Document Q&A — ask questions about uploaded PDFs/books
├── Code review — watches your project folder, reviews on save
├── Image search — "what does a golden retriever look like?"
├── Research assistant — searches web for answers
└── Read document aloud — summarizes books by voice command
MODE 5: SLEEP
├── Eyes close, LEDs dim
├── Wake word still active
└── "I was in the middle of something."
┌──────────────────────────────────┐
│ LAPTOP (Main Server) │
│ │
│ 🧠 ML Models: │
│ ├── YOLOv5-nano (object detect) │
│ ├── MediaPipe (Face/Pose/Hands) │
│ ├── face_recognition (dlib) │
│ ├── Audio CNN (UrbanSound8K) │
│ ├── Vosk STT (offline) │
│ ├── edge-TTS (neural voice) │
│ ├── Gemini 2.0 Flash (LLM) │
│ ├── Network Anomaly (RF) │
│ └── Sensor Fusion (Hivemind) │
│ │
│ 🌐 Services: │
│ ├── Flask Web Control (:5000) │
│ ├── Streamlit Dashboard (:8501) │
│ ├── MQTT Broker (Mosquitto) │
│ └── SQLite Database │
└───────────────┬──────────────────┘
│ WiFi (Private Local Network)
┌─────────────────────┼──────────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ J.I.N.X │ │ TABLET │ │ PHONE │
│ ROBOT │ │ NEXUS DASH │ │ (Control) │
│ │ │ │ │ │
│ ┌─────────┐ │ │ ─ Camera │ │ :5000 │
│ │ ESP32 │ │ │ ─ Skeleton │ └─────────────┘
│ │─Motors │ │ │ ─ Network │
│ │─Servos │ │ │ ─ Alerts │
│ │─TFT Eyes│ │ │ ─ Audio │
│ │─LEDs │ │ └─────────────┘
│ │─Sensors │ │
│ │─Speaker │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │Redmi 12 │ │ ← Neural TTS voice plays here
│ │─Camera │ │
│ │─Mic/Spk │ │
│ └─────────┘ │
└─────────────┘
| # | Component | Spec | Purpose |
|---|---|---|---|
| 1 | ESP32-WROOM-32 DevKit | WiFi+BT, 30-pin | Robot brain |
| 2 | 2.4" TFT ILI9341 | 240×320, SPI | Animated eyes |
| 3 | SG90 Servo ×2 | 180°, 1.8kg-cm | Head pan + tilt |
| 4 | L298N Motor Driver | Dual H-Bridge | DC motor control |
| 5 | HC-SR04 Ultrasonic ×2 | 2–400cm | Obstacle detection |
| 6 | VL53L0X ToF Sensor ×2 | 2m, ±1mm, I2C | Table edge + precise depth |
| 7 | WS2812B LED Strip | 30 LEDs, 5V | Mood reactive lighting |
| 8 | DFPlayer Mini + 3W Speaker | UART, MP3 | Sound effects |
| 9 | 18650 Battery ×4 | 3.7V 2600mAh | 2S2P = 7.4V ~5000mAh |
| 10 | 2S BMS Board | 7.4V 10A | Battery protection |
| 11 | Servo Pan/Tilt Platform | 2-axis (pan + tilt) bracket | Clean mount for the head servos |
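For the low-battery behaviour described in the features (sleepy eyes below 15%), pack voltage from the 2S BMS can be mapped to a rough percentage. A linear sketch — the 6.0 V empty / 8.4 V full endpoints are assumptions, and real Li-ion discharge curves are non-linear:

```python
def battery_percent(pack_voltage: float) -> float:
    """Linearly map 2S Li-ion pack voltage to 0-100 %.

    Assumes 6.0 V = empty (3.0 V/cell) and 8.4 V = full (4.2 V/cell).
    """
    pct = (pack_voltage - 6.0) / (8.4 - 6.0) * 100.0
    return max(0.0, min(100.0, pct))

def is_running_on_spite(pack_voltage: float) -> bool:
    """True below the 15 % threshold that triggers sleepy eyes + yellow LEDs."""
    return battery_percent(pack_voltage) < 15.0
```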
| Component | Source | Purpose |
|---|---|---|
| Metal parts, keyboard keys, RAM, HDD platters | Lenovo ThinkPad T61 | Body structure + decoration |
| Camera, mic, speaker, WiFi | Xiaomi Redmi Note 12 | Primary sensor array + TTS speaker |
| Tablet | UP Govt issued | Dashboard display |
Recycled components saved an estimated ₹15,000+ in equivalent hardware costs.
| # | Model | Task | Type | Dataset |
|---|---|---|---|---|
| 1 | YOLOv5-nano | Object Detection | Pre-trained | COCO (80 classes) |
| 2 | MediaPipe Face Mesh | 468-point Landmarks | Pre-trained | — |
| 3 | MediaPipe Pose | 33-point Skeleton | Pre-trained | — |
| 4 | MediaPipe Hands + Classifier | Gesture Recognition | Pre-trained + Custom | Google + Custom |
| 5 | dlib ResNet / face_recognition | Face Recognition (128-d) | Pre-trained + Custom | LFW + Your faces |
| 6 | Custom CNN (2D Conv) | Audio Classification | Trained from scratch | UrbanSound8K |
| 7 | Vosk / Google STT | Speech-to-Text | Pre-trained | — |
| 8 | Gemini 2.0 Flash | NLU + Conversation + Agent | API | — |
| 9 | Random Forest | Network Anomaly Detection | Trained | NSL-KDD |
| 10 | Weighted Ensemble | Sensor Fusion | Custom | — |
Input: 128×128 Mel Spectrogram
├── Conv2D(32, 3×3) → BatchNorm → ReLU → MaxPool(2×2)
├── Conv2D(64, 3×3) → BatchNorm → ReLU → MaxPool(2×2)
├── Conv2D(128, 3×3) → BatchNorm → ReLU → MaxPool(2×2)
├── Conv2D(64, 3×3) → GlobalAveragePooling
├── Dense(256) → ReLU → Dropout(0.4)
├── Dense(128) → ReLU → Dropout(0.3)
└── Dense(10) → Softmax
Classes: air_conditioner, car_horn, children_playing, dog_bark,
drilling, engine_idling, gun_shot, jackhammer, siren, street_music
Dataset: UrbanSound8K (8,732 samples)
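The layer stack above can be written down in Keras as follows. Single-channel input and `same` padding are assumptions, since the summary doesn't specify either:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_audio_cnn(num_classes: int = 10) -> tf.keras.Model:
    """CNN over 128x128 mel spectrograms, mirroring the block diagram above."""
    return models.Sequential([
        layers.Input(shape=(128, 128, 1)),  # 1 channel assumed
        layers.Conv2D(32, 3, padding="same"),
        layers.BatchNormalization(), layers.ReLU(), layers.MaxPool2D(2),
        layers.Conv2D(64, 3, padding="same"),
        layers.BatchNormalization(), layers.ReLU(), layers.MaxPool2D(2),
        layers.Conv2D(128, 3, padding="same"),
        layers.BatchNormalization(), layers.ReLU(), layers.MaxPool2D(2),
        layers.Conv2D(64, 3, padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"), layers.Dropout(0.4),
        layers.Dense(128, activation="relu"), layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
```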
Visual Score (0–1) × 0.35 ← face_recognition threat level
Audio Score (0–1) × 0.30 ← CNN threat-class confidence
Network Score (0–1) × 0.20 ← anomaly model prediction
Proximity (0–1) × 0.15 ← ultrasonic + ToF distances
↓
doom_level (0–1)
> 0.70 → ALERT → LED + eyes + buzzer + log
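The weighted fusion above reduces to a few lines of Python. Function names here are illustrative; the real scoring lives in hivemind.py:

```python
# Weights and threshold taken from the fusion description above.
WEIGHTS = {"visual": 0.35, "audio": 0.30, "network": 0.20, "proximity": 0.15}
ALERT_THRESHOLD = 0.70

def doom_level(scores: dict) -> float:
    """Combine per-sensor scores (each clamped to 0-1) into one 0-1 value."""
    return sum(
        w * max(0.0, min(1.0, scores.get(name, 0.0)))
        for name, w in WEIGHTS.items()
    )

def should_alert(scores: dict) -> bool:
    """True when the fused score crosses the ALERT line."""
    return doom_level(scores) > ALERT_THRESHOLD
```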
| File | Codename | Purpose |
|---|---|---|
| genesis.py | GENESIS | Main startup — launches everything |
| dna.py | DNA | All configuration and settings |
| blackbox.py | BLACKBOX | SQLite event logging |
| psyche.py | PSYCHE | Personality, jokes, roast prompts |
| optic.py | OPTIC | Vision — camera, faces, pose, mesh, gestures |
| vocoder.py | VOCODER | Voice — STT, neural TTS, Gemini, commands, music |
| echo_hunter.py | ECHO HUNTER | Audio — CNN sound classification |
| ice_wall.py | ICE WALL | Network — device scan, anomaly detection |
| synapse.py | SYNAPSE | MQTT — all inter-module messaging |
| hivemind.py | HIVEMIND | Sensor fusion — doom level scoring |
| agent.py | AGENT | AI Agent — document Q&A + code review |
| nexus.py | NEXUS | Streamlit cyberpunk dashboard |
| web_control/app.py | NEXUS-WEB | Flask phone control panel |
Personal laptop (Linux recommended, Windows/Mac also work)
Python 3.10+ · Arduino IDE 2.x · Assembled J.I.N.X hardware
WiFi router · Redmi Note 12 with DroidCam · Mosquitto MQTT
cmake (for face_recognition/dlib) · mpv (for music playback)
git clone https://github.com/Sidvortex/J.I.N.X.git
cd J.I.N.X

# Arch / EndeavourOS
sudo pacman -S mosquitto cmake espeak-ng mpv yt-dlp portaudio python-pip
# Ubuntu / Debian
sudo apt install mosquitto cmake cmake-data espeak-ng mpv yt-dlp \
    portaudio19-dev python3-pip
pip install -r requirements.txt
sudo systemctl enable --now mosquitto

nano server/dna.py
# Set:
LAPTOP_IP = "your.laptop.ip"
PHONE_IP = "redmi.note.ip"
GEMINI_API_KEY = "get from aistudio.google.com"
FACE_LABELS = {"yourname": "safe"}

# Vosk offline STT (~40MB)
mkdir -p models/vosk-model
# Download: https://alphacephei.com/vosk/models → vosk-model-small-en-us-0.15
# Extract into models/vosk-model/
# YOLOv5 (auto-downloads on first run)
python -c "from ultralytics import YOLO; YOLO('yolov5n.pt')"

python scripts/register_face.py --name yourname --file photo.jpg --label safe
# or live:
python scripts/register_face.py --name yourname --live --label safe

1. Open Arduino IDE 2.x → arduino/jinx_esp32/jinx_esp32.ino
2. Install via Library Manager:
TFT_eSPI · Adafruit NeoPixel · PubSubClient
ArduinoJson · ESP32Servo · DFRobotDFPlayerMini · VL53L0X
3. Edit config.h → set WIFI_SSID, WIFI_PASS, MQTT_BROKER
4. Board: ESP32 Dev Module → Upload
1. Install DroidCam on Redmi Note 12
2. Connect to same WiFi → set static IP in router
3. Update PHONE_IP in server/dna.py
4. Open DroidCam → Start Server
5. Test: http://PHONE_IP:4747/video
python server/genesis.py # Normal startup
python server/genesis.py --sentinel # Start in Sentinel mode
python server/genesis.py --agent-mode # Document/code focus
python server/genesis.py --no-audio --no-network # Faster startup

✧ "Hey JINX, wake up" → System activation
✧ "Hey JINX, guard mode" → Sentinel surveillance
✧ "Hey JINX, roast [name]" → AI-generated personalized roast
✧ "Hey JINX, what is [topic]" → Gemini answers + shows image
✧ "Hey JINX, play [song/genre]" → Music search and playback
✧ "Hey JINX, lights [color]" → LED color change
✧ "Hey JINX, review my code" → Code review of watched folder
✧ "Hey JINX, read [document name]" → Summarizes uploaded document
✧ "Hey JINX, status" → System health report
✧ "Hey JINX, goodnight" → Sleep mode
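The wake-word + command pattern above can be sketched as a small regex dispatcher. The handler return values here are placeholders; the real handlers live in vocoder.py:

```python
import re

WAKE = "hey jinx"

# (pattern, handler) pairs — handlers are stubs for illustration only.
COMMANDS = [
    (r"wake up",      lambda m: "activate"),
    (r"guard mode",   lambda m: "sentinel"),
    (r"roast (\w+)",  lambda m: f"roast:{m.group(1)}"),
    (r"play (.+)",    lambda m: f"music:{m.group(1)}"),
    (r"lights (\w+)", lambda m: f"led:{m.group(1)}"),
    (r"goodnight",    lambda m: "sleep"),
]

def dispatch(utterance: str):
    """Return an action string for a recognized command, None if the wake
    word is missing, or 'unknown' for unrecognized commands."""
    text = utterance.lower().strip()
    if not text.startswith(WAKE):
        return None  # ignore speech not addressed to J.I.N.X
    text = text[len(WAKE):].lstrip(", ")
    for pattern, handler in COMMANDS:
        m = re.match(pattern, text)
        if m:
            return handler(m)
    return "unknown"
```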
| Model | Metric | Score |
|---|---|---|
| Face Recognition | Accuracy | ~95%+ |
| Face Recognition | False Acceptance Rate | <2% |
| Audio CNN | F1-Score | ~85%+ |
| Audio CNN | Accuracy | ~88%+ |
| Network Anomaly | ROC-AUC | ~92%+ |
| Voice Recognition | Word Error Rate | ~10–15% |
| Sensor Fusion | Detection Accuracy | ~90%+ |
| Table Edge Detection | Accuracy | ~99% (VL53L0X) |
✧ SLAM-based room mapping and path planning
✧ Raspberry Pi 4 integration — remove laptop dependency
✧ Robotic arm for object manipulation
✧ Emotion detection from facial expressions
✧ Multi-language voice (Hindi + English)
✧ Smart home ecosystem integration (Google Home, Alexa)
✧ Mobile app (React Native) for remote control
✧ Cloud dashboard for remote monitoring outside local network
✧ Multi-robot swarm communication
✧ Hexapod leg conversion (servo-based spider legs)
✧ Google MediaPipe team (vision models)
✧ Ultralytics (YOLOv5)
✧ Adam Geitgey (face_recognition library)
✧ Microsoft (edge-TTS Neural voices)
✧ Google Gemini AI
✧ Vosk / Alpha Cephei (offline STT)
✧ The dead ThinkPad T61 that gave its body for science
✧ The open-source community
📸 This is an unfinished project right now. As soon as we get funds, we'll design its proper body; we won't let it roam around the house nude like a 3-year-old!!
🎥 We hope you like the video on our channel and support us. Stay tuned; we'll keep delivering crazy projects like this with full craze on!!
B.Tech Data Science (2023–2027)
Built with dark magic acquired from the pitch black caves of West Bengal
@sidvortex
J.I.N.X doesn't just think. It judges.