SceneIQ

Structured Temporal Scene Intelligence for Narrative Video Understanding

The Shift

Video today is passive.

You scroll. You scrub. You guess timestamps.

SceneIQ changes that.

It transforms video into queryable structured intelligence — not frame detection, not static labeling, but temporal reasoning over evolving events.

The Challenge We Solved

Modern Systems Fall Short	SceneIQ Delivers
Detect objects per frame	Scene-level abstraction
Ignore temporal continuity	Persistent identity modeling
Miss entity relationships	Motion reasoning
Cannot retrieve narrative moments	Interaction inference
Depend on heavy black-box transformers	Importance ranking — locally

Impact Scenario

User types:

"Football World Cup trophy celebration"

Within seconds, SceneIQ:

Detects trophy object
Identifies crowd motion spike
Tracks multi-player clustering
Detects raised-arm celebration pattern
Computes interaction density
Ranks scene importance

Returns:

Scene:               Trophy Lift Celebration
Timestamp:           01:42:18 – 01:44:03
Importance Score:    0.93
Motion Intensity:    High
Interaction Density: High
Detected Objects:    trophy, players

Video jumps instantly to the exact moment. No timeline scrubbing. No manual editing. Just the moment.

This is not keyword search. This is structured temporal reasoning.

Core Innovation

SceneIQ models video as:

Dimension	What It Models
Entities	Persistent identity tracking
Objects	Multi-object detection
Motion	Velocity-based reasoning
Interaction	Spatial relationship inference
Narrative	Scene boundary modeling
Importance	Scene scoring & ranking

Each scene is represented as a semantic unit:

$$S_i = (T_i,\ E_i,\ O_i,\ M_i,\ R_i,\ I_i)$$

Symbol	Meaning
$T_i$	Time boundary
$E_i$	Persistent entities
$O_i$	Objects
$M_i$	Motion intensity
$R_i$	Interaction graph
$I_i$	Importance score

Video Structured Scene Graph.

System Architecture

Video Input
    
Structural Scene Segmentation
    
Motion-Aware Frame Sampling
    
YOLOv8 Object Detection
    
Persistent Multi-Object Tracking
    
Velocity & Motion Modeling
    
Interaction Graph Construction
    
Scene Importance Scoring
    
Semantic Indexing
    
Timestamp-Accurate Retrieval

Technical Depth

1 — Structural Scene Segmentation

HSV histogram comparison
Structural similarity metrics
Temporal smoothing
Narrative-consistent boundary grouping

**Result ** Scene-level units, not raw frames.

2 — Persistent Entity Modeling

Each tracked entity maintains a continuous trajectory:

$$e_j = {(x_t,\ y_t)}_{t=t_1}^{t_2}$$

Track ID continuity enables:

Long-term identity preservation
Behavior evolution tracking
Cross-frame reasoning

3 — Motion Intelligence Layer

Velocity is classified into:

Class	Description
`stationary`	No movement
`walking`	Low velocity
`running`	High velocity
`vehicle_motion`	Wheeled motion
`fast_object`	Projectile / fast-moving item

Detects: Goals Celebrations Action spikes High-energy events

4 — Interaction Graph Modeling

Entities become nodes. Spatial proximity and temporal overlap form edges.

Scene graph examples:

person     driving       car
player     holding       trophy
man        speaking_to   woman

Enables semantic reasoning beyond detection.

5 — Scene Importance Function

Scenes are ranked by:

Motion intensity
Entity count
Interaction density

The system surfaces moments that matter.

Semantic Retrieval Engine

User query:

"man driving car scene"

Converted into constraints:

person detected
vehicle detected
vehicle_motion > threshold
spatial overlap: person inside car region

Matched scenes ranked by importance returned with exact timestamp.

Deterministic. Explainable. Local.

Performance & Efficiency

Property	SceneIQ
Hardware	CPU-friendly
Inference	Deterministic
Indexing	Real-time
Transformer dependency	None
Execution	Fully local

Unlike heavy multimodal models, SceneIQ is efficient, transparent, and deployable anywhere.

Applications

Domain	Use Case
Sports	Highlight extraction
Film & Media	Scene indexing
Surveillance	Behavior analysis
Smart Mobility	Vehicle understanding
Education	Video navigation

Why This Is Real

SceneIQ introduces:

Structured temporal abstraction
Persistent identity modeling
Motion-aware scene importance ranking
Interaction graph reasoning
Deterministic semantic retrieval

It bridges classical computer vision and semantic video intelligence.

The Vision

In the near future, users will not scrub videos. They will query them.

"Last minute winning goal."
"Professor explaining gradient descent."
"Crowd panic moment."

SceneIQ is the engine that makes video searchable by meaning.

From Frame Intelligence Moment Intelligence

SceneIQ doesn't detect frames.
It understands moments.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
config		config
docs		docs
src		src
ui		ui
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
run.py		run.py
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SceneIQ

Structured Temporal Scene Intelligence for Narrative Video Understanding

The Shift

The Challenge We Solved

Impact Scenario

Core Innovation

System Architecture

Technical Depth

1 — Structural Scene Segmentation

2 — Persistent Entity Modeling

3 — Motion Intelligence Layer

4 — Interaction Graph Modeling

5 — Scene Importance Function

Semantic Retrieval Engine

Performance & Efficiency

Applications

Why This Is Real

The Vision

From Frame Intelligence Moment Intelligence

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SceneIQ

Structured Temporal Scene Intelligence for Narrative Video Understanding

The Shift

The Challenge We Solved

Impact Scenario

Core Innovation

System Architecture

Technical Depth

1 — Structural Scene Segmentation

2 — Persistent Entity Modeling

3 — Motion Intelligence Layer

4 — Interaction Graph Modeling

5 — Scene Importance Function

Semantic Retrieval Engine

Performance & Efficiency

Applications

Why This Is Real

The Vision

From Frame Intelligence Moment Intelligence

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages