Skip to content

RobertSamuel-tech/SceneIQ

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SceneIQ

Structured Temporal Scene Intelligence for Narrative Video Understanding


The Shift

Video today is passive.

You scroll. You scrub. You guess timestamps.

SceneIQ changes that.

It transforms video into queryable structured intelligence — not frame detection, not static labeling, but temporal reasoning over evolving events.


The Challenge We Solved

Modern Systems Fall Short SceneIQ Delivers
Detect objects per frame Scene-level abstraction
Ignore temporal continuity Persistent identity modeling
Miss entity relationships Motion reasoning
Cannot retrieve narrative moments Interaction inference
Depend on heavy black-box transformers Importance ranking — locally

Impact Scenario

User types:

"Football World Cup trophy celebration"

Within seconds, SceneIQ:

  1. Detects trophy object
  2. Identifies crowd motion spike
  3. Tracks multi-player clustering
  4. Detects raised-arm celebration pattern
  5. Computes interaction density
  6. Ranks scene importance

Returns:

Scene:               Trophy Lift Celebration
Timestamp:           01:42:18 – 01:44:03
Importance Score:    0.93
Motion Intensity:    High
Interaction Density: High
Detected Objects:    trophy, players

Video jumps instantly to the exact moment. No timeline scrubbing. No manual editing. Just the moment.

This is not keyword search. This is structured temporal reasoning.


Core Innovation

SceneIQ models video as:

Dimension What It Models
Entities Persistent identity tracking
Objects Multi-object detection
Motion Velocity-based reasoning
Interaction Spatial relationship inference
Narrative Scene boundary modeling
Importance Scene scoring & ranking

Each scene is represented as a semantic unit:

$$S_i = (T_i,\ E_i,\ O_i,\ M_i,\ R_i,\ I_i)$$

Symbol Meaning
$T_i$ Time boundary
$E_i$ Persistent entities
$O_i$ Objects
$M_i$ Motion intensity
$R_i$ Interaction graph
$I_i$ Importance score

Video Structured Scene Graph.


System Architecture

Video Input
    
Structural Scene Segmentation
    
Motion-Aware Frame Sampling
    
YOLOv8 Object Detection
    
Persistent Multi-Object Tracking
    
Velocity & Motion Modeling
    
Interaction Graph Construction
    
Scene Importance Scoring
    
Semantic Indexing
    
Timestamp-Accurate Retrieval

Technical Depth

1 — Structural Scene Segmentation

  • HSV histogram comparison
  • Structural similarity metrics
  • Temporal smoothing
  • Narrative-consistent boundary grouping

**Result ** Scene-level units, not raw frames.


2 — Persistent Entity Modeling

Each tracked entity maintains a continuous trajectory:

$$e_j = {(x_t,\ y_t)}_{t=t_1}^{t_2}$$

Track ID continuity enables:

  • Long-term identity preservation
  • Behavior evolution tracking
  • Cross-frame reasoning

3 — Motion Intelligence Layer

Velocity is classified into:

Class Description
stationary No movement
walking Low velocity
running High velocity
vehicle_motion Wheeled motion
fast_object Projectile / fast-moving item

Detects: Goals Celebrations Action spikes High-energy events


4 — Interaction Graph Modeling

Entities become nodes. Spatial proximity and temporal overlap form edges.

Scene graph examples:

person     driving       car
player     holding       trophy
man        speaking_to   woman

Enables semantic reasoning beyond detection.


5 — Scene Importance Function

Scenes are ranked by:

  • Motion intensity
  • Entity count
  • Interaction density

The system surfaces moments that matter.


Semantic Retrieval Engine

User query:

"man driving car scene"

Converted into constraints:

  • person detected
  • vehicle detected
  • vehicle_motion > threshold
  • spatial overlap: person inside car region

Matched scenes ranked by importance returned with exact timestamp.

Deterministic. Explainable. Local.


Performance & Efficiency

Property SceneIQ
Hardware CPU-friendly
Inference Deterministic
Indexing Real-time
Transformer dependency None
Execution Fully local

Unlike heavy multimodal models, SceneIQ is efficient, transparent, and deployable anywhere.


Applications

Domain Use Case
Sports Highlight extraction
Film & Media Scene indexing
Surveillance Behavior analysis
Smart Mobility Vehicle understanding
Education Video navigation

Why This Is Real

SceneIQ introduces:

  • Structured temporal abstraction
  • Persistent identity modeling
  • Motion-aware scene importance ranking
  • Interaction graph reasoning
  • Deterministic semantic retrieval

It bridges classical computer vision and semantic video intelligence.


The Vision

In the near future, users will not scrub videos. They will query them.

"Last minute winning goal."
"Professor explaining gradient descent."
"Crowd panic moment."

SceneIQ is the engine that makes video searchable by meaning.


From Frame Intelligence Moment Intelligence

SceneIQ doesn't detect frames.
It understands moments.

License

MIT

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors