Inspiration

Inspired by Sherlock Holmes' incredible ability to deduce entire narratives from the smallest details, we wanted to give real-world investigators similar superpowers. Today, forensic teams spend countless hours manually combing through 3D scene reconstructions and photos, a process that is slow, painstaking, and prone to human error. We built SceneSplat to bridge that gap, turning complex 3D scenes into actionable insights in minutes by combining AI-powered detection with an immersive 3D viewer.

What it does

SceneSplat is an interactive web application that transforms static 3D crime scene models into dynamic, intelligent environments for forensic analysis.

  • Explore Immersive 3D Scenes: Investigators can load .glb models and navigate them with intuitive orbit, zoom, and pan controls, examining every angle of a reconstructed scene.
  • Automate Evidence Detection with AI: With a single click, the app analyzes the scene to find objects, anomalies, and potential evidence, automatically placing labeled 3D markers with descriptions and confidence scores.
  • Leverage Dual Analysis Modes:
    • Quick Analysis: Instantly extracts evidence from the 3D model's geometry and metadata for rapid insights.
    • Deep Vision Analysis: Captures a screenshot of the current view and sends it to Google Gemini Flash, using its powerful multimodal capabilities to visually identify points of interest just as a human would.
  • Manage Cases and Evidence: The system organizes scenes into parent cases and child evidence items. Uploading a new .glb model to a case automatically adds it as a piece of evidence for individual inspection.
  • Consolidate Notes and Files: Users can attach case notes and upload supporting files (images, documents) directly to a scene, keeping all investigative materials in one place.
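To make the marker format above concrete, here is a minimal sketch of what a detected evidence item might look like. The interface name and fields are our illustration, not the project's actual schema; the writeup only tells us markers carry a label, a description, a confidence score, and a 3D position.

```typescript
// Hypothetical shape of an AI-detected evidence marker
// (illustrative only; not SceneSplat's actual schema).
interface EvidenceMarker {
  id: string;
  label: string;          // short name, e.g. "shell casing"
  description: string;    // model-generated explanation
  confidence: number;     // detector's confidence score in [0, 1]
  position: [number, number, number]; // world-space coordinates in the scene
}

const example: EvidenceMarker = {
  id: "ev-001",
  label: "shell casing",
  description: "Small metallic object near the doorway",
  confidence: 0.87,
  position: [1.2, 0.0, -3.4],
};
```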

How we built it

We built SceneSplat using a modern, full-stack TypeScript architecture designed for performance and rapid development.

  • Frontend Framework: Next.js with the App Router provided a robust foundation, allowing us to build a fast, server-aware single-page application and handle backend logic through API routes.
  • 3D Rendering: We used React Three Fiber and drei to declaratively manage a three.js canvas, enabling us to render complex .glb models and overlay interactive UI elements smoothly in the browser.
  • AI & Vision: The core intelligence is powered by the Google Gemini Flash model via the @google/generative-ai SDK. We built a pipeline that sends scene screenshots to Gemini and parses its structured JSON response into labeled 3D markers in the scene.
  • UI/UX: The user interface was built with TailwindCSS for styling and Radix UI for accessible, unstyled component primitives, allowing us to create a clean and intuitive experience.

The data flow is straightforward: a user selects a scene, which is rendered in the 3D viewer. Triggering an AI analysis sends a request to our Next.js backend, which either parses the model file or queries Gemini, then returns a list of evidence points that are visualized as interactive markers in the 3D space.
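The backend parsing step can be sketched as follows. The function name and expected schema are our assumptions; the general idea is that vision models often wrap JSON in a markdown fence or add surrounding text, so the handler extracts the JSON body before parsing and drops malformed entries rather than failing.

```typescript
// Hypothetical parser for the model's text response (names are illustrative).
interface DetectedPoint {
  label: string;
  confidence: number;
  position: [number, number, number];
}

function parseEvidenceResponse(raw: string): DetectedPoint[] {
  // Strip a ```json ... ``` fence if the model added one.
  const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = match ? match[1] : raw;

  let parsed: unknown;
  try {
    parsed = JSON.parse(body.trim());
  } catch {
    return []; // unparseable response: surface zero detections, not a crash
  }
  if (!Array.isArray(parsed)) return [];

  // Keep only entries that have every field the 3D viewer needs.
  return parsed.filter(
    (p): p is DetectedPoint =>
      typeof p?.label === "string" &&
      typeof p?.confidence === "number" &&
      Array.isArray(p?.position) &&
      p.position.length === 3
  );
}
```

Filtering instead of throwing keeps a single malformed detection from discarding the rest of an otherwise valid response.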

Challenges we ran into

  • 3D Coordinate Mapping: One of our biggest hurdles was mapping AI detections from a 2D screenshot back to accurate 3D positions within the scene's world space. This required careful prompt engineering and coordinate system translation.
  • Structured AI Output: Getting Gemini to consistently return valid, structured JSON without extra conversational text required extensive prompt refinement and robust backend parsing logic.
  • Performance Tuning: Rendering large, detailed .glb models in real-time while maintaining a smooth user experience required careful optimization of the React Three Fiber scene.
  • Dynamic File Handling: Building a system that could not only accept uploads but also dynamically recognize new .glb files and integrate them into the case hierarchy as child evidence was a complex but rewarding challenge.
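The coordinate-mapping challenge above starts with a standard conversion from screenshot pixels to normalized device coordinates (NDC). This is our sketch of that general technique, not the project's actual code:

```typescript
// Pixel → normalized device coordinates (NDC), the first step in mapping a
// 2D detection from a screenshot back into the 3D scene.
function pixelToNdc(
  px: number, py: number,         // pixel coords, origin at top-left
  width: number, height: number   // screenshot dimensions
): [number, number] {
  const x = (px / width) * 2 - 1;     // left edge → -1, right edge → +1
  const y = -((py / height) * 2 - 1); // top edge → +1 (NDC y points up)
  return [x, y];
}

// With three.js, the NDC point can then be cast into the scene, e.g.
// raycaster.setFromCamera(new THREE.Vector2(x, y), camera), intersecting
// the loaded model to recover a world-space hit position for the marker.
```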

Accomplishments that we're proud of

  • We are incredibly proud of creating a full, end-to-end flow: from uploading a 3D model to seeing AI-generated evidence markers appear in the scene moments later.
  • The reliable screenshot-to-detection pipeline using Gemini is our core technical achievement. It feels like magic to see the AI "look" at the scene and point out what's important.
  • Achieving smooth 3D interactions, including the evidence markers and labels, within a React-based web application.
  • Building a clean, professional, and highly functional UI within the time constraints of a hackathon.

What we learned

  • The Power of Multimodal AI: We learned firsthand how to combine 3D rendering with advanced vision models. The possibilities for spatial analysis are immense.
  • Prompt Engineering is Key: The quality of our AI's output was directly tied to the clarity and structure of our prompts. We learned to "think like the model" to get the best results.
  • The Tradeoffs of Analysis: We learned to balance the speed of local metadata parsing with the deep, contextual understanding provided by a cloud-based vision model, offering users multiple ways to analyze a scene.
  • Designing for a Niche Workflow: Building a tool for a specific, professional workflow like forensic investigation taught us a lot about prioritizing clarity, precision, and utility in UI/UX design.

What's next for SceneSplat

  • Fine-Tuned Models: Expand the object taxonomy and improve detection precision and recall by training or fine-tuning specialized models for forensic analysis.
  • Collaboration Features: Introduce multi-user support, allowing teams to collaborate on a case in real-time, share notes, and manage access.
  • Reporting and Chain of Custody: Implement features to export formal evidence reports and integrate with chain-of-custody tracking systems.
  • Expanded Format Support: Add support for more 3D formats (like .ply for point clouds) and implement streaming for exceptionally large scene files.
  • Offline and Edge Inference: Explore options for on-device or edge-based models to enable use in remote locations with limited internet connectivity.
