Inspiration

Inspired by Sherlock Holmes' incredible ability to deduce entire narratives from the smallest details, we wanted to give real-world investigators similar superpowers. Today, forensic teams spend countless hours manually combing through 3D scene reconstructions and photos, a process that is slow, painstaking, and prone to human error. We built SceneSplat to bridge that gap, turning complex 3D scenes into actionable insights in minutes by combining AI-powered detection with an immersive 3D viewer.

What it does

SceneSplat is an interactive web application that transforms static 3D crime scene models into dynamic, intelligent environments for forensic analysis.

  • Explore Immersive 3D Scenes: Investigators can load .glb models and navigate them with intuitive orbit, zoom, and pan controls, examining every angle of a reconstructed scene.
  • Automate Evidence Detection with AI: With a single click, the app analyzes the scene to find objects, anomalies, and potential evidence, automatically placing labeled 3D markers with descriptions and confidence scores.
  • Leverage Dual Analysis Modes:
    • Quick Analysis: Instantly extracts evidence from the 3D model's geometry and metadata for rapid insights.
    • Deep Vision Analysis: Captures a screenshot of the current view and sends it to Google Gemini Flash, using its powerful multimodal capabilities to visually identify points of interest just as a human would.
  • Manage Cases and Evidence: The system organizes scenes into parent cases and child evidence items. Uploading a new .glb model to a case automatically adds it as a piece of evidence for individual inspection.
  • Consolidate Notes and Files: Users can attach case notes and upload supporting files (images, documents) directly to a scene, keeping all investigative materials in one place.
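To make the marker format above concrete, here is a minimal sketch of what a detected evidence item might look like. The interface name and fields are our illustration, not the project's actual schema; the writeup only tells us markers carry a label, a description, a confidence score, and a 3D position.

```typescript
// Hypothetical shape of an AI-detected evidence marker
// (illustrative only; not SceneSplat's actual schema).
interface EvidenceMarker {
  id: string;
  label: string;          // short name, e.g. "shell casing"
  description: string;    // model-generated explanation
  confidence: number;     // detector's confidence score in [0, 1]
  position: [number, number, number]; // world-space coordinates in the scene
}

const example: EvidenceMarker = {
  id: "ev-001",
  label: "shell casing",
  description: "Small metallic object near the doorway",
  confidence: 0.87,
  position: [1.2, 0.0, -3.4],
};
```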

How we built it

We built SceneSplat using a modern, full-stack TypeScript architecture designed for performance and rapid development.

  • Frontend Framework: Next.js with the App Router provided a robust foundation, allowing us to build a fast, server-aware single-page application and handle backend logic through API routes.
  • 3D Rendering: We used React Three Fiber and drei to declaratively manage a three.js canvas, enabling us to render complex .glb models and overlay interactive UI elements smoothly in the browser.
  • AI & Vision: The core intelligence is powered by the Google Gemini Flash model via the @google/generative-ai SDK. We built a pipeline that sends scene screenshots to Gemini and parses its structured JSON response into labeled 3D markers in the scene.
  • UI/UX: The user interface was built with TailwindCSS for styling and Radix UI for accessible, unstyled component primitives, allowing us to create a clean and intuitive experience.

The data flow is straightforward: a user selects a scene, which is rendered in the 3D viewer. Triggering an AI analysis sends a request to our Next.js backend, which either parses the model file or queries Gemini, then returns a list of evidence points that are visualized as interactive markers in the 3D space.
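The backend parsing step can be sketched as follows. The function name and expected schema are our assumptions; the general idea is that vision models often wrap JSON in a markdown fence or add surrounding text, so the handler extracts the JSON body before parsing and drops malformed entries rather than failing.

```typescript
// Hypothetical parser for the model's text response (names are illustrative).
interface DetectedPoint {
  label: string;
  confidence: number;
  position: [number, number, number];
}

function parseEvidenceResponse(raw: string): DetectedPoint[] {
  // Strip a ```json ... ``` fence if the model added one.
  const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = match ? match[1] : raw;

  let parsed: unknown;
  try {
    parsed = JSON.parse(body.trim());
  } catch {
    return []; // unparseable response: surface zero detections, not a crash
  }
  if (!Array.isArray(parsed)) return [];

  // Keep only entries that have every field the 3D viewer needs.
  return parsed.filter(
    (p): p is DetectedPoint =>
      typeof p?.label === "string" &&
      typeof p?.confidence === "number" &&
      Array.isArray(p?.position) &&
      p.position.length === 3
  );
}
```

Filtering instead of throwing keeps a single malformed detection from discarding the rest of an otherwise valid response.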

Challenges we ran into

  • 3D Coordinate Mapping: One of our biggest hurdles was mapping AI detections from a 2D screenshot back to accurate 3D positions within the scene's world space. This required careful prompt engineering and coordinate system translation.
  • Structured AI Output: Getting Gemini to consistently return valid, structured JSON without extra conversational text required extensive prompt refinement and robust backend parsing logic.
  • Performance Tuning: Rendering large, detailed .glb models in real-time while maintaining a smooth user experience required careful optimization of the React Three Fiber scene.
  • Dynamic File Handling: Building a system that could not only accept uploads but also dynamically recognize new .glb files and integrate them into the case hierarchy as child evidence was a complex but rewarding challenge.
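The coordinate-mapping challenge above starts with a standard conversion from screenshot pixels to normalized device coordinates (NDC). This is our sketch of that general technique, not the project's actual code:

```typescript
// Pixel → normalized device coordinates (NDC), the first step in mapping a
// 2D detection from a screenshot back into the 3D scene.
function pixelToNdc(
  px: number, py: number,         // pixel coords, origin at top-left
  width: number, height: number   // screenshot dimensions
): [number, number] {
  const x = (px / width) * 2 - 1;     // left edge → -1, right edge → +1
  const y = -((py / height) * 2 - 1); // top edge → +1 (NDC y points up)
  return [x, y];
}

// With three.js, the NDC point can then be cast into the scene, e.g.
// raycaster.setFromCamera(new THREE.Vector2(x, y), camera), intersecting
// the loaded model to recover a world-space hit position for the marker.
```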

Accomplishments that we're proud of

  • We are incredibly proud of creating a full, end-to-end flow: from uploading a 3D model to seeing AI-generated evidence markers appear in the scene moments later.
  • The reliable screenshot-to-detection pipeline using Gemini is our core technical achievement. It feels like magic to see the AI "look" at the scene and point out what's important.
  • Achieving smooth 3D interactions, including the evidence markers and labels, within a React-based web application.
  • Building a clean, professional, and highly functional UI within the time constraints of a hackathon.

What we learned

  • The Power of Multimodal AI: We learned firsthand how to combine 3D rendering with advanced vision models. The possibilities for spatial analysis are immense.
  • Prompt Engineering is Key: The quality of our AI's output was directly tied to the clarity and structure of our prompts. We learned to "think like the model" to get the best results.
  • The Tradeoffs of Analysis: We learned to balance the speed of local metadata parsing with the deep, contextual understanding provided by a cloud-based vision model, offering users multiple ways to analyze a scene.
  • Designing for a Niche Workflow: Building a tool for a specific, professional workflow like forensic investigation taught us a lot about prioritizing clarity, precision, and utility in UI/UX design.

What's next for SceneSplat

  • Fine-Tuned Models: Expand the object taxonomy and improve detection precision and recall by training or fine-tuning specialized models for forensic analysis.
  • Collaboration Features: Introduce multi-user support, allowing teams to collaborate on a case in real-time, share notes, and manage access.
  • Reporting and Chain of Custody: Implement features to export formal evidence reports and integrate with chain-of-custody tracking systems.
  • Expanded Format Support: Add support for more 3D formats (like .ply for point clouds) and implement streaming for exceptionally large scene files.
  • Offline and Edge Inference: Explore options for on-device or edge-based models to enable use in remote locations with limited internet connectivity.
