- Our website, which shows an OpenGL rendering of our simulated oil platform that we imaged using drones
- Our simulated drone trajectories, spiraling around the target to capture images that yield high-quality camera pose reconstructions through COLMAP!
- We use code to generate the hundreds of images necessary for training a Gaussian splat
- Us in action, testing the flight of a simulated drone with the aim of imaging a power line.
Inspiration
Human infrastructure like power lines and oil rigs is immensely important to our daily lives, and when it fails, the consequences can be catastrophic.
For example, in 2010 the infamous Deepwater Horizon oil spill (caused by a poorly maintained oil rig) sent 210 million gallons of oil into the Gulf of Mexico, leaving staggering images of oil-stained beaches across several states and devastating local wildlife to this day.
However, the most critical, high-risk pieces of infrastructure are often also the hardest to reach: oil platforms sit in remote offshore waters, and power grids are vast in scope. That is exactly what makes automating the inspection process so valuable. Given the rapid growth of drone and AI technology, we saw an opportunity to innovate and tackle this problem.
What it does
First, we fly drones (thanks, Parrot!) near the target location and capture footage. By running COLMAP on consecutive frames to recover their camera poses, and then training a Gaussian splat representation of the object under inspection, we can render novel views from every direction. This saves valuable air time: drones no longer need to exhaustively image each structure, because our reconstructions capture its outward appearance and fill in views between the captured ones. It also lets human reviewers jump straight to potentially faulty areas without reviewing the entire drone footage.
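The pose-recovery step above can be sketched with COLMAP's standard command-line pipeline (feature extraction, matching, then incremental mapping). The paths `images/` and `workspace/` here are illustrative placeholders, and an actual run requires COLMAP to be installed:

```python
from pathlib import Path

def colmap_commands(image_dir: str, workspace: str) -> list[list[str]]:
    """Assemble COLMAP's standard sparse-reconstruction pipeline:
    feature extraction -> exhaustive matching -> incremental mapping."""
    db = str(Path(workspace) / "database.db")
    sparse = str(Path(workspace) / "sparse")
    return [
        ["colmap", "feature_extractor",
         "--database_path", db, "--image_path", image_dir],
        ["colmap", "exhaustive_matcher", "--database_path", db],
        ["colmap", "mapper",
         "--database_path", db, "--image_path", image_dir,
         "--output_path", sparse],
    ]

# Each command would be executed in order, e.g. with
# subprocess.run(cmd, check=True); the recovered camera poses under
# workspace/sparse/ then initialize the Gaussian splat training.
cmds = colmap_commands("images", "workspace")
```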
Next, we apply multi-modal large language models (LLMs) to analyze shots sampled from all directions. To our knowledge, we are the first to leverage Intel's XPU technology to accelerate multi-modal LLM inference. These models quickly sift through large amounts of imagery to identify problematic regions, which we summarize into a condensed report. Finally, to further raise inspection quality, we experiment with agentic multi-modal LLMs that explore our Gaussian splat scenes to search for vulnerabilities.
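One simple way to sample shots "from all directions" (a sketch of the idea, not necessarily our exact scheme) is to place camera directions at roughly uniform points on a sphere around the object, e.g. via a Fibonacci lattice:

```python
import math

def fibonacci_sphere(n: int) -> list[tuple[float, float, float]]:
    """Return n roughly uniformly spaced unit vectors on the sphere,
    usable as camera viewing directions toward the object's center."""
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle in radians
    points = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n          # height y in (-1, 1)
        r = math.sqrt(1.0 - y * y)             # circle radius at height y
        theta = golden * i                     # rotate by golden angle
        points.append((r * math.cos(theta), y, r * math.sin(theta)))
    return points

# 64 viewpoints to render from the splat and feed to the vision LLM
dirs = fibonacci_sphere(64)
```

Each direction becomes a render of the Gaussian splat, so the LLM sees the structure from every side without any extra flight time.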
How we built it
We simulated the real environment by modeling it in Blender, hacked together cutting-edge research repositories for our pipeline, and relied on the Intel Developer Cloud to develop our methods.
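The spiral capture trajectory from our simulation can be generated in a few lines. The parameters below (radius, heights, number of turns) are illustrative; in a setup like ours, each waypoint would drive a Blender camera that looks at the origin and renders one training frame:

```python
import math

def spiral_waypoints(n_frames: int, radius: float,
                     z_start: float, z_end: float,
                     turns: float = 3.0) -> list[tuple[float, float, float]]:
    """Camera positions spiraling around the z-axis, with the target
    (e.g. the oil platform model) centered at the origin."""
    pts = []
    for i in range(n_frames):
        t = i / max(n_frames - 1, 1)           # progress in [0, 1]
        angle = 2.0 * math.pi * turns * t      # sweep `turns` full circles
        z = z_start + (z_end - z_start) * t    # climb linearly
        pts.append((radius * math.cos(angle), radius * math.sin(angle), z))
    return pts

# Hundreds of frames for splat training, as described above
waypoints = spiral_waypoints(200, radius=15.0, z_start=2.0, z_end=25.0)
```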
Challenges we ran into
Since Intel's AI platform is not yet as established as CUDA's, we learned a lot by porting existing research repositories with CUDA kernels over to it. Along the way, we got nerd-sniped by the "Data Parallel C++: the oneAPI Implementation of SYCL" documentation.
Accomplishments that we're proud of
We are proud of building such high-quality Gaussian splats from the data our simulations produced! Make sure to come to our booth to check out our fully interactive splats!
What we learned
We learned how to create Gaussian splats, use Blender, do web development, and utilize Intel's AI GPU offerings.
What's next for EcoDrone
We're optimistic that introducing more specialized computer vision algorithms will unlock enormous value for this important industry, keeping our infrastructure working and eco-friendly.
Our Demo Video
(should be linked)
A quick demo of our website, which shows the Gaussian-splatted rendering of an oil rig trained on images generated from our simulation!
We can chat with a Together.ai-powered, Mistral-based fast-inference chatbot about key frames extracted from the drone footage to search for potential hazards!
The vision LLM used for frame captioning (LLaVA) is powered by Intel!