Flo Mobility – https://flomobility.com/ – Autonomous Electric Wheelbarrow

Mandatory Precast Technology: A Double-Edged Sword for India’s Construction Industry
Published: Wed, 19 Mar 2025

An analysis of the potential impacts of making precast construction mandatory in India

India’s construction industry stands at a crossroads. While traditional construction methods have built our cities for generations, there’s growing pressure to modernize through technologies like precast construction. Some policymakers have even suggested making precast technology mandatory across the construction sector. But is this the right approach for India’s unique context?

The Promise of Precast

Precast construction offers compelling advantages. Components manufactured in controlled factory environments lead to consistent quality, reduced construction time, and potentially lower costs at scale. With India’s massive infrastructure needs and housing shortages, these benefits are certainly attractive. The environmental case is also strong. Precast typically generates less waste than traditional construction, uses materials more efficiently, and can incorporate sustainable innovations more easily. In a country facing growing environmental challenges, this is no small consideration.

The Indian Reality

However, India’s construction landscape is uniquely complex. Our industry employs millions of workers across formal and informal sectors, with widely varying levels of technical training. A mandatory shift to precast would fundamentally disrupt this employment ecosystem.

The financial barriers are equally significant. Precast requires substantial upfront investment in manufacturing facilities, specialized transportation, and heavy lifting equipment. While large developers might absorb these costs, smaller contractors who form the backbone of local construction would struggle to adapt.

Then there’s the question of geographic appropriateness. India’s diverse climate zones, from Himalayan regions to coastal areas, often require locality-specific construction approaches. A one-size-fits-all mandatory policy would fail to account for these regional variations.

A Balanced Path Forward

Rather than a blanket mandate, India needs a nuanced approach that:

  • Incentivizes rather than mandates: Offer tax benefits, streamlined approvals, and other incentives for precast adoption where appropriate.
  • Invests in workforce development: Create comprehensive training programs to help workers transition to new construction methods.
  • Builds capacity gradually: Start with government projects and specific building types before wider implementation.
  • Supports technological adaptation: Fund research into precast techniques specifically suited to Indian conditions and requirements.
  • Establishes quality standards: Develop robust quality control systems appropriate for the Indian context.

Conclusion

Precast technology undoubtedly has a major role to play in modernizing India’s construction industry. However, mandatory implementation risks creating more problems than it solves. A thoughtful, phased approach that respects India’s economic realities while gradually building capacity would better serve both the industry and the millions who depend on it.

The future of construction in India should embrace innovation – but innovation that works with our unique circumstances rather than against them.

Project progress using camera-based detection
Published: Mon, 25 Nov 2024

In the world of construction, resource management and operational efficiency are paramount. With construction sites bustling with activities and materials moving constantly, keeping track of these resources manually can be a daunting task. To address this challenge, advanced AI-powered solutions are revolutionizing construction operations, offering unparalleled accuracy and efficiency.  

This blog dives deep into a cutting-edge AI model designed to monitor and track materials handled by robot dumpers on construction sites. Utilizing YOLOv8 for object detection and tracking, the model automates the process of resource tracking, ensuring precise and real-time insights for site managers and stakeholders.

Introduction

Construction sites are dynamic environments with numerous moving parts. From materials being delivered to tools being transported, keeping track of every item is crucial to ensure smooth operations and prevent resource wastage. Traditionally, manual methods of tracking were prone to errors and inefficiencies.  

This innovative AI model provides a robust solution, leveraging state-of-the-art object detection and tracking algorithms to automate the process. By focusing on robot dumpers, a key element in material transport, the model ensures that every item is accounted for with unparalleled precision. 

Objectives and Capabilities

1. Material Flow Automation  

   The model automates the tracking of materials moving into and out of robot dumpers, reducing human intervention and errors.  

2. Advanced Object Detection  

   Powered by YOLOv8, a highly accurate and efficient object detection model, it identifies and tracks materials such as cement blocks, bricks, and cement bags.  

3. ROI-Based Monitoring  

   By defining a Region of Interest (ROI) around the dumper, the model ensures focused detection, ignoring irrelevant background activities on the construction site.  

4. Bidirectional Tracking  

   The model tracks materials moving in both directions:

  • Into the dumper: Logs items being loaded.  
  • Out of the dumper: Tracks items being unloaded.

5. Daily Summarized Reports

   At the end of each day, the system generates detailed reports, including:

  • The types of materials handled.  
  • A count of each material moved.  
  • Timestamped records of all detections.

6. Real-Time Alerts

   The system can notify managers in real time if any discrepancies or unusual activities are detected, ensuring immediate intervention.  

7. Scalability and Integration  

   Designed for flexibility, the model can be scaled to monitor multiple dumpers across large construction sites and integrate seamlessly with existing site management systems.  

How It Works

1. Setting Up the ROI

Before the system begins operation, users define an ROI around the dumper. This area is where all detections and tracking occur, ensuring focused and efficient monitoring.  
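The ROI idea above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the production system: the polygon coordinates and the `DUMPER_ROI` name are hypothetical, and a real deployment would apply this test to detection centers coming from the detector.

```python
# Minimal sketch: deciding whether a detection center falls inside a
# user-defined polygonal ROI, via the classic ray-casting test.
def point_in_roi(point, roi):
    """Return True if (x, y) lies inside the polygon `roi` (list of vertices)."""
    x, y = point
    inside = False
    n = len(roi)
    for i in range(n):
        x1, y1 = roi[i]
        x2, y2 = roi[(i + 1) % n]
        # Count crossings of a horizontal ray extending to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical ROI around the dumper, in pixel coordinates.
DUMPER_ROI = [(100, 100), (500, 100), (500, 400), (100, 400)]

print(point_in_roi((300, 250), DUMPER_ROI))  # True: detection inside the ROI
print(point_in_roi((50, 50), DUMPER_ROI))    # False: background activity, ignored
```

Detections whose centers fail this test are simply dropped before tracking, which is what keeps the monitoring focused on the dumper.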

2. Real-Time Object Detection and Tracking

Using YOLOv8, the system processes live video feeds from high-resolution cameras mounted around the dumper. The model detects objects within the ROI, classifying them into predefined categories such as bricks, cement blocks, and bags.  

3. Bidirectional Material Flow Tracking

The model tracks the movement of objects in and out of the ROI, distinguishing between items being loaded onto the dumper and those being offloaded. This ensures a comprehensive record of all material flows.  
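The bidirectional counting logic can be sketched as follows. This is a toy illustration under stated assumptions, not the actual model: tracker IDs and trajectories would come from YOLOv8's tracking output, and the `inside` function here is a stand-in for the real ROI test.

```python
# Sketch of bidirectional flow counting: each tracked object contributes an
# "in" event when its trajectory enters the ROI and an "out" event when it
# leaves, keyed by its tracker-assigned ID.
def count_material_flow(trajectories, inside_fn):
    """trajectories: {track_id: [(x, y), ...]}; inside_fn: point -> bool."""
    counts = {"in": 0, "out": 0}
    for track_id, points in trajectories.items():
        was_inside = inside_fn(points[0])
        for point in points[1:]:
            now_inside = inside_fn(point)
            if now_inside and not was_inside:
                counts["in"] += 1     # item loaded into the dumper
            elif was_inside and not now_inside:
                counts["out"] += 1    # item unloaded
            was_inside = now_inside
    return counts

# Toy ROI: x >= 100 counts as "inside the dumper".
inside = lambda p: p[0] >= 100
tracks = {
    1: [(50, 0), (120, 0)],            # enters the ROI once
    2: [(150, 0), (80, 0), (130, 0)],  # leaves, then re-enters
}
print(count_material_flow(tracks, inside))  # {'in': 2, 'out': 1}
```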

4. Data Logging and Aggregation

Every detection and movement is logged with detailed information:

  • Type of object  
  • Direction of movement (into or out of the dumper)  
  • Timestamps  
  • Unique object IDs for tracking individual items

5. Report Generation

At the end of each day, the system compiles all logged data into a detailed report. This report includes:  

  • Total count of each material handled  
  • Trends in material movement (e.g., peak times of activity)  
  • Anomalies or discrepancies observed  
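The aggregation step behind the daily report can be sketched like this. The field names (`material`, `direction`, `timestamp`) are illustrative assumptions, not the system's actual log schema.

```python
# Sketch: aggregate timestamped detection logs into per-material counts
# per direction, the core of an end-of-day report.
from collections import Counter
from datetime import datetime

def daily_report(events):
    """events: list of dicts with 'material', 'direction', 'timestamp' keys."""
    counts = Counter((e["material"], e["direction"]) for e in events)
    return [f"{m}: {n} moved {d}" for (m, d), n in sorted(counts.items())]

log = [
    {"material": "brick", "direction": "in",
     "timestamp": datetime(2024, 11, 25, 9, 15)},
    {"material": "brick", "direction": "in",
     "timestamp": datetime(2024, 11, 25, 10, 2)},
    {"material": "cement_bag", "direction": "out",
     "timestamp": datetime(2024, 11, 25, 16, 40)},
]
for line in daily_report(log):
    print(line)
# brick: 2 moved in
# cement_bag: 1 moved out
```

The retained timestamps are what make the trend and anomaly sections of the report possible, since the same log can be re-bucketed by hour or by day.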

Model Training and Development

1. Training Process

The model was trained on Google Colab, utilizing its A100 GPU for efficient computation. A pre-trained YOLOv8 model served as the foundation, fine-tuned with construction-specific data.  

2. Dataset Details

A robust dataset was curated from various construction scenarios, featuring images and videos of materials like bricks, cement blocks, and bags.  

  • Categories: Cement blocks, Bricks, Cement Bags, etc.  
  • Data Source: Custom dataset annotated with tools like Roboflow for high accuracy.  
  • Distribution: Split into training (70%), validation (20%), and testing (10%) sets for optimal performance.

3. Fine-Tuning for Specific Use Cases

The model was fine-tuned to detect and track objects within the unique conditions of construction sites, such as poor lighting, cluttered backgrounds, and varying object orientations.  
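The 70/20/10 split mentioned in the dataset details can be sketched as below. This is a generic, hedged example: the file names are hypothetical, and in practice the split would typically be handled by the annotation tool (e.g. Roboflow) rather than by hand.

```python
# Sketch: reproducible 70/20/10 train/validation/test split with a fixed seed.
import random

def split_dataset(items, seed=42):
    rng = random.Random(seed)     # fixed seed -> same split every run
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.7 * n)
    n_val = int(0.2 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

images = [f"site_{i:04d}.jpg" for i in range(100)]  # hypothetical file names
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 70 20 10
```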

Hardware and Software Features

1. High-Resolution Cameras  

   The system uses 1080p cameras capable of capturing 23 frames per second, ensuring clarity and accuracy in object detection and tracking.  

2. GPU Acceleration  

   Equipped with NVIDIA GPUs, the system processes high-resolution video feeds in real time, maintaining low latency and high performance.  

3. Software Stack

  • Ubuntu 22.04: For robust and reliable system operations.  
  • YOLOv8: For state-of-the-art object detection.  
  • OpenCV: For efficient video processing.  
  • CUDA: To leverage GPU acceleration.  

Use Cases and Benefits

  1. Enhanced Resource Management: Automating the tracking of materials minimizes waste and ensures efficient utilization of resources.
  2. Improved Transparency: Detailed logs and reports provide a clear record of material movement, enabling accountability and transparency.
  3. Time and Cost Savings: By automating a time-consuming manual process, the system saves valuable time and reduces operational costs.
  4. Scalability: The model can easily be scaled to monitor multiple dumpers or other material handling systems, making it suitable for large-scale construction sites.
  5. Real-Time Oversight: Managers can access real-time data and reports, allowing them to make informed decisions promptly.

Future Enhancements

  • Integration with IoT Sensors: Combining the model with IoT sensors could provide additional data points, such as weight and volume of materials handled.
  • Predictive Analytics: Analyzing historical data to predict material needs, optimizing inventory and transport schedules.
  • Enhanced Detection in Harsh Conditions: Adapting the model for environments with poor visibility, such as dust or rain.
  • Multi-Site Integration: Expanding the system to manage and report on multiple sites from a centralized dashboard.

Conclusion

This YOLOv8-based object detection and tracking model is a game-changer for construction sites. By automating material tracking, generating detailed reports, and providing real-time insights, it ensures higher efficiency, transparency, and resource optimization.  

As AI and robotics continue to advance, solutions like this are setting new standards in the construction industry, paving the way for smarter, safer, and more productive operations. With features like ROI-based monitoring, daily reports, and bidirectional tracking, this model is a vital tool for modern construction site management.  

Embracing such technologies will not only enhance operational efficiency but also empower site managers and stakeholders to make informed decisions, ensuring that every resource is used effectively and every material is accounted for. The future of construction is here—and it’s powered by AI.

Ensuring Safety and Efficiency on Construction Sites with AI
Published: Wed, 24 Jul 2024

In the rapidly evolving landscape of construction, ensuring the safety of workers while maintaining efficient operations is paramount. Leveraging AI and robotics, innovative solutions are now emerging to address these critical aspects. This blog explores how advanced AI systems are revolutionizing construction site safety.

Introduction

Construction sites are inherently hazardous environments, requiring numerous safety protocols to protect workers. However, monitoring and enforcing these protocols can be challenging due to the dynamic and often chaotic nature of construction activities. AI-powered solutions offer a new way to enhance safety by automatically detecting safety compliance. This blog delves into an AI-based system designed to ensure safety on construction sites, highlighting its features, usage, and performance.

The AI system is designed to enhance construction site safety through several key functionalities:

  • Safety Monitoring: The system continuously monitors workers to ensure they are wearing safety hats, vests, and masks. Cameras placed strategically around the construction site capture live video feeds, which are then processed in real-time by the AI system. If a worker is found not complying with these safety protocols, the system immediately records a video of the breach. This video continues to be recorded until the individual leaves the camera frame, ensuring a comprehensive record of the non-compliance incident. Each recorded video is timestamped, providing precise information about when and where the safety breach occurred. This feature ensures that every instance of non-compliance is documented, allowing for thorough review and analysis.
  • Incident Reporting: Once a safety breach is detected, the system promptly informs the site manager. The alert includes the exact time and location of the incident, along with the recorded video footage. This allows the site manager to take quick and appropriate action, such as addressing the non-compliant worker or reviewing the incident with the team to prevent future occurrences. The ability to receive real-time notifications ensures that safety issues are addressed promptly, minimizing the risk of accidents and injuries.
  • Proactive Safety Management: Beyond monitoring and incident reporting, the system also contributes to proactive safety management. By continuously analyzing video feeds, the AI can identify patterns and trends in safety compliance. For example, if non-compliance incidents are more frequent at certain times of the day or in specific areas of the site, the system can highlight these trends. This information allows site managers to take preventive measures, such as increasing supervision during high-risk periods or conducting targeted safety training for workers.

To develop and train this AI system, a comprehensive dataset is utilized, focusing on construction site safety. The dataset includes images and videos of workers with and without safety gear. The key components of the dataset and model include:

  • Pretrained Model: The AI model is built upon the YOLOv8n pretrained model, known for its high accuracy and speed in object detection tasks.
  • Dataset Source: The dataset used for training is sourced from Roboflow, a platform that provides tools for managing and annotating datasets.
  • Categories: The dataset is categorized into different safety gear types, such as Hardhat, Safety Vest, and Mask. It also includes categories for non-compliance, like No-Safety Vest, No-Mask, and No-Helmet.
  • Project Details: The dataset is hosted on Roboflow under the project “construction-site-safety”.
  • Data Distribution: The dataset is divided into training, validation, and test sets. The training set is used to train the AI model, the validation set is used to fine-tune the model’s parameters, and the test set is used to evaluate the model’s performance. This distribution ensures that the model is robust and can generalize well to new data.
  • Custom Training: The model is trained on Google Colab, utilizing a powerful A100 GPU to handle the computational demands. This setup allows for efficient training and fine-tuning of the model to achieve high accuracy in detecting safety gear and identifying non-compliance.

The AI system’s performance is enhanced by several key features:

  • Pre-installed Software: The system runs on a platform equipped with Ubuntu 22.04, ROS2, OpenCV, and CUDA. These software tools provide a robust foundation for developing and deploying AI and robotics applications. ROS2 (Robot Operating System) facilitates communication between different components of the system, while OpenCV and CUDA enable efficient image processing and deep learning operations.
  • Powerful GPU: The system is equipped with a powerful GPU, which is essential for handling the computational demands of real-time video processing. The GPU accelerates the training and inference of deep learning models, ensuring that the system can process high-resolution video feeds at high frame rates.
  • High-Resolution Camera: The system uses a 1920×1080 camera capable of capturing up to 23 frames per second. This high-resolution camera provides clear and detailed images, which are crucial for accurately detecting safety gear. The high frame rate ensures that the system can monitor the site continuously without missing any important events.

These features enable the system to achieve remarkable benchmarks, including high accuracy in detecting safety gear, as well as efficient real-time processing. The combination of advanced hardware and software ensures that the system can operate reliably in the demanding environment of a construction site.

A crucial aspect of the system is its ability to analyze and report trends in safety compliance over time. For instance, a graph can be shown where it compares the number of violations versus the time of day. This helps in identifying peak times of non-compliance, allowing for targeted interventions. The graph is generated by aggregating data from the recorded incidents, providing a visual representation of safety trends.
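The violations-versus-time-of-day aggregation described above amounts to bucketing timestamped incidents by hour. A minimal sketch (the incident list is illustrative):

```python
# Sketch: bucket recorded safety breaches by hour of day so that peak
# non-compliance periods stand out in a report or chart.
from collections import Counter
from datetime import datetime

def violations_by_hour(incidents):
    """incidents: list of datetimes at which a breach was recorded."""
    return Counter(t.hour for t in incidents)

breaches = [
    datetime(2024, 7, 24, 9, 5),
    datetime(2024, 7, 24, 9, 47),
    datetime(2024, 7, 24, 14, 12),
]
print(dict(violations_by_hour(breaches)))  # {9: 2, 14: 1}
```

Feeding these hourly counts to any plotting library yields the violations-per-hour graph the text describes.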

During peak times of non-compliance, the system can send higher numbers of immediate alerts to the manager or the concerned person. This ensures that safety breaches are addressed promptly, even during busy periods. By providing real-time insights and notifications, the system helps site managers maintain a high level of safety at all times.

Incorporating AI into construction site operations significantly enhances safety. The system described here not only ensures compliance with safety protocols by recording and timestamping breaches but also provides a comprehensive solution for modern construction sites. The ability to monitor safety in real-time, report incidents promptly, and analyze trends over time makes this AI system a powerful tool for improving safety on construction sites.

By leveraging AI and robotics, we can create a safer work environment for construction workers, ultimately leading to more productive and secure construction sites. Additionally, the system runs on a server, ensuring continuous and reliable monitoring without interrupting site operations. The server-based architecture allows the system to scale easily, supporting multiple cameras and large construction sites.

The system also integrates GPS data to pinpoint the exact location of each safety violation, providing even more detailed information for site managers. This level of detail ensures that every incident is documented accurately, enabling better oversight and management of safety protocols.

The future of construction looks safer and more efficient with AI-powered safety systems. As technology continues to advance, we can expect even more innovative solutions to emerge, further enhancing the safety and efficiency of construction operations. By embracing AI and robotics, the construction industry can achieve new levels of safety, productivity, and overall performance, paving the way for a brighter and safer future.

PoseNet Pose Estimation model on Flo Edge One
Published: Thu, 27 Jul 2023

Pose estimation is a computer vision and deep learning method whose goal is to detect a person and their pose in a given image. This is done by locating specific landmarks like the head, shoulders, elbows, hands, hips, knees, and feet. By tracking the position and orientation of human body parts, a rough estimate of the person and all their movements is obtained. It’s basically like if AI did the Glowstick man lockdown challenge!

source: https://www.youtube.com/watch?v=6Te_X8DLKlA

Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some remarkable benchmarks that make it a top competitor in edge devices!

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFlite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4k camera at up to 30fps.
  • Inferences PoseNet with a MobileNet backbone in 17 milliseconds.

Introduction:

Human pose estimation is used to track the keypoints of human bodies. Some examples of such keypoints are “left knee”, “right hip”, and so on. Keypoint tracking on live video traditionally required high computational resources and suffered from poor accuracy, but with new advancements in hardware and model architectures, this task has become far more feasible. Today, the basis of most image processing techniques is a very powerful tool called the convolutional neural network (CNN), and CNNs have been tailored particularly for pose estimation as well.

Typically, human pose estimation is preceded by identifying a person in the image. This classifies as object detection: a person is detected and bounded by a box, and then landmarks/keypoints are detected within that box for live pose tracking. Modern deep learning methods have achieved several breakthroughs in both 2D and 3D pose estimation, as well as multi-person pose estimation. In this blog, we will be looking at PoseNet – a very commonly used 2D single-person pose detection architecture that is both fast and lightweight.

Dataset and Model:

As mentioned earlier, most pose estimation models are two step architectures that first detect human bounding boxes and then estimate keypoints within the boxes. The model has been trained on the COCO benchmark dataset with 17 identifiable keypoints – “nose”, “left_eye”, “right_eye”, “left_ear”, “right_ear”, “left_shoulder”, “right_shoulder”, “left_elbow”, “right_elbow”, “left_wrist”, “right_wrist”, “left_hip”, “right_hip”, “left_knee”, “right_knee”, “left_ankle”, “right_ankle”. Each keypoint is annotated with (x,y,v) where x and y mark the coordinates of the keypoint and v indicates if it is visible.

PoseNet is supported by a MobileNet backbone which is a lightweight architecture perfect for web operations and Edge devices like the Flo Edge One. It takes a 257 x 257 RGB image (video stream/camera stream/image) as an input and produces four 9 x 9 tensors with channel sizes 17, 34, 32, and 32. Of these four tensors, the first two – heat maps and offsets – are used to calculate the position and confidence scores for each of the 17 keypoints. 

Heatmaps Vector:

This vector contains 17 channels, one for each identifiable keypoint. Each channel contains the heatmap for its corresponding keypoint, which indicates all the estimated locations of the point along with their confidence scores. The most likely location is chosen based on these scores.

Offsets Vector:

This vector contains 34 channels, twice the number of identifiable keypoints in a human body. This vector refines the position of each of the 17 keypoints: the first 17 channels give the x coordinates while the last 17 give the y coordinates.
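Decoding these two tensors into keypoint coordinates can be sketched in pure Python. This is a toy illustration of the principle, not PoseNet's exact post-processing: it uses a 3x3 grid instead of the real 9x9, and the output stride of 32 matches the 257-pixel input described above.

```python
# Sketch: decode one keypoint from its heatmap channel plus the matching
# offset channels, by taking the most confident grid cell and refining it.
def decode_keypoint(heatmap, offsets_y, offsets_x, stride):
    """heatmap: 2D scores for one keypoint; offsets_*: matching 2D offsets."""
    best, best_ij = -1.0, (0, 0)
    for i, row in enumerate(heatmap):
        for j, score in enumerate(row):
            if score > best:
                best, best_ij = score, (i, j)   # most confident grid cell
    i, j = best_ij
    y = i * stride + offsets_y[i][j]            # refine with the offset
    x = j * stride + offsets_x[i][j]
    return (x, y), best

heatmap = [[0.1, 0.2, 0.1],
           [0.1, 0.9, 0.3],
           [0.0, 0.1, 0.2]]
zeros = [[0.0] * 3 for _ in range(3)]
offsets_y = [[2.0 if (i, j) == (1, 1) else 0.0 for j in range(3)]
             for i in range(3)]
(x, y), conf = decode_keypoint(heatmap, offsets_y, zeros, stride=32)
print(x, y, conf)  # 32.0 34.0 0.9
```

Running this per channel over all 17 heatmaps yields the full set of keypoints with their confidence scores.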

Usage:

Now let’s take a look at how this model can be run efficiently on the Flo Edge One. What is Flo Edge One you ask? This impressive device boasts a light GPU that can deliver smooth results, all while maintaining a high level of accuracy. The Flo Edge One is truly a remarkable device, providing a plethora of impressive features that make it a must-have for tech enthusiasts. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device truly has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it the perfect choice for your robotics ventures. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.

Want to build a hobby project version of the AMP suit from the Avatar movies? (Of course, not to destroy a whole species, but just for fun.) Equip your bot with a human pose estimation model to control it just by gestures! With the Flo Edge One 12 MP camera, run the pose estimation model and track your actions. Based on a set of predetermined gestures, match your movement and make the robot perform an action in response.

source: www.scifimoviezone.com

Performance Analysis:

The inference time of this model was around 17 milliseconds at 6-7 FPS. The PoseNet architecture is light and was easy to load onto the Flo Edge One GPU. The confidence scores for each keypoint were in the range of 0.6-1, while the average score was 0.88-0.92. The accuracy of the model was quite decent, but it had trouble detecting keypoints when the face was not visible.

Since the development of PoseNet, many other highly accurate and fast models have been designed and implemented such as OpenPose, MoveNet lightning, MoveNet Thunder, etc. Some of these models are computationally heavy, making them a better choice for hardware that can bear the load rather than Edge devices. 

Conclusion:

The pose estimation model is lightweight and fast when run on the Flo Edge GPU and gives good tracking results. Coupled with the 12 MP onboard camera, a wide range of systems can be developed for use cases like 3D modelling, robot training, console control, surveillance, and many more.

Mediapipe Gender detection model on Flo Edge One
Published: Thu, 27 Jul 2023

In this blog, we are going to discuss yet another mini-project for novice AI explorers. A gender detection model, as basic as it might appear, might just be exactly what we need in the 21st century where everybody seems confu- nope. Not getting cancelled already. Anyway.

A gender detection model is a simple classifier model built over any existing face detection module. It has a wide variety of uses, like identity confirmation and forensics. Like most models, the mediapipe gender detection model is trained to identify the two biological genders – male and female – from the output of an independently operating face detection model. Let’s see how this classification is done by understanding the model architecture, as well as its deployment on Flo Edge One and its inferences.

Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some remarkable benchmarks that make it a top competitor in edge devices!

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFlite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4k camera at up to 30fps.
  • Inferences both models back-to-back in 43 milliseconds, producing a smooth output of around 30 fps.

Introduction:

The model stack used here contains 2 models – a face detection model and a gender classification model. The face detection model is a very simple OpenCV Haar cascades model while the gender classification model is a readily available mediapipe model.

Like we saw in the age detection blog, the face detection model in this stack is replaceable as well. The Haar cascades model is preferred for this use case because it is light and simple, and this application does not require high accuracy in face detection.

Dataset and Model:

The use of Haar cascades for face detection has been described elaborately here. In brief, Haar cascades slide several different windows across the image to capture features. The regions that contain the majority of the features that make up a face are bounded by a box. This region is the face of a person, a.k.a. our region of interest.

Once we obtain our region of interest in the image, i.e., the face, a basic classification model with 2 class labels – Male and Female – is used for gender detection.

The classification model was trained on over 30,000 images of people labelled man or woman. This mediapipe classification model has an accuracy of around 97%, but it is susceptible to errors in scenarios like cross-dressing, men with longer hair, women with short hair, etc.
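
To make the pipeline concrete, here is a hypothetical sketch of running a face crop through a .tflite gender classifier. The 128x128 input size, the [0, 1] pixel scaling, and the label order are assumptions – check your exported model’s metadata; the interpreter calls themselves are the standard TensorFlow Lite Python API.

```python
import numpy as np

LABELS = ["Female", "Male"]  # assumed label order -- verify against your model

def preprocess(face_bgr, size=(128, 128)):
    """Resize a face crop and scale pixels to [0, 1] (assumed input spec)."""
    import cv2
    face = cv2.resize(face_bgr, size)
    return np.expand_dims(face.astype(np.float32) / 255.0, axis=0)

def top_label(scores):
    """Map the classifier's score vector to its winning label."""
    return LABELS[int(np.argmax(scores))]

def classify_face(interpreter, face_bgr):
    """Run one face crop through an already-loaded TFLite interpreter."""
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], preprocess(face_bgr))
    interpreter.invoke()
    return top_label(interpreter.get_tensor(out["index"])[0])
```

Loading the interpreter is one line, e.g. `tf.lite.Interpreter(model_path="gender_model.tflite")` (the file name here is a placeholder).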

Usage:

Another alternative to running this model efficiently is using the Flo Edge One GPU! What is Flo Edge One you ask? This impressive device boasts a light GPU that can deliver smooth results, all while maintaining a high level of accuracy. The Flo Edge One is truly a remarkable device, providing a plethora of impressive features that make it a must-have for tech enthusiasts. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device truly has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it the perfect choice for your robotics ventures. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.

Starting a new venture and wanna figure out what your target audience might be? Using the 12MP camera on your Flo Edge One, run the gender detection model in real time to understand which gender groups your products appeal to.

Performance Analysis:

The invoking time of this model was a mere 13 + 30 milliseconds on the Flo Edge One GPU: 13 milliseconds for the model to detect faces in the input image and 30 milliseconds to classify all the faces into the two label classes. Overall, the inference time of this model stack is around 43 milliseconds, with a smooth output of around 30 FPS.
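
These per-stage numbers are easy to reproduce on your own hardware with a wall-clock timer around each stage; the `detect` and `classify` lambdas below are stand-ins for the real Haar-cascade and classifier calls.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - t0) * 1000.0

# Stand-in stages; swap in the real detection and classification calls.
detect = lambda frame: ["face0"]
classify = lambda faces: ["Male" for _ in faces]

faces, detect_ms = timed(detect, None)
labels, classify_ms = timed(classify, faces)
total_ms = detect_ms + classify_ms  # the "13 + 30" style breakdown
```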

This goes to show, without a doubt, that Haar cascades is a very lightweight model. The only downside to this model, as discussed in earlier blogs, is that it doesn’t always yield accurate results: it relies on features like lines, edges, and orientation in input images to detect a face, and similar features can be found in the background as well.

Conclusion:

The gender detection model yields incredibly accurate predictions when run on the Flo Edge GPU even after being compressed as a .tflite model. Coupled with the 12 MP onboard camera, a wide range of systems can be developed for use cases like surveillance, security, marketing and sales, and forensics.

The post Mediapipe Gender detection model on Flo Edge One appeared first on Flo Mobility.

Age Detection using Haar Cascades on Flo Edge One https://flomobility.com/age-detection-using-haar-cascades-on-flo-edge-one/ Thu, 27 Jul 2023 07:51:13 +0000

In this blog, we are going to discuss a fun project that’s built on a face detection model. Age detection can be used in the industry to customize ads in stores and to see what groups of people are most likely to buy a product. It can also be used in forensic science, where often only a description of a suspect is available; the details that are almost always available are a gender and an age range, which are very useful for narrowing down the list of candidates to compare against a sketch. We will be looking at the architecture of the model briefly, along with its deployment on the Flo Edge One and its inferences.

Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some remarkable benchmarks that make it a top competitor in edge devices!

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFlite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4k camera at up to 30fps
  • Inferencing the face detection model as well as the age detection model at 57 milliseconds producing a smooth output of around 12 fps.

Introduction:

The age detection model basically consists of an OpenCV face detection model and a classifier model for the detected faces.

The base model, i.e., the face detection model, is an OpenCV model, but you can always use mediapipe’s face detection model instead. The difference comes through in performance factors like FPS and inference time. These differences will be discussed later under performance analysis.

Dataset and Model:

For the purpose of face detection, OpenCV is used to implement the Haar cascade classifier. The Haar cascade classifier is a widely used face detection model that serves as the base for models like gender detection, age detection, mask detection, facial recognition, etc. Haar cascade is a feature-based binary classifier which essentially uses a function represented by a window that runs over the entire image. It tries to classify every region in an image as positive or negative, meaning the region is either part of our object or it is not. In this case our object is a face. By doing this, we obtain the group of pixels that contain all the features that represent a face. Note that this can only be used for face detection and not recognition, since it is trained on multiple different faces and the general features that are common to most faces. Given below are some examples of cascading function windows.

source: https://www.hindawi.com/journals/tswj/2014/753860/

Once we obtain our region of interest in the image, i.e., the face, a basic classification model with 9 class labels is used for age detection. These 9 classes are: 4 to 6, 7 to 8, 9 to 11, 12 to 19, 20 to 27, 28 to 35, 36 to 45, 46 to 60, and 61 to 75.

The labels were likely chosen based on inter-class and intra-class distinction. The range of the first class is much smaller than that of the eighth class, presumably because the facial difference between ages 6 and 7 is far more distinct than the difference between ages 46 and 47. These 9 groups could have been obtained using some clustering method on images of all age groups.
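
The nine buckets listed above translate directly into a small lookup table from the classifier’s winning class index to a readable range – a sketch, assuming the class order matches the listing:

```python
# The 9 age buckets from the post, indexed in the order they are listed.
AGE_BUCKETS = [
    (4, 6), (7, 8), (9, 11), (12, 19), (20, 27),
    (28, 35), (36, 45), (46, 60), (61, 75),
]

def age_range(class_index):
    """Translate the classifier's winning class index into a readable range."""
    lo, hi = AGE_BUCKETS[class_index]
    return f"{lo} to {hi}"
```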

Usage:

Another alternative to running this model efficiently is using the Flo Edge One GPU! What is Flo Edge One you ask? This impressive device boasts a light GPU that can deliver smooth results, all while maintaining a high level of accuracy. The Flo Edge One is truly a remarkable device, providing a plethora of impressive features that make it a must-have for tech enthusiasts. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device truly has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it the perfect choice for your robotics ventures. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.
Starting a new venture and wanna figure out what your target audience might be? Using the 12MP camera on your Flo Edge One, run the age detection model in real time to understand which age groups your products appeal to.

Performance Analysis:

The inference time of the model stack was around 58 milliseconds on the Flo Edge GPU, since the face detection model is simple, light, and computationally cheap. Its output is fed as the input to the age classifier, which classifies most people fairly accurately based on their facial features. The model gives an FPS of about 11, which is a little slow, but that is because there are 9 classes and it is capable of classifying multiple faces in a single frame simultaneously.

The only downside to this model is the use of Haar cascades for face detection. While it is very lightweight and straightforward, making it perfect for real-time applications, it detects many false positives, as seen in the video above. It is also not as accurate as an object detection model like YOLO. As mentioned before, it can be replaced with any other pre-trained face detection model, like YOLO or mediapipe face detection, if more accuracy is desired.

That being said, it is the most efficient model to use as a base for another model because it is lightweight, and any GPU/CPU should be able to run both models with ease, even on SBCs like the Flo Edge.

Conclusion:

The age detection model yields incredibly accurate predictions when run on the Flo Edge GPU even after being compressed as a .tflite model. Coupled with the 12 MP onboard camera, a wide range of systems can be developed for use cases like surveillance, security, marketing and sales, and forensics.

The post Age Detection using Haar Cascades on Flo Edge One appeared first on Flo Mobility.

Mediapipe Hand Pose detection on Flo Edge One https://flomobility.com/mediapipe-hand-pose-detection-on-flo-edge-one/ Thu, 27 Jul 2023 07:51:03 +0000

Like Tensorflow, Mediapipe is a framework used to build machine learning models and deploy them. The model we will be talking about in this blog is Mediapipe’s hand landmark model, which is trained on over 30K real-life images. Mediapipe is the perfect API because it is lightweight and contains everything you need to deploy to mobile, web, edge, and IoT with ease!

Before we get into that, let’s take a quick look at the Flo Edge One,
a must-have in every AI and robotics engineer’s toolbox. Here are some
remarkable benchmarks that make it a top competitor in edge devices!

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFlite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4k camera at up to 30fps
  • Inferencing the model at 200 milliseconds (CPU) producing a smooth output of around 10 fps.

Introduction:

The hand landmark model bundle on mediapipe lets you detect landmarks of the hand in an image. Specifically, the outputs would include handedness (left/right hand) and landmarks (fingers, tips, dips, etc.).

This model bundle consists of two models – a palm detection model and a hand landmarking model. Hand pose detection has really cool real life applications including virtual reality. We are no Iron Man and we may not have a suit to control with gestures but we sure can find a lot of incredible applications for this model! And who knows, maybe we’ll get around to making a suit as well 👀  Stay with us and find out.

Dataset and Models:

This model comes pretrained on over 30,000 different real-life images as well as several rendered hand pose images over various backgrounds. On the detected hand region, it localizes 21 keypoints – 4 along each finger plus 1 at the wrist. After the palm detection model finds the region in which the palm is located, the landmarking model detects the keypoints within the cropped region.

source: ​https://developers.google.com/mediapipe/solutions

Since it is computationally heavy to constantly run the palm detection model on video or live stream, the landmarking model uses the bounding box produced for the current frame in all subsequent frames in order to localize. The bounding box is computed again only if the landmarking model fails to find a hand and all its keypoints in the current bounding box. Machine learning? More like Machine lazing. 
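
The loop above can be sketched with Mediapipe’s Python API; the camera index is an assumption, and `to_pixel` just converts Mediapipe’s normalized coordinates to pixels for drawing.

```python
def to_pixel(norm_x, norm_y, width, height):
    """Mediapipe returns normalized [0, 1] coords; map them to pixels."""
    return int(norm_x * width), int(norm_y * height)

def run_hand_tracking():
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(max_num_hands=2)
    cap = cv2.VideoCapture(0)  # assumed camera index
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Mediapipe expects RGB; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            h, w = frame.shape[:2]
            for hand in results.multi_hand_landmarks:
                for lm in hand.landmark:  # 21 landmarks per hand
                    cv2.circle(frame, to_pixel(lm.x, lm.y, w, h), 3, (0, 255, 0), -1)
```

Internally, `hands.process` re-runs palm detection only when tracking is lost, exactly the "lazing" behaviour described above.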

Usage:

Another alternative to running this model efficiently is using the Flo Edge One GPU! What is Flo Edge One you ask? This impressive device boasts a light GPU that can deliver smooth results at around 20 FPS, all while maintaining a high level of accuracy. The Flo Edge One is truly a remarkable device, providing a plethora of impressive features that make it a must-have for tech enthusiasts. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device truly has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it the perfect choice for your robotics ventures. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.
Wanna make work interesting? Keep the edge on your tabletop with the camera pointing towards you, run this hand pose detection model alongside a script that translates all your hand gestures to keyboard and mouse movements and voila! You have yourself a virtually operated setup to show off at work.

Performance Analysis:

Being a heavy model, the best we could get on CPU was 5 – 7 FPS. This performance can be greatly improved by running it on the Flo Edge GPU, which gives a satisfactory output of at least 10 – 13 FPS and an inference time of around 60 milliseconds.

The hand landmarking model happens to be one of the heaviest models on mediapipe. One reason the model is so heavy is that certain parts of the hand are detected individually and then put together to obtain the output we see here. For example, each of the fingers is detected and the parts of the palm are detected separately. On these detected parts, keypoints are drawn, and then they are all combined to form what we see as the pose model. Once this is done, the output is annotated in such a way that the landmarks are drawn on the different parts of the hand, and we have a total of 21 different parameters to mention. This is like putting together a jigsaw puzzle every single time the position of the hand changes with respect to the frame!

Conclusion:

The hand landmarking model on mediapipe yields incredibly accurate pose detection results when run on a CPU or a GPU. Either way, it is a very computationally heavy model and realistically it cannot give over 15 FPS on a live stream or video input. Regardless, it has really fun applications like VR, and while it is still miles away from making you the next Iron Man, it’s like the man himself says: “Sometimes you gotta run before you can walk”.

The post Mediapipe Hand Pose detection on Flo Edge One appeared first on Flo Mobility.

Semantic Segmentation using Deeplabv3 on Flo Edge One https://flomobility.com/semantic-segmentation-using-deeplabv3-on-flo-edge-one/ Thu, 27 Jul 2023 07:50:48 +0000

Let’s take a look at a deep learning algorithm called semantic segmentation today. It is technically an extension to object detection where instead of bounding the rough region containing the object in a box, it classifies each pixel in that region as part of the object or not part of it. This essentially creates segments of the entire image like, background, person, car, etc. Still a little cloudy? Read ahead!

Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some remarkable benchmarks that make it a top competitor in edge devices!

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFlite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4k camera at up to 30fps
  • Inferencing DeepLabV3 with a mobileNet base at 50 milliseconds producing a smooth output of around 10 fps.

Introduction:

As discussed earlier, Semantic segmentation is a deep learning model that segments a given image into different regions by classifying each pixel of the image.

It has incredibly versatile use cases like image manipulation, 3D modelling, RoI detection, and many more. It produces a dense pixel-wise segmentation map that captures all the different regions in an image. In this blog we’re looking at a DeepLab v3 model that uses MobileNet as the base architecture. DeepLab v3 is quite popular for segmentation problems, and the MobileNet architecture makes it perfectly lightweight. Let’s look at this architecture more closely.

Dataset and Model:

DeepLab v3 is a pretrained model made available by TensorFlow. It was trained on the COCO and Pascal VOC datasets, which have over 100,000 images and 20 categories combined. The network backbone in this case is MobileNet, which makes this model easy to implement on Android, iOS, edge devices, and the web. ResNet is also commonly used as the network backbone instead of MobileNet; the choice between the two depends on the application.

The model takes an input image and gives a tensor output with the same height and width as the input, along with 20 different masks, so the output size is [W, H, 20]. Each of the 20 masks maps pixels containing a region of interest (RoI) to its corresponding class label. All of these masks are then compressed to form a single mask that can be overlaid on the image to depict each segment separately.
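
Collapsing the per-class masks into one map is a single argmax over the class axis, and the overlay is then just an alpha blend. A minimal NumPy sketch of that post-processing step:

```python
import numpy as np

def collapse_masks(logits):
    """logits: (H, W, C) class scores -> (H, W) map of winning class indices."""
    return np.argmax(logits, axis=-1)

def overlay(image, class_map, palette, alpha=0.5):
    """Blend a per-class color (palette: (C, 3) array) over the image."""
    color = palette[class_map]  # (H, W, 3) via fancy indexing
    return (alpha * color + (1 - alpha) * image).astype(np.uint8)

# Toy example: 2 classes on a 2x2 "image".
logits = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.6, 0.4], [0.3, 0.7]]])
class_map = collapse_masks(logits)  # [[0, 1], [0, 1]]
```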

Usage:

Now let’s take a look at how this model can be run efficiently on the Flo Edge One. What is Flo Edge One you ask? This impressive device boasts a light GPU that can deliver smooth results, all while maintaining a high level of accuracy. The Flo Edge One is truly a remarkable device, providing a plethora of impressive features that make it a must-have for tech enthusiasts. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device truly has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it the perfect choice for your robotics ventures. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.
Starting out a new retail venture? Wanna give the e-commerce guys a run for their money and prove that shopping at stores can be just as hassle-free? Equip your store with a virtual try-on station! Run the semantic segmentation model as a base on Flo Edge One and build a manipulation model that will generate an image of the customer wearing the product in their hand. Use the Flo Edge One 12MP camera to capture the product and the customer at the virtual try-on station and simply pass those images as inputs to your model. And there you have it! Efficient and easy shopping, no long trial room lines, and satisfied customers!

Performance Analysis:

The inference time of this model was around 50 milliseconds, meaning the model is light enough to load quickly onto the Flo Edge CPU. Overall, the model produces an output of around 8-10 FPS. Regardless, the output, as seen in the video above, is quite accurate, and the model manages to capture different types of vehicles at different distances from the camera really well.

While the general regions of each object were not completely off, the actual pixel-wise classification that the model aims to do is not very precise and could be better.

Another popular version of the same model, DeepLab v3+, adds a decoder module on top of the DeepLab v3 encoder and is typically paired with a heavier backbone. This model is quite heavy to run but produces highly accurate pixel masks for any given image.

That being said, the model architecture used varies from case to case and DeepLab v3 is your best friend if you need to build an easily deployable application around your model.

Conclusion:

The semantic segmentation model yields incredibly accurate predictions when run on the Flo Edge GPU even after being compressed as a .tflite model. Coupled with the 12 MP onboard camera, a wide range of systems can be developed for use cases like 3D modelling, image manipulation, surveillance, and marketing and sales.

The post Semantic Segmentation using Deeplabv3 on Flo Edge One appeared first on Flo Mobility.

MiDaS Depth Estimation Model on Flo Edge One https://flomobility.com/midas-depth-estimation-model-on-flo-edge-one/ Wed, 26 Jul 2023 13:04:00 +0000

This is an introduction to a machine learning model called MiDaS that can be used with the Flo Edge One. You can easily use this model to create AI applications using Flo Edge as well as many other ready-to-use Flo Edge GPU models.

Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some remarkable benchmarks that make it a top competitor in edge devices!

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFlite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4k camera at up to 30fps
  • Inferencing Midas at 57 milliseconds producing a smooth output of around 20 fps.

Introduction:

MiDaS is a machine learning model that estimates depth from an arbitrary input image.

Depth is an incredibly valuable parameter of the physical environment. It allows us to estimate the position of objects in a three-dimensional space, which is crucial for various applications. Monocular depth estimation is undoubtedly a useful technique, but it still poses a significant challenge for researchers and developers alike. In the world of autonomous vehicles, obtaining accurate results is crucial. To tackle this problem, LiDARs or stereo cameras have traditionally been used due to their ability to provide dense ground truth. However, it’s important to note that these sensor options can come with a hefty price tag and may require a complex deployment process.

I’m excited to share with you a cutting-edge machine learning algorithm called MiDaS. Its purpose is to predict the depth value of each pixel in a given RGB image. What’s impressive about MiDaS is that it’s been trained on multiple datasets, allowing it to accurately perform this task.

Dataset and Model:

Imagine a picture of several people standing in a straight line with a light bulb directly over only one person’s head, but this bulb is not included in the picture. How can you tell who’s standing closest to the bulb? That’s right! From their shadows – The one with no shadow is directly under the bulb and as the shadows grow, the distance of the person from the bulb must be increasing. Similar to how our brain picks up visual cues like this from an image, the Midas model is also trained on similar visual cues to estimate which object is closest to the camera. Let’s see how this is done.

Training the Midas model is the most essential part of the process in order to obtain accurate results. The way this model is trained is slightly different when compared to classical machine learning models. It requires two types of datasets – a sparse dataset and a labelled dataset.

Sparse dataset: This consists of a straightforward monocular camera feed.

Labelled dataset: As in supervised learning, a dataset containing depth information obtained from a measuring tool like a LiDAR, laser scanner, or stereo camera.

This is like giving someone a picture of a sphere and the actual sphere and asking them how far the sphere needs to be placed in order to match the scale in the image with respect to its surroundings. These datasets are used to train the model: the input is a single RGB image, and the target is its corresponding depth information from the labelled dataset.

Loss function in MiDaS:

source: https://arxiv.org/pdf/1907.01341v3.pdf

All existing datasets containing depth information have been accumulated from different measuring tools such as LiDARs, stereo cameras, laser scanners, etc. This leads to variation among the data, making it hard to find a large collection of uniform data to train the model on. MiDaS makes use of a loss function that absorbs these variations, thereby solving these compatibility issues and making it possible to train the model on multiple datasets simultaneously. This function is called the “scale and shift invariant loss function”; to define a meaningful scale- and shift-invariant loss, the prediction and the ground truth must first be aligned with respect to their scale and shift. The function definition above represents exactly this: it finds the difference between the aligned prediction and the ground truth, and then optimizes to reduce this loss.

Given below is the function definition introduced by MiDaS.

Thanks to this function, MiDaS can train on multiple datasets, as seen below, which makes it robust to a variety of environments and conditions.

source: https://arxiv.org/pdf/1907.01341v3.pdf
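
The alignment step can be sketched numerically: solve for the least-squares optimal scale s and shift t before comparing prediction and ground truth. (The paper’s full loss adds trimming and a multi-scale gradient term; this sketch shows only the scale-and-shift alignment.)

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Return (s, t) minimizing sum((s * pred + t - gt) ** 2)."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)  # design matrix [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return s, t

def ssi_loss(pred, gt):
    """Mean squared error after optimal scale-and-shift alignment."""
    s, t = align_scale_shift(pred, gt)
    return float(np.mean((s * pred + t - gt) ** 2))

# A prediction that differs from ground truth only by a scale and a shift
# incurs (near-)zero loss -- the invariance the blog describes.
gt = np.array([1.0, 2.0, 3.0, 4.0])
pred = 10.0 * gt + 5.0
```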

Usage:

If you’re looking for a powerful and efficient way to run the MiDaS model, look no further than the Flo Edge One. This impressive device boasts a light GPU that can deliver smooth results at around 20 FPS, all while maintaining a high level of accuracy. The Flo Edge One is truly a remarkable device, providing a plethora of impressive features that make it a must-have for tech enthusiasts. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device truly has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it the perfect choice for your robotics ventures. Take, for example, an industrial obstacle avoidance bot: obtaining geometric information about the environment has never been easier. Thanks to the onboard camera and the MiDaS model running on the edge, you can accurately capture all the necessary details and use them to implement obstacle avoidance while localizing with the other sensors built into the system. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.

Performance Analysis:

The model is capable of producing an impressive 57 millisecond inference time while running on the Flo Edge GPU. With an input stream of 30 FPS, we get an output of around 17 FPS and very accurate depth estimates. Here, the color scale used is black and white, where objects in white are closer than those in black. This color scale is just a preference, and can be easily switched as seen in the image below.

source: https://arxiv.org/pdf/1907.01341v3.pdf

Conclusion:

The Midas model yields incredibly accurate depth results when run on the Flo Edge GPU even after being compressed as a .tflite model architecture. Coupled with the 12 MP onboard camera and inbuilt IMU, a wide range of object avoidance systems can be developed for use cases like surveillance, security, material handling, manufacturing, etc., at a greatly reduced cost.

The post MiDaS Depth Estimation Model on Flo Edge One appeared first on Flo Mobility.

People detection and counting using YOLOv5 on Flo Edge One https://flomobility.com/people-detection-and-counting-using-yolov5-on-flo-edge-one/ Wed, 26 Jul 2023 12:53:48 +0000

In this blog we will be seeing how YOLOv5 can be used for people detection and counting. Our dataset is a variety of clips from an India vs. Pakistan cricket match, covering all your favourite shots and moments! Yes, including all the random clips of camera wala dada focusing on pretty women in the crowd (No YOLO model can beat their detection skills in this matter for sure).

Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some remarkable benchmarks that make it a top competitor in edge devices!

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFlite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4k camera at up to 30fps
  • Inferencing yolov5 at 47 milliseconds producing a smooth output of around 20 fps.

And many such awesome features.

Introduction:

YOLO, or “You Only Look Once,” is an amazing algorithm loved by AI engineers because it’s all about detecting things in real time. The latest version, YOLOv5, is even better because it’s the first of its kind built on PyTorch. That means it is part of PyTorch’s large ecosystem, making it accessible to a vast research community. It is super fast and accurate; plus, its weight files are almost 90 percent smaller than those of its predecessors, which means it can run on embedded devices with ease!

Now how are we running this model? This question brings us to something exciting for AI enthusiasts like yourself! SBCs are a must in your toolbox, I’m sure. You can build so many different applications that run independently on that tiny device. But shortages and supply issues these days make them scarce and expensive. But guess what? We’ve got something even cooler called the Flo Edge One! It’s an SBC with a built-in IMU and a 12MP camera. How awesome is that? It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools and packages, making it the perfect choice for your robotics products. It’s powered by a Snapdragon 845 chip and has tons of cutting-edge features. And the best part? It’s affordable too! So you can explore and create without breaking the bank.

Dataset and Model:

The YOLOv5 model is trained on the COCO dataset, which contains over 330,000 images and 80 different labels like people, cars, trucks, fruits, animals, and other commonly seen objects. This trained model is used to detect people in real time in the match footage. The model has an inference time of 40 to 50 milliseconds per frame, which means it detects and classifies any person in the frame within a fraction of a second, giving a smooth, uninterrupted output of 20 fps.
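
Counting is then just a filter over the detection rows. The sketch below assumes YOLOv5 loaded through `torch.hub` (its official distribution channel), where each detection row is (x1, y1, x2, y2, confidence, class) and class 0 is "person" in the COCO label set; the 0.5 confidence threshold is a tunable assumption.

```python
def count_people(detections, conf_threshold=0.5, person_class=0):
    """detections: iterable of (x1, y1, x2, y2, conf, cls) rows."""
    return sum(
        1 for *_, conf, cls in detections
        if int(cls) == person_class and conf >= conf_threshold
    )

def detect_frame(frame_bgr):
    """Run one BGR frame through YOLOv5s and return raw detection rows."""
    import torch
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    results = model(frame_bgr[..., ::-1])  # the model expects RGB
    return results.xyxy[0].tolist()        # rows of x1, y1, x2, y2, conf, cls

# e.g. count_people(detect_frame(frame)) gives the per-frame headcount.
```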

Usage:

If you’re looking for a powerful and efficient way to run the YOLOv5 model, look no further than the Flo Edge One. This impressive device boasts a light GPU that can deliver smooth results at around 20 FPS, all while maintaining a high level of accuracy. The Flo Edge One is truly a remarkable device, providing a plethora of impressive features that make it a must-have for tech enthusiasts. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device truly has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it the perfect choice for your robotics ventures. Take, for example, a brand-new marketing campaign that you deployed for your new product. Wanna know how impactful it’s been? Using the 12MP camera on the Flo Edge you can monitor your store inflow and count the number of people walking in since the campaign using this YOLO model. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.

Performance Analysis:

The model is capable of detecting people with a 91% confidence. As a person progresses into the frame, the model score goes from around 50% with no prominent features other than the arm, leg or the shoulder, to 90% once the entire person is in the frame. The counter is updated per frame as well, showing the accurate number of people at any moment, even if all the people are not entirely visible. This makes it suitable for complex applications where people need to be detected accurately in real-time with minimal latency.

Conclusion:

The YOLOv5 object detection model is a powerful tool that can detect objects with a very high accuracy. Coupled with Flo Edge One’s low power GPU and 12 MP camera, a wide range of object detection applications like surveillance, security, and sports analysis, can be run with ease.

The post People detection and counting using YOLOv5 on Flo Edge One appeared first on Flo Mobility.
