A complete implementation of a 2D differential-drive robot simulation featuring a closed-loop perception and control system. This project demonstrates autonomous navigation, target detection, and state-machine-based control strategies.
- Overview
- Features
- Demo
- System Architecture
- Installation
- Usage
- Project Structure
- Configuration
- Testing and Evaluation
- Technical Details
- Future Enhancements
- License
The robot autonomously navigates to sequential targets using vision-based perception and finite-state machine control.
This project implements a virtual 2D autonomous robot that demonstrates the complete perception-control-action pipeline used in real robotics systems. The robot navigates through a simulated environment to sequentially reach multiple targets while avoiding distractors and obstacles.
Pipeline: Sense -> Perceive -> Control -> Act
The system simulates realistic challenges including:
- Sensor noise and measurement uncertainty
- Perception latency
- Limited field-of-view constraints
- Dynamic decision-making under uncertainty
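The sense-perceive-control-act loop above can be sketched in a few lines. This is an illustrative toy, not the project's actual API: `step_pose`, `control`, and the fake detection signal are made up to show the loop structure.

```python
import math

def step_pose(x, y, theta, v, w, dt):
    """Integrate unicycle kinematics for one timestep."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + w * dt)

def control(detection):
    """Toy policy: spin in place until a target is seen, then drive."""
    if detection is None:
        return 0.0, 1.5                  # (v, w): rotate to search
    return 100.0, -2.0 * detection       # steer against normalized pixel error

# One simulated second at 60 Hz; pretend the target appears at step 30
pose = (0.0, 0.0, 0.0)
for t in range(60):
    detection = 0.1 if t >= 30 else None  # fake normalized horizontal error
    v, w = control(detection)
    pose = step_pose(*pose, v, w, dt=1 / 60)
```

The real system slots the camera, OpenCV perception, and FSM controller into the two function slots shown here.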
- Geometry-based Robot-Centric Camera: Projects targets, distractors, and obstacles within a limited FOV for realistic perception simulation
- Computer Vision Perception: HSV color thresholding with morphological operations, contour filtering, and confidence scoring
- State Machine Control: Three-state FSM (SEARCH/TRACK/APPROACH) with adaptive speed scheduling
- Differential Drive Simulation: Realistic kinematics for a two-wheeled robot with configurable constraints
- Robustness Features: Configurable perception latency, measurement noise, and exponential smoothing
- Sequential target navigation (targets disappear upon reaching)
- Optional cyan distractors to test robustness
- Static obstacles for collision detection scenarios
- High-resolution visualization (1920x1080)
- Per-step CSV telemetry logging (pose, commands, detections, errors, control mode)
- GIF capture for visualizations
- Batch evaluation with performance metrics export
- Debug overlay showing camera FOV, detection masks, and state information
```
+-------------+     +--------------+     +-------------+     +----------+
|   Camera    |---->|  Perception  |---->|   Control   |---->|  Robot   |
|  (Sensing)  |     |   (Vision)   |     |    (FSM)    |     | (Action) |
+-------------+     +--------------+     +-------------+     +----------+
       |                   |                   |                  |
       +-------------------+---------+---------+------------------+
                                     |
                            +--------v--------+
                            |   World State   |
                            |  (Simulation)   |
                            +-----------------+
```
- Simulation (src/sim.py): World state management, robot kinematics, target/obstacle handling
- Perception (src/perception.py): Color-based target detection with OpenCV
- Control (src/control.py): Finite-state machine with heading control
- Estimation (src/estimation.py): Sensor fusion and smoothing algorithms
- Rendering (src/render.py): Pygame-based visualization with debug overlays
- Python 3.8 or higher
- pip package manager
- (Optional) Virtual environment tool
1. Clone the repository

   ```bash
   git clone https://github.com/OctaviusLeo/robotics-2d-perception-control.git
   cd robotics-2d-perception-control
   ```

2. Create and activate a virtual environment (recommended)

   Windows:

   ```bash
   python -m venv .venv
   .venv\Scripts\activate
   ```

   macOS/Linux:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```
Dependencies:

- `numpy>=1.24.0` - Numerical computations
- `pygame>=2.5.0` - Visualization and simulation loop
- `opencv-python>=4.8.0` - Computer vision and image processing
- `imageio>=2.31.0` - GIF generation
Run the default demo with distractors and obstacles:
```bash
python src/run_demo.py
```

Perfect conditions for algorithm validation:

```bash
python src/run_demo.py --steps 2000 --seed 0 --no-distractors --no-obstacles --camera-mode robot --debug-overlay
```

One-command setup for the demo shown above:

```bash
python src/run_demo.py --clean-gif
```

Run without GUI and save telemetry + GIF:

```bash
python src/run_demo.py --headless --steps 600 --log-csv outputs/run1.csv --save-gif --gif-path outputs/run1.gif
```

Test robustness under realistic sensor conditions:

```bash
python src/run_demo.py --headless --steps 600 --perception-latency 3 --meas-noise-px 2.0 --smooth-alpha 0.4 --log-csv outputs/run_latency.csv
```

Run multiple episodes and collect performance metrics:

```bash
python src/eval.py --episodes 10 --steps 600 --camera-mode robot --no-distractors --no-obstacles --metrics-csv outputs/metrics.csv
```
| Flag | Description |
|---|---|
| `--headless` | Run without GUI window (uses dummy video driver) |
| `--steps N` | Number of simulation steps (60 steps ≈ 1 second) |
| `--seed N` | Random seed for reproducible runs |
| `--log-csv PATH` | Save per-step telemetry to CSV file |
| `--save-gif` | Capture simulation as GIF |
| `--gif-path PATH` | Output path for GIF file |
| `--clean-gif` | Quick command to generate clean demo GIF |
| `--perception-latency N` | Add N frames of detection delay |
| `--meas-noise-px FLOAT` | Gaussian pixel noise std deviation |
| `--smooth-alpha FLOAT` | Exponential smoothing factor (0-1) |
| `--no-distractors` | Disable cyan distractor objects |
| `--no-obstacles` | Disable static obstacles |
| `--debug-overlay` | Show camera FOV, masks, and state info |
| `--camera-mode MODE` | Choose `robot` (geometry) or `global` (full scene) |
```
robotics-2d-perception-control/
├── assets/               # Demo GIFs and visual assets
│   └── clean-run.gif
├── outputs/              # Generated logs and visualizations
│   ├── logs/
│   └── sanity_logs/
├── src/                  # Source code
│   ├── __init__.py
│   ├── config.py         # Simulation configuration parameters
│   ├── control.py        # FSM controller and heading control
│   ├── estimation.py     # Smoothing and filtering algorithms
│   ├── eval.py           # Batch evaluation script
│   ├── perception.py     # Vision-based target detection
│   ├── render.py         # Pygame visualization
│   ├── run_demo.py       # Main demo entry point
│   └── sim.py            # Physics and world simulation
├── requirements.txt      # Python dependencies
├── LICENSE               # MIT License
└── README.md             # This file
```
Key parameters can be adjusted in src/config.py:
```
SimConfig:
    # World
    width: 1920 px
    height: 1080 px
    dt: 1/60 s (60 Hz simulation)

    # Robot
    wheel_base: 40.0 px
    v_max: 140.0 px/s (linear velocity limit)
    w_max: 3.2 rad/s (angular velocity limit)

    # Camera
    cam_w: 160 px
    cam_h: 120 px
    cam_fov: 1.2 rad (~69 degrees)

    # Target
    target_radius: 16 px
```
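As a sketch of how the differential-drive constraint relates wheel speeds to body motion, using the `wheel_base` value from the configuration (function names here are illustrative, not the project's API):

```python
import math

WHEEL_BASE = 40.0  # px, matching SimConfig.wheel_base

def wheels_to_body(v_left, v_right, wheel_base=WHEEL_BASE):
    """Convert left/right wheel speeds (px/s) to (linear, angular) velocity."""
    v = (v_right + v_left) / 2.0
    w = (v_right - v_left) / wheel_base
    return v, w

def integrate(x, y, theta, v, w, dt=1 / 60):
    """Euler-integrate the unicycle model for one timestep."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + w * dt)

# Equal wheel speeds -> straight line
v, w = wheels_to_body(100.0, 100.0)          # v = 100 px/s, w = 0
x, y, th = 0.0, 0.0, 0.0
for _ in range(60):                          # one simulated second
    x, y, th = integrate(x, y, th, v, w)

# Opposite wheel speeds -> rotation in place
v2, w2 = wheels_to_body(-64.0, 64.0)         # v = 0, w = 128/40 = 3.2 rad/s
```

Note that opposite wheel speeds of 64 px/s saturate exactly at the configured `w_max` of 3.2 rad/s.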
The src/eval.py script provides batch testing capabilities:
```bash
python src/eval.py --episodes 20 --steps 800 --metrics-csv outputs/metrics.csv
```

The metrics CSV includes (per episode):
- success
- steps
- sim_time_s
- distance_initial
- distance_final
- detection_rate
- avg_v_cmd
- avg_w_cmd
- runtime_s
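A metrics file with these columns can be summarized with nothing but the standard library. The column subset and values below are made up for illustration:

```python
import csv
import io

# Hypothetical two-episode metrics file (subset of the columns listed above)
metrics_csv = io.StringIO(
    "success,steps,sim_time_s,detection_rate\n"
    "1,540,9.0,0.91\n"
    "0,600,10.0,0.42\n"
)

rows = list(csv.DictReader(metrics_csv))
success_rate = sum(int(r["success"]) for r in rows) / len(rows)
mean_detection = sum(float(r["detection_rate"]) for r in rows) / len(rows)
```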
Per-step CSV logs contain:
```
step, sim_time_s, robot_x, robot_y, robot_theta, target_x, target_y, distance_to_target,
v_cmd, w_cmd, detected, detected_cx, detected_cy, detect_conf, err_norm, err_norm_filtered,
mode, holding_estimate
```
Use with pandas/matplotlib for detailed analysis:
```python
import pandas as pd

df = pd.read_csv('outputs/run1.csv')
# Analyze trajectory, control behavior, detection performance
```
- Camera Projection: Geometry-based FOV projection of world objects
- Color Segmentation: HSV thresholding for target color (red)
- Morphological Operations: Opening to remove noise
- Contour Analysis: Filter by area and aspect ratio
- Confidence Scoring: Based on size and shape metrics
- State: SEARCH - Rotate in place until target detected
- State: TRACK - Align heading with target while moving slowly
- State: APPROACH - Move toward target at maximum speed
- Transitions based on detection confidence and heading error thresholds
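A minimal sketch of such a three-state transition function, mirroring the states described above. The threshold values and function name are illustrative, not the project's tuned parameters:

```python
from enum import Enum, auto

class Mode(Enum):
    SEARCH = auto()
    TRACK = auto()
    APPROACH = auto()

def next_mode(mode, detected, conf, heading_err,
              conf_min=0.3, align_err=0.15):
    """One FSM transition based on detection confidence and heading error."""
    if not detected or conf < conf_min:
        return Mode.SEARCH          # lost or low-confidence target: rotate
    if abs(heading_err) > align_err:
        return Mode.TRACK           # detected but misaligned: turn while creeping
    return Mode.APPROACH            # aligned and confident: full speed ahead

mode = Mode.SEARCH
mode = next_mode(mode, detected=True, conf=0.8, heading_err=0.5)   # TRACK
mode = next_mode(mode, detected=True, conf=0.8, heading_err=0.05)  # APPROACH
mode = next_mode(mode, detected=False, conf=0.0, heading_err=0.0)  # SEARCH
```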
- Latency Simulation: Queue-based delay of perception outputs
- Measurement Noise: Gaussian noise added to pixel coordinates
- Exponential Smoothing: Filters heading error for stable control
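These three mechanisms compose naturally into a toy sketch. The constants mirror the example CLI flags `--perception-latency 3 --meas-noise-px 2.0 --smooth-alpha 0.4`; the loop itself is illustrative, not the project's code:

```python
import random
from collections import deque

random.seed(0)

LATENCY = 3        # frames of perception delay
NOISE_STD = 2.0    # px, Gaussian measurement noise std deviation
ALPHA = 0.4        # exponential smoothing factor

queue = deque(maxlen=LATENCY + 1)   # holds the most recent noisy readings
filtered = None
for true_cx in range(100, 130):     # target centroid drifts right, 1 px/frame
    queue.append(true_cx + random.gauss(0.0, NOISE_STD))
    if len(queue) <= LATENCY:
        continue                    # no measurement has cleared the delay yet
    measured = queue[0]             # LATENCY-frames-old noisy reading
    filtered = measured if filtered is None else \
        ALPHA * measured + (1 - ALPHA) * filtered
```

The controller then acts on `filtered` rather than the raw reading, trading responsiveness for stability.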
- Replace color thresholding with learned CNN detector
- Implement obstacle avoidance using reactive control or path planning
- Add trajectory tracking task with waypoint following
- Integrate SLAM for environment mapping
- Support for dynamic obstacles and multi-robot scenarios
- ROS2 integration for hardware deployment
This project is licensed under the MIT License - see the LICENSE file for details.
