Single-image viewport transformation with hole filling, based on GEN3C.
This tool shifts the camera viewpoint of egocentric images to create augmented training data. It is used to generate viewpoint-diverse versions of human demonstration data for humanoid robot learning. The pipeline combines three components:
- MoGe - monocular depth prediction
- Cache3D - warping the resulting 3D point cloud to the new viewpoint
- Stable Diffusion Inpainting - filling the disoccluded regions (holes)
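To show how the warping stage produces the holes that inpainting fills, here is a minimal NumPy sketch of depth-based forward warping. It is not the adapted Cache3D code. It assumes a pinhole camera with intrinsics `K`, OpenCV camera axes (x right, y down, z forward), a pure camera translation, and nearest-pixel splatting with a z-buffer. The function name `forward_warp` is hypothetical.

```python
import numpy as np


def forward_warp(image, depth, K, cam_translation):
    """Forward-warp `image` (H, W, C) to a camera translated by `cam_translation`.

    Hypothetical sketch. `depth` is (H, W) z-depth, `K` is 3x3 pinhole intrinsics,
    and `cam_translation` is the camera displacement in the source camera frame
    (OpenCV axes: x right, y down, z forward). Returns (warped, holes), where
    `holes` is True at target pixels that received no source point.
    """
    h, w = depth.shape
    c = image.shape[-1]

    # Unproject every pixel: X = depth * K^-1 [u, v, 1]^T.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    points = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)

    # Moving the camera by t is equivalent to moving the scene by -t.
    points = points - np.asarray(cam_translation, dtype=np.float64)
    colors = image.reshape(-1, c)

    # Drop points behind the new camera, then project with K.
    front = points[:, 2] > 1e-6
    points, colors = points[front], colors[front]
    proj = points @ K.T
    u_new = np.round(proj[:, 0] / proj[:, 2]).astype(np.int64)
    v_new = np.round(proj[:, 1] / proj[:, 2]).astype(np.int64)
    z_new = points[:, 2]

    # Keep only projections that land inside the target image.
    inside = (u_new >= 0) & (u_new < w) & (v_new >= 0) & (v_new < h)
    lin = v_new[inside] * w + u_new[inside]
    z_new, colors = z_new[inside], colors[inside]

    # Z-buffer: sort by target pixel, then by depth, and keep the nearest point.
    order = np.lexsort((z_new, lin))
    _, first = np.unique(lin[order], return_index=True)
    keep = order[first]

    warped = np.zeros((h * w, c), dtype=image.dtype)
    filled = np.zeros(h * w, dtype=bool)
    warped[lin[keep]] = colors[keep]
    filled[lin[keep]] = True
    return warped.reshape(h, w, c), ~filled.reshape(h, w)
```

For example, moving the camera 0.04 to the right in front of a flat scene at depth 2 with a focal length of 100 pixels shifts the content 2 pixels to the left and leaves the two rightmost columns as holes. In diffusers' `StableDiffusionInpaintPipeline`, white mask pixels are repainted and black pixels are kept, so a mask built as `holes.astype(np.uint8) * 255` marks exactly the regions to fill.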
Install the dependencies:

```bash
pip install torch numpy opencv-python h5py pillow tqdm einops warp-lang diffusers transformers moge-model psutil
```

Process a single H5 file:

```bash
python viewport_transform_batch_h5.py \
  --h5_file /path/to/input.h5 \
  --image_key "observation_image_left" \
  --trajectory "down" \
  --movement_distance 0.07 \
  --output_dir ./output
```

Process every H5 file in a directory, in batches of 32 frames on 4 GPUs:

```bash
python viewport_transform_batch_h5.py \
  --h5_dir /path/to/h5_directory \
  --batch_size 32 \
  --trajectory "down" \
  --movement_distance 0.07 \
  --num_gpus 4 \
  --output_dir /path/to/output
```

To process several batch directories in parallel, launch one process per directory and restrict each to a different set of GPUs with `CUDA_VISIBLE_DEVICES`:
```bash
# GPUs 0-3 process batch_000
CUDA_VISIBLE_DEVICES=0,1,2,3 python viewport_transform_batch_h5.py \
  --h5_dir /path/to/data/batch_000 \
  --batch_size 32 \
  --trajectory "down" \
  --movement_distance 0.07 \
  --num_gpus 4 \
  --output_dir /path/to/output/batch_000 &

# GPUs 4-7 process batch_001
CUDA_VISIBLE_DEVICES=4,5,6,7 python viewport_transform_batch_h5.py \
  --h5_dir /path/to/data/batch_001 \
  --batch_size 32 \
  --trajectory "down" \
  --movement_distance 0.07 \
  --num_gpus 4 \
  --output_dir /path/to/output/batch_001 &

# Wait for both background jobs to finish
wait
```

| Argument | Description | Default |
|---|---|---|
| `--h5_file` / `--h5_dir` | Input H5 file, or directory of H5 files | - |
| `--image_key` | Key of the image data inside each H5 file | `observation_image_left` |
| `--trajectory` | Camera movement direction: `left`, `right`, `up`, `down`, `forward`, `backward` | `down` |
| `--movement_distance` | Distance the camera moves along `--trajectory` | `0.1` |
| `--movement_distance_noise` | Random perturbation of the movement distance, drawn per sample | `0.02` |
| `--batch_size` | Frames per batch | `1` |
| `--num_gpus` | Number of GPUs to use | `1` |
| `--sd_model` | Stable Diffusion inpainting model used for hole filling | `stabilityai/stable-diffusion-2-inpainting` |
| `--save_h5` | Save the output as H5 (replacing the original images) | `false` |
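The `--trajectory`, `--movement_distance`, and `--movement_distance_noise` arguments together define one camera displacement per sample. The sketch below shows one plausible way to turn them into a translation vector. The direction table (OpenCV camera axes: x right, y down, z forward), the uniform noise model, and the name `camera_translation` are all assumptions for illustration, not the script's actual code.

```python
import numpy as np

# Hypothetical direction table in OpenCV camera axes (x right, y down, z forward).
# This is an assumption for illustration, not the script's actual mapping.
DIRECTIONS = {
    "left": (-1.0, 0.0, 0.0),
    "right": (1.0, 0.0, 0.0),
    "up": (0.0, -1.0, 0.0),
    "down": (0.0, 1.0, 0.0),
    "forward": (0.0, 0.0, 1.0),
    "backward": (0.0, 0.0, -1.0),
}


def camera_translation(trajectory, distance, noise, rng):
    """Return a camera displacement vector for one sample (hypothetical helper).

    Assumes the noise is drawn uniformly from [-noise, noise] and added to
    `distance`; the real script's noise model may differ.
    """
    if trajectory not in DIRECTIONS:
        raise ValueError(f"unknown trajectory: {trajectory!r}")
    d = distance + rng.uniform(-noise, noise)
    return d * np.asarray(DIRECTIONS[trajectory])


rng = np.random.default_rng(0)
print(camera_translation("down", 0.07, 0.02, rng))  # y component lies in [0.05, 0.09]
```

Because the vector describes how the camera moves, the scene points move by its negative in the camera frame; with `--trajectory down`, image content therefore appears to shift upward.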
The 3D warping code (Cache3D, camera utilities, forward warping) is adapted from NVIDIA Cosmos under the Apache 2.0 License.