Viewport Transform

Single-image viewport transformation with hole filling, based on GEN3C.

This tool shifts the camera viewpoint of egocentric images to create augmented training data. It is used to generate viewpoint-diverse versions of human demonstration data for humanoid robot learning.

Pipeline

  1. MoGe - Monocular depth prediction
  2. Cache3D - 3D point cloud warping to new viewpoint
  3. Stable Diffusion Inpainting - Fill disoccluded regions (holes)
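
To make step 2 concrete, here is a minimal NumPy sketch of forward warping a single image with a depth map to a translated camera. It is not the Cache3D implementation. It assumes a pinhole intrinsics matrix `K`, a pure camera translation `t_cam` with no rotation, and an OpenCV-style camera frame (+x right, +y down, +z forward). The function name and its parameters are hypothetical. The returned hole mask marks the disoccluded pixels that an inpainting model would then fill in step 3.

```python
import numpy as np


def forward_warp(image, depth, K, t_cam):
    """Warp `image` to a camera translated by `t_cam` (illustrative sketch).

    image: (H, W, 3) array, depth: (H, W) array of positive depths,
    K: (3, 3) pinhole intrinsics, t_cam: translation in source-camera coordinates.
    Returns (warped image, hole mask), where hole is True for unfilled pixels.
    """
    h, w = depth.shape
    # Homogeneous pixel coordinates [u, v, 1] for every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Unproject to 3D: X = depth * K^-1 [u, v, 1]^T.
    pts = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Express the points in the translated camera frame (assumption: no rotation).
    pts_new = pts - np.asarray(t_cam, dtype=np.float64)

    z = pts_new[:, 2]
    valid = z > 1e-6  # keep only points in front of the new camera
    safe_z = np.where(valid, z, 1.0)
    proj = pts_new @ K.T
    un = np.round(proj[:, 0] / safe_z).astype(np.int64)
    vn = np.round(proj[:, 1] / safe_z).astype(np.int64)
    valid &= (un >= 0) & (un < w) & (vn >= 0) & (vn < h)

    # Z-buffer: sort nearest first, then keep the first hit per target pixel.
    src = np.flatnonzero(valid)
    tgt = vn[src] * w + un[src]
    order = np.argsort(z[src], kind="stable")
    src, tgt = src[order], tgt[order]
    _, first = np.unique(tgt, return_index=True)

    warped = np.zeros((h * w, 3), dtype=image.dtype)
    hole = np.ones(h * w, dtype=bool)
    warped[tgt[first]] = image.reshape(-1, 3)[src[first]]
    hole[tgt[first]] = False
    return warped.reshape(h, w, 3), hole.reshape(h, w)
```

Under these assumptions, moving the camera down shifts the image content up and leaves a band of holes along the bottom edge. Larger movements produce larger holes and leave more for the inpainting step to fill.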

Dependencies

pip install torch numpy opencv-python h5py pillow tqdm einops warp-lang diffusers transformers moge-model psutil

Usage

Single H5 file

python viewport_transform_batch_h5.py \
    --h5_file /path/to/input.h5 \
    --image_key "observation_image_left" \
    --trajectory "down" \
    --movement_distance 0.07 \
    --output_dir ./output

Directory of H5 files (multi-GPU)

python viewport_transform_batch_h5.py \
    --h5_dir /path/to/h5_directory \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output

Parallel batch processing

To process multiple batches in parallel, run one process per batch and give each process its own set of GPUs through CUDA_VISIBLE_DEVICES:

# GPUs 0-3 process batch_000
CUDA_VISIBLE_DEVICES=0,1,2,3 python viewport_transform_batch_h5.py \
    --h5_dir /path/to/data/batch_000 \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output/batch_000 &

# GPUs 4-7 process batch_001
CUDA_VISIBLE_DEVICES=4,5,6,7 python viewport_transform_batch_h5.py \
    --h5_dir /path/to/data/batch_001 \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output/batch_001 &

# Wait for all tasks to complete
wait
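
With many batches, the same launch pattern can be scripted. The sketch below is a hypothetical helper, not part of this repository. It splits the available GPU ids into fixed-size groups, builds one command per batch directory using only the flags shown above, and sets `CUDA_VISIBLE_DEVICES` per process. The function names `gpu_groups`, `build_jobs`, and `run_jobs` are assumptions made for illustration.

```python
import os
import subprocess


def gpu_groups(num_gpus_total, gpus_per_job):
    """Split GPU ids 0..num_gpus_total-1 into full groups of gpus_per_job."""
    ids = list(range(num_gpus_total))
    last_start = num_gpus_total - gpus_per_job
    return [ids[i:i + gpus_per_job] for i in range(0, last_start + 1, gpus_per_job)]


def build_jobs(batch_dirs, output_root, groups,
               trajectory="down", movement_distance=0.07, batch_size=32):
    """Pair each batch directory with a GPU group and build its command.

    zip() stops at the shorter list, so extra batches beyond the number of
    GPU groups are not scheduled here and need a second round.
    """
    jobs = []
    for batch_dir, group in zip(batch_dirs, groups):
        name = os.path.basename(os.path.normpath(batch_dir))
        cmd = [
            "python", "viewport_transform_batch_h5.py",
            "--h5_dir", batch_dir,
            "--batch_size", str(batch_size),
            "--trajectory", trajectory,
            "--movement_distance", str(movement_distance),
            "--num_gpus", str(len(group)),
            "--output_dir", os.path.join(output_root, name),
        ]
        env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in group)}
        jobs.append((cmd, env))
    return jobs


def run_jobs(jobs):
    """Start all jobs concurrently, then wait for each and return exit codes."""
    procs = [subprocess.Popen(cmd, env={**os.environ, **env}) for cmd, env in jobs]
    return [p.wait() for p in procs]
```

For example, `build_jobs(["/path/to/data/batch_000", "/path/to/data/batch_001"], "/path/to/output", gpu_groups(8, 4))` yields two jobs equivalent to the shell commands above, and `run_jobs` plays the role of the trailing `wait`.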

Key Arguments

| Argument | Description | Default |
| --- | --- | --- |
| --h5_file / --h5_dir | Input H5 file or directory | - |
| --image_key | Key for image data in HDF5 | observation_image_left |
| --trajectory | Camera direction: left, right, up, down, forward, backward | down |
| --movement_distance | Camera movement distance | 0.1 |
| --movement_distance_noise | Random perturbation per sample | 0.02 |
| --batch_size | Frames per batch | 1 |
| --num_gpus | Number of GPUs | 1 |
| --sd_model | SD Inpainting model | stabilityai/stable-diffusion-2-inpainting |
| --save_h5 | Save as H5 (replacing original images) | false |
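
As an illustration of how `--trajectory`, `--movement_distance`, and `--movement_distance_noise` might combine, the sketch below turns them into a per-sample camera translation. This is an assumption about the behavior, not the tool's actual code. It assumes an OpenCV-style camera frame (+x right, +y down, +z forward) and uniform noise, whereas the script may use a different axis convention or noise distribution. `DIRECTIONS` and `sample_translation` are hypothetical names.

```python
import random

# Assumed axis convention: +x right, +y down, +z forward (OpenCV-style).
DIRECTIONS = {
    "left": (-1.0, 0.0, 0.0),
    "right": (1.0, 0.0, 0.0),
    "up": (0.0, -1.0, 0.0),
    "down": (0.0, 1.0, 0.0),
    "forward": (0.0, 0.0, 1.0),
    "backward": (0.0, 0.0, -1.0),
}


def sample_translation(trajectory, movement_distance=0.1,
                       movement_distance_noise=0.02, rng=None):
    """Return a camera translation with a randomly perturbed distance.

    The perturbation is drawn uniformly from
    [-movement_distance_noise, +movement_distance_noise] (an assumption).
    """
    if trajectory not in DIRECTIONS:
        raise ValueError(f"unknown trajectory: {trajectory!r}")
    rng = rng or random.Random()
    noise = rng.uniform(-movement_distance_noise, movement_distance_noise)
    distance = movement_distance + noise
    return tuple(distance * c for c in DIRECTIONS[trajectory])
```

With the defaults from the table, each sample would move between roughly 0.08 and 0.12 along the chosen direction, so frames from the same file do not all share an identical viewpoint shift.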

Acknowledgement

The 3D warping code (Cache3D, camera utilities, forward warping) is adapted from NVIDIA Cosmos under the Apache 2.0 License.