Viewport Transform

Single-image viewport transformation with hole filling, based on GEN3C.

This tool shifts the camera viewpoint of egocentric images to create augmented training data. It is used to generate viewpoint-diverse versions of human demonstration data for humanoid robot learning.

Pipeline

1. MoGe - Monocular depth prediction
2. Cache3D - 3D point cloud warping to the new viewpoint
3. Stable Diffusion Inpainting - Fill disoccluded regions (holes)
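
To illustrate why step 2 leaves holes for step 3 to fill, here is a minimal NumPy sketch of depth-based forward warping under a pure camera translation. It is not the Cache3D implementation: the function name `forward_warp`, the pinhole intrinsics `K`, and the nearest-pixel splatting are assumptions chosen for brevity.

```python
import numpy as np

def forward_warp(image, depth, K, t):
    """Forward-warp `image` to a camera translated by `t` (hypothetical helper).

    image: (H, W, C) array; depth: (H, W) positive depths; K: 3x3 pinhole
    intrinsics; t: position of the new camera in the source camera frame.
    Returns the warped image and a boolean hole mask (True = nothing landed).
    """
    H, W = depth.shape
    C = image.shape[-1]
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject every pixel to a 3D point in the source camera frame.
    pts = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Express the points in the new camera frame (translation only, no rotation).
    pts_new = pts - np.asarray(t, dtype=np.float64)

    # Project into the new view and round to the nearest pixel.
    z = pts_new[:, 2]
    valid = z > 1e-6
    z_safe = np.where(valid, z, 1.0)
    proj = pts_new @ K.T
    u_new = np.rint(proj[:, 0] / z_safe).astype(int)
    v_new = np.rint(proj[:, 1] / z_safe).astype(int)
    valid &= (u_new >= 0) & (u_new < W) & (v_new >= 0) & (v_new < H)

    # Z-buffer: when several points land on one pixel, keep the nearest one.
    idx = np.flatnonzero(valid)
    lin = v_new[idx] * W + u_new[idx]
    order = np.lexsort((z[idx], lin))  # sort by target pixel, then by depth
    lin_sorted = lin[order]
    first = np.ones(lin_sorted.shape[0], dtype=bool)
    first[1:] = lin_sorted[1:] != lin_sorted[:-1]
    targets = lin_sorted[first]
    winners = idx[order][first]

    warped = np.zeros(image.shape, dtype=image.dtype)
    warped.reshape(-1, C)[targets] = image.reshape(-1, C)[winners]
    hole = np.ones(H * W, dtype=bool)
    hole[targets] = False
    return warped, hole.reshape(H, W)
```

Pixels where `hole` is `True` received no source point. These are the disoccluded regions that the inpainting step is meant to fill.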

Dependencies

pip install torch numpy opencv-python h5py pillow tqdm einops warp-lang diffusers transformers moge-model psutil

Usage

Single H5 file

python viewport_transform_batch_h5.py \
    --h5_file /path/to/input.h5 \
    --image_key "observation_image_left" \
    --trajectory "down" \
    --movement_distance 0.07 \
    --output_dir ./output

Directory of H5 files (multi-GPU)

python viewport_transform_batch_h5.py \
    --h5_dir /path/to/h5_directory \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output

Parallel batch processing

To process several batches in parallel, launch one process per batch and give each process its own set of GPUs via CUDA_VISIBLE_DEVICES:

# GPUs 0,1,2,3 process batch_000
CUDA_VISIBLE_DEVICES=0,1,2,3 python viewport_transform_batch_h5.py \
    --h5_dir /path/to/data/batch_000 \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output/batch_000 &

# GPUs 4,5,6,7 process batch_001
CUDA_VISIBLE_DEVICES=4,5,6,7 python viewport_transform_batch_h5.py \
    --h5_dir /path/to/data/batch_001 \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output/batch_001 &

# Wait for all tasks to complete
wait
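
With more than a couple of batches, the same pattern can be scripted. The sketch below is one possible launcher, not part of this repository: the helper names (`gpu_groups`, `build_command`, `run_batches`) are hypothetical, and it uses only the flags shown in the commands above. Batches run in waves, so no two concurrent processes share a GPU.

```python
import os
import subprocess

def gpu_groups(total_gpus, gpus_per_job):
    """Split GPU ids 0..total_gpus-1 into disjoint groups of gpus_per_job."""
    if gpus_per_job < 1 or total_gpus < gpus_per_job:
        raise ValueError("need at least gpus_per_job GPUs")
    return [
        list(range(start, start + gpus_per_job))
        for start in range(0, total_gpus - gpus_per_job + 1, gpus_per_job)
    ]

def build_command(batch_dir, output_dir, num_gpus):
    """Build the argv list; flag values mirror the shell example above."""
    return [
        "python", "viewport_transform_batch_h5.py",
        "--h5_dir", batch_dir,
        "--batch_size", "32",
        "--trajectory", "down",
        "--movement_distance", "0.07",
        "--num_gpus", str(num_gpus),
        "--output_dir", output_dir,
    ]

def run_batches(batches, total_gpus=8, gpus_per_job=4, dry_run=False):
    """Run (batch_dir, output_dir) pairs in waves; return the (gpus, argv) plan."""
    groups = gpu_groups(total_gpus, gpus_per_job)
    plan = []
    for start in range(0, len(batches), len(groups)):
        procs = []
        wave = batches[start:start + len(groups)]
        for gpus, (batch_dir, output_dir) in zip(groups, wave):
            visible = ",".join(str(g) for g in gpus)
            argv = build_command(batch_dir, output_dir, gpus_per_job)
            plan.append((visible, argv))
            if not dry_run:
                env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible)
                procs.append(subprocess.Popen(argv, env=env))
        # Wait for the whole wave before reusing its GPUs.
        for proc in procs:
            proc.wait()
    return plan
```

Calling `run_batches(..., dry_run=True)` returns the planned GPU assignments and commands without starting any process, which is a cheap way to check the plan first.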

Key Arguments

| Argument | Description | Default |
| --- | --- | --- |
| `--h5_file` / `--h5_dir` | Input H5 file or directory | - |
| `--image_key` | Key for image data in HDF5 | `observation_image_left` |
| `--trajectory` | Camera direction: `left`, `right`, `up`, `down`, `forward`, `backward` | `down` |
| `--movement_distance` | Camera movement distance | `0.1` |
| `--movement_distance_noise` | Random perturbation per sample | `0.02` |
| `--batch_size` | Frames per batch | `1` |
| `--num_gpus` | Number of GPUs | `1` |
| `--sd_model` | SD Inpainting model | `stabilityai/stable-diffusion-2-inpainting` |
| `--save_h5` | Save as H5 (replacing original images) | `false` |
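
As a rough illustration of how `--trajectory`, `--movement_distance`, and `--movement_distance_noise` might combine into a per-sample camera offset, consider the sketch below. Everything in it is an assumption rather than the tool's documented behavior: the OpenCV-style axis convention (+x right, +y down, +z forward), the uniform noise distribution, and the helper name `sample_offset`.

```python
import random

# ASSUMPTION: OpenCV-style camera axes (+x right, +y down, +z forward).
DIRECTIONS = {
    "left": (-1.0, 0.0, 0.0),
    "right": (1.0, 0.0, 0.0),
    "up": (0.0, -1.0, 0.0),
    "down": (0.0, 1.0, 0.0),
    "forward": (0.0, 0.0, 1.0),
    "backward": (0.0, 0.0, -1.0),
}

def sample_offset(trajectory, movement_distance=0.1,
                  movement_distance_noise=0.02, rng=None):
    """Return a hypothetical (x, y, z) camera offset for one sample."""
    if trajectory not in DIRECTIONS:
        raise ValueError(f"unknown trajectory: {trajectory!r}")
    rng = rng or random.Random()
    # ASSUMPTION: noise is drawn uniformly from [-noise, +noise].
    distance = movement_distance + rng.uniform(
        -movement_distance_noise, movement_distance_noise
    )
    return tuple(distance * axis for axis in DIRECTIONS[trajectory])

# With the defaults, "down" yields an offset along +y between 0.08 and 0.12.
offset = sample_offset("down", rng=random.Random(0))
```

Under these assumptions, each sample sees a slightly different viewpoint shift, so the augmented data covers a range of shifts rather than a single one.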

Acknowledgement

The 3D warping code (Cache3D, camera utilities, forward warping) is adapted from NVIDIA Cosmos under the Apache 2.0 License.
