Masked Visual-Tactile Pre-training for Robot Manipulation (ICRA 2024)

Overview

This repository contains the official downstream-task code for "Masked Visual-Tactile Pre-training for Robot Manipulation," presented at ICRA 2024. The project improves robotic manipulation by pre-training representations that integrate visual and tactile information.

Features

This repository provides the following functionalities:

  • Environment Setup: Instructions for configuring the necessary environment.
  • Pre-trained Model Integration: Code for importing and utilizing pre-trained models.
  • Downstream Task Training: Scripts for training models on specific manipulation tasks.
  • Model Evaluation: Tools for testing trained models and visualizing the learned policies.

Getting Started

Environment Setup

To set up the environment, please refer to the detailed instructions in the Dependencies section.

Importing Pre-trained Models

You can access the pre-trained model code here, and pre-trained models can be downloaded from this link. Place the downloaded model and configuration files in the model/vitac/model_and_config directory, then update the paths in model/backbones/pre_model.py as needed, as shown below:

# Maps each pre-trained model name to its config file, checkpoint, and model class.
MODEL_REGISTRY = {
    "vt20t-reall-tmr05-bin-ft+dataset-BottleCap": {
        "config": "model/vitac/model_and_config/vt20t-reall-tmr05-bin-ft+dataset-BottleCap.json",
        "checkpoint": "model/vitac/model_and_config/vt20t-reall-tmr05-bin-ft+dataset-BottleCap.pt",
        "cls": VTT_ReAll,
    }
}
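
For orientation, a registry entry like the one above could be consumed roughly as follows. This is only a minimal sketch, not the repository's actual loading code (which lives in model/backbones/pre_model.py): the load_pretrained helper, the constructor signature, and the checkpoint layout are all assumptions here.

import json
import torch

from model.backbones.pre_model import MODEL_REGISTRY

def load_pretrained(name: str):
    """Hypothetical helper: build a model from a MODEL_REGISTRY entry."""
    entry = MODEL_REGISTRY[name]
    with open(entry["config"]) as f:
        config = json.load(f)              # architecture hyperparameters
    model = entry["cls"](**config)         # assumes the class accepts the config as kwargs
    state = torch.load(entry["checkpoint"], map_location="cpu")
    model.load_state_dict(state)           # assumes the checkpoint is a plain state dict
    return model.eval()

model = load_pretrained("vt20t-reall-tmr05-bin-ft+dataset-BottleCap")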

Training a Policy

The repository includes a downstream task for bottle cap manipulation. To train the ShadowHand policy for this task, execute the following command:

python train_agent.py --task bottle_cap_vt --seed 123

Note: The training process requires at least two NVIDIA RTX 3090 GPUs: one for model training and the other for image rendering. Set the following environment variables accordingly:

import os

# Set these before MuJoCo is imported so the EGL backend and device are picked up.
os.environ['MUJOCO_GL'] = 'egl'
os.environ["MUJOCO_EGL_DEVICE_ID"] = "1"  # GPU 1 renders images; GPU 0 is used for training.
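
Before launching a run, it can help to confirm that both GPUs are actually visible. A minimal sanity check, assuming PyTorch is available in the environment:

import torch

# Expect at least two devices: cuda:0 for training and cuda:1 for rendering.
n = torch.cuda.device_count()
assert n >= 2, f"found {n} GPU(s); this task expects two (training + rendering)"
for i in range(n):
    print(i, torch.cuda.get_device_name(i))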

Testing a Policy

We provide a pre-trained policy that can be downloaded from this link. You can also train your own model using the instructions above.

To perform testing, use the following command:

python eval_agent.py --task bottle_cap_vt --seed 123 --resume_model path/to/your/model.pt --test

The test results are printed to the console and also saved to a file.

Visualizing Policies

To save the operation process as a video, run the command:

python eval_agent.py --task bottle_cap_vt --seed 123 --resume_model path/to/your/model.pt --test --env_vis

The video will be saved in the runs/videos directory.
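
The exact filename pattern is not documented here; assuming the videos are written as .mp4 files, the newest recording can be located with a short snippet like this:

from pathlib import Path

# Grab the most recently written video in runs/videos (.mp4 extension assumed).
videos = sorted(Path("runs/videos").glob("*.mp4"), key=lambda p: p.stat().st_mtime)
print(videos[-1] if videos else "no videos found yet")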

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

If you have any questions or need support, please contact Qingtao Liu or Qi Ye.

BibTeX

@inproceedings{liu2024m2vtp,
    title={Masked Visual-Tactile Pre-training for Robot Manipulation},
    author={Liu, Qingtao and Ye, Qi and Sun, Zhengnan and Cui, Yu and Li, Gaofeng and Chen, Jiming},
    booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
    year={2024},
    organization={IEEE}
} 
