TiPToP: A Modular Open-Vocabulary Planning System for Robotic Manipulation
arXiv preprint, 2026
TiPToP is a modular, open-vocabulary system that combines vision foundation models with Task and Motion Planning (TAMP) to solve long-horizon manipulation tasks from RGB images and natural-language instructions. It requires zero robot data, matches or outperforms a state-of-the-art vision-language-action (VLA) model, and is easily adapted to new embodiments.