Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION

 

Contact-VLA: Zero-Shot Planning and Control for Contact-Rich Manipulation

 

Berk Çiçek
Master's Student
(Supervisor: Asst. Prof. Özgür Salih Öğüz)

Computer Engineering Department
Bilkent University

Abstract: Vision-Language-Action (VLA) systems often lack adaptability and explainability because of their black-box structure and their dependence on fixed action sets learned from extensive teleoperated datasets, which limits their effectiveness in complex, dynamic manipulation scenarios. To address this, we propose a VLA framework for contact-rich manipulation tasks in such scenarios. By integrating vision-language foundation models with motion planning and reactive controllers, our system achieves zero-shot planning and adaptive manipulation without relying on extensive teleoperated action datasets. Unlike conventional VLAs, we explicitly separate the roles of the Vision-Language Model (VLM) and the Large Language Model (LLM): the VLM handles object parameter extraction and environment modeling, while the LLM generates initial contact strategies and cost estimates. Together, these two components build the simulated environment in which our dynamic planner operates. Our ablation studies show that this modular design improves both the explainability and the performance of the overall framework. We also introduce a memory unit that stores past manipulation experiences, enabling learned contact strategies and parameter adjustments to be generalized and reused efficiently across diverse manipulation scenarios. Experiments on challenging contact-rich tasks validate the framework's robustness and highlight the design elements critical to its effectiveness.

 

DATE: Wednesday, September 10 @ 11:00
PLACE: EA 409