Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION
TpTf: Leveraging Global Receptive Fields and Spectral Filters for Visual Robotic Manipulation with Transporting Transformer Networks
Barış Bilgin Şenol
Master's Student
(Supervisor: Asst. Prof. Özgür Salih Öğüz)
Computer Engineering Department
Bilkent University
Abstract: Transformers have recently emerged as a powerful and versatile tool for capturing complex interactions among long-distance features, which makes them well suited to learning visual representations for robotic manipulation tasks. However, existing density estimation models, such as Transporter networks, rely on convolutional backbones that primarily process local information and therefore require multiple convolutional stems to learn task-specific policies. In this work, we explore the potential of Transformer networks and alternative token-mixing mechanisms for categorical density estimation and propose Transporting Transformer networks for complex robotic pick-and-place tasks. Our approach employs a single encoder stem that leverages global features to learn both pick and pick-conditioned place policies. Building on this enhanced capacity, we also introduce a novel training scheme for multi-task learning on the Ravens benchmark. The Transporting Transformer learns manipulation policies directly from visual observations without object-level assumptions, and it improves performance by effectively modeling long-range spatial relationships. It maintains sample efficiency comparable to existing methods while demonstrating superior performance in both single-task and multi-task learning settings.
DATE: Wednesday, September 10 @ 10:00
PLACE: EA 409