Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR
Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies
Muhammet Hikmet Şimşir
Master's Student
(Supervisor: Asst. Prof. Özgür S. Öğüz)
Computer Engineering Department
Bilkent University
Abstract: Behavior cloning with high-capacity generative policies achieves strong imitation performance, but is often constrained by limited demonstration coverage and sensitivity to distribution shift. While reinforcement learning can improve task performance, directly fine-tuning large action decoders is often unstable and sample-inefficient. We propose Lagrangian Perturbation Diffusion Steering (LP-DS), a lightweight adaptation method that improves a frozen generative policy while preserving its multimodal structure. LP-DS learns a compact noise-space perturbation module that shifts the Gaussian noise inputs before decoding, enabling policy improvement without modifying the action decoder. To prevent off-manifold latent queries and unstable denoising dynamics, we optimize this module with a Lagrangian trust-region objective that maximizes downstream value while constraining perturbation magnitude, yielding stable and sample-efficient learning. Across RoboMimic manipulation, OpenAI Gym locomotion, and Adroit dexterous manipulation benchmarks, LP-DS improves sample efficiency, success rate, and return, with return improvements of up to 25% over prior baselines. It also maintains diverse behavior, as quantified by higher action-space entropy under the Kozachenko-Leonenko k-nearest-neighbor estimator. Project page: https://sites.google.com/view/lp-ds/home
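For readers unfamiliar with the diversity metric named in the abstract, the following is a minimal sketch of the standard Kozachenko-Leonenko k-nearest-neighbor entropy estimator. It is not the speaker's evaluation code. The function names `digamma_int` and `kl_entropy`, the default `k=3`, and the use of Euclidean distance are assumptions made for illustration, and the sketch uses only the Python standard library, so it is quadratic in the number of samples.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(n):
    """Digamma function at a positive integer n: -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy(samples, k=3):
    """Kozachenko-Leonenko differential entropy estimate, in nats.

    samples: sequence of equal-length numeric tuples (e.g. sampled actions).
    k: which nearest neighbor to use (hypothetical default, not from the talk).
    """
    n = len(samples)
    if n <= k:
        raise ValueError("need more samples than k")
    d = len(samples[0])
    # Log-volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1).
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_eps_sum = 0.0
    for i, x in enumerate(samples):
        # Distance from x to its k-th nearest neighbor (brute force).
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        # Clamp so duplicate samples do not produce log(0).
        log_eps_sum += math.log(max(dists[k - 1], 1e-12))
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_eps_sum / n
```

Under this estimator, a policy whose sampled actions spread over several modes yields a larger value than one that has collapsed to a single mode, which is the sense in which higher action-space entropy indicates preserved diversity.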
DATE: April 13, Monday @ 15:30 Place: EA 502