Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION
Controllable Diffusion-Based Visual Editing
Yiğit Ekin
M.S. Student
(Supervisor: Asst. Prof. Ayşegül Dündar Boral)
Computer Engineering Department
Bilkent University
Abstract: Advances in generative networks have significantly improved visual generation, particularly for image and video editing applications. However, key challenges remain in achieving controllable editing. Diffusion inpainting models often hallucinate or re-insert the intended object during object removal, and text-to-video diffusion models struggle to follow a desired motion pattern without sacrificing prompt alignment in motion-conditioned generation. This thesis addresses these gaps through two interconnected studies. First, we introduce a background-focused image-conditioning framework for object removal that utilizes focused embeddings and proposes a suppression method for removing the foreground concept from the conditioning signal. By conditioning explicitly on the background, it prevents common failure modes such as foreground leakage and mask-shape-driven hallucinations. Second, we develop a motion-conditioned video generation and editing method that transfers motion from a reference video to the generated video. By directly updating the positional embeddings, it achieves high-fidelity, motion-aligned generation without sacrificing alignment with the textual condition. Together, these contributions advance controllable visual editing by demonstrating that pretrained generative models contain useful behaviors beyond their explicit training objectives, and that the right guidance can unlock robust control with improved fidelity, consistency, and user-directed precision.
DATE: Tuesday, January 20 @ 11:30   PLACE: EA 516