Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR
VGGTar: Feed-Forward Deformable VGGT Avatars from a Single Image
Fatih Pehlivan
Master's Student
(Supervisor: Prof. Dr. Uğur Güdükbay)
Computer Engineering Department
Bilkent University
Abstract: Creating high-fidelity, animatable 3D human avatars from a single RGB image is a fundamental challenge in virtual reality and digital content creation. The task demands a difficult balance between the convenience of a single-image input and the fidelity required of an animatable, full-body 3D representation. Modern feed-forward approaches achieve impressive speed but often rely on simplified 2D-based representations (e.g., UV maps) to encode 3D attributes. This simplification imposes a key limitation: it becomes difficult to capture complex 3D geometry that deviates from the underlying body topology. Conversely, methods that can represent such complex 3D geometry (e.g., 3D points linked to a mesh) typically require a full video sequence as input. These approaches are not feed-forward and rely on a slow, per-subject optimization process, making them unsuitable for instant avatar creation. In this work, we introduce a novel feed-forward architecture that resolves this trade-off. Our method builds on a geometry-grounded transformer backbone that directly predicts a "Deformable 3D Pointmap" representation of the clothed human from a single image. This 3D pointmap is co-predicted with an aligned parametric body model (SMPL-X), allowing the avatar to be animated instantly with standard Linear Blend Skinning (LBS). By training on a large-scale generative dataset, our approach avoids both the representation constraints of 2D-based methods and the slow, video-based optimization of 3D-based methods, offering a scalable and efficient solution for robust, animatable avatar creation from a single photograph.
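
For context on the animation step mentioned in the abstract, Linear Blend Skinning deforms each 3D point by a weighted blend of per-joint rigid transforms. The sketch below is a minimal, generic NumPy illustration of standard LBS, not the speaker's implementation; the function name and array shapes are assumptions made for illustration.

import numpy as np

def linear_blend_skinning(rest_points, skin_weights, joint_transforms):
    # Standard LBS (generic sketch, not the talk's code):
    #   rest_points:      (N, 3) rest-pose 3D points
    #   skin_weights:     (N, K) per-point weights over K joints (each row sums to 1)
    #   joint_transforms: (K, 4, 4) rigid transform of each joint for the target pose
    n = rest_points.shape[0]
    # Lift points to homogeneous coordinates: (N, 4)
    points_h = np.concatenate([rest_points, np.ones((n, 1))], axis=1)
    # Blend the per-joint transforms with the skinning weights: (N, 4, 4)
    blended = np.einsum('nk,kij->nij', skin_weights, joint_transforms)
    # Apply each point's blended transform and drop the homogeneous coordinate
    deformed = np.einsum('nij,nj->ni', blended, points_h)
    return deformed[:, :3]
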
DATE: Monday, November 17 @ 16:30    PLACE: EA 502