FastAnimate: Towards Learnable Template Construction and Pose Deformation for Fast 3D Human Avatar Animation

AAAI 2026

Jian Shu*, Nanjie Yao*, Gangjian Zhang, Junlong Ren, Yu Feng and Hao Wang†

Abstract

3D human avatar animation aims at transforming a human avatar from an arbitrary initial pose to a specified target pose using deformation algorithms. Existing approaches typically divide this task into two stages: canonical template construction and target pose deformation. However, current template construction methods demand extensive skeletal rigging and often produce artifacts for specific poses. Moreover, target pose deformation suffers from structural distortions caused by Linear Blend Skinning (LBS), which significantly undermines animation realism. To address these problems, we propose a unified learning-based framework that tackles both challenges in two phases. In the first phase, to overcome the inefficiency and artifacts of template construction, we leverage a U-Net architecture that decouples texture and pose information in a feed-forward pass, enabling fast generation of a human template. In the second phase, we propose a data-driven refinement technique that enhances structural integrity. Extensive experiments show that our model delivers consistent performance across diverse poses with an optimal balance between efficiency and quality, surpassing state-of-the-art (SOTA) methods.
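For context, LBS deforms each canonical vertex by a weighted blend of per-joint rigid transforms; because rotations are averaged linearly, volumes collapse near bent joints (the well-known "candy-wrapper" artifact), which is the structural distortion the refinement phase targets. Below is a minimal NumPy sketch of vanilla LBS; the function name and array shapes are illustrative, not taken from the paper's code.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, joint_transforms):
    """Deform canonical vertices with vanilla Linear Blend Skinning.

    vertices:         (V, 3) canonical template vertices.
    weights:          (V, J) per-vertex skinning weights (rows sum to 1).
    joint_transforms: (J, 4, 4) rigid transform of each joint in the target pose.
    """
    # Lift vertices to homogeneous coordinates: (V, 4).
    v_h = np.concatenate([vertices, np.ones((vertices.shape[0], 1))], axis=1)
    # Linearly blend the per-joint transforms for every vertex: (V, 4, 4).
    # This linear averaging of rotations is the source of LBS distortions.
    blended = np.einsum("vj,jrc->vrc", weights, joint_transforms)
    # Apply each blended transform to its vertex and drop the homogeneous coordinate.
    return np.einsum("vrc,vc->vr", blended, v_h)[:, :3]
```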

Our framework integrates three core components. For texture, we employ a multi-source texture synthesis strategy to generate diverse synthetic training data, along with a lightweight texture encoder for effective feature extraction. For geometry, we introduce a Region-aware Shape Extraction Module that strengthens human shape extraction through part-based shape feature extraction and interaction, coupled with a Fourier Geometry Encoder for efficient geometric learning. At the system level, we propose a Dual Reconstruction U-Net that uses feature residuals to balance geometric and texture features, enabling mutual enhancement of cross-modal features throughout reconstruction. Additionally, to improve 3D mesh quality and extraction efficiency, we design a Gaussian-enhanced remeshing strategy supervised by normals from a generated Gaussian avatar.
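The Fourier Geometry Encoder is described only at a high level here; a common realization is NeRF-style Fourier features, which lift low-dimensional coordinates onto sinusoids at octave-spaced frequencies so a downstream network can fit high-frequency geometry. The sketch below shows one such encoding; the band count and frequency schedule are assumptions, not the paper's exact design.

```python
import torch

def fourier_encode(x, num_bands=6):
    """NeRF-style Fourier features: map coordinates x of shape (..., D)
    to (..., D * 2 * num_bands) via sin/cos at octave-spaced frequencies.
    """
    freqs = 2.0 ** torch.arange(num_bands, dtype=x.dtype, device=x.device) * torch.pi
    proj = x.unsqueeze(-1) * freqs            # (..., D, num_bands)
    enc = torch.cat([proj.sin(), proj.cos()], dim=-1)  # (..., D, 2 * num_bands)
    return enc.flatten(-2)                    # (..., D * 2 * num_bands)

# Example: encode 1024 3D points into 36-dimensional features.
features = fourier_encode(torch.rand(1024, 3))  # -> (1024, 36)
```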

Decoupled Templates

Animated Results

Quantitative Comparison

Static Cases Illustration