His research interests lie in geometric generative modeling and its applications to multimodal foundation models, world models, embodied AI, and AI for health.
Selected publications are highlighted. (*Equal contribution. ✝Project lead. ✉Corresponding author.)
FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation
Zeyu Zhang*,
Yiran Wang*,
Danning Li*,
Dong Gong,
Ian Reid,
Richard Hartley
NeurIPS 2025
FlashMo introduces a geometric factorized interpolant and frequency-sparse attention, enabling scalable and efficient 3D motion diffusion. Experiments show superior quality, efficiency, and scalability over state-of-the-art baselines.
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu*,
Zeyu Zhang*,
Zhexin Li,
Xuehai Bai,
Yizeng Han,
Jiasheng Tang,
Yuanjie Xing,
Jichao Wu,
Mingyang Yang,
Weihua Chen,
Jiahao He,
Yuanyu He,
Fan Wang,
Gholamreza Haffari,
Bohan Zhuang
NeurIPS 2025 Spotlight
FPSAttention is a training-aware co-design of FP8 quantization and sparsity for video diffusion models. By aligning 3D tile granularity, denoising-step adaptation, and hardware-efficient kernels, it achieves up to 7.09× kernel speedups and 4.96× end-to-end speedups without quality loss.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang*,
Akide Liu*,
Ian Reid,
Richard Hartley,
Bohan Zhuang,
Hao Tang
ECCV 2024
Human motion generation is a key goal in generative computer vision. Motion Mamba addresses it with state space models (SSMs), using Hierarchical Temporal Mamba (HTM) and Bidirectional Spatial Mamba (BSM) blocks. It achieves up to 50% FID improvement and 4× speedup on the HumanML3D and KIT-ML datasets, demonstrating efficient, high-quality long-sequence motion modeling.
Research Projects
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation
Zeyu Zhang,
Shuning Chang,
Yuanyu He,
Yizeng Han,
Jiasheng Tang✉,
Fan Wang,
Bohan Zhuang✉
BlockVid is a semi-autoregressive (semi-AR) block diffusion framework equipped with semantic sparse KV caching, block forcing, and noise scheduling. The work also introduces LV-Bench, a fine-grained benchmark for minute-long videos with dedicated metrics for evaluating long-range coherence.
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Ting Huang*,
Zeyu Zhang*✝,
Hao Tang✉
3D-R1 is an open-source generalist model that enhances the reasoning of 3D VLMs for unified scene understanding.
Motion Anything: Any to Motion Generation
Zeyu Zhang*,
Yiran Wang*,
Wei Mao,
Danning Li,
Akira Zhao,
Biao Wu,
Zirui Song,
Bohan Zhuang,
Ian Reid,
Richard Hartley
Motion Anything advances multimodal motion generation with an Any-to-Motion framework, introducing Attention-based Mask Modeling for fine-grained control. It also introduces TMD, a large text-music-dance dataset, and achieves state-of-the-art results over prior methods.
Research Assistant Monash University Feb 2024 - May 2024
Worked on 3D/4D generative learning, focusing on text-guided human motion and avatar generation, with Prof. Reza Haffari (Monash University) and Prof. Bohan Zhuang (ZJU, Monash University).
Bachelor of Science (Advanced) (Honours) The Australian National University (ANU) Jul 2021 - Jun 2025
Major: Computer Science, Minor: Mathematics, First Class Honours (H1), GPA: 6.656/7
Visiting Student Imperial College London Jul 2022
Quantitative Sciences Research Institute (QSRI)