Creator

Digital Art and Extended Reality

Researcher

Geometric Generative Modeling

Hey guys, I'm Zeyu Zhang


Zeyu Zhang

Zeyu Zhang is an undergraduate researcher.

His research interests lie in geometric generative modeling and its applications to multimodal foundation models, world models, embodied AI, and AI for health.

He received his bachelor’s degree from the Australian National University, where he was advised by Prof. Richard Hartley and Prof. Ian Reid.

Zeyu is actively seeking PhD, research engineer, and research intern positions in the US for Fall 2026.



News
(09/18/2025) 🎉 Our paper FlashMo has been accepted to NeurIPS 2025!
(08/05/2025) 🎉 Our paper 3D-R1 has been shared in Daily Papers by AK!
(07/02/2024) 🎉 Our paper Motion Mamba has been accepted to ECCV 2024!
(03/13/2024) 🎉 Our paper Motion Mamba has been shared in Daily Papers by AK!


Publications

Selected publications are highlighted. (*Equal contribution. ✝Project lead. Corresponding author.)

FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation
Zeyu Zhang*, Yiran Wang*, Danning Li*, Dong Gong, Ian Reid, Richard Hartley
NeurIPS 2025
FlashMo introduces a geometric factorized interpolant and frequency-sparse attention, enabling scalable and efficient 3D motion diffusion. Experiments show superior quality, efficiency, and scalability over state-of-the-art baselines.

FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu*, Zeyu Zhang*, Zhexin Li, Xuehai Bai, Yizeng Han, Jiasheng Tang, Yuanjie Xing, Jichao Wu, Mingyang Yang, Weihua Chen, Jiahao He, Yuanyu He, Fan Wang, Gholamreza Haffari, Bohan Zhuang
NeurIPS 2025 Spotlight
FPSAttention is a training-aware FP8 quantization and sparsity co-design for video diffusion models. By aligning 3D tile granularity, denoising-step adaptation, and hardware-efficient kernels, it achieves up to 7.09× kernel and 4.96× end-to-end speedups without quality loss.

Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang*, Akide Liu*, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
ECCV 2024
Motion Mamba is a state space model (SSM) based architecture for human motion generation, built on Hierarchical Temporal Mamba (HTM) and Bidirectional Spatial Mamba (BSM) blocks. It achieves up to a 50% FID improvement and a 4× speedup on the HumanML3D and KIT-ML datasets, demonstrating efficient, high-quality long-sequence motion modeling.


Research Projects

BlockVid: Block Diffusion for High-Fidelity and Coherent Minute-Long Video Generation
Zeyu Zhang, Shuning Chang, Yuanyu He, Yizheng Han, Jiasheng Tang, Fan Wang, Bohan Zhuang
BlockVid is a semi-autoregressive block diffusion framework equipped with semantic sparse KV caching, block forcing, and noise scheduling. It is accompanied by LV-Bench, a fine-grained benchmark for minute-long videos with dedicated metrics for evaluating long-range coherence.

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
Angen Ye*, Zeyu Zhang*, Boyuan Wang, Xiaofeng Wang, Dapeng Zhang, Zheng Zhu
VLA-R1 is a reasoning-enhanced vision–language–action model that enables step-by-step reasoning and robust action execution across diverse tasks and domains.

Nav-R1: Reasoning and Navigation in Embodied Scenes
Qingxiang Liu*, Ting Huang*, Zeyu Zhang*✝, Hao Tang
Nav-R1 is an embodied foundation model that integrates dialogue, reasoning, planning, and navigation capabilities to enable intelligent interaction and task execution in 3D environments.

3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Ting Huang*, Zeyu Zhang*✝, Hao Tang
3D-R1 is an open-source generalist model that enhances the reasoning of 3D VLMs for unified scene understanding.

Motion Anything: Any to Motion Generation
Zeyu Zhang*, Yiran Wang*, Wei Mao, Danning Li, Akira Zhao, Biao Wu, Zirui Song, Bohan Zhuang, Ian Reid, Richard Hartley
Motion Anything is an any-to-motion framework for multimodal motion generation that introduces Attention-based Mask Modeling for fine-grained control. It surpasses prior methods, achieving state-of-the-art results, and contributes TMD, a large text-music-dance dataset.


Research Experience


Research Intern
GigaAI
Dec 2024 - Present
3D generation, spatial intelligence, and world models, working with Dr. Zheng Zhu (GigaAI).
Research Intern
Alibaba DAMO Academy
Oct 2024 - Present
Efficient long video generation, working with Mr. Jiasheng Tang (DAMO) and Prof. Bohan Zhuang (ZJU, DAMO).
Researcher
Peking University
July 2024 - Present
Spatial intelligence and embodied AI, working with Asst. Prof. Hao Tang (PKU).
Research Assistant
La Trobe University
Apr 2024 - Present
3D generation and AI for Health, working with Dr. Yang Zhao (La Trobe University).
Research Assistant
Flinders Health and Medical Research Institute (FHMRI)
Nov 2022 - Present
3D medical imaging analysis, particularly 2D and 3D medical representation learning and explainable AI, working with Dr. Minh-Son To (FHMRI).


Education Experience


Bachelor of Science (Advanced) (Honours)
The Australian National University (ANU)
Jul 2021 - Jun 2025
Major: Computer Science, Minor: Mathematics, First Class Honours (H1), GPA: 6.656/7


Honors and Awards

Chancellor's Letter of Commendation, The Australian National University, July 2025.
NRF Vacation Scholarship, NeuroSurgical Research Foundation, Oct 2023.
Flinders Summer Research Scholarship, Flinders University CMPH, Nov 2022.
UNSW Science Vacation Research Scholarship, UNSW Sydney, Oct 2022.


Academic Services

Conference Reviewer: CVPR 2025/2026, ICLR 2025/2026, AAAI 2026, MM 2025, IJCAI 2025, MICCAI 2025, CHI 2025, 3DV 2026, VR 2025.


Talks

(09/19/2025) Grounding Foundation Models to the Real World @ Peking University. Our slides and recording are available.
(09/18/2025) Spatial Intelligence: From Virtual to Real Worlds @ Yahaha. Our slides and recording are available.
(07/22/2024) Motion Mamba: Efficient and Long Sequence Motion Generation @ miHoYo. Our slides are available.