His research interests lie in geometric generative modeling and its applications to multimodal foundation models, world models, embodied AI, and AI for health.
Selected publications are highlighted. (*Equal contribution. ✝Project lead. ✉Corresponding author.)
FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation
Zeyu Zhang*,
Yiran Wang*,
Danning Li*,
Dong Gong,
Ian Reid,
Richard Hartley
NeurIPS 2025
FlashMo introduces a geometric factorized interpolant and frequency-sparse attention, enabling scalable and efficient 3D motion diffusion. Experiments show superior quality, efficiency, and scalability compared with state-of-the-art baselines.
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu*,
Zeyu Zhang*,
Zhexin Li,
Xuehai Bai,
Yizeng Han,
Jiasheng Tang,
Yuanjie Xing,
Jichao Wu,
Mingyang Yang,
Weihua Chen,
Jiahao He,
Yuanyu He,
Fan Wang,
Gholamreza Haffari,
Bohan Zhuang
NeurIPS 2025 Spotlight
FPSAttention is a training-aware FP8 quantization and sparsity co-design for video diffusion models. By aligning 3D tile granularity, denoising-step adaptation, and hardware-efficient kernels, it achieves up to 7.09× kernel speedup and 4.96× end-to-end speedup without quality loss.
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
Weijie Wang,
Yuedong Chen,
Zeyu Zhang,
Duochao Shi,
Akide Liu,
Bohan Zhuang
NeurIPS 2025
ZPressor is an architecture-agnostic module that compresses multi-view inputs for scalable feed-forward 3DGS.
TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning
Hongyang He,
Xinyuan Song,
Yangfan He,
Zeyu Zhang,
Yanshu Li,
Haochen You,
Lifan Sun,
Wenqiao Zhang
NeurIPS 2025
TRiCo introduces a triadic game-theoretic co-training framework with two students, a meta-learned teacher, and an adversarial generator, leveraging mutual-information-based pseudo-labeling to achieve state-of-the-art semi-supervised learning performance.
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang,
Chaoqi Chen,
Yuxuan Yuan,
Zeyu Zhang,
Yue Huang,
Kun Zhang
CVPR 2025
Foundation models struggle with distribution shifts and weak supervision. We propose OCRT, a framework that extracts high-level concepts and relations, improving the generalization of SAM and CLIP across diverse tasks.
Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Zirui Song,
Guangxian Ouyang,
Meng Fang,
Hongbin Na,
Zijing Shi,
Zhenhao Chen,
Yujie Fu,
Zeyu Zhang,
Shiyu Jiang,
Miao Fang,
Ling Chen,
Xiuying Chen✉
NAACL 2025
Household robots struggle to detect hazards. We propose an anomaly scenario generation method that uses multi-agent brainstorming and 3D simulation to build diverse environments, improving robots' skills in hazard detection, hygiene management, and child safety.
Efficient Learning With Sine-Activated Low-rank Matrices
Yiping Ji,
Hemanth Saratchandran,
Cameron Gordon,
Zeyu Zhang,
Simon Lucey
ICLR 2025
We propose a novel theoretical framework integrating a sinusoidal function into low-rank decomposition, enhancing parameter efficiency and model accuracy across diverse neural network applications such as Vision Transformers, Large Language Models, Neural Radiance Fields, and 3D shape modeling.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang*,
Akide Liu*,
Ian Reid,
Richard Hartley,
Bohan Zhuang,
Hao Tang
ECCV 2024
Human motion generation is a key goal in generative computer vision. We propose Motion Mamba, a model built on state space models (SSMs) with Hierarchical Temporal Mamba (HTM) and Bidirectional Spatial Mamba (BSM) blocks. It achieves up to 50% FID improvement and a 4× speedup on the HumanML3D and KIT-ML datasets, demonstrating efficient, high-quality long-sequence motion modeling.
Research Projects
BlockVid: Block Diffusion for High-Fidelity and Coherent Minute-Long Video Generation
Zeyu Zhang,
Shuning Chang,
Yuanyu He,
Yizeng Han,
Jiasheng Tang,
Fan Wang,
Bohan Zhuang✉
BlockVid is a semi-autoregressive (semi-AR) block diffusion framework equipped with semantic sparse KV caching, block forcing, and noise scheduling. We also introduce LV-Bench, a fine-grained benchmark for minute-long videos with dedicated metrics for evaluating long-range coherence.
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
Angen Ye*,
Zeyu Zhang*,
Boyuan Wang,
Xiaofeng Wang,
Dapeng Zhang,
Zheng Zhu✉
VLA-R1 is a reasoning-enhanced vision–language–action model that enables step-by-step reasoning and robust action execution across diverse tasks and domains.
Nav-R1: Reasoning and Navigation in Embodied Scenes
Qingxiang Liu*,
Ting Huang*,
Zeyu Zhang*✝,
Hao Tang✉
Nav-R1 is an embodied foundation model that integrates dialogue, reasoning, planning, and navigation capabilities to enable intelligent interaction and task execution in 3D environments.
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Ting Huang*,
Zeyu Zhang*✝,
Hao Tang✉
3D-R1 is an open-source generalist model that enhances the reasoning of 3D VLMs for unified scene understanding.
Motion Anything: Any to Motion Generation
Zeyu Zhang*,
Yiran Wang*,
Wei Mao,
Danning Li,
Akira Zhao,
Biao Wu,
Zirui Song,
Bohan Zhuang,
Ian Reid,
Richard Hartley
Motion Anything advances multimodal motion generation with an Any-to-Motion framework that introduces Attention-based Mask Modeling for fine-grained control. It also introduces TMD, a large text-music-dance dataset, and achieves state-of-the-art results, surpassing prior methods.
Research Experience
Research Intern GigaAI Dec 2024 - Present
3D generation, spatial intelligence, and world models, working with Dr. Zheng Zhu (GigaAI).
Research Intern Alibaba DAMO Academy Oct 2024 - Present
Efficient long video generation, working with Mr. Jiasheng Tang (DAMO) and Prof. Bohan Zhuang (ZJU, DAMO).
Research Assistant La Trobe University Apr 2024 - Present
3D generation and AI for Health, working with Dr. Yang Zhao (La Trobe University).
Research Assistant Monash University Feb 2024 - May 2024
3D/4D generative learning, with a focus on text-guided human motion and avatar generation, working with Prof. Reza Haffari (Monash University) and Prof. Bohan Zhuang (ZJU, Monash University).
Education
Bachelor of Science (Advanced) (Honours) The Australian National University (ANU) Jul 2021 - Jun 2025
Major: Computer Science, Minor: Mathematics, First Class Honours (H1), GPA: 6.656/7
Visiting Student Imperial College London Jul 2022
Quantitative Sciences Research Institute (QSRI)
Talks
(09/19/2025) Grounding Foundation Models to the Real World @ Peking University. Our slides and recording are available.
(09/18/2025) Spatial Intelligence: From Virtual to Real Worlds @ Yahaha. Our slides and recording are available.
(07/22/2024) Motion Mamba: Efficient and Long Sequence Motion Generation @ miHoYo. Our slides are available.