Creator

Digital Art and Extended Reality

Researcher

Geometric Generative Modeling

Hey Guys, I'm Zeyu Zhang


Zeyu Zhang

Zeyu Zhang is a researcher in generative AI.

His research interests lie in geometric generative modeling and its applications to multimodal foundation models, world models, embodied AI, and AI for health.

He received his bachelor’s degree from the Australian National University, advised by Prof. Richard Hartley and Prof. Ian Reid.


News
(05/01/2026) 🎉 Our paper Code2Worlds has been accepted to ICML 2026!
(03/11/2026) 🎉 Glad to receive the Berkeley Fellowship!
(02/21/2026) 🎉 Our paper GeoWorld has been accepted to CVPR 2026!
(01/27/2026) 🎉 Our paper VaseVQA-3D has been accepted to ICLR 2026!
(11/27/2025) 🎉 Glad to receive the Australasian Undergraduate Research Medal!
(09/18/2025) 🎉 Our papers FlashMo and FPSAttention have been accepted to NeurIPS 2025!
(07/02/2024) 🎉 Our paper Motion Mamba has been accepted to ECCV 2024!


Publications

Selected publications are highlighted. (*Equal contribution. Project lead. Corresponding author.)

Code2Worlds: Empowering Coding LLMs for 4D World Generation
Yi Zhang*, Yunshuang Wang*, Zeyu Zhang*✝, Hao Tang

ICML 2026
Achieving spatial intelligence requires moving beyond visual plausibility. We propose Code2Worlds, a language-to-simulation framework with dual-stream generation and physics-aware refinement, achieving superior dynamic fidelity and performance on Code4D benchmarks.
Position: RL Should Be Used to Adjust Foundation Models, NOT Abused
Ting Huang*, Zeyu Zhang*✝, Hao Tang

ICML 2026 Position
This position paper argues that reinforcement learning should adjust foundation models after pretraining rather than serve as the default way to create capabilities, emphasizing targeted refinement, reward minimalism, and disciplined deployment.
GeoWorld: Geometric World Models
Zeyu Zhang, Danning Li, Ian Reid, Richard Hartley

CVPR 2026
GeoWorld introduces hyperbolic energy-based world models with geometric reinforcement learning, enabling stable long-horizon visual planning and hierarchical reasoning, outperforming V-JEPA-2 on CrossTask and COIN benchmarks.
StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes
Zhengri Wu*, Yiran Wang*, Yu Wen*, Zeyu Zhang*✝, Biao Wu, Hao Tang

ICRA 2026
Underwater stereo depth estimation enables accurate 3D geometry for robotics. We propose StereoAdapter, combining LoRA-adapted monocular encoders with stereo refinement, achieving robust improvements on benchmarks.
VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
Nonghai Zhang*, Zeyu Zhang*✝, Jiazi Wang*, Yang Zhao, Hao Tang

ICLR 2026
VaseVQA-3D introduces a 3D visual question-answering dataset for ancient Greek pottery, featuring 664 annotated vase models, alongside VaseVLM, a domain-adaptive vision-language model trained for cultural heritage analysis.
FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation
Zeyu Zhang*, Yiran Wang*, Danning Li*, Dong Gong, Ian Reid, Richard Hartley

NeurIPS 2025
FlashMo introduces a geometric factorized interpolant and frequency-sparse attention, enabling scalable and efficient 3D motion diffusion. Experiments show superior quality, efficiency, and scalability over state-of-the-art baselines.
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu*, Zeyu Zhang*, Zhexin Li, Xuehai Bai, Yuanjie Xing, Yizeng Han, Jiasheng Tang, Jichao Wu, Mingyang Yang, Weihua Chen, Jiahao He, Yuanyu He, Fan Wang, Gholamreza Haffari, Bohan Zhuang

NeurIPS 2025 Spotlight
FPSAttention is a training-aware FP8 quantization and sparsity co-design for video diffusion models. It achieves up to 7.09× kernel speedup and 4.96× end-to-end speedup without quality loss by aligning 3D tile granularity, denoising-step adaptation, and hardware-efficient kernels.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang*, Akide Liu*, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

ECCV 2024
Human motion generation is a key goal in generative computer vision. We propose Motion Mamba, using state space models with Hierarchical Temporal and Bidirectional Spatial blocks, achieving improved FID and faster motion modeling.


Research Experience


Research Assistant
Peking University
Jul 2024 - Present
Spatial intelligence and embodied AI, working with Asst. Prof. Hao Tang (PKU).
Research Assistant
La Trobe University
Apr 2024 - Present
3D generation and AI for Health, working with Dr. Yang Zhao (La Trobe University).


Education Experience


Bachelor of Science (Advanced) (Honours)
The Australian National University (ANU)
Jul 2021 - Jun 2025
Major: Computer Science, Minor: Mathematics, First Class Honours (H1), GPA: 6.656/7


Honors and Awards

ICML Gold Reviewer, May 2026.
ICLR Travel Grant, Mar 2026.
Berkeley Fellowship, UC Berkeley, Mar 2026.
Australasian Undergraduate Research Medal, Australasian Council for Undergraduate Research (ACUR), Nov 2025.
Chancellor's Letter of Commendation, The Australian National University, Jul 2025.
NRF Vacation Scholarship, NeuroSurgical Research Foundation, Oct 2023.
Flinders Summer Research Scholarship, Flinders University CMPH, Nov 2022.
UNSW Science Vacation Research Scholarship, UNSW Sydney, Oct 2022.


Academic Services

Conference Reviewer: CVPR (2025, 2026), ICLR (2025, 2026), AAAI 2026, MM 2025, IJCAI 2025, MICCAI 2025, CHI 2025, 3DV 2026, VR 2025.


Talks

(03/03/2026) Latest Advances in Embodied Reasoning @ CVLife. [Recording/Slides]
(11/12/2025) FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion @ Alibaba DAMO Academy. [Recording/Slides]
(10/17/2025) Video World Models: Learning the Physical World from Videos @ Zhejiang University. [Recording/Slides]
(10/15/2025) How RL Enhances Spatial Understanding? @ NVIDIA Spatial Intelligence Lab. [Recording/Slides]
(10/09/2025) How RL Enhances Spatial Understanding? @ 3DCVer. [Recording/Slides]
(09/19/2025) Grounding Foundation Models to the Real World @ Peking University. [Recording/Slides]
(09/18/2025) Spatial Intelligence: From Virtual to Real Worlds @ Yahaha. [Recording/Slides]
(07/22/2024) Motion Mamba: Efficient and Long Sequence Motion Generation @ miHoYo. [Slides]