Zeyu Zhang

Zeyu Zhang is an undergraduate researcher advised by Prof. Richard Hartley and Prof. Ian Reid. He is also a visiting student researcher at MIT CSAIL, working with Assoc. Prof. Stefanie Mueller. He is an incoming research assistant at USC, working with Asst. Prof. Yue Wang. His research interests are rooted in computer vision, focusing on generative 3D modeling and AI for health. Specifically, he is dedicated to advancing efficient and high-quality motion and avatar generation, as well as 3D medical imaging segmentation and representation learning. With extensive experience across multiple research disciplines, Zeyu actively explores cutting-edge advancements in both the foundational and applied aspects of artificial intelligence. He also collaborates closely with Prof. Bohan Zhuang (ZJU), Asst. Prof. Hao Tang (PKU), Dr. Yang Zhao (La Trobe), Dr. Minh-Son To (FHMRI), and many others. Zeyu is actively seeking PhD, RA, and internship opportunities in the US.

             

News

(10/14/2024) 🎉 Our paper MedDet has been accepted for an oral presentation at BIBM 2024!
(07/19/2024) 🎉 Our paper Motion Avatar has been accepted to BMVC 2024!
(07/02/2024) 🎉 Our paper Motion Mamba has been accepted to ECCV 2024!
(06/18/2024) 🎉 Our paper JointViT has been selected for an oral presentation at MIUA 2024!
(05/14/2024) 🎉 Our paper JointViT has been accepted to MIUA 2024!
(03/13/2024) 🎉 Our paper Motion Mamba has been featured in Daily Papers!
(02/10/2024) 🎉 Our paper SegReg has been accepted to ISBI 2024!

Publications

Selected publications are highlighted. (Equal contribution. Project lead. Corresponding author.)

Motion Anything: One Prompt for Multimodal Motion and Avatar Generation
Zeyu Zhang, Yiran Wang, Wei Mao, Danning Li, Rui Zhao, Biao Wu, Zirui Song, Bohan Zhuang, Ian Reid, Richard Hartley

Preprint
Motion Anything introduces a unified method for generating 4D avatars with text, music, and dance, leveraging adaptive transformers, selective rigging, and a new Text-Music-Dance dataset for multimodal tasks.
KMM: Key Frame Mask Mamba for Extended Motion Generation
Zeyu Zhang, Hang Gao, Akide Liu, Qi Chen, Feng Chen, Yiran Wang, Danning Li, Rui Zhao, Ian Reid, Richard Hartley, Hao Tang

Preprint
KMM addresses memory decay and multimodal fusion in motion generation, introducing key frame masking and contrastive learning, achieving state-of-the-art results on BABEL with superior efficiency and alignment.
InfiniMotion: Mamba in Mamba for Long Motion Generation
Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Yiran Wang, Danning Li, Ling Shao, Ian Reid, Richard Hartley, Hao Tang, Bohan Zhuang

Preprint
InfiniMotion introduces a novel mamba-in-mamba architecture with memory updates and a similarity-based masking strategy, achieving 15% FID improvement on BABEL for robust, long-sequence motion generation.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

ECCV 2024
Human motion generation is a key goal in generative computer vision. We propose Motion Mamba, a model built on state space models (SSMs) with Hierarchical Temporal Mamba (HTM) and Bidirectional Spatial Mamba (BSM) blocks. It achieves up to 50% FID improvement and 4x speedup on the HumanML3D and KIT-ML datasets, demonstrating efficient, high-quality long-sequence motion modeling.
More Publications
MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection
Zeyu Zhang, Nengmin Yi, Shengbo Tan, Ying Cai, Yi Yang, Lei Xu, Qingtai Li, Zhang Yi, Daji Ergu, Yang Zhao

BIBM 2024 Oral
Cervical disc herniation (CDH) is a common disorder needing expert analysis. Current automated detection methods face challenges: high computational demands and MRI noise. We propose MedDet for efficient detection, leveraging knowledge distillation, generative adversarial training, and nmODE2. Our model improves mAP by 5%, reduces parameters by 67.8%, and speeds inference fivefold.
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
Zeyu Zhang, Yiran Wang, Biao Wu, Shuo Chen, Zhiyuan Zhang, Shiya Huang, Wenbo Zhang, Meng Fang, Ling Chen, Yang Zhao

BMVC 2024
Our paper introduces a novel agent-based approach called Motion Avatar for generating customizable human and animal 3D avatars with motions via text queries, coordinated by an LLM planner, and supported by the new Zoo-300K animal motion dataset.
JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA
Zeyu Zhang, Xuyin Qi, Mingxi Chen, Guangxi Li, Ryan Pham, Ayub Qassim, Ella Berry, Zhibin Liao, Owen Siggs, Robert Mclaughlin, Jamie Craig, Minh-Son To

MIUA 2024 Oral
Our paper introduces JointViT, a Vision Transformer model with a novel joint loss function and balancing augmentation technique that significantly improves the accuracy of diagnosing sleep-related disorders using OCTA, achieving up to a 12.28% accuracy improvement.
SegReg: Segmenting OARs by Registering MR Images and CT Annotations
Zeyu Zhang, Xuyin Qi, Bowen Zhang, Biao Wu, Hien Le, Bora Jeong, Zhibin Liao, Yunxiang Liu, Johan Verjans, Minh-Son To, Richard Hartley

ISBI 2024
To improve the accuracy and efficiency of organ at risk (OAR) segmentation in radiotherapy, we propose SegReg, a method that combines CT and MRI using Elastic Symmetric Normalization, outperforming traditional CT-only methods by 16.78% in mDSC and 18.77% in mIoU.
Thin-Thick Adapter: Segmenting Thin Scans Using Thick Annotations
Zeyu Zhang, Bowen Zhang, Abhiram Hiwase, Feng Chen, Akide Liu, Christen Barras, Biao Wu, Adam Wells, Daniel Ellis, Benjamin Reddi, Andrew Burgan, Minh-Son To, Ian Reid, Richard Hartley, Yutong Xie

Preprint
Medical imaging segmentation is critical for medical analysis, but it predominantly relies on thicker CT slices because annotated thin slices are scarce. We propose segmenting thin scans with thick-slice annotations, introduce the CQ500-Thin dataset, and present the Thin-Thick Adapter to bridge the domain gap, significantly improving segmentation performance.
DiabetesNet: A Deep Learning Approach to Diabetes Diagnosis
Zeyu Zhang, Khandaker Asif Ahmed, Md Rakibul Hasan, Tom Gedeon, Md Zakir Hossain

ACIIDS 2024
We propose a non-invasive diabetes diagnosis method using a Back Propagation Neural Network with batch normalization, addressing class imbalance and improving performance over traditional methods, achieving high accuracy on multiple datasets.

ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer
Xuyin Qi, Zeyu Zhang∗✝, Aaron Berliano Handoko, Huazhan Zheng, Mingxi Chen, Ta Duc Huy, Vu Minh Hieu Phan, Lei Zhang, Linqi Cheng, Shiyu Jiang, Zhiwei Zhang, Zhibin Liao, Yang Zhao, Minh-Son To

Preprint
Prostate cancer diagnosis benefits from MRI and AI advancements. We propose ProjectedEx, a generative framework with multiscale feedback, enhancing interpretability and lesion classification while addressing challenges in medical imaging complexity.

CT Heterogeneity and Dose Distribution Patterns in Block and Ring Regions Improved the Prediction of Radiation Pneumonitis
Yuyu Liu, Wang Li, Fangfang Yang, Yali Wang, Zeyu Zhang, Biao Wu, Gao Yanping, Han Bai, Wenbing Lv
In Submission (EMBC 2025)
This study analyzed 251 lung cancer patients to examine CT heterogeneity and dose distribution's impact on radiation pneumonitis (RP). Using radiomics features and dose-volume histogram parameters across block and ring regions, seven machine learning models were applied. Results showed multi-modal features outperforming DVH, with ring regions (40–50 Gy) achieving the highest AUC (0.977). This supports personalized treatment planning.
A GNN-based Fraud Detection Approach against Heterophily Inconsistencies
Wenxin Zhang, Jingxing Zhong, Zeyu Zhang, Lingfei Ren, Cuicui Luo
In Submission (IJCAI 2025)
Graph-based fraud detection is vital yet challenged by heterophilic inconsistencies, often overlooking semantic nuances. We propose HIGNN, leveraging distinct semantic patterns in heterophilic connections to enhance fraud detection robustness, validated on real-world datasets.
Bias and Toxicity in Large Language Model Role-Playing
Jinman Zhao, Zifan Qian, Linbo Cao, Yining Wang, Yitian Ding, Yulan Hu, Zeyu Zhang
In Submission (ARR)
Role-play in LLMs enhances contextual responses and reasoning across benchmarks but introduces risks, as adopting diverse roles increases susceptibility to biased or harmful outputs on sensitive evaluations.
GAMED-Snake: Gradient-aware Adaptive Momentum Evolution Deep Snake Model for Multi-organ Segmentation
Ruicheng Zhang, Haowei Guo, Zeyu Zhang, Puxin Yan, Shen Zhao
In Submission (ICME 2025)
The Gradient-aware Adaptive Momentum Evolution Deep Snake (GAMED-Snake) model advances multi-organ segmentation, achieving a 2% mDice improvement via novel gradient-based learning, adaptive momentum evolution, and dynamic boundary alignment innovations.
FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration
Ruicheng Zhang, Kanghui Tian, Zeyu Zhang, Qixiang Liu, Zhi Jin
In Submission (ICME 2025)
This study introduces FDG-Diff, a frequency-domain-guided framework addressing joint haze degradation and JPEG compression. Key innovations include HFCM for detail restoration and DADTP for adaptive region-specific enhancement, achieving superior dehazing performance.
Class-Centric Semi-supervised Vision Transformers
Hongyang He, Haochen You, Zeyu Zhang, Hongyang Xie, Boyang Fu, Guodong Shen, Victor Sanchez
In Submission (ICME 2025)
CLASST, a framework for Semi-Supervised Vision Transformers, addresses class imbalance by dynamically enhancing minority class representation using learnable centers and adaptive contrastive loss, achieving state-of-the-art results on imbalanced datasets.
AdvMark: Robust Image Watermarking via Two Stage Adversarial Enhancement
Jiahui Chen, Zehang Deng, Zeyu Zhang, Chaoyang Li, Lianchen Jia, Lifeng Sun
In Submission (CVPR 2025)
AdvMark introduces a two-stage fine-tuning strategy for robust watermarking, achieving up to 46% accuracy improvement against advanced attacks while ensuring superior image quality through constrained loss and quality-aware early stopping.
DiffuMural: Diffusion Model for Dunhuang Murals Restoration based on Multi-scale Convergence and Cooperative Diffusion
Puyu Han, Jiaju Kang, Yuhang Pan, Erting Pan, Qunchao Jin, Juntao Jiang, Zeyu Zhang, Zhichen Liu, Luqi Gong
In Submission (CVPR 2025)
DiffuMural combines multi-scale convergence, collaborative diffusion, and cyclic loss, excelling in ancient mural restoration with unmatched detail, style coherence, and cultural preservation, surpassing SOTA methods across comprehensive metrics.
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang, Chaoqi Chen, Yuxuan Yuan, Zeyu Zhang, Yue Huang, Kun Zhang
In Submission (CVPR 2025)
Foundation models struggle with distribution shifts and weak supervision. We propose OCRT, a framework extracting high-level concepts and relations, enhancing SAM and CLIP generalizability in diverse tasks.
EPDD-YOLO: An Efficient Benchmark for Pavement Damage Detection Based on Mamba-YOLO
Shipeng Luo, Yuxin Zhang, Zeyu Zhang, Binhua Guo, Junbo Jacob Lian, Hui Jiang, Shun Zou, Wei Wang
In Submission (Measurement)
EPDD-YOLO enhances pavement damage detection with advanced augmentation, architectural improvements, and the EPDD dataset, achieving 0.873 precision and real-time inference at 198 FPS on challenging benchmarks.
SegKAN: High-Resolution Medical Image Segmentation with Long-Distance Dependencies
Shengbo Tan, Rundong Xue, Shipeng Luo, Zeyu Zhang, Xinran Wang, Lei Zhang, Daji Ergu, Zhang Yi, Yang Zhao, Ying Cai

Preprint
Hepatic vessel segmentation faces noise and fragmentation challenges. We propose SegKAN, enhancing image embedding and spatial-temporal relationships in Vision Transformers, achieving a 1.78% Dice score improvement on a hepatic vessel dataset.
Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Zirui Song, Guangxian Ouyang, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, Yujie Fu, Zeyu Zhang, Shiyu Jiang, Miao Fang, Ling Chen, Xiuying Chen

Preprint
Household robots struggle to detect hazards. We propose anomaly scenario generation using multi-agent brainstorming and 3D simulations, enhancing robotic skills in hazard detection, hygiene management, and child safety through diverse environments.
Medical AI for Early Detection of Lung Cancer: A Survey
Guohui Cai, Ying Cai, Zeyu Zhang, Yuanzhouhan Cao, Lin Wu, Daji Ergu, Zhibin Liao, Yang Zhao

Preprint
Deep learning has revolutionized pulmonary nodule analysis, surpassing traditional methods in detection, segmentation, and classification. This review highlights recent advancements, including CNNs, RNNs, GANs, and ensemble models, improving lung cancer diagnosis.
MSDet: Receptive Field Enhanced Multiscale Detection for Tiny Pulmonary Nodule
Guohui Cai, Ying Cai, Zeyu Zhang, Daji Ergu, Yuanzhouhan Cao, Binbin Hu, Zhibin Liao, Yang Zhao

Preprint
Pulmonary nodules are vital for early lung cancer diagnosis but challenging to detect. Our proposed MSDet network, with ERD, PCAM, and TODB strategies, achieves state-of-the-art results on LUNA16, improving mAP by 8.8%.
ESA: Annotation-Efficient Active Learning for Semantic Segmentation
Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao

Preprint
Active learning improves annotation efficiency by selecting the most informative samples for labeling. We propose Entity-Superpixel Annotation (ESA), an efficient strategy using a mask proposal network and superpixel grouping. Our method reduces click cost by 98% and boosts performance by 1.71%, outperforming pixel-based methods with only 40 clicks per image.
SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation
Shengbo Tan, Zeyu Zhang, Ying Cai, Daji Ergu, Lin Wu, Binbin Hu, Pengzhang Yu, Yang Zhao

Preprint
Medical imaging segmentation, crucial for lesion analysis, has seen advances with transformers in 3D segmentation. Despite their scalability, transformers struggle with local features and complexity. We propose SegStitch, combining transformers with denoising ODE blocks, improving mDSC by up to 11.48% and reducing parameters by 36.7%, promising real-world clinical adaptation.
Sine Activated Low-Rank Matrices for Parameter Efficient Learning
Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey

Preprint
We propose a novel theoretical framework integrating a sinusoidal function into low-rank decomposition, enhancing parameter efficiency and model accuracy across diverse neural network applications such as Vision Transformers, Large Language Models, Neural Radiance Fields, and 3D shape modeling.
XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu

Preprint
Vision-and-language pretraining (VLP) in the medical field faces challenges with reconstructing pathological features due to data scarcity and limited use of paired/unpaired data. This paper proposes XLIP, using AttMIM and EntMLM modules, to enhance feature learning from unpaired data, achieving state-of-the-art results in medical classification tasks.
A Landmark-Based Approach for Instability Prediction in Distal Radius Fractures
Yang Zhao, Zhibin Liao, Yunxiang Liu, Koen Oude Nijhuis, Britt Barvelink, Jasper Prijs, Joost Colaris, Mathieu Wijffels, Max Reijman, Zeyu Zhang, Minh-Son To, Ruurd Jaarsma, Job Doornberg, Johan Verjans

ISBI 2024
Distal radius fractures (DRFs) are common and their instability assessment is crucial for treatment decisions, affecting recovery and costs. We propose a deep learning-based landmark detection method using anatomical landmarks from X-rays to measure distances and angles. These features are used in an XGBoost model for DRF instability classification, validated on a large Dutch dataset.
Can Rotational Thromboelastometry Rapidly Identify Theragnostic Targets in Isolated Traumatic Brain Injury?
Abhiram Hiwase, Christopher Ovenden, Lola Kaukas, Mark Finnis, Zeyu Zhang, Stephanie O'Connor, Ngee Foo, Benjamin Reddi, Adam Wells, Daniel Ellis

EMA 2024
This study evaluates the prognostic utility of ROTEM sigma in isolated traumatic brain injury (TBI). ROTEM sigma, a point-of-care assay, demonstrated faster turnaround times and comparable accuracy to standard coagulation tests in predicting head injury-related deaths. The findings suggest ROTEM sigma effectively detects coagulopathy in isolated TBI cases.
BHSD: A 3D Multi-class Brain Hemorrhage Segmentation Dataset
Biao Wu, Yutong Xie, Zeyu Zhang, Jinchao Ge, Kaspar Yaxley, Suzan Bahadir, Qi Wu, Yifan Liu, Minh-Son To

MLMI 2023
The Brain Hemorrhage Segmentation Dataset (BHSD) is a comprehensive 3D multi-class ICH dataset with pixel-level and slice-level annotations designed to support supervised and semi-supervised ICH segmentation tasks, addressing the lack of existing public datasets for multi-class ICH segmentation.

Research Experience

Visiting Student Researcher
MIT CSAIL
Jun 2024 - Present
Worked on physically compatible 3D generation in the MIT CSAIL HCIE group, working with Assoc. Prof. Stefanie Mueller (MIT CSAIL) and Mr. Faraz Faruqi (MIT CSAIL).
Research Intern
Giga AI
Dec 2024 - Present
3D generation, spatial intelligence, and world models, working with Dr. Zheng Zhu (GigaAI).
Research Intern
Alibaba DAMO Academy
Oct 2024 - Present
Efficient long video generation, working with Mr. Jiasheng Tang (DAMO) and Prof. Bohan Zhuang (ZJU, DAMO).
Research Assistant
Zhejiang University
Aug 2024 - Present
Worked on efficient generative models, working with Prof. Bohan Zhuang (ZJU).
Visiting Student Researcher
Peking University
Jul 2024 - Present
Worked on 3D human motion generation, working with Asst. Prof. Hao Tang (PKU).
Research Intern
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
May 2024 - Jun 2024
Worked on unsupervised classification of cellular structures based on cryo-electron tomography (cryo-ET), working with Assoc. Prof. Min Xu (CMU, MBZUAI) and Prof. Ian Reid (MBZUAI, AIML).
Research Assistant
La Trobe University
Apr 2024 - Present
Worked on 3D generation and AI for Health, working with Dr. Yang Zhao (La Trobe University).
Research Assistant
Monash University
Feb 2024 - May 2024
Worked on 3D/4D generative learning, specifically focusing on text-guided human motion and avatar generation, working with Prof. Reza Haffari (Monash University), and Prof. Bohan Zhuang (ZJU, Monash University).
Research Intern
National Computational Infrastructure (NCI)
Feb 2023 - Jun 2023
Worked on long-tailed, large-scale multi-label text classification, working with Dr. Jingbo Wang (NCI).
Visiting Student Researcher
Australian Institute for Machine Learning (AIML)
Nov 2022 - Jan 2024
Worked on 3D medical imaging analysis, with a particular focus on semantic segmentation of tumors, hemorrhages, and organs at risk, working with Prof. Ian Reid (MBZUAI, AIML), Dr. Bowen Zhang (AIML), Dr. Yutong Xie (AIML), and Dr. Qi Chen (AIML).
Research Assistant
Flinders Health and Medical Research Institute (FHMRI)
Nov 2022 - Present
Worked on 3D medical imaging analysis, particularly in the realms of 2D and 3D medical representation learning and explainable AI, working with Dr. Minh-Son To (FHMRI).
Student Researcher
The Australian National University (ANU)
Jul 2022 - Nov 2022
Worked on deep learning for diabetes diagnosis, working with Dr. Md Zakir Hossain (ANU, Curtin University, CSIRO Data61), Dr. Khandaker Asif Ahmed (CSIRO), Mr. Md Rakibul Hasan (Curtin University, Brac University), and Prof. Tom Gedeon (Curtin University, ANU, Óbuda University).

Education

Bachelor of Science (Advanced) (Honours)
The Australian National University (ANU)
Jul 2021 - Jun 2025 (Expected)
Major: Computer Science, Minor: Mathematics
Visiting Student
Imperial College London
Jul 2022
Quantitative Sciences Research Institute (QSRI)
Visiting Student
University College London (UCL)
Jul 2022
Visiting Student
Shanghai Jiao Tong University (SJTU)
Dec 2021 - Jan 2022

Honors & Awards

NRF Vacation Scholarship, NeuroSurgical Research Foundation, Oct 2023.
Flinders Summer Research Scholarship, Flinders University CMPH, Nov 2022.
UNSW Science Vacation Research Scholarship, UNSW Sydney, Oct 2022.

Academic Services

Conference: CVPR 2025, ICLR 2025, IJCAI 2025, CHI 2025, VR 2025, ICME 2025, ICASSP 2025, IJCNN 2025, ISBI 2025, MIUA 2024, BIBM 2024.
Journal: EBM, ACO, MCET, PSEN.

Talks

(07/22/2024) Motion Mamba: Efficient and Long Sequence Motion Generation @ miHoYo, Shanghai. You can find our slides here.