Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

 

1 Monash University 2 The Australian National University
3 Mohamed bin Zayed University of Artificial Intelligence 4 Carnegie Mellon University

 

*Equal contribution. Corresponding author.
Work done while a research assistant at Monash University.


Abstract

Human motion generation stands as a significant pursuit in generative computer vision, yet achieving efficient, long-sequence motion generation remains challenging. Recent advancements in state space models (SSMs), notably Mamba, have shown considerable promise for long-sequence modeling with an efficient hardware-aware design, making SSMs a promising foundation on which to build a motion generation model. Nevertheless, adapting SSMs to motion generation faces hurdles due to the lack of a specialized architecture designed for modeling motion sequences. To address these challenges, we propose Motion Mamba, a simple and efficient approach that presents the pioneering motion generation model utilizing SSMs. Specifically, we design a Hierarchical Temporal Mamba (HTM) block that processes temporal data by ensembling varying numbers of isolated SSM modules across a symmetric U-Net architecture, aiming to preserve motion consistency between frames. We also design a Bidirectional Spatial Mamba (BSM) block that processes latent poses bidirectionally to enhance accurate motion generation within a temporal frame. Our proposed method achieves up to a 50% FID improvement and up to 4 times faster inference on the HumanML3D and KIT-ML datasets compared with the previous best diffusion-based method, demonstrating strong capabilities in high-quality long-sequence motion modeling and real-time human motion generation.
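To make the bidirectional scan described above concrete, below is a minimal PyTorch-style sketch built on the publicly available mamba_ssm package. The module name, hyperparameters, and the concatenate-and-project fusion are illustrative assumptions rather than the exact design used in Motion Mamba.

# Minimal sketch of a bidirectional selective-scan block over latent poses.
# Assumes the mamba_ssm package; the fusion scheme and hyperparameters below
# are illustrative placeholders, not the paper's exact implementation.
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class BidirectionalMambaBlock(nn.Module):
    """Scan a latent sequence forward and backward, then fuse both directions."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.fwd_ssm = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.bwd_ssm = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, d_model) latent poses
        h = self.norm(x)
        out_fwd = self.fwd_ssm(h)                                # forward scan
        out_bwd = self.bwd_ssm(h.flip(dims=[1])).flip(dims=[1])  # backward scan
        return x + self.proj(torch.cat([out_fwd, out_bwd], dim=-1))


# Usage (the mamba_ssm kernels require a CUDA device):
block = BidirectionalMambaBlock(d_model=256).cuda()
latents = torch.randn(2, 196, 256, device="cuda")  # (batch, frames, latent dim)
out = block(latents)                               # (2, 196, 256)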

Visualization

 

 


We compare the proposed Motion Mamba with well-established state-of-the-art methods such as MotionDiffuse, MDM, and MLD. We present three distinct motion prompts and visualize the generated results as motion sequences. The results demonstrate the superior performance of our method compared with existing approaches.


 

 


We have included extra examples to showcase the proposed Motion Mamba model. These examples feature randomly selected prompts sourced from HumanML3D, providing additional visualizations of the model's capabilities.


 

Methodology

 

 


This figure illustrates the architecture of the proposed Motion Mamba model. Each encoder and decoder block consists of a Hierarchical Temporal Mamba (HTM) block and a Bidirectional Spatial Mamba (BSM) block, which perform hierarchical and bidirectional scans within their SSM layers, respectively. This symmetric distribution of scans ensures a balanced and coherent framework across the encoder-decoder architecture.
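As a complement to the figure, the following sketch illustrates the hierarchical-scan idea: a temporal block that averages several parallel Mamba scans over the frame axis, together with a hypothetical symmetric scan schedule mirrored between the encoder and decoder. The scan counts and the averaging fusion are assumptions for illustration, not the paper's exact configuration.

# Illustrative sketch of a hierarchical temporal block: K parallel selective
# (Mamba) scans over the temporal axis whose outputs are averaged, with K chosen
# per encoder/decoder level. The scan counts below are hypothetical.
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class HierarchicalTemporalBlock(nn.Module):
    def __init__(self, d_model: int = 256, num_scans: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.scans = nn.ModuleList(
            [Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2) for _ in range(num_scans)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_frames, d_model) latent motion sequence
        h = self.norm(x)
        out = torch.stack([scan(h) for scan in self.scans], dim=0).mean(dim=0)
        return x + out  # residual connection


# A hypothetical symmetric scan schedule across the U-Net: more parallel scans
# at the outer levels, mirrored between encoder and decoder.
encoder_scan_counts = [4, 2, 1]
decoder_scan_counts = encoder_scan_counts[::-1]
blocks = [HierarchicalTemporalBlock(256, k) for k in encoder_scan_counts + decoder_scan_counts]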


Performance

 

 


Motion Mamba achieves significantly superior performance in long-sequence modeling and motion generation efficiency compared with other well-designed state-of-the-art methods such as MLD [6], MotionDiffuse [53], and MDM [49].


BibTeX

@article{zhang2024motion,
  title={Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM},
  author={Zhang, Zeyu and Liu, Akide and Reid, Ian and Hartley, Richard and Zhuang, Bohan and Tang, Hao},
  journal={arXiv preprint arXiv:2403.07487},
  year={2024}
}