Abstract

Energy-based predictive world models provide a powerful approach to multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, with quality degrading rapidly over extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN show that GeoWorld improves success rate (SR) by about 3% in 3-step planning and about 2% in 4-step planning over the state-of-the-art V-JEPA 2.

Energy-Based Planning

Energy-based planning by GeoWorld. The diagram shows a Replace Memory Chip task from the COIN dataset, where GeoWorld plans actions by following geodesics over a hyperbolic energy landscape rather than generating pixels.
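
To make the idea of an energy defined over a hyperbolic latent space concrete, the following is a minimal sketch rather than GeoWorld's actual implementation. It assumes the unit Poincaré ball (curvature -1) as the hyperbolic model, the exponential map at the origin as the Euclidean-to-hyperbolic projection, and the geodesic distance between a predicted latent and a goal latent as the energy. The function names `exp_map_origin`, `poincare_distance`, and `energy` are hypothetical and do not come from the paper.

```python
import math

MAX_NORM = 1.0 - 1e-5  # keep points strictly inside the unit ball


def exp_map_origin(v):
    """Map a Euclidean (tangent) vector onto the unit Poincare ball.

    Assumption: curvature -1, so exp_0(v) = tanh(||v||) * v / ||v||.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    radius = min(math.tanh(norm), MAX_NORM)
    return [radius * x / norm for x in v]


def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - sum(a * a for a in x)) * (1.0 - sum(b * b for b in y))
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def energy(predicted, goal):
    """Hypothetical energy: geodesic distance from predicted to goal latent."""
    return poincare_distance(predicted, goal)


# Usage: project two Euclidean embeddings and compare them on the ball.
z_pred = exp_map_origin([0.3, 0.4])
z_goal = exp_map_origin([0.6, 0.8])
print(energy(z_pred, z_goal))
```

Under these assumptions, distances grow rapidly near the boundary of the ball, which is the property usually cited as making hyperbolic space a natural fit for hierarchical structure.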


Architecture

Overview of GeoWorld. Our geometric world model integrates Hyperbolic JEPA for geometry-preserving latent dynamics and Geometric Reinforcement Learning for geodesic-consistent multi-step refinement. Combined with energy-based planning via the cross-entropy method (CEM), GeoWorld enables stable, geometry-aware long-horizon visual planning.
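
As an illustration of how CEM can minimize an energy over candidate action sequences, here is a generic sketch of the cross-entropy method. It is not GeoWorld's planner: the hyperparameters, the flat action encoding, and the `toy_energy` function (a squared Euclidean distance standing in for a learned latent energy) are all assumptions made for this example, and `cem_plan` is a hypothetical name.

```python
import random
import statistics


def cem_plan(energy_fn, horizon, action_dim,
             iterations=20, population=64, n_elite=8, seed=0):
    """Return a flat action sequence that approximately minimizes energy_fn."""
    rng = random.Random(seed)
    n = horizon * action_dim
    mean = [0.0] * n
    std = [1.0] * n
    for _ in range(iterations):
        # Sample candidate plans from the current diagonal Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the lowest-energy candidates as elites.
        samples.sort(key=energy_fn)
        elites = samples[:n_elite]
        # Refit the Gaussian to the elites, one dimension at a time.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elites)]
    return mean


GOAL = (1.0, -0.5)  # hypothetical 2-D goal state


def toy_energy(plan):
    """Stand-in energy: the state is the sum of 2-D actions, compared to GOAL."""
    x = sum(plan[0::2])
    y = sum(plan[1::2])
    return (x - GOAL[0]) ** 2 + (y - GOAL[1]) ** 2


plan = cem_plan(toy_energy, horizon=3, action_dim=2)
print(toy_energy(plan))
```

In a hyperbolic setting, `toy_energy` would be replaced by an energy computed from the world model's predicted latent and the goal latent. The sampling and elite-refitting loop itself does not depend on the geometry of the latent space.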


BibTeX

@article{zhang2026geoworld,
  title={GeoWorld: Geometric World Models},
  author={Zhang, Zeyu and Li, Danning and Reid, Ian and Hartley, Richard},
  journal={arXiv preprint arXiv:2602.23058},
  year={2026}
}