MoMa-Kitchen

A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation



Motivation

Conventional navigation methods typically prioritize reaching a target location but do not account for constraints that affect manipulation feasibility. Left: Position A prioritizes proximity but is obstructed by chairs, preventing stable execution. Middle: Position B places the robot in a spacious, stable area for operation, but the target is beyond its effective reach. Right: Our approach, leveraging navigation affordance grounding, identifies Position C as the optimal stance, ensuring both reachability and task feasibility.




Project Overview

We present MoMa-Kitchen, a benchmark dataset of over 100K auto-generated samples that pair affordance-grounded manipulation positions with egocentric RGB-D observations, and propose NavAff, a lightweight model that learns where navigation should terminate so that manipulation can begin seamlessly. Our approach generalizes across diverse robotic platforms and arm configurations, addressing the critical gap between navigation proximity and manipulation readiness in mobile manipulation.



Contributions



Dataset Generation

MoMa-Kitchen generates a floor affordance map that identifies feasible navigation positions from which manipulation can succeed in cluttered environments. The pipeline integrates RGB-D observations with robot-specific parameters to label navigation affordances across diverse kitchen scenes, collecting first-person-view data and ground-truth labels with mobile manipulators.
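
To make the labeling step concrete, below is a minimal sketch (in Python with NumPy) of how sampled base positions and their manipulation outcomes could be rasterized into a floor affordance map. The function name, grid size, and metric extent are illustrative assumptions, not the dataset's actual pipeline or API.

import numpy as np

def label_floor_affordance(base_xy, success, map_size=64, extent=2.0):
    """Rasterize sampled base positions and manipulation outcomes into a floor map.

    base_xy : (N, 2) sampled base positions in metres, centred on the target object.
    success : (N,) booleans, True where the manipulation attempt succeeded.
    Returns a map_size x map_size success-rate map in [0, 1], NaN where unsampled.
    """
    counts = np.zeros((map_size, map_size))
    hits = np.zeros((map_size, map_size))
    # Map metric coordinates in [-extent, extent] to integer grid indices.
    idx = np.clip(((base_xy + extent) / (2 * extent) * map_size).astype(int),
                  0, map_size - 1)
    for (i, j), ok in zip(idx, success):
        counts[i, j] += 1
        hits[i, j] += float(ok)
    return np.where(counts > 0, hits / np.maximum(counts, 1), np.nan)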



Data Generation Demos

We collect floor-level navigation affordance data using mobile manipulators with various robotic arms. For each target object, we define a semicircular affordance sampling area and attempt manipulations at sampled positions, recording success or failure.  
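
A minimal sketch of the semicircular sampling is given below, assuming the accessible side of the object is specified as a facing angle; the radius range and function name are illustrative, not the benchmark's exact parameters.

import numpy as np

def sample_semicircle(target_xy, facing_dir, n=100, r_min=0.4, r_max=1.5, seed=0):
    """Sample candidate base positions on a semicircle in front of the target.

    target_xy  : (2,) target object position on the floor plane (metres).
    facing_dir : angle (rad) pointing from the object toward its accessible side.
    Each returned position would then be tried in simulation and labelled
    success or failure depending on whether the manipulation can be executed.
    """
    rng = np.random.default_rng(seed)
    theta = facing_dir + rng.uniform(-np.pi / 2, np.pi / 2, size=n)  # half disc
    r = rng.uniform(r_min, r_max, size=n)
    return np.stack([target_xy[0] + r * np.cos(theta),
                     target_xy[1] + r * np.sin(theta)], axis=1)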




Real World Demos

We validate our method in real-world experiments using a mobile manipulator equipped with a D435i camera. Object masks and depth images are obtained using Grounded-SAM and Depth Anything v2, respectively, to generate a global point cloud for affordance prediction. Results in kitchen scenarios show generalization from simulation to reality, demonstrating the robustness of our approach.  
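
As an illustration of the point-cloud step, the minimal sketch below back-projects masked depth pixels into camera-frame 3-D points with the pinhole model. The intrinsics and function name are assumptions for illustration (object masks would come from Grounded-SAM and depth from Depth Anything v2), not our exact real-world pipeline.

import numpy as np

def masked_depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels (metres) into camera-frame 3-D points."""
    v, u = np.nonzero(mask)            # pixel rows / columns inside the object mask
    z = depth[v, u]
    valid = z > 0                      # drop invalid depth readings
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) points in the camera frame

The resulting points could then be transformed into a global frame using the camera pose and passed to the affordance prediction model.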




Main Contributors

Pingrui Zhang

Ph.D. Candidate

Fudan University & Shanghai AI Lab

zhangpingrui@pjlab.org.cn

Dr. Yan Ding

Researcher at Shanghai AI Lab

yding25@binghamton.edu

BibTeX

@misc{zhang2025momakitchen100kbenchmarkaffordancegrounded,
  title={MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation},
  author={Pingrui Zhang and Xianqiang Gao and Yuhan Wu and Kehui Liu and Dong Wang and Zhigang Wang and Bin Zhao and Yan Ding and Xuelong Li},
  year={2025},
  eprint={2503.11081},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2503.11081},
}