Robotics paper index

Learning Robot Visual Navigation in Crowds via Intention-Aware Scene Representations

2026-06-24 · arXiv: 2606.26047

One-line summary

A robotics research paper on Learning Robot Visual Navigation in Crowds via Intention-Aware Scene Representations.

Engineering notes

Engineering notes will be added by the Robot Papers editorial team.

Chinese explanation / 中文解读

中文解读待补充:本站会优先为 VLA、具身智能、人形机器人控制、机器人操作等高价值论文补充中文说明。

Original abstract

Robot crowd navigation requires the ability to infer human intentions while accounting for the structural constraints of the environment. Currently, deep reinforcement learning (DRL) provides a promising method for learning navigation policies that understand human intentions. However, most of them rely on limited scene representations, treating pedestrians as simple 2D points and ignoring rich visual cues from both humans and the environment. To address this issue, we introduce iCrowdNav, a novel visual crowd navigation method with intention-aware scene representations, to encode behavioral and structural context from egocentric visual observations. Our method employs two key components: a spatio-temporal encoder for extracting occupancy features of the scene, and Intent-Interact Former (I$^2$ Former), an attention-based module that encodes human poses to infer pedestrians' motion intentions. These features are integrated into a compact state embedding that supports effective DRL policy training. Extensive experiments show that our method achieves superior performance over baselines, and real-world deployment demonstrates vision-based crowd navigation.

5.0Engineering value
7.0Research novelty
4.0Business relevance

Links and sources

Need this topic turned into a technical roadmap?

Robot Papers can prepare a custom robotics literature review, code map, dataset map, and B2B technology assessment.

Request B2B research

Comments

No comments yet. Be the first to share your thoughts on this paper.
Login or register to leave a comment