Scene Graph for Active Exploration in Cluttered Scenario
Yuhong Deng
∗
@mails.tsinghua.edu.cn
Tsinghua University
Haidian, Beijing, China
Qie Sima
∗
smq20@mails.tsinghua.edu.cn
Tsinghua University
Haidian, Beijing, China
Huaping Liu
2
hpliu@mail.tsinghua.edu.cn
Tsinghua University
Haidian, Beijing, China
ABSTRACT
Robotic question answering is a representative human-robot interaction task, where the robot must respond to human questions. Among these tasks, Manipulation Question Answering (MQA) requires an embodied robot to actively explore its environment and to semantically understand both vision and language. MQA tasks typically face two main challenges: semantic understanding of cluttered scenes and manipulation planning based on the semantics hidden in vision and language. To address these challenges, we first introduce a dynamic scene graph to represent the spatial relationships between objects in cluttered scenarios. Then, we propose a GRU-based structure to tackle the sequence-to-sequence task of manipulation planning. At each timestep, the scene graph is updated after the robot actively explores the scenario. After thoroughly exploring the scenario, the robot can output a correct answer by searching the final scene graph. Extensive experiments on tasks with different interaction requirements demonstrate that our proposed framework is effective for MQA tasks. Experimental results also show that our dynamic scene graph represents semantics in clutter effectively and that the GRU-based structure performs well in manipulation planning.
CCS CONCEPTS
• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability.
KEYWORDS
Embodied task, Active exploration, Scene graph
ACM Reference Format:
Yuhong Deng, Qie Sima, and Huaping Liu. 2022. Scene Graph for Active Exploration in Cluttered Scenario. In Woodstock '18: ACM Symposium on Neural Gaze Detection, June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 9 pages. https://doi.org/XXXXXXX.XXXXXXX
Figure 1: After receiving the question, the robot understands the semantic information hidden in the clutter via the initial scene graph. A manipulation sequence is then used to actively explore the clutter and obtain the final scene graph, which contains sufficient information for answering the given question. The robot can then output the answer by searching the final graph.
1 INTRODUCTION
People have long anticipated that one day embodied robots could directly receive human questions posed in natural language and actively interact with the real environment to respond and give an answer [16], which reflects intelligent interaction between robot, human, and environment [17]. Recently, active manipulation has been widely used in embodied robot tasks to enable the agent to retrieve more information from the environment. Manipulation question answering (MQA) is a newly proposed human-robot interaction task in which the robot must perform manipulation actions to actively explore the environment to answer a given question [7]. However, existing methods for the MQA task only focus on one specific task related to counting questions [14, 22]. Considering that the forms of human-computer interaction should be varied, we extend the
∗
Both authors contributed equally to this research.
2
Corresponding author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference acronym 'XX, June 03–05, 2018, Woodstock, NY
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX
arXiv:2207.07870v2 [cs.RO] 22 Feb 2023