Robot Planning and Reasoning with LLMs and Concept Grounding

Ameya Dhamanaskar, Arizona State University, adhamana@asu.edu, 1222318825
Venkatesh Gunda, Arizona State University, vgunda2@asu.edu, 1220102819
Anil B Murthy, Arizona State University, abmurthy@asu.edu, 1221504856

Abstract—This article presents a concise overview of our research on robot planning and reasoning with LLMs and concept grounding. We discuss the motivation behind our research and outline the methodology employed in our approach. We then describe the general idea and the approach we take to realize it, and provide experimental evidence to substantiate its effectiveness. Overall, our project aims to show that supplementing LLMs with concept grounding and few-shot example tasks in prompts offers significant improvements over existing results.

I. INTRODUCTION

The rapid progress of large language models (LLMs) built on the Transformer architecture [1], [2], [3], [4] has revolutionized the field of natural language processing, enabling systems to generate complex text, answer questions, and engage in dialogue on a wide range of topics. However, grounding LLMs' outputs in the physical world and applying their knowledge to real-world tasks remains a major obstacle [8]. This challenge is particularly acute for robots, which must navigate complex, dynamic environments and interact with objects in a physically meaningful way. In this context, the ability of LLMs to interpret natural language commands and provide guidance for task execution is highly desirable. However, relying solely on LLM-generated instructions can be problematic, as they may lack the context and grounding necessary for effective and relevant task execution.
Therefore, we present a new approach that combines the strengths of LLMs with concept grounding, allowing for a better understanding of the environment and leveraging the robot's capabilities to enable more effective task execution.

II. PROBLEM OVERVIEW

Our objective is to generate executable plans for our robots by leveraging the capabilities of LLMs when presented with specific tasks [5], [6], [7]. For this to succeed, however, the LLMs must understand the constraints and limitations of robotic tasks. This motivates concept grounding, wherein we provide the LLMs with relevant concepts for the tasks in symbolic terms. By incorporating this additional context into the prompts, we aim to enhance the LLMs' ability to generate plans that align with our robots' capabilities and achieve successful outcomes. Our approach is inspired by the "SayCan" framework [9], which leverages recent advances in reinforcement learning and imitation learning to enable a robot to learn from both natural language commands and demonstrations of desired behaviors. We demonstrate the effectiveness of our approach through a series of experiments, such as vertical stacking, horizontal placement, and digit formation, in which the robot must execute tasks of varying complexity that inherently demand logical reasoning. Overall, our project offers a new perspective on tackling the lack of context in current LLM outputs by adding concept grounding to the prompts, and proposes a promising approach to enabling effective interaction between robots and their environments.

III. APPROACH

SayCan [9] showed the utility of using an LLM as a high-level planner that provides reasonably relevant plans for a low-level planner or robot to execute. The SayCan algorithm [9] combines LLM-generated option scoring with a robotic affordance function to score relevant options higher and prune out irrelevant options for a given task.
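As a concrete illustration, the SayCan-style combination of LLM relevance and affordance can be sketched as follows. This is a minimal sketch, not SayCan's actual implementation: the option strings and the numeric scores below are hypothetical stand-ins, whereas the real system derives LLM scores from token log-likelihoods and affordance scores from a learned value function over robot state.

```python
# Minimal sketch of SayCan-style option selection: each candidate
# option is scored by LLM relevance times affordance, and the
# highest-scoring option is executed next.

def saycan_select(options, llm_score, affordance_score):
    """Return the option maximizing llm_score(o) * affordance_score(o)."""
    combined = {o: llm_score(o) * affordance_score(o) for o in options}
    return max(combined, key=combined.get)

# Toy stand-ins for the two scoring functions (hypothetical values).
llm = {"pick up the red block": 0.7,   # P(option | instruction) from the LLM
       "pick up the sponge": 0.2,
       "go to the table": 0.1}
aff = {"pick up the red block": 0.1,   # P(success | state): block out of reach
       "pick up the sponge": 0.9,
       "go to the table": 0.8}

best = saycan_select(list(llm), llm.get, aff.get)
```

Even though the LLM alone would prefer "pick up the red block", the low affordance score vetoes it, and the combined score selects the feasible "pick up the sponge" instead, which is exactly the pruning behavior described above.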
However, this approach is "token-intensive" and hence extremely expensive, owing to the generation and scoring of options by an LLM such as GPT-4 at each step for all tasks, in addition to the few-shot examples and query-context text. Thus, in our experiments, we omitted option scoring as well as the affordance function, since our experiments involve neither robot demonstrations in the wild nor generic queries [9] for which the LLM must choose among multiple plausible affordance options.

Fig. 1. Overall Architectural Pipeline Utilized

So, for our experiments, we utilized task-specific concept grounding with the context of few-shot examples of successful