Recomputing Solutions to Perturbed Multi-Commodity Pickup and Delivery Vehicle Routing Problems using Monte Carlo Tree Search

Mithun Goutham and Stephanie Stockar

Abstract—The Multi-Commodity Pickup and Delivery Vehicle Routing Problem aims to optimize the pickup and delivery of multiple unique commodities using a fleet of several agents with limited payload capacities. This paper addresses the challenge of quickly recomputing the solution to this NP-hard problem when there are unexpected perturbations to the nominal task definitions, as are likely to occur under real-world operating conditions. The proposed method first decomposes the nominal problem by constructing a search tree using Monte Carlo Tree Search for task assignment, and uses a rapid heuristic for routing each agent. When changes to the problem are revealed, the nominal search tree is rapidly updated with new costs under the updated problem parameters, generating solutions more quickly and with a reduced optimality gap compared to recomputing the solution as an entirely new problem. Computational experiments that vary the locations of the nominal problem and the payload capacity of an agent demonstrate the effectiveness of reusing the nominal search tree to handle perturbations in real time.

I. INTRODUCTION

The Multi-Commodity Pickup and Delivery Traveling Salesman Problem (m-PDTSP) involves finding the shortest possible route that transports multiple unique commodities, each of which must be picked up from and delivered to different locations by a single material handling agent [1]. The Multi-Commodity Pickup and Delivery Vehicle Routing Problem (m-PDVRP) extends this problem to a fleet of several agents with defined payload capacities. The objective is to complete all the material handling requirements while minimizing the total distance traveled by all agents and ensuring capacity constraints are met.
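As a concrete illustration of this problem definition, the sketch below encodes a toy m-PDVRP instance and its total-distance objective in Python. The class names, field names, and cost model are illustrative assumptions for exposition only, not notation from this paper.

```python
from dataclasses import dataclass
from math import dist

# Illustrative m-PDVRP instance; names are assumptions, not the paper's notation.
@dataclass
class Commodity:
    pickup: tuple      # (x, y) pickup location
    delivery: tuple    # (x, y) delivery location
    weight: float      # payload consumed while the commodity is carried

@dataclass
class Agent:
    depot: tuple       # starting location of the agent
    capacity: float    # payload capacity limit

def route_length(agent, route):
    """Total Euclidean distance for one agent visiting `route`
    (a list of locations), starting from its depot."""
    stops = [agent.depot] + route
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))

def fleet_cost(agents, routes):
    """Objective: total distance traveled by all agents."""
    return sum(route_length(ag, r) for ag, r in zip(agents, routes))

# One agent, one commodity: depot -> pickup -> delivery.
c = Commodity(pickup=(0.0, 3.0), delivery=(4.0, 3.0), weight=1.0)
a = Agent(depot=(0.0, 0.0), capacity=2.0)
print(fleet_cost([a], [[c.pickup, c.delivery]]))  # 3.0 + 4.0 = 7.0
```

A full solver would additionally enforce that each pickup precedes its delivery and that the summed weight carried at any point on a route never exceeds the agent's capacity; those constraints are omitted here for brevity.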
The resulting fleet policy involves both the distribution of tasks among the agents and the selection of routes for each agent. This problem has significant implications in transportation, logistics, and supply chain management [2], [3].

Solving the m-PDVRP is challenging, since it is a combinatorial optimization problem that extends the NP-hard m-PDTSP [4], [5]. No known algorithm can optimally solve the problem in polynomial time, because the number of possible task assignments and routes grows exponentially with the number of agents, locations, and commodities. Consequently, exact algorithms are typically limited to small problem sizes [6], and metaheuristic algorithms such as genetic algorithms, ant colony optimization, and simulated annealing are used to find solutions within a reasonable computation time, although without any guarantees of optimality [7]–[9].

This work was not supported by any organization. Mithun Goutham and Stephanie Stockar are with the Department of Mechanical and Aerospace Engineering, The Ohio State University, Columbus, OH 43210, USA. goutham.1@osu.edu

This paper examines the scenario where a fleet is actively executing nominally assigned tasks based on a previously computed fleet policy when it experiences a perturbation in task definitions or environment due to uncertainties in real-world deployment. In the context of combinatorial optimization, these small changes or disturbances in the problem definition can drastically affect the optimal solution or solution quality. These perturbations may arise from various factors such as unpredictable demand, unavailability of routes due to traffic congestion or accidents, malfunctioning agents, or adverse weather conditions.
In such cases, the nominal fleet policy of tasks and routes assigned to each agent may no longer be feasible or optimal, and it is crucial for the online fleet manager to adjust the policy immediately to ensure continuity of operations. The objective of policy adjustment is to reassign tasks and reroute agents in real time to ensure feasible operation and reduce the optimality gap in the new problem.

Due to the NP-hard complexity of the problem, this is typically achieved through a decentralized approach that considers only affected agents, reducing problem size and computation time [10], [11]. Although this shortens the recovery time following a perturbation, it typically generates suboptimal policies. In contrast, centralized fleet control can access global information and leverage the resilience of the entire fleet to attain an optimal or near-optimal policy. However, exact methods for centralized control are often computationally intractable, and metaheuristic algorithms do not provide guarantees regarding the quality of the solution produced. Another approach involves generating a-priori datasets that map expert-identified problem perturbations to pre-computed solutions; a supervised machine learning framework is then used to find a policy update that best approximates a mapping in the dataset [12], [13]. However, this approach produces policies that are biased toward the available mappings.

A gap in the centralized fleet management literature is the under-utilization of prior knowledge of the nominal search space when recomputing the policy for the perturbed problem. While some algorithms include the nominal policy in the initial population of evolutionary algorithms as a warm start, they do not fully harness knowledge of the larger search space [14]. This is due to the intractable memory requirements of keeping track of the extremely large search space of decisions related to both tasks and routing [15], [16].
As a result, current approaches solve an entirely new problem each time a perturbation occurs.

arXiv:2304.11444v1 [eess.SY] 22 Apr 2023
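The alternative summarized in the abstract, refreshing the nominal search tree's costs under the perturbed parameters instead of rebuilding the tree from scratch, can be sketched conceptually as follows. The node layout and cost model here are hypothetical simplifications, not the paper's actual data structures.

```python
from dataclasses import dataclass, field

# Conceptual sketch of reusing a nominal search tree after a perturbation.
# Node layout and cost model are illustrative assumptions.
@dataclass
class Node:
    assignment: tuple            # partial task-to-agent assignment at this node
    cost: float = 0.0            # routing cost under current problem parameters
    children: list = field(default_factory=list)

def reevaluate(node, cost_fn):
    """Walk the nominal tree and refresh each node's cost under the
    perturbed parameters, rather than rebuilding the tree."""
    node.cost = cost_fn(node.assignment)
    for child in node.children:
        reevaluate(child, cost_fn)

def best_leaf(node):
    """Return the minimum-cost leaf (a complete assignment) in the tree."""
    if not node.children:
        return node
    return min((best_leaf(c) for c in node.children), key=lambda n: n.cost)

# Toy example: two candidate assignments; the perturbation changes which wins.
root = Node(assignment=())
root.children = [Node(("task1->A",)), Node(("task1->B",))]

nominal = {(): 0.0, ("task1->A",): 5.0, ("task1->B",): 7.0}
perturbed = {(): 0.0, ("task1->A",): 9.0, ("task1->B",): 6.0}

reevaluate(root, nominal.get)
print(best_leaf(root).assignment)    # cheaper nominally: task1->A
reevaluate(root, perturbed.get)
print(best_leaf(root).assignment)    # after the perturbation: task1->B
```

The point of the sketch is that the tree's structure (the enumerated assignments) is preserved across the perturbation; only the cost annotations change, which is what allows the updated solution to be recovered without restarting the search.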