Developing Metrics and Evaluation Methods for Assessing AI-Enabled Robots in Manufacturing

Adam Norton, Amy Saretsky, and Holly Yanco
New England Robotics Validation and Experimentation (NERVE) Center
University of Massachusetts Lowell
110 Canal Street, Lowell, Massachusetts 01852
Corresponding author e-mail: adam norton@uml.edu

Abstract

Evaluating the capabilities of a robotic system for manufacturing can include metrics related to performance, efficiency, and productivity. Measures for traditional industrial automation typically address operations that rely on strict repetition and allow for little variation. The inclusion of artificial intelligence (AI) in robotic systems can allow for greater aptitude in maintaining capability in the presence of variation, such as local changes in environmental characteristics or global changes in task execution parameters. New evaluation methods and metrics are needed to allow these advanced capabilities to be appropriately measured. This paper discusses evaluating the robustness, adaptability, generalizability, and versatility of AI-enabled robotic manufacturing systems. The considerations for conducting evaluations of these capabilities are reviewed, including implications for robots that learn and those that are designed to be explainable. Recommendations are made for advancing the development of metrics and evaluation methods that highlight the capabilities afforded by AI. A prototype framework is presented to guide the design of evaluations and the classification of metrics.

Introduction

Traditional robot automation in manufacturing performs the same task over and over, allowing for highly repeatable metrics related to performance, efficiency, and productivity. Such systems may not be robust in the presence of variation (e.g., a target object is not in the exact place it is expected to be) and may not be reconfigurable for other tasks.
The advent of robots with artificial intelligence (AI), or AI-enabled robots, in manufacturing enables agile and flexible solutions that can adapt to variation or uncertainty (ElMaraghy 2005; Browne et al. 1984). Variation can appear in many forms, including fluctuations in environmental or task characteristics, which can be expected and trained for or unexpected and must be acclimated to.

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Metrics and evaluation methods are needed to measure and express the capabilities of advanced robotics in manufacturing and to induce variation that appropriately demonstrates those capabilities. Robots in this domain may also be outfitted with learning capabilities to improve their performance and may be tasked with explaining their behavior. Both capabilities require special attention when designing evaluations. Prior work in test and evaluation can be leveraged from relevant domains, including those for industrial manipulators, autonomous industrial vehicles, human-robot interaction (HRI), and machine learning.

This paper presents some of the considerations for developing metrics and evaluation methods for measuring robot capabilities that are enabled by AI, particularly for robots that operate in the presence of variation. These variations aid in defining the context of a manufacturing operation/evaluation and can include changes to:

- the input data provided to the robot to perform its task,
- the target objects being interacted with,
- the tasks being performed with those objects,
- the environment where the tasks are being executed, and
- the robot platform executing the tasks.

These variances must be properly characterized so that they accurately represent the context in which a robot will operate (Norton, Messina, and Yanco 2020, in press).
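To make the characterization of an evaluation context concrete, the variation dimensions above can be recorded in a simple data structure so that two test scenarios can be compared dimension by dimension. The following Python sketch is purely illustrative; the class and field names are our own, not part of the paper or any standard framework.

```python
from dataclasses import dataclass, field

# Illustrative sketch: recording the five variation dimensions of a
# manufacturing evaluation context (input data, target objects, task,
# environment, robot platform). All names here are hypothetical.
@dataclass
class EvaluationContext:
    input_data: str                 # e.g., "CAD model", "point cloud"
    target_objects: list = field(default_factory=list)  # objects interacted with
    task: str = ""                  # task performed with those objects
    environment: str = ""           # where the task is executed
    platform: str = ""              # robot executing the task

    def varied_from(self, baseline):
        """Return the names of dimensions that differ from a baseline context."""
        dims = ("input_data", "target_objects", "task", "environment", "platform")
        return [d for d in dims if getattr(self, d) != getattr(baseline, d)]


baseline = EvaluationContext("CAD model", ["bolt"], "pick-and-place", "cell A", "arm-1")
varied = EvaluationContext("CAD model", ["bolt"], "pick-and-place", "cell B", "arm-1")
print(varied.varied_from(baseline))  # only the environment dimension differs
```

Explicitly enumerating which dimensions were varied, and which were held fixed, is one way to document the context a result was obtained in, supporting the characterization requirement discussed above.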
This process is paramount to eliciting results that are potentially generalizable to other, similar scenarios (Amigoni, Luperto, and Schiaffonati 2017), rather than abstract test cases. To do so, the parameters must be selected, measured, and induced as part of an evaluation.

This paper is primarily concerned with outcomes-based measures (i.e., those that are more directly observable) rather than internal assessments of AI. Considerations for developing metrics and evaluation methods are discussed, including implications if the robot possesses learning capabilities or is designed to be explainable. Several recommendations are made, followed by a proposed framework for guiding the design of evaluations and the classification of metrics.

Related Metrics and Evaluation Methods

Performance evaluation is critical to many robotics domains, including mobile vehicle navigation, manipulation, and human-robot interaction. Some of the metrics and evaluation methods used in these domains are applicable to AI-enabled robots in manufacturing.
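As a concrete illustration of an outcomes-based measure, one simple option is to compare task success rates between nominal trials and trials with induced variation; the retained fraction of baseline performance then serves as a rough robustness indicator. This metric definition and the trial data are hypothetical examples of ours, not metrics prescribed by the paper.

```python
# Illustrative sketch of an outcomes-based robustness measure: the task
# success rate retained when variation is induced, relative to a nominal
# baseline. Both functions and the example data are hypothetical.
def success_rate(trials):
    """Fraction of trials that succeeded (trials: list of booleans)."""
    return sum(trials) / len(trials)

def robustness(nominal_trials, varied_trials):
    """Performance retained under variation (1.0 = no degradation)."""
    base = success_rate(nominal_trials)
    return success_rate(varied_trials) / base if base > 0 else 0.0


nominal = [True] * 9 + [False]       # 90% success without induced variation
varied = [True] * 6 + [False] * 4    # 60% success with induced variation
print(robustness(nominal, varied))   # ~0.67 of baseline performance retained
```

Outcome counts like these are directly observable from test trials, which is what distinguishes them from internal assessments of the AI's state or reasoning.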