MINSU (Mobile Inventory And Scanning Unit): Computer Vision and AI
Jihoon Ryoo, Byungkon Kang, Dongyeob Lee, Seunghyeon Kim, Youngho Kim

I. INTRODUCTION

Computer vision is an advanced computer science technique that has provided the basis for many new patents and APIs, which in turn support a multitude of applications. Computer vision encompasses all analyses, including machine learning, that operate on the pixels of images. Some manufacturing industries have recently found that they need personnel to monitor whether machines are working properly and whether adequate materials are entering the hopper. From an economic perspective, assigning workers to monitor resources manually can be an inefficient allocation of labor that lowers worker welfare. Furthermore, the inventory is 2 to 3 m tall, so personnel risk falling when they climb a ladder to inspect it. A fall from the ladder could cause an industrial accident and a significant loss of profit for the company.

The MINSU (Mobile Inventory and Scanning Unit) algorithm uses computer vision analysis to record the residual quantity, or fullness, of the cabinet. It does so in five steps: object detection, foreground subtraction, K-means clustering, percentage estimation, and counting. Object detection locates the cabinets in the input image and returns their coordinates. Foreground subtraction then removes the background so that the analysis focuses on the cabinet itself (some manual work may be needed, such as selecting parts that the GrabCut algorithm did not extract). K-means clustering reduces the multicolored image to an image with only three colors, which makes the analysis faster and more accurate. Finally, the image goes through percentage estimation and counting.
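The K-means clustering, percentage estimation, and counting steps listed above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it runs plain Lloyd's K-means with K = 3 on the cabinet pixels that remain after foreground subtraction (passed in as a boolean mask), treats the darkest of the three colors as the material, and converts the fill percentage into an item count with a linear mapping to a hypothetical full-cabinet capacity. The deterministic initialization, the darkest-cluster rule, and the names `kmeans`, `fill_percentage`, and `estimate_count` are all hypothetical.

```python
import numpy as np


def _assign(pixels, centers):
    """Index of the nearest centre for every pixel (squared Euclidean distance)."""
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.argmin(axis=1)


def kmeans(pixels, k=3, max_iter=50):
    """Lloyd's K-means on an (n, d) array of colours; returns (labels, centers)."""
    pixels = np.asarray(pixels, dtype=float)
    n = len(pixels)
    # Hypothetical deterministic initialisation: sort pixels by brightness and seed
    # each centre at the middle of one of k equal-size brightness slices.
    order = np.argsort(pixels.sum(axis=1), kind="stable")
    centers = pixels[order[((np.arange(k) + 0.5) * n / k).astype(int)]]
    for _ in range(max_iter):
        labels = _assign(pixels, centers)  # E-like step: hard assignment
        # M-like step: move each centre to the mean of its pixels (keep it if empty).
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(pixels, centers), centers


def fill_percentage(image, mask, k=3):
    """Percent of cabinet pixels (mask == True) in the darkest of k colour clusters."""
    pixels = image[mask].reshape(-1, image.shape[-1])
    labels, centers = kmeans(pixels, k)
    material = centers.sum(axis=1).argmin()  # assumption: darkest colour = material
    return 100.0 * float(np.mean(labels == material))


def estimate_count(percent, full_capacity):
    """Hypothetical linear mapping from fill percentage to number of items."""
    return round(percent / 100.0 * full_capacity)


if __name__ == "__main__":
    img = np.zeros((10, 10, 3))
    img[0:3] = 200.0   # empty shelf space (light)
    img[3:5] = 120.0   # cabinet frame (mid-grey)
    img[5:10] = 40.0   # stored material (dark)
    cabinet_mask = np.ones((10, 10), dtype=bool)  # stand-in for the foreground mask
    pct = fill_percentage(img, cabinet_mask)
    print(pct, estimate_count(pct, full_capacity=40))
```

Each K-means iteration costs O(nKd) for n pixels and d color channels, so with K, d, and the iteration count fixed the cost grows linearly in n. The broadcasted distance matrix keeps the sketch short but uses O(nK) memory; a production version would process large images in chunks.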
In the percentage estimation and counting steps, the fraction of the cabinet occupied by material is estimated as a percentage, which is then used to approximate the number of items inside. If the project succeeds, this form of residual quantity management could solve the problems described earlier in this introduction.

II. RELATED WORKS

A. K-means clustering

K-means clustering groups similar pixels and simplifies the image by replacing each group with a single color. Figure 1 shows how the groups of similar pixels are determined. The number of groups, K, is the number of colors used to represent the image. The algorithm is based on the expectation-maximization (EM) algorithm, which alternates between two steps, expectation and maximization, and repeats them until it converges. EM is an iterative method for estimating model parameters in problems that are difficult to solve directly. In K-means, the expectation step assigns each pixel to its nearest cluster center, and the maximization step moves each center to the mean of the pixels assigned to it. K-means clustering is used here because, for a fixed number of clusters and iterations, its computational cost grows linearly with the number of pixels, O(n). It can therefore run faster than many other clustering methods.

Fig. 1. Simple diagram of how deep learning technique works [6]

B. YOLOv3

Fig. 2. Simple diagram of how deep learning technique works [4]

Mobile inventory and scanning units are used to measure the degree of fullness of the inventory, and MINSU uses the YOLOv3 algorithm for this purpose. YOLOv3 is an object detection framework based on a convolutional neural network (CNN), a powerful class of deep learning models for image processing that can perform both generative and descriptive tasks. Among prominent object detection frameworks such as RetinaNet-101 and Fast R-CNN, YOLOv3 offers the best detection speed (frames per second and inference time) with competitive mean average precision (mAP). YOLOv3 draws bounding boxes based on the probability (confidence score) that its network assigns to each detected box.
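Selecting bounding boxes by their predicted probability can be illustrated with a short sketch. It assumes the network has already produced a list of (box, confidence) pairs with boxes given as corner coordinates (x1, y1, x2, y2); it keeps boxes above a confidence threshold and then applies greedy non-maximum suppression (NMS). NMS is a common post-processing step for YOLO-style detectors but is not described in this paper, and the function names `iou` and `select_boxes` and both threshold values are hypothetical choices for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_boxes(detections, conf_threshold=0.5, iou_threshold=0.45):
    """Keep confident boxes, then drop lower-scoring boxes that overlap a kept one.

    detections: list of (box, score) pairs; thresholds are hypothetical defaults.
    """
    # Discard low-probability boxes and visit the rest from most to least confident.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Greedy NMS: accept a box only if it does not overlap any accepted box too much.
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    dets = [
        ((0, 0, 10, 10), 0.9),
        ((1, 1, 10, 10), 0.8),    # heavy overlap with the first box
        ((20, 20, 30, 30), 0.7),  # a separate object
        ((50, 50, 60, 60), 0.3),  # below the confidence threshold
    ]
    print(select_boxes(dets))
```

Visiting boxes in descending confidence order means that, among overlapping candidates for the same object, the most probable box is the one that survives, while boxes for separate objects are kept independently.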
Unlike R-CNN, which evaluates its network on thousands of region proposals for a single image, YOLOv3 requires only one network pass for a
arXiv:2204.06681v1 [cs.CV] 14 Apr 2022