(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 9, 2022
www.ijacsa.thesai.org

Performance Analysis of Deep Learning YOLO Models for South Asian Regional Vehicle Recognition

Minar Mahmud Rafi, Siddharth Chakma, Asif Mahmud, Raj Xavier Rozario, Rukon Uddin Munna, Md. Abrar Abedin Wohra, Rakibul Haque Joy, Khan Raqib Mahmud, Bijan Paul*
Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh

Abstract—For years, humans have pondered the possibility of combining human and machine intelligence. The purpose of this research is to recognize vehicles from media. While multiple models exist for this task, models that can detect vehicles commonly used in developing countries such as Bangladesh and India are scarce. Our focus was to assemble the largest dataset of vehicles exclusive to South Asia, in addition to the more common universal vehicles, and to apply it to track and recognize these vehicles, even in motion. To develop this, we increased the class variations and quantity of the data and used multiple variants of the YOLOv5 model. We trained different versions of the model on our dataset to compare how accurately each detects the more unique vehicles. If vehicle detection and tracking are adopted and implemented in live traffic camera feeds, the information can be used to create smart traffic systems that regulate congestion and routing by identifying and separating fast- and slow-moving vehicles on the road. The comparison between the three YOLOv5 models indicates that the large variant of the YOLOv5 architecture outperforms the rest.

Keywords—You Only Look Once (YOLOv5); vehicle detection; neural network; deep learning; vehicle tracking

I. INTRODUCTION

Advancements in automobile manufacturing have given rise to more affordable cars, which has resulted in over five million registered vehicles [1] on the roads of Bangladesh. The country's road infrastructure was not designed to hold the growing number of vehicles, which presents grave environmental and health concerns. Given the circumstances, congestion is inevitable, and it contributes significantly to the rising air and noise pollution levels in the city. To get around the congestion, restless drivers resort to maneuvering chaotically without regard for traffic rules and are thus responsible for most of the road fatalities in the country. One of the biggest obstacles is that traditional preventive measures such as traffic lights and pedestrian crossings are not sufficient because they are generally ignored. To address this issue, an intuitive system is needed to observe traffic patterns and direct different vehicles into proper lanes.

Most vehicles in South Asia differ drastically from those in the Western world in shape, size, and color. This is a major challenge for the algorithm [2], which needs to differentiate between these vehicles to identify them individually. The height and angle at which the vehicles are captured also factor into this problem. Datasets that include traditional South Asian vehicles are scarce and do not contain the required amount of data, which presents a separate challenge. Due to the erratic nature of traffic in South Asian countries, CNN models that are usually tested in other environments have not been applied often enough to show how they perform in the tumultuous streets of cities like Dhaka. A major challenge in our research has been data scarcity, particularly for South Asian vehicles.

Machine learning has progressed enough to make use of traffic cameras [3] to track vehicles and their patterns.
Additionally, neural network-based object detection can produce valuable tracking and surveillance data that could be essential to solving the traffic problem. Further applications, such as separating slow- and fast-moving vehicles and identifying missing vehicles, can also be pursued through deep learning. Smart traffic systems [4] can use these applications to reduce mishaps while also improving the flow of traffic. Autonomously driven cars [5] can also employ them to avoid other vehicles, clogged roads, and potential accidents while on the road. However, one of the key difficulties in using machine learning algorithms is the requirement of a vast amount of data to train a model.

In this research, we develop a sizable vehicle dataset from scratch and train a model to accurately recognize the vehicles in it. The intention was to set our work apart from conventional vehicle detection systems. Our research is distinctive in that we have curated a dataset consisting of 21 classes of vehicles, covering both those commonly available worldwide and those seen only in South Asian regions. Unique vehicles such as rickshaws, human haulers, and three-wheelers all vary in build and proportion. The collected images are put through a lengthy process of cleaning, augmenting, and finally labeling with bounding box annotations. To address the data scarcity issue, we used different augmentation techniques to balance the dataset. This ensures we have enough data for accurate training and testing. We chose a well-known object detection algorithm, YOLOv5 (You Only Look Once) [6], to use in our model for training, and we compared the

*Corresponding Author.