Research Article
Identification of Dry Bean Varieties Based on Multiple Attributes
Using CatBoost Machine Learning Algorithm
S. Krishnan,1 S. K. Aruna,2 Karthick Kanagarathinam,3 and Ellappan Venugopal4
1 Department of , Mahendra Engineering College (Autonomous), Namakkal, Tamil Nadu, India
2 Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, Karnataka, India
3 Department of Electrical and Electronics Engineering, GMR Institute of Technology, Rajam, Andhra Pradesh, India
4 Department of Electronics and Communication Engineering, School of Electrical Engineering and Computing, Adama Science and Technology University, Adama, Ethiopia
Correspondence should be addressed to Ellappan Venugopal; ellappan.venugopal@astu.edu.et
Received 12 December 2022; Revised 11 February 2023; Accepted 3 March 2023; Published 21 April 2023
Academic Editor: Sadiq Hussain
Copyright © 2023 S. Krishnan et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Dry beans are the most widely grown edible legume crop worldwide and show high genetic diversity. Crop production is strongly
influenced by seed quality, so seed classification matters for both marketing and production because it helps build
sustainable farming systems. The major contribution of this research is a multiclass classification model that uses machine
learning (ML) algorithms to classify seven varieties of dry beans. The model is developed on a dataset from the UCI ML
repository that contains the features of seven distinct varieties of dry beans. Because an unbalanced multiclass dataset
biases ML algorithms towards the majority group, a balanced dataset was created using the random undersampling method.
To address the skewness of the dataset, a Box-Cox transformation (BCT) was performed on the dataset's attributes.
Twenty-two ML classification algorithms were applied to the balanced and preprocessed dataset to identify the best-performing
one. The results were validated with a 10-fold cross-validation approach, during which the CatBoost ML algorithm achieved
the highest overall mean accuracy of 93.8 percent, with a range of 92.05 percent to 95.35 percent.
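
The three preprocessing and validation steps named in the abstract (random undersampling, Box-Cox transformation, and 10-fold splitting) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The function names are hypothetical, the Box-Cox parameter `lam` is passed in as a fixed value rather than estimated from the data, and the toy labels are not the bean dataset. No classifier is included.

```python
import math
import random
from collections import defaultdict


def random_undersample(labels, seed=0):
    """Return sorted row indices in which every class keeps as many
    rows as the rarest class (rows are dropped at random)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    rng = random.Random(seed)
    kept = []
    for rows in by_class.values():
        kept.extend(rng.sample(rows, n_min))  # sample without replacement
    return sorted(kept)


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda.
    Assumption: lam is supplied by the caller, not fitted here."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


# Toy example: three classes with 5, 3, and 8 rows.
toy_labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 8
balanced_rows = random_undersample(toy_labels)
folds = k_fold_indices(len(balanced_rows), k=3)
```

In a full pipeline, each fold would serve once as the validation set while a classifier is trained on the remaining folds, and the per-fold accuracies would then be averaged.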
1. Introduction
Dry beans are a self-pollinated legume grown for human
consumption. Beans are a significant crop on a global scale and
are popular with both farmers and consumers. Dry beans
account for nearly 50 percent of the grain legumes consumed
directly by humans in the majority of developing countries
[1]. Beans are a staple food in Sub-Saharan Africa, where
they are consumed by more than 200 million people [2]. A
system of quality control ensures that approved seed
meets national and global quality benchmarks. For the
majority of food products, visual characteristics are the
primary criterion used by consumers when making purchasing
decisions [3]. Like other legume species, common
beans show wide variation in growth patterns,
physical features (size, shape, and shading), maturity, and
ability to grow and adapt [4, 5]. Sorting and classifying bean
seeds manually is time-consuming, inefficient, and tedious,
particularly when working with large production volumes.
Human inspectors are usually in charge of checking raw
materials, and it is difficult to streamline the inspectors'
findings. These considerations reaffirm the importance of
objective measurement systems. As a result, automatic
grading and classification methods are required.
Recent technological advances have greatly helped researchers
in this field. Computer vision systems (CVSs) are being
used for quality control and have recently begun to be used
as an objective measurement and evaluation system [6–9].
CVS technology, which is primarily camera cum computer
Hindawi
Scientific Programming
Volume 2023, Article ID 2556066, 21 pages
https://doi.org/10.1155/2023/2556066