Abstract—In this paper, the Clonal Selection Algorithm (CSA) is implemented to determine the optimal architecture of a convolutional neural network (CNN) for the classification of handwritten character digits. The efficacy of CNN in image recognition is one of the central motives why the world has woken up to the effectiveness of deep learning. During training, an optimal CNN architecture can extract complex features from the data that is being trained; however, the ideal architecture of a CNN for a specific problem cannot be determined by some standard procedure. In practice, CNN architectures are generally designed using human expertise and domain knowledge. By using CSA, optimal architecture of CNN can be determined autonomously through evolution of hyperparameters of the architecture for a given dataset. In this work, proposed methodology is tested on EMNIST dataset which is an enhanced version of MNIST dataset. The results have proven that the CSA based tuning is capable of generating optimal CNN architectures. Through this proposed technique, the best architecture of CNN for a given problem can be self- determined without any human intervention. Index Terms—Convolutional neural network, clonal selection algorithm, EMNIST dataset. I. INTRODUCTION Machine learning is an application of artificial intelligence that deals with the study of algorithms that can learn from data. It enables systems to learn from historical data without being explicitly programmed. It is an exciting area of research that can drive the future of technology [1]. Machine learning has become an active area of research for the last two decades. Famous applications of machine learning include speech recognition, image recognition, self-driving cars and medical diagnosis, etc. These different applications of machine learning are impacting human lives in a positive manner. Machine learning algorithms generally consist of two phases: training and prediction. In training phase, goal of any machine learning model is to learn those parameters of the model which best describes the overall training dataset. In prediction phase, learned model is used to make predictions. These predictions can then be used for many different real- world applications [2]. Machine learning algorithms can be categorized into two broad categories depending upon the nature of available dataset: supervised and unsupervised. Supervised algorithms are those, which require corresponding output labels in addition to input data for training, whereas unsupervised algorithms only require input data for training. Manuscript received May 27, 2019; revised October 11, 2019. Neural networks are a set of Machine learning algorithms, which model complex relationships in data using a multilayer representation. A neural network is made up of number of individual units which are called neurons. These neurons are arranged in multiple layers which are connected to each other through these neurons. Neurons of one layer are connected to neurons of another layer through weighted edges. Weights of these edges are learned during training phase using training dataset. Data is passed from input layer to output layer of the network through these neurons. Neurons in each layer perform simple mathematical operations using learned weights and transmits the information to all the other neurons connected to it [3], [4]. Increase in computational power has increased the interest of researchers in deep neural networks, where the structure of network become more complex by adding more layers into the network. These deep neural networks are capable of solving wide range of problems adequately that could not be solved before such as Image classification [5]-[7]. A special class of deep neural networks is the convolutional neural networks, which are usually referred to as CNNs or ConvNets. CNNs are a special kind of artificial neural networks, which have a grid-like architecture and proposed in a paper by Yann LeCun in 1998 [8]. He used CNNs to classify handwritten digits and was able to achieve it through his first CNN which was called LeNet-5. For image classification, CNN is the state-of-art technique due to its grid-like architecture and used by several big players like Facebook, Google and Amazon etc. for various applications related to image classification. CNNs use a mathematical process called convolution, which is a special type of linear transformation. Unlike conventional neural networks, CNNs use convolution operation to obtain an intermediate output (feature). This intermediate output is then provided as input to the next layer. In a CNN architecture, this is done in at least one of its layers. A typical CNN architecture comprises of layers of different types, such as convolution, pooling and fully connected. In order to design such CNN architecture, users have to make various design decisions. The architecture of CNN needs to determine the types and number of layers, ordering of these layers, and hyperparameters of each layer. Hyperparameters of each layer include filter size for the convolution layers, activation type for each layer, pool size, and filter size for pooling layers. The vast number of architectures that can be generated based on these selections makes it difficult for an exhaustive manual search. While there has been some work going on automated discovery of The authors are with the EECS Department, the University of Toledo, Toledo, OH 43606 USA (e-mail: ali.albataineh@utoledo.edu, devinder.kaur@utoledo.edu). Optimal Convolutional Neural Network Architecture Design Using Clonal Selection Algorithm Ali Al Bataineh and Devinder Kaur doi: 10.18178/ijmlc.2019.9.6.874 International Journal of Machine Learning and Computing, Vol. 9, No. 6, December 2019 788