Research Article
Metaheuristic Algorithms for Convolution Neural Network
L. M. Rasdi Rere,1,2 Mohamad Ivan Fanany,1 and Aniati Murni Arymurthy1
1Machine Learning and Computer Vision Laboratory, Faculty of Computer Science, Universitas Indonesia, Depok 16424, Indonesia
2Computer System Laboratory, STMIK Jakarta STI&K, Jakarta 12140, Indonesia
Correspondence should be addressed to L. M. Rasdi Rere; laode.mohammad@ui.ac.id
Received 29 January 2016; Revised 15 April 2016; Accepted 10 May 2016
Academic Editor: Martin Hagan
Copyright © 2016 L. M. Rasdi Rere et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A typical modern optimization technique is usually either heuristic or metaheuristic. Such techniques have managed to solve optimization problems in the research areas of science, engineering, and industry. However, implementation strategies of metaheuristics for improving the accuracy of convolutional neural networks (CNN), a well-known deep learning method, are still rarely investigated. Deep learning is a type of machine learning technique whose aim is to move closer to the goal of artificial intelligence: creating a machine that can successfully perform any intellectual task a human can carry out. In this paper, we propose implementation strategies for three popular metaheuristic approaches, namely, simulated annealing, differential evolution, and harmony search, to optimize CNN. The performance of these metaheuristic methods in optimizing CNN on classifying the MNIST and CIFAR datasets was evaluated and compared. Furthermore, the proposed methods are also compared with the original CNN. Although the proposed methods show an increase in computation time, their accuracy is also improved (by up to 7.14 percent).
1. Introduction
Deep learning (DL) is mainly motivated by research on artificial intelligence, in which the general goal is to imitate the ability of the human brain to observe, analyze, learn, and make decisions, especially for complex problems [1]. The technique lies at the intersection of the research areas of signal processing, neural networks, graphical modeling, optimization, and pattern recognition. The current reputation of DL is largely due to drastic improvements in chip processing power, a significant decrease in the cost of computing hardware, and advances in machine learning and signal processing research [2].
In general, DL models can be classified into discriminative, generative, and hybrid models [2]. Discriminative models include, for instance, CNN, deep neural networks, and recurrent neural networks. Some examples of generative models are deep belief networks (DBN), restricted Boltzmann machines, regularized autoencoders, and deep Boltzmann machines. A hybrid model, on the other hand, refers to a deep architecture that combines a discriminative and a generative model. An example is using a DBN to pretrain a deep CNN, which can improve the performance of the deep CNN over random initialization. Among all of the hybrid DL techniques, metaheuristic optimization for training a CNN is the focus of this paper. As an illustration of the idea, a minimal simulated annealing sketch is given below.
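The following minimal Python sketch is our illustration, not the procedure proposed later in this paper: it applies simulated annealing, one of the three metaheuristics considered here, to a generic loss function. The function name, hyperparameters (initial temperature, cooling rate, step count), and the toy quadratic loss are all illustrative choices; in the paper's setting, the solution vector would hold CNN parameters and the loss would be the classification error.

import math
import random

def simulated_annealing(loss, x0, t0=1.0, cooling=0.95, steps=200):
    # Generic SA loop: perturb the current solution, always accept
    # improvements, and accept worse moves with probability
    # exp(-delta / T) so the search can escape local minima.
    x, fx, t = list(x0), loss(x0), t0
    for _ in range(steps):
        cand = [xi + random.gauss(0, 0.1) for xi in x]  # random neighbor
        fc = loss(cand)
        if fc < fx or random.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
        t *= cooling  # geometric cooling schedule
    return x, fx

# Toy usage: minimize a simple quadratic as a stand-in for a CNN loss.
best, best_loss = simulated_annealing(lambda v: sum(vi**2 for vi in v), [2.0, -3.0])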
Although DL has a proven ability to solve a variety of learning tasks, training it is difficult [3–5]. Some examples of successful methods for training DL are stochastic gradient descent, conjugate gradient, Hessian-free optimization, and Krylov subspace descent.
Stochastic gradient descent (SGD) is easy to implement and fast for problems with many training samples. However, this method needs several manual tuning schemes to make its parameters optimal, and its process is principally sequential; as a result, it is difficult to parallelize with a graphics processing unit (GPU). Conjugate gradient (CG), on the other hand, is easier to check for convergence and more stable to train. Nevertheless, CG is slow, so it needs multiple CPUs and a vast amount of RAM [6].
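To make the tuning and sequentiality issues concrete, the following Python sketch (ours, not from [6]) shows a single vanilla SGD update. The names sgd_step, params, grads, and lr are our illustrative choices; the learning rate lr is exactly the kind of parameter that must be tuned by hand, and successive calls must run in order, which is what makes plain SGD hard to parallelize.

import numpy as np

def sgd_step(params, grads, lr=0.01):
    # One vanilla SGD update: w <- w - lr * dL/dw for every parameter.
    # lr is a manually tuned hyperparameter.
    for w, g in zip(params, grads):
        w -= lr * g
    return params

# Toy usage with a single 3x3 weight matrix and a dummy gradient.
w = np.random.randn(3, 3)
g = 0.1 * np.ones((3, 3))
sgd_step([w], [g], lr=0.05)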
Hessian-free optimization has been applied to train deep autoencoders [7]; it is proficient in handling the underfitting problem and is more efficient than the pretraining + fine-tuning approach.