IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010
Manuscript received August 5, 2010; Manuscript revised August 20, 2010

Architecture and Weight Optimization of ANN Using Sensitivity Analysis and Adaptive Particle Swarm Optimization

Faisal Muhammad Shah†, Md. Khairul Hasan††, Mohammad Moinul Hoque††† and Suman Ahmmed††††
†,††,††† Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh
†††† Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh

Abstract
This paper presents a method for designing the architecture and weights of a three-layered ANN using sensitivity analysis and adaptive particle swarm optimization (SA-APSO). Optimizing the ANN architecture means finding a near-minimal number of neurons and efficient connecting weights so that the ANN achieves better performance on different problems. The proposed algorithm designs the ANN in two phases. In the first phase it prunes neurons from the ANN using sensitivity analysis to reach a near-minimal structure; in the second phase it optimizes the weight matrices with adaptive particle swarm optimization for further performance enhancement. In the SA phase, impact factors and correlation coefficients are used to prune the less salient neurons: first, neurons with little impact on the performance of the ANN are pruned based on their impact factor values; then further neurons are removed by merging similar neurons, identified by the correlation coefficients of neuron pairs. In the optimization phase, adaptive particle swarm optimization is applied to the connecting weight matrices to attain better performance.
In the APSO optimization, a specialized variant of PSO, both training and validation fitness functions are used; this emphasizes avoiding overfitting, adapts the optimizer to the ANN, and yields effective weight matrices. To evaluate SA-APSO, it is applied to short-term load forecasting (STLF) on the dataset of the Regional Power Control Center of the Saudi Electricity Company, Western Operation Area (SEC-WOA). Results show that the proposed SA-APSO is able to design smaller architectures and attain excellent accuracy.

Key words:
Artificial neural networks, overfitting, correlation coefficients, particle swarm optimization.

1. Introduction

Designing the architecture of an artificial neural network (ANN) is an important problem, since the performance of an ANN largely depends on an effective structure. As applications become more complex, the structures presumably become larger; larger structures, in turn, increase the number of parameters and degrade generalization ability. Determining an optimized ANN architecture means deciding the number of layers, the number of neurons in each layer, and the optimized connecting weights between the neurons of consecutive layers. It is well known that a three-layered ANN, consisting of an input, a hidden, and an output layer, can approximate a wide class of linear and non-linear problems. Therefore, in this research the number of layers is fixed at three, and the number of neurons and the values of the connecting weights are determined by the sensitivity analysis and adaptive particle swarm optimization (SA-APSO) approach. Usually the numbers of input and output neurons are determined by the sizes of the input and output vectors of the dataset, so architecture design reduces to determining the number of hidden neurons, and weight optimization means optimizing the values of the weight matrices. Designing a near-optimal ANN architecture for a given application remains a tricky question for researchers.
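As a rough illustration of the weight-optimization phase, the following is a minimal sketch of PSO-style training of a three-layered ANN's flattened weight vector, where a separate validation fitness decides which global-best weights to keep, as a guard against overfitting. This is a standard PSO update, not necessarily the authors' exact APSO variant; the network sizes, hyperparameters, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(w, X, y):
    """Fitness of a flattened weight vector: mean squared error of a
    one-hidden-layer network (layer sizes here are illustrative)."""
    n_in, n_hid = X.shape[1], 4
    W1 = w[:n_in * n_hid].reshape(n_in, n_hid)   # input -> hidden weights
    W2 = w[n_in * n_hid:].reshape(n_hid, 1)      # hidden -> output weights
    h = np.tanh(X @ W1)
    return float(np.mean((h @ W2 - y) ** 2))

def pso_train(X_tr, y_tr, X_val, y_val, dim, n_particles=20, iters=100,
              w_inertia=0.7, c1=1.5, c2=1.5):
    pos = rng.normal(size=(n_particles, dim))    # each particle = one weight vector
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([mse(p, X_tr, y_tr) for p in pos])
    gbest = pbest[pbest_fit.argmin()].copy()
    best_val = mse(gbest, X_val, y_val)          # validation fitness guards overfitting
    best_w = gbest.copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w_inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        fit = np.array([mse(p, X_tr, y_tr) for p in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmin()].copy()
        val_fit = mse(gbest, X_val, y_val)
        if val_fit < best_val:                   # keep the weights that generalize best
            best_val, best_w = val_fit, gbest.copy()
    return best_w
```

The key design point mirrored from the paper is that training fitness drives the swarm while validation fitness selects the returned weights, so a particle that merely memorizes the training data does not win.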
It is, however, an important issue, since there is strong biological and engineering evidence that the information processing ability of an ANN depends largely on its architecture [1-4]. Both large and small networks exhibit advantages and disadvantages. On the one hand, a larger network may be trained quickly, can more easily avoid local minima, and can more accurately fit the training data; however, it may be inefficient because of its high computational complexity and many degrees of freedom, and it may generalize poorly due to over-fitting. On the other hand, a smaller network may save computational cost and generalize well, but it may learn very slowly or fail to learn the data set at all. Moreover, there is no guarantee that the smallest feasible network will converge to the correct weights during training, because such a network may be sensitive to the initial settings and is more likely to be trapped in local minima [5-6]. Designing an appropriate architecture for the solution of a given task is always an open challenge [1][3-4].
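One way to shrink a network toward the smaller end of this trade-off, as summarized in the abstract, is to merge hidden neurons whose activations are highly correlated. The sketch below shows one plausible realization: a neuron absorbs a strongly correlated partner by summing their outgoing weights. The merge rule and threshold are assumptions for illustration, not necessarily the authors' exact formulation.

```python
import numpy as np

def merge_correlated_neurons(H, W_out, threshold=0.95):
    """H: (n_samples, n_hidden) hidden activations over the training set.
    W_out: (n_hidden, n_out) hidden-to-output weights.
    Repeatedly merges a pair of hidden neurons whose activation
    correlation exceeds `threshold`, keeping one of the pair."""
    keep = list(range(H.shape[1]))
    W = W_out.copy()
    C = np.corrcoef(H.T)              # pairwise correlation of neuron outputs
    merged = True
    while merged:
        merged = False
        for a in range(len(keep)):
            for b in range(a + 1, len(keep)):
                i, j = keep[a], keep[b]
                if C[i, j] > threshold:
                    W[i] += W[j]      # neuron i absorbs j's output contribution
                    keep.pop(b)       # drop the redundant neuron
                    merged = True
                    break
            if merged:
                break
    return keep, W[keep]
```

Because two positively correlated neurons emit nearly the same signal, summing their outgoing weights approximately preserves the network's output while removing one neuron.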