An Efficient Approach for Neural Network Architecture

Kasem Khalil, Omar Eldash, Ashok Kumar, Magdy Bayoumi
The Center for Advanced Computer Studies
University of Louisiana at Lafayette, Louisiana, USA
Emails: kmk8148, oke1206, axk1769, mab0778@louisiana.edu

Abstract—The neural network is one of the main concepts used in machine learning applications. The hardware realization of a neural network requires a large area to implement a network with many hidden layers. This paper presents a novel design of a neural network that reduces the hardware area. The proposed approach reduces the number of physical hidden layers from N to N/2 while maintaining full accuracy with a minimal increase in time complexity. It adopts the concept of multiplexing the input and output layers of the neural network, and it is implemented using the TensorFlow framework and a Xilinx Virtex-7 FPGA. The simulation results show that the accuracy of the proposed approach is the same as that of a traditional network using N layers, while only N/2 hardware layers are used. The hardware implementation results show that the proposed approach saves 42% of the area.

Keywords: Neural Network, Deep Learning, Multiplexing, Image Recognition, Pattern Recognition, Convolutional Neural Network.

I. INTRODUCTION

Neural Networks (NNs) are widely used as classifiers in data classification applications, and they are becoming pervasive in applications such as speech recognition [1], computer vision, image recognition, natural language processing, and decision making [2]. Classification techniques involve predicting a certain output based on a given input. Many neural network models have been proposed to associate data sets and then predict from given data. After training, the algorithm should detect the appropriate relationships between the attributes in order to precisely predict the actual output for new inputs.
The accuracy of prediction reflects the efficiency of the algorithm in recognizing new patterns, depending on how the algorithm was trained. An NN is an important system because it works differently from traditional computing in digital processors and closely mimics processing by a human brain. The human brain is a nonlinear, very complex, parallel information-processing system. It has the ability to arrange its neurons to perform certain operations faster than the fastest digital computers: the brain can recognize familiar faces in an unfamiliar scene within approximately 100-200 ms, whereas a conventional computer may take a long time to do even less complex tasks. Thus, a neural network is a machine designed to model the way in which the brain runs a certain task. An NN can be simulated in software to perform a task, and it can be further implemented and accelerated in hardware. NNs are based on parallel computation using multiple layers that consist of basic units called nodes or neurons. Each node performs a computation on its input data and sends the output to the nodes of the next layer. Deep Learning (DL) has in recent years become the state of the art for Machine Learning (ML). Many classification applications have been studied, and methods based on DL provide improvements in different domains, including object detection [3], action recognition [4], face and speech recognition [5], semantic segmentation [4], computational finance [6], etc. The deep convolutional neural network (CNN) is one of the most popular methods due to its ability to learn hierarchical abstractions of data by encoding them in different layers. DL methods have achieved better classification performance than traditional scene classification methods in the remote sensing domain [7]. NNs rely on many layers for complex applications; therefore, the network is large and requires a large area for hardware realization.
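The layered computation described above can be sketched in a few lines of NumPy. This is a minimal illustration of the generic feed-forward idea (each node forms a weighted sum of its inputs, applies a nonlinearity, and passes the result to the next layer), not the paper's hardware implementation; the layer sizes and sigmoid activation are assumptions chosen for brevity.

```python
import numpy as np

def layer_forward(x, W, b):
    """One fully connected layer: weighted sum of the inputs plus a
    bias, passed through a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

# A tiny 3-layer network: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs.
rng = np.random.default_rng(0)
shapes = [(5, 4), (3, 5), (2, 3)]
params = [(rng.standard_normal(s), np.zeros(s[0])) for s in shapes]

x = rng.standard_normal(4)
for W, b in params:          # each layer's output feeds the next layer
    x = layer_forward(x, W, b)
print(x.shape)               # final output vector has shape (2,)
```

Each iteration of the loop corresponds to one layer of nodes operating in parallel on the previous layer's outputs, which is exactly the structure the hardware realizes with physical layers.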
A fully connected layer is used as the last stage in different deep neural network architectures. For example, a CNN uses a fully connected layer as the last stage after the convolution and pooling layers, and a fully connected layer in a CNN typically has approximately 4096 neurons or more. Clearly, a new research direction that focuses on decreasing the neural network size while retaining the precision is needed. The focus of this paper is to reduce the hardware size of such large NNs. The work in [8] presented time-division multiplexing of a communication protocol for a neuromorphic system to minimize the physical number of interconnects between neurons. The time-division multiplexing is based on a single channel to minimize the interconnection cost and processing time. The method achieves better energy efficiency and lower implementation complexity in terms of interconnects; the analytical and numerical results show 40% lower interconnect energy consumption in a network of 1024 neurons. The work in [9] proposed two approaches to reuse resources in feed-forward neural networks: coalescing and folding. In the coalescing approach, a single stack of neurons performs both the feature extraction task and the classification task using shared resources. In the folding approach, the neurons of a high-dimension feed-forward layer are folded to execute multiple tasks; this can also be combined with low-precision modules. The proposed techniques were tested on binary and multi-class (MNIST) classification datasets. The simulation results show a power consumption of 3.65 mW and a classification accuracy of 91.2% on MNIST. The work in [10] presented an approach for tracking a mobile system using an artificial neural network. Data from the environment is collected by the mobile system through an ultrasonic transmitter and receiver, and a binary artificial neural network then processes the data.
The method is implemented on a Xilinx Zynq-7000 SoC FPGA, which includes the hardware neural network and a processor containing the interfaces. The simulation results show that the system can identify the position of a track in less than 1 µs. One of the main problems in neural network architectures is area overhead: some applications require many layers to achieve a certain performance in image recognition, speech recognition, decision-making, etc. This paper presents a hardware neural network with lower hardware area, using an architecture with N/2 hardware layers to attain the same performance as an architecture using N layers. The implementation of the proposed hardware node is not the same as that of a traditional node. The proposed approach saves 42% of the area overhead compared with the network using N layers, which helps to reduce the hardware cost.

The rest of the paper is organized as follows. Section II presents the proposed approach and its architecture. Section III presents the implementation of the proposed approach and the simulation results of the experimental tests. The conclusion is presented in Section IV.

II. PROPOSED NEURAL NETWORK ARCHITECTURE

An NN is based on a collection of nodes (neurons), and each connection between nodes can transmit a signal from one to another, as shown in Fig. 1. Each input is multiplied by a weight, and the result feeds the equivalent of a cell body. The weighted signals are

978-1-5386-9562-3/18/$31.00 ©2018 IEEE
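The core idea of the paper, computing an N-layer network with only N/2 physical layers, can be sketched conceptually as time-multiplexing: the physical stages process the data once, and a multiplexer feeds their output back through the same stages with the second half of the weight sets. The sketch below is our interpretation in NumPy, with hypothetical weight-selection logic; the actual hardware multiplexes the input and output layers and differs from this software model.

```python
import numpy as np

def dense(x, W, b):
    # One physical fully connected stage (sigmoid activation).
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

rng = np.random.default_rng(1)
# Weight sets for N = 4 logical layers (each 4 -> 4, for simplicity).
logical = [(rng.standard_normal((4, 4)), np.zeros(4)) for _ in range(4)]

# Reference: run all N logical layers in sequence.
x_ref = x0 = rng.standard_normal(4)
for W, b in logical:
    x_ref = dense(x_ref, W, b)

# Multiplexed: only N/2 = 2 physical stages. In each time slot a
# multiplexer selects which logical weight set each stage uses, and
# the stage output is fed back to the input for the second slot.
x_mux = x0
for slot in range(2):                     # two time slots
    for stage in range(2):                # two physical stages
        W, b = logical[2 * slot + stage]  # mux selects this slot's weights
        x_mux = dense(x_mux, W, b)

assert np.allclose(x_ref, x_mux)  # same result, half the physical layers
```

Because every logical layer is still evaluated, accuracy is unchanged; the cost is the extra time slot, which matches the paper's claim of a minimal increase in time complexity in exchange for the area saving.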