A New Global Pooling Method for Deep Neural Networks: Global Average of Top-K Max-Pooling

Yahya Dogan
Department of Computer Engineering, Siirt University, Siirt 56100, Turkey
Corresponding Author Email: yahyadogan@siirt.edu.tr

(This article is part of the Special Issue Advances of Machine Learning and Deep Learning)
https://doi.org/10.18280/ts.400216

Received: 24 December 2022
Accepted: 7 March 2023

ABSTRACT

Global Pooling (GP) is one of the important layers in deep neural networks. GP significantly reduces the number of model parameters by summarizing the feature maps, which lowers the computational cost of training. The most commonly used GP methods are global max pooling (GMP) and global average pooling (GAP). The GMP method produces successful results in experimental studies but tends to overfit the training data and may not generalize well to test data. The GAP method, on the other hand, takes all activations in the pooling region into account, which dilutes the effect of highly activated areas and degrades model performance. In this study, a GP method called global average of top-k max pooling (GAMP) is proposed, which returns the average of the k highest activations in the feature map and thereby blends the two methods above. The proposed method is compared quantitatively with other GP methods using different models (Custom and VGG16-based) and different datasets (CIFAR10 and CIFAR100). The experimental results show that the proposed GAMP method provides better image classification accuracy than the other GP methods. With the Custom model, the proposed GAMP method achieves classification accuracy 1.29% higher on CIFAR10 and 1.72% higher on CIFAR100 than the closest-performing method.

Keywords: global pooling, convolutional neural network, deep learning, image classification, transfer learning
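As a reading aid, the three global pooling operations compared in the abstract can be sketched in NumPy. The function names, the toy feature map, and the choice of k below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def global_max_pool(fmap):
    # GMP: keep only the single highest activation per feature map
    return fmap.max()

def global_avg_pool(fmap):
    # GAP: average over all activations in the feature map
    return fmap.mean()

def global_avg_topk_max_pool(fmap, k):
    # GAMP (as described in the abstract): average of the k highest
    # activations; k=1 reduces to GMP, k=H*W reduces to GAP.
    flat = np.sort(fmap.ravel())[::-1]   # activations, descending
    return flat[:k].mean()

# Toy 2x2 feature map (values are arbitrary)
fmap = np.array([[0.9, 0.1],
                 [0.7, 0.3]])
```

With this map, GAMP with k=2 averages the two strongest activations (0.9 and 0.7), sitting between the GMP value (0.9) and the GAP value (0.5).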
1. INTRODUCTION

Deep neural networks (DNNs) are artificial neural networks consisting of multiple layers of interconnected artificial neurons. DNNs are inspired by the structure and function of the human brain; they learn from examples to recognize patterns and relationships in data. The importance of DNNs stems from their ability to learn complex representations of data and make accurate predictions. Some of their key benefits are: (1) improved accuracy: DNNs have achieved state-of-the-art results in various tasks, outperforming traditional machine learning methods on many problems; (2) hierarchical learning: DNNs learn hierarchical representations of data that capture increasingly complex features as data moves from the input layer to the deeper layers of the network; (3) automated feature extraction: DNNs can learn useful features from raw data without manual feature selection; (4) handling of large-scale and complex data: DNNs can process large-scale and complex data, making them suitable for tasks such as image and speech recognition [1, 2], natural language processing [3], and machine translation [4]; and (5) transfer learning: a DNN can serve as a pre-trained model for other tasks, reducing the data and computational resources required to train a model from scratch. The depth and capacity of DNNs grow with the size and complexity of the problem to be solved.

Convolutional neural networks (CNNs) are a special type of DNN that have recently provided state-of-the-art results in many computer vision problems [4, 5]. A standard CNN consists of convolution, activation function, local pooling, and flatten layers, with fully connected (FC) layers at the top. The convolution layers are used to detect features in the image. These layers contain various kernels that combine each pixel in the image with its surrounding pixels to detect features.
Initially, these randomly initialized kernels are slid over the input image or the feature maps from the previous layer to create new feature maps for the next layer. This process determines the regions of the image where the features are located and allows these features to be used in later layers.

The activation function determines the effect of the weight values that regulate the operation of artificial neurons. It is commonly referred to as a squashing function and determines whether a neuron is activated or not. Activation functions are generally non-linear, which gives the model its non-linear modeling capacity.

The local pooling layer reduces the size of the input images or feature maps, which reduces the computation required during training and increases training speed. It also makes the model more robust to small changes, e.g., minor shifts in the image, giving the model better generalization ability on images. However, the local pooling operation causes information loss, as it attempts to represent the pixels within a defined kernel with a single value. Therefore, small kernel sizes, e.g., 2×2 or 3×3, are often used to keep the information loss low.

The flatten layer converts the input tensor into a one-dimensional vector. This layer is usually used as a preprocessing step before the data is passed to an FC layer. The main purpose of the flatten layer is to bridge the spatial feature maps and the FC layers.

Traitement du Signal, Vol. 40, No. 2, April 2023, pp. 577-587. Journal homepage: http://iieta.org/journals/ts
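The local pooling and flatten operations described above can be sketched in NumPy. The 4×4 input, the stride-2 2×2 max-pooling window, and the helper names are illustrative assumptions, not code from the paper.

```python
import numpy as np

def max_pool2x2(fmap):
    # Local 2x2 max pooling with stride 2: each non-overlapping 2x2
    # block is represented by its maximum, halving height and width
    # (and losing the other three values in each block).
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def flatten(t):
    # Flatten layer: collapse the tensor into a 1-D vector before
    # passing it to an FC layer.
    return t.ravel()

x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 feature map
pooled = max_pool2x2(x)                       # shape (2, 2)
vec = flatten(pooled)                         # shape (4,)
```

Note that 16 input values are reduced to 4 after pooling, which illustrates both the computational saving and the information loss mentioned above.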