A Novel Dynamic Q-Learning-Based Scheduler Technique for LTE-Advanced Technologies Using Neural Networks

Ioan Sorin Comsa, Sijing Zhang, Mehmet Aydin
Institute for Research in Applicable Computing
University of Bedfordshire
Luton, United Kingdom
{Ioan.Comsa,Sijing.Zhang,Mehmet.Aydin}@beds.ac.uk

Pierre Kuonen, Jean-Frederic Wagen
Institute of Information and Communication Technologies
University of Applied Sciences of Western Switzerland
Fribourg, Switzerland
{Pierre.Kuonen, Jean-Frederic.Wagen}@hefr.ch

Abstract: The tradeoff between system capacity and user fairness attracts considerable interest in LTE-Advanced resource allocation strategies. Using static threshold values for throughput or fairness, regardless of the network conditions, makes the scheduler inflexible when the system requires different tradeoff levels. This paper proposes a novel dynamic neural Q-learning-based scheduling technique that achieves a flexible throughput-fairness tradeoff by offering optimal solutions according to the Channel Quality Indicator (CQI) for different classes of users. The Q-learning algorithm is used to adopt different policies of scheduling rules at each Transmission Time Interval (TTI). The technique uses neural networks to estimate suitable scheduling rules for states that have not yet been explored. Simulation results indicate that the proposed method outperforms existing scheduling techniques by maximizing the system throughput when different levels of fairness are required. Moreover, the system achieves a desired throughput-fairness tradeoff and overall satisfaction for different classes of users.

Keywords: LTE-Advanced, TTI, CQI, throughput, fairness, scheduling rule, policy, Q-learning, neural network

I. INTRODUCTION

The recent advances in mobile devices, together with the growing popularity of video-sharing websites, will inevitably increase the data traffic in cellular networks.
It is estimated that by the end of 2014 mobile devices will account for 90% of all mobile broadband traffic and mobile video services will represent more than 66% of the world’s mobile data traffic [1]. To handle this explosion of data traffic, advanced radio resource management is strongly recommended. Within it, packet scheduling is the sub-module that assigns radio resources to each user at each time instant in order to deliver the requested services.

To evaluate the performance of different packet schedulers, a wide range of performance metrics, such as system throughput, user fairness and packet delay, has been proposed [2]. The problem of how to mix and balance these performance metrics efficiently has attracted a lot of attention. Our framework proposes a straightforward scheduler that analyzes the system throughput and user fairness. The main purpose of our scheduler is to improve the performance of one or more of these objectives when the system demands a flexible tradeoff among them. The price of satisfying one objective is a degradation of the others. The solution is to combine these performance targets in a comprehensive way by using the most representative scheduling rule at each TTI in order to achieve overall satisfaction. The Q-learning algorithm, combined with a neural network, is used for scheduling rule adoption and policy refinement every 1 ms (the TTI duration in LTE).

The rest of the paper is organized as follows: Section II introduces the elements of related work. Section III presents the architecture of the LTE-Advanced scheduler. Section IV describes the system model together with the Q-learning approach and the neural network for the packet scheduling process. The simulation parameters and results are presented in Section V. Finally, the paper concludes with Section VI.

II. RELATED WORK

A few studies in the published literature address the tradeoff between system throughput and user fairness. Proportional Fair (PF) was the first scheduling metric proposed to address the fairness-throughput tradeoff [3]. A dynamic tradeoff between system throughput, average user throughput and fairness index based on the weighted scheduling rule is analyzed in [4]. The same concept is studied in [5] for WCDMA networks using the channel statistics. By grouping users and performing the selection process in two steps, a flexible tradeoff can be achieved [6]. Different tradeoff levels are obtained in [7] by using sequential quadratic programming models. However, the dynamic tradeoff is not achieved during the transmission process. A method for throughput maximization with adjustable fairness based on sum utility maximization is presented in [8]. By setting a fixed fairness threshold, the scheduler fails to take advantage of multiuser diversity.

We propose a high-performance scheduler that ensures a dynamic tradeoff during the transmission session based on the channel conditions of each user. To ensure convergence to the required tradeoff level, different scheduling rules are applied at each TTI.
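
To make the throughput-fairness tradeoff concrete, the following minimal sketch compares two fixed scheduling rules applied at every TTI. It is an illustration under stated assumptions, not the scheduler proposed in this paper: the generalized metric r_i / R_i^beta (beta = 0 gives the maximum-rate rule, beta = 1 gives Proportional Fair), the exponentially distributed rate model, the averaging window, and the use of Jain's index as the fairness measure are all assumptions, and the names jain_index and simulate are hypothetical.

```python
import random


def jain_index(throughputs):
    """Jain's fairness index: 1/n when one user gets everything, 1.0 when all are equal."""
    total = sum(throughputs)
    squares = sum(t * t for t in throughputs)
    if squares == 0:
        return 1.0
    return total * total / (len(throughputs) * squares)


def simulate(beta, mean_rates, n_tti=2000, window=100.0, seed=0):
    """Serve one user per TTI with the rule argmax_i r_i / R_i**beta.

    r_i is the instantaneous achievable rate of user i and R_i its
    exponentially averaged past throughput (assumed model, see lead-in).
    Returns (system throughput per TTI, Jain index of per-user throughput).
    """
    rng = random.Random(seed)
    n = len(mean_rates)
    avg = [1e-6] * n      # small positive start avoids division by zero
    served = [0.0] * n
    for _ in range(n_tti):
        # Assumed fading model: exponential rate around each user's mean.
        rates = [rng.expovariate(1.0 / m) for m in mean_rates]
        chosen = max(range(n), key=lambda i: rates[i] / avg[i] ** beta)
        served[chosen] += rates[chosen]
        # Update every user's moving average; unserved users get 0 this TTI.
        for i in range(n):
            got = rates[i] if i == chosen else 0.0
            avg[i] += (got - avg[i]) / window
    per_user = [s / n_tti for s in served]
    return sum(per_user), jain_index(per_user)


if __name__ == "__main__":
    means = [1.0, 2.0, 4.0, 8.0]  # hypothetical per-user mean rates
    for name, beta in (("max-rate", 0.0), ("proportional fair", 1.0)):
        tput, fair = simulate(beta, means)
        print(f"{name}: throughput={tput:.2f}, fairness={fair:.2f}")
```

Because both runs use the same seed, they see the same sequence of channel rates, so the maximum-rate rule can never deliver less total throughput than Proportional Fair, while Proportional Fair spreads the service more evenly. A scheduler that switches rules at each TTI, as proposed here, moves between such operating points instead of fixing one of them.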