Dynamic Systems and Applications 30 (2021) No. 11, 1719-1732

REINFORCEMENT LEARNING BASED HANDOFF MECHANISM IN COOPERATIVE COGNITIVE RADIO NETWORKS

Vineetha Mathai1* and P. Indumathi2
1,2 Department of Electronics Engineering, MIT Campus, Anna University, Chennai.
Email: vineethamathai@gmail.com

ABSTRACT. Spectrum handoff (SH) is a dynamic spectrum access technique that ensures effective channel utilization, fair resource allocation, and uninterrupted real-time connections. Facilitating SH across traffic of dissimilar characteristics in Cognitive Radio Networks (CRNs) is difficult due to repeated interruptions from Primary Users (PUs), contention among Secondary Users (SUs), and diverse Quality of Experience (QoE) demands. Here, we consider an effective channel selection strategy (CSS) and put forward a learning-based handoff scheme that enhances users' QoE by introducing the idea of docition. A PU-prioritized Markov model is introduced to represent the interactions between PUs and SUs for fair channel access. Reinforcement learning (RL) is applied to the CSS to carry out proper channel selection. Numerical results show that the proposed queueing model, the learning-based handoff scheme, and docitive learning enhance the quality of service, maintaining an average MOS of 3.6.

Keywords: Cognitive radio network, Spectrum handoff, Queuing Model, Reinforcement Learning, QoE.

1 INTRODUCTION

The progression of wireless communication towards 5G involves changes in the network model and in the assessment of QoE provided for multimedia applications. The CRN concept was coined to mitigate the underutilization of spectrum resources [1],[2]. In a CRN, unlicensed users (SUs) may access the spectrum only when it is not occupied by licensed users (PUs). If a PU returns to a channel, the SU can either stay on it until the PU completes its transmission or shift (i.e., hand off) to another channel.
If a cognitive radio is shadowed by a tall building on the sensing channel, a cooperative sensing mechanism is included. Proactive, reactive, and hybrid handoff [10] are the methods available in the literature. In the proactive method, SUs use a PU traffic model to characterize PU activity, identify channels, and accomplish switching when a PU returns. The handoff delay of this scheme is therefore low, but obtaining a precise PU traffic model is difficult. In the reactive method, an SU performs spectrum sensing only when a PU interruption occurs, in order to identify vacant channels. Channel status for handoff is thus found without difficulty; however, this may introduce delay. The hybrid method is a speedy scheme that combines the effects of the earlier methods, pairing proactive sensing with reactive handoff action [3-5]. Multimedia applications [12],[15] are difficult to support in a CRN due to PU interruptions and differing QoE requirements. To tackle these problems, we select a mixed preemptive and non-preemptive resume priority (PRP/NPRP) M/G/1 [31] queueing model to describe the spectrum-usage behavior of PUs and SUs. The preemptive model describes the queueing between PUs and SUs and ensures that PUs retain control, while the non-preemptive model governs the queueing among SUs so that an SU cannot intrude on the ongoing communication of other SUs. When picking channels for SH, it is important to consider the transmission delay, channel quality, and channel conditions. Given the varying channel conditions and traffic loads, and drawing on the knowledge gained from prior SHs and earlier channel environments, a reinforcement-learning-based [18]-[20],[22] SH scheme is proposed to achieve SH adaptively [7],[15-16]. The main parameter of QoE [30] is the mean opinion score (MOS).

Received JUL 12, 2021. ISSN 1056-2176 (print); ISSN 2693-5295 (online). www.dynamicpublishers.com; $15.00 ©Dynamic Publishers, Inc. www.dynamicpublishers.org; https://doi.org/10.46719/dsa202130.11.03
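The mixed PRP/NPRP priority rule described above can be sketched in code. The following is a minimal illustrative sketch, not the paper's model: it captures only the service-order logic on a single channel (a PU preempts an ongoing SU transmission, which resumes later, while SUs never preempt each other), without the M/G/1 arrival and service-time distributions. The class and method names are hypothetical.

```python
from collections import deque

class PrpNprpChannel:
    """Sketch of the mixed PRP/NPRP priority rule on one channel.

    PUs have preemptive resume priority over SUs; among SUs the
    discipline is non-preemptive. Assumed names, for illustration only.
    """

    def __init__(self):
        self.serving = None          # ("PU", id), ("SU", id), or None
        self.pu_queue = deque()
        self.su_queue = deque()

    def pu_arrival(self, pu_id):
        if self.serving is None:
            self.serving = ("PU", pu_id)
        elif self.serving[0] == "SU":
            # PRP: preempt the SU; it resumes first once PUs are done,
            # so it goes back to the head of the SU queue.
            self.su_queue.appendleft(self.serving[1])
            self.serving = ("PU", pu_id)
        else:
            self.pu_queue.append(pu_id)

    def su_arrival(self, su_id):
        if self.serving is None:
            self.serving = ("SU", su_id)
        else:
            # NPRP among SUs: never interrupt the current transmission.
            self.su_queue.append(su_id)

    def departure(self):
        """Current transmission finishes; waiting PUs are served first."""
        if self.pu_queue:
            self.serving = ("PU", self.pu_queue.popleft())
        elif self.su_queue:
            self.serving = ("SU", self.su_queue.popleft())
        else:
            self.serving = None
        return self.serving
```

For example, if SU 1 is transmitting with SU 2 queued and a PU arrives, the PU takes the channel immediately, and SU 1 resumes ahead of SU 2 after the PU departs, which is exactly the resume-priority behavior the model is meant to enforce.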