1558-1748 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2019.2904523, IEEE Sensors Journal

Optimal Power Allocation for Energy-efficient Data Transmission Against Full-duplex Active Eavesdroppers in Wireless Sensor Networks

Quang Vinh Do, Tran Nhut Khai Hoan, and Insoo Koo, Member, IEEE

Abstract—This paper studies an optimal transmit power decision policy for energy-efficient data transmissions between a sensor node (i.e., the source) and a cluster head (i.e., the destination) in cluster-based wireless sensor networks in the presence of a full-duplex (FD) active eavesdropper. In this network, the source is powered by a wireless energy harvester, while the destination is constantly supplied by traditional electrical energy. The eavesdropper is capable of FD transmitting and receiving, and hence, opportunistically launches jamming attacks against the destination while eavesdropping, which affects not just the legitimate transmissions but the eavesdropper itself. The destination can also work in FD mode to simultaneously receive information signals and send an artificial noise to interfere with the eavesdropper. Therefore, we investigate an optimal power allocation policy for the source in order to maximize the secrecy transmission rate against an FD eavesdropper. In addition, we study the problem of decision making in two different scenarios. First, the legitimate nodes are assumed to have prior information about the arrival of harvested energy and about the eavesdropper’s jamming attack model.
The problem is formulated within the framework of a partially observable Markov decision process and is solved with value iteration–based dynamic programming. Secondly, the legitimate nodes do not know the dynamics of the environment in advance, so the problem becomes a standard Markov decision process. Hence, we propose an actor-critic learning framework to find the solution from practical interactions with the environment. Finally, we verify the performance of the proposed schemes by simulations.

Index Terms—cognitive radio, energy efficiency, energy harvesting, full-duplex, actor-critic

I. INTRODUCTION

Wireless sensor networks (WSNs) are increasingly being deployed to monitor many sensitive and critical activities, and have become a promising solution to a wide range of applications. Typically, a WSN may contain a large number of compact, low-cost, and low-power wireless sensor (WS) nodes, which are connected through wireless channels to observe some phenomenon of the environment [1]. Furthermore, WSNs are normally deployed in unattended target areas; thus, the energy efficiency of WS nodes is a crucial concern for guaranteeing the self-sustainability and lifetime of the nodes, and it has a significant impact on the performance of the entire network [2]. One of the most effective ways to improve the network lifespan is to use a small rechargeable battery integrated with an energy harvester to ensure energy autonomy, and thus, enable long-term and maintenance-free operation of the WS nodes. Recently, wireless energy harvesting has become a promising technology for improving a battery’s limited capacity and lifespan as renewable energy resources become available in many forms, including solar energy [3], wind power [4], thermal energy [5], and electromagnetic energy [6]. Therefore, it is essential to employ a self-sustaining scheme for energy autonomy in WSNs. For example, Valera et al.
[7] characterized various existing environmental energy harvesting schemes that employ adaptive learning frameworks to achieve energy neutrality and maximize network performance in WSNs. In addition, Akhtar and Rehmani, in their survey [8], described different efficient battery recharging techniques that not only extend the lifetime of a node but can also provide extra energy for enhanced functionality of the node.

Among the different types of renewable energy, solar power is one of the most common and effective energy resources in outdoor applications, and it can be scavenged from sunlight by using photovoltaic materials (i.e., solar cells). However, the solar power that can be harvested is highly dependent on environmental conditions like cloud cover, dust on the cells’ surface, and illumination. In addition to solar power, radio frequency (RF) energy harvesting has recently become a promising solution for wireless communications networks due to the wide availability of radio sources (e.g., radio broadcasting towers, base stations, WiFi networks, and even mobile phones), which are not limited by space or time. An RF energy harvester can collect and convert radio signals into usable direct current (DC) voltage [9]. Furthermore, a crucial advantage of RF energy harvesting in WSNs is that a transmission from one WS node can provide power to all nodes that receive or listen to the transmission [10]. For this reason, Lee et al. [11] proposed a method for a primary wireless network to coexist with a secondary transmitter that harvests RF energy from transmissions by nearby primary transmitters while opportunistically accessing the licensed spectrum. The harvested energy is stored in a rechargeable battery with a finite capacity, which is then used for subsequent transmissions. More importantly, it is possible for a sensor node to integrate RF energy-harvesting modules with other energy-harvesting solutions, such as solar cells, to utilize the ambient energy [12].
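
The harvest-store-use behavior of a finite-capacity battery described above can be illustrated with a minimal sketch. It is not the model of [11] or of this paper; the function names, the slotted time structure, the integer energy units, and the rule that energy harvested in a slot only becomes usable in the next slot are all assumptions made for illustration.

```python
def battery_step(level, harvested, spent, capacity):
    """Return the battery level at the start of the next slot.

    Hypothetical harvest-store-use model: the energy spent in a slot must
    already be stored, and harvested energy beyond the capacity is lost.
    """
    if not 0 <= level <= capacity:
        raise ValueError("level must lie in [0, capacity]")
    if harvested < 0 or spent < 0:
        raise ValueError("energies must be non-negative")
    if spent > level:
        raise ValueError("cannot spend more energy than is stored")
    return min(level - spent + harvested, capacity)


def simulate(initial, harvests, spends, capacity):
    """Apply battery_step slot by slot and return the level trajectory."""
    levels = [initial]
    for harvested, spent in zip(harvests, spends):
        levels.append(battery_step(levels[-1], harvested, spent, capacity))
    return levels
```

For instance, under these assumptions, a battery with capacity 5 that starts at level 2 saturates at the capacity in the second slot and is fully drained by a 5-unit transmission in the third slot, which is why a transmit power policy has to account for the stored energy.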
Along with the emergence of low-powered wireless sensor networks, there has been growing consideration of wireless communications security [13]. The wireless signal, which is transmitted through open, random access, and shared wireless media, is vulnerable to malicious attacks by illegitimate users, such as data interception by an eavesdropper or transmission disruption by a jammer [14]. In this respect, physical