Distributed Learning in Multiuser OFDMA Femtocell Networks

Ana Galindo-Serrano 1, Lorenza Giupponi 1, and Gunther Auer 2

1 Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), Parc Mediterrani de la Tecnologia, Av. Carl Friedrich Gauss 7, Barcelona, Spain 08860, e-mail: {ana.maria.galindo,lorenza.giupponi}@cttc.es
2 DOCOMO Euro-Labs, Landsberger Str. 312, 80687 Munich, Germany, e-mail: auer@docomolab-euro.com

Abstract—This paper elaborates on self-organized and distributed interference management for femtocells that share the available radio resources with macrocells. A multi-agent learning approach is examined, based on distributed Q-learning, where femtocell base stations control their transmit power such that the femtocell capacity is maximized, while the aggregated downlink interference generated at macro users' receivers is maintained within acceptable limits. The distributed Q-learning algorithm is carried out at the femto nodes, such that the interference is controlled on each resource block. The contribution of this work is to integrate multi-user scheduling in the operation of the macrocell network, so that the femtocell agents perceive instantaneous changes, with 1 ms granularity, in the environment under observation. We demonstrate that, by relying on 3GPP Long Term Evolution (LTE) compliant signaling from the macro network on the intended macrocell scheduling policies, the proposed learning approach allows each femto node to react to these instantaneous changes in the environment, such that the femto-to-macro interference is appropriately controlled.

Index Terms—Femtocell deployment, interference management, multi-agent system, decentralized Q-learning.

I. INTRODUCTION

Femtocells [1] are short-range, low-cost, low-power home Base Stations (BSs) installed by the end consumer.
Femtocells are designed to serve very small areas, such as homes or offices, providing broadband coverage to indoor users while offloading traffic from the macrocell network. Owing to their low transmit powers, femtocells allow for a substantially enhanced spatial reuse of radio resources, and can therefore be deployed far more densely than macrocells. While femtocells offer significant benefits from both the operator and the subscriber perspective [2], several challenges must still be resolved to facilitate their expected mass deployment.

To allow operators to increase spectral efficiency, we consider femtocells deployed in the same frequency band as macrocells. Moreover, due to the unplanned placement of femtocells, their interference conditions may exhibit strong local deviations. To tackle the resulting distributed interference management problem, a self-organized power management scheme was developed in [3], based on a form of multi-agent Reinforcement Learning (RL) known as distributed Q-learning, where femto BSs are intelligent agents able to make autonomous decisions. In particular, decentralized Q-learning was shown in [3] to adopt a policy that maintains the interference caused to macrousers below a desired value in a non-stationary environment. The necessary signaling is conveyed from macro to femto BSs over the X2 interface, in the form of a bitmap describing the interference perceived by the macrousers on each Resource Block (RB). While macrouser mobility, fast fading, and lognormal shadowing create a non-stationary environment, our previous work [3] did not take into account the non-stationarity generated by the multi-user scheduling inherent in the normal operation of Orthogonal Frequency Division Multiple Access (OFDMA). In this case, femto nodes need to be able to react to instantaneous changes of the macrouser allocation per RB.
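As a rough illustration of the per-RB learning loop described above, the following Python sketch implements a generic tabular Q-learning agent that selects a transmit power for a single RB and observes a one-bit interference indication. It is a minimal sketch under stated assumptions, not the formulation of [3]: the two-state representation, the power levels, the reward shape, and all names (`RBAgent`, `toy_environment`, `train`) are hypothetical, and the update is the textbook reward-maximizing Q-learning rule.

```python
import math
import random


class RBAgent:
    """Tabular Q-learning agent for one resource block (illustrative only)."""

    def __init__(self, power_levels_dbm, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.powers = list(power_levels_dbm)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Assumed state space: the bitmap bit for this RB,
        # 0 = interference acceptable, 1 = interference too high.
        self.q = [[0.0] * len(self.powers) for _ in range(2)]

    def choose(self, state):
        """Epsilon-greedy choice of a power-level index."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.powers))
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        """Textbook Q-learning update toward reward + gamma * max Q(next)."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def toy_environment(power_dbm, threshold_dbm=5.0, noise_dbm=-10.0):
    """Hypothetical stand-in for the radio environment.

    Returns the interference bit and a reward: a capacity proxy when the
    macrouser is protected, a fixed penalty otherwise.
    """
    bit = 1 if power_dbm > threshold_dbm else 0
    capacity = math.log2(1.0 + 10 ** ((power_dbm - noise_dbm) / 10.0))
    reward = -10.0 if bit else capacity
    return bit, reward


def train(steps=5000):
    """Run the agent against the toy environment and return it."""
    agent = RBAgent(power_levels_dbm=[-20, -10, 0, 10, 20])
    state = 0
    for _ in range(steps):
        action = agent.choose(state)
        next_state, reward = toy_environment(agent.powers[action])
        agent.update(state, action, reward, next_state)
        state = next_state
    return agent
```

In this toy setting the greedy choice in state 0 should settle on the highest power level that stays below the assumed interference threshold. A multi-RB femto node would hold one such table per RB.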
In Long Term Evolution (LTE), the user allocation may change in multiples of a subframe duration, i.e., every 1 ms, whereas the latency induced by the X2 interface is on average 10 ms. Consequently, the femto nodes can only react to changes of the macrouser resource allocation with a significant delay.

To mitigate the delays imposed by the X2 interface, in this paper we explore the possibility of transferring additional information over the X2 interface, so that a macro BS can inform the femto network about its intended future scheduling policies. Taking advantage of this information, each femto node is able to appropriately transfer the expert knowledge acquired in different RBs, in order to proactively avoid excessive femto-to-macro interference.

The remainder of this paper is organized as follows. Section II describes the system model. Section III introduces the proposed learning technique for multi-user OFDMA, and Section IV discusses practical implementation issues in 3rd Generation Partnership Project (3GPP) LTE. Section V presents the performance evaluation scenario and discusses the simulation results. Finally, Section VI summarizes the main conclusions.

II. SYSTEM MODEL

We consider a heterogeneous wireless network composed of a set $\mathcal{M}$ of macrocells that coexist with a set $\mathcal{F}$ of femtocells. The $M = |\mathcal{M}|$ macrocells provide coverage over the entire network.