Received XX Month, XXXX; revised XX Month, XXXX; accepted XX Month, XXXX; date of publication XX Month, XXXX; date of current version XX Month, XXXX.

Digital Object Identifier 10.1109/TMLCN.2024.3368367

On Learning Generalized Wireless MAC Communication Protocols via a Feasible Multi-Agent Reinforcement Learning Framework

Luciano Miuccio 1, Member, IEEE, Salvatore Riolo 1, Member, IEEE, Sumudu Samarakoon 2, Member, IEEE, Mehdi Bennis 2, Fellow, IEEE, and Daniela Panno 1, Member, IEEE

1 Dipartimento di Ingegneria Elettrica, Elettronica e Informatica, Università degli Studi di Catania, Italy
2 Centre for Wireless Communications, University of Oulu, Finland

Corresponding author: Salvatore Riolo (email: salvatore.riolo@unict.it).

L. Miuccio and D. Panno were partially supported by the European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, partnership on “Telecommunications of the Future” (PE00000001 - program “RESTART”). S. Riolo was supported by the Italian MUR under Project PON R&I 2014-2020 Azioni IV.4 “Dottorati e contratti di ricerca su tematiche dell’innovazione”. S. Samarakoon and M. Bennis were partially supported by the Research Council of Finland (former Academy of Finland) 6G Flagship Programme (Grant Number: 346208) and the European Union under the projects CENTRIC (Grant Number: 101096379) and 6G-INTENSE (Grant Number: 101139266).

ABSTRACT Automatically learning medium access control (MAC) communication protocols via multi-agent reinforcement learning (MARL) has received considerable attention as a means to cater to the extremely diverse real-world scenarios expected in 6G wireless networks. Several state-of-the-art solutions adopt the centralized training with decentralized execution (CTDE) learning method, in which agents learn optimal MAC protocols by exploiting information exchanged with a central unit. Despite the promising results achieved in these works, two notable challenges remain neglected.
First, these works were designed to be trained in computer simulations, assuming an omniscient environment and neglecting communication overhead, which makes their implementation impractical in real-world scenarios. Second, the learned protocols fail to generalize beyond the scenario in which they were trained. In this paper, we propose a new feasible learning framework that enables practical implementation of the training procedure, thus allowing learned MAC protocols to be tailor-made for the scenario in which they will be executed. Moreover, to address the second challenge, we leverage the concept of state abstraction and imbue it into the MARL framework for better generalization. As a result, policies are learned in an abstracted observation space that contains only the useful information extracted from the original high-dimensional and redundant observation space. Simulation results show that our feasible learning framework achieves performance comparable to that of the infeasible solutions. In addition, the learning frameworks adopting observation abstraction offer better generalization capabilities in terms of the number of user equipments (UEs), the number of data packets to transmit, and the channel conditions.

INDEX TERMS 6G, multi-agent reinforcement learning, abstraction, generalization, feasibility, protocol learning.

I. Introduction

WHILE 5G is being rolled out globally and the standardization discussions for its future evolution are taking place, researchers in academia and industry have already been reflecting on visions, use cases, and disruptive key technologies for 6G systems [1]. Apart from new spectrum technologies, the support of simultaneous communications and sensing, and extreme connectivity requirements, it is expected that machine learning (ML) and artificial intelligence (AI) will play a defining role in the development of

This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/

This article has been accepted for publication in IEEE Transactions on Machine Learning in Communications and Networking. This is the author's version, which has not been fully edited, and content may change prior to final publication. Citation information: DOI 10.1109/TMLCN.2024.3368367