Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control

By: Ziyan Yin; Zhe Wang; Jun Li; Ming Ding; Wen Chen; Shi Jin

The explosive growth of dynamic and heterogeneous data traffic poses great challenges for 5G and beyond mobile networks. To enhance network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet asymmetric and heterogeneous traffic demands while alleviating inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under constraints on the users' packet dropping ratios. To jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm, named federated Wolpertinger deep deterministic policy gradient (FWDDPG). The BSs decide their local time-frequency configurations through RL and achieve global training by exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space and then use the Wolpertinger policy to reduce the errors incurred when mapping from the continuous action space back to the discrete action space. Simulation results demonstrate that the proposed algorithm outperforms the benchmark algorithms in terms of system sum rate.
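The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The first function follows the general Wolpertinger idea: take the continuous "proto-action" produced by the actor, find its k nearest discrete actions, and let a critic pick the best of them. The second function shows one simple way a BS might merge its model with those received from neighbors. The names `wolpertinger_select` and `average_with_neighbors`, the toy action grid, the toy critic, and the unweighted averaging rule are all assumptions made for illustration; the abstract does not specify the aggregation rule or the action encoding.

```python
import math


def wolpertinger_select(proto_action, discrete_actions, q_value, k=3):
    """Pick a discrete action for a continuous proto-action.

    The k discrete actions closest (Euclidean distance) to the proto-action
    are shortlisted, and the one with the highest critic value is returned.
    `q_value` is a hypothetical stand-in for a trained critic.
    """
    nearest = sorted(
        discrete_actions, key=lambda a: math.dist(a, proto_action)
    )[:k]
    return max(nearest, key=q_value)


def average_with_neighbors(local_weights, neighbor_weights):
    """Element-wise mean of a BS's parameters and its neighbors' parameters.

    Plain unweighted averaging is an assumption; it only illustrates the
    idea of exchanging local models without a central server.
    """
    models = [local_weights] + list(neighbor_weights)
    return [sum(values) / len(models) for values in zip(*models)]


# Hypothetical action grid: (uplink share, downlink share) of resources.
actions = [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]
proto = (0.4, 0.6)  # continuous output of the actor (assumed)


def toy_critic(action):
    """Toy critic that prefers a larger downlink share."""
    return action[1]


rounded = wolpertinger_select(proto, actions, toy_critic, k=1)
refined = wolpertinger_select(proto, actions, toy_critic, k=2)
merged = average_with_neighbors([1.0, 2.0], [[3.0, 4.0], [5.0, 6.0]])
```

With `k=1` the selection reduces to plain nearest-neighbor rounding. With a larger `k` the critic can override the nearest candidate, which is how the Wolpertinger step can reduce the mapping error relative to rounding alone.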

Introduction

Driven by the burgeoning demands of various services from smart cities and industries, fifth-generation (5G) and beyond wireless communication systems face the challenge of diverse quality-of-service (QoS) requirements [1][2][3]. The conventional “one-size-fits-all” network infrastructure may not be able to meet these heterogeneous service requirements simultaneously. Network slicing has been proposed to “slice” the mobile infrastructure into multiple logical networks, providing flexible network services in a cost-efficient way [4][5]. The key problem in network slicing is to dynamically and efficiently allocate computation and communication resources, e.g., computing frequencies [4], transmit power [6][7], radio spectrum [8], and transmission time [9], to meet various and even conflicting QoS demands.
