Reward Shaping for Safe Reinforcement Learning in Network Control
Dr. Tianyu Wang, Prof. Robert Schober
University of Erlangen-Nuremberg
Abstract
Deploying RL agents in live networks carries the risk of unsafe actions that degrade service. We propose a reward shaping framework that incorporates safety constraints from network SLAs directly into the RL training process. Our constrained RL approach guarantees that QoS violations remain below 0.1% while still achieving 90% of the throughput optimality of unconstrained agents. We validate on a commercial 5G testbed with 50 active users.
AI Summary
- Reward shaping framework incorporating network SLA safety constraints into RL training.
- Guarantees QoS violations below 0.1% while achieving 90% throughput optimality.
- Validated on commercial 5G testbed with 50 active users.
- Addresses the critical safety gap for deploying RL in live networks.
Key Findings
1. Hard safety constraints are more effective than penalty-based soft constraints.
2. Safety-constrained agents learn more conservative but reliable policies.
3. The framework supports dynamic SLA updates without retraining.
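The contrast between the two constraint styles in the findings above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function names, the penalty weight, and the safety mask are all assumptions. A soft constraint folds the SLA violation into the reward as a penalty, so the agent can still trade violations for throughput; a hard constraint masks unsafe actions out of the policy's choice set entirely.

```python
import numpy as np

def shaped_reward(throughput, qos_violation, penalty_weight=10.0):
    # Soft (penalty-based) constraint: subtract a penalty proportional to
    # the QoS violation. Large violations are discouraged but not forbidden.
    return throughput - penalty_weight * qos_violation

def safe_greedy_action(q_values, action_is_safe):
    # Hard constraint: mask actions predicted to violate the SLA before
    # the greedy argmax, so an unsafe action is never selected.
    masked = np.where(action_is_safe, q_values, -np.inf)
    return int(np.argmax(masked))

# Example: action 1 has the highest Q-value but is flagged unsafe,
# so the hard-constrained policy falls back to action 0.
q = np.array([1.0, 2.0, 0.5])
safe = np.array([True, False, True])
print(safe_greedy_action(q, safe))  # 0
print(shaped_reward(throughput=5.0, qos_violation=0.1))  # 4.0
```

Under the soft scheme the agent can still pick action 1 if the shaped reward works out in its favor, which is why the paper finds hard constraints more effective at keeping violations below the SLA threshold.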
Industry Implications
Removes a major barrier to deploying RL in production telecom networks.
Applicable to any network optimization task with QoS requirements.
Builds operator confidence in AI-driven autonomous network management.
Read the Original Paper
Access the full paper on arXiv for complete methodology, results, and references.
Related Papers
Transformer-Based Channel Estimation for Massive MIMO Systems
Tsinghua University — 12 citations
Federated Reinforcement Learning for Distributed Network Optimization
Stanford University — 8 citations
Neural Architecture Search for Efficient Edge AI in Wireless Networks
Samsung AI Center Seoul — 5 citations