Reward Shaping for Safe Reinforcement Learning in Network Control
Dr. Tianyu Wang, Prof. Robert Schober
University of Erlangen-Nuremberg
Abstract
Deploying RL agents in live networks carries the risk of unsafe actions that degrade service. We propose a reward shaping framework that incorporates safety constraints from network SLAs directly into the RL training process. Our constrained RL approach guarantees that QoS violations remain below 0.1% while still achieving 90% of the throughput optimality of unconstrained agents. We validate on a commercial 5G testbed with 50 active users.
AI Summary
- Reward shaping framework incorporating network SLA safety constraints into RL training.
- Guarantees QoS violations below 0.1% while achieving 90% throughput optimality.
- Validated on commercial 5G testbed with 50 active users.
- Addresses the critical safety gap for deploying RL in live networks.
Key Findings
1. Hard safety constraints are more effective than penalty-based soft constraints.
2. Safety-constrained agents learn more conservative but reliable policies.
3. The framework supports dynamic SLA updates without retraining.
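The contrast between the two constraint styles in the findings above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function names, the penalty weight, and the safety mask are all assumptions. A soft constraint folds the SLA violation into the reward as a penalty, so the agent can still trade violations for throughput; a hard constraint masks unsafe actions out of the policy's choice set entirely.

```python
import numpy as np

def shaped_reward(throughput, qos_violation, penalty_weight=10.0):
    # Soft (penalty-based) constraint: subtract a penalty proportional to
    # the QoS violation. Large violations are discouraged but not forbidden.
    return throughput - penalty_weight * qos_violation

def safe_greedy_action(q_values, action_is_safe):
    # Hard constraint: mask actions predicted to violate the SLA before
    # the greedy argmax, so an unsafe action is never selected.
    masked = np.where(action_is_safe, q_values, -np.inf)
    return int(np.argmax(masked))

# Example: action 1 has the highest Q-value but is flagged unsafe,
# so the hard-constrained policy falls back to action 0.
q = np.array([1.0, 2.0, 0.5])
safe = np.array([True, False, True])
print(safe_greedy_action(q, safe))  # 0
print(shaped_reward(throughput=5.0, qos_violation=0.1))  # 4.0
```

Under the soft scheme the agent can still pick action 1 if the shaped reward works out in its favor, which is why the paper finds hard constraints more effective at keeping violations below the SLA threshold.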
Industry Implications
Removes a major barrier to deploying RL in production telecom networks.
Applicable to any network optimization task with QoS requirements.
Builds operator confidence in AI-driven autonomous network management.
Read the Original Paper
Access the full paper on arXiv for complete methodology, results, and references.
Related Papers
Transformer-Based Channel Estimation for Massive MIMO Systems
Tsinghua University — 12 citations
Federated Reinforcement Learning for Distributed Network Optimization
Stanford University — 8 citations
Neural Architecture Search for Efficient Edge AI in Wireless Networks
Samsung AI Center Seoul — 5 citations