AI/ML Papers · 15 min read · 11 citations

Reward Shaping for Safe Reinforcement Learning in Network Control

Dr. Tianyu Wang, Prof. Robert Schober

University of Erlangen-Nuremberg

Jan 26, 2026

Abstract

Deploying reinforcement learning (RL) agents in live networks risks unsafe actions that degrade service. We propose a reward shaping framework that incorporates safety constraints from network SLAs directly into the RL training process. Our constrained RL approach guarantees that QoS violations remain below 0.1% while still achieving 90% of the throughput optimality of unconstrained agents. We validate on a commercial 5G testbed with 50 active users.
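The shaped-reward idea can be sketched as a throughput objective minus a penalty for SLA violations. This is a minimal illustration, not the paper's actual formulation: the function name, the latency SLA threshold, and the penalty weight below are assumptions chosen for clarity.

```python
def shaped_reward(throughput_mbps: float, latency_ms: float,
                  sla_latency_ms: float = 10.0, penalty_weight: float = 5.0) -> float:
    """Illustrative shaped reward for network control.

    Rewards throughput, and subtracts a penalty proportional to how far
    the observed latency exceeds the SLA threshold. Threshold and weight
    are hypothetical values, not from the paper.
    """
    violation = max(0.0, latency_ms - sla_latency_ms)  # zero when SLA is met
    return throughput_mbps - penalty_weight * violation
```

Because the SLA threshold enters only through the reward signal, a soft penalty like this cannot by itself guarantee a violation rate below 0.1%; the paper's constrained approach is what provides that guarantee.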

AI Summary

  • Reward shaping framework incorporating network SLA safety constraints into RL training.
  • Guarantees QoS violations below 0.1% while achieving 90% throughput optimality.
  • Validated on commercial 5G testbed with 50 active users.
  • Addresses the critical safety gap for deploying RL in live networks.

Key Findings

  1. Hard safety constraints are more effective than penalty-based soft constraints.
  2. Safety-constrained agents learn more conservative but reliable policies.
  3. The framework supports dynamic SLA updates without retraining.
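The first and third findings can be sketched together: a hard constraint can be enforced at action-selection time by masking out actions whose predicted QoS would violate the SLA, and because the SLA threshold is a runtime parameter rather than something baked into the learned values, it can be updated without retraining. Everything below (function name, per-action latency estimates, fallback rule) is a hypothetical illustration, not the paper's mechanism.

```python
def select_safe_action(q_values: list[float], predicted_latency_ms: list[float],
                       sla_latency_ms: float) -> int:
    """Hard-constraint sketch: act greedily over the SLA-safe action set.

    Actions whose predicted latency exceeds the SLA are masked out entirely,
    rather than merely penalized in the reward. If no action is safe, fall
    back to the least-violating one.
    """
    safe = [i for i, lat in enumerate(predicted_latency_ms) if lat <= sla_latency_ms]
    if not safe:
        # Fallback: minimize the predicted violation when the safe set is empty.
        return min(range(len(q_values)), key=lambda i: predicted_latency_ms[i])
    return max(safe, key=lambda i: q_values[i])
```

Tightening the SLA (e.g. lowering `sla_latency_ms` from 10 ms to 8 ms) simply shrinks the safe set at the next decision step, which is one way a framework could support dynamic SLA updates without retraining the value estimates.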

Industry Implications

  • Removes a major barrier to deploying RL in production telecom networks.
  • Applicable to any network optimization task with QoS requirements.
  • Builds operator confidence in AI-driven autonomous network management.

Safe RL · Reward Shaping · Network Control · QoS

Read the Original Paper

Access the full paper on arXiv for complete methodology, results, and references.

