Mixture-of-Experts Transformers for Scalable 6G Signal Processing
Dr. Yifan Chen, Prof. Deniz Gunduz
Imperial College London
Abstract
We propose a Mixture-of-Experts (MoE) transformer architecture for 6G physical layer signal processing that dynamically activates only the relevant expert sub-networks based on current channel conditions. This conditional computation approach matches the accuracy of a dense 1B-parameter model while activating only 200M parameters per inference, enabling real-time deployment on edge hardware. Experiments on OFDM channel estimation and MIMO detection demonstrate 2-3 dB gains over standard transformers at 5x lower computational cost.
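To make the conditional-computation idea concrete, the following is a minimal sketch of a sparsely activated MoE feed-forward layer with top-k routing. The dimensions, expert count, and top-k value are illustrative assumptions, not the paper's configuration; the point is that only a small fraction of expert parameters runs per token.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 256      # token embedding size (assumed)
D_HIDDEN = 1024    # expert hidden size (assumed)
N_EXPERTS = 8      # number of expert feed-forward blocks (assumed)
TOP_K = 2          # experts activated per token (assumed)

# Expert parameters: each expert is a two-layer feed-forward block.
W_in = rng.standard_normal((N_EXPERTS, D_MODEL, D_HIDDEN)) * 0.02
W_out = rng.standard_normal((N_EXPERTS, D_HIDDEN, D_MODEL)) * 0.02

# Router: maps each token to a score per expert.
W_gate = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def moe_layer(tokens):
    """Route each token to its top-k experts and mix their outputs.

    tokens: array of shape (n_tokens, D_MODEL).
    Only TOP_K of N_EXPERTS experts run per token, so the active
    parameter count is roughly TOP_K / N_EXPERTS of the total
    expert parameters (the 200M-of-1B effect described above).
    """
    scores = softmax(tokens @ W_gate)               # (n_tokens, N_EXPERTS)
    top = np.argsort(-scores, axis=-1)[:, :TOP_K]   # chosen expert indices

    out = np.zeros_like(tokens)
    for t, token in enumerate(tokens):
        picked = top[t]
        weights = scores[t, picked] / scores[t, picked].sum()
        for w, e in zip(weights, picked):
            hidden = np.maximum(token @ W_in[e], 0.0)   # ReLU expert FFN
            out[t] += w * (hidden @ W_out[e])
    return out


if __name__ == "__main__":
    x = rng.standard_normal((4, D_MODEL))
    print(moe_layer(x).shape)   # (4, 256)
```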
AI Summary
- MoE transformer architecture for 6G PHY that conditionally activates expert sub-networks.
- Achieves 1B-model accuracy with only 200M active parameters per inference.
- 2-3 dB gains over standard transformers at 5x lower computational cost.
- Validated on OFDM channel estimation and MIMO detection tasks.
Key Findings
1. Channel-condition-based routing outperforms random or load-balanced expert selection (see the sketch after this list).
2. Sparse activation enables deployment on base station controllers with limited GPU memory.
3. MoE models generalize better across diverse deployment scenarios than dense models.
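The sketch below illustrates finding 1: a gate that selects experts from summary channel statistics, contrasted with a uniformly random baseline. The feature set (SNR, delay spread, Doppler) and the linear gate are illustrative assumptions, not the paper's exact routing design.

```python
import numpy as np

rng = np.random.default_rng(1)
N_EXPERTS = 8
N_FEATURES = 3  # [snr_db, delay_spread_us, doppler_hz] (assumed features)

# Hypothetical linear gate over channel statistics.
W_cond = rng.standard_normal((N_FEATURES, N_EXPERTS)) * 0.1


def route_on_channel(channel_features, top_k=2):
    """Pick experts from channel-condition features (finding 1)."""
    logits = channel_features @ W_cond
    return np.argsort(-logits)[:top_k]


def route_random(top_k=2):
    """Baseline: uniformly random expert selection."""
    return rng.choice(N_EXPERTS, size=top_k, replace=False)


# High-mobility, low-SNR channel vs. static, high-SNR channel
print(route_on_channel(np.array([5.0, 2.5, 900.0])))
print(route_on_channel(np.array([25.0, 0.1, 10.0])))
print(route_random())
```

Different channel regimes activate different experts, whereas the random baseline ignores the channel entirely.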
Industry Implications
- Enables practical deployment of large AI models at the network edge for 6G.
- Reduces the compute cost barrier for AI-native air interfaces.
- Provides a scalable architecture that grows with network complexity.
Read the Original Paper
Access the full paper on arXiv for complete methodology, results, and references.