AI/ML Papers · 14 min read · 7 citations

Token-Free Language Models for Efficient Telecom Log Analysis

Dr. Chen Li, Dr. Marco Fiore

IMDEA Networks / NEC Laboratories Europe

Jan 29, 2026

Abstract

Traditional LLMs struggle with telecom network logs because the logs' technical vocabulary and structured format align poorly with standard tokenization. We propose a byte-level, token-free language model designed specifically for telecom log analysis. The model processes raw byte sequences directly, avoiding the out-of-vocabulary issues that standard tokenizers commonly run into on network log data. On a benchmark of 1M real operator logs, our approach achieves 91% fault classification accuracy and generates root cause explanations that experts rate as helpful 85% of the time.

AI-Generated Summary
  • Byte-level token-free LM designed specifically for telecom log analysis.
  • 91% fault classification accuracy on 1M real operator logs.
  • Root cause explanations rated helpful 85% of the time by experts.
  • Avoids OOV issues common with standard tokenizers on network data.
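
To illustrate the out-of-vocabulary point, here is a minimal sketch contrasting a word-level lookup with byte-level encoding. It is not the authors' implementation: the toy vocabulary, the sample log line, and the helper names `word_ids` and `byte_ids` are all hypothetical, chosen only to show why a byte-level input can never hit an unknown token.

```python
def word_ids(line, vocab, unk_id=0):
    """Map whitespace-separated words to IDs; unseen words fall back to unk_id."""
    return [vocab.get(word, unk_id) for word in line.split()]


def byte_ids(line):
    """Map a log line to its UTF-8 byte values (0..255); no vocabulary is needed."""
    return list(line.encode("utf-8"))


# Hypothetical toy vocabulary and log line, for illustration only.
toy_vocab = {"link": 1, "down": 2, "on": 3, "port": 4}
log_line = "link down on port ge-0/0/1 cause=LOS"

w_ids = word_ids(log_line, toy_vocab)
b_ids = byte_ids(log_line)

# Fraction of words that the toy vocabulary could not represent.
oov_rate = w_ids.count(0) / len(w_ids)

print(w_ids)     # [1, 2, 3, 4, 0, 0]: the interface name and cause code are unknown
print(oov_rate)  # two of six words are out of vocabulary
print(bytes(b_ids).decode("utf-8") == log_line)  # True: byte encoding is lossless
```

Real subword tokenizers degrade more gracefully than this word-level lookup, typically by splitting unfamiliar identifiers into many short pieces rather than emitting an unknown token, but the byte-level view sidesteps the vocabulary question entirely.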

Key Findings

  1. Byte-level processing handles diverse log formats without preprocessing.
  2. The model learns meaningful representations of network protocol structures.
  3. Fine-tuning on operator-specific logs improves accuracy by 8% over the generic model.
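
The first finding can be sketched as follows: logs in different formats pass through the same encoding step with no format-specific parsing. This is an illustrative sketch only. The three sample lines, the `PAD_ID` value, the `max_len` of 64, and the `encode_batch` helper are assumptions, not details taken from the paper.

```python
PAD_ID = 256  # assumption: one extra ID beyond the 256 byte values, reserved for padding


def encode_batch(lines, max_len):
    """Encode log lines as fixed-length byte-ID sequences, truncating or padding."""
    batch = []
    for line in lines:
        # Truncation is byte-wise, so it may cut a multi-byte UTF-8 character.
        ids = list(line.encode("utf-8"))[:max_len]
        ids += [PAD_ID] * (max_len - len(ids))
        batch.append(ids)
    return batch


# Hypothetical log lines in three different formats (syslog-like, JSON, key=value).
logs = [
    "Jan 29 10:15:02 core-rtr1 bgpd[412]: neighbor 10.0.0.2 Down",
    '{"ts": 1706522400, "node": "enb-17", "event": "RRC_SETUP_FAIL"}',
    "ts=1706522400 node=enb-17 alarm=S1_LINK_FAILURE",
]

batch = encode_batch(logs, max_len=64)
for row in batch:
    print(len(row), row[:8])  # every row has the same length, whatever the format
```

The resulting rows could be fed to any sequence model with an embedding table of 257 entries. How the actual model handles long lines and padding is not stated in this summary.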

Industry Implications

Enables automated root cause analysis for faster network troubleshooting.

Reduces mean time to repair for network faults.

Applicable to multi-vendor environments with heterogeneous log formats.

LLM · Log Analysis · Fault Diagnosis · Telecom Operations
