Time:2019-12-31
Network functions have evolved from traditional multi-vendor devices (MIPS, x86, NP) built on physical machines without uniform standards to uniform physical machines based on x86 servers. Upper-layer applications were virtualized and have continued to evolve toward cloud and cloud native. In just 10 years, 4G has evolved into 5G, and 5G is expected to mature in 2020.
With 4G evolving into 5G, traditional telecom equipment has entered the era of virtualization and cloudification, and its hardware and software architecture has changed dramatically. The telecommunications industry has adopted many IT software architectures, ideas, and methods. Open source, apps, and infrastructure decoupling bring many benefits and conveniences to telecom applications, but they also bring new problems that have a great impact on telecom O&M modes.
With traditional 2G/3G/4G equipment, when a service fails, O&M personnel do not need to distinguish hardware faults from software faults. The equipment first performs an active/standby switchover of both software and hardware to recover the service, and the fault is located afterwards.
In the current cloud era, infrastructure is centralized. The central DC contains a large number of physical devices (>1000), and network functions are distributed across different physical nodes. Once a fault occurs, the original active/standby switchover mode is no longer effective. Instead, a more effective automatic identification capability is required to quickly determine whether the root cause lies in the hardware, the cloud platform, or the upper-layer VNF, so that the fault can be effectively isolated and recovered.
Implementing automatic and intelligent network O&M is a key technology for 5G maturity. Intelligent monitoring and fault analysis are critical for automatic O&M.
1. Intelligent Monitoring
The purpose of intelligent monitoring is to automatically detect system operation errors and trigger the next step of fault analysis, so as to locate the root cause of system errors and quickly fix faults.
Intelligent monitoring can be implemented in two modes: direct mode and indirect mode.
Direct mode monitors the indicators of the environment, hardware (computing, storage, and network), Cloud OS, and other key facilities directly. Once abnormal data appears, the system immediately raises an alarm and locates the fault.
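Direct mode can be illustrated with a simple range check on collected indicators. The following is only a minimal sketch: the indicator names, the limits, and the `check_sample` helper are hypothetical and are not taken from any specific monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Hypothetical limits, chosen only for illustration.
THRESHOLDS = {
    "cpu_temp_celsius": Threshold(5.0, 85.0),
    "disk_usage_percent": Threshold(0.0, 90.0),
    "nic_rx_error_rate": Threshold(0.0, 0.01),
}


def check_sample(source, metrics):
    """Return one alarm dict per indicator that falls outside its allowed range."""
    alarms = []
    for name, value in metrics.items():
        limit = THRESHOLDS.get(name)
        if limit is None:
            continue  # no rule defined for this indicator
        if not (limit.low <= value <= limit.high):
            alarms.append({"source": source, "metric": name, "value": value})
    return alarms


# One sample from a hypothetical server: only the temperature is out of range.
alarms = check_sample(
    "host-17", {"cpu_temp_celsius": 92.0, "disk_usage_percent": 40.0}
)
```

Because each alarm carries the name of the resource that reported it, the fault location is known as soon as the alarm is raised, which is what distinguishes direct mode from indirect mode.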
Indirect mode is to monitor KPIs of 5G services and make comparative analysis from multiple dimensions, and then judge whether a fault occurs, so as to trigger further fault association analysis and location. The multi-dimensional comparison analysis can be performed from the following aspects:
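A minimal sketch of indirect mode, assuming a time dimension (the current KPI value compared with its own recent history) as one possible comparison. The KPI, the sample values, and the three-sigma limit are assumptions made for illustration; a real system would combine several dimensions.

```python
from statistics import mean, stdev


def kpi_is_abnormal(history, current, max_sigma=3.0):
    """Flag the current KPI value if it deviates strongly from its own history."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current != baseline
    return abs(current - baseline) / spread > max_sigma


# Hypothetical registration success rates (percent) for the same hour on previous days.
history = [99.2, 99.4, 99.1, 99.3, 99.5]

normal_case = kpi_is_abnormal(history, 99.3)  # value close to the baseline
drop_case = kpi_is_abnormal(history, 91.0)    # sharp drop in the success rate
```

A flagged KPI does not say where the fault is. It only serves as the trigger for the association analysis that follows.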
2. Fault Analysis
Fault root cause analysis can start from two sources: alarms and logs. After the monitoring system detects an abnormality, it triggers vertical, layer-by-layer association analysis of alarms and logs. At the same time, it performs horizontal association analysis among micro-services and among NFs to locate the fault source.
(1) Vertical association
Vertical association is based on the vertical architecture, which consists of the physical layer, the virtual layer, and the application layer. If an underlying layer is faulty, upper-layer services are affected.
Figure 1 3-Layer Architecture
Key problem solved by vertical association: once a fault occurs in the underlying hardware or virtual layer, upper-layer service KPIs become abnormal. Vertical association links upper-layer service exceptions to underlying faults in order to identify where the root cause lies (PIM layer, VIM layer, or the VNF service itself).
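The idea of vertical association can be sketched as a walk from the physical layer upward. This is only an illustration under stated assumptions: the inventory mappings, the resource names, and the `locate_fault_layer` helper are hypothetical, and alarms are reduced to a set of resource names.

```python
# Hypothetical deployment inventory: which VM runs each VNF,
# and which physical server hosts each VM.
VNF_TO_VM = {"amf-1": "vm-11", "smf-1": "vm-12"}
VM_TO_HOST = {"vm-11": "host-3", "vm-12": "host-4"}


def locate_fault_layer(vnf, active_alarms):
    """Return (layer, resource) for the lowest layer that has an active alarm.

    active_alarms is a set of resource names that currently have an alarm.
    If neither the host nor the VM is alarmed, the VNF itself is the suspect.
    """
    vm = VNF_TO_VM[vnf]
    host = VM_TO_HOST[vm]
    if host in active_alarms:
        return ("PIM", host)
    if vm in active_alarms:
        return ("VIM", vm)
    return ("VNF", vnf)


# The KPIs of amf-1 are abnormal while both its host and its VM raise alarms:
# the physical layer is reported as the root, the VM alarm as a consequence.
result = locate_fault_layer("amf-1", {"host-3", "vm-11"})
```

Checking the lowest layer first reflects the statement above that an underlying fault propagates upward, so an alarm on a higher layer is treated as a symptom whenever a lower layer is also alarmed.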
(2) Horizontal association
Figure 2 Different Micro-Services Clusters in One VNF
Figure 3 Interconnection of Different VNFs
In summary, horizontal association links the fault observed in one VNF to the fault in another service-related VNF, and links the fault observed in one micro-service to the micro-service that is actually faulty. In this way, it identifies the micro-service or component at the application layer that is the real source of the fault.
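Horizontal association can be sketched with a dependency map between services. The service names, the dependency map, and the `find_root_faults` helper below are hypothetical, and the rule used (an abnormal service whose dependencies are all healthy is the likely source) is a simplifying assumption rather than a complete algorithm.

```python
# Hypothetical call dependencies: each service maps to the services it calls.
DEPENDS_ON = {
    "session-mgmt": ["policy", "db-proxy"],
    "policy": ["db-proxy"],
    "db-proxy": [],
}


def find_root_faults(abnormal):
    """Return the abnormal services that have no abnormal dependency.

    A service that is abnormal while one of its dependencies is also abnormal
    is treated as a likely victim rather than a cause.
    """
    roots = []
    for service in sorted(abnormal):
        deps = DEPENDS_ON.get(service, [])
        if not any(dep in abnormal for dep in deps):
            roots.append(service)
    return roots


# All three services report errors, but only db-proxy has no abnormal dependency.
roots = find_root_faults({"session-mgmt", "policy", "db-proxy"})
```

The same rule applies between VNFs if the map describes which VNF calls which, instead of which micro-service calls which.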
3. Common Technologies
Common technologies include: data collection, data classification (cleaning), data monitoring, data association, and layer-by-layer positioning.
Key technologies of automatic O&M include fault monitoring, fault root cause analysis, fault self-healing, a global perspective, all-round cross-domain data collection, network topology management, one-click automatic tests, and one-click automatic service deployment. They are all mature commercial capabilities for intelligent O&M. Intelligent and simplified O&M can reduce the complexity of system maintenance brought by the 5G cloud native and service-based software architecture, and keeps the focus on 5G services.