Enterprise WLAN O&M
Network O&M refers to the routine maintenance that campus network administrators perform to keep networks running normally and stably. It includes routine monitoring, network inspection, device upgrade, troubleshooting, and network change. Network change refers to adjusting network service configurations or replacing network devices; that is, it involves the design and deployment of small-scale services or devices. Because design and deployment are covered in other chapters, they are not described here. Instead, this chapter focuses on the other items listed above.
ROUTINE MONITORING
To ensure network service quality, network administrators need to monitor running indicators and device status to understand how the entire network is operating. If network management requirements are not strict, administrators can use the built-in web network management system (NMS) for basic monitoring. If the requirements are strict, administrators need to use a standalone NMS that periodically collects monitoring data.
Monitoring Methods
1. Local web-based NMS
Wireless access controllers (WACs) and fat APs have built-in web network management functions, which network administrators can use to monitor key device indicators. However, because devices have limited storage space and data processing capability, the built-in web NMS can display only real-time monitoring information, whereas a standalone NMS can store monitoring information for several months.
2. SNMP mode
The Simple Network Management Protocol (SNMP), released in 1990, is the management protocol that traditional NMSs use to monitor and manage network devices efficiently and in batches. SNMP can also monitor network devices of different types and from different vendors in a unified manner. A traditional NMS obtains network monitoring data in two ways: by receiving trap alarms reported by devices, and by periodically reading management information base (MIB) node data from devices through SNMP. For example, eSight is a network management system used on traditional campus networks to monitor and manage multiple network devices, including WACs, APs, and switches.
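Periodic MIB reads usually return cumulative counters rather than rates, so an NMS derives a rate from the difference between two consecutive polls. The minimal sketch below assumes a 32-bit interface octet counter (the IF-MIB `ifInOctets` object) and uses made-up readings; the function names are hypothetical, and no real SNMP request is sent.

```python
# Minimal sketch: derive an interface bit rate from two SNMP polls.
# Readings and the 300 s interval are hypothetical; no SNMP I/O is performed.

IF_IN_OCTETS_OID = "1.3.6.1.2.1.2.2.1.10"  # IF-MIB::ifInOctets, a Counter32 column
COUNTER32_MODULUS = 2**32                  # Counter32 wraps to 0 after 2^32 - 1


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase of a wrapping counter between two polls.

    A smaller current value is treated as exactly one wrap. A device restart
    looks the same, so a real poller must rule that out separately
    (for example by also reading sysUpTime).
    """
    if current >= previous:
        return current - previous
    return modulus - previous + current


def rate_bps(previous: int, current: int, interval_s: float) -> float:
    """Average bit rate over one polling interval (octets * 8 / seconds)."""
    return counter_delta(previous, current) * 8 / interval_s


# Hypothetical ifInOctets readings taken 300 s apart.
print(rate_bps(1_000_000, 38_500_000, 300))      # no wrap
print(rate_bps(4_294_000_000, 5_000_000, 300))   # counter wrapped once
```

Because the NMS must read every counter on every device at each interval, this polling model is also the source of the load and granularity limits listed below.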
Traditional SNMP-based NMS has the following limitations:
- Traditional NMS is deployed on the user network, which makes it costly for small campus networks or for campus networks with many branches. As a result, many networks run without an NMS, leading to frequent network faults and poor user experience.
- Traditional NMS monitors the network from the perspective of devices, collecting alarms, logs, command output, or MIB data from devices to detect network faults. However, many network faults are not device faults and cannot be detected from the device side. To detect network faults comprehensively, the network also needs to be monitored from the perspective of users and applications.
- Traditional NMS obtains operation data by proactively and periodically polling devices. This is slow and consumes valuable resources. Because polling intervals are usually several minutes long, a traditional NMS also cannot detect micro faults that last only minutes or less. In addition, each time a large amount of data is retrieved, device CPU usage increases, which can affect the network services carried on the devices.
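The third limitation can be illustrated by comparing what a slow poller and a fine-grained collector see on the same data. In this sketch, the CPU trace, the 20-second spike, and the 300-second polling interval are all hypothetical values chosen for illustration.

```python
# Hypothetical illustration: a short CPU spike is invisible to a poller that
# reads the value only every 300 s, but obvious at 1 s granularity.

def cpu_trace(duration_s: int, spike_start: int, spike_len: int) -> list[int]:
    """Per-second CPU usage (%): a 30% baseline with one short spike to 95%."""
    return [95 if spike_start <= t < spike_start + spike_len else 30
            for t in range(duration_s)]


def sample(trace: list[int], interval_s: int) -> list[int]:
    """Values a collector would see when reading the trace every interval_s seconds."""
    return trace[::interval_s]


def peak(values: list[int]) -> int:
    """Highest value observed by the collector."""
    return max(values)


# 10 minutes of data with a 20 s spike starting at t = 130 s.
trace = cpu_trace(duration_s=600, spike_start=130, spike_len=20)
print("peak seen by 300 s polling:", peak(sample(trace, 300)))
print("peak seen by 1 s sampling:", peak(sample(trace, 1)))
```

The 300-second poller reads only t = 0 s and t = 300 s, so it reports a flat 30% and misses the fault entirely.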
3. Telemetry mode
Telemetry is a next-generation network monitoring technology that offers the following improvements over traditional NMS:
- Telemetry uses the cloud management mode. The network monitoring service is deployed on a cloud server so that all devices are managed centrally. Deployment and maintenance costs are low, and network administrators can monitor the network anytime, anywhere.
- Telemetry monitors device, network, user, and application faults. Devices integrate network-, device-, user-, and application-level performance probes that detect the running status and quality of each layer at intervals of minutes or even seconds. The data is then packed and sent to the server, which processes and integrates it to generate a running track and trend chart for each application. This gives administrators comprehensive data to support subsequent network fault analysis and location.
- Based on the raw data sent through telemetry, the server automatically evaluates quality and converts hard-to-interpret performance indicator data into quality scores, helping network administrators quickly identify the network running status.
- Telemetry provides big data storage and mining capabilities. Because telemetry collects large volumes of device data at second-level granularity, the server side provides big data storage and query functions to support long-term storage of large amounts of network data. Historical network data generally needs to be retained for anywhere from at least three months to more than six months to support historical problem analysis and troubleshooting.
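The conversion of raw indicators into quality scores can be sketched as a weighted average of per-indicator scores. The indicators, thresholds, weights, and linear scoring rule below are assumptions for illustration only; the scoring model of any particular cloud management platform is not described here and is likely more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """One raw indicator and the hypothetical thresholds used to score it."""
    value: float
    good: float    # at or better than this value, the indicator scores 100
    bad: float     # at or worse than this value, the indicator scores 0
    weight: float  # relative importance in the overall score


def kpi_score(k: Kpi) -> float:
    """Map a raw value linearly onto 0-100.

    Works for "higher is better" indicators (good > bad, e.g. RSSI) and
    "lower is better" indicators (good < bad, e.g. packet loss rate).
    """
    fraction = (k.value - k.bad) / (k.good - k.bad)
    return 100.0 * min(1.0, max(0.0, fraction))


def quality_score(kpis: dict[str, Kpi]) -> float:
    """Weighted average of the per-indicator scores."""
    total_weight = sum(k.weight for k in kpis.values())
    return sum(kpi_score(k) * k.weight for k in kpis.values()) / total_weight


# Hypothetical readings for one STA.
sta_kpis = {
    "rssi_dbm": Kpi(value=-75, good=-65, bad=-85, weight=4),
    "retransmission_pct": Kpi(value=10, good=5, bad=50, weight=3),
    "packet_loss_pct": Kpi(value=0.5, good=1, bad=10, weight=3),
}
print(round(quality_score(sta_kpis), 1))
```

Clamping each indicator to 0-100 keeps one extreme reading from dominating the overall score; the weights then express which indicators the administrator cares about most.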
Major Monitoring Metrics
Table 10.1 lists the device monitoring indicators that need to be considered when designing a typical wireless local area network (WLAN).
| Indicator Category | Indicator Name | Indicator Items | Description |
| --- | --- | --- | --- |
| Device information (WAC and AP) | Basic equipment information | Device name, model, hardware PCB, bill of materials (BOM), memory and storage space size, MAC address, and system time | Basic information about the device, which is generally static |
| | Equipment status | Startup time, online duration, restart cause, CPU usage, memory usage, AP online status, AP online failure/offline cause, port working status, negotiated rate, and number of received and sent packets | Dynamic information that indicates the basic running status of the device. Generally, a device with abnormal indicators sends a trap alarm or performs recovery actions. For example, if memory usage keeps increasing, a memory leak may have occurred; in this case, the device generates an alarm or even restarts to rectify the fault |
| RF and air interface information | RF information | Operating status, frequency band, channel, bandwidth, working mode, and RF transmit power | Basic RF specifications; the channel and RF transmit power require special attention |
| | Air interface environment | Channel usage, packet loss rate, packet error rate, retransmission rate, co-channel interference strength, adjacent-channel interference strength, and non-Wi-Fi interference strength | Important indicators of air interface quality. When air interface quality is poor, users may not get a good experience. In this case, check the interference source, adjust optimization parameters, or even revise the radio network planning |
| STA information | Basic information | User MAC address, user name, STA type, authentication mode, associated AP/AP group, access service set identifier (SSID), online duration, online failure/offline cause, and application | Used for user data analysis, such as statistics on STA type proportions and user behavior analysis (e.g., stay duration and distribution of accessed application types) |
| | Key service indicators | Online success rate, online delay, frequency band, received signal strength indicator (RSSI), signal-to-noise ratio (SNR), negotiated rate, throughput, retransmission rate, packet loss rate, and roaming trajectory | Key air interface indicators reflect whether the air interface experience is smooth. They can also be used to analyze network load, user roaming status, and the running status of key algorithms. For example, network administrators can check whether the spectrum navigation function works properly based on indicators such as the proportion of users on 5 GHz, radio load, and signal strength |
| Service information | Service indicators | For example, the application, start time, end time, IP addresses of both parties, mean opinion score (MOS), jitter, delay, and packet loss rate of a voice or video session | These indicators can be used to observe service quality in the actual environment |
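The equipment status row notes that continuously increasing memory usage may indicate a memory leak. One simple way to flag such a trend is to fit a least-squares line to recent samples and check its slope. The threshold (0.5 percentage points per hour), the minimum number of samples, and the function names are hypothetical; devices and NMSs apply their own, vendor-specific detection logic.

```python
# Minimal sketch: flag a sustained upward memory-usage trend.
# Thresholds and sample data are hypothetical.

def slope_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hour, memory_usage_percent) samples, in % per hour."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        return 0.0  # all samples share one timestamp, so no trend can be measured
    return cov / var


def possible_memory_leak(samples: list[tuple[float, float]],
                         min_slope: float = 0.5,
                         min_points: int = 6) -> bool:
    """True if memory usage rises by at least min_slope % per hour on average."""
    if len(samples) < min_points:
        return False  # too little history to judge a trend
    return slope_per_hour(samples) >= min_slope


growing = [(h, 40.0 + h) for h in range(6)]                     # +1 % per hour
steady = [(h, 40.0 if h % 2 == 0 else 41.0) for h in range(6)]  # fluctuates, no trend
print(possible_memory_leak(growing), possible_memory_leak(steady))
```

A regression slope is less sensitive to a single noisy reading than comparing only the first and last samples, which helps avoid false alarms on memory usage that merely fluctuates.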