Knowledge of server monitoring

Release time: August 22, 2019 13:55:38

The main purpose of monitoring is to sample and record important indicators. When one of these indicators changes significantly, the monitoring system works with the alerting system to report the problem to the person in charge. Monitoring can be very fine-grained, or it can cover only the main indicators.

Log monitoring

01



Business logic monitoring relies mainly on logs. Once enough logging is in place, the question becomes how to use it. Watching the exception log file for changes surfaces new exceptions by type and count. Some exceptions are tied to a specific subsystem, so monitoring a particular exception can reflect the status of that subsystem.
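
Counting exceptions by type and comparing successive samples can be sketched as follows. This is a minimal illustration, not a definitive implementation: the log line format, the regular expression, and the function names `count_exceptions` and `new_exceptions` are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: every exception line contains a name ending in "Error" or "Exception".
EXCEPTION_NAME = re.compile(r"\b(\w*(?:Error|Exception))\b")

def count_exceptions(lines):
    """Count exception occurrences by type name."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_NAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def new_exceptions(previous, current):
    """Return each type whose count grew since the previous sample, with the increase."""
    return {
        name: count - previous.get(name, 0)
        for name, count in current.items()
        if count > previous.get(name, 0)
    }

# Hypothetical log lines, for illustration only.
sample = [
    "2019-08-22 13:55:01 TypeError: x is not a function",
    "2019-08-22 13:55:02 DatabaseError: connection refused",
    "2019-08-22 13:55:03 TypeError: y is undefined",
]
counts = count_exceptions(sample)
print(new_exceptions({"TypeError": 1}, counts))
```

An alert rule would then fire on any non-empty result, or only on types mapped to a critical subsystem.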


Besides exception logs, access logs can be monitored to obtain the actual QPS (queries per second) of the business. Tracking QPS over time shows how business load is distributed.


PV (page views) and UV (unique visitors) can also be derived from the access log. As with QPS, monitoring PV and UV reveals user habits and helps predict access peaks.
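
As a rough sketch, QPS, PV, and UV can all be computed from parsed access-log records. The record shape `(timestamp_seconds, client_ip, path)` is an assumption for this example, and counting distinct client IPs is only one approximation of UV.

```python
from collections import Counter

def qps_per_second(records):
    """Map each whole second to the number of requests seen in it."""
    return Counter(int(ts) for ts, _ip, _path in records)

def pv_uv(records):
    """PV is the total request count; UV is approximated by distinct client IPs."""
    pv = len(records)
    uv = len({ip for _ts, ip, _path in records})
    return pv, uv

# Hypothetical parsed records: (timestamp_seconds, client_ip, path).
records = [
    (1566453338.1, "10.0.0.1", "/"),
    (1566453338.7, "10.0.0.2", "/news"),
    (1566453339.2, "10.0.0.1", "/about"),
]
per_second = qps_per_second(records)
peak_qps = max(per_second.values())
print(peak_qps, pv_uv(records))
```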


Response time

02



Response time should also be monitored. When a subsystem hits an exception or a performance bottleneck, the response time of the whole system grows. Response time can be measured at a reverse proxy such as Nginx, or from the access log the application itself writes. A healthy system has a stable response time with little fluctuation.
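
One way to turn "little fluctuation" into a check is to summarize a window of response times and look at the spread relative to the mean. The sketch below assumes the samples are already extracted in milliseconds; the coefficient-of-variation test and its 0.5 threshold are illustrative choices, not values from this article.

```python
import math
import statistics

def response_time_summary(samples_ms):
    """Summarize response times (milliseconds) taken from a proxy or access log."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile.
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
    return {
        "mean": statistics.fmean(ordered),
        "p95": p95,
        "stdev": statistics.pstdev(ordered),
    }

def is_volatile(samples_ms, max_cv=0.5):
    """Flag a window whose standard deviation is large relative to its mean.

    max_cv is an assumed threshold; tune it against real traffic.
    """
    summary = response_time_summary(samples_ms)
    if summary["mean"] == 0:
        return False
    return summary["stdev"] / summary["mean"] > max_cv

steady = [100, 102, 98, 101]
spiky = [100] * 19 + [1000]
print(is_volatile(steady), is_volatile(spiky))
```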

Process monitoring

03



Log and response-time monitoring reflect the state of the system well, but both assume the system is running, so process monitoring is more critical than the first two. Process monitoring generally checks how many application processes are running in the operating system. For example, a web application with a multi-process architecture needs its worker process count checked; if the count falls below the expected value, an alert should be raised.
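
A minimal sketch of such a check is shown below. It assumes a Unix-like host where `ps -eo comm=` prints one command name per line; the process name `node` and the expected count are placeholders chosen for the example.

```python
import subprocess

def read_process_names():
    """Return the output of `ps -eo comm=` (one command name per line, no header)."""
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return result.stdout

def count_processes(ps_output, name):
    """Count lines of the ps output that are exactly the given command name."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)

def check_workers(ps_output, name, expected):
    """Return an alert message when fewer processes run than expected, else None."""
    running = count_processes(ps_output, name)
    if running < expected:
        return f"ALERT: {running} of {expected} '{name}' processes running"
    return None

# Hypothetical ps output, for illustration only.
sample_output = "systemd\nnode\nnode\nnginx\nnode\n"
print(check_workers(sample_output, "node", expected=4))
```

On a live host the first argument would be `read_process_names()` instead of the sample string.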


Disk monitoring

04



Disk monitoring mainly tracks disk usage. Frequent log writes gradually consume disk space, and once the disk fills up the system runs into all kinds of problems. Set an upper limit for disk usage; when usage exceeds the warning value, the server administrator should clean up logs or free disk space.
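
The usage check can be sketched with the standard library's `shutil.disk_usage`. The 80% warning value below is an assumed default, not a figure from this article.

```python
import shutil

def disk_usage_ratio(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total

def disk_alert(ratio, warn_at=0.8):
    """True when usage has reached the warning value (warn_at is an assumption)."""
    return ratio >= warn_at

ratio = disk_usage_ratio(".")
print(f"{ratio:.1%} used, alert={disk_alert(ratio)}")
```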


Memory monitoring

05



In Node, a memory leak is not easy to troubleshoot once it occurs. Monitoring the server's memory usage helps reveal whether the application is leaking. If memory only rises and never falls, a leak is very likely. Healthy memory usage goes up and down: it rises when traffic is high and falls back when traffic drops.
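
The "rises but never falls" pattern can be checked over a series of memory samples. This is a deliberately strict sketch: real measurements are noisy, so a production check would likely smooth the series first. The function name and the 50 MB growth floor are assumptions.

```python
def looks_like_leak(samples_mb, min_growth_mb=50):
    """True when memory never decreases across the window and grows noticeably.

    samples_mb: memory readings in megabytes, oldest first.
    min_growth_mb: assumed floor so that a flat line is not reported as a leak.
    """
    if len(samples_mb) < 2:
        return False
    never_falls = all(later >= earlier
                      for earlier, later in zip(samples_mb, samples_mb[1:]))
    return never_falls and samples_mb[-1] - samples_mb[0] >= min_growth_mb

healthy = [200, 350, 420, 260, 210]   # rises with traffic, then falls back
leaking = [200, 230, 275, 310, 360]   # only ever rises
print(looks_like_leak(healthy), looks_like_leak(leaking))
```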


If a process leaks memory and the root cause cannot be fixed right away, there is a workaround for service clusters with a multi-process architecture. Each worker process is given a maximum number of requests to serve. Once it reaches that number, it stops accepting new connections, and the master process starts a new worker to take over. The old worker exits after all of its connections have closed. This limits the impact of a leak even while the leak itself remains. However, it only sidesteps the problem by treating the symptom, and it is not recommended.
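
The recycling scheme can be illustrated with a small in-memory model. It does not start real processes; the `Worker` and `Master` classes are hypothetical names used only to show the request-count bookkeeping.

```python
class Worker:
    """Stands in for a worker process that serves a limited number of requests."""

    def __init__(self, worker_id, max_requests):
        self.worker_id = worker_id
        self.max_requests = max_requests
        self.served = 0

    def handle(self):
        self.served += 1

    @property
    def exhausted(self):
        return self.served >= self.max_requests


class Master:
    """Stands in for the master process that replaces exhausted workers."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.next_id = 1
        self.retired = []
        self.current = self._spawn()

    def _spawn(self):
        worker = Worker(self.next_id, self.max_requests)
        self.next_id += 1
        return worker

    def dispatch(self):
        """Route one request, starting a fresh worker when the current one is used up."""
        if self.current.exhausted:
            # A real old worker would exit once its open connections close.
            self.retired.append(self.current)
            self.current = self._spawn()
        self.current.handle()
        return self.current.worker_id


master = Master(max_requests=3)
served_by = [master.dispatch() for _ in range(7)]
print(served_by)
```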


In short, monitoring memory and observing it over a long period is a good way to prevent system exceptions. If memory usage suddenly becomes abnormal, the history also helps track down which recent code change caused it.


CPU usage monitoring

06



Monitoring the server's CPU usage is also essential. CPU usage is divided into user mode, kernel mode, IOWait, and so on. High user-mode usage means the applications on the server need a lot of CPU; high kernel-mode usage means the server spends a lot of time on process scheduling or system calls; IOWait is the share of time the CPU spends waiting for disk I/O.


As a rule of thumb, the CPU is healthy when user-mode usage is below 70%, kernel-mode usage is below 35%, and overall usage is below 70%. Monitoring CPU usage helps analyze how the application behaves under real business load, and sensible thresholds give good early warning.
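
On Linux, these percentages can be derived from two readings of the aggregate `cpu` line in `/proc/stat`. The sketch below works on the line text so it can be tried anywhere; grouping `nice` with user mode and `irq`/`softirq` with kernel mode is an assumption of this example, and the sample lines are made up.

```python
FIELD_NAMES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    fields = line.split()
    values = [int(v) for v in fields[1:1 + len(FIELD_NAMES)]]
    return dict(zip(FIELD_NAMES, values))

def cpu_percentages(before, after):
    """Compute usage percentages from the difference between two samples."""
    delta = {k: after[k] - before.get(k, 0) for k in after}
    total = sum(delta.values())
    if total == 0:
        return {"user": 0.0, "kernel": 0.0, "iowait": 0.0, "overall": 0.0}
    user = delta.get("user", 0) + delta.get("nice", 0)
    kernel = delta.get("system", 0) + delta.get("irq", 0) + delta.get("softirq", 0)
    iowait = delta.get("iowait", 0)
    busy = total - delta.get("idle", 0) - iowait
    return {
        "user": 100 * user / total,
        "kernel": 100 * kernel / total,
        "iowait": 100 * iowait / total,
        "overall": 100 * busy / total,
    }

def is_healthy(p):
    """Apply the thresholds from the text: user < 70, kernel < 35, overall < 70."""
    return p["user"] < 70 and p["kernel"] < 35 and p["overall"] < 70

# Made-up samples of the first line of /proc/stat, taken some seconds apart.
before = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
after = parse_cpu_line("cpu  400 0 150 1300 150 0 0 0 0 0")
usage = cpu_percentages(before, after)
print(usage, is_healthy(usage))
```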


CPU load monitoring

07



CPU load, also called the CPU load average, describes how busy the operating system currently is. It can be understood simply as the average number of processes that are using the CPU or waiting to use it per unit of time. It has three indicators: the 1-minute, 5-minute, and 15-minute load averages. A high CPU load means there are too many processes; in Node, this can happen when the child_process module is used to start new processes repeatedly. Monitoring this value helps prevent such surprises.
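
The three load averages are easier to judge when divided by the number of CPU cores. The following sketch reads them with `os.getloadavg()` where available; the rule "5-minute and 15-minute load per core both above 1.0" is an assumed alert condition, not one stated in this article.

```python
import os

def load_per_core(load_averages, cores):
    """Normalize the (1 min, 5 min, 15 min) load averages by core count."""
    return tuple(load / cores for load in load_averages)

def sustained_overload(load_averages, cores, limit=1.0):
    """True when both the 5-minute and 15-minute per-core loads exceed limit."""
    _one, five, fifteen = load_per_core(load_averages, cores)
    return five > limit and fifteen > limit

def read_load():
    """Return the current load averages, or None where they are unavailable."""
    if not hasattr(os, "getloadavg"):  # for example on Windows
        return None
    try:
        return os.getloadavg()
    except OSError:
        return None

current = read_load()
if current is not None:
    cores = os.cpu_count() or 1
    print(current, sustained_overload(current, cores))
```

Ignoring the 1-minute value in the alert avoids reacting to short bursts.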


I/O load

08



I/O load mainly refers to disk I/O and reflects read and write activity on the disk. Applications written in Node are mostly network services, so they are unlikely to produce a high I/O load themselves; most I/O pressure comes from the database. Whether or not the Node process shares a server with a database or other I/O-intensive applications, this value should be monitored just in case.


Network monitoring

09



Although network traffic monitoring has a lower priority than the items above, traffic should still be monitored and given an upper limit. If the application suddenly becomes popular, the traffic surge also shows how effective the website's promotion has been. Once traffic exceeds the alert value, developers should find out why it is growing. If the growth is legitimate, evaluate whether to add hardware to serve more users. The two main indicators for network traffic monitoring are inbound traffic and outbound traffic.
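
On Linux, inbound and outbound byte counters per interface are exposed in `/proc/net/dev`, and a rate follows from two readings. The sketch below parses the file's text so it can run on a sample; the interface names, counter values, and alert limit are made up for illustration.

```python
def parse_net_dev(text):
    """Map interface name to (received_bytes, transmitted_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 9:
            continue
        # Field 0 is receive bytes; field 8 is transmit bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def traffic_rate(before, after, interval_seconds):
    """Return (inbound, outbound) bytes per second between two counter readings."""
    inbound = (after[0] - before[0]) / interval_seconds
    outbound = (after[1] - before[1]) / interval_seconds
    return inbound, outbound

def exceeds_limit(rate, limit_bytes_per_second):
    """True when either direction is above the alert value."""
    return rate[0] > limit_bytes_per_second or rate[1] > limit_bytes_per_second

SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 0 0 0 0 0 0 2000 20 0 0 0 0 0 0
"""
parsed = parse_net_dev(SAMPLE)
rate = traffic_rate(parsed["eth0"], (15000, 4000), interval_seconds=10)
print(rate, exceeds_limit(rate, 500))
```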


Application status monitoring

10



In addition to the indicators above, the application should provide a mechanism for reporting its own status. An external monitor then calls this status interface continuously to check the application's health.


The simplest form of status feedback is to return a timestamp in the response, so the monitor can check whether the timestamp is valid.


A more robust status response also reports the state of the application's dependencies, such as whether the database connection and the cache are working.
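
Both forms of status feedback can be sketched in one payload: a timestamp for the simple check and a per-dependency result for the robust one. The function names, the payload keys, and the 30-second freshness window are assumptions; the dependency checks are stand-ins for real database or cache probes.

```python
import time

def build_status(checks, now=None):
    """Build a status payload.

    checks: mapping of dependency name to a zero-argument callable that
    returns a truthy value when the dependency is reachable.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A failing probe is reported as unhealthy rather than crashing the endpoint.
            results[name] = False
    return {
        "timestamp": time.time() if now is None else now,
        "healthy": all(results.values()),
        "dependencies": results,
    }

def timestamp_is_fresh(status, now, max_age_seconds=30):
    """Monitor-side check that the reported timestamp is recent."""
    return 0 <= now - status["timestamp"] <= max_age_seconds

def failing_cache_probe():
    raise ConnectionError("cache unreachable")  # simulated failure

status = build_status(
    {"database": lambda: True, "cache": failing_cache_probe},
    now=1000.0,
)
print(status, timestamp_is_fresh(status, now=1010.0))
```

The application would serve this payload from its status interface, and the external monitor would apply `timestamp_is_fresh` plus a check of the `healthy` flag.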


