Many users now hand their servers over to a hosting provider to save themselves trouble. With server hosting, users get not only the server itself but also the network resources the provider supplies. The provider also carries out daily security maintenance of the data center so that users get reliable hardware support. As a result, users rarely see how complex and specialized data center maintenance is.
A large data center usually contains many smaller systems, and operation and maintenance (O&M) work is organized around these specific application systems. It can be divided into six parts: basic O&M management, daily business O&M, network, server, storage, and security. In this article, "Tengyou Bianxiao" describes the O&M methods and capabilities that a typical large data center should have.
First, basic operation and maintenance management of the data center
This mainly covers hardware configuration management, maintainability optimization, monitoring, alarm handling, automated O&M, and disaster recovery for network outages, power failures, and machine-room incidents. Hardware configuration management means recording the model and hardware configuration of every server in each cabinet and knowing which business systems use which servers. Even in a virtualized environment, you need to know which physical machines the resource pool is running on. A data center has a large number of physical and virtual machines, so automated O&M is required.
Automated O&M improves efficiency and reduces manual work, and it lets the data center largely manage itself, freeing up staff. Possible faults must also be monitored and alarmed so that a problem is known the moment it occurs. A major failure often starts as a small fault that gradually spreads until the whole system collapses. Small anomalies must therefore be eliminated promptly, and detecting them requires a thorough monitoring and alarm system.
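The kind of record that hardware configuration management relies on can be sketched as follows. This is only a minimal illustration: the class names, fields, and sample hostnames are hypothetical and do not come from any particular asset-management tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PhysicalServer:
    """One physical machine as recorded in the hardware inventory."""
    hostname: str
    cabinet: str
    model: str
    business_systems: list[str] = field(default_factory=list)
    virtual_machines: list[str] = field(default_factory=list)


class Inventory:
    """In-memory inventory keyed by hostname (illustrative only)."""

    def __init__(self) -> None:
        self._servers: dict[str, PhysicalServer] = {}

    def add(self, server: PhysicalServer) -> None:
        self._servers[server.hostname] = server

    def servers_for_business(self, system: str) -> list[str]:
        """Hostnames of the physical servers a business system uses."""
        return sorted(
            name for name, srv in self._servers.items()
            if system in srv.business_systems
        )

    def host_of_vm(self, vm_name: str) -> str | None:
        """Physical host a virtual machine runs on, or None if unknown."""
        for srv in self._servers.values():
            if vm_name in srv.virtual_machines:
                return srv.hostname
        return None


# Hypothetical sample data.
inventory = Inventory()
inventory.add(PhysicalServer("node-01", "cabinet-A3", "example-2U",
                             ["billing"], ["vm-101", "vm-102"]))
inventory.add(PhysicalServer("node-02", "cabinet-A3", "example-2U",
                             ["billing", "search"], ["vm-201"]))
```

With records like these, "which servers does the billing system use" and "which physical machine hosts vm-201" become simple lookups rather than questions for whoever happens to remember.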
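A monitoring and alarm rule of the kind described here can be sketched as a simple threshold check. The metric names and the warning and critical thresholds below are assumptions chosen for illustration, not recommended values.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds: (warning, critical) per metric, in percent.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu_percent": (80.0, 95.0),
    "memory_percent": (85.0, 95.0),
    "disk_percent": (80.0, 90.0),
}


def evaluate(host: str, metrics: Dict[str, float]) -> List[str]:
    """Return an alarm message for every metric at or above a threshold."""
    alarms = []
    for name, value in metrics.items():
        if name not in THRESHOLDS:
            continue  # no rule defined for this metric
        warning, critical = THRESHOLDS[name]
        if value >= critical:
            alarms.append(f"CRITICAL {host} {name}={value}")
        elif value >= warning:
            alarms.append(f"WARNING {host} {name}={value}")
    return alarms


alarms = evaluate(
    "node-01",
    {"cpu_percent": 83.5, "memory_percent": 97.0, "disk_percent": 40.0},
)
```

The warning level is what catches the small anomaly early; the critical level is the point at which someone should already be acting.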
Second, daily business operation and maintenance of the data center
This mainly covers routine inspection, application changes, software and hardware upgrades, and sudden failures. Specifically:
1. Daily inspection: as the proverb says, "a dike a thousand miles long can collapse because of an ant nest". Most faults show signs before they occur, and small hidden dangers left unaddressed can grow into major faults. Routine inspection is tedious, but it is essential for finding hidden problems in time. Depending on how important the hosted business is, all running equipment in the data center should be inspected routinely. Check whether server application services and CPU and memory utilization are normal, and whether the business applications are running normally. Also check the machine-room environment: whether temperature, humidity, and dust levels meet requirements, whether the air conditioning and power supply systems are working properly, and whether any equipment is overheating. Floors, skylights, fire protection, and surveillance are all part of the inspection. Air-conditioning leaks and equipment leakage both threaten the stable operation of the data center, so do not be careless.
2. Application change: the business a data center carries is never static. As the business diversifies and develops, it often has to be adjusted, including server and network settings. Maintenance staff therefore need to be familiar with operating servers and network devices, which means mastering Linux server commands and network protocols, and they must make timely and accurate changes as the applications require.
3. Software and hardware upgrades: data center equipment typically has a service life of about five years, so there are always devices that need replacing. Some devices also need upgrading because of software defects, so upgrades are part of maintenance work. Before an upgrade, prepare a backup mechanism so that a problem during the upgrade does not leave the business unrecoverable for a long time. Once you take over maintenance of a data center, you will find that upgrades are very frequent, almost monthly, and staying up late to perform them is routine for maintenance staff.
4. Sudden failure: no data center is fault-free; problems of one kind or another will occur during operation. Faced with a sudden failure, experienced maintenance staff stay calm, analyze the cause, and quickly find a solution. If no solution can be found quickly, they switch to standby equipment to restore the business first and analyze the fault afterwards. This is why a data center needs highly skilled maintenance staff who can be relied on at critical moments. Although this work looks ordinary, do not underestimate it. Daily maintenance determines whether the entire data center's business runs normally, and only by taking it seriously can the data center stay safe.
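The backup mechanism recommended for upgrades in item 3 can be sketched as "back up, apply, verify, roll back on failure". The sketch below works on an in-memory configuration dictionary; the function names and the sample device are hypothetical, and a real upgrade would back up actual device or system state.

```python
import copy
from typing import Callable, Dict

Config = Dict[str, str]


def upgrade_with_rollback(
    config: Config,
    apply_upgrade: Callable[[Config], None],
    health_check: Callable[[Config], bool],
) -> bool:
    """Apply an upgrade in place and restore the backup if it fails.

    Returns True if the upgrade was kept, False if it was rolled back.
    """
    backup = copy.deepcopy(config)  # taken before anything is changed
    try:
        apply_upgrade(config)
        if health_check(config):
            return True
    except Exception:
        pass  # any error during the upgrade counts as a failed upgrade
    config.clear()
    config.update(backup)  # roll back to the pre-upgrade state
    return False


def bump_firmware(cfg: Config) -> None:
    cfg["firmware"] = "2.0"


def broken_upgrade(cfg: Config) -> None:
    cfg["firmware"] = "corrupt"
    raise RuntimeError("upgrade interrupted")


def firmware_ok(cfg: Config) -> bool:
    return cfg["firmware"] == "2.0"


device = {"firmware": "1.4", "mode": "active"}
kept = upgrade_with_rollback(device, bump_firmware, firmware_ok)

other = {"firmware": "1.4", "mode": "active"}
rolled_back = not upgrade_with_rollback(other, broken_upgrade, firmware_ok)
```

Taking the backup before the first change is the whole point: whatever goes wrong afterwards, the pre-upgrade state can be restored quickly instead of being rebuilt by hand.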
Third, data center network considerations
This mainly covers network hardware, ACL, OSPF, LACP, VIP, protocol analysis, traffic, load balancing, layer 2/3/4/7 behavior, network monitoring, 10 Gigabit network cards, core switching, and so on. The network is an essential part of the data center and the foundation for all other work. Without it the data center cannot operate, so keeping the network stable is the top priority in data center O&M. Attention is needed not only for network hardware but also for software-defined networking (SDN). In a traditional IT architecture, once the network has been deployed to meet business requirements, changing the configuration of the network devices involved (routers, switches, firewalls) whenever those requirements change is very tedious. In today's rapidly changing Internet and mobile Internet business environment, high stability and high performance alone are not enough to meet business needs; flexibility and agility matter more.
SDN separates control from the individual network devices and hands it to a centralized controller. It no longer depends on the underlying network devices (routers, switches, firewalls) and shields the differences between them. Control is fully open: users can define whatever routing and forwarding rules and policies they want, which makes the network more flexible and intelligent. After an SDN transformation, there is no need to configure the router at every node one by one; the devices in the network are connected automatically, and only simple network rules need to be defined. If a router's built-in protocols are not suitable, they can be modified programmatically to achieve better data exchange performance. For example, Baidu's self-developed switches directly support SDN remote configuration and management, enabling fully automated online configuration. In the future, these switches are to be combined with automated server onboarding to improve server delivery and management efficiency. The network touches a very wide range of devices, protocols, and software-layer technologies, so continued study of network technology is needed to do network O&M well.
Fourth, data center server considerations
This mainly covers file systems, kernel parameter tuning, various hard disk drives, kernel versions, kernel panics, and so on. Linux is the mainstream operating system not only on servers but also on network devices. Only by mastering Linux can you handle the O&M of servers and network equipment well.
Linux is a basic skill in O&M work. Beyond being familiar with operating Linux, you should monitor and manage the running state of the server and of the kernel to reduce server failures. A large data center generally contains thousands of servers, and some of them have problems almost every day; only a deep understanding of servers makes it possible to resolve these problems effectively. To prevent service interruption caused by server failure, virtualization or cluster technology is generally deployed, so that when one server's physical hardware fails, the service switches smoothly to other servers without being affected. These technologies make O&M harder and need continuous study as well.
Customizing data center servers is also worthwhile. Cloud computing requires large-scale deployment, so servers need higher deployment density, lower energy use, and easy management, while the computing power required of each node is not very demanding. General-purpose servers from manufacturers, by contrast, have to suit many kinds of applications, so they emphasize performance and scalability and neglect cost and energy consumption. A server customized for the cloud is optimized for the cloud's characteristics and therefore meets users' needs better. The benefit to enterprises is clear: even though the power saved per customized server is limited (for example, two power supplies instead of four), across a large data center the long-term cost savings are significant. Google, for example, designs all of its own servers, using customized trays and built-in batteries as backup power. Compared with traditional servers, their cost and power consumption are much lower, which saves Google a great deal on electricity.
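The claim that small per-server savings add up at scale is simple arithmetic. The sketch below uses entirely hypothetical figures (server count, watts saved per server, and electricity price are assumptions, not data from this article) and assumes servers run around the clock.

```python
def annual_power_saving(
    servers: int,
    watts_saved_per_server: float,
    price_per_kwh: float,
) -> float:
    """Yearly cost saved if every server draws less power 24 hours a day."""
    hours_per_year = 24 * 365
    kwh_saved = servers * watts_saved_per_server * hours_per_year / 1000
    return kwh_saved * price_per_kwh


# Hypothetical inputs: 10,000 servers, 20 W saved each, 0.10 per kWh.
saving = annual_power_saving(10_000, 20.0, 0.10)
print(f"Estimated annual saving: {saving:,.0f}")
```

With these assumed inputs the estimate comes to roughly 175,200 currency units per year, before counting the reduced cooling load that usually accompanies lower power draw.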
Fifth, data center storage, where the architecture is more diverse and complex
Since cloud computing, virtualization, big data, and related technologies entered the data center, storage has changed enormously. Block storage, file storage, and object storage support access to multiple data types, and centralized storage is no longer the mainstream architecture in the data center. Storing and accessing massive data requires a highly scalable distributed storage architecture. For large-scale systems, distributed file systems, distributed object storage, and similar technologies give applications highly scalable, elastic storage and strong data access performance. Because these distributed technologies run on standardized hardware, large-scale data center storage can be built, operated, and maintained at low cost. Distributed storage is not meant to replace existing disk arrays but to cope with rapidly growing data volume and bandwidth.
The other development is software-defined storage, which represents a trend: the separation of software and hardware in the storage architecture, that is, the separation of the data layer and the control layer. For data center users, software manages and schedules storage resources, providing virtualization, abstraction, and automation of those resources and meeting the deployment, management, monitoring, and adjustment requirements of the storage system, which makes it flexible and highly available. Enterprise and Internet data is growing at a rate of 50% every year. Only a limited share of the new data is structured; most of it is unstructured or semi-structured. The storage architecture therefore needs to adapt elastically as the business develops.
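To see what 50% annual growth means for capacity planning, the compound-growth arithmetic can be sketched as below. The starting volume and provisioned capacity are hypothetical; only the 50% growth rate comes from the text above.

```python
def projected_capacity(current_tb: float, annual_growth: float, years: int) -> float:
    """Data volume after a number of years of compound growth."""
    return current_tb * (1 + annual_growth) ** years


def years_until_full(current_tb: float, provisioned_tb: float, annual_growth: float) -> int:
    """Whole years until the data volume exceeds the provisioned capacity."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    used = current_tb
    while used <= provisioned_tb:
        used *= 1 + annual_growth
        years += 1
    return years


# Hypothetical example: 100 TB today, 500 TB provisioned, 50% growth per year.
in_three_years = projected_capacity(100.0, 0.5, 3)
full_after = years_until_full(100.0, 500.0, 0.5)
```

At this rate, data volume more than triples in three years, which is why a storage architecture that can only be expanded in large, infrequent steps struggles to keep up.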
Low cost, massive scalability, and high concurrency are the basic technical attributes of a storage architecture for large cloud data centers. Storing large volumes of unordered data, processing it in depth, and quickly extracting valuable information to support business decisions will become the basis of survival for all kinds of enterprises, and it is the future direction of storage and of the business built around the storage architecture.
Sixth, data center security
Security consists of many smaller items: attack protection, upgrades and backups, finding and fixing bugs, scripting tools, data security, service inspection, and so on, each of which covers a lot of ground. Attack protection, for example, mainly means defending the data center against both malicious and unintentional attacks. Malicious attacks come from people who deliberately use various methods to break into the data center and steal or destroy important data for their own ends. Unintentional attacks also happen, because the data center must stay connected to the outside world and its operation is constantly changing. Some abnormal traffic will inevitably hit the data center, sometimes even from inside it: for example, servers infected with malware, or a hardware fault that creates a loop and abnormal traffic. All of these affect the operation of the data center, so attack protection is a major problem that cannot be solved simply by deploying a few security devices. It requires a comprehensive, unified plan for the whole data center together with targeted protective measures, and as attackers' techniques improve, the protective measures must keep improving too.
This is a process of continuous learning and improvement that will not stop as long as the data center is running. To make O&M easier, prepared scripts should also be on hand to deal with emergencies quickly. For example, if the service in one data center becomes abnormal, restoring it quickly may require adjusting routes on the core router to steer all traffic to other data centers; an existing script can be run automatically to make the switch fast. The data center should prepare many such scripts for other tasks as well, so that they are ready to use in an emergency.
Zhengzhou Micronet is a cloud computing service provider and IDC operation expert with 12 years of IDC (server rental and server hosting) experience, and in 2018 it became the designated Baidu Cloud service center for Henan. Zhengzhou Micronet is a professional IDC service provider in Zhengzhou, offering genuine 7×24 technical support and machine-room engineering. Its Telecom, China Unicom (Netcom), dual-line, and BGP multi-line machine rooms are located across the country and offer cost-effective deployment plans. If you need server hosting, please contact us.