
From the Reed-Solomon algorithm to Flex EC: mathematics is the real "hard core" of Huawei Cloud

Published on: 09:46:40, September 3, 2019

"The vastness of the universe, the smallness of particles, the speed of rockets, the ingenuity of chemical engineering, the changes of the earth, the mysteries of biology, the complexity of daily life: mathematics is needed everywhere." -- Hua Luogeng, one of China's best-known mathematicians

In 2012, "The Beauty of Mathematics", a book by Dr. Wu Jun, sold well in China. It made profound mathematical principles easier to understand, so that non-professional readers could also appreciate the charm of mathematics.


Dr. Wu Jun mentions in his book that Randy Katz, a well-known computer scientist, co-invented the RAID (redundant array of independent disks) system, and that RAID technology laid the foundation for the high performance and reliability of commercial storage systems. The core idea of RAID is to use erasure coding (EC), a form of error-correcting code, to configure data redundancy flexibly, providing better storage utilization than multi-copy replication while maintaining the high performance and reliability of the storage system.
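
The redundancy idea can be illustrated with the simplest possible case, a single XOR parity chunk in the style of RAID 5. This is a minimal sketch rather than the Reed-Solomon scheme discussed later: it tolerates the loss of only one chunk, and the function names and sample data are assumptions made for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: list) -> bytes:
    """Parity chunk = XOR of all data chunks (single parity, M = 1)."""
    return reduce(xor_bytes, chunks)


def recover(surviving: list, parity: bytes) -> bytes:
    """Rebuild the one missing data chunk from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Three equal-size data chunks (illustrative data only).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(data)

# Pretend the disk holding the second chunk failed.
rebuilt = recover([data[0], data[2]], parity)
```

Here four chunks are stored for three chunks of data, so any single loss can be repaired without keeping a full second copy.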

1、 Erasure Codes and Reed-Solomon Codes

With the development of cloud computing, computing power in the cloud has grown exponentially, and 5G and AI applications have taken off. Massive data in the cloud has become an irreversible trend: the data managed by cloud storage systems has moved from the TB scale of traditional enterprise storage to the EB scale (1 EB = 1 million TB). In the early stage of cloud storage development, limited by technical capabilities, cloud storage vendors mainly used a multi-copy mechanism (usually three copies), which gives a space utilization of only 33% and high data storage costs. Later, the industry generally adopted EC to reduce costs.


Figure 1: Mathematical formula of the Reed-Solomon code

Traditional erasure coding (EC) uses Reed-Solomon codes (RS codes for short). It is applied in cloud storage systems as follows:

All storage units (single hard disks or storage nodes) of a cloud storage system (mainly a public cloud object storage system) are treated as one erasure-coded storage pool, and objects are stored in N+M mode: each object is divided into N data slices, from which M parity (check) slices are computed. Taking 6+3 EC as an example, the space utilization reaches 67%. A larger ratio of N to M gives higher space utilization, which is very competitive in cost. At the same time, system throughput is significantly better than with three (or more) replicas. For these reasons, EC has been widely used in cloud storage.
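
The utilization figures quoted above follow from simple ratios: N / (N + M) for an N+M code and 1 / k for k replicas. The short sketch below reproduces them; the function names are illustrative only.

```python
def ec_utilization(n: int, m: int) -> float:
    """Fraction of raw capacity holding user data under N+M erasure coding."""
    return n / (n + m)


def replica_utilization(copies: int) -> float:
    """Fraction of raw capacity holding user data with full replication."""
    return 1 / copies


schemes = [
    ("3 replicas", replica_utilization(3)),
    ("6+3 EC", ec_utilization(6, 3)),
    ("20+3 EC", ec_utilization(20, 3)),
]

for label, value in schemes:
    print(f"{label}: {value:.0%}")
# 3 replicas: 33%
# 6+3 EC: 67%
# 20+3 EC: 87%
```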

However, the traditional EC mechanism faces the following problems in the public cloud scenario:

1.       When object sizes are unpredictable, objects must be zero-padded to fill an EC unit and the padding must be included in the encoding calculation, which wastes storage space and raises cost.

2.       Without zero padding, multiple overwrites are required to ensure the atomicity of the EC member group. This leads to higher system complexity and lower throughput, and demands faster CPUs and greater network bandwidth, which increases storage costs.

3. Alternatively, a cache tier can be added to avoid the problem of partially filled EC units: data is first written to a high-performance tier, then encoded and moved to the HDD tier once an EC unit has been filled. The disadvantages of this off-line EC are:

a)         An additional SSD tier is required, at high cost;

b)        Continuous writes to SSD pose reliability challenges;

c)         Data migration consumes a lot of internal bandwidth.

4. With a large N+M ratio (such as 20+3) in traditional EC, a node or media failure requires reading a large number of data slices and parity slices for data reconstruction, which causes a steep decline in system performance.
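
Problems 1 and 4 can be put into rough numbers. The sketch below assumes a fixed slice size of 1 MiB and a classic Reed-Solomon repair that reads N surviving slices to rebuild one lost slice; both the slice size and the function names are assumptions chosen for illustration, not values from any particular product.

```python
MIB = 1024 * 1024
SLICE_SIZE = 1 * MIB  # hypothetical fixed slice size


def padded_data_bytes(object_size: int, n: int, slice_size: int = SLICE_SIZE) -> int:
    """Data bytes consumed when an object is zero-padded to whole N-slice stripes."""
    stripe = n * slice_size
    stripes_needed = -(-object_size // stripe)  # ceiling division
    return stripes_needed * stripe


def rebuild_read_bytes(n: int, slice_size: int = SLICE_SIZE) -> int:
    """Bytes read to rebuild ONE lost slice when N surviving slices are needed."""
    return n * slice_size


# Problem 1: a 100 KiB object in a 6+3 layout still occupies a full 6 MiB of data space.
small_object = 100 * 1024
occupied = padded_data_bytes(small_object, n=6)
wasted_fraction = 1 - small_object / occupied

# Problem 4: reads needed to repair one lost 1 MiB slice.
reads_6_3 = rebuild_read_bytes(6)     # 6 MiB
reads_20_3 = rebuild_read_bytes(20)   # 20 MiB
```

Under these assumptions, moving from 6+3 to 20+3 raises space utilization but more than triples the data read for each repaired slice.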

Based on the above, the strengths and weaknesses of a data storage system that uses EC can be evaluated along the following dimensions:

1、   Efficient space utilization: how much space utilization can the system provide stably? For a given N+M, the space utilization should be constant.

2、   Efficient write performance: no matter how the business layer or the object sizes change, write bandwidth, IOPS and similar metrics should remain constant.

3、   Efficient reconstruction performance: minimize the IO bandwidth needed for reconstruction and occupy as little cross-AZ/DC network bandwidth as possible.

2、 Huawei Cloud "On-line Streaming Erasure Coding" and "Flex Erasure Coding":

Huawei Cloud's OBS service provides an online erasure-coded storage mechanism based on its innovative "On-line Streaming Erasure Coding" and "Flex Erasure Coding", which solves the key problems described above in public cloud object storage systems.

1.       On-line Streaming Erasure Coding

As shown in the figure below, the core component of the whole system is the Streaming Erasure Coding Unit, into which the data of multiple objects flows for encoding. By combining data from multiple objects, the system eliminates the space wasted on zero padding, and the computation spent encoding it, when a single object does not fill an EC unit.

Figure 2: Huawei Cloud On-line Streaming Erasure Coding

This process requires neither complex, inefficient distributed transactions nor reading back data that has already been written. The Streaming Erasure Coding Unit provides an online EC mechanism, which avoids internal data migration.
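
The idea of combining several objects into one encoded stream can be sketched as follows. This is only an illustration of the packing principle under simplifying assumptions, not Huawei Cloud's implementation: the function name, the in-memory index and the handling of the unfilled tail are all hypothetical, and the parity computation itself is omitted.

```python
def pack_objects(objects, stripe_size):
    """Concatenate object payloads into one logical stream and cut full stripes.

    Returns (stripes, index, tail):
      stripes - full stripes ready for EC encoding, with no zero padding
      index   - object name -> (offset, length) in the logical stream
      tail    - bytes still waiting for more data to fill the next stripe
    """
    buffer = bytearray()
    stripes, index, offset = [], {}, 0
    for name, payload in objects:
        index[name] = (offset, len(payload))  # remember where the object lives
        offset += len(payload)
        buffer.extend(payload)
        while len(buffer) >= stripe_size:     # emit every stripe that is now full
            stripes.append(bytes(buffer[:stripe_size]))
            del buffer[:stripe_size]
    return stripes, index, bytes(buffer)


# Three small objects, none of which fills a 6-byte stripe on its own.
objects = [("a", b"x" * 5), ("b", b"y" * 4), ("c", b"z" * 3)]
stripes, index, tail = pack_objects(objects, stripe_size=6)
```

In this toy run the 12 bytes of data fill exactly two stripes, whereas padding each object separately would occupy three stripes (18 bytes).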

2. Flex Erasure Coding algorithm:

Huawei Cloud uses a new Flex Erasure Coding encoding and decoding algorithm that significantly reduces the bandwidth required for reconstruction while maintaining data reconstruction efficiency. This greatly improves data reconstruction performance in case of failure, effectively shortens reconstruction time, and ensures data durability and system throughput.

With these two self-developed algorithms, the single-stream bandwidth of Huawei Cloud OBS exceeds 300 MB/s, 3-5 times that of industry peers, and the service supports more than 10 million concurrent connections. It also remains stable, with low latency, under high business load. Its overall performance and space utilization are significantly better than those of multi-copy or traditional EC technologies:

In big data application scenarios: because write amplification is greatly reduced and single-stream bandwidth is significantly increased, performance on large data objects improves severalfold, and users obtain data analysis results faster.

In IoT scenarios: massive numbers of IoT devices need to transmit data to the cloud in real time, and OBS, with a concurrency of more than 10 million, can support the connection and access of hundreds of millions of IoT devices.

In video application scenarios (video surveillance, live broadcast and on-demand): the stable, low latency of Huawei Cloud OBS supports fast, stutter-free playback of high-quality video.

In more application scenarios, Huawei Cloud OBS has demonstrated with the same outstanding performance that optimizing mathematical algorithms can once again put a software product's capabilities a generation ahead of the industry.

3、 The soul of software is the algorithm, and the hard core of the algorithm is mathematics

"I think that solving problems by physical methods has reached saturation, and we should pay attention to the growing prominence of mathematical methods." -- Ren Zhengfei

It is Huawei's long-term, continuous investment in mathematics that makes its products competitive in the cloud + AI + 5G era.

As early as 1999, Huawei set up a dedicated algorithm research institute in Russia. Drawing on the mathematical ability of Russian scientists, Huawei repeatedly broke through technical bottlenecks in 3G/4G mobile networks, becoming the world leader in 4G mobile network equipment. In 2016, Huawei announced the establishment of its second European mathematical research institute, in France, to continue strengthening basic scientific research.

In addition to the mathematical research institutes in Russia and France, Huawei actively participates in and funds the research projects of mathematicians around the world, including in China, and actively promotes the application of mathematical research and its results in industry. With Huawei's long-term support, Professor Erdal Arikan made many breakthroughs in Polar codes, which eventually became the 5G control channel coding standard and advanced the development of communication technology.

It is precisely based on the application of mathematical and other basic scientific research achievements in chip design, integrated circuit development, software algorithms and quality management that Huawei can become a long-distance runner in the ICT industry and maintain the momentum of continuous progress in the current changing external environment.

Figure 3: "Mathematics is the tool to open everything"

Huawei has published on its official media the article "Basic research and basic education are the basis for the birth and revitalization of the industry", which explains the contribution of basic research, especially mathematical research, to industrial development, and puts forward the assertion that "Mathematics is the tool to open everything".

Mathematics is the guiding light for Huawei's progress and the real hard-core strength of Huawei Cloud.


