Contents:
2 Module-Level Reliability
3 System-Level Reliability
4 Solution-Level Reliability
5 O&M Reliability
Reliability metrics:
• Availability (service HA): Availability = MTBF / (MTBF + MTTR)
• Durability (no data loss): Durability = 1 - Data Loss Rate
• Maintainability (rapid fault recovery): MTTR (Mean Time To Recover); Repair rate = 1 / MTTR
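To make the two formulas concrete, here is a minimal Python sketch that computes availability and expected annual downtime from MTBF and MTTR; the MTBF/MTTR figures are illustrative assumptions, not product specifications.

```python
# Availability and maintainability metrics; the numbers are assumed examples.
MTBF_HOURS = 100_000   # mean time between failures (assumed)
MTTR_HOURS = 2         # mean time to recover (assumed)

availability = MTBF_HOURS / (MTBF_HOURS + MTTR_HOURS)
repair_rate = 1 / MTTR_HOURS                       # repairs per hour
downtime_min_per_year = (1 - availability) * 365 * 24 * 60

print(f"Availability: {availability:.6f}")         # ~0.999980
print(f"Repair rate: {repair_rate} per hour")
print(f"Expected downtime: {downtime_min_per_year:.1f} min/year")
```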
Level 1 (module level): Hardware reliability is ensured at the component, device, and media (disk) level.
2 Module-Level Reliability
Note 1: As magnetic recording density increases, the head flying height becomes lower and particle stability gradually decreases. In this case, silent data corruption is more likely to occur when there are process defects or electromagnetic/signal interference.
Note 2: Electron escape may occur in the flash cells of SSDs as time goes by, which causes bit values to change from 0 to 1.
① Granular multi-copy & RAID: metadata is protected by multiple copies and user data by RAID.
② Data restoration: LDPC, Read Retry, and intra-disk XOR enable data restoration using redundancy.
① Wear leveling (SSD): periodically moves data blocks so that less-worn data blocks can be used again. There are three levels of SSD wear: mild (below 50%), moderate (50% to below 85%), and severe (85% and above).
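The wear thresholds above map naturally to a classification function. Below is a minimal sketch under the stated assumptions; the function name and the policy comment are illustrative, not from the source:

```python
def classify_ssd_wear(wear_pct: float) -> str:
    """Map an SSD wear percentage to the three levels named above:
    mild (< 50%), moderate (50% to < 85%), severe (>= 85%)."""
    if wear_pct < 50:
        return "mild"
    if wear_pct < 85:
        return "moderate"
    return "severe"

# A wear-leveling policy could, for example, steer new writes toward
# mildly worn blocks so that less-worn blocks are used again:
blocks = {"blk0": 12.0, "blk1": 61.5, "blk2": 90.2}
for blk, wear in blocks.items():
    print(blk, classify_ssd_wear(wear))
```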
3 System-Level Reliability
Service HA:
• Full-redundancy architecture
• Continuous mirroring
• Multiple cache copies
• Controller failover in seconds
• SmartQoS
Solid Data Reliability:
• End-to-end data reliability design
• RAID 2.0+
• Fast reconstruction
• Reconstruction offload
• Pre-copy
• Matrix protection
• DIX+T10 PI
End-to-End Redundancy Design
[Figure: full-redundancy architecture: host, front-end interface modules, controllers A and B, back-end switching enclosures, and disk enclosures (RAID)]
1. Path redundancy: host multi-path (front end) and disk multi-path (back end).
2. Data redundancy: cache mirroring, RAID, and data integrity protection.
3. Isolation: the management and data (I/O) planes are isolated; SmartQoS isolates services.
[Figure: multi-engine system: hosts, front-end interconnect I/O modules, controller enclosures with controllers A to D and their software components, back-end interconnect I/O modules and switching enclosures, and disk enclosures (RAID)]
1. Services and host connections are not adversely affected if a controller is faulty: front-end interconnect I/O modules, protocol offload, and controller failover within seconds.
2. Services are not interrupted if multiple controllers are faulty: three copies of controller cache data, continuous mirroring, and cross-engine mirroring.
3. Services and host connections are not adversely affected if the software is faulty: process availability is checked in real time; a faulty process can be restarted in seconds, and a background task that fails frequently is isolated intermittently.
4. Services are not interrupted if an engine or multiple controllers are faulty: back-end interconnect I/O modules ensure interconnection among engines.
5. Services are not interrupted if multiple disks are faulty: user data supports EC-2/EC-3 and multiple copies.
6. The controller does not reset and services are not interrupted if a switch module is faulty: high-end storage is equipped with multiple interface modules for redundancy; mid-range and entry-level storage is equipped with a single interface module, and if that module is faulty, TCP forwarding is implemented.
[Figure: continuous mirroring of cache copies across controllers A to D: normal state; failure of one controller (controller A); failure of one more controller (controller D)]
Continuous mirroring (ensuring service continuity when seven out of eight controllers fail): If controller A is faulty, controller B detects the failure and creates a mirroring relationship with controllers C and D to maintain dual-copy redundancy. If controller D then fails, controllers B and C detect the failure and create copies of each other's data blocks, again ensuring two copies of cached data. In an 8-controller scenario, a maximum of seven controllers can fail consecutively without interrupting services.
Service continuity: If a controller fails, its services are quickly switched to its mirror controller, and the mirroring relationship is re-established among the remaining controllers. This eliminates the performance deterioration usually associated with a controller failure, because the surviving controllers stay in write-back mode instead of falling back to write-through mode. In addition, the data mirrors preserve system reliability.
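The re-pairing logic can be illustrated with a toy model; this is a sketch of the behavior described above, not Huawei's implementation, and the ring-pairing policy is an assumption:

```python
# Toy continuous-mirroring model: every surviving controller keeps a mirror
# partner for its cached data, so two copies always exist. When a controller
# fails, the survivors re-form mirror pairs.
def remirror(controllers: list[str]) -> dict[str, str]:
    """Assign each surviving controller a mirror partner (ring pairing)."""
    if len(controllers) < 2:
        raise RuntimeError("write-back caching needs at least two controllers")
    n = len(controllers)
    return {c: controllers[(i + 1) % n] for i, c in enumerate(controllers)}

alive = ["A", "B", "C", "D"]
print(remirror(alive))    # normal: A->B, B->C, C->D, D->A
alive.remove("A")         # controller A fails
print(remirror(alive))    # survivors re-mirror: B->C, C->D, D->B
alive.remove("D")         # controller D also fails
print(remirror(alive))    # B and C mirror each other
```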
[Figure: three-copy cache layout across an eight-controller system]
Services are not interrupted if two controllers are faulty: Three copies of cached data are kept for host writes. That is, for data of different LBAs, the system creates a pair of nodes with a dual-copy relationship and selects another node outside that pair to hold the third copy.
Services are not interrupted if a controller enclosure is faulty: When the system has two or more controller enclosures, a controller in another controller enclosure holds the third copy. This ensures that services are not interrupted even if an entire controller enclosure (containing four controllers) is faulty.
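A minimal sketch of such a placement policy follows; the node/enclosure model and the selection order are assumptions made for illustration:

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    enclosure: int

def place_copies(nodes: list[Node], owner: Node) -> list[Node]:
    """Pick a dual-copy pair, then a third node outside the pair,
    preferring a different enclosure so that an enclosure failure
    cannot destroy all three cached copies."""
    mirror = next(n for n in nodes if n != owner)
    others = [n for n in nodes if n not in (owner, mirror)]
    third = next((n for n in others if n.enclosure != owner.enclosure), others[0])
    return [owner, mirror, third]

# Eight controllers in two enclosures (four per enclosure):
nodes = [Node(f"C{i}", i // 4) for i in range(8)]
print(place_copies(nodes, nodes[0]))   # third copy lands in enclosure 1
```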
[Figure: host, front-end interconnect I/O modules 1 and 2, controllers A to D, back-end interface modules, and disk enclosures (RAID)]
1. The host delivers a request to controller A: If no controller is faulty, host I/Os are delivered to controller A through front-end interconnect I/O module 1.
2. Controller A becomes faulty: Front-end interconnect I/O modules 1 and 2 and controllers B, C, and D detect that controller A is unavailable by means of interruption.
3. Service switchover: Because the copies of controller A's data are stored on controllers B, C, and D, only the status of the virtual node managed by controller A needs to be switched over to controllers B, C, and D; the switchover completes in less than one second. The front-end interconnect I/O module is then instructed to refresh the distribution view.
4. I/O path switch: The front-end interconnect I/O module returns BUSY for the I/Os that had already been delivered to controller A. Retried and new I/Os from the host are delivered to controllers B, C, and D based on the new view.
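The BUSY-and-retry behavior in steps 3 and 4 can be sketched as follows; the view structure, names, and retry count are illustrative assumptions:

```python
import random

VIEW = {"vnode-1": "controller-A"}   # distribution view: virtual node -> owner

def deliver(io: str) -> str:
    """Front-end module: returns BUSY for I/Os routed to the failed controller."""
    owner = VIEW[io]
    return "BUSY" if owner == "controller-A" else f"OK via {owner}"

def host_write(io: str, max_retries: int = 3) -> str:
    for _ in range(max_retries):
        status = deliver(io)
        if status != "BUSY":
            return status
        # Failover: the virtual node is switched over and the view refreshed.
        VIEW[io] = random.choice(["controller-B", "controller-C", "controller-D"])
    raise TimeoutError("I/O failed after retries")

print(host_write("vnode-1"))   # first attempt gets BUSY, the retry succeeds
```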
[Figure: traditional RAID (LUNs on RAID groups 1 and 2 with a dedicated hot spare disk) versus RAID 2.0+ (block virtualization: LUNs carved from chunk groups (CKGs) distributed across many disks); when a disk fails, traditional RAID rebuilds onto the hot spare while RAID 2.0+ rebuilds in parallel across the remaining disks]
Reconstruction using traditional RAID: During reconstruction, data is read from the other functional disks and the lost data is recomputed. The reconstructed data is then written to a hot spare disk or a new disk, so the write performance of that single disk restricts the reconstruction. Therefore, the reconstruction takes a long time.
Reconstruction using RAID 2.0+: RAID 2.0+ supports dozens of member disks, each of which provides dedicated hot spare space. When a disk fails, the other disks all participate in the reconstruction reads and writes, greatly shortening the reconstruction time. As more disks share the reconstruction load, the load on each disk decreases significantly.
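A back-of-the-envelope model shows why spreading rebuild writes matters; the capacity and bandwidth figures below are assumed examples, not measured values:

```python
# Rebuild-time model: traditional RAID funnels all rebuilt data onto ONE hot
# spare disk, so its write bandwidth is the bottleneck; RAID 2.0+ spreads the
# rebuild writes across the hot spare space of many member disks.
FAILED_DISK_DATA_GB = 8_000   # data to rebuild (assumed)
DISK_WRITE_MBPS = 150         # sustained write bandwidth per disk (assumed)

def rebuild_hours(data_gb: float, writers: int) -> float:
    return data_gb * 1024 / (DISK_WRITE_MBPS * writers) / 3600

print(f"Traditional RAID (1 spare disk): {rebuild_hours(FAILED_DISK_DATA_GB, 1):.1f} h")
print(f"RAID 2.0+ (24 disks sharing):   {rebuild_hours(FAILED_DISK_DATA_GB, 24):.1f} h")
```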
[Figure: reconstruction read/write flow between disks and the controller] All data on the disks in the RAID group is read to the controller for computing, occupying the data write bandwidth. As a result, the host I/O write bandwidth is affected.
Storage systems can not only employ T10 PI to ensure the integrity of data within a storage system but also employ DIX+T10 PI to
protect end-to-end data integrity from Oracle databases to disks.
Multiple nodes within a storage system implement data parity. Such nodes include front-end chips, memories, back-end chips, and
other critical nodes in I/O paths.
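For reference, the T10 PI guard tag is a CRC-16 over the 512-byte data block (polynomial 0x8BB7, MSB-first, initial value 0). The sketch below computes only the guard tag; the application and reference tags that complete the 8-byte PI field are omitted:

```python
def crc16_t10_dif(data: bytes) -> int:
    """CRC-16 used for the T10 PI guard tag (poly 0x8BB7, init 0, unreflected)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

sector = b"example sector payload".ljust(512, b"\x00")   # one 512-byte block
print(f"guard tag: 0x{crc16_t10_dif(sector):04X}")
# On read-back, each node in the I/O path can recompute the CRC and compare
# it with the stored guard tag to detect silent data corruption.
```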
[Figure: ROW full-stripe writes into idle CKGs, global garbage collection of invalid data, slow-I/O handling, and bad block repair on HSSDs]
SSD-friendly data layout:
① Hot and cold data separation.
② ROW full-stripe write.
③ Global garbage collection: Valid data in a CKG with a large amount of garbage is migrated to a newly allocated CKG, and the reclaimed storage space is returned to the storage pool. The system then notifies the SSD to start in-disk garbage collection.
④ Global wear leveling and anti-wear leveling.
Disk health management:
① Quick response to slow I/Os: prevents services from being affected by long retry and recovery times.
② Smart disk health evaluation: detects and isolates slow disks, disks with too many uncorrectable error-correcting code errors (UNCs), and disks with bit errors.
③ Service life management: provides visualized life information and life prediction.
HSSD collaboration:
① Intelligent bad block repair: The system periodically queries the pending list of each HSSD and repairs detected bad blocks in a timely manner.
② Online diagnosis: The storage system works with HSSDs to quickly diagnose disk fault types and impact scopes and to recover the HSSDs.
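The global garbage collection step (③ in the first list) can be sketched as follows; the data structures and the single-victim policy are assumptions made for illustration:

```python
# Toy global GC: pick the CKG with the most invalid (garbage) blocks, migrate
# its valid blocks to a newly allocated CKG, and reclaim the old CKG's space.
def collect_garbage(ckgs: dict[str, dict[str, bool]]) -> None:
    """ckgs maps a CKG id to {block id: is_valid}."""
    victim = max(ckgs, key=lambda c: sum(not ok for ok in ckgs[c].values()))
    valid = {blk: True for blk, ok in ckgs[victim].items() if ok}
    del ckgs[victim]                  # space reclaimed to the storage pool
    ckgs[f"new-{victim}"] = valid     # valid data migrated to a new CKG
    # A real system would now also tell the SSDs to start in-disk GC.

pool = {"ckg1": {"a": True, "b": False, "c": False},
        "ckg2": {"d": True, "e": True}}
collect_garbage(pool)
print(pool)   # ckg1 reclaimed; only its valid block "a" was migrated
```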
4 Solution-Level Reliability
Inter-array Reliability Solution
[Figure: two intra-city DCs connected over FC/IP SAN in an active-active configuration with a quorum device; snapshot and cloning at each site; asynchronous replication to a third site; CloudBackup over IP networks to public cloud object storage (HEC/AWS)]
Intra-city active-active solution: Active-active LUNs are readable and writable in both DCs, and data is synchronized in real time.
• High reliability: Cross-site bad block repair improves system reliability.
• High performance: Multiple performance tuning measures reduce the latency of interactions between the two DCs and improve service performance by 30%.
• Elastic scalability: The solution can be expanded to the 3DC DR solution.
Cloud backup solution:
• CloudBackup: Public cloud object storage is used as the backup storage to prevent data loss caused by human error or physical faults on storage devices in the enterprise DC.
Burst quota:
1. Token accumulation: If the performance of a LUN, snapshot, LUN group, or host stays below the upper threshold within a second, one second of burst duration is accumulated. When service pressure suddenly increases, performance may exceed the upper limit up to the burst traffic; the accumulated tokens are consumed by the current objects for the configured duration. In this way, the system can respond to burst traffic in time.
Lower limit guarantee:
1. Minimum traffic: Each LUN is configured with minimum traffic (IOPS/bandwidth) by default, and this minimum must be ensured even when the system is overloaded.
2. Traffic suppression for high-load LUNs: When the system is overloaded and the traffic of some LUNs does not reach the lower limit, the system rates the load of all LUNs, grants looser traffic conditions to medium- and low-load LUNs based on their load status, and suppresses the traffic of high-load LUNs until enough resources are released for all LUNs to reach the lower limit.
[Figure: LUNs whose traffic does not reach the lower limit (no suppression), LUNs whose traffic reaches the lower limit (burst prevention), and LUNs whose traffic far exceeds the lower limit (traffic suppression)]
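The token-accumulation rule can be illustrated with a few simulated seconds of traffic; the limits and the demand series below are assumed examples, not product parameters:

```python
UPPER_IOPS = 1_000        # configured upper limit (assumed)
BURST_IOPS = 1_500        # allowed burst traffic (assumed)
MAX_BURST_SECONDS = 60    # configured burst-duration cap (assumed)

tokens = 0                # accumulated burst seconds
for second, demand in enumerate([400, 500, 1500, 1500, 1500]):
    if demand < UPPER_IOPS:
        tokens = min(tokens + 1, MAX_BURST_SECONDS)  # below limit: accumulate
        served = demand
    elif tokens > 0:
        tokens -= 1                                  # spend a token to burst
        served = min(demand, BURST_IOPS)
    else:
        served = UPPER_IOPS                          # throttle to the upper limit
    print(f"t={second}s demand={demand} served={served} tokens={tokens}")
```

In the trace, the first two quiet seconds bank two tokens, which let the next two seconds burst to 1,500 IOPS before the object is throttled back to the 1,000 IOPS upper limit.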
[Figure: 3DC DR networking: production center (volume A), intra-city DR center (A'), and remote DR center C (A''), connected by synchronous/asynchronous and asynchronous replication over SAN]
Mode 1 (the intra-city DR center replicates to the remote DR center):
1. The intra-city DR center undertakes the remote replication tasks, which minimizes the impact on services in the production center.
2. If the storage system in the production center malfunctions, the intra-city DR center takes over services and keeps its data replication relationship with the remote DR center.
Mode 2 (the production center replicates to both DR centers): If the storage system in the production center malfunctions, either the intra-city or the remote DR center can quickly take over services.
5 O&M Reliability
Rolling Upgrade
[Figure: hosts stay connected through front-end interface modules while software components 1 to N on controllers A to D are upgraded in two phases; disk enclosures are not involved; each process uses fast restart]
• Upgrade transparent to hosts: Front-end ports are connected to hosts, and components in a controller exchange I/Os with hosts through front-end interface modules. Because the components do not connect to hosts directly, a controller upgrade does not interrupt host connections.
• Component upgrade: The system upgrade is divided into two phases. Software components (processes) with redundant units are upgraded first; after the software packages are uploaded and the processes are restarted, the second phase is triggered (see the sketch after this list).
• Zero performance loss: The restart time of each software component is less than 1 second. The front-end interface module returns BUSY for I/Os that fail during the upgrade; the host retries them, and performance is restored to 100% within 2 seconds.
• Short upgrade duration: No host compatibility issues are involved, and host information does not need to be collected for evaluation. The entire storage system can be upgraded within 10 minutes because controllers do not need to be restarted.
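As referenced above, here is a minimal sketch of the two-phase flow; the component names and the per-unit restart loop are illustrative assumptions, not the product's actual process list:

```python
import time

def rolling_upgrade(components: dict[str, list[str]]) -> None:
    # Phase 1: upgrade components with redundant units, one unit at a time,
    # so a peer unit keeps serving I/O during each sub-second restart.
    for name, units in components.items():
        for unit in units:
            print(f"phase 1: restarting {name} on {unit} with the new package")
            time.sleep(0.1)   # stands in for a < 1 s process restart
    # Phase 2: triggered once packages are staged and processes restarted;
    # the new version is activated without rebooting any controller.
    print("phase 2: activating the new version on all controllers")

rolling_upgrade({"io-process": ["ctrl-A", "ctrl-B"],
                 "mgmt-process": ["ctrl-A", "ctrl-B"]})
```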
Reliability tests:
• EMC test: Ensure that the product meets the requirements of the corresponding EMC standards, the real-world electromagnetic environment, and device or system electromagnetic compatibility.
• Safety test: Ensure personal safety when using the products; reduce injury caused by electric shock, fire, heat, mechanical damage, radiation, chemical damage, and energy; and meet the admission requirements of each country.
• Environment (climatic) test: Check that the products meet the requirements and expose defects in design, process, and materials.
• Environmental (mechanical) test: Improve the environmental adaptability of the products to mechanical stress during storage and transportation, to ensure qualified product appearance, structure, and performance, and to ensure that the product can withstand the adverse impact of external mechanical stress on the equipment.
• HALT test: Find the weak points of the products and improve product reliability.
Certifications:
• Met the mandatory admission certification requirements of each country and organization: China (CCC), European Commission (CE), USA (FCC), Japan (VCCI-A), Russia (CU), and others.
• Passed some optional certifications, such as China's 9-intensity earthquake resistance certification (Huawei only) and the China Environmental Labeling certification.
9-intensity earthquake resistance certification: On Apr. 20, 2013, Ya'an in Sichuan province encountered a magnitude-7.0 earthquake with an intensity of 9; the affected area amounted to 18,682 km². The IT systems of the Ya'an TV Station survived the catastrophe, and two S2600 storage systems deployed by the Ya'an Health Bureau were also proven to be powerfully shockproof in this earthquake.
Copyright © 2019 Huawei Technologies Co., Ltd. All rights reserved.
The information in this document may contain predictive statements including, without limitation, statements regarding future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.