HP 3PAR Disk Replacement

December 15, 2015
Regmen
HP 3PAR Tech Notes
HP 3PAR disk replacement. How to deal with a failed drive on 3PAR
This article covers disk replacement on 3PAR for administrators who want to know
a little more about what happens in the background during a disk replacement.
3PAR logical layer

Any discussion of disk replacement on 3PAR has to start with the logical layer,
because it is fundamental to how the drive replacement procedure works. The logical
layer of 3PAR consists of a few levels. Overall, the structure is not complicated,
starting with physical disks and ending with Virtual Volumes:
physical disks (PD)
logical disks (LD)
Common Provisioning Groups (CPGs)
Virtual Volumes (VV)
Physical disks are divided into chunklets; starting with the 7000 series, chunklets
have a fixed size of 1 GB. 3PAR then uses chunklets to build LDs. All of this happens
without any administrator involvement. The chunklet is the basic logical unit in
3PAR terminology. This approach gives us nicely virtualized storage with virtual
RAID, which offers much more flexibility, including in terms of redundancy. On the
other hand, when some blocks within a chunklet become unreadable, the whole
chunklet (1 GB) is marked as failed.
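As a rough illustration of the chunklet arithmetic, the sketch below divides a disk
size by the 1 GB chunklet size. The figure of 838656 MB is the size reported later
in this article for the failed 900 GB drive; the function name is only illustrative
and is not part of any 3PAR tool.

```python
CHUNKLET_MB = 1024  # fixed 1 GB chunklet size on 7000 series and later


def chunklet_count(disk_size_mb):
    """Number of whole chunklets that fit on a disk of the given size."""
    return disk_size_mb // CHUNKLET_MB


# Size(MB) of PD 7 as reported by showpd -failed in this article
print(chunklet_count(838656))  # 819
```

The result matches the Total column that showpd -c reports for the same drive.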
3PAR and RAID protection

3PAR takes a virtualized approach to RAID. The RAID level is defined during CPG
creation, and RAID behavior can be adjusted by the administrator as needed.
RAID is built from chunklets, not from whole physical disks. Thanks to that, we can
create a CPG tuned for performance, or we can use the slower regions of the physical
disks for something like backup destaging, where performance is less important.
This gives us full control over how our storage resources and the underlying
environment are shaped.
For example, to see the details of existing CPGs, use the showcpg command with
suitable parameters.
3PAR-cluster cli% showcpg -sdg
                 ------(MB)------
Id Name          Warn Limit  Grow Args
 0 CPG_FC_R5        -     - 86304 -t r5 -ha cage -ssz 4 -ss 128 -ch first -p -devtype FC
 1 CPG_SSD_R5       -     - 23546 -t r5 -ha cage -ssz 4 -ss 64 -ch first -p -devtype SSD
 2 CPG_DESTAGING    -     - 86304 -t r5 -ssz 4 -ss 128 -ch last -p -devtype FC
 3 FC_r1            -     - 65536 -ssz 2 -ha cage -t r1 -p -devtype FC
 4 FC_r6            -     - 65536 -ssz 8 -ha cage -t r6 -p -devtype FC
 5 SSD_r1           -     - 16384 -ssz 2 -ha cage -t r1 -p -devtype SSD
 6 CPG_SSD_DEDUP    -     - 65536 -t r5 -ha cage -ssz 4 -ss 64 -ch first -p -devtype SSD
The exact working principle will be explained in another article. For now, it is
good to know that 3PAR lets you build RAID groups according to several options:
-t: RAID type.
-ha: the availability level, that is, how the chunklets of a RAID set are
distributed. The policy can be based on cage (default), magazine, or backend port.
-ssz: set size in chunklets. The default depends on the RAID type, for example
2 chunklets for RAID-1, 4 chunklets for RAID-5, and 8 chunklets for RAID-6.
-ss: step size in kilobytes.
-ch: which chunklets are preferred when building the stripe: first (lowest-numbered,
outer zones of the disks) or last (highest-numbered, inner zones of the disks).
-p: pattern that restricts which disks are used for creating LDs, for example by
disk type (FC, SSD, NL).
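To make the Args column easier to read, the following sketch splits one of the
argument strings from the showcpg -sdg output above into option/value pairs. It is
plain string handling, not a 3PAR API; the function name parse_grow_args is
hypothetical, and the parser simply assumes that an option has a value when the
next token does not start with a dash.

```python
def parse_grow_args(args):
    """Split a showcpg -sdg 'Args' string into a {option: value} dict."""
    tokens = args.split()
    options = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        # Assumption: a following token without a leading dash is this option's value
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            options[name] = tokens[i + 1]
            i += 2
        else:
            options[name] = None  # option followed directly by another option, e.g. -p
            i += 1
    return options


cpg = parse_grow_args("-t r5 -ha cage -ssz 4 -ss 128 -ch first -p -devtype FC")
print(cpg["-t"], cpg["-ha"], cpg["-ssz"], cpg["-devtype"])  # r5 cage 4 FC
```

For CPG_FC_R5 this reads as RAID-5, cage-level availability, a set size of 4
chunklets, and FC drives only.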
How 3PAR deals with spares

Some chunklets are reserved as spares when the system is first set up. The 3PAR
algorithms lay out chunklets on physical disks so that the outer zones of each disk
are used as much as possible. Because spare chunklets should only be used
temporarily, in an emergency, 3PAR assigns spare chunklets to the inner zones of
the disk. The details of the spare chunklets on your 3PAR can be displayed with the
showspare command.
3PAR-cluster cli% showspare
Even a quick look shows that the spare chunklets on each drive have high numbers,
which means that the chunklets designated as spares come from the inner zones of
the disks, as intended.
But that's not all. The number of chunklets set aside as spares is determined by a
policy, which can either follow one of the vendor-recommended presets or be defined
by us. The following sparing algorithms are available:
default: the equivalent of one full disk for every 40 disks, with a required
minimum of 4 disks.
minimum: same as default, but without the required minimum number of drives.
maximum: the equivalent of one full disk per drive cage, so-called cage-level high
availability.
custom: defined by the user; the administrator has to remember to add spares
when adding new drives.
Remember the vendor recommendation to create spare chunklets during the first
system initialization, as this is when the layout of the system is established
and spare chunklets can be distributed evenly among all physical disks.
Logging logical disks and spares

When a physical disk fails, all new writes that would have been committed to the
failed drive are redirected to a logging logical disk. When the drive comes back
online or the time limit for logging is reached, the data is relocated to free
chunklets marked as spare chunklets.
To see the logical disks that are used as logging disks, grep for log in the output
of the showld command, as shown below.
ssh 3PAR-cluster showld | grep log
Id Name   RAID -Detailed_State- Own     SizeMB UsedMB Use Lgct LgId WThru MapV
 5 log0.0    1 normal           0/-/-/-  20480      0 log    0 ---
 6 log1.0    1 normal           1/-/-/-  20480      0 log    0 ---
 7 log2.0    1 normal           2/-/-/-  20480      0 log    0 ---
 8 log3.0    1 normal           3/-/-/-  20480      0 log    0 ---
Following the official guide, the columns relevant to logging disks are:
Column Use: log in this column means that the logical disk is used as a
logging logical disk.
Column Lgct: the number of chunklets in the logical disk that are in logging mode.
Column LgId: the ID of the logging logical disk that is being used for logging by
the logical disk.
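If the listing has to be checked from a script, the Use column can be tested
directly instead of grepping for the word log anywhere in the line. The sketch
below assumes whitespace-separated rows with the column order shown in the listing
above (Id, Name, RAID, state, Own, SizeMB, UsedMB, Use, ...); the helper name is
hypothetical.

```python
def logging_lds(rows):
    """Return the names of logical disks whose Use column is 'log'."""
    names = []
    for row in rows:
        fields = row.split()
        # Assumed column order: Id Name RAID State Own SizeMB UsedMB Use Lgct LgId
        if len(fields) >= 8 and fields[7] == "log":
            names.append(fields[1])
    return names


rows = [
    "5 log0.0 1 normal 0/-/-/- 20480 0 log 0 ---",
    "6 log1.0 1 normal 1/-/-/- 20480 0 log 0 ---",
]
print(logging_lds(rows))  # ['log0.0', 'log1.0']
```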
Important information
A logging logical disk is an entity that is created and managed entirely by the
3PAR system.
Replacing failed disk

The most common task on any storage array is dealing with failed drives. Storage
arrays do a tremendous amount of work with our data, especially when the cache hit
ratio is not particularly high. The question is how to deal with a failed drive and
what we should pay attention to.
The 3PAR starts to emit a series of alerts about a particular disk, indicating that
the situation is becoming serious.
2015-12-06 04:36:38 GMT Informational Disk event
hw_disk:5000C50075EB86C4 pd 7 port b0 on 0:0:1: cmdstat:0x00 (TE_PASS -Success),
scsistat:0x02 (Check condition), snskey:0x01 (Recovered error), asc/ascq:0x5d/0x0
(Failure prediction threshold exceeded), info:0x0, cmd_spec:0x0, sns_spec:0x50000,
host:0x0, abort:0, CDB:2A00562EE38800001800 (Write10), blk:0x562ee388, blkcnt 0x18,
fru_cd:0x32, LUN:0, LUN_WWN:0000000000000000 after 0.007s, toterr:1808, deverr:1138
2015-12-06 04:37:41 GMT Degraded Disk abort
hw_disk:5000C50075EB86C4;sw_pd:7 pd 7 port b0 on 0:0:1: scsi abort/sick/hwerr
status TE_SMART_THRESH
2015-12-06 04:41:49 GMT Informational Disk event
hw_disk:5000C50075EB86C4 pd 7 port b0 on 0:0:1: cmdstat:0x00 (TE_PASS -Success),
scsistat:0x02 (Check condition), snskey:0x01 (Recovered error), asc/ascq:0x5d/0x0
(Failure prediction threshold exceeded), info:0x0, cmd_spec:0x0, sns_spec:0x50000,
host:0x0, abort:0, CDB:2A000021D7C000004000 (Write10), blk:0x21d7c0, blkcnt 0x40,
fru_cd:0x32, LUN:0, LUN_WWN:0000000000000000 after 0.008s, toterr:1813, deverr:1143
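Events like these can be spotted automatically before the drive is lost. The
following sketch scans event-log text for the two markers visible in the events
quoted here, asc/ascq:0x5d/0x0 (failure prediction threshold exceeded) and
TE_SMART_THRESH, and collects the affected PD IDs. It assumes each event has been
joined into a single line; the function name and the choice of markers are
illustrative, not an official list of failure indicators.

```python
import re

# Markers taken from the events quoted in this article (not an exhaustive list)
MARKERS = ("asc/ascq:0x5d/0x0", "TE_SMART_THRESH")
PD_RE = re.compile(r"\bpd (\d+)\b")


def predicted_failures(lines):
    """Return the set of PD IDs mentioned in predictive-failure events."""
    failing = set()
    for line in lines:
        if any(marker in line for marker in MARKERS):
            match = PD_RE.search(line)
            if match:
                failing.add(int(match.group(1)))
    return failing


events = [
    "hw_disk:5000C50075EB86C4 pd 7 port b0 on 0:0:1: cmdstat:0x00 (TE_PASS -Success), "
    "scsistat:0x02 (Check condition), snskey:0x01 (Recovered error), "
    "asc/ascq:0x5d/0x0 (Failure prediction threshold exceeded)",
    "hw_disk:5000C50075EB86C4;sw_pd:7 pd 7 port b0 on 0:0:1: "
    "scsi abort/sick/hwerr status TE_SMART_THRESH",
]
print(sorted(predicted_failures(events)))  # [7]
```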
It was only a matter of time before the disk failed completely.

2015-12-06 04:47:42 GMT 1 Informational Disk state change sw_pd:7 pd 7 wwn
5000C50075EB86C4 changed state from valid to missing because disk gone event
was received for this disk.
2015-12-06 04:47:42 GMT Informational Disk state change
hw_disk:5000C50075EB86C4 pd wwn 5000C50075EB86C45000C50075EB86C4 changed state
from valid to missing because disk gone event was received for this disk.
Let's check the disk that has been marked as failed. If you are not sure which
drive has failed, you can list the failed PDs with the showpd command and the
-failed option. The showpd command displays information about the system's
physical disks.
3PAR-cluster cli% showpd -failed
                          -Size(MB)-- ----Ports----
Id CagePos Type RPM State  Total Free A     B     Capacity(GB)
 7 0:7:0?  FC    10 failed 838656    0 ----- -----          900
----------------------------------------------------------------
 1 total                   838656
Use showpd again, this time with the -c parameter, which shows the chunklets.

3PAR-cluster cli% showpd -c 7
                      ------- Normal Chunklets -------- ---- Spare Chunklets ----
                      - Used - -------- Unused -------- - Used - ---- Unused ----
Id CagePos Type State Total OK Fail Free Uninit Unavail Fail OK Fail Free Uninit Fail
 7 0:7:0?  FC   failed 819 0 667 152 0
---------------------------------------------------------------------------------------
 1 total               819 0 667 152 0
To see detailed information about the chunklets on a disk, use the showpdch
command.
3PAR-cluster cli% showpdch 7
The -i parameter shows disk details.

3PAR-cluster cli% showpd -i 7
Id CagePos State  ----Node_WWN---- --MFR-- -----Model------ -Serial- -FW_Rev- Protocol MediaType -----AdmissionTime-----
 7 0:7:0?  failed 5000C50075EB86C4 SEAGATE SLTN0900S5xnN010 S0N1L0LN 3P01     SAS      Magnetic  2014-07-15 12:40:41 IST
------------------------------------------------------------------------------------------------------------------------
 1 total
After the failed disk has been replaced with a new one, the event log records it.
2015-12-08 12:03:06 GMT 1 Informational Disk state change sw_pd:160 pd 160
wwn 5000CCA0714AC3D3 changed state from new to valid because disk was
admitted successfully.
2015-12-08 12:03:06 GMT Informational Disk state change
hw_disk:5000CCA0714AC3D3 pd wwn 5000CCA0714AC3D35000CCA0714AC3D3 changed state
from new to valid because disk was admitted successfully.
2015-12-08 12:03:06 GMT Informational Object added
hw_disk:5000CCA0714AC3D3 Disk 5000CCA0714AC3D3 added
Important information
Remember that the PD ID of the replacement drive will be different from that of
the failed drive. In this example the pd_id of the failed drive is 7, and the
replacement drive has been assigned the first available ID, which is 160.
The array will keep reporting a degraded status until reallocation has completed.
After reallocation, the pd_id assigned to the new drive disappears and the
replacement disk becomes visible under the pd_id previously assigned to the failed
drive.
However, all chunklets that resided on the failed drive must first be relocated
from the logging disk and the spare area back to the replacement disk. To see the
progress, use the servicemag command or monitor the chunklets.
3PAR-cluster cli% servicemag status
Cage 0, magazine 7:
The magazine is being brought online due to a servicemag resume.
The last status update was at Tue Dec 8 12:04:07 2015.
Chunklets relocated: 404 in 3 hours, 17 minutes and 25 seconds
Chunklets remaining: 762
Chunklets marked for moving: 762

Estimated time for relocation completion based on 29 seconds per chunklet is: 6
hours, 8 minutes and 18 seconds
servicemag resume 0 7 -- is in Progress
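The estimate in this output is simple arithmetic: chunklets remaining multiplied by
the average time per chunklet. The sketch below reproduces it from the numbers
printed above; the function names are only illustrative, and the rounding down of
the per-chunklet average is an assumption that happens to match the reported
29 seconds.

```python
def seconds_per_chunklet(relocated, elapsed_seconds):
    """Average relocation time per chunklet, rounded down (assumed rounding)."""
    return elapsed_seconds // relocated


def relocation_eta(chunklets_remaining, per_chunklet_seconds):
    """Format the remaining time the way servicemag status prints it."""
    total = chunklets_remaining * per_chunklet_seconds
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours} hours, {minutes} minutes and {seconds} seconds"


# 404 chunklets relocated in 3 hours, 17 minutes and 25 seconds
elapsed = 3 * 3600 + 17 * 60 + 25
rate = seconds_per_chunklet(404, elapsed)
print(rate)                       # 29
print(relocation_eta(762, rate))  # 6 hours, 8 minutes and 18 seconds
```

Because the estimate is a plain average, it drifts whenever the relocation rate
changes, which is worth keeping in mind when reading the figure.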
Unfortunately, the servicemag status output is not 100% reliable; from time to
time the number of chunklets and the estimated time are displayed incorrectly. To
check how the process is really going, use the showpdch command with the -mov
parameter. The total number of chunklets still to be moved is shown at the end of
the output.
3PAR-cluster cli% showpdch -mov 160
What if I want to replace a disk manually?

If you can see that it is only a matter of time before a disk fails, and you have
a spare drive in your closet, you are free to replace it on your own.
There are at least two ways to do this. The most common is to use the servicemag
utility, but from time to time the command fails for some reason and disk
replacement this way is not possible.
Let's say that the disk to be replaced is PD 8.
The -log parameter means that write operations to the chunklets of the specified
drive are committed to the logging disk. The -pdid parameter is the ID of the disk.
You can also add the -wait parameter to servicemag start; the task will then not
run in the background and you will be able to watch the whole process.
Command to use: servicemag start -log -pdid 8
Check whether the command finished successfully with the servicemag status command.

3PAR-cluster cli% servicemag status -d
According to the documentation, any I/O on chunklets marked normal,smag changes
their state to logging, and the I/O is written to the logging logical disks.
Manual disk replacement procedure

If the servicemag command fails for some reason, you have to do the replacement
manually, using a whole set of commands.
1. The first thing to do is to stop the disk from being used for new allocations.
3PAR has a dedicated command for this.
3PAR-cluster cli% setpd ldalloc off <pd_id>
To see the detailed state of the disk, use the showpd -s command.

3PAR-cluster cli% showpd -s <pd_id>
2. Now you can start moving data from the specified physical disk to a location
chosen by the system, which is one of the main steps of disk replacement.
The command to use is movepdtospare with the -vacate option. The -vacate option
makes the moves permanent and removes the source tags after relocation. The -f
parameter means that no confirmation is required.
3PAR-cluster cli% movepdtospare -vacate -f <pd_id>
If this command fails, you will have to do it manually, chunklet by chunklet.
3PAR-cluster cli% movech -perm -ovrd <pd_id>:<chunklet_location>
where:
-perm: chunklets are moved permanently and the original location is forgotten.
-ovrd: allows a chunklet to be moved to a destination even if the move reduces
quality. This option is necessary with the -perm parameter.
3. Next, check whether there are any spare chunklets on the disk designated for
removal, as the previous step moved only data chunklets.
To display the chunklets marked as spare, use the showpdch -spr command.
3PAR-cluster cli% showpdch -spr <pd_id>
4. If the disk still holds spare chunklets, remove them. The command below removes
all spare chunklets from the disk. After running it, check again whether any
spares remain.
3PAR-cluster cli% removespare <pd_id>:a
5. After all the previous steps you can safely remove the physical disk definition
from the system. Do not physically pull the disk yet at this step.
3PAR-cluster cli% dismisspd <pd_id>
6. Check whether the dismissed disk is now shown as new. If it is, it can be safely
removed from the magazine.
7. If you insert the new disk and it is not added to the system automatically, you
have to add it manually.
First determine the WWN of the disk. Check it with the showpd -i command.
3PAR-cluster cli% showpd -i <pd_id>
After that, use the admitpd command to make the new disk available to the system.
3PAR-cluster cli% admitpd <disk_wwn>
Finally, run tunesys to restore a proper layout of chunklets within the CPGs.
3PAR-cluster cli% tunesys
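The manual procedure above can be summarized as an ordered checklist. The sketch
below only builds and prints the command strings for a given pd_id so that they can
be reviewed before being typed into the CLI; it does not connect to the array or
execute anything. The function name is hypothetical, and the commands are the ones
named in the steps above.

```python
def manual_replacement_commands(pd_id):
    """Ordered CLI commands for vacating and dismissing a disk (steps 1 to 5)."""
    return [
        f"setpd ldalloc off {pd_id}",         # 1. stop new allocations on the disk
        f"showpd -s {pd_id}",                 #    check the detailed state
        f"movepdtospare -vacate -f {pd_id}",  # 2. move data chunklets off the disk
        f"showpdch -spr {pd_id}",             # 3. list spare chunklets on the disk
        f"removespare {pd_id}:a",             # 4. remove all spare chunklets
        f"showpdch -spr {pd_id}",             #    verify that no spares remain
        f"dismisspd {pd_id}",                 # 5. remove the disk definition
    ]


for command in manual_replacement_commands(7):
    print(command)
```

Steps 6 and 7 (pulling the disk, admitpd, tunesys) are left out on purpose, since
they depend on what the system shows after the physical swap.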
