
Common issues and resolution

1. Power supply fault
2. Enclosure x cpu module A fault
3. HIB issue
4. Connectivity issue
5. DM Panic, DM Reboot (sev1)

1. Power Supply
---------------
Resolution
----------
>> Dial in to the box.
>> Run /nas/sbin/enclosure_status -e 0 -v
Check the FRU status in the output, e.g.:
------------------FRU STATUS------------------
FRU CPU DIMM 0            Pass    Pass
FRU CPU DIMM 1            Pass    Pass
FRU CPU DIMM 2            Pass    Pass
FRU CPU DIMM 3            Pass    Pass
FRU CPU Module            Pass    Pass
FRU CPU IO Module         Pass    Pass
NAS Personality Card      Pass    Pass
FRU Enclosure             Pass    Pass
FRU Coldfire              Pass    Pass
FRU Power Supply B        Pass    Pass
FRU Power Supply A        Pass    Pass
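The FRU check above could be scripted when the output is long. The following is a minimal sketch, not an EMC tool: it assumes each FRU is printed on one line with its name first and its status words last, and it assumes a failed part is reported with a word containing "fail" or "fault". Both assumptions should be checked against real output before relying on it. The function name fru_faults is hypothetical.

```shell
# Print FRU lines that look unhealthy. Reads enclosure_status output on stdin.
# ASSUMPTION: one FRU per line; healthy lines end in "Pass";
# a bad status contains "fail" or "fault" (wording not confirmed by the source).
fru_faults() {
    awk '
        /^(FRU|NAS) / {
            line = tolower($0)
            # Flag the line if the last status is not Pass,
            # or if any status word on the line looks like a failure.
            if ($NF != "Pass" || line ~ /fail|fault/) print
        }
    '
}
```

Possible usage on the Control Station: /nas/sbin/enclosure_status -e 0 -v | fru_faults
No output would mean no FRU line was flagged under these assumptions.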
Check the power supplies. If the power supply is not faulted, close the SR after a basic healthcheck.
If the power supply is faulted, check the part number. Generally it is 071-000-508.
To confirm the part number:
cd /nas/log
ll
cat...........resume.server.xml
Dispatch the case to the field with the following action plan:
CE needs to check with the customer regarding the following dial home alert:
1. Enclosure 0 power supply A AC Fault.
If Enclosure 0 power supply A AC is faulted, CE needs to replace it.
Location: Enclosure 0 power supply A AC.
Part no.: 071-000-508.

Primus: emc213435
-> If the alert occurred because of a power outage, CE can close the case after checking with the customer.
2. Enclosure x cpu module A fault
---------------------------------
Resolution
----------
>> Check the current state of the Data Mover (Blade):
nas_server -l
/nas/sbin/getreason -e
>> Check the CPU hardware status for errors with the following command:
/nas/sbin/enclosure_status -e 0 -v
If any fault is noticed, send that part for replacement with the correct part number.
Ask CE to refer to Primus: emc212726.
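The Data Mover state check can be partly automated. The DM panic section of these notes states that all slots should be in the contacted state, so the sketch below lists any slot line that does not end in "contacted". It assumes getreason prints one line per slot in the form "<code> - slot_N <state>" and that Control Station lines contain the words "control station"; both are assumptions about the output layout. The function name slots_not_contacted is hypothetical.

```shell
# Print slot lines whose state is not "contacted". Reads getreason output on stdin.
# ASSUMPTION: lines look like "<code> - slot_N <state>";
# Control Station slots are skipped because they never report "contacted".
slots_not_contacted() {
    awk '
        /slot_[0-9]+/ && !/control station/ {
            if ($0 !~ /contacted[[:space:]]*$/) print
        }
    '
}
```

Possible usage: /nas/sbin/getreason | slots_not_contacted
Any line printed would point at a Data Mover that needs a closer look.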
3. HIB issue
------------
This alert is generated when there are multiple faults.
Steps to follow
---------------
>> Check the SYR logs. Check for the alerts; they will give an idea of the faults.
To check the SYR logs, go to
http://omega.eng.emc.com
SYR > Celerra >> Serial Number >> Connect home info
Dial in and investigate the alerts you noticed in the SYR logs.
>> Also perform the complete healthcheck.
If any part is faulted, send it for replacement.
If there is no hardware fault, close the SR.
>> Include the action plan below in the final action plan for CE:
UPGRADE CALL HOME TEMPLATE:
* Once the array is working fine, CE needs to follow the steps below to stop the HIB storming.
For CLARiiON arrays, also check the CLARiiONs on the back end of Celerra and DL HIBs: check the version of the DH Template being used and load a new one where applicable (latest version is 7.32.0.1).
If the Dial Home Template is confirmed and upgraded, recommend the following upgrades:
For Celerra HIBs, recommend upgrade to NAS Code 6.0.51.6 or later (this code version includes the HIB patch).
For CLARiiON HIBs, recommend upgrade to 04.30.000.5.523 or later (this code version includes the HIB patch).
Next Action Owner: CE
When it is OK to resume normal processing, please let SYR know via omega SYR feedback (syrfeedback@emc.com). Email the serial number of the array to unblock. Refer to Primus emc210386.
Primus# emc210386
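Deciding whether a Celerra already runs "NAS Code 6.0.51.6 or later" is easy to get wrong by eye, because a plain string comparison would rank 6.0.9.x above 6.0.51.x. The sketch below compares dotted version strings numerically. It relies on the -V (version sort) option of GNU sort, which is assumed to be available on the machine where the check is run. The function name version_ge is hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# ASSUMPTION: GNU sort with -V (version sort) is available.
version_ge() {
    # After a version sort, the smaller version comes first;
    # $1 >= $2 exactly when that first line is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Possible usage, with the installed version typed in by hand:
version_ge 6.0.41.3 6.0.51.6 || echo "recommend upgrade to NAS Code 6.0.51.6 or later"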
4. Connectivity issue
---------------------
Case 1
==========
server_3 failed to ping or respond to server_3b.
Use the following commands:
ping -c4 (server_3 ip)
ping -c4 (server_3b ip)
server_ping server_3 (server3b ip)
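The two ping checks in Case 1 can be wrapped in a small loop so that the result for each address is printed on one line. This is only a sketch: the function name ping_report and the PING override variable are hypothetical, and the override exists so the loop can be tried with a stand-in command. The server_ping step is not covered and would still be run by hand.

```shell
# Command used for the check; defaults to the system ping.
PING=${PING:-ping}

# Ping each address given as an argument and report the result.
ping_report() {
    for host in "$@"; do
        if "$PING" -c4 "$host" >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host NOT reachable"
        fi
    done
}
```

Possible usage: ping_report <server_3 ip> <server_3b ip>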
Case 2
============
Enclosure 0 both management switches failed to respond to "ping".
1. Identify the number of enclosures available and their MgmtSwitch IP addresses. Run the command
/nas/sbin/setup_enclosure readConfig
2. Ping the IP addresses of the MgmtSwitches from the Control Station:
ping -c 4 <ipaddress of mgmnt switches>
3. Check the cable connectivity between the enclosures by issuing the command
/nas/sbin/setup_enclosure checkCable
This gives the status of cable connectivity.
4. Use the command below only if the management switch is not pingable.
Management switch reset command:
/nas/sbin/setup_enclosure resetMgmtswitches
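Step 4 of Case 2 makes the reset conditional on the ping result. The sketch below encodes that order: it pings each management switch address and only mentions the reset command when at least one address did not answer. It deliberately prints the command instead of running it, so the engineer stays in control. The function name mgmt_switch_next_step and the PING override variable are hypothetical, and the IP addresses would come from the readConfig output.

```shell
# Command used for the check; defaults to the system ping.
PING=${PING:-ping}

# Ping every management switch address; suggest a reset only on failure.
mgmt_switch_next_step() {
    failed=""
    for ip in "$@"; do
        "$PING" -c 4 "$ip" >/dev/null 2>&1 || failed="$failed $ip"
    done
    if [ -n "$failed" ]; then
        echo "unreachable:$failed"
        # Printed only, never executed here.
        echo "consider: /nas/sbin/setup_enclosure resetMgmtswitches"
    else
        echo "all management switches reachable; no reset needed"
    fi
}
```

Possible usage: mgmt_switch_next_step <ip of switch A> <ip of switch B>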

5. DM Panic, DM Reboot (sev1)
-----------------------------
Resolution
----------
This alert comes under the sev1 category.

Steps to follow:
>> Check nas_server -l
See if there is any DM failover.
>> Check /nas/sbin/getreason
All slots should be in the contacted state.
>> Check uptime
>> Check if all file systems are mounted:
server_mount ALL | grep -i unmount
>> If any file system is unmounted:
> Call the customer and ask the following:
1. Number of users affected
2. Duration of impact
3. Any onsite activity that might have caused it
While talking to the customer, note the customer's mood.
> Raise code red
> Contact SME
> Check sys logs and server logs (grep for panic)
> Check if autocollected dumps are available.
> Check the panic header and search for a Knowledge Base article for that panic header.
> Perform complete healthcheck
> Gather SP Collects and send them for NAS collab.
>> If no file system is unmounted:
> Check sys logs and server logs (grep for panic)
> Check if autocollected dumps are available.
> Check the panic header and search for a Knowledge Base article for that panic header.
> Perform complete healthcheck
> Gather SP Collects and send them for NAS collab.
>> The customer may ask the reason for the panic.
To answer, check the panic header and refer to the relevant KB article.
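The branch between the two paths above depends only on whether the mount check finds anything. A small wrapper can make that decision explicit through its exit status. This is a sketch under the assumption that an unmounted file system shows the string "unmount" somewhere on its line, which is what the grep in the steps already relies on. The function name mount_triage is hypothetical.

```shell
# Read `server_mount ALL` output on stdin.
# Returns 1 and lists the lines if any file system looks unmounted, else returns 0.
mount_triage() {
    unmounted=$(grep -i unmount)
    if [ -n "$unmounted" ]; then
        echo "UNMOUNTED file systems found:"
        printf '%s\n' "$unmounted"
        return 1
    fi
    echo "no unmounted file systems reported"
}
```

Possible usage: server_mount ALL | mount_triage || echo "follow the unmounted path: call customer, raise code red, contact SME"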
NOTE
----
1. Do not fail back the DM if the back end is not clean.
If the back end is clean, schedule the DM failback at the customer's convenient time.
This activity will have a downtime of 5-7 minutes.
2. Do not fail back LUNs until the array is operating normally.
3. If nas_storage -c -a doesn't come clean and
/nas/sbin/navicli -h spa faults -list shows "Array is operating normally",
then fail back the LUNs using "nas_storage -f id=1"
and check whether nas_storage -c -a comes clean.
Sometimes the command nas_storage -c -a will not run and shows a "lock" error.
This is due to multiple processes running together.
Either kill the process or wait for some time; it will then work.
4. Rebooting the Control Station will not cause any downtime, but we need the customer's permission to perform this activity.
5. Do not dial in to the box if the customer's permission is needed for remote connection.
6. Do not ask for box credentials via email. Ask for the customer's convenient time to come on a call to get the credentials, or ask for the customer's preferred way to provide them.
7. We do not handle PS and ES SRs. If you come across such SRs, inform the TL and get them moved to the relevant queue.
8. Do not call the customer outside business hours (that is, when site time is not between 8:30 AM and 6 PM).
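For the "lock" error described in note 3, the "wait for some time" option can be sketched as a generic retry helper. This is an illustration only: the function name retry is hypothetical, and it assumes the command returns a non-zero exit status while the lock is held, which the notes do not confirm for nas_storage. It does not distinguish a lock error from any other failure.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS is exhausted, sleeping between tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep only if another attempt will follow.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

Possible usage: retry 5 60 nas_storage -c -a
This would try the check up to five times, one minute apart, and stop at the first success.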
