
Security Level:

Emergency Operations of Solaris
for iManager M2000
V1.0
M2000TSD
2009-3-31

www.huawei.com

HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential


Foreword

• This slide deck introduces the emergency operations of Solaris for the M2000.



Objectives

Upon completion of this course, you will be able to:
• Carry out the emergency operations of Solaris for the M2000.



Reference

• Guide to M2000 Emergency Operations



Contents
• Abnormal Power Off of Server
• Shortage of Server Disk Space
• Server Hard Disk Failure
• CPU Occupancy is Too High
Abnormal Power Off of Server

• Symptoms
  • The M2000 server answers ping, but you cannot telnet to it.
  • When you log in to the console (SC, RSC, or serial port), the system prompts:
      Type control-d to proceed with normal startup, (or give root password for system maintenance):
  • Do not press Ctrl+D at this point. Follow the solutions below.



Abnormal Power Off of Server

• Solution 1
  • Step 1: Log in to the Solaris console. The system prompts:
      Type control-d to proceed with normal startup, (or give root password for system maintenance):
    Enter the root password. Do not press Ctrl+D.
  • Step 2: Run the following commands:
      # fsck -y
      # fsck -y
      # fsck -y
      # sync; sync; sync; sync; sync; sync
      # reboot
  • These operations normally restore the system. If the system still prompts "Type control-d to proceed with normal startup, (or give root password for system maintenance):" after the restart, the repair failed. Enter the root password and continue with Solution 2.
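The repeated fsck passes in Step 2 can also be written as a small loop. The sketch below is a variation, not the procedure from the slide: it stops at the first clean pass instead of always running three. The helper name run_until_clean is hypothetical, and the sketch assumes that the check command exits with status 0 when the file systems are clean, which should be verified against the fsck man page of the installed Solaris release.

```shell
# Hypothetical helper: run a check command up to MAX times and
# stop at the first pass that exits with status 0.
run_until_clean() {
    max=$1
    shift
    pass=1
    while [ "$pass" -le "$max" ]; do
        if "$@"; then
            echo "clean after $pass pass(es)"
            return 0
        fi
        pass=$((pass + 1))
    done
    echo "still failing after $max pass(es)"
    return 1
}

# Usage from the maintenance shell (shown as a comment, not executed here):
#   run_until_clean 3 fsck -y && { sync; sync; reboot; }
```

If the loop reports that the check is still failing, that corresponds to the case where the slide sends you on to Solution 2.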



Abnormal Power Off of Server
• Solution 2
  • Step 1: Check /var/adm/messages to find out which partitions are bad. In this example, c1t1d0s0 and c1t1d0s7 are bad.
  • Step 2: Reboot the system from the bootable CD-ROM/DVD-ROM:
      # sync; sync; sync; sync; sync; sync
      # reboot -- cdrom -s
    Note the two minus characters after the reboot command.
  • Step 3: Run fsck again on each bad partition (c1t1d0s0 and c1t1d0s7 are only examples):
      # fsck -y /dev/rdsk/c1t1d0s0
      # fsck -y /dev/rdsk/c1t1d0s0
      # fsck -y /dev/rdsk/c1t1d0s7
      # fsck -y /dev/rdsk/c1t1d0s7
      # sync; sync; sync; sync; sync; sync
      # reboot
  • The system normally recovers after these operations. If fsck reports destroyed superblocks during Step 3, continue with Solution 3.
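For Step 1 of Solution 2, a filter can help pick slice names out of a long message file. This is only a sketch: it assumes that the relevant lines mention the slice by its cNtNdNsN name (as in c1t1d0s0), which may not hold for every kind of disk message, and that POSIX versions of tr, grep, and sort are available. The function name list_slices is hypothetical.

```shell
# Hypothetical helper: read text on stdin and print every distinct
# token that looks like a Solaris slice name (cNtNdNsN).
list_slices() {
    # Split the input into alphanumeric tokens, one per line,
    # keep only tokens shaped like a slice name, then deduplicate.
    tr -cs 'A-Za-z0-9' '\n' |
        grep '^c[0-9][0-9]*t[0-9][0-9]*d[0-9][0-9]*s[0-9][0-9]*$' |
        sort -u
}

# Usage (shown as a comment, not executed here):
#   list_slices < /var/adm/messages
```

The output lists every slice mentioned anywhere in the file, not only the bad ones, so the surrounding messages still have to be read to decide which slices need fsck.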



Abnormal Power Off of Server
• Solution 3
  • In this example, fsck reports that some superblocks on c1t1d0s0 are destroyed.
  • Step 1: List the backup superblocks:
      # newfs -Nv /dev/rdsk/c1t1d0s0
    The command lists all available backup superblocks of c1t1d0s0. Pick one of them, for example 98464.
  • Step 2: Repair the superblock from its backup:
      # fsck -y -F ufs -o b=98464 /dev/rdsk/c1t1d0s0
    Repeat Steps 1 and 2 until all destroyed superblocks are repaired.
  • Step 3: Run fsck again:
      # fsck -y
      # fsck -y
      # fsck -y
      # sync; sync; sync; sync; sync; sync
      # reboot
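If the first backup superblock does not work, the next one from the newfs -Nv listing can be tried. The sketch below automates that retry. It goes beyond the slide, which uses a single backup (98464). The names try_backup_superblocks and FSCK_CMD are hypothetical, the device is passed explicitly, and the sketch assumes that fsck exits with status 0 when the repair succeeds.

```shell
# Hypothetical override so the command can be replaced for a dry run.
FSCK_CMD=${FSCK_CMD:-fsck}

# Hypothetical helper: try each backup superblock number in turn on
# the given raw device and print the first one that fsck accepts.
try_backup_superblocks() {
    dev=$1
    shift
    for sb in "$@"; do
        if "$FSCK_CMD" -y -F ufs -o "b=$sb" "$dev"; then
            echo "$sb"
            return 0
        fi
    done
    return 1
}

# Usage (shown as a comment; the numbers come from the newfs -Nv output):
#   try_backup_superblocks /dev/rdsk/c1t1d0s0 98464 196896
```

Step 3 of Solution 3 (the final fsck passes, sync, and reboot) still applies after a backup superblock has been accepted.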

Shortage of Server Disk Space

• Symptoms
  • The information output window at the bottom of the M2000 client reports that the occupancy of a partition exceeds the upper limit. The partition must be cleaned immediately.
  • The "Hard disk Monitor" interface of the M2000 client shows that the occupancy of a partition is high.
  • The "df -k" command shows that the occupancy of a partition is high.
  • A shortage of disk space may prevent performance data from being reported normally.
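The df -k check can be narrowed to the partitions that are actually over a limit. The sketch below assumes the usual six-column df -k layout (file system, kbytes, used, avail, capacity, mount point) with each entry on a single line, and a POSIX awk. The function name over_limit and the 80% figure in the usage line are illustrative assumptions; the slide does not state the upper limit that the M2000 applies.

```shell
# Hypothetical helper: read "df -k" output on stdin and print the
# mount point and capacity of every file system at or above LIMIT percent.
over_limit() {
    limit=$1
    awk -v limit="$limit" '
        NR > 1 {                       # skip the header line
            pct = $5
            sub(/%/, "", pct)          # "95%" -> "95"
            if (pct + 0 >= limit) print $6, $5
        }'
}

# Usage (shown as a comment, not executed here):
#   df -k | over_limit 80
```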



Shortage of Server Disk Space
• Solution 1
  • Step 1: Run "df -k" to check which partitions need cleaning.
  • Step 2: Decide what to do based on the partition:
    • "/data": do not delete anything. This partition holds the database files of Sybase and the M2000. If you remove any file from the /data directory, you must reinstall Sybase and the M2000.
    • "/" or "/export/home": continue with Step 3.
  • Step 3: Remove the following unused files:
      rm /var/log/vsftpd.log

      cd /export/home
      rm -r bak
      rm -r m2000software

      cd /export/home/omc/var
      rm -r ThresholdExport

      cd /export/home/omc/var/logs
      rm core*

      cd /export/home/omc/var/logs/tracebak
      rm iMAP*
  • If the problem persists, continue with Solution 2.
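Before deleting anything in Step 3, it can be useful to see how much space each candidate actually occupies. The sketch below only reports sizes and removes nothing. The function name report_candidates is hypothetical; the paths in the usage comment are the ones listed in Step 3.

```shell
# Hypothetical helper: print the size in KB of each path that exists
# and flag the ones that are missing. Nothing is deleted.
report_candidates() {
    for p in "$@"; do
        if [ -f "$p" ] || [ -d "$p" ]; then
            du -sk "$p"
        else
            echo "missing: $p"
        fi
    done
}

# Usage (shown as a comment, not executed here):
#   report_candidates /var/log/vsftpd.log /export/home/bak \
#       /export/home/m2000software /export/home/omc/var/ThresholdExport \
#       /export/home/omc/var/logs/core* /export/home/omc/var/logs/tracebak/iMAP*
```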



Shortage of Server Disk Space

• Solution 2
  • Collect the following information and ask an expert for help:
    • The disk space occupied by each file system:
        # df -k
    • The detailed usage of the affected partition:
        # du -k <partition name> | sort -n > /tmp/du_k.txt
      For example:
        # du -k /export/home | sort -n > /tmp/du_k.txt
    • Download /tmp/du_k.txt from the server in ASCII mode.

Server Hard Disk Failure

• Symptoms
  • The indicator light of a disk in the disk array turns yellow or red.
  • The OMCAutoStar tool or another command reports an abnormal disk status.
  • The Solaris system message files contain disk failure alarms.
  • The format command cannot identify or display the vendor or other information of some disks.



Server Hard Disk Failure

• What can a field engineer do?
  • Step 1: Check which disk is bad.
  • Step 2: Analyze the impact.
  • Step 3: If the disk is covered by vendor service, call the vendor hotline to have it replaced. Otherwise, get a hard disk from the local Huawei spare parts center and replace it yourself.



Server Hard Disk Failure

• Check which disk is bad
  • For local hard disks, there are three ways to check:
    • The fault light of the hard disk
      Each hard disk has a fault light, which turns on when the disk is bad.
    • The format command in Solaris
      If some hard disks are missing from the output, those disks have a fault.
    • The probe-scsi-all command at the OK prompt
      If some hard disks are missing from the output, those disks have a fault.



Server Hard Disk Failure

• Check which disk is bad
  • For disk arrays, there are two ways to check:
    • The fault light of the hard disk
      Each hard disk has a fault light, which turns on when the disk is bad.
    • Tools
      • For the Sun StorEdge 3310/3320, run the sccli command as the root user, select a disk array, and then enter the show disks command. The normal status is ONLINE or STAND-BY.
      • For the Sun StorEdge 6140, open Internet Explorer and enter https://x.x.x.x:6789, where x.x.x.x is the IP address of the M2000 server. The video embedded in this slide is the best operation guide.



Server Hard Disk Failure

• Analyze the impact
  • Local hard disks
    • The two disks mirror each other, so the system does not crash when only one disk is bad. Even so, replace the bad disk as soon as possible.
  • Disk array
    • If the disk array is not mirrored
      • A single disk array cannot be mirrored.
      • In general, at least one disk is configured as a hot spare and the other disks are configured as RAID 5. The array therefore tolerates two bad disks, provided the second disk fails only after the hot spare has finished rebuilding. It crashes when three or more disks are bad.
    • If the disk array is mirrored
      • Two disk arrays should be configured as a mirror.
      • The system keeps working when one disk array is bad.
    • The system is abnormal whenever the M2000 or Sybase is abnormal.

CPU Occupancy is Too High

• Symptoms
  • The M2000 client raises the alarm: "Server Name ****, CPU Occupancy has reached its upper limit, please rearrange it".
  • In the M2000 client, the "System Monitor > Monitor Browser > Performance Monitor" interface shows that the CPU occupancy is over 80% and the status is abnormal.



CPU Occupancy is Too High

• System max load period
  • For example, an M2000 has two traffic statistics measurement periods: 30 minutes and 60 minutes. The 60-minute period has the most measurement counters and objects, so it puts the heaviest load on the M2000. The system max load period is therefore 60 minutes.



CPU Occupancy is Too High

• Is the CPU load really high?

[Figure: CPU load (%), from 0 to 100, plotted against time t with marks t1 and t2. Two cases are labeled "No" and "Yes". T = system max load period.]
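The chart appears to contrast a load that peaks only briefly with one that stays high across the period T, but the exact criterion cannot be recovered from the slide. As one hedged way to make the judgment concrete, the sketch below summarizes CPU samples collected over one system max load period. It assumes one CPU occupancy value (in percent) per input line, gathered by whatever monitoring tool is at hand, and a POSIX awk. The function name summarize_load is hypothetical, and the 80% threshold in the usage line is borrowed from the Performance Monitor symptom described earlier.

```shell
# Hypothetical helper: read one CPU occupancy sample (percent) per line
# and report the sample count, the average, and how many samples exceed
# the given threshold.
summarize_load() {
    threshold=$1
    awk -v th="$threshold" '
        { n++; sum += $1; if ($1 > th) high++ }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "samples=%d avg=%.1f above=%d\n", n, sum / n, high + 0
        }'
}

# Usage (shown as a comment; samples.txt is a hypothetical file):
#   summarize_load 80 < samples.txt
```

If only a few samples exceed the threshold, the load is probably a short peak. If most of them do, the load is high for the whole period.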



CPU Occupancy is Too High

• The ideal CPU load

[Figure: ideal CPU load (%), from 0 to 100, plotted against time t with marks t1 and t2. T = system max load period.]



CPU Occupancy is Too High

• How to reduce the CPU load?
  • Network management capability (most effective)
    • Use the evaluation tool to evaluate the network management capability, and make sure that the actual load is below 90% of the capability.
    • If the actual load is over 90%:
      • Delete the performance measurements that consume the most CPU as soon as possible, based on the results of the evaluation tool. Normally, the 5-minute and 15-minute measurements and the cell and neighboring cell measurements consume the most CPU.
      • If possible, migrate some NEs to another M2000 system.
      • Expand the CPU and RAM of the server, or buy a new server.
  • Alarms (most effective)
    • Shield useless event alarms on the NE side.
    • Cancel all alarm correlation settings.
  • Suspend some timer tasks
    • Suspend some timer tasks to reduce the CPU load during special periods, and resume them at the right time.
    • Suspend the NE Configuration Data Synchronization task.
    • Suspend the NE Log Synchronization task.
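The 90% rule for network management capability is simple arithmetic. The sketch below shows it with whole numbers. The function name within_capability is hypothetical, and the two arguments (the load currently managed and the maximum capability, in the same unit) stand in for whatever figures the evaluation tool reports, which the slide does not describe.

```shell
# Hypothetical helper: succeed when USED is below 90% of MAX.
# Integer arithmetic avoids rounding: used/max < 0.90 is the same
# as used*100 < max*90 for positive whole numbers.
within_capability() {
    used=$1
    max=$2
    [ $((used * 100)) -lt $((max * 90)) ]
}

# Usage (shown as a comment; 850 and 1000 are made-up figures):
#   within_capability 850 1000 && echo "below 90% of capability"
```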



Thank you
www.huawei.com
