V100R001
Emergency Maintenance
Issue: 01
Date: 2009-04-15
Huawei Technologies Co., Ltd. provides customers with comprehensive technical support and service. For any
assistance, please contact our local office or company headquarters.
Website: http://www.huawei.com
Email: support@huawei.com
Notice
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but the statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Contents
About This Document
1 Overview of Emergency Maintenance
1.1 Definition of Emergency Maintenance
1.2 Definition of Emergencies
1.3 Initiation of Emergency Maintenance
1.4 Guidelines for Emergency Maintenance
1.5 Flow for Emergency Maintenance
1.5.1 Notifying Huawei of the Emergency
1.5.2 Locating the Fault
1.5.3 Collecting Fault Information
1.5.4 Rectifying the Fault
1.5.5 Obtaining Help
1.5.6 Checking the Handling Result
1.5.7 Recording Information About Emergency Maintenance
1.6 Emergency Maintenance Precautions
1.7 Technical Support
4.1 Overview
4.2 Collection of Basic Fault Information
4.3 Collection of Device Fault Information
Issue 01 (2009-04-15)
Figures
Figure 1-1 Flowchart of emergency maintenance
Figure 1-2 Flowchart for identifying the type of a fault
Figure 2-1 Flowchart for handling device faults
Figure 2-2 Flowchart for handling the failed login to a system through the console interface
Figure 2-3 Flowchart for handling the failed system start
Figure 2-4 Flowchart for handling the abnormality of the board status
Figure 2-5 Flowchart for handling the abnormality of the interface status
Figure 3-1 Flowchart for handling service faults
Figure 3-2 Flowchart for handling the failure to forward IP unicast packets
Figure 3-3 Flowchart for handling the failure to forward IP multicast packets
Figure 3-4 Flowchart for handling the failure to forward MPLS VPN packets
Tables
Table 1-1 Methods of identifying the fault type
Table 2-1 Collection of information about the failure to log in to a system through the console interface
Table 2-2 Collection of information about the failure to start a system
Table 2-3 Collection of information about the abnormality of the board status
Table 2-4 Collection of information about the abnormality of the interface status
Table 3-1 Collection of information about the failure to forward IP unicast packets
Table 3-2 Collection of information about the failure to forward IP multicast packets
Table 3-3 Collection of information about the failure to forward MPLS VPN packets
Table 4-1 Collection of basic fault information
Table 4-2 Collection of device fault information
Table 6-1 Notice of emergency maintenance
Related Versions
The following table lists the product versions related to this document.
Product Name    Version
S9300           V100R001
Intended Audience
This document is intended for:
- NM configuration engineers
Organization
This document is organized as follows.
Chapter    Description
1 Overview of Emergency Maintenance
2 Emergency Maintenance for Device Faults
3 Emergency Maintenance for Service Faults
4 Guide to Fault Information Collection
6 Emergency Maintenance Record Table
7 System Upgrading Through BIOS
Conventions
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol
Description
DANGER
WARNING
CAUTION
TIP
NOTE
General Conventions
The general conventions that may be found in this document are defined as follows.
Convention
Description
Boldface
Italic
Courier New
Command Conventions
The command conventions that may be found in this document are defined as follows.
Convention
Description
Boldface
Italic
[]
{ x | y | ... }
[ x | y | ... ]
{ x | y | ... }*
[ x | y | ... ]*
&<1-n>
GUI Conventions
The GUI conventions that may be found in this document are defined as follows.
Convention
Description
Boldface
>
Keyboard Operations
The keyboard operations that may be found in this document are defined as follows.
Format
Description
Key
Press the key. For example, press Enter and press Tab.
Key 1+Key 2
Key 1, Key 2
Mouse Operations
The mouse operations that may be found in this document are defined as follows.
Action
Description
Click
Double-click
Drag
Press and hold the primary mouse button and move the
pointer to a certain position.
Update History
Updates between document issues are cumulative. Therefore, the latest document issue contains
all updates made in previous issues.
Abnormal Switch Routing Unit (SRU) or Main Control Unit (MCU): All services are
interrupted.
Generally, alarms and logs about an abnormality are displayed before an emergency arises. You
can determine whether an emergency has occurred by checking the alarms and logs or from a
customer complaint.
NOTE
The approach to emergency maintenance described in this chapter applies to emergencies only. For common
troubleshooting, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.
Complaints of customers
Huawei Proprietary and Confidential
Copyright Huawei Technologies Co., Ltd.
Alarm indication
When you check the alarms output by the Network Management System (NMS) or
displayed on the terminal, initiate emergency maintenance if the alarms could cause
a wide range of service failures.
Natural disaster
When a natural disaster such as an earthquake, a fire, or a flood occurs, devices must be
temporarily powered off to protect them from damage, so emergency maintenance must be
initiated. Power on the devices again after the disaster.
To keep a device running stably and minimize the probability of emergencies, refer
to the Quidway S9300 Terabit Routing Switch Routine Maintenance.
The core function of emergency maintenance is to recover system operation and service
provisioning as soon as possible. To respond to an emergency, you must have plans ready
for the various emergencies, prepared according to the emergency maintenance manual.
Managers and maintenance personnel must be familiar with the plans and well trained.
Emergency maintenance training is mandatory for maintenance personnel. They must learn
the basic methods of identifying emergency faults and how to handle them.
When an emergency occurs, keep calm and check whether the hardware devices and the
routing are working normally. Then check whether the emergency is caused by an
S9300. If it is, handle the fault according to the prepared plans or
the procedures in this manual.
The CF card contains important data. When an emergency occurs, do not format the CF
card before consulting Huawei engineers.
Contact the Customer Service Center or the local office of Huawei early for technical
support during troubleshooting.
After handling an emergency fault, collect the alarm information related to the fault and send
the fault handling report, device alarm files, and log files to Huawei for analysis. This
helps Huawei improve its after-sales service.
You must maintain detailed records of operations and results for further reference by Huawei engineers
during troubleshooting so that they can handle a fault quickly.
When a fault persists, contact Huawei Customer Service Center. For contact information, see Technical
Support.
The main purpose of emergency maintenance is to recover a system as soon as possible. Figure
1-1 shows the flowchart of emergency maintenance.
Figure 1-1 Flowchart of emergency maintenance
Start -> Notify Huawei of the emergency -> Locate the fault -> Collect fault information ->
Rectify the fault -> Service recovered?
No: Obtain help
Yes: Check the handling result -> Record information about emergency maintenance -> End
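The flow in Figure 1-1 can be pictured as a short sketch. The Python below is an illustration only: the function name and the two callables are hypothetical, the step names are taken from the flowchart, and the code assumes that the handling result is checked after help is obtained, which the flowchart text here does not state explicitly.

```python
from typing import Callable, List


def run_emergency_flow(rectify: Callable[[], bool],
                       obtain_help: Callable[[], None]) -> List[str]:
    """Walk the Figure 1-1 steps and return the ordered list of steps taken.

    rectify() returns True when the service has recovered (hypothetical hook).
    obtain_help() stands for contacting technical support (hypothetical hook).
    """
    steps = [
        "Notify Huawei of the emergency",
        "Locate the fault",
        "Collect fault information",
        "Rectify the fault",
    ]
    service_recovered = rectify()
    if not service_recovered:
        steps.append("Obtain help")
        obtain_help()
    # Assumption: the result is checked and recorded on both branches.
    steps.append("Check the handling result")
    steps.append("Record information about emergency maintenance")
    return steps
```

Passing the two hooks as callables keeps the sketch independent of how a site actually rectifies faults or reaches support.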
Even if you can complete emergency maintenance independently with the guidance of this manual, notify
Huawei of the emergency. Huawei technical personnel then keep records of the fault to improve after-sales services.
Abnormal Switch Routing Unit (SRU) or Main Control Unit (MCU): All services are
interrupted.
Device alarms
Device logs
Device configuration
Figure 1-2 Flowchart for identifying the type of a fault
The flowchart asks, in order: Can log in through the console interface? System starts
normally? Board status is normal? Interface status is normal? If every answer is Yes,
a service fault occurs.
Table 1-1 Methods of identifying the fault type

Item: Login through the console interface
Identifying Method: Connect the COM port of the PC or terminal to the console interface of the
S9300 with a standard RS-232 configuration cable and set relevant parameters correctly on the
terminal. For details, refer to the Quidway S9300 Terabit Routing Switch Configuration Guide -
Basic Configurations. Check that a terminal displays normally, for example, <Quidway> is
available on the terminal.

Item: System startup
Identifying Method: Check whether the system starts normally. If the command prompt such as
<Quidway> is displayed, it means that the system starts normally.

Item: Board status
Identifying Method: Run the display device command on the terminal to check whether the status
of all boards is Normal. In the case of a local fault, check the status of the service board
connected to the user who reports the fault. For example:
    <Quidway> display device
    S9312's Device status:
    Slot Sub Type Online  Power   Register   Alarm  Primary
    - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    9    -   LPU  Present PowerOn Registered Normal NA
    13   -   SRU  Present PowerOn Registered Normal Master

Item: Interface status
Identifying Method: Run the display interface command on the terminal to check whether the
status of the interface connected to the user who reports the fault is Up and whether more
packets are transmitted and received on the interface during a specified period. For example:
After you identify the fault type, see 2 Emergency Maintenance for Device Faults and 3
Emergency Maintenance for Service Faults to proceed with emergency maintenance.
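The identification order above can be sketched in a few lines. This is a minimal illustration, not part of the S9300 software: the class and function names are hypothetical, and treating a failed check as a device fault is a reading based on the four device-fault procedures covered in 2 Emergency Maintenance for Device Faults.

```python
from dataclasses import dataclass


@dataclass
class Checks:
    """Results of the four checks, in the order the flowchart asks them."""
    console_login_ok: bool
    system_starts_ok: bool
    board_status_ok: bool
    interface_status_ok: bool


def identify_fault_type(checks: Checks) -> str:
    """Return the fault type implied by the first failed check."""
    ordered = [
        ("console login", checks.console_login_ok),
        ("system startup", checks.system_starts_ok),
        ("board status", checks.board_status_ok),
        ("interface status", checks.interface_status_ok),
    ]
    for name, ok in ordered:
        if not ok:
            # Assumption: any failed check points to a device fault.
            return f"device fault ({name})"
    return "service fault"
```

The checks are evaluated in a fixed order so that a lower-level problem, such as a failed console login, is reported before a higher-level one.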
Version information
Fault symptom
For the format of an information record table, refer to Appendix A Emergency Maintenance
Record Table.
You need to record the output information during emergency maintenance by using the Capture
Text function of the HyperTerminal or the related functions of other Telnet terminals.
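Where output is collected by a script rather than by a terminal program, the same record can be kept with a few lines of code. The sketch below is only an illustration; the function name and the record layout are assumptions, not a format required by Huawei.

```python
import io
from datetime import datetime
from typing import TextIO


def record_output(stream: TextIO, command: str, output: str, when: datetime) -> None:
    """Append one timestamped command and its output to a text stream."""
    stream.write(f"[{when.isoformat(timespec='seconds')}] {command}\n")
    stream.write(output.rstrip("\n") + "\n")


# Usage with an in-memory buffer; a file opened in append mode works the same way.
buffer = io.StringIO()
record_output(buffer, "display device", "(captured output)", datetime(2009, 4, 15, 10, 0, 0))
```

Writing to any text stream keeps the helper usable with a log file, a buffer, or standard output.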
Static Electricity
Wear an ESD wrist strap before operating a board or the backplane, and follow these rules:
- When the board to be replaced is a standby SRU/MCU, an LPU, or a CMU, run the
power off slot slot-id command to power off the board, and then remove the board.
Laser/LED
When you maintain a device with an optical module or optical interface, follow these rules:
- Do not look straight into the optical fiber from which the light beam shoots out when you
install and maintain the optical fiber.
- Do not look straight into the connector of the optical fiber from which the light beam shoots
out when you replace the pluggable optical module.
- Only qualified personnel who have attended training may operate the optical module
and optical fiber.
CAUTION
When you install and maintain the optical fiber, keep the connector of the optical fiber clean,
unfolded, and straight.
Telephone: +86-755-28560000
Fax: +86-755-28560111
Website: http://support.huawei.com
Email: support@huawei.com
NOTE
To make it easy to contact technical support personnel, prepare a phone directory and
post it at the maintenance site. The phone directory can contain contact information for higher-level
maintenance personnel, Huawei engineers, transmission office maintenance personnel, and remote
office maintenance personnel. Provide at least two contact methods for each person.
Maintenance personnel must keep a detailed record of the emergency maintenance
procedures, notify Huawei of the type of the board to be replaced, and apply for a spare one
according to the warranty terms, so that the fault can be removed sooner. The fax can use the
format of the Notice of Emergency Maintenance. For details, refer to Appendix A
Emergency Maintenance Record Table.
2.1 Overview
This section describes the definition and types of device faults.
A device fault refers to a hardware failure of a device. To rectify a device fault, you must reset,
repair, or replace the relevant hardware.
During the running of a device, you can determine that a device fault occurs and initiate the
emergency maintenance in either of the following cases:
l
2.
3.
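Figure 2-1 Flowchart for handling device faults
The flowchart asks, in order: Can log in through the console interface? System starts
normally? Board status is normal? Interface status is normal? A No at the board status check
leads to "Handle the abnormality of the board status"; a No at the interface status check leads
to "Handle the abnormality of the interface status". If every answer is Yes, proceed to the
flow for handling service faults.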
Table 2-1 Collection of information about the failure to log in to a system through the console interface
No.  Collecting Item  Collecting Method
     Communication parameters of the COM port
     Indicator status
Handling Flowchart
Figure 2-2 Flowchart for handling the failed login to a system through the console interface
The flowchart asks, in order: Parameters of the COM interface are correct? (No: modify the
parameters) Cable is in good condition? The SRU/MCU runs normally? (No: exchange or replace
the SRU/MCU) The remaining actions are "Reset the system" and "Seek technical support". Each
action is followed by a "Fault rectified?" decision.
CAUTION
All the following steps can be performed only when the user services are already interrupted. If
the user services are not interrupted, collect fault information and provide feedback to Huawei
engineers for further processing.
Procedure
Step 1 Check and modify the parameters of the COM port.
Check whether the parameters of the COM port are identical with those of the console interface
on the S9300. If the parameters are not identical, modify the parameters of the COM port.
By default, the console interface of the S9300 uses a baud rate of 9600 bps, 8 data
bits, 1 stop bit, no parity check, and no flow control.
NOTE
If the parameters of the console interface have been modified, use the modified values.
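The comparison in Step 1 can be made explicit with a small sketch. The class and function below are hypothetical helpers, not part of any terminal program; the default values are the console defaults stated above.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Tuple


@dataclass(frozen=True)
class SerialSettings:
    """Serial parameters; the defaults are the S9300 console defaults."""
    baud_rate: int = 9600
    data_bits: int = 8
    stop_bits: int = 1
    parity: str = "none"
    flow_control: str = "none"


def mismatches(terminal: SerialSettings,
               console: SerialSettings = SerialSettings()) -> Dict[str, Tuple]:
    """Return each differing parameter as name -> (terminal value, console value)."""
    term, cons = asdict(terminal), asdict(console)
    return {name: (term[name], cons[name]) for name in term if term[name] != cons[name]}
```

For example, mismatches(SerialSettings(baud_rate=115200)) reports only the baud rate, which is the parameter to correct on the COM port. If the console parameters have been changed from the defaults, pass the changed values as the second argument.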
Power module
Check that the power module is switched on. When there are multiple power modules, ensure
that at least one works normally.
Check whether the ALM indicator of the power module is on. If so, it indicates that the power
module is faulty. You can replace the power module to solve the problem.
If no problem is found in the preceding checks but the power supply system still fails
to work, contact Huawei technical support as described in Technical Support.
After you perform the preceding steps, you can reset the system if the fault persists. To reset
the system, switch off the power, wait three minutes, and then switch the power module back on.
Step 6 Seek technical support.
For seeking Huawei technical support, see Technical Support.
----End
The terminal stops at the file decompression state for a long period.
Table 2-2 Collection of information about the failure to start a system
No.  Collecting Item                   Collecting Method
     Information about system startup  Check the name of the startup file through the Basic
                                       Input/Output System (BIOS) menu.
Handling Flowchart
Figure 2-3 Flowchart for handling the failed system start
The flowchart asks, in order: The CF card self-test fails? The module self-test fails? System
continuously restarts? File is incorrectly decompressed? The preserved actions are "Replace the
CF card", "Debug or replace the SRU/MCU", and "Seek technical support". Each action is followed
by a "Fault rectified?" decision.
CAUTION
All the following steps can be performed only when the user services are already interrupted. If
the user services are not interrupted, collect fault information and provide feedback to Huawei
engineers for further processing.
Procedure
Step 1 Remove and insert the CF card.
If the "CF Card Init.....FAIL!" message is displayed, the CF card may be loosely seated. You can
try the following operations to solve the problem:
1.
2.
3.
When you run the display device command to view information about a board, the board
status is Abnormal.
When you run the display device command to view information about a board, the board
status is Unregistered.
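When the output of display device has been captured to text, the boards that show these symptoms can be picked out automatically. The sketch below is illustrative only: the function name is hypothetical, and it assumes the eight whitespace-separated columns Slot, Sub, Type, Online, Power, Register, Alarm, and Primary shown in the display device example in Table 1-1. The sample row for slot 13 is invented to show a faulty board.

```python
from typing import List, Tuple


def abnormal_boards(output: str) -> List[Tuple[str, str, str]]:
    """Return (slot, register, alarm) for each board that needs attention."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric slot ID and have eight columns
        # (assumed layout); headers and separator lines are skipped.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        slot, _sub, _type, _online, _power, register, alarm, _primary = fields
        if register != "Registered" or alarm != "Normal":
            flagged.append((slot, register, alarm))
    return flagged


# Hypothetical capture: slot 13 is altered to show an unregistered, abnormal board.
sample = """\
Slot Sub Type Online  Power   Register     Alarm    Primary
9    -   LPU  Present PowerOn Registered   Normal   NA
13   -   SRU  Present PowerOn Unregistered Abnormal Master
"""
print(abnormal_boards(sample))  # prints [('13', 'Unregistered', 'Abnormal')]
```

The column positions may differ between software versions, so check them against real output before relying on the result.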
Table 2-3 Collection of information about the abnormality of the board status
No.  Collecting Item  Collecting Method
     Indicator status of a board
     Detailed information about a board
Handling Flowchart
Figure 2-4 Flowchart for handling the abnormality of the board status
Preserved steps, in order: Start, "Fault rectified?", "Replace the board", "Fault rectified?", End.
CAUTION
All the following steps can be performed only when the user services are already interrupted. If
the user services are not interrupted, collect fault information and provide feedback to Huawei
engineers for further processing.
Procedure
Step 1 Reset the board.
When you run the display interface command to view the status of an interface, the
interface status is DOWN.
When you run the display interface command repeatedly to view the status of an interface, the numbers
of sent and received packets on the interface do not change.
The indicator status of an interface is abnormal. For example, the LINK indicator of the
interface is off.
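The second symptom, packet counters that do not change, amounts to comparing two samples of the counters taken some time apart. The following sketch shows that comparison; the class and function names are hypothetical, and reading the counters from the display interface output is left to the operator.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PacketCounters:
    """Packet counters read from one run of display interface."""
    received: int
    sent: int


def counter_progress(first: PacketCounters, second: PacketCounters) -> Dict[str, bool]:
    """Report, per direction, whether the counter increased between two samples."""
    return {
        "received": second.received > first.received,
        "sent": second.sent > first.sent,
    }


def counters_stalled(first: PacketCounters, second: PacketCounters) -> bool:
    """True when neither direction counted any new packets."""
    progress = counter_progress(first, second)
    return not progress["received"] and not progress["sent"]
```

Reporting each direction separately also shows the case in which an interface only sends or only receives.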
Table 2-4 Collection of information about the abnormality of the interface status
No.  Collecting Item  Collecting Method
     Indicator status of an interface
     Detailed information about an interface
     Brief IP-related information about an interface
Handling Flowchart
Figure 2-5 Flowchart for handling the abnormality of the interface status
Preserved decisions: Status of interface indicator normal? Interface status is Up? Is manually
shut down? Packets are transceived normally? Is the status normal? Preserved actions: "Detect
the link" and "Perform a local loopback test". Actions are followed by a "Fault rectified?"
decision.
CAUTION
All the following steps can be performed only when the user services are already interrupted. If
the user services are not interrupted, collect fault information and provide feedback to Huawei
engineers for further processing.
Procedure
Step 1 Start the interface.
If the configuration shows that an interface has been shut down with the shutdown command,
run the undo shutdown command in the interface view to start it.
Step 2 Detect the link.
Before detecting a link, check whether the LINK indicator of the interface is on.
If so, the physical link is Up and you can detect the link as follows:
1. Check that the interface parameters at both ends of the link, such as the duplex
mode and rate, are identical.
2. When the interfaces are optical ones, use an optical power meter to check whether the
receiving and sending optical powers at both ends are normal. If either end only sends or
only receives data, the optical module is possibly faulty or the optical fiber
possibly fails to match the optical module. In that case, try replacing the optical module
or the optical fiber.
DANGER
Do not look directly into the optical fiber along the path of the emitted light beam
when you check the receiving and sending optical powers. You must use the
optical power meter to measure the optical power.
When the LINK indicator of the interface is off, you can check the link as follows:
1. Perform a physical loopback test on the device. That is, connect the faulty interface to
another interface that is in the normal state with an optical fiber or cable in good condition.
2. If the LINK indicator is then on, the interface runs normally. Check whether the
optical fiber or the cable is damaged and whether the trunk link runs
normally. In this case, the cooperation of the neighboring office is required.
3. If the LINK indicator is off, the interface hardware is faulty. When a
pluggable optical module is used, you can replace the optical module; otherwise, you can
cut over the services from the faulty interface to another interface that runs normally.
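The branches of Step 2 can be summarised as one decision function. This is a sketch under stated assumptions: the function name and its boolean parameters are hypothetical, and the returned strings paraphrase the actions described above.

```python
def next_link_action(link_on: bool,
                     loopback_link_on: bool = False,
                     pluggable_module: bool = False) -> str:
    """Return the next action for Step 2, based on the LINK indicator.

    link_on: LINK indicator state before any test.
    loopback_link_on: LINK indicator state during the physical loopback test.
    pluggable_module: whether the interface uses a pluggable optical module.
    """
    if link_on:
        return ("compare interface parameters at both ends; "
                "measure optical power on optical interfaces")
    if loopback_link_on:
        return "interface is normal; check the fiber or cable and the trunk link"
    if pluggable_module:
        return "interface hardware is faulty; replace the optical module"
    return "interface hardware is faulty; cut over services to a normal interface"
```

The loopback result matters only when the LINK indicator was off to begin with, which is why it is tested second.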
NOTE
After the local loopback test is complete, run the undo loopback command to disable the local loopback
immediately.
Step 4 Check and modify the configurations of the data link layer or the upper layer protocols.
If the interface still fails to send and receive packets in the local loopback test, check the
configuration of the data link layer or the upper layer protocols. For example, check that the
configurations of the Point-to-Point Protocol (PPP) or the High-level Data Link Control (HDLC)
protocol at both ends are identical and that the routing protocols run normally.
Step 5 Reset the interface.
After you perform the preceding steps, you can reset the interface if the fault persists.
To reset an interface, run the shutdown and undo shutdown commands.
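When the reset is issued by a script, the command order can be generated as shown below. This is a hedged sketch: the function name is hypothetical, the interface name is supplied by the caller, and it assumes that the interface view is entered with the system-view and interface commands.

```python
from typing import List


def reset_interface_commands(interface_name: str) -> List[str]:
    """Return the command sequence that resets one interface."""
    if not interface_name.strip():
        raise ValueError("interface_name must not be empty")
    return [
        "system-view",                  # assumption: enter the system view first
        f"interface {interface_name}",  # enter the interface view
        "shutdown",
        "undo shutdown",
    ]
```

Keeping shutdown and undo shutdown in one sequence avoids leaving the interface administratively down if the operator is interrupted between the two commands.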
Step 6 Contact Huawei technical support personnel.
For seeking Huawei technical support, see 1.7 Technical Support.
----End
3.1 Overview
This section describes the definition and types of service faults.
A service fault refers to the partial or global service congestion due to a software or network
fault. You can handle a service fault by modifying service configuration, resetting service
modules, or restoring network connections.
NOTE
Generally, a hardware fault may result in service interruption. For the handling of a device fault, see
Emergency Maintenance for Device Faults.
This chapter describes the emergency maintenance for service faults, focusing on fault clearance
and prompt service recovery rather than fault rectification. To locate, handle, and rectify common
service faults, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.
For the S9300, commonly occurring emergency service faults fall into the following types:
l
Figure 3-1 Flowchart for handling service faults
The flowchart asks, in order, whether the fault involves all users, users on a certain board,
users on a certain interface, users of a certain type, or single users. For single users,
proceed to the troubleshooting flow.
NOTE
If a fault affects only a single user, you do not need to initiate emergency maintenance. For the common
fault handling flowchart, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.
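The scope questions in the flowchart for handling service faults can be written as a simple ordered check. The sketch below is illustrative: the names are hypothetical, and treating every scope wider than a single user as a case for emergency maintenance is an assumption drawn from the note above.

```python
from typing import Optional

# Scopes in the order in which the flowchart asks about them.
SCOPES_IN_ORDER = [
    "all users",
    "users on a certain board",
    "users on a certain interface",
    "users of a certain type",
    "single user",
]


def fault_scope(all_users: bool, board: bool, interface: bool,
                user_type: bool, single_user: bool) -> Optional[str]:
    """Return the first scope, widest first, that the fault involves."""
    answers = [all_users, board, interface, user_type, single_user]
    for scope, affected in zip(SCOPES_IN_ORDER, answers):
        if affected:
            return scope
    return None


def needs_emergency_maintenance(scope: Optional[str]) -> bool:
    """Assumption: only faults wider than a single user warrant emergency maintenance."""
    return scope is not None and scope != "single user"
```

Asking about the widest scope first ensures that a fault affecting a whole board is not handled as a per-interface problem.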
Table 3-1 Collection of information about the failure to forward IP unicast packets
No.  Collecting Item  Collecting Method
     Collecting Method: display fib
     Collecting Item: Configuration of mesh-groups
     Collecting Item: OSPF errors
     Collecting Method: display rip
NOTE
FIB = Forwarding Information Base; ARP = Address Resolution Protocol; BGP = Border Gateway
Protocol; IS-IS = Intermediate System to Intermediate System; LSDB = Link State Database; OSPF =
Open Shortest Path First; RIP = Routing Information Protocol
Handling Flowchart
Figure 3-2 Flowchart for handling the failure to forward IP unicast packets
The flowchart asks, in order: Can receive upstream packets? (No: recover the uplink) Can
forward packets? (No: recover the downlink) Routing entries are correct? Forwarding entries
are correct? Each action is followed by a "Fault rectified?" decision. The last action is
"Seek technical support".
CAUTION
All the following steps can be performed only when the user services are already interrupted. If
the user services are not interrupted, collect fault information and provide feedback to Huawei
engineers for further processing.
Procedure
Step 1 Check and recover the uplink.
When some unicast packets fail to be forwarded, check whether the S9300 can receive upstream
packets. Run the display interface command to see whether the number of received
packets on the device changes. If the device cannot receive any upstream
packets, perform the following:
1. Check whether the status of the upstream interface on the S9300 is normal. For details, see
Abnormality of the Interface Status.
2. If the status of the upstream interface is normal, ping the peer interface of the upstream
interface. When the ping is successful, you can assume that a fault occurs on the upstream
device. To recover the system, contact the site office where the upstream device resides.
3. When the ping fails, detect the link connecting the interface on the S9300 to the upstream
device. For example, check that the cable is correctly connected, that the optical module and
the optical power are normal, that the relay agent works normally, and that the IP address is
correct.
4. If the fault persists after you perform the preceding steps, contact Huawei for technical
support as described in Technical Support.
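The uplink checks in Step 1 follow a fixed order, which the sketch below makes explicit. The function name and its parameters are hypothetical, and the returned strings paraphrase the actions listed above.

```python
def uplink_next_action(interface_ok: bool, ping_ok: bool, link_ok: bool) -> str:
    """Return the next action when the S9300 receives no upstream packets.

    interface_ok: the upstream interface status is normal.
    ping_ok: the ping to the peer of the upstream interface succeeds.
    link_ok: the link checks (cable, optical module, optical power,
             relay agent, IP address) find no problem.
    """
    if not interface_ok:
        return "handle the abnormal interface status"
    if ping_ok:
        return "fault is on the upstream device; contact its site office"
    if not link_ok:
        return "repair the link to the upstream device"
    return "contact Huawei technical support"
```

Each later check is reached only when the earlier ones give no answer, so technical support is contacted only after the interface, the ping, and the link have been examined.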
1. Check whether the status of the downstream interface on the S9300 is normal. For details,
see Abnormality of the Interface Status.
2. If the status of the downstream interface is normal, ping the peer interface of the downstream
interface. When the ping is successful, you can assume that a fault occurs on the downstream
device. To recover the system, contact the site office where the downstream device resides.
3. When the ping fails, detect the link connecting the interface on the S9300 to the downstream
device. For example, check that the cable is correctly connected, that the optical module and
the optical power are normal, that the relay agent works normally, and that the IP address is
correct.
4. When the link is in good condition, the communication between the S9300 and the
downstream device is possibly abnormal. Check the configuration, such as
routing, according to the following step.
1. Check whether a route to the downstream device exists in the routing table of the S9300.
If the route does not exist, add a static route, and then check whether the ARP entries on
the downstream device can be learned.
2. When the ARP entries on the downstream device cannot be learned, you can add static
ARP entries.
3. If there is still no route to the downstream device in the routing table of the S9300, the
routing table is possibly oversized. Try deleting unnecessary routing entries and
updating the routing table. Then check whether the S9300 learns the route.
4. If a route to the downstream device exists, check this routing entry for correctness, such
as the routing protocol, subnet mask, preference, and hop count. Because troubleshooting
IP routing is complicated, it is not covered here. For details, refer to the Quidway
S9300 Terabit Routing Switch Troubleshooting - IP Routing.
5. If the fault persists after you perform the preceding steps, reset the relevant routing protocol.
For example, reset all IS-IS connections through the reset isis all command.
6. If resetting the relevant routing protocol is ineffective, proceed to the following step.
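The routing checks above can be laid out as an ordered action plan. The following sketch is an illustration only: the function name and its parameters are hypothetical, and the action strings paraphrase the steps of this procedure.

```python
from typing import List


def routing_actions(route_exists: bool, arp_learned: bool) -> List[str]:
    """Return the ordered actions for checking routes to the downstream device."""
    actions = []
    if not route_exists:
        actions.append("add a static route to the downstream device")
        if not arp_learned:
            actions.append("add static ARP entries")
        actions.append("if the route is still missing, delete unnecessary "
                       "routing entries and update the routing table")
    else:
        actions.append("check the routing entry: routing protocol, subnet mask, "
                       "preference, hop count")
    # Common tail: applies whichever branch was taken.
    actions.append("if the fault persists, reset the relevant routing protocol, "
                   "for example: reset isis all")
    return actions
```

Resetting a routing protocol comes last in either branch because it interrupts every route learned through that protocol.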
If the system can be restarted through a software program, do not reset the system.
No multicast routing entry exists on the S9300 directly connected to the multicast source.
Clients fail to receive multicast data, which may be due to the incorrect configuration of
the Internet Group Management Protocol (IGMP).
The Protocol Independent Multicast (PIM) routing table has no (S, G) entry.
The multicast data can reach intermediate S9300s but not the last hop S9300.
The static Rendezvous Point (RP) fails to communicate with the dynamic RP.
The multicast video programs displayed are asynchronous on the clients connected to
different S9300s, but the program is played fluently, without mosaics.
Before using the debugging command to collect debugging information, run the terminal debugging
command to enable the debugging display on a terminal, and then run the terminal monitor command
to enable the display on the terminal.
After you collect debugging information, run the undo debugging all command to disable all the
debugging immediately.
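When debugging information is collected by a script, the requirement to disable debugging afterwards is easy to enforce with a context manager. The sketch below assumes a hypothetical send callable that delivers one command line to the device; the debugging FEATURE string in the usage example is a placeholder, not a real command.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def debugging_session(send: Callable[[str], None], debug_command: str) -> Iterator[None]:
    """Enable terminal display, run a debugging command, and always disable debugging."""
    send("terminal debugging")
    send("terminal monitor")
    send(debug_command)
    try:
        yield
    finally:
        # Runs even if collection fails, so debugging is never left enabled.
        send("undo debugging all")


# Usage: the list stands in for a real connection to the device.
sent = []
with debugging_session(sent.append, "debugging FEATURE"):
    pass  # collect the debugging output here
```

The finally clause is the point of the design: undo debugging all is sent whether or not the collection step raises an error.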
Table 3-2 Collection of information about the failure to forward IP multicast packets
No.  Collecting Item  Collecting Method
[Table rows not recoverable. The only surviving Collecting Method entry is display ip routing-table; row numbers ran to 16.]
NOTE
PIM-SM = Protocol Independent Multicast-Sparse Mode; RPF = Reverse Path Forwarding
Handling Flowchart
Figure 3-3 Flowchart for handling the failure to forward IP multicast packets
[Flowchart. Decision points recovered from the figure, in order: Is the TTL of the packets large enough to reach the clients? (if not, modify the TTL); Is the RP for group G identical on all devices? (if not, restore the RP configurations); Are the multicast routing entries correct?; Reset the system; if the fault is still not rectified, seek technical support.]
CAUTION
All the following steps can be performed only when the user services are already interrupted. If
the user services are not interrupted, collect fault information and provide feedback to Huawei
engineers for further processing.
Procedure
Step 1 Check and restore the IGMP configuration.
When clients fail to receive multicast data, check the IGMP configuration on the S9300
connecting the clients for correctness as follows:
1.
Check whether multicast is enabled on the S9300. That is, check whether the multicast
routing-enable command is run. If the command is not run, enable multicast in the system
view and ensure that IGMP is enabled on all interfaces. Then check whether the clients
succeed in receiving multicast data.
2.
If the clients still fail to receive multicast data, check whether the interface status is normal.
Run the display igmp interface interface-name command to view whether information
about the specified interface is displayed. If no information is displayed, see Abnormality
of the Interface Status to handle it; if the interface status is normal, check whether the
clients succeed in receiving multicast data.
3.
If the clients still fail to receive multicast data, check whether access control lists (ACLs)
are configured on the interface to prevent group G from joining the multicast group. Run
the display current-configuration interface interface-name command to check whether
the IGMP group policy is configured. If so, modify the ACL configuration to permit IGMP
group G to join the multicast group. Then check whether the clients succeed in receiving
multicast data.
4.
When the clients still fail to receive multicast data, check whether the interface resides on
the same network as the hosts. If the interface resides on a different network, modify the
IP address of the interface, and then check whether the clients succeed in receiving multicast
data.
5.
If the fault persists after you perform the preceding checks, run the reset igmp group
command to delete the IGMP group, and then add it to the multicast group again.
6.
If deleting the IGMP group is not effective, proceed to the following step.
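The same-network check in substep 4 can be done offline once the interface address and a client address are known. The following is a minimal sketch, not a device command. The function name same_subnet and the sample addresses are hypothetical and chosen only for illustration.

```python
import ipaddress


def same_subnet(interface_cidr: str, host_ip: str) -> bool:
    """Return True if host_ip lies in the network of the interface address.

    interface_cidr is the interface address with its mask, for example
    "10.1.1.1/24" (hypothetical sample value).
    """
    interface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(host_ip) in interface.network


if __name__ == "__main__":
    # Hypothetical addresses: the first host shares the /24, the second does not.
    print(same_subnet("10.1.1.1/24", "10.1.1.25"))
    print(same_subnet("10.1.1.1/24", "10.1.2.25"))
```

If the function returns False for a client, the interface and that client are on different networks, which is the condition under which the step asks you to modify the interface IP address.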
Step 2 Check and modify the Time-to-Live (TTL) value of the packets sent by the multicast source.
Check the TTL value of the (S, G) packets sent by the multicast source S. If this value is too
small, increase it so that the packets can reach the hosts.
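As a rough guide to what "too small" means, the sketch below estimates the smallest source TTL that still lets a packet cross a given number of routing hops. It assumes only the standard IP rule that each forwarding device decrements the TTL by one and discards the packet when the TTL reaches zero. Per-interface multicast TTL thresholds, if configured, are not modelled. The function names are hypothetical.

```python
def min_source_ttl(router_hops: int) -> int:
    """Smallest initial TTL that survives router_hops forwarding devices.

    Each hop decrements the TTL by 1 and the packet must still have
    TTL >= 1 when it is delivered on the last segment.
    """
    if router_hops < 0:
        raise ValueError("router_hops must not be negative")
    return router_hops + 1


def reaches_clients(source_ttl: int, router_hops: int) -> bool:
    """True if a packet sent with source_ttl can cross router_hops devices."""
    return source_ttl >= min_source_ttl(router_hops)


if __name__ == "__main__":
    # Hypothetical path with 5 forwarding devices between source and clients.
    print(min_source_ttl(5))
    print(reaches_clients(source_ttl=4, router_hops=5))
```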
Step 3 Check and modify the RP configuration.
If the fault persists after you perform the preceding steps, check the RP configuration for
correctness. First, ensure that all the devices in the PIM domain are enabled with PIM. There
are two cases:
When an RP is specified statically in the network, perform the following:
1.
Check whether the same static-rp command is run on all the devices. If it is not, run the
same static-rp command on all the devices. When ACLs are configured, ensure that the ACL
configurations are also the same. Then check whether the clients succeed in receiving
multicast packets.
2.
Check whether ACLs are configured to prevent the static RP from serving group G. If so,
modify the ACL configuration to remove the restriction. Then check whether the clients
succeed in receiving multicast packets.
When the RP is elected dynamically through a bootstrap router (BSR), perform the following:
1.
Check whether the BSR is correctly configured by running the display pim bsr-info
command on the BSR. If the BSR is not correctly configured, reconfigure it.
2.
Run the display pim rp-info command on the BSR to check whether the BSR learns RP
information. If the BSR fails to learn RP information, check that the RP is correctly
configured, that a route exists between the BSR and the RP, and that the BSR and the RP
can ping each other. If the route is faulty, refer to the Quidway S9300 Terabit Routing
Switch Troubleshooting - Multicast.
3.
Run the display current-configuration command on both the BSR and the RP to check
whether the crp-policy commands are run to prohibit group G. If so, modify the ACL
configuration.
4.
Check whether the multicast routing entries from the RP to the clients, from the multicast
source to the RP, and from the multicast source to the clients are correct. For details, refer
to the VRP Troubleshooting - IP Multicast.
2.
If the fault persists after you troubleshoot the multicast routing entries, reset the
corresponding multicast and unicast routing protocols. For example, reset all IS-IS
connections through the reset isis all command.
3.
If resetting the relevant routing protocols is ineffective, proceed to the following step.
If the system can be restarted through a software program, do not power off the device.
Before using the debugging command to collect debugging information, run the terminal debugging
command to enable the debugging display on a terminal, and then run the terminal monitor command
to enable the display on the terminal.
After you collect debugging information, run the undo debugging all command to disable all the
debugging immediately.
Table 3-3 Collection of information about the failure to forward MPLS VPN packets
No.  Collecting Item  Collecting Method
[Table rows not recoverable; row numbers ran to 10.]
NOTE
LDP = Label Distribution Protocol; LSR = Label Switching Router; LSP = Label Switched Path
Handling Flowchart
NOTE
First, check whether all or only some MPLS VPN services are interrupted on the network. If only
some MPLS VPN services are interrupted, the cause may be an incorrect maximum transmission unit
(MTU) setting on a device in the network. The protocol stack or application of some servers does
not minimize packet fragments. After MPLS labels of four bytes each are added, however, a packet
in VPN forwarding can exceed the default MTU of 1500 bytes. Therefore, the P device that forwards
MPLS packets must be set with an MTU greater than 1500 plus the total label length.
This section only describes the handling flowchart for the failure to forward all MPLS VPN packets.
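The MTU arithmetic in the preceding note can be sketched as follows. The only values taken from the text are the default MTU of 1500 bytes and the four-byte label size. The default of two labels (one outer tunnel label and one inner VPN label) is an assumption for a typical VPN path; adjust it to the label stack actually in use. The function name labelled_packet_size is hypothetical.

```python
LABEL_BYTES = 4  # size of one MPLS label, as stated in the note


def labelled_packet_size(ip_packet_bytes: int = 1500, label_count: int = 2) -> int:
    """Size of a packet after label_count MPLS labels are added.

    label_count=2 is an assumed label stack depth, not a value from the manual.
    """
    if ip_packet_bytes <= 0 or label_count < 0:
        raise ValueError("packet size must be positive and label count non-negative")
    return ip_packet_bytes + LABEL_BYTES * label_count


if __name__ == "__main__":
    # A full-size 1500-byte packet with the assumed two labels.
    print(labelled_packet_size())
    # The same packet with a deeper, three-label stack.
    print(labelled_packet_size(1500, 3))
```

The P device needs an MTU that accommodates the printed size; how a given platform counts headers toward the configured MTU should be confirmed in its configuration guide.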
Figure 3-4 Flowchart for handling the failure to forward MPLS VPN packets
[Flowchart. Decision points recovered from the figure, in order: Are the LDP sessions set up? (if not, restore LDP); Are the LSPs set up?; Are the VPN instances correctly configured?; Is VPN forwarding normal? (if not, restore the VPN routes); Reset the system; if the fault is still not rectified, seek technical support.]
CAUTION
All the following steps can be performed only when the user services are already interrupted. If
the user services are not interrupted, collect fault information and provide feedback to Huawei
engineers for further processing.
Procedure
Step 1 Check and restore LDP.
When MPLS VPN services are interrupted on a network, check whether the LDP sessions
between Provider Edges (PEs) are set up. If the LDP sessions are not set up, perform the
following:
1.
Run the display mpls ldp command to check whether the LSR IDs of different PEs conflict.
On a network, similar to a router ID, an LSR ID must be globally unique. If the LSR IDs
conflict, change the LSR IDs to keep each of them unique. Then check whether the LDP
sessions can be set up.
2.
If the LDP sessions cannot be set up, run the display mpls ldp peer command to check the
IP address of the peer.
3.
Run the ping -a source-ip command to check whether the peer address is reachable.
4.
If the peer cannot be pinged, run the display ip routing-table command to check whether
the route destined for the peer is reachable. Then, run the display fib command to check
whether the forwarding entry exists in the FIB table of the local end. If neither the route
nor the corresponding forwarding entry exists, check the link layer and network layer.
5.
If packets cannot be forwarded after the LDP sessions are set up, proceed to the following
step.
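The LSR ID uniqueness check in substep 1 can be automated once the LSR ID of each PE has been read from the display mpls ldp output. The sketch below assumes that you have already collected those values into a mapping by hand or by script. The device names and IDs are hypothetical sample data, and find_lsr_id_conflicts is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def find_lsr_id_conflicts(lsr_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each duplicated LSR ID to the sorted list of devices that use it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for device, lsr_id in lsr_ids.items():
        owners[lsr_id].append(device)
    return {
        lsr_id: sorted(devices)
        for lsr_id, devices in owners.items()
        if len(devices) > 1
    }


if __name__ == "__main__":
    # Hypothetical inventory: PE1 and PE3 were given the same LSR ID.
    collected = {"PE1": "1.1.1.1", "PE2": "2.2.2.2", "PE3": "1.1.1.1"}
    print(find_lsr_id_conflicts(collected))
```

An empty result means that no two devices in the inventory share an LSR ID.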
Check how LDP is triggered to set up LSPs. By default, only the route to a local loopback
interface is assigned labels to set up an LSP. When routes to local interfaces other than the
loopback interface also need to be assigned labels to set up LSPs, the lsp trigger all command
must be run for LDP.
2.
Check that the label mapping message is received from the source device of the route. Then
check whether the outbound interface and next hop of the route are those in the label
mapping message. If the outbound interface and next hop are different, check the Interior
Gateway Protocol (IGP) configuration for correctness or reset the IGP.
3.
If MPLS VPN packets still fail to be forwarded after the successful setup of LDP LSPs,
proceed to the following step.
1.
Check whether the Route-Distinguisher of each VPN instance is unique and the VPN target
configuration meets the requirements of network planning. If the Route-Distinguisher
or VPN target does not meet the requirements, reconfigure it.
2.
Check whether interfaces join VPN instances. If the interfaces are not in the VPN instance,
re-bind the interfaces to the relevant VPN instances. Note that all IP-related configurations
on an interface are removed when the interface is bound to a VPN instance. Therefore, you
need to perform IP-related configuration again.
Check whether the Multiprotocol Border Gateway Protocol (MBGP) neighbor relationships
between the PEs are set up. If the neighbor relationships are not set up, check whether the
IGP advertises the routes of the loopback interfaces to the peer. If the IGP does not advertise
these routes to the peer, modify the IGP configuration.
2.
Check whether the address family is created for each VPN instance in the BGP view and
the routes are imported to the BGP routing table according to the routing protocol between
PEs and CEs. Check whether the MBGP sessions between PEs use the loopback interfaces
for protocol connections. If the MBGP sessions do not use the loopback interfaces, cancel
the configuration and re-configure them.
3.
If static routes are configured between PEs and CEs, check whether the next hop of each
static route is directly connected. The next hop of a static route cannot be iterated.
If the next hop is not directly connected, delete the static routes and reconfigure them.
4.
If the fault persists after you perform the preceding checks, reset the relevant IGP and
BGP. For example, reset all IS-IS connections through the reset isis all command and BGP
connections through the reset bgp command.
5.
If the system can be reset through a software program, do not power off the device.
4.1 Overview
This section describes the collection and classification of fault information.
After an emergency occurs, collect and back up fault information promptly for reference. In
addition, provide the fault information to Huawei engineers for fault location and rectification.
When a fault occurs, collect the following information:
Collecting Item
Collecting Method
Fault duration
Fault description
On the basis of the range that the fault spreads to and the
severity of the fault, record the fault severity level according
to the level definition.
Software version
Networking
information
Measures taken
NOTE
When you collect fault information through command lines, you can copy the information displayed on
the console (the COM port terminal or the Telnet terminal) and save it to a .txt file as a record.
No.  Collecting Item  Collecting Method
1    Device information
2    Temperature
3    CPU usage
4    Routing table information
5    Logs
6    Traps
7    Configuration
8    Diagnostic information about the device
9    Interface information
10   Network connectivity information
5.1 Overview
This section describes the applicable environment and precautions for system reboot.
CAUTION
Do not reboot an S9300 arbitrarily. If a reboot is required, first read the guidelines and
precautions described in Overview of Emergency Maintenance, or restart the system under the
guidance of Huawei engineers.
During the S9300 reboot, all services on the device are interrupted, except in dual-system hot
backup networking. The services are resumed after the S9300 is rebooted successfully.
An S9300 automatically reboots when an excessively severe fault occurs on it. After the
automatic reboot, the system begins to run normally. Therefore, you normally do not need to
reboot the system manually.
A manual reboot is applicable only to an emergency or an exception. For example, if an
S9300 fails to reboot automatically when services on it are interrupted and other measures
are ineffective, you can reboot it manually.
CAUTION
Do not remove the LPU or a flexible plug-in card of the SRU in service. The boards of other
types are hot pluggable.
The S9300 can be manually rebooted in either of the following ways:
NOTE
Rebooting an S9300 remotely is not recommended, because the reboot may fail and leave services
interrupted for a long period.
reboot Command
Enter the reboot command in the user view and press Y after the display. Then the system
reboots. The operation example is as follows:
<Quidway> reboot
Info:The system is now comparing the configuration, please wait.
Info:Save current configuration?[Y/N]:y
Now saving the current configuration to the device
Info:The current configuration was saved to the masterboard device successfully.
System will reboot! Continue?[Y/N]:
NOTE
After you run the reboot command, the display may vary with the system version.
Run the schedule reboot delay command to enable the scheduled reboot function and set the
wait delay.
You can set the wait delay in two formats: "hour:minute" and "absolute minutes". The total
delay cannot exceed 30 x 24 x 60 minutes.
Run the schedule reboot at command to enable the scheduled reboot function and specify the
reboot date and time. Note that the specified date cannot be more than 30 days later than
the current date.
If the schedule reboot at command specifies the date parameter (yyyy/mm/dd) and the
date is in the future, the S9300 reboots at the specified time, with an error of no more than
one minute.
If no specific date is set, the following situations occur:
If the set time is later than the current time, the S9300 reboots at this time that day.
If the set time is before the current time, the S9300 reboots at this time the next day.
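The delay limit and the no-date rule described above can be sketched in a few lines. This is an illustration of the stated rules only, not the device's implementation. The function names parse_delay and resolve_reboot_time are hypothetical. The text does not say what happens when the set time equals the current time; the sketch assumes that case rolls over to the next day.

```python
from datetime import datetime, timedelta

MAX_DELAY_MINUTES = 30 * 24 * 60  # upper limit stated in the text


def parse_delay(text: str) -> int:
    """Convert "hour:minute" or "absolute minutes" into total minutes."""
    if ":" in text:
        hours, minutes = text.split(":")
        total = int(hours) * 60 + int(minutes)
    else:
        total = int(text)
    if not 0 <= total <= MAX_DELAY_MINUTES:
        raise ValueError("delay must be between 0 and 30 x 24 x 60 minutes")
    return total


def resolve_reboot_time(now: datetime, hh_mm: str) -> datetime:
    """Reboot time when only a time of day is given, with no date."""
    hour, minute = (int(part) for part in hh_mm.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Later than now: reboot today. Earlier than now: reboot tomorrow.
    # Equal to now is not covered by the text; assumed to mean tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2009, 4, 15, 10, 0)  # hypothetical current time
    print(parse_delay("1:30"))
    print(resolve_reboot_time(now, "12:00"))
    print(resolve_reboot_time(now, "09:00"))
```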
After the schedule reboot delay or schedule reboot at command is run, the system prompts
you to confirm the reboot. Enter Y or y, and the configuration takes effect. If a scheduled
reboot is already configured, the latest configuration overrides the previous one.
NOTE
If you adjust the system time through the clock command after running the schedule reboot delay or
schedule reboot at command, the previous configuration through the schedule reboot delay or schedule
reboot at command becomes invalid.
You can run the undo schedule reboot command to cancel a scheduled reboot configured through
the schedule reboot delay or schedule reboot at command.
You can run the display schedule reboot command to view the scheduled reboot configured through
the schedule reboot delay or schedule reboot at command.
CAUTION
It is recommended to reboot an S9300 in this mode only when a critical fault occurs in the power
supply system of the equipment room and the S9300, therefore, is powered off. In this case,
switch the S9300 off, and then switch it on again when the power supply system returns to
normal.
The S9300 chassis can hold two power modules, either both AC or both DC; AC and DC modules
cannot be mixed. It is recommended to install an active power module and a standby power module
in the chassis, working in 1+1 load balancing mode.
The switch of a power module is located on its front panel. Turn the switch to OFF to turn off
the power; turn the switch to ON to turn on the power.
When an S9300 uses two power modules for load balancing, you need to switch off both
power modules to turn off the power and switch on both power modules to turn on the power.
Procedure
Step 1 In the topology navigation tree or the topology view, select the S9300 to be operated and right-click it.
Step 2 Choose Device Management > Reboot Device on the shortcut menu.
Step 3 Click Yes in the displayed dialog box.
----End
Postrequisite
When the S9300 is rebooted, its node icon becomes unavailable.
After the reboot is successful, the node icon becomes green.
For detailed operations, refer to the NMS Online Help.
CAUTION
After an S9300 is rebooted, check that the configuration data is recovered correctly and
completely, so that services are not interrupted by a failed recovery of configuration data. If
some configuration data is lost, add it manually and save it.
5.4.1 Displaying Information About System Reboot
5.4.2 Checking the Software Version and Configuration File
: LE02SRUA
: Cavium Octeon
: 128KB
: 700MHz
: 133MHz
: DDR2 SDRAM
: 512MB
: 667MHz
...
Recover configuration...OK!
Press ENTER to get started.
The preceding display shows that the S9312 is rebooted successfully. Press Enter and enter the
user view.
The preceding information displays the Versatile Routing Platform (VRP) version, host version,
and patch version. You can check that the version numbers are the same as those before system
reboot.
cfcard:/s9300v100r001c02b112.cc
cfcard:/s9300v100r001c02b112.cc
cfcard:/s9300v100r001c02b112.cc
cfcard:/new.cfg
cfcard:/new.cfg
cfcard:/c02b112sph001.pat
cfcard:/c02b112sph001.pat
The preceding information displays the name of the current startup file, the name of the current
configuration file, and the patch package loaded at startup.
Device model
Capacity
Customer
Phone No.
Version
Complaint
time
Required
response time
In warranty
Yes No
Fault description and handling procedure (in detail): Approved by: Signature:
Filled by Huawei engineers
Handling
method
Handler
Customer
complaint
Basic information:
Routine
maintenance
Alarms
Other sources
Fault symptom:
Solution and result:
Context
CAUTION
The process of upgrading the system through the Basic Input/Output System (BIOS) is
complicated and this method is not recommended. The BIOS is required only when the host
program of the S9300 cannot be started.
The BIOS can act only as an FTP client. The operation terminal must be connected to the
S9300 through the COM port and communicate with the S9300 through HyperTerminal.
NOTE
Take the S9312 upgrading procedure as an example here. The upgrading procedure of S9303 and S9306
is the same as the upgrading procedure of S9312.
Procedure
Step 1 Run an FTP server on the configuration terminal or PC and set its path to the directory
that holds the system files. Create an FTP user named 9300 and set the password to 9300.
Step 2 Reboot the S9312.
When the S9312 is powered on, the PC or terminal used to set up the configuration environment
displays the following:
input 'm' to Select Debug Console:
Boardname ..................................................................SRU
Start L2 Cache Test ? ('t' is test):skip
BIOS Creation Date ....................................... Feb 2 2009 14:48:10
Bootbus init.................................................................OK
DDR DRAM init................................................................OK
Memory Data Bus Walk '0' Test .............................................Pass
Memory Data Bus Walk '1' Test .............................................Pass
The S9312 is starting the basic BootROM menu. Then, the S9312 starts the extended BootROM
menu.
If the self-test detects a fault, or a fault occurs for another reason, the system displays the
basic BootROM menu. You can also press Ctrl+A within two seconds to enter the basic BootROM
menu. Otherwise, the system automatically starts the extended BootROM menu.
The basic BootROM is used to upgrade the basic BootROM and the extended BootROM. For
details, see the following description.
Update BIOS menu(ver004)
Creation date: Feb 2 2009 14:48:04
1.
2.
3.
4.
5.
To upgrade the BootROM, you need to change the baud rate, and then download the files. After the
upgrade, restore the default connection rate to 9600 bit/s; otherwise, information may not be displayed
when you start or restart the system.
After you select 4, the system copies the extended BootROM to the SDRAM, and then decompresses
and starts the extended BootROM. After startup, the system starts the extended BootROM menu.
: LE02SRUA VER.A
: Cavium Octeon
: 128KB
: 700MHz
: 133MHz
: DDR2 SDRAM
: 512MB
: 667MHz
CF Card Init....Done
The initial password is 7800, which can be changed. If three wrong passwords are entered consecutively,
the system restarts.
MAIN MENU
Boot with default mode
Boot from Flash
Boot from CFCard
Enter serial submenu
Enter ethernet submenu
Modify Flash description area
Modify bootrom password
Reboot
SUBMENU
^D = quit
boot device          : eth1
processor number     : 0
host name            : host
file name            : s9300.cc       # Name of the software program to be loaded
inet on ethernet (e) : 192.168.1.1:ffffff00
inet on backplane (b): 192.168.1.1
host inet (h)        : 192.168.1.2    # IP address of the FTP server
gateway inet (g)     :
user (u)             : 9300           # FTP user name
ftp password (pw) (blank = use rsh): 9300 # FTP login password
flags (f)            : 0x0
target name (tn)     : octeon
startup script (s)   :
other (o)            :
The preceding information shows that the name of the software program to be loaded is s9300.cc,
the IP address of the FTP server is 192.168.1.2, the FTP user name is 9300, and the password
is 9300. Modify the preceding parameters according to the actual situation. The other parameters
do not need to be modified.
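The "inet on ethernet (e)" value above appears to combine an IPv4 address with a hexadecimal subnet mask after the colon (ffffff00 corresponds to 255.255.255.0). Under that assumption, the sketch below converts such a value to the usual prefix form, so you can confirm that the FTP server address is on the same network before starting the download. The function name parse_boot_inet is hypothetical.

```python
import ipaddress


def parse_boot_inet(value: str) -> ipaddress.IPv4Interface:
    """Parse "address:hexmask", for example "192.168.1.1:ffffff00".

    Assumes the part after the colon is the subnet mask in hexadecimal.
    """
    address, separator, hex_mask = value.partition(":")
    if not separator or not hex_mask:
        raise ValueError("expected the form address:hexmask")
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.IPv4Interface(f"{address}/{mask}")


if __name__ == "__main__":
    interface = parse_boot_inet("192.168.1.1:ffffff00")
    ftp_server = ipaddress.ip_address("192.168.1.2")  # "host inet (h)" above
    print(interface)
    print(ftp_server in interface.network)
```

When the FTP server is not in the interface's network, the "gateway inet (g)" field would also need a value for the download to be reachable.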
eth
1
0
host
s9300.cc
10.164.44.119
192.168.1.1:ffffff00
10.164.19.46
10.164.44.1
9300
9300
0x0
octeon
Loading...................................Done!
Please type a new file name for saving it.
Press return key to save it named "s9300.cc".
Check disk space
Writing
file..............................................................................
...........................................Done
SUBMENU
MAIN MENU
Boot with default mode
Boot from Flash
Boot from CFCard
Enter serial submenu
Enter ethernet submenu
Modify Flash description area
Modify bootrom password
Reboot
^s9300.cc