Network Failure Detection
BRKRST-2333

Arkadiy Shapiro
Technical Marketing Engineer, NX-OS and Nexus 7000
arshapir@cisco.com

© 2013 Cisco and/or its affiliates. All rights reserved.
Cisco Public
Ver 1.8

Why am I here?

[Figure: where the session applies across the network: DC access (Nexus 2000 / 3000 / 3500 / 5000), campus access and core (Catalyst 6500), routing core (CRS-3) and SP edge (ASR 9000), interconnected with 40G and 100G links]

Session Goals

At the end of the session, participants should:

• Understand where failure detection fits in achieving network fast convergence
• Be able to identify which failure detection technologies are needed to meet business needs and required SLAs
• Understand future advances in network failure detection technologies

Session Non-goals

This session does not include:

• Discussion of other aspects of fast convergence
• Details of the software or hardware architectures of related Cisco products
• Detailed roadmap discussion for related Cisco products
• Detailed discussion of service / end-to-end failure detection technologies
• Discussion of user-driven failure detection methods (ping, traceroute, etc.) and of using scripts / EEM to automate them

Agenda

• Overview
• Layer 1 Failure Detection
• Layer 2 Failure Detection
• Layer 3 Failure Detection
• Summary

Overview

Routing Convergence in Action

[Figure: routers A, B, C and D over time t0 to t4. A link fails ("Ooops" / "Problem"); the routers attached to it announce "Folks: my link to C is down" and "Folks: my link to B is down"; one router reacts with "Ok, fine, will use path via D", while another notes "I don't care, nothing changes for me".]

Loss of Connectivity = t4 – t0

Routing Convergence Components

1. Failure Detection
2. Failure Propagation (flooding, etc.)
3. Topology/Routing Recalculation
4. Update of the routing and forwarding table (RIB & FIB)

Components 2 to 4 form the IGP and BGP reaction.

[Figure: timeline from t0 to t4 with components 1 to 4 in sequence]
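As a rough way to reason about the loss-of-connectivity budget (t4 – t0), the four components can be treated as sequential delays that simply add up. The sketch below assumes that additive model; the class name and the millisecond values in the usage lines are hypothetical placeholders, not measurements from any platform.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Per-component delays in milliseconds (hypothetical values)."""
    detection_ms: float      # 1. failure detection
    propagation_ms: float    # 2. failure propagation (flooding, etc.)
    recalculation_ms: float  # 3. topology/routing recalculation
    fib_update_ms: float     # 4. RIB & FIB update

    def loss_of_connectivity_ms(self) -> float:
        # Assumes the components run strictly one after another,
        # so t4 - t0 is the sum of the parts.
        return (self.detection_ms + self.propagation_ms
                + self.recalculation_ms + self.fib_update_ms)

    def detection_share(self) -> float:
        # Fraction of the outage spent just noticing the failure.
        return self.detection_ms / self.loss_of_connectivity_ms()


# Hypothetical example: slow detection dominates the outage.
budget = ConvergenceBudget(detection_ms=3000, propagation_ms=50,
                           recalculation_ms=100, fib_update_ms=200)
print(budget.loss_of_connectivity_ms())    # 3350
print(round(budget.detection_share(), 2))  # 0.9
```

With these placeholder numbers, roughly 90% of the outage is detection time, which is why tuning the later components alone cannot fix a slow detector.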

Failure Detection Overview

• Detecting the failure is a critical, but also the most challenging, part of network convergence
• Failure detection can occur on different levels / layers:
  - Physical Layer (1)
  - Data Link Layer (2)
  - Network Layer (3)
  - Service / Application (not covered here)
• Do you really need to touch all the layers?

Interconnection Options

[Figure: Layer 3 (IP/MPLS), Layer 2 (Ethernet/FR/ATM …) and Layer 1 (SONET / SDH / OTN / DWDM) in the path between the routers for options A to D]

A. Layer 3 p2p
B. Layer 3 with a Layer 1 (DWDM) “bump” in wire
C. Layer 3 with a Layer 2 (Ethernet / Frame Relay / ATM switch) “bump” in wire
D. Layer 3 with a Layer 3 (Firewall / router) “bump” in wire

Failure Detection Tools

Layered Approach

• Service / Application: 802.1ag CFM; Y.1731 PM; BFD for VCCV, GRE; FabricPath/TRILL OAM
• Layer 3: BFD for BGP, OSPF, IS-IS, EIGRP, FHRPs and static
• MPLS: BFD for MPLS LSPs / TE-FRR
• Layer 2: 802.1ag CFM; 802.3ah Link OAM; UDLD; LACP; Y.1731 FM
• Layer 1: Bit transmission; Signaling: Auto-negotiation / FEFI / Remote Fault Indication; Other: Carrier Delay / Debounce

Engineering Complexity vs. Gain

K.I.S.S.

[Figure: loss (impairments/time) versus cost and complexity, with regions labeled "Re-engineering Required", "Viable Engineering" and "Potential Over-Engineering", annotated with the number of possible approaches, or combinations of approaches]

The range of viable engineering options may vary by type of application.

Layer 1 Failure Detection

Layer 1 – IPoDWDM Proactive Protection

IP / optical integration makes it possible to identify a degrading link from optical data (pre-FEC BER) and to start protection (e.g. by signaling to the IGP/FRR) before traffic starts failing, achieving hitless failover in many cases.

[Figure: Reactive protection: optical port on the router plus a transponder; as optical impairments grow, BER crosses the FEC limit, LOF occurs and the switchover from the working to the protected path loses data. Proactive protection: WDM port on the router; the protection trigger fires while FEC is still correcting bits, below the FEC limit, giving a near-hitless switch to the protect path.]

HW Support: CRS, ASR 9000, XR 12000, 7600. Check specific interface types!
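The difference between reactive and proactive protection can be sketched as a threshold check on pre-FEC BER samples. The function name and both thresholds below are hypothetical illustration values, not platform defaults: reactive protection acts only once BER exceeds what FEC can correct, while proactive protection fires at a lower trigger.

```python
from typing import Iterable, Optional

# Hypothetical thresholds for illustration only (not platform defaults).
FEC_LIMIT_BER = 1e-3          # above this, FEC can no longer correct errors
PROACTIVE_TRIGGER_BER = 1e-5  # proactive protection fires well below it


def first_switch_index(ber_samples: Iterable[float],
                       threshold: float) -> Optional[int]:
    """Return the index of the first pre-FEC BER sample above threshold."""
    for i, ber in enumerate(ber_samples):
        if ber > threshold:
            return i
    return None


# A link degrading gradually (one sample per polling interval).
samples = [1e-9, 1e-7, 1e-6, 5e-5, 2e-4, 2e-3, 1e-2]

reactive = first_switch_index(samples, FEC_LIMIT_BER)           # 5
proactive = first_switch_index(samples, PROACTIVE_TRIGGER_BER)  # 3
print(f"reactive switch at sample {reactive}, proactive at {proactive}")
```

In this made-up trace, the proactive trigger moves traffic two samples before the FEC limit is crossed, while all errors are still being corrected, which is what makes the switch near-hitless.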

Layer 1 Failure Detection – Ethernet Link Fault Signaling

• Ethernet mechanisms like auto-negotiation (1 GigE), FEFI (100FX) or link fault signaling (802.3ae/ba) can signal local failures to the remote end

[Figure: R1 and R2 directly connected; a failure (X) in one direction of the link is signaled to the other end]

• It is challenging to get this signal across an Ethernet-over-SDH/OTN cloud, as relaying the fault information to the other end is not always possible

[Figure: R1 – MUX-A – optical transport – MUX-B – R2, a “bump” in the Layer 1 link, with a failure (X) on one side of the transport]
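For 10 GigE, link fault signaling is handled in the Reconciliation Sublayer: a port that detects a local fault on its receive path transmits Remote Fault, and a port that receives Remote Fault stops sending data and transmits Idle. The sketch below models only that reaction (it is not a full 802.3ae implementation); the `relayed` flag is a hypothetical knob showing why a transport “bump” that does not carry the Remote Fault leaves the far end unaware.

```python
from enum import Enum


class Rx(Enum):
    OK = "ok"
    LOCAL_FAULT = "local fault"    # this port's receive path is broken
    REMOTE_FAULT = "remote fault"  # peer reports it cannot hear us


class Tx(Enum):
    DATA = "data"
    REMOTE_FAULT = "remote fault"
    IDLE = "idle"


def rs_transmit(rx_status: Rx) -> Tx:
    """Simplified RS reaction to the received link fault status."""
    if rx_status is Rx.LOCAL_FAULT:
        return Tx.REMOTE_FAULT  # tell the peer its transmit path is bad
    if rx_status is Rx.REMOTE_FAULT:
        return Tx.IDLE          # stop sending data toward a deaf peer
    return Tx.DATA


def link_up(rx_status: Rx) -> bool:
    # Either fault condition means the link is not usable.
    return rx_status is Rx.OK


def peer_view(r2_rx: Rx, relayed: bool) -> Rx:
    """What R1 receives when only the R1 -> R2 direction is cut (X).

    relayed=False models a transport "bump" that does not carry the
    Remote Fault signal across the cloud (hypothetical flag).
    """
    if relayed and rs_transmit(r2_rx) is Tx.REMOTE_FAULT:
        return Rx.REMOTE_FAULT
    return Rx.OK


# Direct link: R2 sees a local fault, R1 learns about it and goes down.
r1_direct = peer_view(Rx.LOCAL_FAULT, relayed=True)
# Bump in the wire that drops the signal: R1 keeps the link up.
r1_bumped = peer_view(Rx.LOCAL_FAULT, relayed=False)
print(link_up(r1_direct), link_up(r1_bumped))  # False True
```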

Link Down Detection

How Fast?

• Link-down / interface-down event detection is hardware-dependent
• Catalyst 6500 and Cisco 7600: OSM, SIP, 6708-10GE and more recent I/O modules use interrupt-driven notification, offering <10 ms detection
  - 6704 offers <30 ms with optimized polling
  - All other, older I/O modules are polled in order, 20 ms per port: worst case 48 × 20 ms = 960 ms to detect a failure!
  - Enhancement with CSCsr21196 (SXI, SRD2, SRC3) for fiber ports: 60 msec
• Nexus switches / CRS / ASR 9000: interrupt-driven notification
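The 960 ms worst case follows directly from sequential polling: a failure that happens just after a port was polled is only seen when the poller comes back around to it. A small sketch of that arithmetic, using the per-port interval and port count from the slide; the average-case helper additionally assumes failures occur at a uniformly random point in the polling cycle.

```python
def worst_case_poll_detection_ms(ports: int, per_port_ms: float) -> float:
    """Ports are polled in order; a failure just after a port's poll
    is detected only when the poller returns to that port."""
    return ports * per_port_ms


def average_poll_detection_ms(ports: int, per_port_ms: float) -> float:
    # Assumes failures occur at a uniformly random time in the cycle.
    return worst_case_poll_detection_ms(ports, per_port_ms) / 2


print(worst_case_poll_detection_ms(48, 20))  # 960
print(average_poll_detection_ms(48, 20))     # 480.0
```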

Carrier Delay

• Runs a timer in software
• Filters link up and down events, then notifies protocols
• By default, most IOS versions set the timer to 2 seconds to suppress short flaps
• This behaviour is not desirable for fast convergence:

interface …
 carrier-delay msec 0

• Setting carrier-delay to 0 on an SVI is not recommended
• Standard routing platform feature
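One way to picture carrier delay is as a hold timer restarted on every link-state change: protocols are notified only if the new state is still in place when the timer expires. The simulation below is a simplified model under that assumption (the function name and event format are made up for illustration); it shows why the 2 s default hides short flaps but also adds 2 s to every real failure.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (time in ms, "up" or "down")


def carrier_delay_filter(physical: List[Event], delay_ms: int,
                         initial: str = "up") -> List[Event]:
    """Return the link events that routing protocols would see.

    A state change is reported only if it persists for delay_ms;
    a flap shorter than the delay is suppressed entirely.
    """
    reported: List[Event] = []
    last_reported = initial
    for i, (t, state) in enumerate(physical):
        next_change = physical[i + 1][0] if i + 1 < len(physical) else None
        stable = next_change is None or next_change - t >= delay_ms
        if stable and state != last_reported:
            reported.append((t + delay_ms, state))
            last_reported = state
    return reported


# A 500 ms flap at t=1000, then a real failure at t=10000.
events = [(1000, "down"), (1500, "up"), (10000, "down")]
print(carrier_delay_filter(events, delay_ms=2000))
# [(12000, 'down')]: flap hidden, real failure reported 2 s late
print(carrier_delay_filter(events, delay_ms=0))
# [(1000, 'down'), (1500, 'up'), (10000, 'down')]
```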

Asymmetric Carrier Delay

• When connecting to an Ethernet Layer 2 cloud, it may be desirable to delay link-up for a bit, without changing the link-down carrier delay
• Otherwise, the initial ARP request could get dropped in the L2 cloud, which can create a short black hole (due to an incomplete adjacency)

SW Support: IOS 12.0(32)SY2, 12.2SRD; IOS XR 3.4.0

interface …
 carrier-delay up 20

interface …
 carrier-delay up msec 20

• Some device drivers have a built-in up-delay:
  - POS: generally 10 seconds
  - 7600 ES20/40 WAN ports: 4 seconds
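Asymmetric carrier delay applies a different hold time to each direction. Extending the same hold-timer idea with separate up and down delays (a simplified, hypothetical model, not the IOS or IOS XR implementation) shows the intended effect: link-down still reaches the IGP immediately, while link-up is held back so the L2 cloud can settle before ARP and adjacency setup begin. The 2000 ms up-delay is an arbitrary example value.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (time in ms, "up" or "down")


def asymmetric_filter(physical: List[Event], up_delay_ms: int,
                      down_delay_ms: int, initial: str = "down") -> List[Event]:
    """Report a state change only if it persists for its own delay."""
    reported: List[Event] = []
    last_reported = initial
    for i, (t, state) in enumerate(physical):
        delay = up_delay_ms if state == "up" else down_delay_ms
        next_change = physical[i + 1][0] if i + 1 < len(physical) else None
        stable = next_change is None or next_change - t >= delay
        if stable and state != last_reported:
            reported.append((t + delay, state))
            last_reported = state
    return reported


# Link comes up at t=0, fails at t=5000, returns at t=8000.
events = [(0, "up"), (5000, "down"), (8000, "up")]
print(asymmetric_filter(events, up_delay_ms=2000, down_delay_ms=0))
# [(2000, 'up'), (5000, 'down'), (10000, 'up')]
```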

Debounce Timer

• Delays link down notification only
• Runs in firmware
• 100 msec default in NX-OS
• 300 msec default in IOS on copper, 10 msec on fiber
• In most cases, keeping the default is recommended
• Standard switching platform feature

NX-OS:
switch(config)# interface …
switch(config-if)# link debounce time ?
  <0-5000>  Timer value (in milliseconds)

Carrier Delay vs Debounce Timer

Carrier Delay / Asymmetric Carrier Delay:
• Runs in software
• Filters link down and up events
• Not applicable to:
  - Switches, except WAN interfaces (e.g. ES+ or SIP/SPA on Catalyst 6500)
  - Ethernet LAN switching interfaces on routers (e.g. Cisco 7600 with WS-X6708 card)

Debounce timer:
• Runs in firmware
• Filters link down events only
• Not applicable to:
  - Routers, except Ethernet LAN switching interfaces (e.g. Cisco 7600 with WS-X6708 card)
  - WAN interfaces on switches (e.g. ES+ or SIP/SPA on Catalyst 6500)
  - SVIs

Make sure to test before implementing!

Link Isolation - IP Event Dampening

SW Support: IOS, IOS XE, IOS XR

[Figure: logical diagram comparing the actual interface state, the accumulated penalty (with maximum penalty, suppress threshold and reuse threshold) and the interface state seen by routing protocols]
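The logical diagram describes an exponentially decaying penalty: each flap adds a fixed penalty, the accumulated value halves every half-life period, the interface is suppressed (seen as down by routing protocols) once the penalty exceeds the suppress threshold, and it is reused once the penalty decays below the reuse threshold. The toy model below assumes a penalty of 1000 per flap, a 5 s half-life, suppress 2000, reuse 1000 and a maximum penalty derived from a 20 s maximum suppress time; treat all of these as illustrative values and check your platform's `dampening` defaults.

```python
import math


class EventDampener:
    """Toy model of interface event dampening (illustrative values)."""

    def __init__(self, penalty=1000, half_life_s=5.0, reuse=1000,
                 suppress=2000, max_suppress_s=20.0):
        self.penalty_per_flap = penalty
        self.half_life_s = half_life_s
        self.reuse = reuse
        self.suppress = suppress
        # Cap the penalty so it decays to the reuse level within max_suppress_s.
        self.max_penalty = reuse * 2 ** (max_suppress_s / half_life_s)
        self.penalty = 0.0
        self.suppressed = False
        self.last_t = 0.0

    def _decay_to(self, t: float) -> None:
        # The penalty halves every half_life_s seconds.
        elapsed = t - self.last_t
        self.penalty *= math.pow(0.5, elapsed / self.half_life_s)
        self.last_t = t
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False

    def flap(self, t: float) -> None:
        self._decay_to(t)
        self.penalty = min(self.penalty + self.penalty_per_flap,
                           self.max_penalty)
        if self.penalty > self.suppress:
            self.suppressed = True

    def state_seen_by_routing(self, t: float, actual_up: bool) -> bool:
        self._decay_to(t)
        return actual_up and not self.suppressed


d = EventDampener()
for t in (0, 1, 2):  # three flaps in quick succession
    d.flap(t)
print(d.suppressed)                       # True
print(d.state_seen_by_routing(3, True))   # False: still dampened
print(d.state_seen_by_routing(30, True))  # True: penalty has decayed
```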

Layer 2 Failure Detection

Technology Analysis

Layer 2 and Layer 3 Failure Detection

• What layer?
• Keepalive message interval and timeout?
• Types of failures detected?
• Reaction to failures?
• Methods to support ISSU?
• Scale?
• Protocol offload?
• Standardization?
• Types of interfaces supported?

Network Scenarios

• Classical Ethernet Layer 2
  - Single p2p link
  - Bundle
• FabricPath / TRILL
  - Single p2p link
  - Bundle
• Layer 3
  - Single p2p link
  - Bundle
  - SVI on top of Classical Ethernet
  - SVI on top of FabricPath / TRILL

Layer 2 – Data Link Layer

• Generally only applicable to L2 transports using some form of keepalive mechanism:
  - PPP or HDLC keepalives
  - Frame Relay LMI
  - ATM OAM
  - Ethernet OAM, LACP (bundles), UDLD
• Sub-second failure detection at scale is typically not a goal of the features mentioned above
  - Ethernet OAM / CFM is getting there…
  - Fast UDLD
• Tuning keepalives down to the minimum is NOT recommended: it can lead to false positives, as keepalive processing may not be optimized

Unidirectional Link Detection (UDLD)

• Light-weight Layer 2 failure detection protocol
• Designed for detecting:
  - One-way connections due to physical failure
  - One-way connections due to soft failure
  - Miswiring (loopback or triangle)
• Cisco proprietary, but documented in informational RFC 5171
• Runs on any single Ethernet link, even inside a bundle
• Typically a centralized implementation (hellos sent from the supervisor, not from the LC)
• Message interval: 7-90 sec (default: 15 seconds)
• Detection: 2.5 × interval + timeout value (4 sec), ~21 sec at the minimum 7 sec interval

[Figure: Tx/Rx pairs at both ends of a link running UDLD]
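Plugging numbers into the detection formula above (2.5 × message interval + 4 sec timeout) shows where the ~21 sec figure comes from, and how much slower the 15 sec default is. The helper below encodes only that formula and the 7-90 sec interval range as stated on the slide; the function name is made up for illustration.

```python
UDLD_TIMEOUT_S = 4.0  # timeout value from the slide


def udld_detection_s(interval_s: float) -> float:
    """Detection time: 2.5 x message interval + timeout."""
    if not 7 <= interval_s <= 90:
        raise ValueError("UDLD message interval must be 7-90 sec")
    return 2.5 * interval_s + UDLD_TIMEOUT_S


for interval in (7, 15, 90):
    print(interval, udld_detection_s(interval))
# 7 21.5
# 15 41.5
# 90 229.0
```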

UDLD Basics of Operation

Peer Discovery and Relationship

• With ECHO messages, each device learns:
  - What it is connected to, and the peer's message interval
  - What its neighbors think they are connected to!
  - This information can then be used to detect faults
• A FLUSH message is sent when UDLD is disabled
• Aging mechanism with PROBE messages
  - Information from neighbors that is not periodically refreshed is eventually timed out
  - This can also be used for fault detection

UDLD Scenario 1

Empty-Echo condition or age out

[Figure: Switch A port e x/y connected to Switch B port e w/z exchanging UDLD packets; the transmit path from A to B is broken (X)]

• The echo packet from A to B carries "My Switch-ID A, My Port-ID e x/y"
• When B sends the echo-reply back, it is expected to carry "My Switch-ID B, My Port-ID e w/z" AND "Your Switch-ID A, Your Port-ID e x/y"
• Transmit path failure from A to B:
  - When B sends the echo-reply back, the packet carries only "My Switch-ID B, My Port-ID e w/z": B has timed out its entry for A!

UDLD Scenario 2

Miswiring Detection

• Caused by packets flowing in only one (uni) direction
• Key differentiating factor of UDLD!
• With SFP-type fiber connections, this error is less common

[Figure: miswired fibers between Switch A (e x/y), Switch B (e w/z) and Switch C (e s/t)]
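The checks in scenarios 1 and 2 boil down to comparing what a port receives with its own identity. The sketch below is a simplified classifier built on a hypothetical `Echo` record carrying the sender's "My" IDs and the list of "Your" neighbor IDs it has heard; it is not the RFC 5171 packet format, and the returned labels simply mirror the condition names used on the failure-reaction slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

PortId = Tuple[str, str]  # (switch ID, port ID)


@dataclass
class Echo:
    """Hypothetical, simplified view of a received UDLD echo."""
    sender: PortId                                     # "My Switch-ID, My Port-ID"
    heard: List[PortId] = field(default_factory=list)  # "Your ..." entries


def classify(local: PortId, echo: Echo) -> str:
    """Classify a received echo from the point of view of port `local`."""
    if echo.sender == local:
        return "tx/rx loop"         # our own packet came back to us
    if not echo.heard:
        return "empty echo"         # peer no longer hears us (scenario 1)
    if local not in echo.heard:
        return "neighbor mismatch"  # peer hears someone else (scenario 2)
    return "bidirectional"


a, b, c = ("A", "e x/y"), ("B", "e w/z"), ("C", "e s/t")
print(classify(a, Echo(sender=b, heard=[a])))  # bidirectional
print(classify(a, Echo(sender=b)))             # empty echo
print(classify(a, Echo(sender=c, heard=[b])))  # neighbor mismatch
```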

Fast UDLD

SW Support: IOS 12.2(33)SXI4, 12.2(54)SG

• Shorter UDLD message interval to achieve sub-second detection
• New Fast Hello TLV for backward compatibility
• Message interval: 200 msec – 1 sec
• Similar considerations as for Layer 3 timer tuning:
  - CPU usage (false positives) and scale (not designed for this)
  - SSO / ISSU support

IOS:
switch(config)#interface GigabitEthernet1/1
switch(config-if)#udld fast-hello ?
  <200-1000>  Time in milliseconds between sending of messages in steady state

switch#show udld fast-hello
Total ports with fast hello configured: 10
Total ports with fast hello operational: 5
Total ports with fast hello non-operational: 5
Fast hello configuration setting (millisecond):
Interface Gi1/1   200   operational
Interface Gi1/6   500   configured

UDLD Failure Reaction

Normal vs. Aggressive mode

Normal:
• Sets the port to err-disable state in case of a unidirectional condition: Empty Echo packet, Uni-direction, TX/RX loop, and Neighbor Mismatch
• Does NOT err-disable the port in case of a sudden cessation of UDLD packets

Aggressive:
• Sets the port to err-disable state in case of a unidirectional condition: Empty Echo packet, Uni-direction, TX/RX loop, and Neighbor Mismatch
• Sets the port to err-disable state in case of a sudden cessation of UDLD packets: the port is put in err-disable mode if no UDLD packets are received for 3 × hello-time + 5 sec (= 50 sec with default timers)
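The aggressive-mode timer above, 3 × hello-time + 5 sec, is easy to check for different hello intervals. A minimal helper encoding only that formula (the name is made up for illustration); the normal-mode branch returns `None` to reflect the slide's statement that normal mode does not err-disable on a sudden cessation of UDLD packets.

```python
from typing import Optional


def err_disable_after_silence_s(hello_s: float,
                                aggressive: bool) -> Optional[float]:
    """Seconds of UDLD silence before the port is err-disabled.

    Normal mode returns None: it does not err-disable on a sudden
    cessation of UDLD packets.
    """
    if not aggressive:
        return None
    return 3 * hello_s + 5


print(err_disable_after_silence_s(15, aggressive=True))   # 50
print(err_disable_after_silence_s(7, aggressive=True))    # 26
print(err_disable_after_silence_s(15, aggressive=False))  # None
```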

Spanning Tree Bridge Assurance

SW Support: IOS 12.2(33)SXI, 12.2(50)SY; NX-OS 4.0(1)

• Turns STP into a bidirectional protocol
• Ensures spanning tree fails “closed” rather than “open”
• All ports with “network” port type send BPDUs regardless of state
• If a network port stops receiving BPDUs, the port is placed in BA-Inconsistent state (blocked)

NX-OS:
%STP-2-BRIDGE_ASSURANCE_BLOCK: Bridge Assurance blocking port Ethernet2/48 VLAN0700.
switch# sh spanning vl 700 | in -i bkn
Eth2/48   Desg BKN*4   128.304   Network P2p *BA_Inc

Caveats:
• Not recommended on vPC ports
• ISSU on Nexus 5000 is not supported with STP BA (the vPC peer-link is the exception)

With Bridge Assurance

[Figure: a malfunctioning switch stops sending BPDUs; the network ports on either side that stop receiving BPDUs are placed in BA Inconsistent (blocked) state; root and edge ports also shown]

%STP-2-BRIDGE_ASSURANCE_BLOCK: Bridge Assurance blocking port Ethernet2/48 VLAN0700.
switch# show spanning vl 700 | in -i bkn
Eth2/48   Altn BKN*4   128.304   Network P2p *BA_Inc

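Bridge Assurance can be thought of as a per-port BPDU watchdog on “network” ports that fails closed. The sketch below is a simplified model: the 2 sec hello time and the three-missed-hellos timeout are assumptions for illustration, not confirmed platform timers, and "forwarding" stands in for whatever state STP would otherwise assign.

```python
from dataclasses import dataclass

HELLO_S = 2.0      # assumed STP hello time
MISSED_HELLOS = 3  # assumed number of missed BPDUs before blocking


@dataclass
class Port:
    name: str
    port_type: str      # "network", "edge" or "normal"
    last_bpdu_s: float  # time the last BPDU was received

    def ba_state(self, now_s: float) -> str:
        if self.port_type != "network":
            return "forwarding"  # Bridge Assurance only runs on network ports
        if now_s - self.last_bpdu_s > MISSED_HELLOS * HELLO_S:
            return "BA_Inc"      # fail closed: block the port
        return "forwarding"


ports = [Port("Eth2/48", "network", last_bpdu_s=0.0),
         Port("Eth2/1", "network", last_bpdu_s=9.0),
         Port("Eth2/10", "edge", last_bpdu_s=0.0)]
for p in ports:
    print(p.name, p.ba_state(now_s=10.0))
# Eth2/48 BA_Inc
# Eth2/1 forwarding
# Eth2/10 forwarding
```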
UDLD “Original” Deployment Scenarios

Assist unidirectional Layer 2 protocols

[Figure 1: Spanning Tree Loop Prevention. Root switch B with switches A and C; RSTP 802.1w alternate port blocking; STP Bridge Assurance]
[Figure 2: Spanning Tree Fast Convergence. Root switch B and switch A with an alternate blocked port; STP Bridge Assurance]
[Figure 3: Ether-channel Convergence. "channel group 1 mode on" compared with LACP]

UDLD Best Practices

• How much do you really need UDLD?
  - Physical unidirectional failures are communicated by Layer 1 mechanisms
  - STP Bridge Assurance accounts for soft failures in either direction
  - LACP accounts for failures on bundle members
  - Miswiring may be rare
  - Are you on a Layer 3 / FabricPath p2p link that already runs a bidirectional protocol?
• If UDLD is needed:
  - Use normal mode
  - Use default timers
  - Choose only a few interfaces for Fast UDLD

OAM

Current Protocol Positioning

[Figure: business and residential customers connected through access, provider bridge, backbone bridge and IP/MPLS core networks (with MSE/BNG), with UNI and NNI interfaces marked and the span of each OAM protocol: E-LMI, Link OAM, Connectivity Fault Management, Y.1731 Performance Management, MPLS OAM]

• Link OAM: any point-to-point 802.3 link
• CFM / Y.1731: end-to-end, UNI to UNI
• E-LMI: User to Network Interface (UNI)
• MPLS OAM: within the MPLS cloud

Ethernet OAM

Building Blocks

• IEEE 802.3ah (clause 57)
  - Ethernet Link OAM
  - Also referred to as 802.3 OAM or Link OAM
• IEEE 802.1ag
  - Connectivity Fault Management (CFM)
  - Also referred to as Service OAM
• ITU-T Y.1731
  - OAM functions and mechanisms for Ethernet-based networks
• MEF E-LMI
  - Ethernet Local Management Interface

Link OAM

IEEE 802.3ah, Clause 57 (IEEE 802.3-2008)

• Provides mechanisms for “monitoring link operation”
• Runs on any single point-to-point Ethernet link
• Uses “Slow Protocol”(1) frames called OAMPDUs
  - OAMPDU interval: 100 msec – 1 sec (1–10 pps)
  - Minimum timeout: 200 msec (IOS XR), 2 sec (IOS)
• Extensible and flexible protocol
• Supported mainly on Carrier Ethernet platforms:
  - Cisco 7600, ASR 9000, ASR 901, ASR 903, ME switches

(1) No more than 10 frames transmitted in any one-second period

IEEE 802.3ah

Key Functions

• OAM Discovery
  - Discover OAM support, peer identity and capabilities per device
• Link Monitoring
  - Basic error definitions for Ethernet so entities can detect degraded links and isolate them
• Remote Failure Indication
  - Mechanisms for one entity to signal another that it has detected an error
• Remote Loopback
  - Used to troubleshoot networks; allows one station to put the other station into a state whereby all inbound traffic is immediately reflected back onto the link
• Remote MIB Variable Retrieval
  - Ability to read one or more MIB variables from the remote DTE

Link OAM Discovery

• First phase of Ethernet OAM
• Discovery has a simple state machine:
  - Send Information OAMPDUs periodically
  - Discover the peer device and its OAM configuration and capabilities
  - Decide whether OAM clients can be fully operational on the link
  - Detect a timeout based on lack of OAMPDUs from the peer
• No message interval exchange or negotiation!

switch#show ethernet oam discovery interface fas 1/1
FastEthernet1/1
Local client
------------
  Administrative configurations:
    Mode:              active
    Unidirection:      not supported
    Link monitor:      supported (on)
    Remote loopback:   not supported
    MIB retrieval:     not supported
    Mtu size:          1500
  Operational status:
    Port status:       operational
    Loopback status:   no loopback
    PDU revision:      0

Remote client
-------------
  MAC address: 0011.9321.1640
  Vendor(oui): 00000C(cisco)
  Administrative configurations:
    PDU revision:      1
    Mode:              active
    Unidirection:      not supported
    Link monitor:      supported
    Remote loopback:   not supported
    MIB retrieval:     not supported
    Mtu size:          1500


Link OAM scale and ISSU

Scale

Slow protocol, but a 100 msec interval for all ports on a linecard is not slow!

Protocol offload to I/O module CPU helps

Protocol offload to FPGA (ME 3400) helps even more!

ISSU (the “zero service disruption one”)

Need graceful protocol mechanisms to support SSO / ISSU; the standard does not specify them

Not possible to inflate timers, since timers are not negotiated!
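To see why a 100 msec interval stops being "slow" at linecard scale, a quick calculation helps. The 48-port linecard is a hypothetical example, and the function assumes every port runs Link OAM with one peer using the same interval (not guaranteed, since intervals are not negotiated); it counts OAMPDUs sent plus received per second.

```python
def oampdu_load_pps(ports: int, hello_interval_ms: int) -> int:
    """OAMPDUs per second a linecard CPU handles (tx + rx), one peer per port."""
    per_port_tx = 1000 // hello_interval_ms   # OAMPDUs transmitted per port per second
    per_port_rx = per_port_tx                 # assumption: the peer uses the same interval
    return ports * (per_port_tx + per_port_rx)


# Hypothetical 48-port linecard
print(oampdu_load_pps(48, 1000))  # 1 sec interval: 96 pps
print(oampdu_load_pps(48, 100))   # 100 msec interval: 960 pps
```

A tenfold increase per linecard is what makes offload to the I/O module CPU or to an FPGA attractive.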

Layer 2 Failure Detection



Link OAM Basic Configuration

IOS and IOS XR

Layer 2 Failure Detection

IOS XR

interface TenGigE 0/1/0/0
 ethernet oam
  hello-interval 100ms
  connection timeout 2

hello-interval: value in msec or sec
connection timeout: local hello multiplier

IOS

interface TenGigEthernet4/1
 ethernet oam
 ethernet oam max-rate 10
 ethernet oam timeout 2

max-rate: value in pps
timeout: value in seconds
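One way to read the two configurations is to compute how long a silent peer goes unnoticed. This sketch assumes detection time is the hello interval times the connection timeout multiplier on IOS XR, and the timeout value taken directly on IOS; real implementations may add timer granularity or jitter, so treat the results as approximate.

```python
def iosxr_detection_ms(hello_interval_ms: int, connection_timeout: int) -> int:
    # IOS XR: "connection timeout" multiplies the local hello interval
    return hello_interval_ms * connection_timeout


def ios_detection_ms(timeout_s: int) -> int:
    # IOS: "ethernet oam timeout" is expressed directly in seconds;
    # "ethernet oam max-rate" only caps how many OAMPDUs per second are sent
    return timeout_s * 1000


print(iosxr_detection_ms(100, 2))  # hello-interval 100ms, connection timeout 2: 200 ms
print(ios_detection_ms(2))         # ethernet oam timeout 2: 2000 ms
```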


Link OAM - Link Monitoring

Monitor link quality every 1 sec (min)

Conditions monitored:

Errored Symbol Period

Errored Frame

Errored Frame Period

Errored Frame Seconds

Receive CRC (Cisco-defined, IOS only)

Transmit CRC (Cisco-defined, IOS only)

Configure error condition thresholds to:

Signal peer with “Event Notification” OAMPDU

Syslog / SNMP trap

Isolate the link
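The threshold logic above can be sketched as a sliding window over per-second error counters. The function name, the action strings and the window semantics (sum of the last `window` one-second samples compared against the high threshold) are assumptions for illustration; each platform defines its own event windows and reactions.

```python
from collections import deque
from typing import Iterable, List


def link_monitor(errors_per_second: Iterable[int], window: int, high_threshold: int) -> List[str]:
    """Action taken at the end of each second (hypothetical action labels)."""
    recent: deque = deque(maxlen=window)   # keeps only the last `window` samples
    actions = []
    for errors in errors_per_second:
        recent.append(errors)
        if sum(recent) >= high_threshold:
            # High threshold crossed: Event Notification OAMPDU to the peer,
            # syslog / SNMP trap, and isolation of the link
            actions.append("event+syslog+error-disable")
        else:
            actions.append("ok")
    return actions


print(link_monitor([0, 3, 94], window=1, high_threshold=10))
# ['ok', 'ok', 'event+syslog+error-disable']
```

A longer window catches a slow trickle of errors (for example 4 errors per second over a 3-second window) that a 1-second window would ignore.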


Layer 2 Failure Detection

Link OAM - Link Monitoring

Example: CRC Detection and Link Isolation (IOS)

Problem

Ensure CRCs injected by devices don’t propagate through the network

Need to operate with or without neighbor discovery

Solution

IEEE 802.3ah for link monitoring and error-disable

interface GigabitEthernet1/1
 ethernet oam
 ethernet oam link-monitor receive-crc window 1
 ethernet oam link-monitor receive-crc threshold high 10
 ethernet oam link-monitor high-threshold action error-disable-interface
……
Nov 10 09:56:08.643: EOAM LM(Gi1/1): sending an EventTLV!
Nov 10 09:56:09.643: %ETHERNET_OAM-5-LINK_MONITOR: 94 rx CRC errors detected over the last 1 seconds on interface Gi1/1.
Nov 10 09:56:09.643: EOAM LM(Gi1/1): sending an EventTLV!
Nov 10 09:56:09.647: %PM-SP-4-ERR_DISABLE: link-monitor-failure error detected on Gi1/1, putting Gi1/1 in err-disable state


Link OAM Miswiring Detection (IOS XR only)

Closing the gap with UDLD

Layer 2 Failure Detection

SW Support IOS XR 3.9

Mechanism to detect miswiring of Ethernet ports

Similar to UDLD, but using a standard protocol with a Cisco vendor extension

Uses an existing 4-byte field in the periodic OAMPDU (Vendor Information field in the Vendor TLV of the Information OAMPDU)

Vendor Information is copied back by the peer, allowing for miswiring detection (MWD)

Interoperates with other 802.3ah-compliant vendors

Figure: ports X, Y and Z; X advertises “I am X”, Y answers “I am Y, I know X”, Z answers “I am Z, I know Y”

interface TenGigE 0/1/0/0

ethernet oam action wiring-conflict error-disable-interface
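The miswiring check relies on the peer echoing back the 4-byte vendor information it received. A minimal sketch, assuming each port stamps a locally unique 4-byte identifier; how IOS XR actually encodes that field is not described here, so `port_token` and its slot/port packing are hypothetical. If the value echoed by the neighbour is not our own, we are transmitting to one port and receiving from another, as X does in the figure.

```python
import struct


def port_token(slot: int, port: int) -> bytes:
    # Hypothetical 4-byte identifier carried in the Vendor Information field
    return struct.pack("!HH", slot, port)


def wiring_conflict(local_token: bytes, echoed_token: bytes) -> bool:
    """True when the peer echoes a value that is not ours, i.e. the link is miswired."""
    return echoed_token != local_token


x = port_token(0, 1)
y = port_token(0, 2)
print(wiring_conflict(x, x))  # False: the peer echoes our own token, wiring is consistent
print(wiring_conflict(x, y))  # True: we hear a port that "knows Y", not X
```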


Link OAM Failure Reaction

Path Isolation

No standards define this!

Depending on implementation, available options for failure reaction / path isolation:

Syslog / SNMP trap

Signal peer using specific OAMPDU

Error-disable

Error-block

Error-disable operates at Layer 1; useful when manual intervention must be forced after an error (like mis-wiring)

Today, only IOS XR can isolate the path based on peer timeout or a received notification OAMPDU!


Link OAM Failure Reaction

Layer 2 Failure Detection

Path Isolation with Ethernet Failure Detection (EFD)

Mechanism for an OAM protocol to bring down the interface “line protocol” state when a problem is detected

Interface / sub-interface / bundle is “down” to routing / switching protocols (MSTP, ARP, IGPs, BGP), which triggers reconvergence

E-OAM protocols continue to operate

Automatic recovery when the fault is resolved

IOS XR only; IOS supports error-block

interface TenGigE 0/1/0/0
 ethernet oam
  action link-fault error-disable-interface
  action link-fault efd
  action discovery-timeout error-disable-interface
  action discovery-timeout efd

Benefits:

Reduced interface up/down churn

Deterministic recovery
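The difference between the two reactions configured above can be sketched as follows. The class, field and action labels are hypothetical; the model captures only what the slides state: EFD brings the line protocol down while Link OAM keeps running and recovers automatically, whereas error-disable acts at Layer 1 and waits for manual intervention.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    action: str                    # "efd" or "error-disable" (hypothetical labels)
    line_protocol_up: bool = True
    err_disabled: bool = False

    def on_link_fault(self) -> None:
        self.line_protocol_up = False      # clients (MSTP, ARP, IGPs, BGP) see "down"
        if self.action == "error-disable":
            self.err_disabled = True       # port shut at Layer 1, so OAMPDUs stop too

    def on_fault_cleared(self) -> None:
        # Only EFD can notice the recovery, because E-OAM is still exchanging OAMPDUs
        if self.action == "efd":
            self.line_protocol_up = True

    def manual_recover(self) -> None:
        # Hypothetical stand-in for operator intervention on an err-disabled port
        self.err_disabled = False
        self.line_protocol_up = True


efd, ed = Interface("efd"), Interface("error-disable")
for intf in (efd, ed):
    intf.on_link_fault()
    intf.on_fault_cleared()
print(efd.line_protocol_up, ed.line_protocol_up)  # True False
```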


Ethernet Failure Detection (EFD)

Logical Diagram

Layer 2 Failure Detection

SW Support IOS XR 3.9

Figure: Link OAM on the I/O module detects a failure, and EFD moves the line protocol state seen by the L2VPN, IPv4, IPv6 and MPLS clients from UP to DOWN at the MAC layer, while the interface itself stays UP


Link OAM vs UDLD

Who Wins?

Link OAM adoption is growing and could extend to enterprises / DC in the future

Stick with UDLD (at least for now):

Link OAM mis-wiring detection is available only on IOS XR, as a proprietary extension

Link OAM path isolation based on timeout is available only on IOS XR

Consider Link OAM today if you:

Must adhere to standard protocols

Need Link Monitoring capabilities


Layer 2 Failure Detection

Link Aggregation Control Protocol (LACP)

IEEE 802.1AX (formerly 802.3ad)

Protocol used to:

Ensure configuration consistency across bundle members on both ends

Ensure wiring consistency (bundle members between 2 chassis)

Detect unidirectional links

Bundle member keepalive

Peers negotiate the requested send rate, among other things, through LACPDUs

Loss of heartbeat typically triggers port suspend


Layer 2 Failure Detection

LACP Slow, Fast and Super Fast Hellos

Traditional LACP heartbeat intervals:

Long interval: 30 sec, 90 sec failure detection

Short interval: 1 sec, 3 sec failure detection (IOS / IOS-XE / IOS XR / NX-OS)

IOS / NX-OS

interface Ethernet1/7
 lacp rate fast

IOS XR

interface gig 0/1/2/3
 bundle id <n> mode active
 lacp period short

Heartbeats are typically sent from the supervisor, so SSO / ISSU will not work with aggressive timers

Very fast LACP hellos are sent from the ASR 9K / CRS linecard

SW Support: IOS XR 3.9

Proprietary Cisco extension on IOS XR allows for:

Signalling at 100 msec with 300 msec failure detection

ISSU support with fast timers (from IOS XR 4.1)

interface gig 0/1/2/3
 bundle id <n> mode active
 lacp period 100

interface Bundle-Ether 1
 lacp cisco enable

Use only if you can't do per-link BFD or Fast UDLD and need sub-second detection!
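All three detection times on this slide follow from the same rule: the partner is declared gone after three consecutive LACPDU periods with nothing received. A small sketch of that arithmetic (the function name is illustrative):

```python
def lacp_detection_ms(period_ms: int, missed_pdus: int = 3) -> int:
    # Failure declared after `missed_pdus` consecutive periods without a LACPDU
    return period_ms * missed_pdus


for name, period in [("long", 30_000), ("short", 1_000), ("IOS XR lacp period 100", 100)]:
    print(name, lacp_detection_ms(period))
# long 90000
# short 3000
# IOS XR lacp period 100 300
```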


Agenda

Overview

Layer 1 Failure Detection

Layer 2 Failure Detection

Layer 3 Failure Detection

Summary


Failure Detection at Layer 3

Layer 3 Failure Detection

In some cases, failure detection relies on checks at Layer 3

How quickly can I detect a failure (neighbor down event)?

Figure: failures hidden from Layer 1 behind an L2 bridged network, DWDM without LoS propagation, or tunnels (GRE, IPsec, etc.); one router reports “Something just happened!” while its peer reports “Something happened a while ago!”


Layer 3 Failure Detection

Is Layer 3 Failure Detection Tuning Necessary?

Needed when:

Intermediate L2 hop over L3 link

Concerns over any protocol software failures

Concerns over unidirectional failures on point-to-point physical L3 links

May not be needed when:

Point-to-point physical L3 links with no concerns over unidirectional failures

Enough software redundancy to account for protocol software failures

FHRPs are running in active-active mode (vPC / vPC+ on Nexus 5000 / 7000)


FHRPs with vPC / vPC+ in NX-OS

Active/Active Mode

HSRP, VRRP and GLBP in a vPC / vPC+ environment operate in Active/Active mode

No additional configuration required

General best practices still apply, except:

Since FHRPs run in active/active mode, aggressive timers can be relaxed

No need to manipulate priorities / preemption on different devices to achieve load-balancing


Layer 3 Failure Detection

Figure: vPC pair in which both the HSRP/VRRP “Active” and the HSRP/VRRP “Standby” peer are active for the shared L3 MAC (L3 above, L2 below)


Layer 3 Failure Detection

Protocol Timers

All Layer 3 protocols (FHRPs, BGP, EIGRP, OSPF, etc.) use HELLOs to:

Maintain adjacencies (pass protocol-specific info)

Check neighbour reachability and detect failure

Hello/Keepalive and Dead/Hold timers can be tuned down; however, this is not recommended:

Each interface may have 2-3+ protocols establishing adjacency (e.g. HSRP, PIM, OSPF on an SVI)

Increased supervisor CPU utilization and false positives

Configuration complexity and waste of link bandwidth

Challenges supporting ISSU / SSO

Challenges achieving sub-second detection

That said, tuned-down timers work reasonably well in small, controlled environments
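A rough estimate shows how quickly tuned-down hellos add up on the supervisor when several protocols form adjacencies on every SVI. The SVI count and the "tuned" intervals are hypothetical inputs, and the function assumes one neighbour per protocol per SVI, counting hellos sent and received per second.

```python
def sup_hello_pps(svis: int, hello_intervals_ms: dict) -> float:
    """Hellos per second (tx + rx) on the supervisor CPU, one neighbour per protocol per SVI."""
    per_svi = sum(2 * 1000 / interval for interval in hello_intervals_ms.values())
    return svis * per_svi


default = {"HSRP": 3000, "PIM": 30000, "OSPF": 10000}   # commonly used default hello intervals
tuned = {"HSRP": 250, "PIM": 1000, "OSPF": 1000}        # hypothetical aggressive values

print(round(sup_hello_pps(100, default)))  # 93
print(round(sup_hello_pps(100, tuned)))    # 1200
```

The same 100 SVIs go from under 100 to over 1,000 hellos per second, all processed centrally, which is the CPU and false-positive risk listed above.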


Bidirectional Forwarding Detection (BFD)

RFC 5880 / 5881

Layer 3 Failure Detection

Lightweight hello protocol designed to run over multiple transport protocols:

IPv4, IPv6, MPLS, TRILL

Designed for sub-second Layer 3 failure detection

Any interested client (OSPF, BGP, HSRP, etc.) registers with BFD and is notified as soon as BFD detects a neighbor loss

All registered clients benefit from uniform failure detection

Runs on physical, virtual and bundle interfaces

Uses UDP port 3784 (control) / 3785 (echo)
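To make "lightweight" concrete: the mandatory section of a BFD Control packet defined in RFC 5880 is only 24 bytes. The sketch below packs one with Python's `struct` module; it omits the optional authentication section and leaves all flags cleared, and the discriminator and timer values are arbitrary examples. The interval fields are carried in microseconds.

```python
import struct

STATE_UP = 3  # RFC 5880 session states: 0 AdminDown, 1 Down, 2 Init, 3 Up


def bfd_control_packet(my_disc: int, your_disc: int, state: int, detect_mult: int,
                       desired_min_tx_us: int, required_min_rx_us: int,
                       required_min_echo_rx_us: int = 0, diag: int = 0) -> bytes:
    version = 1
    byte0 = (version << 5) | (diag & 0x1F)   # Vers (3 bits) | Diag (5 bits)
    byte1 = (state & 0x3) << 6               # Sta (2 bits); P/F/C/A/D/M flags left at 0
    length = 24                              # mandatory section only, no authentication
    return struct.pack("!BBBBIIIII", byte0, byte1, detect_mult, length,
                       my_disc, your_disc, desired_min_tx_us,
                       required_min_rx_us, required_min_echo_rx_us)


pkt = bfd_control_packet(my_disc=1, your_disc=2, state=STATE_UP, detect_mult=3,
                         desired_min_tx_us=50_000, required_min_rx_us=50_000)
print(len(pkt), pkt[:4].hex())  # 24 20c00318
```

For single-hop BFD this payload would be carried over UDP to destination port 3784.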


Layer 3 Failure Detection with BFD

Layer 3 Failure Detection

Bidirectional Forwarding Detection (BFD) is the recommended Layer 3 failure detection mechanism, rather than lowered protocol timers

BFD general advantages:

Reduced control plane load and link bandwidth usage

Sub-second failure detection

In-flight timer negotiation

BFD platform-specific advantages:

Stateful restart, SSO and ISSU support

Protocol off-load / distributed implementation: I/O module transmits / receives BFD packets

Per-link implementations with bundles


BFD Peer Establishment

Timer Negotiation

Layer 3 Failure Detection

No discovery: the peer IP is provided by the client!

Neighbors continuously negotiate their desired transmit and receive rates in terms of microseconds.

The system reporting the slower rate determines the transmission rate.

Figure: Green and Orange negotiating rates

Green: Desired Receive rate = 50 ms, Desired Transmit rate = 100 ms

Orange: Desired Receive rate = 60 ms, Desired Transmit rate = 40 ms

Green transmits at 100 ms

Orange transmits at 50 ms

interface <name>
 bfd interval <msec> min_rx <msec> multiplier <n>
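The Green/Orange example can be reproduced with the RFC 5880 rules: a system transmits at the slower of its own desired transmit interval and the peer's required receive interval, and in asynchronous mode it declares the peer down after the peer's multiplier times the agreed peer-to-local interval. The "Desired Receive rate" on the slide maps to RFC 5880's Required Min RX Interval; the multiplier of 3 is an assumption, and values are in milliseconds here although the packet carries microseconds.

```python
from typing import NamedTuple


class BfdTimers(NamedTuple):
    desired_min_tx_ms: int
    required_min_rx_ms: int
    detect_mult: int = 3   # assumption: not given in the example


def tx_interval_ms(local: BfdTimers, remote: BfdTimers) -> int:
    # Never send faster than the peer is willing to receive
    return max(local.desired_min_tx_ms, remote.required_min_rx_ms)


def detection_time_ms(local: BfdTimers, remote: BfdTimers) -> int:
    # Asynchronous mode: remote multiplier x agreed remote-to-local interval
    return remote.detect_mult * max(local.required_min_rx_ms, remote.desired_min_tx_ms)


green = BfdTimers(desired_min_tx_ms=100, required_min_rx_ms=50)
orange = BfdTimers(desired_min_tx_ms=40, required_min_rx_ms=60)
print(tx_interval_ms(green, orange))     # 100: Green transmits at 100 ms
print(tx_interval_ms(orange, green))     # 50: Orange transmits at 50 ms
print(detection_time_ms(orange, green))  # 300: Orange needs 300 ms to detect Green's loss
```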


BFD Operation Modes

Session established using asynchronous control packets

Asynchronous mode (no echo):

Control packets sent at negotiated rate

Independent session

Neighbour declared dead if no packet is received for an <interval * multiplier> period

Additionally, if echo is negotiated:

Control packets sent at slow rate

Self-directed echo packets sent at fast negotiated rate (min Rx interval), used for failure detection
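The "<interval * multiplier>" rule can be sketched as a detection timer that is reset by every received packet. The function name is hypothetical and the session is assumed to come up at t = 0; in echo mode the same logic would apply to the looped-back echo packets at the fast rate, while control packets slow down.

```python
from typing import List, Optional


def declared_down_at(arrivals_ms: List[int], interval_ms: int, multiplier: int,
                     until_ms: int) -> Optional[int]:
    """Time the neighbour is declared dead, or None if it is still alive at `until_ms`."""
    detect = interval_ms * multiplier
    last = 0                              # assumption: session came up at t = 0
    for t in sorted(arrivals_ms):
        if t - last > detect:
            return last + detect          # timer expired before this packet arrived
        last = t                          # every packet restarts the detection timer
    return last + detect if until_ms - last > detect else None


print(declared_down_at([0, 50, 100], interval_ms=50, multiplier=3, until_ms=1000))       # 250
print(declared_down_at([0, 50, 100, 150], interval_ms=50, multiplier=3, until_ms=200))   # None
```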


Layer 3 Failure Detection

Figure: Async Mode vs. Async Mode + Echo; in both, orange and green exchange “orange is alive” / “green is alive” messages


BFD - OSPF Interaction Example

Layer 3 Failure Detection

Figure: routers R1 and R2 with an OSPF peering and a BFD session between them

OSPF registers with BFD

Forwarding plane failure between R1 and R2

BFD detects failure between R1 and R2

BFD notifies OSPF

OSPF adjacency reset between R1 and R2
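The register-then-notify sequence above is essentially an observer pattern: one BFD session per neighbour, any number of clients subscribed to it. A sketch with a hypothetical `BfdSession` class; BGP is added as a second client only to show that every registered client gets the same, uniform notification.

```python
from typing import Callable, Dict, List


class BfdSession:
    """One BFD session to a neighbour, shared by every client that registers for it."""

    def __init__(self, neighbor: str) -> None:
        self.neighbor = neighbor
        self.state = "UP"
        self._clients: Dict[str, Callable[[str], None]] = {}

    def register(self, client: str, on_down: Callable[[str], None]) -> None:
        # e.g. OSPF registers once its adjacency with `neighbor` is established
        self._clients[client] = on_down

    def detection_timer_expired(self) -> None:
        # BFD detects the forwarding-plane failure and notifies all clients at once
        self.state = "DOWN"
        for on_down in self._clients.values():
            on_down(self.neighbor)


events: List[str] = []
session = BfdSession("R2")
session.register("OSPF", lambda nbr: events.append(f"OSPF: reset adjacency with {nbr}"))
session.register("BGP", lambda nbr: events.append(f"BGP: tear down session with {nbr}"))
session.detection_timer_expired()
print(events)
```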


Layer 3 Failure Detection

BFD Off-load / Distributed Processing

Nexus 7000 Architecture Example

Helps achieve higher BFD scale

SUP-BFD: BFD process running on the Supervisor Engine

Interfaces with LC-BFD processes

Interfaces with BFD clients

LC-BFD: BFD process running on the CPU of each I/O module

Communicates with SUP-BFD process

Generates BFD hellos (echo and async)

Receives BFD hellos from peer (async)

Support for stateful restart, SSO and ISSU

Figure: Supervisor Engine running BFD clients (OSPF, HSRP, PIM, BGP, IS-IS, etc.) and SUP-BFD, connected over the EOBC to an LC-BFD process on each hardware I/O module (module inband path)

Similar Architectures:

CRS-1

ASR 9000

12000 / XR12000

ASR 1K (from IOS XE 3.6)

7600 with ES+ I/O modules


Layer 3 Failure Detection

Layer 3 Fast Failure Detection and Link Bundles

Challenges

Scenarios:

1. Layer 2 bundle between 2 SVIs

2. Layer 3 bundle

Figure: single BFD session across the bundle

Each node uses a hash algorithm to distribute the load across bundle members

Chances are high that control plane packets are only carried on a single link:

‒ Can’t reliably test all links

A single bundle member malfunction can cause black holes which remain undetected

Rely on Layer 1 or Layer 2 (LACP/PaGP/UDLD/OAM) detection

Parallel Layer 3 links can be used instead; their load-sharing properties are often similar

Two approaches for BFD:

1. Single session

2. Per-link sessions
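Why a single session tests only one member: the load-sharing hash is deterministic per flow, and a BFD session is a single flow. In this sketch `zlib.crc32` stands in for the platform hash (an assumption; real platforms use their own hash inputs and functions), and the addresses and source port are example values.

```python
import zlib


def member_for_flow(src_ip: str, dst_ip: str, proto: int, sport: int, dport: int, members: int) -> int:
    """Pick a bundle member from a flow 5-tuple (CRC32 stands in for the platform hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % members


# One BFD session is one fixed 5-tuple (UDP, destination port 3784), so every packet
# lands on the same member; the other members are never exercised by BFD.
picks = {member_for_flow("10.0.0.1", "10.0.0.2", 17, 49152, 3784, members=4) for _ in range(1000)}
print(len(picks))  # 1
```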


Layer 3 Failure Detection

BFD over Bundle Members (BOB)

CRS / ASR 9000 / XR 12000

Figure: per-link sessions between two routers (LC1, RP, LC2 on each side)

IPv4 BFD session per bundle member

IPv6 relies on IPv4 session state

Verify every member link's forwarding state by establishing a BFD session before it is added to the bundle

Master session on RP consolidates member states and communicates with clients

Async + echo

Ethernet and POS bundles

IOS XR proprietary, close to proposed standard

SW Support: IOS XR 4.0.1 for CRS / ASR 9000; IOS XR 4.1 for XR 12000

interface bundle-ether 1

bfd

address-family ipv4 fast-detect minimum-interval 15 multiplier 3 destination 10.11.12.13
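A sketch of the BOB idea, assuming a member carries traffic only once its own BFD session is up, and assuming the master session is reported up to clients while at least `min_active` members are up. That threshold is hypothetical, modelled on a bundle's minimum active links; the actual IOS XR consolidation rules may differ.

```python
from typing import Dict, List


def bundle_members_in_use(member_bfd_up: Dict[str, bool]) -> List[str]:
    # Only members whose per-link BFD session is up are added to the bundle
    return sorted(m for m, up in member_bfd_up.items() if up)


def master_session_up(member_bfd_up: Dict[str, bool], min_active: int = 1) -> bool:
    # The master session on the RP consolidates member states for clients (OSPF, BGP, ...)
    return len(bundle_members_in_use(member_bfd_up)) >= min_active


members = {"Te0/0/0/0": True, "Te0/1/0/0": False}
print(bundle_members_in_use(members))            # ['Te0/0/0/0']
print(master_session_up(members))                # True
print(master_session_up(members, min_active=2))  # False
```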


BFD Per-link Mode

Nexus 7000

Layer 3 Failure Detection

Figure: per-link sessions between two Nexus 7000 switches (LC1, SUP, LC2 on each side)

SW Support NX-OS 5.0(2a)

BFD session per port-channel member

Master session on SUP consolidates member states and communicates with clients

LACP is required for port-channels

Async only, no echo

Layer 3 port-channel / sub-interface only

NX-OS proprietary

Minimum interval: 50 msec x 3


BFD Logical Bundles (BLB)

Layer 3 Failure Detection

CRS / ASR 9000

Figure: single session between two routers (LC1, RP, LC2 on each side)

SW Support: IOS XR 4.2.3 (CRS), IOS XR 4.3 (ASR9K)

Single BFD session per L3 destination address

Internal algorithm to decide which I/O module hosts BFD session

BFD packet distribution: Tx and Rx packets are polarized on one bundle link per session

IPv4 and IPv6 sessions

Async only

Replaces BVLAN mode but backward compatible!

Verified interoperability with IOS and NX-OS single session modes

Minimum interval: 50 msec x 3 (depends on linecard)


BFD Logical Mode

Nexus 3000 / 7000

Layer 3 Failure Detection

Figure: single session between two switches (LC1, SUP, LC2 on each side)

SW Support: NX-OS 5.0(2a) (Nexus 7000), 5.0(3)U2(2) (Nexus 3000)

Single BFD session per L3 destination address

Internal algorithm to determine which I/O module hosts BFD session

BFD packet distribution:

Prior to NX-OS 5.2(1), Tx packets are polarized on one bundle link per session

From NX-OS 5.2(1), Tx packets are round-robin load-balanced on all bundle links

Rx packets are always polarized on one bundle link per session

Async + echo

Verified interoperability with IOS XR BLB mode

Minimum interval: 250 msec x 3
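The change in Tx distribution across NX-OS releases can be illustrated with a short sketch. `members[0]` stands in for whichever member the platform picks in the polarized case, and the interface names are hypothetical. Only the transmit side is modelled; per the slide, received packets stay polarized on one link regardless.

```python
from itertools import cycle
from typing import List


def tx_links(members: List[str], packets: int, round_robin: bool) -> List[str]:
    """Bundle member carrying each transmitted BFD packet for one session."""
    if not round_robin:
        return [members[0]] * packets      # prior to NX-OS 5.2(1): polarized on one member
    links = cycle(members)                 # from NX-OS 5.2(1): spread over all members
    return [next(links) for _ in range(packets)]


members = ["Eth1/1", "Eth2/1", "Eth3/1"]
print(tx_links(members, 4, round_robin=False))  # ['Eth1/1', 'Eth1/1', 'Eth1/1', 'Eth1/1']
print(tx_links(members, 4, round_robin=True))   # ['Eth1/1', 'Eth2/1', 'Eth3/1', 'Eth1/1']
```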


BFD Interoperability with Bundles

Current standards do not address this!

Single session

Easiest to achieve with current standards and implementations

Verified interoperability between IOS XR BLB mode, IOS and NX-OS single session mode

Per-link sessions

Most recommended, but solutions are platform proprietary

IETF draft-mmm-bfd-on-lags-03 will address interoperability!


BFD and FabricPath / TRILL

Scenario 1 - FabricPath as BFD client

Use-case: peer switch path failure detection

Not supported for TRILL / FabricPath yet

Proposed standard:

draft-ietf-trill-rbridge-bfd-07

Does not cover bundle per-link

IS-IS notifies BFD of RBridge IDs

Link OAM could be adopted in future

FP / TRILL OAM in the works for service / end-to-end failure detection


Layer 3 Failure Detection

Figure: two FabricPath (FP) switches peering across a FabricPath network


Layer 3 Failure Detection

BFD and FabricPath / TRILL

Point-to-Point vs. Shared Ethernet segment

Figure: TRILL RBridges running BFD over a shared Ethernet segment vs. FabricPath (FP) switches running BFD over point-to-point links

TRILL specifies support for a shared Ethernet segment with several peers; FabricPath can only peer on point-to-point links

BFD may be more needed for TRILL than FabricPath, except…


Layer 3 Failure Detection

FabricPath Design Perspective for Failure Detection

Point-to-Point Leaf-Spine vs Data Center Interconnect

DCI may require BFD for FabricPath

Figure: FabricPath active in DC1 and in DC2, each with a fat spine, interconnected across a DCI


BFD and FabricPath / TRILL

Layer 3 Failure Detection

Scenario 2 - BFD client using FabricPath / TRILL as transport

Routing protocol / FHRP peering over FabricPath network

Figure: SVIs / sub-interfaces peering across a FabricPath network

Layer 3 Failure Detection

BFD for Static Routes

Next-hop liveness detection

Fail-close solution (remove the static route and do not reinstate it until BFD is up)

Must be configured on both ends

ip route 0.0.0.0/0 Vlan10 20.0.0.1
ip route 30.0.0.0/24 Vlan20 10.0.0.1
ip route static bfd Vlan10 20.0.0.1
ip route static bfd Vlan20 10.0.0.1

Figure: switch with SVI 10 towards next hop 20.0.0.1 (Internet) and SVI 20 towards next hop 10.0.0.1 (host 30.0.0.2 behind it)

switch# sh ip route
0.0.0.0/0, ubest/mbest: 1/0
    *via 20.0.0.1, Vlan 10, [1/0], static

switch# sh ip route
30.0.0.0/24, ubest/mbest: 1/0
    *via 10.0.0.1, Vlan 20, [1/0], static
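The fail-close behaviour can be sketched as a filter over the configured static routes: a BFD-tracked route is installed only while its session to the next hop is up, and a missing or down session keeps it out of the RIB. The function and its inputs are hypothetical, and every route is treated as BFD-tracked, matching the example configuration.

```python
from typing import Dict, List, Tuple

StaticRoute = Tuple[str, str, str]   # (prefix, interface, next hop)


def installed_routes(static_routes: List[StaticRoute],
                     bfd_up: Dict[Tuple[str, str], bool]) -> List[str]:
    """Fail-close: a BFD-tracked static route is installed only while its session is up."""
    rib = []
    for prefix, intf, nh in static_routes:
        if bfd_up.get((intf, nh), False):   # missing or down session: route withdrawn
            rib.append(f"{prefix} via {nh}, {intf}")
    return rib


routes = [("0.0.0.0/0", "Vlan10", "20.0.0.1"), ("30.0.0.0/24", "Vlan20", "10.0.0.1")]
print(installed_routes(routes, {("Vlan10", "20.0.0.1"): False, ("Vlan20", "10.0.0.1"): True}))
# ['30.0.0.0/24 via 10.0.0.1, Vlan20']
```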


Layer 3 Failure Detection

BFD Multi-hop

RFC 5883

SW Support: IOS, IOS XR

Single-hop BFD (RFC 5881) sends packets with TTL=255, and the receiver discards packets arriving with any other TTL

If packets go through a device that decrements TTL, multi-hop BFD is needed

Use-case 1: static route or PBR through routed firewalls / NAT

Use-case 2: eBGP multi-hop

ip route 0.0.0.0/0 Vlan10 11.0.0.1
ip route static bfd Vlan10 20.0.0.1
ip route 30.0.0.0/24 Vlan20 12.0.0.1
ip route static bfd Vlan20 10.0.0.1

Figure: switch with SVI 10 reaching 20.0.0.1 (Internet) through 11.0.0.1, and SVI 20 reaching 10.0.0.1 (host 30.0.0.2 behind it) through 12.0.0.1

switch# sh ip route
0.0.0.0/0, ubest/mbest: 1/0
    *via 20.0.0.1, Vlan 10, [1/0], static

switch# sh ip route
30.0.0.0/24, ubest/mbest: 1/0
    *via 10.0.0.1, Vlan 20, [1/0], static
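RFC 5881 specifies that single-hop BFD Control packets are sent with a TTL of 255 and that received packets with any other TTL are discarded, so one TTL-decrementing hop (a routed firewall or NAT device) breaks a single-hop session. The sketch contrasts that with a multi-hop check; `max_hops` is a hypothetical configuration knob, since RFC 5883 leaves the multi-hop TTL policy to the implementation.

```python
def accept_bfd_packet(received_ttl: int, multihop: bool, max_hops: int = 0) -> bool:
    """TTL check on a received BFD packet (max_hops is a hypothetical knob)."""
    if not multihop:
        # RFC 5881 single-hop: sent with TTL 255 and must arrive unchanged
        return received_ttl == 255
    # Multi-hop: tolerate a configured number of TTL-decrementing hops
    return received_ttl >= 255 - max_hops


print(accept_bfd_packet(254, multihop=False))             # False: one routed hop in the path
print(accept_bfd_packet(254, multihop=True, max_hops=2))  # True
```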


BFD and Security

Layer 3 Failure Detection

BFD and Security Layer 3 Failure Detection  Support for SHA-1 (NX-OS / IOS) and MD5
 Support for SHA-1 (NX-OS / IOS) and MD5 (IOS) authentication  Disable platform hardware
 Support for SHA-1 (NX-OS / IOS) and MD5 (IOS) authentication
 Disable platform hardware security mechanisms for BFD echo to
function:
uRPF (per interface)
no [ip|ipv6] verify unicast source reachable-via [any|rx]
IDS checks (global)
no hardware ip verify address identical
IP redirects (per interface)
no ip redirects
 Open rules to allow echo packets though firewall or enable loopback as
source IP (default on IOS XR):
bfd echo-interface <a_loop_back_interface>
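The SHA-1 option above corresponds to Keyed SHA1 authentication in RFC 5880. As a rough illustration of what both ends compute, here is a Python sketch that builds and verifies a Control packet with a Keyed SHA1 section, based on my reading of RFC 5880 §4.1, §4.4 and §6.7.4: auth type 4, auth length 28, and the shared key (zero-padded to 20 bytes) placed in the Auth Key/Hash field while the SHA-1 digest is computed over the whole packet. The helper names are hypothetical, sequence-number tracking is omitted, and this is not how NX-OS or IOS implement it internally.

```python
import hashlib
import hmac
import struct

BFD_VERSION = 1
STATE_UP = 3                # RFC 5880 states: AdminDown=0, Down=1, Init=2, Up=3
FLAG_A = 0x04               # Authentication Present bit in the Sta|P|F|C|A|D|M byte
AUTH_TYPE_KEYED_SHA1 = 4    # 5 would be Meticulous Keyed SHA1
AUTH_LEN = 28               # type(1) + len(1) + key id(1) + reserved(1) + seq(4) + hash(20)
HEADER_LEN = 24             # mandatory section of a Control packet


def _padded(key: bytes) -> bytes:
    if not 1 <= len(key) <= 20:
        raise ValueError("Keyed SHA1 key must be 1..20 bytes")
    return key.ljust(20, b"\x00")  # trailing zero padding to 20 bytes


def build_packet(my_disc: int, your_disc: int, key_id: int, seq: int, key: bytes,
                 detect_mult: int = 3, tx_us: int = 50_000, rx_us: int = 50_000) -> bytes:
    header = struct.pack(
        "!BBBBIIIII",
        BFD_VERSION << 5,                 # Vers=1, Diag=0
        (STATE_UP << 6) | FLAG_A,         # State=Up, A bit set
        detect_mult,
        HEADER_LEN + AUTH_LEN,            # Length covers the auth section too
        my_disc, your_disc,
        tx_us, rx_us,                     # Desired Min TX / Required Min RX (microseconds)
        0,                                # Required Min Echo RX = 0 (no echo)
    )
    auth_fixed = struct.pack("!BBBBI", AUTH_TYPE_KEYED_SHA1, AUTH_LEN, key_id, 0, seq)
    # The hash is computed with the padded key sitting in the Auth Key/Hash field...
    digest = hashlib.sha1(header + auth_fixed + _padded(key)).digest()
    # ...and the digest then replaces the key on the wire.
    return header + auth_fixed + digest


def verify_packet(packet: bytes, key: bytes) -> bool:
    if len(packet) != HEADER_LEN + AUTH_LEN or packet[HEADER_LEN] != AUTH_TYPE_KEYED_SHA1:
        return False
    hash_off = HEADER_LEN + 8
    received = packet[hash_off:hash_off + 20]
    # Receiver puts its own copy of the key back in place and recomputes the hash
    expected = hashlib.sha1(packet[:hash_off] + _padded(key)).digest()
    return hmac.compare_digest(received, expected)
```

A packet built with `build_packet(0x11, 0x22, key_id=1, seq=100, key=b"cisco123")` verifies with the same key and fails with any other key or after any byte of the packet is altered; the key itself never appears on the wire.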


Layer 3 Failure Detection

BFD Best Practices and Recommendations

1. If Layer 3 fast failure detection is needed, use BFD for all protocols
2. If you can't use BFD, check specific platform support for aggressive protocol timers
3. Always plan your BFD scale and check it against platform capabilities (centralized vs. distributed architecture, interface and client support, locally and on the peer)
4. Use BFD echo (default on many platforms) whenever possible, and check the security settings that affect it
5. On Layer 3 port-channels, use per-link mode and prefer it over echo
6. For BGP with single-hop BFD, make sure the neighbor update source is the directly connected interface
7. Make sure BFD packets are prioritized appropriately (marked with IP precedence 6 / DSCP CS6 / CoS 6; they can also be classified by UDP ports 3784 and 3785)
8. Make sure neighbors support the same BFD version (v0 / v1)
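Item 7 can be checked with a small classifier. The Python sketch below assumes an already-parsed packet (a hypothetical `Pkt` dataclass holding the IP ToS byte, protocol number and UDP destination port) and identifies BFD by UDP port, then checks whether it carries DSCP CS6 (decimal 48, ToS byte 0xC0, which is also IP precedence 6). It also matches port 4784, the multi-hop control port from RFC 5883, which a 3784/3785-only rule would miss. CoS 6 is an 802.1p (Layer 2) marking and is not modeled; verify the ports and markings your platforms actually use.

```python
from dataclasses import dataclass

UDP = 17
BFD_PORTS = {
    3784: "single-hop control (RFC 5881)",
    3785: "echo (RFC 5881)",
    4784: "multi-hop control (RFC 5883)",
}
DSCP_CS6 = 48  # precedence 6 -> DSCP 0b110000


@dataclass
class Pkt:
    tos: int        # full IPv4 ToS / IPv6 Traffic Class byte
    proto: int      # IP protocol number
    dst_port: int   # UDP destination port (ignored if not UDP)


def dscp(tos: int) -> int:
    return tos >> 2  # DSCP is the upper 6 bits of the ToS byte


def precedence(tos: int) -> int:
    return tos >> 5  # IP precedence is the upper 3 bits


def classify(p: Pkt) -> str:
    if p.proto == UDP and p.dst_port in BFD_PORTS:
        kind = BFD_PORTS[p.dst_port]
        marked = dscp(p.tos) == DSCP_CS6
        return f"BFD {kind}, {'CS6' if marked else 'NOT marked CS6'}"
    if precedence(p.tos) == 6:
        return "network control (precedence 6)"
    return "other"
```

Matching on both the port and the marking catches BFD traffic whose DSCP was rewritten along the path, which would otherwise silently lose priority.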



Agenda

Overview

Layer 1 Failure Detection

Layer 2 Failure Detection

Layer 3 Failure Detection

Summary


Summary

Protocol Comparison: Key Decision Criteria (For Your Reference)

Criterion | BFD | UDLD | Link OAM
OSI Layer | L3 | L2 | L2
Standard | IETF RFC 5880 / 5881 (with some Cisco enhancements) | Cisco proprietary | IEEE 802.3ah (with some Cisco enhancements)
Failures Detected | Uni-directional soft failures; bidirectional soft failures | Uni-directional soft failures; bidirectional soft failures; mis-wiring detection | Uni-directional soft failures; bidirectional soft failures; mis-wiring detection (IOS XR); link degradation
Failure Reaction | Notify peer and clients; remove link from bundle (IOS XR, IETF standard in future); BFD dampening (IOS XR) | Error-disable (depending on mode) | Notify peer; error-disable (depending on error type and platform); error-block; Ethernet Failure Detection (IOS XR)
Bundles and Virtual Interfaces | Bundle logical, bundle per-link, SVI, sub-interface | Single L2 links | Single L2 links
Message Interval and Timeout | Configurable, exchanged and negotiated; timeout generally in msec | Configurable and exchanged; timeout generally 20+ seconds | Configurable, not exchanged; timeout generally 2+ seconds
ISSU | Timer inflation | Flush message sent (IOS XR) | No (can be extended in future)
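The "exchanged and negotiated" BFD timers in the table determine how fast a failure is actually detected. In asynchronous mode, RFC 5880 §6.8.4 defines the local detection time as the remote system's Detect Mult multiplied by the agreed transmit interval, which is the larger of the local Required Min RX Interval and the remote's Desired Min TX Interval. The Python sketch below applies that rule and also estimates the control-packet rate a box must sustain, which is the kind of number best practice 3 asks you to compare with platform limits. Treating load as a single packets-per-second figure (ignoring echo traffic and per-line-card distribution) is a simplifying assumption.

```python
def detection_time_ms(remote_detect_mult: int,
                      local_required_min_rx_ms: float,
                      remote_desired_min_tx_ms: float) -> float:
    """Asynchronous-mode detection time per RFC 5880 section 6.8.4."""
    # The slower of "how fast I accept" and "how fast the peer wants to send" wins
    agreed_remote_tx = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_remote_tx


def control_pps(sessions: int, tx_interval_ms: float) -> float:
    """Control packets per second one box transmits (it receives roughly the same)."""
    return sessions * 1000.0 / tx_interval_ms
```

For example, `detection_time_ms(3, 50, 100)` returns 300: the peer's 100 ms transmit interval dominates the local 50 ms, so configuring aggressive timers on only one side does not speed up detection. `control_pps(200, 50)` returns 4000.0 packets per second transmitted, before counting echo traffic.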


Summary of Network Scenarios and Recommendations

Classical Ethernet Layer 2
    Single p2p link
    Bundle
FabricPath / TRILL
    Single p2p link
    Bundle
Layer 3
    Single p2p link
    Bundle
    SVI on top of Classical Ethernet
    SVI on top of FabricPath / TRILL


Summary

• Fast failure detection is key to fast convergence
• Business requirements and SLAs should drive technology and protocol choice
• One protocol may be enough – keep it simple!
• Evolving field with IETF / IEEE / MEF and Cisco innovations
• Design your network to take advantage of best practices


Related Cisco Live London 2013 events

Session-ID | Session Name
BRKIPM-2265 | Deploying BGP Fast Convergence / BGP PIC
BRKCRS-2041 | Highly Available Wide Area Network Design

Related Past Cisco Live events

Session-ID | Session Name
TECRST-3190 | IP Routing Fast Convergence
BRKNMS-2202 | Ethernet OAM Technical Overview and Deployment Scenarios
BRKRST-2032 | Highly Available Wide Area Network Design


Call to Action

• Visit the Cisco Campus at the World of Solutions to experience Cisco innovations in action
• Get hands-on experience by attending one of the Walk-in Labs
• Schedule a face-to-face meeting with one of Cisco’s engineers at the Meet the Engineer center
• Discuss your project’s challenges at the Technical Solutions Clinics
