NANOG18
Robert Raszuk - IOS Engineering raszuk@cisco.com
Location of files
This presentation, handouts & demo are located at: ftp://ftpeng.cisco.com/rraszuk/nanog18
RR_MPLS_TE_Nanog.pdf - this presentation
Reduce the overall cost of operations by more efficient use of bandwidth resources
by preventing a situation where some parts of a service provider network are over-utilized (congested), while other parts under-utilized
MPLS and Traffic Eng allows for one to spread the traffic and distribute it across the entire network infrastructure like magnetic fields between poles while also providing the redundancy required for high availability service.
(Eric Dean)
No Traffic Engineering: analogy to human drivers
Traffic Engineering: analogy to autopilot
Construct routes for traffic streams within a service provider in such a way as to avoid causing some parts of the provider's network to be over-utilized while other parts remain under-utilized
NANOG18 - Robert Raszuk
2000, Cisco Systems, Inc.
[Figure: physical topology vs. logical (L3) topology]
Routing at layer 2 (ATM or FR) is used for traffic engineering
Analogy to direct highways between SFO-LAX & SAN-SMF: nobody enters the highway in between.
IGP routing scalability issue for meshes
Additional bandwidth overhead (cell tax)
TE - key mechanisms
TE - key mechanisms
TE basics
Traffic within a Service Provider as a collection of POP to POP traffic trunks with known bandwidth and policy requirements
TE provides traffic trunk routing that meets the goal of Traffic Engineering
via a combination of on-line and offline procedures
Requirements:
Differentiating traffic trunks:
large, critical traffic trunks must be well routed in preference to other trunks
Handling failures:
automated re-routing in the presence of failures
Pre-configured paths:
for use in conjunction with the off-line route computation procedures
Requirements (cont.)
Constraining sub-optimality:
should re-optimize on new/restored bandwidth
in a non-disruptive fashion - maintain the existing route until the new route is established, without any double counting
Ability to spread traffic trunk across multiple Label Switched Paths (LSPs)
could provide more efficient use of networking resources
Design Constraints
Constrained to a single routing domain
initially constrained to a single area
Trunk Attributes
Trunk Attributes
Priorities
setup priority: priority for taking a resource
Trunk attributes
Ordered list of Path Options
possible administratively specified paths (via an off-line central server) - {explicit list}
constraint-based dynamically computed paths, based on a combination of bandwidth and policies
Re-optimization
each path option is enabled or not for reoptimization; the interval is given in seconds: maximum 1 week (7*24*3600), 0 disables, default 1 hour.
Trunk Attributes
Resource class affinity (Policy)
supports the ability to include/exclude certain links for certain traffic trunks based on a user-defined policy
Tunnel is characterized by a:
32-bit resource-class affinity bit string
32-bit resource-class mask (0 = don't care, 1 = care)
Link is characterized by a 32-bit resource-class attribute string
Default value of tunnel/link bits is 0
Default value of the tunnel mask = 0x0000FFFF
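The matching rule implied by the affinity, mask and link attribute strings can be sketched in a few lines of Python. This is an illustrative sketch, not router code: the function name `link_allowed` is hypothetical, and the values are 4 bits wide only to mirror the trunk examples on the following slides (real strings are 32 bits).

```python
def link_allowed(link_attr: int, tunnel_affinity: int, tunnel_mask: int) -> bool:
    """A link is usable when its attribute bits equal the tunnel's affinity
    bits in every position where the mask is 1 (mask bit 0 = don't care)."""
    return (link_attr & tunnel_mask) == (tunnel_affinity & tunnel_mask)


# Hypothetical links: one unmarked, one with a lower-half bit set,
# one with an upper-half bit set.
PLAIN, LOWER_BIT, UPPER_BIT = 0b0000, 0b0010, 0b0100

default_tunnel = dict(tunnel_affinity=0b0000, tunnel_mask=0b0011)
print(link_allowed(PLAIN, **default_tunnel))      # True
print(link_allowed(LOWER_BIT, **default_tunnel))  # False: driven off the link
print(link_allowed(UPPER_BIT, **default_tunnel))  # True: bit is outside the mask
```

Clearing a mask bit makes the tunnel ignore that link bit, while setting the corresponding affinity bit restricts the tunnel to links that carry it.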
Trunk A to B:
tunnel = 0000, t-mask = 0011
Setting a link bit in the lower half drives all tunnels off the link, except those specially configured
Trunk A to B:
tunnel = 0000, t-mask = 0011
A specific tunnel can then be configured to allow such links by clearing the bit in its affinity attribute mask
Trunk A to B:
tunnel = 0000, t-mask = 0001
A specific tunnel can be restricted to only such links by instead turning on the bit in its affinity attribute bits
Trunk A to B:
tunnel = 0010, t-mask = 0011
No path is possible
Setting a link bit in the upper half has no immediate effect
Trunk A to B:
tunnel = 0000, t-mask = 0011
A specific tunnel can be driven off the link by setting the bit in its mask
Trunk A to B:
tunnel = 0000, t-mask = 0111
No path is possible
Trunk Attribute
Resource Class Affinity (Policy)
Link Attributes
TE-specific link metric
draft-li-mpls-igp-te-00.txt
Per-Priority Available BW
AB(i) = available bandwidth at priority i
T=0: link L, BW=100; D advertises AB(0)=100 ... AB(7)=100
[Figure: link L (BW=100) at node D, shown at T=1 through T=4]
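A minimal Python sketch of how per-priority available bandwidth changes as tunnels are admitted. The reservations at T=1 and later did not survive in this transcript, so the two tunnels below are hypothetical; the function name `reserve` is also an assumption. Preemption of lower-priority tunnels is deliberately not modelled.

```python
def reserve(available: list, priority: int, bandwidth: int) -> list:
    """Return the new AB(0..7) after admitting one reservation.

    Bandwidth held at `priority` is no longer available to that priority
    nor to any numerically higher (less important) priority.
    """
    if bandwidth > available[priority]:
        raise ValueError("insufficient bandwidth at this priority")
    return [ab - bandwidth if i >= priority else ab
            for i, ab in enumerate(available)]


ab = [100] * 8           # T=0: AB(0) .. AB(7) = 100, as on the slide
ab = reserve(ab, 3, 40)  # hypothetical tunnel at priority 3, 40 units
ab = reserve(ab, 5, 30)  # hypothetical tunnel at priority 5, 30 units
print(ab)
```

Priorities 0 to 2 still see the full link, which is what lets a more important tunnel be admitted later even though less important ones already hold bandwidth.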
Information Distribution
Information Distribution
Periodic Timer
Periodically, a node checks whether its current TE status is the same as the one it last flooded.
If different, it floods its updated TE link status
Significant Change
Each time a threshold is crossed, an update is sent
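The threshold rule can be sketched as follows. The threshold percentages are hypothetical (only a 50% mark is visible on the slide), and `crossed` is an illustrative helper name, not a router API.

```python
import bisect

THRESHOLDS = [50, 75, 90]  # hypothetical, sorted percentages of link bandwidth


def crossed(thresholds: list, old_pct: float, new_pct: float) -> bool:
    """True when reserved bandwidth moved into a different threshold band,
    in either direction, so an update should be flooded."""
    old_band = bisect.bisect_right(thresholds, old_pct)
    new_band = bisect.bisect_right(thresholds, new_pct)
    return old_band != new_band


print(crossed(THRESHOLDS, 40, 45))  # False: still below 50%
print(crossed(THRESHOLDS, 45, 55))  # True: 50% crossed upwards
print(crossed(THRESHOLDS, 55, 45))  # True: 50% crossed downwards
```

Changes that stay inside one band are not flooded, which is why other nodes can hold a slightly stale view of a link.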
[Figure: an update is sent each time the reserved bandwidth crosses a threshold, e.g. 50%]
Due to the threshold scheme, it is possible that a node thinks it can signal an LSP tunnel via node Z while, in fact, Z does not have the required resources
When Z receives the Resv message and refuses the LSP tunnel, it broadcasts an update of its status
Constrained-based Computation
Constrained-Based Routing
In general, path computation for an LSP may seek to satisfy a set of requirements associated with the LSP, taking into account a set of constraints imposed by administrative policies and the prevailing state of the network -- which usually relates to topology data and resource availability. Computation of an engineered path that satisfies an arbitrary set of constraints is referred to as "constraint based routing."
draft-li-mpls-igp-te-00.txt
Path Computation
On demand by the trunks head-end:
for a new trunk
for an existing trunk whose (current) LSP failed
for an existing trunk when doing reoptimization
Path Computation
Input:
configured attributes of traffic trunks originated at this router
attributes associated with resources
Path Computation
Prune links if:
insufficient resources (e.g., bandwidth)
violates policy constraints
Path Computation
Output:
explicit route - expressed as a sequence of router IP addresses
interface addresses for numbered links
loopback address for unnumbered links
used as an input to the path setup component
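The prune-then-compute procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions: the topology, node names and costs are hypothetical, links are treated as bidirectional, and the function name `cspf` is not taken from the slides. The policy test reuses the affinity rule (link attribute AND mask equals affinity AND mask).

```python
import heapq


def cspf(links, src, dst, bw, priority, affinity, mask):
    """links: iterable of (a, b, igp_cost, attr, avail_bw_per_priority)."""
    # Step 1: prune links with insufficient bandwidth or violating policy.
    graph = {}
    for a, b, cost, attr, avail in links:
        if avail[priority] < bw or (attr & mask) != (affinity & mask):
            continue
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    # Step 2: ordinary shortest-path search on the remaining topology.
    heap, seen = [(0, src, [src])], set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return path  # explicit route handed to path setup
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (dist + cost, nxt, path + [nxt]))
    return None


# Hypothetical topology; bandwidth is the same at every priority for brevity.
LINKS = [
    ("A", "C", 1, 0b0000, [60] * 8),
    ("C", "B", 1, 0b0010, [80] * 8),  # pruned: violates policy
    ("A", "D", 1, 0b0000, [20] * 8),  # pruned: not enough bandwidth
    ("A", "E", 3, 0b0000, [50] * 8),
    ("E", "B", 2, 0b0100, [70] * 8),  # kept: bit lies outside the mask
    ("C", "E", 1, 0b0000, [80] * 8),
]
path = cspf(LINKS, "A", "B", bw=30, priority=3, affinity=0b0000, mask=0b0011)
print(path)
```

The request parameters (priority 3, 30 units, policy 0000, mask 0011) match the tunnel request in the example slide; everything else is made up for the sketch.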
Example
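[Figure: example topology including nodes C, D and E; each link is labelled with a 4-bit resource-class attribute (0000, 0010, 0100 or 1000) and its available bandwidth at priority 3, BW(3), ranging from 20 to 80]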
Tunnel's request:
Priority 3, BW = 30 units, Policy string: 0000, mask: 0011
MPLS Labels
Two types of MPLS Labels:
RSVP CR-LDP
MP-BGP PIM
diagnostics on LSP-tunnels
the concept of nodal abstraction
preemption options that are administratively controllable
draft-ietf-mpls-rsvp-lsp-tunnel-0X.txt
New C-Types are also assigned for the SESSION, SENDER_TEMPLATE, FILTER_SPEC, FLOWSPEC objects.
All new objects are optional with respect to RSVP (RFC2205).
The LABEL_REQUEST and LABEL objects are mandatory with respect to MPLS LSP signalisation specification.
LSP Setup
Initiated at the head-end of a trunk
Uses RSVP (with extensions) to establish Label Switched Paths (LSPs) for traffic trunks
[Figure: routers R1-R7 with labels 49, 17, 32, 22 and Pop on successive links of the LSP]
Setup: Path (ERO = R1->R2->R6->R7->R4->R9)
Reply: Resv communicates labels and reserves bandwidth on each link
Path: Common_Header Session(R3-lo0, 0, R1-lo0) PHOP(R1-2) Label_Request(IP) ERO (R2-1, R3-1) Session_Attribute (S(3), H(3), 0x04) Sender_Template(R1-lo0, 00) Sender_Tspec(2Mbps) Record_Route(R1-2)
Path State: Session(R3-lo0, 0, R1-lo0) PHOP(R1-2) Label_Request(IP) ERO (R2-1, R3-1) Session_Attribute (S(3), H(3), 0x04) Sender_Template(R1-lo0, 00) Sender_Tspec(2Mbps) Record_Route (R1-2)
Path: Common_Header Session(R3-lo0, 0, R1-lo0) PHOP(R2-2) Label_Request(IP) ERO (R3-1) Session_Attribute (S(3), H(3), 0x04) Sender_Template(R1-lo0, 00) Sender_Tspec(2Mbps) Record_Route (R1-2, R2-2)
Path State: Session(R3-lo0, 0, R1-lo0) PHOP(R2-2) Label_Request(IP) ERO () Session_Attribute (S(3), H(3), 0x04) Sender_Template(R1-lo0, 00) Sender_Tspec(2Mbps) Record_Route (R1-2, R2-2, R3-1)
Resv: Common_Header Session(R3-lo0, 0, R1-lo0) PHOP(R3-1) Style=SE FlowSpec(2Mbps) Sender_Template(R1-lo0, 00) Label=POP Record_Route(R3-1)
Resv State: Session(R3-lo0, 0, R1-lo0) PHOP(R3-1) Style=SE FlowSpec (2Mbps) Sender_Template(R1-lo0, 00) OutLabel=POP InLabel=5 Record_Route(R3-1)
Resv: Common_Header Session(R3-lo0, 0, R1-lo0) PHOP(R2-1) Style=SE FlowSpec (2Mbps) Sender_Template(R1-lo0, 00) Label=5 Record_Route(R2-1, R3-1)
Resv state: Session(R3-lo0, 0, R1-lo0) PHOP(R2-1) Style=SE FlowSpec (2Mbps) Sender_Template(R1-lo0, 00) Label=5 Record_Route(R1-2, R2-1, R3-1)
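The Path and Resv walk-through above (R1 to R2 to R3) can be mimicked with a small Python sketch. The helper names `forward_path` and `resv_labels` are hypothetical, and choosing labels from a counter starting at 5 is only an assumption made to reproduce the label seen in the slides; a real router picks any free local label.

```python
def forward_path(ero, rro, out_if):
    """Transit processing of a Path message: strip the leading ERO entry
    (this router's own address) and record the outgoing interface."""
    return ero[1:], rro + [out_if]


def resv_labels(routers, first_label=5):
    """Walk the Resv upstream. The tail advertises POP; every other
    router except the head-end advertises a locally chosen label."""
    labels, next_label = {}, first_label
    for router in reversed(routers[1:]):
        if router == routers[-1]:
            labels[router] = "POP"
        else:
            labels[router] = next_label
            next_label += 1
    return labels


# Path as sent by head-end R1, then as forwarded by R2.
ero, rro = ["R2-1", "R3-1"], ["R1-2"]
ero, rro = forward_path(ero, rro, "R2-2")
print(ero, rro)

# Labels advertised upstream by R3 and R2.
print(resv_labels(["R1", "R2", "R3"]))
```

The ERO shrinks hop by hop while the Record_Route grows, and labels are assigned in the opposite direction to the Path message.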
Path Monitoring
detects loops
copy of RRO to ERO allows for route pinning
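Loop detection with the Record_Route object can be sketched as a membership test: a router that finds one of its own addresses already recorded knows the message has looped. The function name and the interface addresses below are illustrative assumptions.

```python
def path_has_loop(record_route, own_addresses):
    """True when this router already appears in the received Record_Route."""
    return any(hop in own_addresses for hop in record_route)


r2_addresses = {"R2-1", "R2-2"}  # hypothetical interface addresses of R2
print(path_has_loop(["R1-2"], r2_addresses))                  # False
print(path_has_loop(["R1-2", "R2-2", "R3-1"], r2_addresses))  # True
```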
Path Re-Optimization
[Figure: routers R1-R7; labels 49, 17, 32, 22 and Pop along the current path]
Current Path (ERO = R1->R2->R6->R7->R4->R9)
New Path (ERO = R1->R2->R3->R4->R9) - shared with Current Path
Until R9 gets new Path Message, current Resv is refreshed
[Figure: routers R1-R7; labels 49, 17, 32, 22 and Pop on the current path, and labels 38, 89, 26 and Pop on the new path]
Resv: allocates labels for both paths
Reserves bandwidth once per link
PathTear can then be sent to remove old path (and release resources)
Resource Sharing
Path: Common_Header Session(R3-lo0, 0, R1-lo0) PHOP(R1-2) Label_Request(IP) ERO (R2-1, ,R3-3) Session_Attribute (S(3), H(3), 0x04) Sender_Template(R1-lo0, 01) Sender_Tspec(3Mbps) Record_Route(R1-2)
Path State: Session(R3-lo0, 0, R1-lo0) PHOP(R1-2) Label_Request(IP) ERO (R2-1, ,R3-3) Session_Attribute (S(3), H(3), 0x04) Sender_Template(R1-lo0, 01) Sender_Tspec(3Mbps) Record_Route (R1-2)
RSVP: Common_Header Session(R3-lo0, 0, R1-lo0) PHOP(R3-3) Style=SE FlowSpec(3Mbps) Sender_Template(R1-lo0, 01) Label=POP Record_Route(R3-3)
RSVP: Common_Header Session(R3-lo0, 0, R1-lo0) PHOP(R2-1) Style=SE FlowSpec (3Mbps) Sender_Template(R1-lo0, 01) Label=6 Record_Route(R2-1, , R3-3) Sender_Template(R1-lo0, 00) Label=5 Record_Route(R2-1, R3-1)
RSVP state: Session(R3-lo0, 0, R1-lo0) PHOP(R2-1) Style=SE FlowSpec Sender_Template(R1-lo0, 01) Label=6 Record_Route(R2-1, , R3-3) Sender_Template(R1-lo0, 00) Label=5 Record_Route(R2-1, R3-1)
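The effect of the shared-explicit (SE) style during make-before-break can be sketched as follows: LSPs of the same session share one reservation per link, sized for the largest of them, instead of being summed. The function name is hypothetical, and because the middle hop of the new path is elided in the slides it appears here as a placeholder node "Rx".

```python
from collections import defaultdict


def reserved_per_link(lsps):
    """lsps: iterable of (session, lsp_id, bandwidth_mbps, links).

    Within one session the per-link reservation is the maximum over its
    LSPs; different sessions are added together.
    """
    per_link = defaultdict(dict)
    for session, _lsp_id, bw, links in lsps:
        for link in links:
            held = per_link[link]
            held[session] = max(held.get(session, 0), bw)
    return {link: sum(held.values()) for link, held in per_link.items()}


SESSION = ("R3-lo0", 0, "R1-lo0")
lsps = [
    (SESSION, 0, 2, [("R1", "R2"), ("R2", "R3")]),                # old LSP, 2 Mbps
    (SESSION, 1, 3, [("R1", "R2"), ("R2", "Rx"), ("Rx", "R3")]),  # new LSP, 3 Mbps
]
print(reserved_per_link(lsps))
```

On the shared R1-R2 link only 3 Mbps is held rather than 5 Mbps, which is what avoids double counting while both LSPs exist.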
Fast Restoration
Path Protection
Path Protection
Step1: link failure detection
O(depends on L2/L1)
Path Protection
check if another oif (outgoing interface) is available (if loose ERO)
if not, clear path state and send tear to head-end
Step2: Either stepA or stepB alarms the head-end
Step3: Re-optimization
Dijkstra computation: O(0.5) ms per node (rule of thumb)
RSVP signalling time to install the rerouted tunnel
convergence in the order of several seconds (at least).
Objective
FRR allows for temporarily routing around a failed link or node while the head-end may reoptimize the entire LSP
rerouting under 50ms
Session_Attributes Flag 0x01 allows the use of Link Protection for the signalled LSP
[Figures: primary LSP with labels 37, 14 and Pop; backup tunnel via R6 and R7 with labels 17, 22 and Pop]
On failure of link from R2 -> R4, R2 simply changes outgoing Label Stack from 14 to <17, 14>
[Figure: R1 pushes 37; R6 swaps 17->22; R7 pops 22; other routers shown: R2, R4, R5, R9]
Label Stack:
R1  37
R2  17 14
R6  22 14
R7  14
R9  None
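The label stack table can be reproduced with a small Python simulation. The operations attributed to R2 (swap 37 to 14, then push 17) and to R4 (pop 14) are inferred from the table and the failure description rather than stated on the slide, and the helper name `apply_op` is hypothetical.

```python
def apply_op(stack, op, *labels):
    """Apply one label operation; the top of the stack is element 0."""
    if op == "push":
        return list(labels) + stack
    if op == "swap":
        return [labels[0]] + stack[1:]
    if op == "pop":
        return stack[1:]
    raise ValueError(op)


# Operations performed by each router while the R2 -> R4 link is down.
STEPS = [
    ("R1", [("push", 37)]),
    ("R2", [("swap", 14), ("push", 17)]),  # normal swap, then backup label on top
    ("R6", [("swap", 22)]),
    ("R7", [("pop",)]),
    ("R4", [("pop",)]),
]

stack, sent = [], {}
for router, ops in STEPS:
    for op, *labels in ops:
        stack = apply_op(stack, op, *labels)
    sent[router] = list(stack)  # stack on the packet as it leaves the router

for router, labels in sent.items():
    print(router, labels)
```

R4 receives the packet with label 14 on top, exactly as it would have from R2 directly, so nothing downstream of the failure needs to change.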
V1 Constraints
We protect the facility (link), not individual LSPs
scalability vs granularity
No node resilience
Static backup tunnel
Terminology
LSP: end-to-end tunnel onto which data normally flows (e.g. R1 to R9)
Backup tunnel: temporary route to take in the event of a failure
Terminology
Link Protection
In the event of a link failure, an LSP is rerouted to the next-hop using a preconfigured backup tunnel
LSPs are unidirectional, so the same protection should be enabled for the opposite direction if a reverse LSP is configured.
For the blue LSP, R4 bound a global label of 14
Any MPLS frame received by R4, with label 14, will be switched onto the link to R9 with a POP, whatever the incoming interface
Expected call to net_cstate(idb, UP/DOWN) identifying the DOWN state of the protected int to start our protection action.
Path state
LSP reoptimization
In case of very fast up-down-up, the LSP will be put on the backup tunnel and will stay there as the IGP will not have originated a new LSP/LSA
a router waits a certain time before originating a new LSP/LSA after a topological change
Resv state
R2's Path state has been informed that the Resv might arrive over a different interface than the one used by the Path message
In order to optimize bandwidth usage, backup tunnels might be configured with 0 kbps
no non-working bandwidth as in SDH!
Although the backbone is usually thought of as congestion-free, some local congestion might occur during rerouting
Use diffserv to handle this short-term congestion
Use LSP reoptimization to handle the long-term congestion
Use WANDL (loaded with both L3 and L1/2 topologies) to compute the best paths for backup tunnels
Download this as static backup tunnels to the routers
Overview
Then
R2 checks if it already has a tunnel to R4
If not, R2 builds a backup tunnel to R4 (currently just like in link protection - manual explicit setup).
R2 sends a Path onto the tunnel with Session (R9, 1, R1), Sender (R2, 1), Session Attribute (0x01 OFF, 0x02 ON) and PHOP R2
R4's states are refreshed by the secondary Path messages (over the backup tunnels)
ERO of the original path is adjusted at R2
NHOP is modified in R2 (from R3 to R4)
PHOP is modified in R4 (from R3 to R2)
Resv is sent back from R4 to R2 directly
If R3 is still active and just the R2-R3 link failed, R4 needs to ignore and drop any teardown message R3 would send once it stops receiving Path refreshes from R2.
A node may fail while the link is still up
A node's linecard processes might survive a main process failure (freeze of the RP process)
A possible solution
[Figure: two route processors (RP) and several linecards (LC)]
Enhancement to SPF
During SPF each new node found is moved from a TENTative list to PATHS list. Now the first-hop is being determined via:
A. Check if there is any TE tunnel terminating at this node from the current router and, if so, do the metric check
B. If there is no TE tunnel and the node is directly connected, use the first-hop from the adjacency database
C. If none of the above applies, the first-hop is copied from the parent of this new node.
A. Relative +/- X
B. Absolute Y
The default is relative metric of 0.
Example: Metric of native IP path to the found node = 50
1. Tunnel with relative metric of -10 => 40
2. Tunnel with relative metric of +10 => 60
10
If the metric of the found TE tunnel at this node is higher than the metric of other tunnels or of the native IGP path, this tunnel is not installed as next hop
If the metric of the found TE tunnel is equal to that of other TE tunnels, the tunnel is added to the existing next hops
If the metric of the found TE tunnel is lower than the metric of other TE tunnels or of the native IGP path, the tunnel replaces them as the only next hop.
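The metric adjustment and the three installation rules can be sketched in Python. The function names are hypothetical. The slides only describe a tie between TE tunnels; treating a tie with the native IGP path the same way is an assumption of this sketch.

```python
def tunnel_metric(igp_metric, mode="relative", value=0):
    """Relative: IGP metric +/- X. Absolute: the configured value Y."""
    return value if mode == "absolute" else igp_metric + value


def install(next_hops, best_metric, tunnel, metric):
    """Apply the comparison rules; returns (next_hops, best_metric)."""
    if metric > best_metric:
        return next_hops, best_metric            # higher: not installed
    if metric == best_metric:
        return next_hops + [tunnel], best_metric  # equal: load-share
    return [tunnel], metric                       # lower: replaces the others


IGP = 50  # metric of the native IP path, as in the slide example
hops, best = ["native"], IGP
hops, best = install(hops, best, "Tu1", tunnel_metric(IGP, "relative", -10))
hops, best = install(hops, best, "Tu2", tunnel_metric(IGP, "relative", -10))
hops, best = install(hops, best, "Tu3", tunnel_metric(IGP, "relative", +10))
print(hops, best)
```

Tu1 replaces the native path, Tu2 joins it at equal cost, and Tu3 is ignored because its adjusted metric is worse.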
Auto-Bandwidth
Global command:
Auto-Bandwidth
Per tunnel command:
Every Y minutes, update the BW constraint of the tunnel with the maximum of:
the largest 5-min values sampled during the last Y minutes (Def Y = 24 * 3600sec) - 24h
a configured maximum value
(config-if)# tunnel mpls traffic-eng auto-bw {frequency <seconds>} {max-bw <kbs>}
if the new BW is not available, the old one is maintained (the new BW is signalled via a 2nd tunnel, following the make-before-break model)
Example
Verbatim
Applies to explicitly routed LSPs
Disables any check against the TE/IGP database of the head-end
RSVP still checks BW (and policy, when this is carried in Path) hop by hop
Application: manual TE through multi-area IGP
CLI: tunnel mpls traffic-eng path-option verbatim
In-Progress
Allows a head-end to account for bandwidth consumed by tunnels that it has just signalled and for which the IGP LSA/LSP update has not yet reflected the available bandwidth
Example
All tunnels require 45 units of BW
In-progress counters reset upon new LSA/LSP reception
In-progress counter decremented upon receipt of path-error
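The in-progress accounting can be sketched as a small Python class kept by the head-end. The class and method names, and the advertised value of 100 units, are hypothetical; only the 45-unit tunnels and the reset and decrement rules come from the slide.

```python
class InProgress:
    """Bandwidth the head-end has signalled but the IGP has not yet advertised."""

    def __init__(self):
        self.pending = {}

    def usable(self, link, advertised):
        """Bandwidth path computation may still count on for this link."""
        return advertised - self.pending.get(link, 0)

    def signalled(self, links, bw):
        for link in links:
            self.pending[link] = self.pending.get(link, 0) + bw

    def path_error(self, links, bw):
        # The tunnel was refused, so its bandwidth is given back.
        for link in links:
            self.pending[link] = max(0, self.pending.get(link, 0) - bw)

    def igp_update(self, link):
        # A fresh LSA/LSP already reflects the reservations: reset.
        self.pending.pop(link, None)


counters = InProgress()
counters.signalled(["L"], 45)
counters.signalled(["L"], 45)
print(counters.usable("L", advertised=100))  # 10: a third 45-unit tunnel must go elsewhere
```

Without the counters, the head-end would keep seeing 100 units on the link and would try to place every tunnel on it until the next IGP update arrived.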
Benefits
Speeds up the installation of tunnels, as it avoids spending time trying solutions that do not work
Allows for better load-balancing
IGP metric, then max(min(path-bw))!
Under/Overbook
ML: Maximum link bandwidth:
This sub-TLV contains the maximum bandwidth that can be used on this link in this direction (from the system originating the LSP to its neighbors). This is useful for traffic engineering.
Under/Overbook
...
Physical T1
s0
...
Standby
Current solution
A to B:
Tu1: bw1
Tu2: bw2
Tu3: bw3
Tu4: bw4
Solution:
4 tunnels from A to B:
Tu1's relative metric: -3
MPLS TE can operate simultaneously (and orthogonally) with MPLS Diff-Serv
All Precedence/DSCP packets follow the same TE tunnels
Diff-Serv provides selective discard (via WRED), and selective scheduling (via WFQ)
In order to optimize bandwidth usage, backup tunnels might be configured with 0 kbps
no non-working bandwidth as in SDH!
Although the backbone is usually thought of as congestion-free, some local congestion might occur during rerouting
Use diffserv to handle this short-term congestion
Use LSP reoptimization to handle the long-term congestion
Two proposed signaling mechanisms for MPLS traffic engineering are being considered by the IETF's MPLS working group
RSVP (Cisco and a number of Gigabit router startups (Avici, Argon, Ironbridge, Juniper, and Torrent))
CR-LDP (Ericsson, Ennovate, GDC, Nortel)
Why RSVP ?
What is needed: An IP signalling Protocol!
ability to establish and maintain Label Switched Path along an explicit route
RFC2205: provides a general facility for creating and maintaining distributed reservation state across a mesh of multicast and unicast delivery paths
TE: use as a general facility for creating and maintaining distributed forwarding & reservation state across a mesh of delivery paths
TE: transfer and manipulate explicit route and label control parameters as opaque data
pass explicit route parameter to the appropriate routing module, and label parameter to the MPLS module
State applies to a collection of flows (i.e. a traffic trunk), rather than to a single (micro) flow
RSVP sessions are used between routers, not hosts
TE/RSVP Scalability
With basic RSVP (RFC2205), 10000 RRR LSP tunnels flowing through a 75x0 or 12000 is not a problem
Already deployed on a number of Tier-1 ISP backbones
http://www.nanog.org/mtg-9905/hanna.html
Ships with 12.0(5)S
Conclusion
Using RSVP as MPLS/TE signalling protocol is the natural and consistent choice
Summary
Traffic Eng
Could be used for other applications as well
Shipping and deployed in production