
A Distributed and Scalable Routing Table Manager for the Next Generation of IP Routers

Kim-Khoa Nguyen, Brigitte Jaumard, and Anjali Agarwal, Concordia University

Abstract
In recent years, the exponential growth of Internet users with increased bandwidth requirements has led to the emergence of the next generation of IP routers. Distributed architecture is one of the promising trends providing petabit routers with a large switching capacity and high-speed interfaces. Distributed routers are designed with an optical switch fabric interconnecting line and control cards. Computing and memory resources are available on both control and line cards to perform routing and forwarding tasks. This new hardware architecture is not efficiently utilized by traditional software models in which a single control card is responsible for all routing and management operations. The routing table manager plays an extremely critical role by managing routing information and, in particular, the forwarding information table. This article presents a distributed architecture set up around a distributed and scalable routing table manager. This architecture also provides improvements in robustness and resiliency. The proposed architecture is based on a sharing mechanism between control and line cards and is able to meet the scalability requirements for route computations, notifications, and advertisements. A comparative scalability evaluation is made between the distributed and centralized architectures in terms of required memory and computing resources.

The explosive growth of the Internet has resulted in very stringent scalability requirements on routers and other network systems. Until very recently, core network operators met these requirements by adding more routers, usually mid-size routers, to their networks. This approach, referred to as a router cluster, imposes extra cost for management and maintenance, particularly when the number of connections grows very quickly. We present a more cost-effective approach, where a cluster of mid-size routers is replaced by a next-generation router with a very large switching capacity. One of the main challenges for this new approach is that router architectures have not evolved much recently with respect to the increased traffic demand. For example, the throughput per single chassis of the recent Cisco CRS did not increase compared to the previous 12000 router series (both provide 1.2 Tb/s). The reason is that these routers are designed with only one powerful controller card and scale by following the router-cluster approach rather than by making individual routers more powerful. In recent routers (e.g., the Cisco CRS), the throughput increases only if more chassis are added with additional control cards. In fact, very few control tasks are offloaded to the line cards, mostly for forwarding information table (FIT) management. Because all line cards share a single control card, current architectures are not scalable. Thus, an innovative solution based on task sharing between control and line cards is required to increase the scalability of routers.

Such a solution enables router modules to be added as capacity requirements increase, and it guarantees equal performance of the routing software components regardless of the number of physical interfaces, router adjacencies, and IP routes. Resiliency also is improved by the redundancy and replication of critical functions over multiple modules. Availability is provided by the modular structure, which limits the impact of faults in individual modules. With a modular design, routing software components can run independently on the same or separate central processing units (CPUs) and interact with each other, regardless of their respective physical locations. This approach produces a robust network that is not vendor-specific and can use modules developed by different manufacturers.

The routing table manager (RTM) [1] is one of the main software components of the router. It links the different routing protocol modules. In core routers, the RTM plays an important role by managing all of the best routes coming from various sources. Possible sources are the different routing protocols, such as Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), or Border Gateway Protocol (BGP). The RTM also gathers information from other sources, such as the static routes configured by the system user and the dynamic routes. Based on all of the route information, the RTM module computes the overall best routes. The RTM also is responsible for redistributing routes coming from one routing protocol to other routing protocols. In addition, it can filter the route information that is being redistributed.


With the ever-increasing number of interconnections between routers, the size of the routing table managed by the RTM module tends to increase rapidly. This requires routers to have more CPU cycles, more powerful accompanying hardware resources, and an increased memory size to contain all available routing information. Until recently, the only valid solution to support the increasing Internet traffic was to periodically upgrade the router control card on which the RTM module was running, or to replace the whole router with a new one having more powerful hardware resources (e.g., a faster CPU and more memory), both of which demand some service interruption. An alternative solution is to implement distributed and scalable routers [2].

In this article, we describe the benefits and limitations of a distributed router design and propose a distributed architecture for the RTM. We first review the hardware architecture of next-generation routers and provide an overview of the functionality of the RTM. The critical issues for a centralized RTM architecture are then discussed, leading to a proposal of a completely distributed architecture for the RTM. We then present a comparative scalability evaluation of the proposed distributed architecture with a centralized one, in terms of required memory and computing resources.

Next-Generation Routers and the Routing Table Manager


The first and second generations of IP routers were basically made of a single central processor running all routing protocol modules and multiple line cards interconnected through a shared bus. Their performance depends on the throughput of the shared bus and on the speed and capabilities of the central processor; therefore, they are not able to meet today's bandwidth requirements. Third-generation, or current-generation, routers were introduced to solve the bottlenecks of the second generation [3]. The switch fabric replaces the shared bus: it is a crossbar connecting multiple cards together, thus providing ample bandwidth for transmitting packets simultaneously among line cards. These routers have a set of line cards, a set of forwarding engines, and a single control card interconnected through a switch fabric. The header of an incoming packet entering a line card interface is sent through the switch fabric to the appropriate forwarding engine. The forwarding engine determines to which outgoing interface the packet should be sent. This information is sent back to the line card, which forwards the packet through the switch fabric to the egress line card. Other functionality, such as resource reservation and maintenance of the routing table, is handled by the modules running on the control card.

The architecture of next-generation routers is essentially switch-based; however, the switching capacity is enhanced up to petabits per second [4]. The hardware architecture of these routers is based on three types of cards (Fig. 1a):
• The line card provides multiple gigabit interfaces. The ingress network processor (iNP) is programmable with parallel processing capability. It performs packet forwarding, classification, and flow policing. The iNP contains a FIT that is used to determine the destination of data packets. Control packets can be filtered and forwarded to the CPU for processing. The ingress traffic manager (iTM) forwards the packets from the iNP to the switch fabric while maintaining traffic load balancing using traffic access control, buffer management, and packet scheduling mechanisms. Data packets travel through the switch fabric to the egress line card, and control packets are sent to the control card. The egress traffic manager (eTM) receives packets from the switch fabric plane directly connected to its line card, performs packet re-ordering, and controls congestion. The egress network processor (eNP) sends out the packets with per-egress-port output scheduling mechanisms. The CPU is multi-purpose and able to perform control plane functions with the help of the built-in memory.
• The control card, or route processor, is designed to run the main routing protocol modules (i.e., BGP, OSPF, IS-IS, and multiprotocol label switching [MPLS]), the RTM, and the command line interface (CLI). The control card architecture is similar to that of a line card, but its processing power and storage capabilities are far superior, and there is no interface to external devices. The control card has one iTM chip and one eTM chip to provide interfaces between the local processor and the switch fabric planes; they are responsible for managing flows of control packets.
• The control and line cards are interconnected by a scalable switch fabric that is distributed into identical and independent switching planes. The switch fabric is made of so-called matrix cards that provide data switching functions. Per-flow scheduling, path balancing, and congestion management within the switch fabric are achieved by the fabric traffic manager chipsets integrated on the matrix cards. Each line card or control card has an ingress port and an egress port connecting to a matrix card. Each switching plane is made of the same number of matrix cards. Several topologies may be used to connect the matrix cards; the Benes topology [4] is recommended due to its non-blocking characteristics.

One of the most important software components of the router is the RTM. It builds the FIT from the routing database that stores all routes learned by the different routing and signaling protocols, including the best and the non-best routes. For a set of routes having the same destination prefix, only one route is deemed the best, based on a pre-configured preference value assigned to each routing protocol. For example, if static routes have a high preference value and OSPF routes have a low preference value, and a route entry with the same destination prefix was recorded by each protocol, the static route is considered to be the best route and is added to the FIT (Fig. 1b). However, some services, such as the Resource Reservation Protocol (RSVP), can use non-best routes to forward data with respect to user-defined parameters. Therefore, the RTM must:
• Keep all routes, allow users or requesting modules to access the route database, and make routing decisions based on requested next-hop and explicit route resolution.
• Notify any change in the routing tables generated by the underlying routing protocols (e.g., Routing Information Protocol [RIP], OSPF, IS-IS, BGP).
• Alert the routing protocols about the current state of physical links, such as the up/down status and the available bandwidth, in order to manage the associated link states and, indirectly, the route status.
• Communicate with a policy manager module to make route filtering decisions for routing protocols (e.g., OSPF or BGP).
• Alert the routing protocols about resource reservation failures.
Another requirement for the RTM is to contain a very large number of routes, such as the ever-increasing BGP routes. Because router vendors do not increase the memory of the main control card by much, Internet service providers (ISPs) are very careful about the amount of information routers must store.
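To make the preference-based selection above concrete, the following minimal sketch keeps every learned route in a routing database and installs only the most-preferred route per prefix into the FIT. The protocol names and numeric preference values are illustrative assumptions (the article only states that the higher-preference protocol wins), not values taken from any specific product.

```python
# Illustrative best-route selection by protocol preference (assumed values;
# a higher value wins, matching the article's static-beats-OSPF example).
PREFERENCE = {"static": 200, "ospf": 110, "bgp": 20}

# Routing database: every route learned per destination prefix, best or not.
routing_db = {
    "30.0.0.0/24": [
        {"protocol": "ospf",   "next_hop": "20.2.1.1", "interface": "Ethernet0"},
        {"protocol": "static", "next_hop": None,       "interface": "Serial0"},
    ],
}

def build_fit(db):
    """Keep only the most-preferred route per prefix; these entries go to the FIT."""
    return {prefix: max(routes, key=lambda r: PREFERENCE[r["protocol"]])
            for prefix, routes in db.items()}

if __name__ == "__main__":
    for prefix, route in build_fit(routing_db).items():
        print(prefix, "->", route["protocol"], route["interface"])
        # prints: 30.0.0.0/24 -> static Serial0
```

Non-best routes stay in the routing database so that services such as RSVP can still resolve them against user-defined parameters.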

Toward a Distributed RTM Architecture


In legacy routers, although a data packet is transmitted over line cards, forwarding decisions are made by the control card or by separate forwarding engines. Consequently, data transmission is interrupted if the control card fails. The integration of the forwarding table on line cards, as in recent router products [5], enables them to perform non-stop forwarding; in other words, a data packet can be transmitted regardless of control card failures.

Figure 1. Next-generation routers and the RTM: a) architecture of the next-generation router; b) routing database update and selection of the best routes by the RTM.

Making a local copy of the forwarding table on a line card is easy, and the control plane need not be changed. A non-stop routing system [6] enables the control card to recover from a failure without disrupting the routing protocol interactions with other routers and without dropping any packets. Having back-up control cards is not enough to achieve this goal because of the long switchover time.

In addition, the control card is usually costly. A possible solution is based on protocol extensions that require a peer router to wait for the failed one to be restarted [7]. However, this may not be supported by all legacy routers. A better solution for the next-generation router is to design a more scalable control plane, which we address in this article with a specific architecture for the RTM. In recent router products, the RTM module is neither distributed nor scalable (Fig. 2a).

Figure 2. Current RTM architectures: a) the RTM in a non-distributed routing architecture; b) the RTM distributed on the control card.

Legacy routers consist principally of an RTM located on a single control card [1] that processes information from all routing protocols and networks to which the router connects. There is no RTM module running on any line card. Resiliency can be enhanced at the control card level by a back-up instance that takes over from the primary one in case of failure.

One of the primary requirements for next-generation routers is scalability [4]. In general, a router installed in a core network must exchange control messages with hundreds of peers. Due to the growing bandwidth, a large number of line cards will be added to the router platform. This imposes several challenges on the operation of routing protocols. Current-generation routers provide terabit throughput, whereas next-generation routers will reach petabit throughput. Such a system will have to support on the order of one hundred thousand routes with high flapping rates, which exceeds the capacity of a single control card. Therefore, task sharing should be taken into account to make the system more scalable. One possible solution is to add control cards [9]. Each control card runs an instance of a routing protocol module or manages certain parts of the global routing table. However, control cards are often costly, and the processing capabilities are not improved much due to the quantity and delay of messages exchanged between the different control and line cards in a system.

Some recent products also have been introduced with standalone modules responsible for each protocol. Each protocol module is attached to a smaller RTM, denoted Interior Gateway Protocol (IGP)-RTM or Exterior Gateway Protocol (EGP)-RTM, which manages the routes coming from the different domains of the routing protocols, as shown in Fig. 2b [9]. The global RTM (G-RTM) collects the best routes from the IGP/EGP-RTMs to build the FIT. When a routing protocol receives a link-state notification message through the corresponding signaling component on a line card, the control component located on the control card re-computes the best routes and updates its local IGP-RTM. The G-RTM also is notified through its link with the IGP-RTM. The overall best routes of the system are selected among those provided by the different protocols. The route updates of each routing protocol are advertised by the G-RTM to the other routing protocols in order to notify the neighbors. Finally, the overall best routes are pushed to the FITs on the line cards through the connection with the G-RTM. Such an architecture enables the routing protocols to have flexible access to the routing tables managed by the G-RTM. Resiliency is improved because a routing protocol can still use its IGP/EGP-RTM when the G-RTM temporarily fails. However, there are critical issues:
• Although the IGP/EGP-RTMs are distributed on a per-protocol basis, they are basically independent processes running on the same control card. This leads to heavy resource consumption and to overloading of the control card as the number of routes increases.
• If routing protocols are distributed onto the line cards to improve scalability and fully exploit the available memory and CPU resources of the line cards [9], the IGP/EGP-RTM modules must also be migrated to the line cards.
• It is not very efficient to perform the FIT update operations at the control card level, by the G-RTM, because the FITs are hosted by the line cards.
To make the control plane more scalable, some router vendors and researchers have introduced early products in which some protocol functions are implemented in a distributed way. For example, the OSPF Hello protocol runs at the line card level in the Avici TSR product [6]. Similar work also was presented in [8]. However, to the best of our knowledge, no product or router model with a distributed route management function has been introduced to the market yet.

Figure 3. Proposed distributed RTM architecture: a) distributed RTM architecture on the control card and line cards; b) distribution of the RTM.

Proposed Model for Distributed RTM


Basically, the RTM module is responsible for managing the routing tables and the routing policy modules. It also provides APIs that allow routing information obtained from the routing protocols to be exchanged for processing and path decisions. To take advantage of the new-generation router architecture, which provides ultra-high internal switching speed and additional processing and memory resources on line cards, we investigate the possibility of moving some functions of the RTM from the control card to the line cards. This proposal targets next-generation routers with petabit switching capacity and full memory and processing capabilities on line cards. These routers also are designed with a distributed control plane where some parts of the routing protocols run on line cards [9]. Our distributed model of the RTM consists of two main components (Fig. 3a):
• Each line card runs a line card RTM (LC-RTM) process. The LC-RTM obtains route information from the local instances of the routing protocol modules running on its line card and computes the best routes for each network domain associated with the line card, depending on its port connections. This task can be achieved by exchanging information among the line cards connected to the same domain; in some cases (i.e., for interdomain routes), the LC-RTM may obtain routing information from the control card in order to make routing decisions at the platform level. Line cards are organized in a cluster framework, where each cluster corresponds to a set of line card ports. Most often, all ports of a given line card are connected to the same domain; therefore, each cluster will usually correspond to a domain or sub-domain in the network (Fig. 3b).
• The G-RTM runs on a control card and obtains routing information from the LC-RTMs to update the routing table and, consequently, the forwarding table of the router. The G-RTM also manages the static routes configured by users (through an external routing policy module) and traffic engineering (TE)-based routes. Additional control cards can be added to share processing tasks or to store back-up information of the G-RTM for resiliency purposes. However, load balancing and G-RTM resiliency are beyond the scope of this article.
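The cluster organization described above can be sketched as a simple data model. All class, attribute, and domain names below are hypothetical; the sketch only illustrates the split between per-cluster LC-RTM state and the platform-wide view assembled by the G-RTM.

```python
# Simplified data model of the proposed LC-RTM/G-RTM split (illustrative names only).
from dataclasses import dataclass, field

@dataclass
class LineCard:
    card_id: int
    domain: str                                        # routing domain its ports connect to
    local_routes: dict = field(default_factory=dict)   # prefix -> next hop (LC-RTM view)

@dataclass
class Cluster:
    domain: str
    members: list                                      # line cards connected to the same domain

    @property
    def master(self):
        # Simplified election rule used later in the article: the first line card
        # on which the routing protocol was activated acts as the cluster master.
        return self.members[0]

@dataclass
class GRTM:
    routing_table: dict = field(default_factory=dict)

    def collect(self, clusters):
        """Assemble the platform-wide routing table from the cluster masters."""
        for cluster in clusters:
            self.routing_table.update(cluster.master.local_routes)

# Example: one OSPF-area cluster and one BGP-facing cluster (hypothetical domains).
area0 = Cluster("ospf-area-0", [LineCard(1, "ospf-area-0", {"10.0.0.0/8": "10.1.1.2"}),
                                LineCard(2, "ospf-area-0")])
edge = Cluster("bgp-edge", [LineCard(3, "bgp-edge", {"0.0.0.0/0": "192.0.2.1"})])
g_rtm = GRTM()
g_rtm.collect([area0, edge])
print(g_rtm.routing_table)
```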

We also assume that a routing policy module located on the control card allows users to configure route filtering policies and IGP/EGP interworking, and to modify path attributes for the BGP routing protocol according to specific policies. We investigate the capability of the proposed distributed model for an RTM based on the following aspects:
• Link state notification: The RTMs must be notified of changes in the routing information generated by the underlying routing protocols (i.e., RIP, OSPF, IS-IS, BGP) or by the user, so that the best routes and/or TE/quality of service (QoS)-based routes are re-computed and the forwarding table is updated.
• Advertisement: The RTMs must send an alert message to the routing protocols about the current state of physical links, such as the available bandwidth. This information helps the routing protocols to update their link state databases (LSDBs), to flood QoS-related information to the routing domain, or to build QoS forwarding tables.
• Path computation: Best routes or TE/QoS-based routes are computed based on information collected from the different routing protocols and from the user through a CLI. When the RTM is distributed on the line cards, the information provided by each process must be consistent and unique for the whole platform.
• QoS and traffic engineering: The routing policy module on the control card establishes the QoS and TE-based routes for specific connections and replaces the existing best routes. These routes can be defined by the user or by QoS-enabled protocols such as Resource Reservation Protocol with Traffic Engineering extensions (RSVP-TE) or constraint-based routing Label Distribution Protocol (CR-LDP). Note that traffic behavior is not dealt with in this article.
The distributed RTM model may also be required to handle additional platform-specific functions that are not considered in this article:
• Management of the routing tables generated by the underlying unicast and multicast protocols.
• Management of the static routing tables (containing default routes or routes to often-accessed networks).
• Management of the routing tables on a per virtual private network (VPN) basis (allowing overlapping addresses or services for each VPN).
• Asynchronous notifications to users about changes in the routing tables.

Figure 4. Architecture of the G-RTM and LC-RTM: a) architecture of the G-RTM located on a control card; b) architecture of an LC-RTM located on a line card and its interfaces with the OSPF and BGP routing protocols.

Although the RTM plays a central and key role in a router, its architecture has never been revealed by manufacturers. Some recent research [8] stated that the routing table management and update functions should remain on the control card. The authors' conclusion is suitable for medium-scale routers, where the computing and memory resources on line cards are not sufficient to support independent LC-RTMs (e.g., routers having ten line cards and one hundred interfaces). The model we propose in this article deals primarily with very large scale core routers having up to thousands of line cards and petabit switching capacity. The available memory on each line card is also on the order of tens to hundreds of Mbytes in total. Such a router is less concerned with resource limitation problems.

Figure 4 presents the architectures we propose for the G-RTM and the LC-RTM. The inter-card communication between the LC-RTMs located on line cards and the G-RTM located on the control card, or among LC-RTMs, is achieved by a specific communication channel called distribution services (DS). Designed as an abstraction layer, DS also provides a synchronization mechanism to manage module activations, monitoring, and state transition facilities (active, back-up, in-service upgrade, etc.). DS maintains a distribution database that enables requesting modules to obtain the appropriate data. The G-RTM is able to record the FIT through routing socket services provided by the IP stack. Route update information can be received from neighbor routers through interfaces between an LC-RTM and the routing protocols running on its line card (Fig. 4b). Route advertisements can also be sent to neighbor routers using the same interfaces. Basically, the model we propose works as follows.
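Before walking through those steps, here is a rough illustration of the DS channel and its distribution database, modeled as a publish/subscribe layer. The interface is entirely hypothetical; the article does not specify the DS API, so the method names and message formats below are assumptions.

```python
# Hypothetical sketch of the distribution services (DS) layer: modules register,
# publish routing data into a distribution database, and subscribers are notified.
from collections import defaultdict

class DistributionServices:
    def __init__(self):
        self.database = {}                       # the distribution database
        self.subscribers = defaultdict(list)     # topic -> list of callbacks
        self.module_states = {}                  # module -> active / back-up / upgrading

    def register(self, module, state="active"):
        # Synchronization/monitoring facility: track each module's state.
        self.module_states[module] = state

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, key, value):
        # Store the data so that requesting modules can fetch it later,
        # then notify every subscriber of the topic.
        self.database[(topic, key)] = value
        for callback in self.subscribers[topic]:
            callback(key, value)

    def lookup(self, topic, key):
        return self.database.get((topic, key))

# Example: an LC-RTM publishes a best route; the G-RTM, as a subscriber, is notified.
ds = DistributionServices()
ds.register("G-RTM")
ds.register("LC-RTM-1")
ds.subscribe("best-route", lambda prefix, route: print("G-RTM learned", prefix, route))
ds.publish("best-route", "30.0.0.0/24", {"next_hop": "20.2.1.1", "protocol": "ospf"})
```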

Link State Notification


In the proposed architecture, LSDBs are stored on line cards, making them locally available to the requesting processes, such as the LC-RTM or RSVP-TE. Recall that an LSDB is specific to a routing domain, such as an OSPF area. In a centralized model, the LSDB is handled by the control card; hence, synchronization is not required. In our distributed model, we must ensure that all line cards connected to a routing domain maintain the same LSDB. This can be achieved by having one line card act as a master, taking on the path computation for the cluster of line cards connected to the same routing domain. When a line card in the cluster receives a link state notification message, it forwards the message to the master. The master updates its database and synchronizes the other line cards in its cluster. An appropriate election mechanism for the master line card is required for each cluster. To simplify the architecture, we can designate the first line card on which the routing protocol is activated as the master for that cluster.
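A minimal sketch of this synchronization flow is given below. The class and method names are assumptions, and the election follows the simplified rule mentioned above (the first activated card becomes the master).

```python
# Illustrative LSDB synchronization within one cluster (all names assumed).
class ClusterLSDB:
    def __init__(self, member_ids):
        # Simplified election: the first line card on which the routing protocol
        # was activated acts as the master for the cluster.
        self.master_id = member_ids[0]
        self.lsdb = {card: {} for card in member_ids}   # per-line-card LSDB copies

    def on_link_state_notification(self, receiving_card, lsa_id, lsa):
        # A member receiving the notification forwards it to the master (the
        # forwarding over the switch fabric is abstracted away here); the master
        # updates its database and then synchronizes every card in the cluster.
        self.lsdb[self.master_id][lsa_id] = lsa
        for card in self.lsdb:
            self.lsdb[card][lsa_id] = lsa

# Example: card 7 receives an LSA; afterwards every copy in the cluster matches.
cluster = ClusterLSDB([3, 7, 9])
cluster.on_link_state_notification(7, "rtr-1.lsa-0", {"link": "10.0.0.0/30", "cost": 5})
assert cluster.lsdb[3] == cluster.lsdb[7] == cluster.lsdb[9]
```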

Advertisement
When a line card detects a change in its physical link states, or a link state is changed by the user, the LC-RTM located on that line card broadcasts a notification to all line cards in the router. Routing tables must then be recalculated, and notifications are sent to the neighbor routers.

Path Computation
The path computation is processed on a per-routing-protocol basis. For link-state routing protocols (e.g., OSPF, IS-IS), path computation can be performed by the master line card of the cluster. On the other hand, distance vector-based protocols send the route update information they obtain from neighbors to the control card in order to perform the computation. Basically, the path computation process proceeds as follows:
• The routing protocol modules receive update information from neighbors or detect local link modifications themselves. The LC-RTM running on the same line card is notified.
• Based on the protocol identification, the LC-RTM decides whether to send the notification to the G-RTM located on the control card or to forward this information to the master line card of the cluster to which it belongs.
• The G-RTM or the appropriate master line card runs specific algorithms (e.g., Dijkstra for link-state protocols or Bellman-Ford for distance vector-based protocols) to build the network topology and produce the best routes.
• A new or updated route is registered in the forwarding tables located on the line cards through the G-RTM.
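The dispatch decision in these steps could be sketched as follows; the protocol classification and the returned destinations are illustrative assumptions rather than a prescribed interface.

```python
# Illustrative LC-RTM dispatch: link-state updates go to the cluster's master
# line card (Dijkstra), distance-vector updates go to the G-RTM on the control
# card (Bellman-Ford); other sources are resolved at the platform level.
LINK_STATE = {"ospf", "is-is"}        # computed per cluster on the master line card
DISTANCE_VECTOR = {"rip"}             # computed centrally on the control card

def dispatch(protocol, update):
    """Return where a route-update notification is sent and which algorithm applies."""
    if protocol in LINK_STATE:
        return ("master-line-card", "dijkstra", update)
    if protocol in DISTANCE_VECTOR:
        return ("g-rtm-on-control-card", "bellman-ford", update)
    return ("g-rtm-on-control-card", "platform-level-decision", update)

print(dispatch("ospf", {"prefix": "10.1.0.0/16", "cost": 20}))
print(dispatch("rip", {"prefix": "172.16.0.0/12", "metric": 3}))
```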

QoS and Traffic Engineering Specification


The model we propose provides user QoS and traffic engineering specification functions through an interface between the routing policy module and the G-RTM, both located on the control card. QoS and TE-based routes also can be established using specific protocols such as RSVP-TE or CR-LDP. In that case, the specific parameters are first updated to the LC-RTM, and the computed routes are then updated to the G-RTM. Offloading some processing tasks from the control card to the line cards helps reduce potential bottlenecks on the control card when the number of requests increases as a result of the growing number of routes, and hence of line cards, to be supported by the core router. The LC-RTM also is able to react rapidly to physical link modifications and to efficiently exploit the additional resources available on the line cards of next-generation routers. In addition, the model we propose has the following advantages:
• Scalability: It balances the path computation load between the control card and the line cards. RTM functions are distributed as far as possible, leaving the control card available for more complicated tasks, such as router management and user interaction.
• High availability: Because route information and LSDBs are backed up on the line cards, we provide a high redundancy level for RTMs. Also, problems on the control card will not slow down the procedures on the line cards.
• Robustness: In our architecture, the path computation is performed on each cluster instead of the whole router, which leads to rapid convergence in case of topology changes. Routing information and notifications also arrive faster and more efficiently at the requesting modules because they can be provided directly by the LC-RTMs. Communication among routing protocols and RTMs is also more efficient, and switch fabric bandwidth is saved.

Figure 5. Line card components of the distributed RTM.

Implementation and Scalability Evaluation

To manage the BGP routes, the RTM has two tables. The routing information base input (RIB-IN) holds the routes advertised by BGP neighbor routers (the so-called BGP speakers). The routing information base local (RIB-LOC) contains the routes the router discovers by itself (e.g., the physical links of the line card or routes learned from other protocols such as OSPF). By combining these two tables and taking into account the additional user policy configurations, the RTM determines the best routes for BGP, which are stored in the routing information base output (RIB-OUT) table. The RIB-OUT table is then advertised to the BGP neighbor routers.

The LC-RTM has access to the LSDB managed by the OSPF module running on the same line card. This enables OSPF to be updated with the route changes and the link status information managed by the LC-RTM. The OSPF best route computation is performed by the OSPF module itself, so the LC-RTM is not involved in this process. However, the final results are stored in the routing table through the RTM API services.

The functions provided by the G-RTM and the LC-RTMs are implemented as APIs. They include the store, access, look-up, list, remove, update, and back-up functions. Each function is represented by a type-length-value (TLV) structure. A module, for example MPLS, can execute an RTM function by sending a message containing this data structure to the G-RTM or an LC-RTM. The Type field is the name of the operation, followed by the length of the structure; the Value field contains additional information on the function, such as the parameters to be processed.

Based on a local routing table, such a distributed RTM architecture helps to compute constrained shortest path first (CSPF) routes effectively. On the line card, the LC-RTM module consists of two main components (Fig. 5):
• The traffic engineering database (TED) contains the topology and resource information of the cluster. The TED may be fed by an IGP protocol instance running on the same line card or on the control cards.
• The path computation element (PCE) performs the path computation based on a network graph and applies computational constraints during the computation.
We investigate the distributed path computation model in the interdomain, intradomain, and interlayer contexts. Interdomain path computation may involve the association of topology, routing, and policy information from multiple domains; this can be performed at the LC-RTM level. Intradomain path computation deals with the routing information coming from a single domain; this is achieved by routing protocols running on the line cards, such as OSPF or IS-IS. Interlayer path computation aims at performing the path computation at one or multiple layers while taking into account the topology and resource information at these layers; this is achieved by the LC-RTM and local QoS (L-QoS) modules.
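To illustrate the TLV-encoded RTM API messages mentioned earlier in this section, here is a minimal sketch. The operation codes and the JSON-encoded Value field are assumptions; the article does not define the exact wire format.

```python
# Hypothetical TLV encoding of an RTM API request: Type = operation,
# Length = size of the Value field, Value = parameters of the operation.
import json
import struct

# Assumed numeric codes for the RTM functions listed in the article.
OPS = {"store": 1, "access": 2, "look-up": 3, "list": 4,
       "remove": 5, "update": 6, "back-up": 7}

def encode_rtm_request(operation, params):
    value = json.dumps(params).encode()               # parameters carried in Value
    return struct.pack("!HH", OPS[operation], len(value)) + value

def decode_rtm_request(message):
    op_code, length = struct.unpack("!HH", message[:4])
    params = json.loads(message[4:4 + length].decode())
    return op_code, params

# Example: an MPLS module asks an LC-RTM to store a route.
msg = encode_rtm_request("store", {"prefix": "30.0.0.0/24", "next_hop": "20.2.1.1"})
print(decode_rtm_request(msg))   # (1, {'prefix': '30.0.0.0/24', 'next_hop': '20.2.1.1'})
```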

Figure 6. Performance comparison between the centralized and the proposed distributed architectures: a) memory used by RTMs in the proposed distributed architecture and in the centralized architecture; b) CPU resources used by RTMs in the proposed distributed architecture and in the centralized architecture. (Memory in kbytes and number of CPU cycles are plotted against the number of line cards, from 0 to 144, for Np = 1, 2, and 3 routing protocols; curves are shown for the control card [CC] in the centralized architecture and for the CC and master line cards in the proposed architecture.)

The CSPF computation process can be described as follows:
• The RSVP-TE module on the ingress line card of the router receives a PATH message from the upstream router.
• The RSVP-TE module on the ingress line card checks the admission status (grant/deny) of the new request based on information in the TED.
• The LC-RTM computes the next-hop (downstream) router using the PCE and the traffic engineering database. In the case of interdomain path computation, the request is sent to the master of the domain, which is able to build the interdomain topology with the other domains. In the case of intradomain path computation, the routing protocol modules running on the same line card are invoked. In the case of interlayer path computation, the PCE uses the information contained in the traffic engineering database.
• The egress line card connected to the downstream router is contacted in order to forward the PATH message.
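To make the constrained computation concrete, the sketch below prunes TED links that cannot satisfy a bandwidth constraint and then runs Dijkstra on the remaining graph. The TED representation and the single bandwidth constraint are illustrative assumptions; a real PCE would handle richer constraints.

```python
# Minimal CSPF sketch: remove links lacking the requested bandwidth, then run
# Dijkstra on the pruned topology (illustrative TED format).
import heapq

def cspf(ted, src, dst, min_bw):
    # ted: {node: [(neighbor, cost, available_bandwidth), ...]}
    pruned = {n: [(v, c) for (v, c, bw) in links if bw >= min_bw]
              for n, links in ted.items()}
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, c in pruned.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    if dst not in dist:
        return None                                  # admission denied: no feasible path
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

ted = {"A": [("B", 1, 100), ("C", 5, 1000)],
       "B": [("D", 1, 40)],
       "C": [("D", 1, 1000)],
       "D": []}
print(cspf(ted, "A", "D", min_bw=50))   # ['A', 'C', 'D']: the B-D link lacks bandwidth
```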

Scalability Evaluation
To compare the scalability achieved by the distributed architecture and the centralized architecture, we estimate the number of exchanged messages, the number of consumed CPU cycles, and the amount of required memory in the two architectures under different router configurations. The router configuration parameters for the scalability evaluation are as follows:
• The number of line cards the router supports. The more line cards that are added, the higher the connectivity of the router.
• The number of interfaces (ports) located on a line card. These are optical interfaces with high capacity (10-40 Gb/s). In practice, a line card can have about 10 ports, and we use this configuration in our evaluation.
• The number of routing protocols (Np) currently running on the router. In practice, a router may support one or more of the following protocols: RIP, OSPF, IS-IS, BGP, MPLS, Label Distribution Protocol (LDP), and RSVP.
The number of messages going through the switch fabric is almost the same in the centralized and the distributed architectures. In the centralized architecture, link notification messages received by all line cards are forwarded to the G-RTM on the control card through the switch fabric. In the distributed architecture, they are forwarded to the master line card of each cluster, and only the best routes are sent to the G-RTM on the control card. Therefore, our architecture does not increase the traffic on the switch fabric.
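As a back-of-envelope illustration of this counting argument (the model below is an assumed simplification, not the evaluation model used for Fig. 6), the number of messages crossing the switch fabric per batch of link events is comparable in the two designs:

```python
# Assumed simplification: every link event on a line card crosses the fabric once
# to wherever the computation runs; in the distributed case the cluster masters
# additionally push their resulting best routes to the G-RTM.
def fabric_messages(num_line_cards, cards_per_cluster, events_per_card):
    centralized = num_line_cards * events_per_card                 # all events to the control card
    num_clusters = num_line_cards // cards_per_cluster
    distributed = (num_line_cards * events_per_card                # events to the cluster masters
                   + num_clusters * events_per_card)               # best-route updates to the G-RTM
    return centralized, distributed

print(fabric_messages(num_line_cards=128, cards_per_cluster=16, events_per_card=10))
# (1280, 1360): the same order of magnitude, so fabric traffic is essentially unchanged
```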

In the centralized architecture, all available routes are stored by the G-RTM located on the control card; thus, it occupies a lot of memory. In the proposed distributed architecture, the available routes of each cluster are kept by a master line card using the line card memory. Therefore, we compare the memory requirement of the control card in the centralized architecture with that of the master line cards in the distributed architecture. As can be seen from Fig. 6a, the memory requirement increases with the number of line cards and the number of protocols running on the router. Figure 6a also shows the memory requirement on the control card for the distributed architecture, which is reduced considerably because most of the memory requirement has been moved to the line cards. Although some optimization techniques can be deployed to save memory on recent routers, the proposed architecture considerably improves the scalability of the router by distributing route storage, especially the label switched path (LSP) storage, over the line cards. Hence, each line card stores only the LSPs going through that line card, whereas in the centralized architecture, each line card must store all the LSPs going through the router.

In the centralized architecture, there is no RTM on the line cards, so the CPU resources for the RTM are consumed mainly on the control card. On the other hand, in the proposed architecture the CPU cycles for the RTM are used mainly on the master line cards. Therefore, we compare these two potential congestion points in Fig. 6b. We can see that the CPU utilization is much higher in the centralized architecture than on each master line card. In other words, the distribution enables the load on the control card to be transferred to the master line cards, so congestion at the control card can be avoided. Each master line card serves only a small set of line cards; therefore, its capacity can satisfy the current demand. Even if the size of a cluster increases, we can still divide it into smaller segments, with a master for each segment, to avoid bottlenecks.

Conclusion
The RTM is one of the most important components of a router. It plays a decisive role in the routing performance and the connectivity of the network. In this article, we presented a novel distributed architecture model for the RTM of next-generation IP routers. The model we propose can exploit the additional computing and memory resources that are available on line cards and the very high-speed communication channels among line cards. This model can use the highly scalable hardware architecture of IP routers efficiently. Routes can be computed more efficiently and in a scalable manner, based on interfaces between the LC-RTMs and the routing protocols running on line cards. The robustness, availability, and resiliency of the router also can be improved considerably. The scalability evaluation of the proposed architecture against a centralized one, in terms of required memory and computing resources, shows that the load of the control card has been moved to the line cards, thus enabling the router to support a larger number of line cards.

Acknowledgement
The authors would like to thank Hyperchip, Inc. for providing financial support. The project also benefited from the support of the Concordia Research Chair of B. Jaumard on the optimization of communication networks.

References
[1] A. Zini, Cisco IP Routing, Addison-Wesley, 2002, pp. 80-111.
[2] O. Hagsand, M. Hidell, and P. Sjodin, "Design and Implementation of a Distributed Router," Proc. 5th IEEE Int'l. Symp. Signal Processing and Info. Tech., Dec. 2005, pp. 227-32.
[3] A. Csaszar et al., "Converging the Evolution of Router Architectures and IP Networks," IEEE Network, vol. 21, no. 4, July-Aug. 2007.
[4] H. J. Chao and B. Liu, High Performance Switches and Routers, Wiley-Interscience, 2007.
[5] Cisco Systems, "Cisco 12000 Series Internet Router Architecture"; http://www.cisco.com
[6] H. Kaplan, "Non-Stop Routing Technology," white paper, Avici Systems Inc., 2002.
[7] M. Leelanivas, Y. Rekhter, and R. Aggarwal, "Graceful Restart Mechanism for Label Distribution Protocol," IETF RFC 3478, Feb. 2003.
[8] M. Deval et al., "Distributed Control Plane Architecture for Network Elements," Intel Tech. J., vol. 7, no. 4, 2003.
[9] K. K. Nguyen et al., "Towards a Distributed Control Plane Architecture for Next Generation Routers," Proc. ECUMN 2007, France, Feb. 2007.

Biographies
KIM KHOA NGUYEN (kk_nguye@encs.concordia.ca) received his M.Sc. in computer science from the Francophone Institute for Computer Science in 2001 and his Ph.D. in electrical engineering from Concordia University in 2007. Since 2002 he has been working with the Optimization of Communication Networks Research Laboratory at Concordia University. His current research includes router architectures and QoS for distributed systems. From 1998 to 2002 he was a senior engineer at Vietnam Data-Communication.

ANJALI AGARWAL [SM '03] received her Ph.D. in electrical engineering in 1996 from Concordia University, Montreal, her M.Sc. in electrical engineering in 1986 from the University of Calgary, and her B.E. in electronics and communication engineering in 1983 from Delhi College of Engineering, India. She is currently an associate professor in the Department of Electrical and Computer Engineering at Concordia University. Her current research interests are various aspects of real-time and multimedia communication over the Internet and wireless access networks. Prior to joining the faculty at Concordia, she worked as a protocol design engineer and software engineer in industry.

BRIGITTE JAUMARD holds a Concordia University Research Chair, Tier 1, on the optimization of communication networks at the Concordia Institute for Information Systems Engineering (CIISE) of Concordia University. She was previously awarded a Canada Research Chair, Tier 1, in the Department of Computer Science and Operations Research at the Université de Montréal. She is an active researcher in combinatorial optimization and mathematical programming, with a focus on applications in telecommunications and artificial intelligence. Recent contributions include the development of efficient methods for solving large-scale mathematical programs and their applications to the design and management of optical, wireless, and 3G/4G networks. In artificial intelligence, her contributions include the development of efficient optimization algorithms for probabilistic logic (reasoning under uncertainty) and automated mechanical design. She has published over 150 papers in international journals in operations research and telecommunications.
