
A Parallel Multiple Hashing Architecture for IP Address Lookup

Hyesook Lim * and Yeojin Jung

Information Electronics, Ewha W. University, Seoul, Korea

* E-mail:!a
TEL: +82-2-3277-3403
Abstract: Address lookup is one of the main functions of Internet routers and a very important factor in evaluating router performance. As Internet traffic and the number of routing table entries keep growing, an efficient address-lookup mechanism is essential. In recent years, various fast address-lookup schemes have been proposed, but most of them are not practical in terms of the memory size required for the routing table and the complexity of table updates. In this paper, we propose a parallel IP address lookup architecture based on multiple hashing. The proposed scheme has advantages in required memory size, the number of memory accesses, and table update. We have evaluated the performance of the proposed scheme through simulation using data from the MAE-WEST router. The simulation result shows that the proposed scheme requires a single memory access for the address lookup of each route when 203 Kbytes of memory and a few-hundred-entry TCAM are used.
Index Terms: IP address lookup, best matching prefix, parallel multiple hashing, longest prefix matching

I. INTRODUCTION
Routers have to perform IP address lookup in real time for each incoming packet in order to forward the packet toward its final destination. The Classless Inter-Domain Routing (CIDR) scheme was introduced to solve the issue of IP address space exhaustion by allowing network aggregation. With CIDR, the IP prefixes of a routing table have arbitrary lengths. As a result, when a packet arrives, a router compares the destination IP address of the packet with all prefixes of its routing table and determines the most specific match among the matching entries; this is called longest prefix matching (LPM). Since the prefix length is not specified in the IP address of an incoming packet, longest prefix matching is a complex operation and becomes the bottleneck of router performance.

There are several important factors in evaluating router performance related to IP address lookup. The first is the number of memory accesses, since memory accesses are the major overhead of router performance and memory speed has not improved as much as link speed. The second factor is the required memory size. As the number of networks attached to backbone routers grows exponentially, the memory size required for storing the routing table increases significantly, so the number of prefixes in the routing table should be kept compact. The complexity of routing table update and the scalability toward IPv6 also have to be considered. In this paper, we propose an IP address lookup structure which takes these factors into account. The proposed scheme combines hardware parallelism with the multiple hashing proposed in [1].

The rest of the paper is organized as follows. Section II describes issues of existing approaches to IP address lookup. Section III explains address lookup using multiple hashing, and Section IV describes the proposed scheme. In Section V, simulation results and the performance evaluation results are shown. Section VI concludes the paper.

0-7803-8375-3/04/$20.00 © 2004 IEEE

II. PREVIOUS SCHEMES
Previous IP address lookup schemes can be classified as follows. The first is the TCAM-based scheme [2]. A TCAM performs the lookup against all entries concurrently with one memory access. However, it is more expensive than common memory, has smaller storage capacity than a RAM of the same size, and has higher power consumption. Therefore, it is impractical to implement a routing table with several hundred thousand prefixes in TCAM. Moreover, TCAM has a scalability issue toward IPv6, which has a 128-bit address space.

Secondly, numerous lookup algorithms based on the trie structure have been proposed. A trie is a tree-based data structure that organizes prefixes on a digital basis by using the bits of prefixes to direct the branching [3]. However, the trie structure has several issues. Assuming W is the height of a trie, a lookup requires W memory accesses. Tries also have large storage requirements [3]-[8]. In order to reduce the number of memory accesses in a trie, prefix expansion has been proposed [4], and it has been used in [7]. Prefixes shorter than 24 bits are expanded to 24 bits, and the initial lookup is performed on the 2^24-entry table. If the indexed entry indicates that longer prefixes possibly exist, an additional lookup is executed at the second memory using the pointer stored in the indexed entry of the first memory. This scheme has the advantage of at most two memory accesses. However, 32 Mbytes of memory is required to store 2^24 entries. The scheme proposed in [8] first constructs a forwarding table with 2^16 entries after expanding prefixes to 16 bits, and then builds sub-trees pointed to by each entry. If the input prefix is longer than 16 bits, address lookup is executed along the sub-tree. This scheme requires long preprocessing time to construct the sub-trees, and hence table updating is an issue.

Finally, there are hash-based schemes. Hashing has been popularly used in address lookups by exact matching, and several schemes have been proposed to apply hashing to IP address lookup [9], [10]. By constructing separate routing tables and separate hash functions for each prefix length, Lim et al. [9] suggested parallel hashing for each prefix length. Collided prefixes are stored in sub-tables, and binary search is applied to the collided prefixes. The number of memory accesses varies because of the binary search on sub-tables. Waldvogel et al. [10] proposed organizing a routing table by prefix length and applying binary search on the prefix lengths. Memory access at each prefix length is performed by hashing. The binary search requires log2 W memory accesses in the worst case (W is the number of different prefix lengths). Moreover, the scheme requires long preprocessing to compute markers noting the existence of longer prefixes, and hence routing table update is not trivial. The scheme also assumes that a perfect hash function can be found for a given prefix distribution, but finding such a function is known to take several minutes [4]. Therefore, the application of this scheme is limited to cases where prefix information changes infrequently.
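The prefix expansion used in [4] and [7] above can be illustrated with a short sketch. The function name `expand_prefix`, the integer representation of prefixes, and the 2-byte entry size are assumptions made for this example only; they are not taken from the cited schemes.

```python
def expand_prefix(value, length, target_len):
    """Expand a prefix (`value` holds its `length` bits) to `target_len` bits.

    Returns every target_len-bit value that starts with the prefix bits.
    """
    if not 0 <= length <= target_len:
        raise ValueError("prefix must not be longer than the target length")
    extra = target_len - length            # number of free bits to enumerate
    base = value << extra                  # prefix bits followed by zeros
    return [base | suffix for suffix in range(1 << extra)]


# A 22-bit prefix expands into 2**(24-22) = 4 entries of a 24-bit table.
expanded = expand_prefix(0b1100000010101000000000, 22, 24)
print(len(expanded))                       # 4

# Directly indexed 24-bit table, assuming 2-byte entries:
# 2**24 entries * 2 bytes = 32 Mbytes.
print((1 << 24) * 2 // (1 << 20), "Mbytes")
```

The expansion removes the need to try several prefix lengths at lookup time, at the cost of the large directly indexed table noted above.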

III. ADDRESS LOOKUP USING MULTIPLE HASHING
Hashing converts a bit representation into a shorter bit representation which is used as an index into a table, and hence hashing produces collisions, in which different bit representations are converted into the same hash index. Good hash functions produce few collisions, but it takes several minutes to find a semi-perfect hash function for a given prefix distribution. Instead of looking for a semi-perfect hash function, Broder et al. proposed multiple hashing [1]. Their analysis and simulation results show that multiple hash functions associated with multiple entries in each routing table effectively improve hashing performance. Figure 1 shows the construction of a routing table using multiple hashing. A hash key is passed through both hash functions, and two hash indices are obtained. Each hash index is used as an index into its own table, and the hash key is stored in the bucket that has fewer loads. Therefore, keys are more evenly distributed over the tables. In order to apply multiple hashing to IP address lookup, Broder et al. tailored the binary search on levels presented in [10] after extending prefixes to 16, 26, and 32 bits. As a result, their scheme requires 4 Mbytes of memory to store a routing table and achieves a route lookup with at most 2 memory accesses.

Figure 1 Routing table construction using multiple hash functions

IV. THE PROPOSED SCHEME
In this section, we propose a parallel IP address lookup architecture based on multiple hashing. The proposed scheme has multiple routing tables separated by prefix length, and the address lookup in each table is performed in parallel using hashing. Hashing is implemented with a cyclic redundancy code (CRC) checker in our proposed scheme. Figure 2 shows the proposed hardware structure.

Figure 2 Proposed hardware structure

A. Parallel Lookup
As mentioned in the earlier section, LPM is a complex operation since many matching prefixes with different lengths can exist in the routing table and the longest matching prefix has to be determined. In other words, the search has to continue until the entire table is examined, since longer matching prefixes may exist in the table even after a match is found. The proposed scheme organizes multiple routing tables separated by prefix length and stores each table in a separate memory. Hence the LPM problem is converted into an exact matching problem in each table, and as a result, parallel lookups using hashing on each table are achieved. In other words, finding a match in each prefix table is performed by a separate process, and all processes are executed in parallel. The longest match is finally determined among the matches gathered from all processes.
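The conversion of LPM into one exact match per prefix length can be sketched in software as follows. This is only an illustration: a Python dictionary stands in for each hashed table, the loop stands in for hardware that probes all tables at once, and the function names and sample routes are hypothetical.

```python
def build_length_tables(routes):
    """Group (prefix_value, prefix_len, next_hop) routes into one table per length."""
    tables = {}
    for value, length, next_hop in routes:
        tables.setdefault(length, {})[value] = next_hop
    return tables


def longest_prefix_match(tables, addr, addr_bits=32):
    """Do one exact-match probe per prefix length and keep the longest hit."""
    matches = []
    for length, table in tables.items():       # concurrent in the hardware scheme
        key = addr >> (addr_bits - length)     # top `length` bits of the address
        if key in table:
            matches.append((length, table[key]))
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]


routes = [(0xC0, 8, "A"), (0xC0A8, 16, "B"), (0xC0A801, 24, "C")]
tables = build_length_tables(routes)
print(longest_prefix_match(tables, 0xC0A80107))  # C: the /8, /16 and /24 all match
print(longest_prefix_match(tables, 0xC0A80201))  # B: only the /8 and /16 match
```

Each probe is independent of the others, which is what allows the tables to be placed in separate memories and searched at the same time.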


B. Multiple Hashing Using CRC
A hash function takes an incoming IP address as input and generates a shorter fixed-size string known as a hash index. The hash index is used as a pointer into a routing table. Good hash functions produce few collisions, which means that not many prefixes are mapped to the same hash index [11]. An important issue in using hashing for IP address lookup is how to minimize collisions. Our proposed scheme applies the multiple hashing presented by Broder et al. [1] in order to solve the collision problem and uses CRC, which is known as a semi-perfect hash function [12], as the hash function. While Broder et al. suggested a software-based scheme which searches for an appropriate hash function for a given prefix distribution, our proposed scheme is a hardware-based scheme which uses a fixed hash function and applies parallel searching in each prefix table. Additionally, we show a scheme which extracts multiple hash indices for each prefix length from a single CRC hardware and hence significantly reduces the burden of hardware implementation.

Figure 3 depicts the CRC-32 hashing hardware structure used in the proposed scheme [11]. Since the routing table in the proposed scheme is organized independently by prefix length, separate hash indices are required as indices into the routing tables of each prefix length. Hash indices are extracted from the CRC hashing hardware as follows. First, each bit of the destination IP address is serially entered into the CRC hashing hardware. After L cycles (for L = 8, 9, ..., 32), two fixed hash indices for prefix length L are extracted from register 0 to register L-1. By repeating the same procedure, hash indices for different prefix lengths are taken at different times from the CRC registers. For example, the hash index for prefix length 8 is taken from the CRC registers after 8 cycles, and one clock cycle later, the hash index for prefix length 9 is taken. All the hash indices for each routing table are available after 32 cycles.
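The serial extraction described above can be modelled roughly in software. The sketch below shifts the address through a CRC-32 register one bit per cycle and snapshots two indices at every cycle L from 8 to 32. Which register bits form each index is not specified in the text, so the choice here (lowest and highest `width` bits of the low L register bits), the fixed `width`, the initial register value, and all function names are assumptions of this example.

```python
import zlib

CRC32_POLY = 0xEDB88320  # CRC-32 polynomial in reflected (LSB-first) form


def crc_shift(state, bit):
    """Advance the 32-bit CRC shift register by one input bit."""
    feedback = (state ^ bit) & 1
    state >>= 1
    return state ^ CRC32_POLY if feedback else state


def hash_indices(addr, width=4, min_len=8, max_len=32):
    """Feed a 32-bit address MSB first and snapshot two indices at each cycle L.

    Assumption: after L cycles the low L register bits form a window; the
    first index is its lowest `width` bits and the second its highest.
    `width` must not exceed `min_len`.
    """
    state = 0xFFFFFFFF
    mask = (1 << width) - 1
    indices = {}
    for cycle in range(1, max_len + 1):
        state = crc_shift(state, (addr >> (32 - cycle)) & 1)
        if cycle >= min_len:
            window = state & ((1 << cycle) - 1)
            indices[cycle] = (window & mask, (window >> (cycle - width)) & mask)
    return indices


def crc32_bytes(data):
    """Sanity check: the same register reproduces the standard CRC-32."""
    state = 0xFFFFFFFF
    for byte in data:
        for i in range(8):                 # standard CRC-32 feeds bits LSB first
            state = crc_shift(state, (byte >> i) & 1)
    return state ^ 0xFFFFFFFF


assert crc32_bytes(b"123456789") == zlib.crc32(b"123456789")
idx = hash_indices(0xC0A80107)             # one (h1, h2) pair per length 8..32
```

Because the register state after L cycles depends only on the first L address bits, two addresses that share an L-bit prefix obtain the same pair of indices for length L, which is the property the per-length tables rely on.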

C. Building Forwarding Tables
We refer to the analysis of [1] in order to determine the number of buckets and the number of entries per bucket in each table. Assuming that N prefixes are hashed into N/2 buckets using two hash indices, the analysis shows that the probability for a bucket to have two or more loads is 5.0e-7. Figure 4 shows the bucket structure, which stores a maximum of two loads. Each bucket of the routing table consists of a field indicating the number of items and multiple fields to store loads, and each load consists of a prefix field and a forwarding RAM pointer field, as shown in Figure 4.





Figure 4 Entry structure of the forwarding table in the proposed scheme

The length of the hash index, in bits, is determined according to the number of buckets. In order to store N prefixes in tables with a total capacity of 2N loads, we need two tables composed of N/2 buckets, each bucket having two loads. Because the hash index is used as an index into the routing table, the required length of the hash index is the nearest integer of log2(N/2). The minimum length of the hash index is 2. The forwarding table is built using the algorithm shown in Figure 5. First, a prefix enters the CRC hashing hardware bit by bit, and after L cycles (L is the prefix length), two hash indices are extracted from the CRC registers between bit 0 and bit L-1. Each hash index indicates a bucket of each table, and the new prefix is stored in the bucket which has the smaller number of loads. If the two buckets have equal loads, the prefix is placed in the bucket of the first table by default. In case of overflow, the prefix is stored in the overflow table. In the proposed scheme, two hash tables are used for each prefix length, and hence a total of 48 hash tables (prefix lengths 8-32, except prefix length 31) and an overflow table are used. Figure 6 depicts the process of constructing the forwarding table.
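The insertion rule above can be sketched for the two tables of a single prefix length. The helper `two_hashes` is a stand-in that slices a software CRC-32 value; it is an assumption of this example and not the register extraction of the proposed hardware. The names `index_bits` and `insert` are likewise hypothetical.

```python
import math
import zlib

BUCKET_CAPACITY = 2  # loads per bucket, as in the two-load bucket of Figure 4


def index_bits(num_prefixes):
    """Hash index length for N prefixes: nearest integer of log2(N/2), minimum 2."""
    if num_prefixes < 2:
        return 2
    return max(2, round(math.log2(num_prefixes / 2)))


def two_hashes(prefix, length, bits):
    """Stand-in for the two CRC-derived indices (assumes bits <= 16)."""
    crc = zlib.crc32(prefix.to_bytes(4, "big") + bytes([length]))
    mask = (1 << bits) - 1
    return crc & mask, (crc >> 16) & mask


def insert(table1, table2, overflow, prefix, length, fwd_ptr, bits):
    """Place a prefix in the less loaded of its two buckets, else in overflow."""
    h1, h2 = two_hashes(prefix, length, bits)
    b1, b2 = table1[h1], table2[h2]
    target = b1 if len(b1) <= len(b2) else b2    # ties go to the first table
    if len(target) < BUCKET_CAPACITY:
        target.append((prefix, fwd_ptr))
        return "table1" if target is b1 else "table2"
    overflow.append((prefix, length, fwd_ptr))   # both buckets are full
    return "overflow"


bits = index_bits(16)                            # log2(16 / 2) = 3
table1 = [[] for _ in range(1 << bits)]
table2 = [[] for _ in range(1 << bits)]
overflow = []
for i in range(16):
    insert(table1, table2, overflow, i, 8, fwd_ptr=i, bits=bits)
```

Since the less loaded bucket is always chosen, a prefix reaches the overflow list only when both of its candidate buckets already hold two loads.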

Figure 3 CRC-32

We analyze whether the 32 cycles needed to obtain all the hash indices prevent routers from working at line rate. Suppose that the minimum packet length is 72 bytes including the preamble and Start of Frame Delimiter (SFD), and that the Inter-Frame Gap (IFG) is 12 bytes. If the forwarding engine operates at a 100 MHz clock, the time required to obtain all the hash indices is 320 ns, and hence a router with an aggregated bandwidth of up to 2.1 Gbps works at line rate. At a 200 MHz clock, a router with an aggregated bandwidth of up to 4.2 Gbps works at line rate. Beyond this rate, multiple hashing hardware units are required, as shown in Figure 2.
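The line-rate figures above follow from a short calculation, reproduced here as a sketch. The function name and parameter names are chosen for this example; the numeric defaults are the values stated in the text.

```python
def max_line_rate_bps(clock_hz, cycles=32, frame_bytes=72, gap_bytes=12):
    """Highest aggregate rate at which one lookup fits in one minimum-size frame slot."""
    bits_per_slot = (frame_bytes + gap_bytes) * 8     # 84 bytes = 672 bits
    # One lookup takes cycles / clock_hz seconds, so rate = bits / lookup time.
    return bits_per_slot * clock_hz / cycles


print(max_line_rate_bps(100e6) / 1e9)   # 2.1 (Gbps at 100 MHz, 320 ns per lookup)
print(max_line_rate_bps(200e6) / 1e9)   # 4.2 (Gbps at 200 MHz, 160 ns per lookup)
```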

Figure 5 Algorithm to build forwarding table

Figure 6 Construction procedure of forwarding table

D. Overflow Table
If both buckets are full and have no space to store another prefix, a newly inserted prefix has to be stored in the overflow table. As will be shown in Table 1 of Section V, using two hash indices and 203 Kbytes of memory, the overflow rate is 0.52% of the entire prefixes. A small TCAM can be used for the overflow table. The TCAM is also searched in parallel.

E. Searching Forwarding Tables
Searches in each table are executed in parallel using the hash indices obtained from the CRC hash function. As mentioned earlier, 48 hash indices are obtained from a single CRC hardware. As shown in Figure 7, the entries in each table are concurrently searched in the buckets indicated by the hash indices. Additionally, the overflow TCAM is also searched in parallel. The longest prefix match (LPM) is selected by the priority encoder among the matching entries from the tables and the overflow TCAM. The packet is finally forwarded to the output port pointed to by the forwarding RAM pointer of the selected entry. Figure 8 is the block diagram of the searching procedure. As shown in Figure 8, only a single memory access time is required since the lookup in each prefix length is performed in parallel.
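A software model of this search, under stated assumptions, might look like the sketch below. The loops stand in for hardware units that operate at the same time, a plain list stands in for the overflow TCAM, and `toy_hash`, the table layout, and the port names are hypothetical values used only to make the example runnable.

```python
def search(tables, overflow, addr, hash_fn):
    """Return the forwarding pointer of the longest matching prefix, or None.

    tables:   {length: (table1, table2)}; buckets are lists of (prefix, fwd_ptr)
    overflow: list of (prefix, length, fwd_ptr) standing in for the TCAM
    hash_fn:  hash_fn(prefix, length) -> (index1, index2)
    """
    matches = {}
    for length, (table1, table2) in tables.items():   # parallel in hardware
        key = addr >> (32 - length)
        h1, h2 = hash_fn(key, length)
        for bucket in (table1[h1], table2[h2]):
            for prefix, fwd_ptr in bucket:
                if prefix == key:
                    matches[length] = fwd_ptr
    for prefix, length, fwd_ptr in overflow:           # TCAM, also in parallel
        if addr >> (32 - length) == prefix:
            matches[length] = fwd_ptr
    if not matches:
        return None
    return matches[max(matches)]       # priority encoder: longest length wins


def toy_hash(prefix, length):
    """Hypothetical 2-bit indices; not the CRC hashing of the proposed scheme."""
    return prefix % 4, (prefix >> 2) % 4


tables = {
    8: ([[] for _ in range(4)], [[] for _ in range(4)]),
    16: ([[] for _ in range(4)], [[] for _ in range(4)]),
}
tables[8][0][toy_hash(0xC0, 8)[0]].append((0xC0, "port1"))
tables[16][1][toy_hash(0xC0A8, 16)[1]].append((0xC0A8, "port2"))
overflow = [(0xC0A801, 24, "port3")]

print(search(tables, overflow, 0xC0A80107, toy_hash))  # port3 (24-bit match in overflow)
print(search(tables, overflow, 0xC0A80207, toy_hash))  # port2 (16-bit match)
```

Every prefix length contributes at most one candidate, so selecting the result reduces to picking the candidate with the largest length.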
Let D[31:31-L+1] be L bits of destination address D.
D[31:31-L+1] is serially entered into the CRC hash function
At cycle L (for L = 8-32),
    Extract H1(L), H2(L) from CRC registers
Do Parallel (L = 8-32)
    table1_ptr = H1(L)
    table2_ptr = H2(L)
    If (D[31:31-L+1] = prefix(table1_ptr)) Then fwd_ptr = fwd_ptr(table1_ptr)
    Else if (D[31:31-L+1] = prefix(table2_ptr)) Then fwd_ptr = fwd_ptr(table2_ptr)
End Do Parallel
Search from overflow CAM
Determine LPM among matching entries

Figure 7 Searching Algorithm

Figure 8 Search procedure of forwarding table

F. Update and Expansion to IPv6
Routing table update in the proposed scheme is incremental. The update process is the same as the building process. A newly added prefix is placed in the bucket with fewer loads. If both buckets are full, the newly added prefix is stored in the overflow TCAM. Prefix deletion is also incremental. After the prefix is deleted from the bucket indicated by the hash index, the loads of the bucket are rearranged so that no invalid entry remains between valid entries, and the number of items is reduced by one. If the bucket has no matching prefix, the prefix is searched for in the overflow table and deleted. No long computation is required to build a new table, and hence fast update is achieved. Too many memories may be required for IPv6 since the proposed scheme uses a separate memory for each prefix length; this problem is expected to be solved by prefix grouping. The grouping solution is not included in this paper due to the page limitation.

V. SIMULATION RESULT AND COMPARISON
We have performed address lookup simulation for our proposed scheme using data from a snapshot of MAE-WEST (2002/03/15), which has 29584 prefixes. Figure 9 shows the prefix distribution of MAE-WEST. In order to find the memory efficiency rate and the overflow rate, we have run several test cases based on the given prefix distribution. The memory efficiency rate represents the fraction of the entire table entries that are filled with prefixes. Table 1 describes the test cases and the results according to the number of buckets and the number of hash functions. Case 1 uses a single hash function. For each prefix length (except prefix length 31), a single table is used, and hence a total of 24 tables and an overflow TCAM are used in the simulation. When N items are hashed into N/2 buckets which have 4 entries per bucket, about 203 Kbytes of memory is required, and the overflow rate is 3.4%. When two hash functions are used
(Case 2) under the same condition, in which 48 tables and an overflow TCAM are used, the overflow rate is significantly reduced to 0.52% (154 entries). The reason is that prefixes are more evenly distributed by using multiple hash functions. Case 3 uses two hash functions and 3 entries per bucket instead of 2, and the overflow is completely removed. Case 4 uses N/4 buckets with three hash functions and results in a high memory efficiency rate but more overflows. This case consumes 152 Kbytes of memory, and 136 overflows occur. Case 4 can be chosen for optimum memory usage. The test cases show that multiple hash functions improve hashing performance, and that memory efficiency is traded off against the overflow rate.

Figure 9 Prefix distribution of the MAE-WEST router

TABLE 1
Case  Number    Number      Number   Entries/  Memory  Memory      Overflow
      of Items  of Buckets  of Hash  Bucket    Size    Efficiency  Rate
1     N         N/2         1        4         203KB   49.85%      3.4%
2     N         N/2         2        2         203KB   49.85%      0.52%
3     N         N/2         2        3         303KB   33.41%      0%
4     N         N/4         3        2         152KB   66.3%       0.46%

We compare our proposed scheme with existing schemes in Table 2. The proposed scheme deserves close attention in terms of memory size and the number of memory accesses.

TABLE 2 COMPARISON WITH EXISTING SCHEMES
Address Lookup Scheme   Number of Memory Accesses   Forwarding Table Size
                        (Minimum, Maximum)
Huang's scheme [8]      1, 3                        450KB-470KB
DIR-24-8 [7]            1, 2                        33MB
DIR-21-3-8 [7]          1, 3                        9MB
SFT [13]                2, 9                        150KB-160KB
Parallel hashing [9]    1, 5                        189KB
Proposed Architecture   1, 1                        203KB + 154-entry CAM

VI. CONCLUSION
We have proposed a practical and efficient hardware structure for IP address lookup. The essence of the proposed structure is to apply parallelism to multiple hashing. Prefixes are classified according to prefix length, and tables are separately constructed for each prefix length. Therefore, searches in each prefix length are performed in parallel. This also makes it possible to apply hashing to IP address lookup. The proposed scheme applies multiple hashing to improve hashing performance. The proposed scheme requires just one memory access to find the longest prefix match using a total of 203 Kbytes of memory and a small TCAM. It also has excellent characteristics in routing table update and in scalability to IPv6. Our proposed structure can be easily implemented in VLSI since it has a regular and modular structure.

REFERENCES
[1] A. Broder and M. Mitzenmacher, Using Multiple Hash Functions to Improve IP Lookups, IEEE INFOCOM, pp. 1454-1463, 2001.
[2] A. McAuley and P. Francis, Fast routing lookup using CAMs, in Proc. IEEE INFOCOM, 1993, pp. 1382-1391.
[3] M. A. Ruiz-Sanchez, E. W. Biersack and W. Dabbous, Survey and Taxonomy of IP Address Lookup Algorithms, IEEE Network, pp. 8-23, March/April 2001.
[4] V. Srinivasan and G. Varghese, Fast address lookups using controlled prefix expansion, in Proc. ACM Sigmetrics98 Conf., Madison, WI, pp. 1-11.
[5] W. N. Eatherton, Hardware-based Internet protocol prefix lookups, M.S. thesis, Washington Univ., St. Louis, MO, 1998. [Online]. Available at .
[6] David E. Taylor, Jonathan S. Turner, John W. Lockwood, Todd S. Sproull and David B. Parlour, Scalable IP Lookup for Internet routers, IEEE Journal on Selected Areas in Communications, Vol. 21, No. 4, pp. 522-533, May 2003.
[7] N. McKeown, P. Gupta and S. Lin, Routing lookups in hardware at memory access speeds, in Proc. IEEE INFOCOM98 Conf., pp. 1240-1247.
[8] Nen-Fu Huang and Shi-Ming Zhao, A Novel IP Routing Lookup Scheme and Hardware Architecture for Multigigabit Switching Routers, IEEE Journal on Selected Areas in Communications, Vol. 17, No. 6, pp. 1093-1104, June 1999.
[9] Hyesook Lim, Ji-Hyun Seo and Yeo-Jin Jung, High Speed IP Address Lookup Architecture Using Hashing, IEEE Communications Letters, Vol. 7, No. 10, pp. 502-504, Oct. 2003.
[10] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, Scalable high speed IP routing lookups, in Proc. ACM SIGCOMM97 Conf., Cannes, France, pp. 25-35.
[11] Rich Seifert, The Switch Book, Wiley, 2000.
[12] Raj Jain, Comparison of Hashing Schemes for Address Lookup in Computer Networks, IEEE Transactions on Communications, Vol. 40, No. 10, pp. 1570-1573, Oct. 1992.
[13] M. Degermark, A. Brodnik, S. Carlsson, S. Pink, Small Forwarding Tables for Fast Routing Lookups, Proc. ACM SIGCOMM, pp. 3-14, 1997.