Rajeev Jayaraman
Xilinx Inc. 2100 Logic Drive San Jose, CA 95131, USA rajeev.jayaraman@xilinx.com
ABSTRACT
FPGAs have been growing at a rapid rate in the past few years. Their ever-increasing gate densities and performance capabilities are making them very popular in the design of digital systems. In this paper we discuss the state-of-the-art in FPGA physical design. Compared to physical design in traditional ASICs, FPGAs pose a different set of requirements and challenges. Consequently the algorithms in FPGA physical design have evolved differently from their ASIC counterparts. Apart from allowing FPGA users to implement their designs on FPGAs, FPGA physical design is also used extensively in developing and evaluating new FPGA architectures. Finally, the future of FPGA physical design is discussed along with how it is interacting with the latest FPGA technologies.
Keywords
FPGA, Physical design, Placement, Routing.

1. INTRODUCTION
Field Programmable Gate Arrays (FPGAs) have revolutionized digital system design in the past 15 years. Their programmability and fast time-to-market have made them very popular with digital system designers. About 5 years ago, FPGAs were being used primarily as glue logic in a system. Now, with the arrival of multi-million gate FPGAs and the availability of a variety of system-level features on them, FPGAs are being used to design complete systems. FPGAs are used in systems for a variety of different reasons. Their use can be classified into four broad categories [10]. They are: Production use: In this category, FPGAs are an integral part of the system in production. Further, due to low volume requirements or rapidly changing market conditions, there is no migration plan to ASICs. Since they are part of production systems, the performance requirements for FPGAs may be very high. Pre-production use: This category of FPGA use is very similar to production use in all respects except one: the use of FPGAs is temporary and only until an equivalent ASIC is deployed. Typically, FPGAs in pre-production use indicate a very tight time-to-market requirement that cannot be met by ASICs. As with FPGAs in production use, the performance requirements may be very high. Prototyping: FPGAs in this category are used primarily to prototype a system. The volume requirements are fairly small and the performance requirements may not be stringent. Emulation: Emulation is an effective way of functionally debugging a system, and FPGAs are sometimes used to emulate complete systems. The volume requirements are very small and the performance requirements are not critical. Figure 1 shows the relative usage of FPGAs in these four categories.
Figure 1. FPGA Use

As can be seen in the figure, production and pre-production systems comprise the overwhelming majority of FPGA use: a far cry from the days when FPGAs were primarily used for prototyping and emulation. This directly implies that time-to-market and high FPGA performance requirements are crucial determinants of FPGA software. The rapid growth and adoption of FPGAs in digital systems can be traced to three main factors: the business climate, FPGA device features and density, and FPGA software. Business climate: The business factors that have contributed to the success of FPGAs are reduced time-to-market and lower lifecycle costs. With respect to time-to-market, FPGAs have proved to be very valuable in reducing the system design cycle. Additionally, their re-programmability means that late feature requirements or bugs caught late in the design cycle are easier and less expensive to fix than in ASICs. The re-programmability of FPGAs is also important in lowering the overall lifecycle costs of the system, since new features and modifications can be implemented in systems that have already been deployed in the field. FPGA device features and density: Another reason for the popularity of FPGAs among system designers is the addition of several system-level features on FPGAs. Not long ago, FPGAs consisted primarily of configurable logic elements and routing.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISPD'01, April 1-4, 2001, Sonoma, California, USA. Copyright 2001 ACM 1-58113-347-2/01/0004 $5.00.
However, in the last 5 years, vendors have started implementing a wide variety of system-level features on their FPGAs, such as embedded block RAMs; multiple system clocks with associated clock management circuitry; I/Os that can be configured according to several I/O standards; and embedded processors. Along with the addition of system-level features, the gate densities of FPGAs have also grown by orders of magnitude in the past 5 years. For example, in 1996, the largest FPGA offered by Xilinx, the XC4025, consisted of 25000 equivalent user-programmable gates. Today, Xilinx offers the XC2V6000 that can implement an equivalent of 6 million user-programmable gates. The combination of system-level features and the large number of user-programmable gates allows FPGAs to implement complete systems on a chip. FPGA software: Another important reason for the popularity of FPGAs is the FPGA design software. By FPGA software we mean the software that is provided by FPGA and FPGA-CAD vendors to FPGA users to implement their designs on an FPGA. Current FPGA software tools give the user the sophistication and capability to start with behavioral and RTL-level descriptions and compile multi-million gate systems in a matter of a few hours. Such fast compile times, along with the software's ease-of-use, have shortened the FPGA design cycle and have fueled the rapid adoption of FPGAs in systems. In this paper we will restrict our attention to FPGA physical design software that is used in implementing the user's design on the FPGA. It is important to clarify that we will not be discussing the physical design software used for the actual layout of the FPGA silicon by the FPGA vendors. Instead, we will be discussing the software used in the implementation of the user's design on the configurable logic and routing elements of the FPGA. Most FPGA vendors rely on third-party EDA vendors to provide synthesis and schematic-based design entry mechanisms.
On the other hand, FPGA vendors are typically the primary source for the physical implementation tools such as placement, routing, and configuration programming. The primary reason for this is that the physical implementation software is very closely tied to the FPGA architecture. In fact it is developed simultaneously with new FPGA architectures. The algorithms for these FPGA physical implementation tools started out as algorithmic modifications to classical ASIC physical design algorithms. However, over time, they have evolved in subtle but different ways from classical ASIC physical design algorithms. This paper will discuss the state-of-the-art in physical design algorithms for FPGAs and contrast them with those algorithms for ASICs. To understand physical design for FPGAs, it is instructive to understand the requirements that drive FPGAs. In the next section, we will discuss the requirements of FPGA physical design software and contrast it with ASIC physical design software. In Section 3 the typical FPGA design flow is described. We define the placement and routing problems for FPGAs along with a discussion of some basic algorithms for them. We conclude with some thoughts on the future of FPGA physical design software.
are manufactured by the vendors), FPGAs have not only become the technology leaders but have actually become the drivers for the latest process advances in semiconductor fabrication facilities. Inherent in these state-of-the-art processes is the set of challenges referred to as deep sub-micron (DSM) effects. With ASICs, the designer has to account for all the deep sub-micron effects in the design. Consequently, ASIC software must provide users with tools to address these deep sub-micron challenges. In the case of FPGAs, however, the FPGA vendors design their FPGAs such that the end user does not have to directly account for many of the deep sub-micron effects. Of course, a result of this design is that some FPGA performance may be sacrificed. However, not having to account for some of these deep sub-micron effects simplifies not only the design cycle for FPGA users but also the development of FPGA software. Currently, FPGA software does not concern itself with some deep sub-micron effects, such as cross-talk and signal integrity, to as great an extent as ASIC software. For example, even at 0.13 µm, FPGA software does not have to contend seriously with these deep sub-micron effects. Of course, as geometries get smaller, FPGA software may have to start accounting for these DSM effects, since the FPGA architecture itself may not be able to completely shield the user from them. While FPGA users do not have to concern themselves with most DSM effects, some, such as the dominance of routing delays over logic delays, are effects that FPGAs have had to deal with since the very first devices. In fact, the dominance of routing delays over logic delays in FPGAs is not a result of sub-micron geometries but rather of the FPGA architecture. The reason for this is that a typical FPGA connection consists of a combination of metal segments and one or more programmable interconnect points (PIPs) that are usually implemented as pass gates.
These pass gates make the routing delay dominate the logic delay in FPGAs. This dominance of routing delay has influenced architecture decisions of FPGAs. While it is not possible to reduce the routing delay beyond a certain amount, most FPGA architectures attempt to at least make the routing delays highly predictable leading to physical design algorithms that thoroughly exploit this characteristic.
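The dominance of pass-gate delay over logic delay can be illustrated with a first-order Elmore delay estimate of a route that alternates metal segments and PIPs. The R and C values below are purely illustrative assumptions, not figures from the paper or any vendor's data:

```python
# Illustrative Elmore-delay estimate of an FPGA connection built from
# metal segments and pass-gate PIPs. All R/C numbers are assumed for
# illustration only; real values depend on the process and architecture.

def elmore_delay(stages):
    """stages: list of (R_ohms, C_farads) pairs from driver to load.
    Each upstream resistance charges all downstream capacitance."""
    delay = 0.0
    for i, (r, _) in enumerate(stages):
        downstream_c = sum(c for _, c in stages[i:])
        delay += r * downstream_c
    return delay

WIRE = (50.0, 20e-15)    # one metal segment: 50 ohm, 20 fF (assumed)
PIP = (1000.0, 5e-15)    # one pass-gate PIP: 1 kohm, 5 fF (assumed)

def route_delay(n_pips):
    """A route crossing n PIPs alternates wire and PIP stages."""
    stages = []
    for _ in range(n_pips):
        stages += [WIRE, PIP]
    stages.append(WIRE)
    return elmore_delay(stages)

for n in (1, 3, 6):
    print(f"{n} PIPs: {route_delay(n) * 1e12:.0f} ps")
```

Because each pass gate adds series resistance that must charge all downstream capacitance, the estimated delay grows faster than linearly with the number of PIPs crossed, which is why routing delay dominates and why architectures aim to make the number of PIPs on a path predictable.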
Another factor that forces FPGA physical design software to be simple and to require little support is the economics of FPGA software. Given the relatively low cost of FPGA software compared to ASIC design software, support costs account for a large fraction of the overall cost. This imposes the requirement that FPGA software need as little support as possible, which manifests itself as a tendency to hide much of the tool and algorithm complexity from the user.
Similarly, the routing of the logic elements implies that specific routing resources are configured in order to achieve the required connections.
Figure 2. FPGA Design Flow
Figure 3. Traditional FPGA Architecture

The design entry phase is identical to that of the ASIC design flow. Considering the high gate densities of contemporary designs, hardware description languages such as Verilog and VHDL are the current methods of choice for design entry. The design verification phase is also similar to the ASIC design flow. Design verification in the form of formal verification or functional simulation can be done directly on the design entry. On the other hand, verification such as back-annotation and timing simulation can also be performed on the implemented design. The part of the FPGA flow that we will concern ourselves with in this paper is the design implementation phase. The design implementation phase can be divided mainly into synthesis, placement, and routing. FPGA synthesis tools have traditionally been developed by synthesis vendors rather than the FPGA vendors. On the other hand, as discussed earlier, the FPGA vendors themselves have been the primary developers of the physical design tools such as placement, routing, and bit-stream generation. In the design implementation phase, the first task is synthesis and technology mapping. An input HDL description is synthesized and mapped into logic elements such as lookup tables (LUTs), flip-flops, and I/O blocks that are the basic building blocks of the target FPGA architecture. The resulting netlist consists of these logic elements connected together to implement the user design. This netlist is used as the input to the placement tool, which places these elements on the FPGA sites that implement them. After all the logic elements are placed appropriately on sites on the FPGA, they are connected together by the routing tool. Once placement and routing are completed, the FPGA is configured to implement the design.
The placement of the logic elements in the design net-list on the logic element sites on the FPGA dictates the configuration of those logic element sites. As shown in the figure, an FPGA consists of a 2-dimensional array of logic blocks. These logic blocks can be further decomposed into a hierarchical collection of different logic elements such as LUTs, flip-flops, and muxes. The figure shows each logic block consisting of 4 sub-blocks, where each sub-block consists of a 4-input LUT and a flip-flop. There is a local routing network within the logic block that provides very fast and almost complete connectivity among all the logic elements within the logic block. On the periphery of the FPGA are the programmable I/O blocks through which the FPGA connects to the external world. Connecting these logic and I/O blocks is a mesh of uncommitted routing resources that can be programmed to achieve different connections. The routing fabric is represented as a set of routing resources and a set of switch and connection blocks. Connection blocks connect the routing resources to the pins of the logic block, while the switch blocks connect the different routing resources that are incident to them. Typically, there is a hierarchy of routing resources: some resources can connect to switch and connection blocks in adjacent tiles, others can connect to blocks that are a specific distance apart, and some can connect all the blocks in the same row or column of the FPGA. While FPGA architectures differ in the kinds of logic elements, the number of logic elements, the amount of routing resources, and the routing fabric, they can be abstracted down to the model shown in Fig. 3.
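The abstracted island-style model described above can be sketched as a small set of data structures. The class names, fields, and dimensions below are illustrative assumptions, not any vendor's API; switch and connection blocks are omitted for brevity:

```python
# A minimal data model of the island-style FPGA abstraction: a 2-D
# array of logic blocks, each containing sub-blocks of a 4-input LUT
# plus a flip-flop. Names and fields are illustrative only.
from dataclasses import dataclass, field

@dataclass
class SubBlock:
    lut_inputs: int = 4        # a 4-input LUT
    has_flipflop: bool = True  # paired flip-flop for fast registering

@dataclass
class LogicBlock:
    row: int
    col: int
    # four sub-blocks per logic block, as in the figure
    sub_blocks: list = field(default_factory=lambda: [SubBlock() for _ in range(4)])

@dataclass
class Fpga:
    rows: int
    cols: int
    def __post_init__(self):
        # I/O blocks on the periphery and switch/connection blocks
        # between tiles would complete the model (omitted here).
        self.clbs = [[LogicBlock(r, c) for c in range(self.cols)]
                     for r in range(self.rows)]

chip = Fpga(rows=16, cols=24)              # e.g. an XCV50E-sized array
print(len(chip.clbs) * len(chip.clbs[0]))  # 384 logic blocks
```

A placer or router would operate over such a structure, extended with the switch-block and connection-block adjacency that defines the routing fabric.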
3.2 Placement
As discussed earlier, synthesis creates a netlist consisting of a set of logic elements and a set of connections between them. Since the synthesis step involves technology mapping, these logic elements can be directly mapped to the logic element resources on the FPGA device. Given a list of logic elements connected to each other by nets, the placement problem can then be defined as placing these logic elements on the available logic element sites on the FPGA such that the connections between them, as specified
by the nets, can be routed completely using the available routing fabric of the FPGA. As one can see, it is almost identical to the classical ASIC placement problem. Let us now take a detailed look at the objective functions, constraints, and the algorithms for FPGA placement.
either required to run quickly and produce a reasonable result, or to produce superior results with longer runtimes. Typically, the logic elements of FPGAs are arranged in a hierarchical fashion. For example, in the Xilinx Virtex family of FPGAs, a combination of 2 lookup tables (LUTs) and 2 flip-flops is referred to as a slice. Two such slices make up a configurable logic block (CLB), and the entire FPGA consists of a 2-dimensional array of CLBs. Logic elements at different levels in the hierarchy have different connectivity and configuration specifications. For example, the reason that LUTs and flip-flops are grouped together in a slice is to allow for efficient registering of combinational logic functions. Additionally, the control signals to the LUTs and flip-flops in the same slice may be shared and their configuration may be constrained. This means that not all combinations of LUTs and flip-flops can be placed in a single slice. An important decision that must be made for FPGA placement is its unit of placement. If the unit of placement is too fine, e.g., LUTs and flip-flops, placement will have to deal with a very large number of movable objects, which in turn has a deleterious effect on runtime. Conversely, a coarser unit of placement such as a CLB may not give enough flexibility to attain good results. Table 1 shows the number of LUTs/flip-flops, slices, and CLBs in the largest and smallest family members of the Virtex-E and Virtex-II. A couple of things can be noted from this table: the largest FPGA (Virtex-II XC2V10000) has close to a quarter of a million LUTs and flip-flops, and the largest FPGA in each family is at least an order of magnitude larger than the smallest.

Table 1. Placement Entities in FPGAs
Device                 Array    LUTs/FFs  Slices  CLBs
XCV50E (Virtex-E)      16x24       3072     768    384
XCV3200E (Virtex-E)    104x156   129792   32448  16224
XC2V40 (Virtex-II)     8x8         1024     256     64
XC2V10000 (Virtex-II)  128x120   245760   61440  15360
The logic element hierarchy is used extensively in the choice of the placement algorithm. For example, min-cut algorithms can be used very effectively for placement of hierarchical FPGAs [7]. By using clustering and changing the unit of placement, min-cut algorithms can yield very good placement. Additionally, even simulated annealing algorithms can employ hierarchical clustering techniques to make use of the inherent hierarchy of logic resources. While basic FPGA placement can be tackled with the standard set of ASIC placement algorithms, there are some FPGA specific constraints that require variations of these standard algorithms. For example, the presence of architectural constraints that restrict how I/Os can be configured along banks on a side of the Virtex-E FPGA presents a placement problem that is solved using modifications to a basic simulated annealing algorithm [6].
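As a concrete illustration of the simulated-annealing approach mentioned above, the following toy placer swaps logic blocks between sites and accepts moves by the Metropolis criterion on a half-perimeter wirelength (HPWL) cost. The netlist, array size, and cooling schedule are all hypothetical; a real placer would add timing-driven costs and the hierarchical clustering and legality constraints discussed in the text:

```python
# Toy simulated-annealing placer: blocks are swapped between sites and
# moves are accepted by the Metropolis criterion on an HPWL cost.
import math
import random

def hpwl(placement, nets):
    """placement: block -> (x, y); nets: lists of block names.
    Half-perimeter wirelength: bounding-box width + height per net."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(blocks, sites, nets, t0=10.0, alpha=0.95, moves=200, seed=1):
    rng = random.Random(seed)
    order = list(sites)
    rng.shuffle(order)
    place = dict(zip(blocks, order))      # initial random placement
    cost, t = hpwl(place, nets), t0
    while t > 0.01:
        for _ in range(moves):
            a, b = rng.sample(blocks, 2)  # propose swapping two blocks
            place[a], place[b] = place[b], place[a]
            new = hpwl(place, nets)
            if new <= cost or rng.random() < math.exp((cost - new) / t):
                cost = new                # accept (always if not worse)
            else:
                place[a], place[b] = place[b], place[a]  # undo
        t *= alpha                        # cool down
    return place, cost

blocks = [f"b{i}" for i in range(9)]
sites = [(x, y) for x in range(3) for y in range(3)]   # a 3x3 array
nets = [["b0", "b1"], ["b1", "b2", "b3"], ["b4", "b5"], ["b6", "b7", "b8"]]
_, final = anneal(blocks, sites, nets)
print("final HPWL:", final)
```

Accepting some uphill moves at high temperature lets the placer escape local minima; lowering the temperature makes the search increasingly greedy, which is also what allows the runtime/quality trade-off described earlier to be tuned via the schedule.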
3.3 Routing
The FPGA routing problem can be defined as the problem of choosing specific FPGA routing resources to achieve the connections specified in the net-list while meeting the user's timing constraints. The typical model for FPGA routing is illustrated in the following figure.
Figure 4. FPGA Routing Model

The FPGA is represented as a connectivity graph in which the nodes are the routing segments and the edges are the programmable interconnect points (PIPs). In the example shown, there are 5 PIPs corresponding to 5 possible edges, and 5 nodes corresponding to the routing segments. This is the underlying connectivity graph. A net that connects L1, L2, and L3 can be routed by programming the PIPs C, D, and E. The associated routing graph for this route is shown in the figure as the dark edges on the underlying connectivity graph. Some of the programmable interconnect points in the FPGA are pass transistors while others are buffered switches. Since the positions of these buffered PIPs are pre-determined by the architecture, FPGA physical design algorithms cannot insert buffers where required; instead, they have to use buffers judiciously where they are available in the FPGA routing fabric. This means that normal ASIC buffer insertion methods don't apply to FPGAs. The size of the routing connectivity graph can be extremely large. Figure 5 shows the total number of nodes and arcs vs. the number of LUTs in the Virtex-II series of FPGAs. Note that the largest Virtex-II FPGA, the XC2V10000, contains close to 60 million arcs and 6 million nodes in its routing connectivity graph. A design that utilizes a high percentage of an FPGA may use as many as 25% of these arcs and nodes. Such large graphs impose serious restrictions on how the connectivity graph can be manipulated by the routing algorithm.

[Plot: number of nodes and arcs (in millions) in the routing connectivity graph vs. number of LUTs, for Virtex-II devices up to 150000 LUTs.]
Figure 5. Size of the Routing Connectivity Graph in Virtex-II

This representation of the routing graph in FPGAs gives rise to an important distinction between ASIC and FPGA routing algorithms. In ASIC routing, a route is expressed in terms of the underlying rectilinear grid, sometimes referred to as the Hanan grid. As we have seen, an FPGA does not have a rectilinear grid, and the FPGA routing problem therefore becomes a problem of embedding the net-list onto the connectivity graph without using the same node twice. This implies that the standard rectilinear-grid-based routing algorithms must be modified to handle generalized graphs. For example, finding a Steiner point in ASICs implies a rectilinear Steiner point, while on an FPGA it implies finding a Steiner point on a general graph. Routing for ASICs is performed using a two-phase approach: a global routing phase precedes the detailed routing phase. The global routing phase abstracts the details of the routing problem into regions and performs coarse routing on these regions. This is followed by the detailed routing phase, which uses the results of the coarse routing and performs detailed routing within the regions. The key assumption that allows for this two-phase approach is that the regions (typically rectangular channels) are a good abstraction of the underlying routing problem. With FPGAs, this assumption is not always valid. This leads to the possibility that the coarse routing determined by global routing may not be accurately refined into the underlying detailed routing. Consequently, the two-phase approach of global followed by detailed routing does not apply to all FPGA architectures. In FPGA routing, instead of a global routing phase, there is often a phase referred to as global resource assignment. Some routing resources in the FPGA fabric are designed for special kinds of nets. For example, FPGAs may have special routing structures that allow for low skew and delay when high-fanout nets are routed on them. The global resource assignment phase attempts to recognize nets that ought to use global routing resources on the FPGA and assigns them optimally to these global resources. Typically, this phase is followed by a single detailed routing phase. However, some of the common approaches to detailed routing in ASICs are either not applicable at all (channel routing) or are not very suitable (maze routing). Since the concept of a channel is not explicit in the FPGA, channel routing algorithms are not employed in FPGA routing. While maze routing algorithms are applicable to FPGA routing, they can be inherently slow. Another issue with maze routing is that it does not consider the side effects on other connections, i.e., it is net-ordering dependent. Detailed routing algorithms in FPGAs are based on modifications to basic maze routing algorithms. Maze routing and wavefront expansion techniques are employed on the connectivity graph. Different heuristics, such as future-cost computation and partial wavefront expansion, are employed to speed up the basic maze routing algorithm. However, one of the main drawbacks of the maze routing algorithm is that it depends on the order in which nets are routed: a net routed first does not consider the effect of its routing on the routing of subsequent nets. This problem is exacerbated in FPGAs due to the relatively scarce routing resources. One very popular FPGA routing algorithm that minimizes the negative effects of the net-ordering problem is the PathFinder [2] algorithm. It is very well suited to FPGAs since it adapts very well to the FPGA connectivity graph. In this algorithm, individual connections are routed to minimum cost on the FPGA connectivity graph; once a connection is routed, the route is recorded and the connection is then ripped out. This procedure is repeated for the next connection and so on. In a single iteration of this algorithm, each connection is routed in this fashion as if it were the only connection to be routed. In effect, this is equivalent to routing every connection in the absence of any existing routes or obstacles. After all the nets are routed once, i.e., after a single iteration, the demand for every resource on the FPGA is computed. The demand for a resource is the number of nets that used that resource to complete a route. A demand of 1 on a routing resource implies that only one net required the use of that resource to complete its route; in effect, there is no conflict for the use of the resource. However, a demand greater than 1 implies a routing conflict, i.e., more than 1 net requires the use of the resource for a minimum cost route. In subsequent iterations, the cost of a resource that has high demand is raised and the entire process of routing connections individually is repeated. Raising the cost of resources that have heavy demand ensures that some of the nets that used the resource in previous iterations will complete their routes using less expensive nodes. The iterations continue, with the costs of the resources having heavy demand getting progressively higher. Routing is complete when the demand for every resource is no greater than one. This algorithm addresses the slow speed of maze routing, since every connection is routed in an obstacle-free environment. Additionally, since every net is routed several times to account for heavily used resources, it avoids the net-ordering problem inherent in maze routing. The presence of the connectivity graph with a finite number of nodes and edges has given rise to some new formulations of the FPGA routing problem. Recent approaches ([4],[5]) attempt to formulate the FPGA routing problem using Boolean satisfiability (SAT). In this approach, variables in the SAT problem instance correspond to the allocation of specific resources to nets. Routing is completed when a set of values can be assigned to the variables causing the SAT instance to evaluate to TRUE.
While this approach is not used in practice yet due to runtime and memory concerns, it does, however, present a unique perspective to the FPGA routing problem.
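The negotiated-congestion idea behind PathFinder can be sketched compactly: nodes of the connectivity graph accumulate a history cost whenever their demand exceeds one, and every net is re-routed in each iteration by a shortest-path search under the current costs. The graph, cost formula, and schedule below are simplified assumptions for illustration, not the published algorithm's exact cost function:

```python
# Sketch of PathFinder-style negotiated-congestion routing on a small
# connectivity graph (nodes = wire segments, edges = PIPs).
import heapq

def shortest_route(graph, cost, src, dst):
    """Dijkstra over the connectivity graph using current node costs."""
    dist, prev, seen = {src: cost[src]}, {}, set()
    heap = [(cost[src], src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v in graph[u]:
            nd = d + cost[v]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, u = [dst], dst
    while u != src:
        u = prev[u]
        path.append(u)
    return path[::-1]

def pathfinder(graph, nets, max_iters=20):
    base = {n: 1.0 for n in graph}   # intrinsic node cost
    hist = {n: 0.0 for n in graph}   # history cost (congestion memory)
    for _ in range(max_iters):
        use = {n: 0 for n in graph}
        routes = {}
        for name, (s, t) in nets.items():
            # each net is routed as if alone, at the current costs
            cost = {n: base[n] * (1 + hist[n]) * (1 + use[n]) for n in graph}
            r = shortest_route(graph, cost, s, t)
            routes[name] = r
            for n in r:
                use[n] += 1
        shared = [n for n in graph if use[n] > 1]
        if not shared:               # demand <= 1 everywhere: done
            return routes
        for n in shared:             # negotiate: congested nodes get dearer
            hist[n] += 1.0
    return None                      # failed to converge

# Two nets whose cheapest paths both cross segment "m"; each net also
# has a slightly longer private detour (x1-x2 and y1-y2).
graph = {"a": ["m", "x1"], "x1": ["a", "x2"], "x2": ["x1", "c"],
         "c": ["m", "x2"], "b": ["m", "y1"], "y1": ["b", "y2"],
         "y2": ["y1", "d"], "d": ["m", "y2"], "m": ["a", "b", "c", "d"]}
nets = {"n1": ("a", "c"), "n2": ("b", "d")}
routes = pathfinder(graph, nets)
print(routes)
```

Both nets initially prefer the shared segment; the rising history cost then prices one of them onto its detour, resolving the conflict without any fixed net ordering.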
and route. The placed and routed design is then evaluated for feedback to another pass of synthesis. This feedback is used to modify the re-synthesis process to account for the routing delays.
3.5 Results
Table 2 illustrates a sampling of results of the Xilinx physical design software (PAR) for a range of Virtex-E devices, run on a Sun UltraSPARC-II. The runtimes are measured when the software is able to achieve the user-defined frequency requirement. For the smallest device, the place and route runtimes are each less than a minute. On the other hand, for the XCV2000E, which has 38400 LUTs, the total runtime is under an hour.

Table 2. Results of Xilinx PAR
Design     Slices        Nets    Placement  Routing   Req. Freq. (MHz)  Act. Freq. (MHz)
XCV50E     766 (99%)     1615    00:00:45   00:00:50   6.67               8
XCV1000E   8635 (70%)    17468   00:09:59   00:08:02   68                 72
XCV2000E   14037 (90%)   27639   00:22:05   00:22:27   47                 48
It must be pointed out that while the XCV50E design (which is a glue logic design) has a small frequency requirement, it has a large number of logic levels (69). The runtimes, of course, increase as the frequency requirement becomes more stringent or as the design gets more congested.
4. Conclusions
FPGAs have evolved into very popular vehicles for designing systems. The increasing numbers of system-level features and user-programmable gates have contributed to their popularity. There are some fundamental differences between the ASIC and FPGA design flows. Consequently, FPGA physical design software has evolved differently from ASIC physical design software, though their roots are common. The FPGA design methodology and requirements have resulted in a different set of goals and objectives for FPGA physical design software. Runtime is of supreme importance due to the fast time-to-market value proposition of FPGAs. At the same time, in some situations fast FPGA circuit speeds may become the primary goal of FPGA physical design software at the expense of runtime. This has resulted in the use of algorithms that can be tuned for speed and performance. Due to the design of the FPGAs, deep sub-micron issues such as signal integrity and cross-talk do not have to be considered by FPGA physical design software. However, another deep sub-micron issue, the dominance and unpredictability of routing delays, has been an FPGA characteristic since the invention of FPGAs. FPGA vendors have been addressing this issue through innovative FPGA architectures that improve routing predictability. FPGA placement algorithms are very similar to classical ASIC placement algorithms. However, they use routing delay estimation strategies that are unique to FPGAs. On the other hand, the FPGA routing problem is very different from that of ASICs due to the underlying representation of the FPGA routing graph. Traditional maze routing algorithms are not very well suited to FPGA routing, but variations on the basic maze routing algorithm, such as PathFinder, have proven very popular.
5. REFERENCES
[1] V. Betz and J. Rose, "VPR: A New Packing, Placement and Routing Tool for FPGA Research," Proc. 7th Intl. Workshop on Field-Programmable Logic and Applications, 1997.
[2] L. E. McMurchie and C. Ebeling, "PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs," Proc. ACM/IEEE Intl. Symp. on FPGAs, 1995.
[3] S. K. Nag and R. A. Rutenbar, "Performance-Driven Simultaneous Placement and Routing for FPGAs," IEEE Trans. on CAD, pp. 499-518, June 1998.
[4] G. Nam, K. A. Sakallah, and R. A. Rutenbar, "Satisfiability-Based Layout Revisited: Detailed Routing of Complex FPGAs Via Search-Based Boolean SAT," Intl. Symp. on FPGAs, 1999.
[5] G. Nam, F. Aloul, K. A. Sakallah, and R. A. Rutenbar, "A Comparative Study of Two Boolean Formulations of FPGA Detailed Routing Constraints," Proc. Intl. Symp. on Physical Design, 2001.
[6] J. Anderson, J. Saunders, S. Nag, C. Madabhushi, and R. Jayaraman, "A Placement Algorithm for FPGA Designs with Multiple I/O Standards," Proc. 10th Intl. Conf. on Field-Programmable Logic and Applications, August 2000.
[7] M. Hutton, K. Adibasmii, and A. Leaver, "Timing-Driven Placement for Hierarchical Programmable Logic Devices," Proc. Intl. Symp. on FPGAs, 2001.
[8] http://www.synplicity.com/products/amplify.html
[9] http://www.xilinx.com/partinfo/databook.htm