
Digital Application-Specific IC Design

Dijkstra's Algorithm
Narwal Ajit S., Graduate Student, NC State University

Abstract: This work presents the design of an ASIC that determines the minimum distance between an origin and destination pair within a given network of paths. The eponymous method was conceived by the Dutch computer scientist Edsger Dijkstra in 1956. It has several applications and is particularly useful in network engineering, where packet routing is critical to speed and efficiency. The work comprises simulation using ModelSim followed by synthesis of the design using Synopsys tools. The project requirements were met, and the entire set of inputs is processed in 4 ms with a clock period of 13.8 ns.

Summary:
Clock period: 13.8 ns
Number of cycles taken: $2.89997 \times 10^{5}$
Total time taken: 4.00 ms
Area: 3913 µm²

The design achieves an area of 3913 µm² at a clock period of 13.8 ns. The report is divided into sections covering the system micro-architecture, interface specification, technical implementation, verification and results, followed by conclusions summarizing what was learned through the project and, lastly, suggested areas of improvement.

Introduction
The aim of this work is to build a hardware accelerator that finds the shortest path between a pair of nodes in a given weighted network and produces a result set consisting of the path length and all the nodes visited along the way, including the source node. The method used is the well-known Dijkstra's algorithm. The hardware is written in Verilog, with several design and coding steps intended to yield an efficient design:
1) Parallel processing of 4 nodes to ensure higher hardware engagement.
2) Simultaneous execution of the algorithm and writing of results, which expedites program execution.
3) Modular, Lego-like design that works for a network of up to 256 nodes with the present SRAM usage, and is further expandable conditional upon wider memory availability.

Micro-Architecture and Technical Implementation

The design is divided into 4 major modules, plus supporting modules that connect them. Each is described briefly in this section.

1) readinput: This module retrieves source and destination pair data from the input file supplied by the user, referred to as Input_large_HEX.dat (or Input_small_HEX), which contains the source and destination pairs to be analyzed. Once the design is started using start/reset, readinput fetches a pair from the input file and then waits until the present pair has been analyzed and its minimal path found before feeding the next pair into the algorithm. Writing to the output SRAM (sram_1R1W) is carried out while the subsequent pair is fetched, and the same procedure is repeated until the last pair has been input and its results returned.

2) large_graph: This module fetches data from the large or small graph file for the network. The graph file holds the network information, including the nodal connections and the weight associated with each path. The module generates the critical signals and data for the rest of the design: it feeds input to the dj module, signals when all lines corresponding to the working node under analysis have been read, synchronizes the data read from the graph file with the output and algorithm timing, and increments the address so that data is fetched every clock cycle for analysis.

3) dj: At the heart of this implementation is the module dj, which computes the minimum distance between a pair of nodes and accounts for most of the processing. The basic premise is that when a source node arrives, its data is retrieved from the graph and each daughter node is analyzed to determine the next working node. This is done by comparing each node's present weight with the new weight being assigned to it during the present iteration. When a node's new weight is less than its existing weight, the weight is updated; otherwise the existing weight is carried into the next iteration. The working (volatile) memory, referred to as working-sram throughout this work, stores temporary data about the nodes during this procedure: the accumulated nodal weight, the parent, and whether or not the node has been visited (the concept of visited and unvisited nodes is explained in the next paragraph). The working-sram also assists the output module, output_dj, in backtracking through all the nodes on the critical path that must be traversed to reach the destination node. This data is written to the output SRAM, which is the final output of the exercise.

Once a node is selected amongst the connected nodes of a working node, a new next-node is determined by the above procedure. When two nodes carry the same weight, the node with the larger parent weight, amongst the new and pre-existing candidates, is selected as eligible to become the next-node. The selected node is marked as visited and, for the same source/destination pair, will never be visited again (in other words, it can never again be a candidate for next-node). The procedure is repeated for each next node until the destination node is selected as the next-node. This signifies that the shortest path has been determined, and the algorithm proceeds to the next step of writing data to the output SRAM.

4) output_dj: This module writes the results, comprising the final path weight and the nodes on the path from source to destination, followed by a customary FFFF that separates two consecutive sets of data. The data is written to the file output_validation.dat, which is checked against the standard output file.

5) Other modules that connect the above include a) the top module, which combines all the above modules, b) the top_with_mem module, which, as the name suggests, contains the top module along with all the SRAMs, and c) the test module, the test fixture used to obtain simulation waveforms for analysis and a critical debugging tool.
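As a software cross-check of the flow described above, the following Python sketch mirrors what dj and output_dj do for one source/destination pair: relax the neighbours of the working node, never reselect a visited node, then backtrack through the stored parents. It is a reference model under stated assumptions, not the hardware implementation: the dictionary-based graph layout, the function name `shortest_path_record`, and the word order of the record (path weight, then nodes from source to destination, then FFFF) are chosen here for illustration, and ties are resolved by heap order rather than by the parent-based rule of the hardware.

```python
import heapq

TERMINATOR = 0xFFFF  # separator written after each result set


def shortest_path_record(graph, source, destination):
    """Return [path_weight, node, ..., node, 0xFFFF] for one pair.

    graph maps node -> list of (neighbour, weight); this layout is an
    assumption made for the sketch, not the SRAM format of the design.
    """
    dist = {source: 0}
    parent = {}
    visited = set()
    heap = [(0, source)]
    while heap:
        weight_so_far, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)  # a visited node is never selected again
        if node == destination:
            break
        for neighbour, edge_weight in graph.get(node, []):
            new_weight = weight_so_far + edge_weight
            if neighbour not in visited and new_weight < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_weight
                parent[neighbour] = node
                heapq.heappush(heap, (new_weight, neighbour))
    if destination not in visited:
        raise ValueError("destination unreachable")
    # Backtrack through the parents, as output_dj does via working memory.
    path = [destination]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return [dist[destination]] + path + [TERMINATOR]


example_graph = {
    1: [(2, 7), (3, 2)],
    2: [(4, 1)],
    3: [(2, 3), (4, 8)],
    4: [],
}
record = shortest_path_record(example_graph, 1, 4)
print([format(word, "04X") for word in record])
```

A model of this kind can be used to generate expected records for small hand-built graphs before comparing against simulation output.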
| Name | Purpose | Size | Ports |
|---|---|---|---|
| sram_2R (Graph) | Store graph to search | 8K × 128 bits | 2 Read |
| sram_1R (Input) | Input buffer for source, destination pairs | 1024 × 8 bits | 1 Read |
| sram_1R1W (Output) | Output buffer for writing results | 16K × 16 bits | 1 Read + 1 Write |
| sram_1R1W (Working) | Memory to store intermediate results | 8K × 128 bits | 2 Read + 1 Write |
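The memory depths listed above determine the address widths needed on each memory interface. The short Python check below derives them; the dictionary keys are labels chosen here for illustration, not names from the design.

```python
# (depth in words, word width in bits), taken from the memory sizes above
memories = {
    "graph": (8 * 1024, 128),
    "input": (1024, 8),
    "output": (16 * 1024, 16),
    "working": (8 * 1024, 128),
}


def address_bits(depth):
    """Number of address bits needed to index `depth` words."""
    return max(1, (depth - 1).bit_length())


widths = {name: address_bits(depth) for name, (depth, _) in memories.items()}
for name, (depth, width) in memories.items():
    print(f"{name}: {depth} x {width} bits, {widths[name]}-bit address")
```

The 8K memories need 13 address bits, the 1024-entry input buffer needs 10, and the 16K output buffer needs 14.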

Design Structure
The structure and data flow are explained in this section.

Of the 128 bits extracted from working memory, only 21 × 4 = 84 bits are used, since these hold each node's parent, weight and visited/unvisited flag. The data for a single node is thus obtained in parallel from the working memory and from the graph (the present working node's weight is added to the weight from the graph to give the equivalent path weight to that node). The two values are compared to determine which path gives the smaller distance; if they are equal, their corresponding parents are compared, and the outcome decides what data is written back to the working memory SRAM. Before either comparison, the first bit of each node's data from working memory is checked to see whether the node is visited or unvisited: if unvisited, the comparison proceeds with the weights involved; otherwise the weight is substituted by FF so that a visited node cannot be selected again.

Three design choices are worth noting. First, since the cumulative weight cannot exceed 12 bits, 4 nodes can be processed at once by the Dijkstra's algorithm module, even though the working memory has two read ports and one write port; this is because one read port is engaged for writing to the output SRAM. Second, to save area, no nodal information from either the graph or the working memory is stored at any point; this lengthens the critical path but saves space, a trade-off that was considered while designing the architecture. Third, the design is modular, like Lego blocks: the present design handles a network of at most 256 nodes, but this limit can be raised by increasing the hardware width and making minor changes to the code. The code works for both the small and large graph cases provided in the problem statement.
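The per-node decision described above can be sketched in Python for one group of four nodes. This is a behavioural illustration only. The names `Entry`, `relax` and `selection_weight` are hypothetical, the tie-break direction (the larger parent wins) is an assumption, and visited nodes are masked with the largest 12-bit value (0xFFF) where the text speaks of FF.

```python
from typing import NamedTuple, Optional

MAX_WEIGHT = 0xFFF  # largest 12-bit weight; assumed mask value for visited nodes


class Entry(NamedTuple):
    visited: bool
    weight: int  # accumulated weight (12 bits)
    parent: int  # parent node number (8 bits)


def relax(stored: Entry, edge_weight: Optional[int],
          working_weight: int, working_node: int) -> Entry:
    """Return the entry to write back for one node adjacent to the working node."""
    if stored.visited or edge_weight is None:
        return stored  # visited nodes and unconnected nodes are left unchanged
    candidate = min(working_weight + edge_weight, MAX_WEIGHT)
    if candidate < stored.weight:
        return Entry(False, candidate, working_node)
    if candidate == stored.weight and working_node > stored.parent:
        # Equal weights: compare parents (direction is an assumption).
        return Entry(False, candidate, working_node)
    return stored


def selection_weight(entry: Entry) -> int:
    """Weight used when picking the next node; visited nodes are masked out."""
    return MAX_WEIGHT if entry.visited else entry.weight


# Working node 3 has accumulated weight 4; the group holds nodes 4 to 7.
group = [
    Entry(True, 2, 1),            # node 4: already visited
    Entry(False, MAX_WEIGHT, 0),  # node 5: not reached yet
    Entry(False, 9, 1),           # node 6: reachable more cheaply via node 3
    Entry(False, 5, 2),           # node 7: existing path is already shorter
]
edges = [None, 3, 2, 5]  # edge weights from node 3 (None means no edge)
updated = [relax(e, w, working_weight=4, working_node=3) for e, w in zip(group, edges)]
best = min(range(len(updated)), key=lambda i: selection_weight(updated[i]))
print(updated, "next-node candidate index:", best)
```

In the hardware the four comparisons run side by side in one clock cycle; the list comprehension here only models the result, not the timing.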

Figure 1. ASIC implementation (block diagram; blocks shown: Input File, readinput (input module), Graph Data, large_graph (graph module), Working Memory, module dj, output_dj (output module), Output SRAM; bus labels: 8-bit data, 13-bit address, 16-bit data, 128-bit data)

A bird's-eye view of the module arrangement is shown in Fig. 1. The readinput module extracts the source and destination pair from the user-supplied input file; the data size is 8 bits. This data is passed to the large_graph module, which in turn generates the input for module dj, which implements Dijkstra's algorithm. In one clock cycle, module dj receives all 128 bits (i.e., one line of graph data). In parallel, 128 bits of data are read from working memory in each clock cycle.
Figure 2. Working memory data (each word holds entries for Node 1 to Node 4; each entry is 21 bits: 1 bit visited/unvisited, 12 bits weight, 8 bits parent)
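The 21-bit entry format of Figure 2 can be illustrated with a small pack/unpack sketch in Python. The exact bit positions are not stated in the text, so the layout used here (Node 1 in the lowest 21 bits of the word, visited flag as the top bit of each entry, parent in the lowest 8 bits) is an assumption, as are all function names.

```python
ENTRY_BITS = 21
WEIGHT_BITS = 12
PARENT_BITS = 8
NODES_PER_WORD = 4


def pack_entry(visited, weight, parent):
    """Pack one node entry as [visited | weight | parent] (assumed order)."""
    if not 0 <= weight < (1 << WEIGHT_BITS):
        raise ValueError("weight does not fit in 12 bits")
    if not 0 <= parent < (1 << PARENT_BITS):
        raise ValueError("parent does not fit in 8 bits")
    return (int(visited) << (WEIGHT_BITS + PARENT_BITS)) | (weight << PARENT_BITS) | parent


def unpack_entry(entry):
    """Split a 21-bit entry back into (visited, weight, parent)."""
    parent = entry & ((1 << PARENT_BITS) - 1)
    weight = (entry >> PARENT_BITS) & ((1 << WEIGHT_BITS) - 1)
    visited = bool((entry >> (WEIGHT_BITS + PARENT_BITS)) & 1)
    return visited, weight, parent


def pack_word(entries):
    """Place four entries side by side in one working-memory word."""
    word = 0
    for slot, (visited, weight, parent) in enumerate(entries):
        word |= pack_entry(visited, weight, parent) << (slot * ENTRY_BITS)
    return word


def unpack_word(word):
    mask = (1 << ENTRY_BITS) - 1
    return [unpack_entry((word >> (slot * ENTRY_BITS)) & mask)
            for slot in range(NODES_PER_WORD)]


entries = [(True, 0, 1), (False, 7, 3), (False, 6, 3), (False, 0xFFF, 0)]
word = pack_word(entries)
print(f"{word:032X}", word.bit_length() <= NODES_PER_WORD * ENTRY_BITS)
```

Four entries occupy 84 of the 128 bits in a word, which is why the remaining bits of each working-memory line go unused.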

Interface Specification

| Signal | Size (bits) | From | To | Description |
|---|---|---|---|---|
| clock | 1 | | All modules | Clock signal |
| reset | 1 | | All modules | Reset signal |
| start | 1 | | All modules | Start signal |
| path_found | | dj | readinput | Signals the end of one iteration of a source, destination pair |
| read_address | 10 | readinput | output_dj | Read address to graph file |
| read_bus | 8 | | readinput | Read bus from graph |
| a (plus three similar signals) | 1 | readinput | large_graph, dj | Signify the start of a new iteration, delayed by 1, 2, 3 and 4 clock cycles from path_found |
| source | | readinput | large_graph, dj, output_dj | Source node number from graph |
| destination | | readinput | dj, output_dj | Destination node number from graph |
| read_address_1 | 13 | large_graph | SRAM_2R | Read address for graph file |
| read_bus_1 | 128 | SRAM_2R | large_graph | Data bus from SRAM_2R |
| next_node | | dj | large_graph | Selected node after one pass of Dijkstra's algorithm |
| DJ_input | 128 | large_graph | dj | Single-line data bus from SRAM_2R |
| Working_node | | large_graph | dj | Present working node in Dijkstra's algorithm |
| ALR_5_del | | large_graph | dj | Signifies the end of all lines being read corresponding to one node (followed by a 5 clock cycle delay) |
| ALR_3_del | | large_graph | dj | Same as above (3 delays included) |
| ALR_1_del | | large_graph | dj | Same as above (1 delay included) |
| ALR_2_del | | large_graph | dj | Same as above (2 delays included) |
| select_group | | dj | large_graph | Used to extract data for the first 4 nodes and last 4 nodes from the 128-bit bus coming from SRAM_2R |
| nn_wire_num | | dj | large_graph | Stores the node number for next_node for Dijkstra |
| cl2 | | dj | large_graph | Used for selecting node pairs (first/last) from graph data |
| wrt_address_wm | 13 | dj | SRAM_2R1W | Write address for SRAM_2R1W |
| rdbus_frm_wrkmem | 128 | SRAM_2R1W | dj | Data from SRAM_2R1W |
| address_wrkmem | 13 | dj | SRAM_2R1W | Read address to SRAM_2R1W |
| WE_wm | 1 | dj | SRAM_2R1W | Write enable for SRAM_2R1W |
| Wrtbus_to_wrkmem | 128 | dj | SRAM_2R1W | Write bus to SRAM_2R1W |
| DR | | dj | output_dj | Signifies that the destination node has been reached using the algorithm |
| comp1 | | output_dj | dj | Signifies the end of writing to SRAM_1R1W, so that Dijkstra's can start again |
| destn_weight | 12 | dj | output_dj | Carries the weight of the destination node to be written to SRAM_1R1W |
| comp2 | | output_dj | dj | Signifies the end of the data being written to SRAM_1R1W while Dijkstra's is running in parallel |
| ReadBus2 | 128 | SRAM_2R1W | output_dj | Read bus 2 from SRAM_2R1W |
| ReadBusos | 16 | SRAM_1R1W | output_dj | Read bus from Output RAM |
| WE_op | | output_dj | SRAM_1R1W | Write enable to Output RAM |
| WriteBusos | 16 | output_dj | SRAM_1R1W | Write bus to Output RAM |
| ReadAddressos | 14 | output_dj | SRAM_1R1W | Read address for Output RAM |
| WriteAddressos | 14 | output_dj | SRAM_1R1W | Write bus to Output RAM |
| ReadAddress2 | 13 | output_dj | SRAM_2R1W | Read address for SRAM_2R1W |

Interfacing Timing Diagrams

All the following timing diagrams begin at 50 ns.

Figure 3. Interfacing for readinput module

Figure 4. Interfacing for dj module

Results
The design works at a clock frequency of 72.4 MHz (13.8 ns clock period) with an area of 3913.65 µm², without any setup or hold violations. The number of cycles taken for the large graph was $2.89997 \times 10^{5}$, which results in a total time of 4 ms. This gives

$\text{Total cycles} \times \text{Cycle time} \times \text{Area} = 1.566 \times 10^{10}\ \text{ns}\,\mu\text{m}^{2}$

$\text{Performance/area} = 6.38477 \times 10^{-11}\ \text{ns}^{-1}\,\mu\text{m}^{-2}$
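The figures of merit above follow directly from the reported clock period, cycle count and area. A short Python check of the arithmetic, using only the numbers stated in this section (variable names are chosen here for illustration):

```python
clock_period_ns = 13.8   # reported clock period
cycles = 2.89997e5       # reported cycle count for the large graph
area_um2 = 3913.65       # reported area

total_time_ns = cycles * clock_period_ns
cost = total_time_ns * area_um2   # total cycles x cycle time x area
performance_per_area = 1.0 / cost

print(f"total time      : {total_time_ns / 1e6:.2f} ms")
print(f"time x area     : {cost:.3e} ns*um^2")
print(f"performance/area: {performance_per_area:.4e} 1/(ns*um^2)")
```

The product of cycles and clock period comes to about 4.00 ms, and its reciprocal after multiplying by the area reproduces the performance/area figure.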

Figure 5. Interfacing for output_dj module

Conclusion
This work yielded useful insights into the working of Dijkstra's algorithm and into the design of a digital ASIC. Optimization schemes were explored and implemented. Several methods (mentioned in the introduction) were employed to speed up processing and to minimize area, which resulted in good overall performance of the chip. Areas for improvement include further pipelining to shorten the critical path, and using the second read port of SRAM_2R in parallel to improve performance.

Figure 6. Interfacing for large_graph module

Verification
After each individual module was coded and tested for functional correctness, the modules were connected to form the overall design, which was tested again; the results were verified by comparing them with the supplied output HEX file. A perfect match between the SRAM_1R1W data and the HEX file data, for both the small and large graphs, confirmed the correctness of the design.
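A comparison of this kind can be automated with a few lines of Python. The sketch below assumes that both files hold whitespace-separated hexadecimal words; the reference file name `golden.dat` and the helper names are hypothetical, and the example writes two small temporary files so that it runs on its own.

```python
import tempfile
from pathlib import Path


def read_hex_words(path):
    """Read whitespace-separated hexadecimal words from a file (assumed format)."""
    return [int(token, 16) for token in Path(path).read_text().split()]


def first_mismatch(actual, expected):
    """Return the index of the first differing word, or None if the lists match."""
    for index, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return index
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    return None


with tempfile.TemporaryDirectory() as tmp:
    produced = Path(tmp) / "output_validation.dat"
    golden = Path(tmp) / "golden.dat"  # hypothetical name for the supplied reference
    produced.write_text("0006\n0001\n0003\n0002\n0004\nFFFF\n")
    golden.write_text("0006\n0001\n0003\n0002\n0004\nFFFF\n")
    mismatch = first_mismatch(read_hex_words(produced), read_hex_words(golden))

print("match" if mismatch is None else f"first mismatch at word {mismatch}")
```

Reporting the index of the first differing word, rather than only pass or fail, makes it easier to locate the source/destination pair at which simulation and reference diverge.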

Appendix
This section details a few ignorable warnings encountered when the design was synthesized.

1) In design 'top', net 'dj/add_309/CO' driven by pin 'dj/add_309/U1_11/COUT' has no loads. (LINT-2)
This warning means that the net is not connected to any load. Since the design does not use the carry bits, the warning can be ignored.

2) In design 'output_dj', port 'ReadBus2[127]' is not connected to any nets. (LINT-28)
This warning is caused by read-bus lines that are not used in the design, and is therefore ignored. It could also be eliminated by using a dummy wire between this read bus and the dj module.

3) Cell output_dj/U318 conflicts with the NOR2_X2 in the target library. (OPT-106)
This is again an ignorable warning.

4) Disabling timing arc between pins 'A1' and 'ZN' on cell 'output_dj/U15' to break a timing loop. (OPT-314)
This is caused by a feedback loop. The values at the positive edge of the clock are nevertheless the appropriate signals and yield correct results, so the loop is harmless and the warning is ignored.

References:
[1] Course notes, Franzon, P, Digital ASIC design, Jan. 2013, NCSU.
