
(12) United States Patent
     Jamriska et al.

(10) Patent No.: US 8,533,139 B2
(45) Date of Patent: Sep. 10, 2013

(54) OPTIMIZING COMPUTATION OF MINIMUM CUT IN GRAPHS WITH GRID TOPOLOGY

(75) Inventors: Ondrej Jamriska, Praha (CZ); Daniel Sykora, Praha (CZ)

(73) Assignee: Czech Technical University in Prague, Faculty of Electrical Engineering, Prague (CZ)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 220 days.

(21) Appl. No.: 13/226,109

(22) Filed: Sep. 6, 2011

(65) Prior Publication Data
     US 2013/0060724 A1    Mar. 7, 2013

(51) Int. Cl.
     G06F 17/00 (2006.01)
     G06N 5/00 (2006.01)

(52) U.S. Cl.
     USPC 706/45

(58) Field of Classification Search
     None
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

6,744,923 B1      6/2004    Zabih et al.
6,973,212 B2      12/2005   Boykov et al.
[number and date illegible]  Boykov et al.
7,844,113 B2      11/2010   [name illegible]
2002/004401 A1    4/2002    Boykov et al.
2004/0008886 A1   1/2004    Boykov
2009/0252416 A1*  10/2009   Komodakis et al. 382/180

OTHER PUBLICATIONS

Liu et al., Parallel graph-cuts by adaptive bottom-up merging, IEEE CVPR, 2010, pp. 2181-2188.*
Delong et al., A Scalable Graph-Cut Algorithm for N-D Grids, IEEE CVPR, 2008, pp. 8.*
Bader et al., A Cache-Aware Parallel Implementation of the Push-Relabel Network Flow Algorithm and Experimental Evaluation of the Gap Relabeling Heuristic, PDCS 2005, pp. 8.*
Alahari et al., Reduce, Reuse & Recycle: Efficiently Solving Multi-Label MRFs, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Jun. 23, 2008. http://ieeexplore.ieee.org/xpls/abs_all. [remainder of the URLs illegible]
Arora et al., An Efficient Graph Cut Algorithm for Computer Vision Problems, Proc. 11th European Conf. on Computer Vision, Part III, Sep. 5, 2010, pp. 552-565.
(Continued)

Primary Examiner: Li-Wu Chang
(74) Attorney, Agent, or Firm: Buchanan Ingersoll & Rooney PC

(57) ABSTRACT

Approaches for optimizing computation of minimum cut or maximum flow on graphs comprising a plurality of nodes and edges with grid-like topologies are disclosed. Embodiments exploit the regular structure of input graphs to reduce the memory bandwidth, a main bottleneck of popular max-flow/min-cut algorithms based on finding augmenting paths on a residual graph (such as Ford-Fulkerson [1956] or Boykov-Kolmogorov [2004]). Disclosed embodiments allow more than 200% speed-up without sacrificing optimality of the final solution, which is crucial for many computer vision and graphics applications. Method and system embodiments replace standard linked list representation of general graphs with a set of compact data structures with blocked memory layout that enables fixed ordering of edges and implicit branchless addressing of nodes. The embodiments are orthogonal to other optimizations such as parallel processing or hierarchical methods and can be readily plugged into existing min-cut/max-flow computation systems to further improve their performance.

20 Claims, 8 Drawing Sheets

[Representative drawing (FIG. 1): block diagram of a grid-optimized minimum cut solver 110 comprising an initialization module 140, an algorithm execution module 130 (tree growing module 132, path augmenting module 134, orphan adopting module 136), a speedup module 120 (node index generator 122, array based graph and tree representation module 124, neighbor node access module 126, reverse edge access module 128, edge saturation tracking module 129) and a minimum cut output module 150; a client application 160 with a domain-specific graph generator 164; an input grid-like graph 166; an output minimum cut 168; and exchanged data items 170 (grid coordinates, node index, node fields, neighbor node index, edge index, reverse edge index, edge saturation status, edge saturation update).]
(56) References Cited (continued)

OTHER PUBLICATIONS

Boykov et al., Graph cuts and efficient N-D Image Segmentation, International Journal of Computer Vision 70(2), Nov. 2006, pp. 109-131.
Boykov et al., Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images, Proc. Eighth International Conf. on Computer Vision, Vancouver, Canada, Jul. 7, 2001, vol. 1, pp. 105-112.
Boykov et al., An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision, IEEE Transactions on PAMI, Sep. 2004, vol. 26, No. 9, pp. 1124-1137.
Chandran et al., A Computational Study of the Pseudoflow and Push-Relabel Algorithms for the Maximum Flow Problem, Operations Research, Mar. 2009, vol. 57, No. 2, pp. 358-376. http://or.journal.informs.org/content/57/2/358; http://riot.ieor.berkeley.edu/~dorit/pub/CH-OR.pdf.
Delong et al., A Scalable Graph-Cut Algorithm for N-D Grids, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Jun. 23, 2008, pp. 1-8.
Dixit et al., GPU-Cuts: Combinatorial Optimisation, Graphic Processing Units and Adaptive Object Extraction, CERTIS, ENPC, France, Mar. 2005, pp. 1-16. http://www.enpc.fr/certis/.
Fishbain et al., Competitive Analysis of Minimum-Cut Maximum Flow Algorithms in Vision Problems, Jul. 26, 2010, pp. 2-16. http://arxiv.org/abs/1007.4531v2.
Ford et al., Maximal Flow Through a Network, Canadian Journal of Mathematics 8, 1956, pp. 399-404.
Goldberg et al., Maximum Flows by Incremental Breadth-First Search, Proc. of ESA 2011 (LNCS), vol. 6942/2011, pp. 457-468. http://www.springerlink.com/content/q50p581644h05338/; http://research.microsoft.com/pubs/150437/ibfs-proc.pdf.
Harish et al., Accelerating Large Graph Algorithms on the GPU Using CUDA, Proc. of International Conf. on High Performance Computing, Dec. 18, 2007, pp. 197-208. http://dl.acm.org/citation.cfm?id=1782200; http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.4206.
Juan et al., Active graph cuts, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Jun. 2006, vol. I, pp. 1023-1029.
Juan et al., Capacity Scaling for Graph Cuts in Vision, IEEE Conf. on Computer Vision, Oct. 14-21, 2007, pp. 1-8.
Kohli et al., Dynamic Graph Cuts for Efficient Inference in Markov Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, No. 12, Dec. 2007, pp. 2079-2088.
Lempitsky et al., Logcut - Efficient Graph Cut Optimization for Markov Random Fields, Proc. of International Conf. on Computer Vision, Oct. 14-20, 2007.
Liu et al., Paint Selection, ACM Transactions on Graphics, vol. 28, No. 3, Aug. 2009, Article 69:1-7.
Liu et al., Parallel Graph-Cuts by Adaptive Bottom-up Merging, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Jun. 13, 2010, pp. 2181-2188. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5539898; http://research.microsoft.com/en-us/um/people/jiansun/papers/CVPR10_ParallelGC.pdf.
Schmidt et al., Efficient Planar Graph Cuts With Applications in Computer Vision, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Jun. 20, 2009, pp. 351-356. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5206863; http://cvpr.in.tum.de/old/pub/pub/schmidt_et_al_cvpr09.pdf.
Strandmark et al., Parallel and Distributed Graph Cuts by Dual Decomposition, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Jun. 13, 2010, pp. 2085-2092. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5539886&tag=1; www.robots.ox.ac.uk/~vgg/rg/slides/parallelgraphcuts.pdf.
Vineet et al., Cudacuts: Fast Graph Cuts on the GPU, Proc. of Computer Vision and Pattern Recognition Workshops, Jun. 23, 2008, pp. 1-8. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4563095; http://cvit.iiit.ac.in/papers/rtGCuts_2008.pdf.

* cited by examiner
U.S. Patent    Sep. 10, 2013    Sheets 2 to 5 of 8    US 8,533,139 B2

[Drawing sheets 2 to 5: the figure content is not recoverable from the text extraction.]
U.S. Patent    Sep. 10, 2013    Sheet 7 of 8    US 8,533,139 B2

[FIG. 7: flowchart 700. Recoverable step labels, in order:]

725  receive input grid-like graph
727  determine size of a block and size of the padded grid
729  allocate memory pool for arrays and auxiliary data structures
731  iterate over each node of the input graph:
     733  compute node's array index
     735  initialize residual capacities of node's outgoing edges
     737  augment source -> node -> sink path
     739  activate node if it remains connected to a terminal
     741  initialize node's tree membership
743  execute Boykov-Kolmogorov algorithm:
     grow the search trees
     augmenting path found?
     if found: augment the path, adopt orphaned nodes, and grow the search trees again
     otherwise: deallocate memory pool and output minimum cut

[Reference numerals 745 through 755 label the steps of the Boykov-Kolmogorov loop; their exact assignment is not recoverable from the text extraction.]

FIG. 7
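By way of illustration only, steps 727, 733 and 737 of FIG. 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the 8x8 blocked layout and the padding of the grid extents to a multiple of 8 described in the specification. The function names (padded_extent, node_index, node_index_reference, preaugment_terminals) and the signed encoding of the remaining terminal capacity are hypothetical choices made for this example.

```python
BLOCK = 8  # block edge length in nodes (8x8 blocks, as in the description)


def padded_extent(n):
    """Round a grid extent up to the next multiple of BLOCK (step 727)."""
    return (n + BLOCK - 1) & ~(BLOCK - 1)


def node_index(x, y, padded_width):
    """Array index of node (x, y) in the blocked layout (step 733).

    Blocks are stored in scan-line order, and the nodes inside a block
    are stored in scan-line order as well. Only shifts, masks and one
    multiplication are used.
    """
    return (padded_width * (y & ~7)) + ((x & ~7) << 3) + ((y & 7) << 3) + (x & 7)


def node_index_reference(x, y, padded_width):
    """The same index written with division and modulo, for cross-checking."""
    block = (x // 8) + (padded_width // 8) * (y // 8)
    return 64 * block + (x % 8) + 8 * (y % 8)


def preaugment_terminals(cap_source, cap_sink):
    """Push a saturating flow along source -> node -> sink (step 737).

    Returns the remaining terminal capacity as a signed value. The sign
    convention is an assumption of this sketch: positive means the
    source -> node edge is still unsaturated, negative means the
    node -> sink edge is, and zero means both are saturated.
    """
    return cap_source - cap_sink
```

For a 10x10 input grid, for example, padded_extent gives 16 in both dimensions, so each per-field array in this sketch would hold 256 entries.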
U.S. Patent    Sep. 10, 2013    Sheet 8 of 8    US 8,533,139 B2

[FIG. 8: block diagram of computer system 800. Labeled components: processor; communications infrastructure; display interface and display; main memory 808; secondary memory 810 with hard disk drive and removable storage drive; removable storage units; interface; network interface; communications path. Reference numerals 802, 804, 806, 812, 814, 820, 822, 824, 828 and 830 also appear in the figure, but their assignment to components is not recoverable from the text extraction.]

FIG. 8
OPTIMIZING COMPUTATION OF MINIMUM CUT IN GRAPHS WITH GRID TOPOLOGY

FIELD OF THE INVENTION

The present invention is generally related to finding minimum cuts in graphs, and more specifically to systems and methods for improving the performance of minimum cut computation in image processing and computer vision applications by reducing the memory bandwidth bottleneck and avoiding latencies due to branching. In particular, embodiments of the invention improve the caching behavior when computing minimal cuts in graphs with grid-like topologies (i.e., topologies close to a regular lattice) by employing compact data structures, cache-aware memory layout and branchless implicit addressing of adjacent nodes.

BACKGROUND OF THE INVENTION

Many computer vision and graphics applications rely on finding minimal cuts in graphs, with many of these graphs having grid-like topologies. Examples of such computer vision and graphics problems include interactive two dimensional (2D)/three dimensional (3D) image and video segmentation, image restoration, painting, image fusion and re-targeting, texture synthesis, shape fitting, and 3D surface reconstruction.

One traditional approach of finding the minimum cut in a graph is the maximum flow/minimum cut algorithm by Boykov and Kolmogorov. The Boykov-Kolmogorov algorithm (BK algorithm) is in turn based on the Ford-Fulkerson algorithm, which repeats the process of finding and augmenting paths with non-zero residual capacities until no more paths remain. An added value of the BK algorithm as compared to the Ford-Fulkerson algorithm is the usage of two concurrent search trees together with a tree-reuse strategy to avoid loss of information gained during previous augmentations.

However, existing implementations of the BK algorithm pose significant challenges for application developers and interactive systems. For example, existing implementations of the BK algorithm are geared toward general graphs. This results in poor performance on grid-like graphs, since the memory bandwidth required when accessing the data structures necessary to represent general graphs is often the main bottleneck of the minimum cut computation.

Accordingly, what is needed are systems, methods, and computer program products that reduce the time needed to obtain a minimum cut in a grid-like graph by utilizing the graph's regular structure to optimize the computation of the cut.

BRIEF SUMMARY

The present disclosure is directed to efficient computation of minimum cuts in graphs with topologies close to that of a grid. Exemplary methods, systems, and computer readable media are disclosed for speeding up the minimum cut computation by utilizing the regular structure of grid-like graphs to reduce the memory bandwidth bottleneck and avoid latencies due to branching, employing compact data structures, cache-aware memory layout, and branchless implicit addressing of adjacent nodes. Exemplary methods presented herein result in performance gains of more than double the speed of existing methods for graphs with dense terminal connections and up to triple the speed of existing methods for graphs with sparse terminal connections, without sacrificing optimality of the resulting cut. Such improvements are crucial, especially for interactive applications that strive to minimize a user's idle time while still providing accurate results.

The methods, systems, and computer readable media disclosed herein are orthogonal to existing optimizations, such as parallel processing and multi-resolution methods. Thus, embodiments of the present invention can be easily incorporated into existing systems to further improve their performance.

Embodiments of the invention comprise methods, systems, and computer readable media that may improve the speed and efficiency of minimum cut computation by employing: compact and static data structures; a cache-aware memory layout; and implicit branchless addressing. By employing these elements, the methods, systems, and computer readable media disclosed herein speed up the computation of minimum cut in graphs with topologies close to a regular grid.

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the relevant art(s) to make and use the invention.

FIG. 1 illustrates an environment for optimized computation of minimum cut, in accordance with an exemplary embodiment of the present invention.

FIG. 2 provides an example of a directed capacitated graph wherein edge capacities are reflected by their thickness according to the prior art.

FIG. 3 depicts data packing and subdivision into separate arrays on a 4-connected grid wherein each node is connected to its left, right, top and bottom neighbor, in accordance with an exemplary embodiment of the present invention.

FIG. 4 illustrates addressing outgoing edges and avoiding pointers to reverse edges by using a lookup table, in accordance with an exemplary embodiment of the present invention.

FIG. 5 depicts a cache-aware memory layout of arrays to improve caching behavior, in accordance with an exemplary embodiment of the present invention.

FIG. 6 illustrates six least significant bits of nodes' indices inside a block of 8x8 nodes, in accordance with an exemplary embodiment of the present invention.

FIG. 7 is a flowchart representing a method for optimizing computation of minimum cut in grid-like graphs, according to an exemplary embodiment of the invention.

FIG. 8 depicts an exemplary computer system in which the present invention may be implemented.

DETAILED DESCRIPTION

Of particular concern to the present methods, systems, and computer readable media is the reduction of processing time required to obtain a minimum cut in a grid-like graph. According to an embodiment, minimum cut computation is optimized through implementation of an efficient variant of the BK algorithm. In an embodiment, the BK algorithm is optimized for graphs with grid-like topologies.

In this way, embodiments of the present methods, systems, and computer readable media address a main bottleneck of the BK algorithm, which is the large amount of memory bandwidth needed when processing general graphs.

By exploiting the regular structure of grid-like graphs, embodiments presented herein considerably improve memory-caching behavior by employing compact data structures and cache-aware blocked memory layout with implicit branchless addressing. The modifications presented herein result in a speed-up of more than 200% for graphs with dense terminal connections and of up to 300% for graphs with sparse terminal connections, without sacrificing optimality of an outputted final cut. As will be appreciated by persons skilled in the relevant art(s), such improvement is crucial especially for interactive applications where the aim is to minimize a user's idle time waiting for a final cut, and yet still provide accurate results.

The methods, systems, and computer readable media presented herein utilize the following elements: compact, static data structures; a cache-aware memory layout; and implicit branchless addressing. Each of these elements is described in detail in the following sections.

Compact and Static Data Structures

Embodiments of the present methods, systems, and computer readable media represent the distribution of flow using a residual graph. In accordance with an embodiment, for a residual graph, each edge has a residual capacity rc, which is the amount of flow that can be pushed along the edge without exceeding its capacity. A residual graph is typically represented with an adjacency list. In this representation, each node has a linked list of edges to adjacent nodes. By exploiting the regularity of a grid structure, embodiments of the present methods, systems, and computer readable media can represent the residual graph much more efficiently than traditional techniques.

FIG. 2 depicts a directed capacitated graph 202 consisting of a set of nodes and a set of directed edges connecting the nodes. In graphs 202 and 204 the terminal nodes are labeled as the source, s, and the sink, t. The graph 202 has the topology of a 2-dimensional 4-connected grid (terminals and their adjacent nodes are not considered). In FIG. 2, the edge capacities are reflected by their relative thickness.

Existing implementations of the BK algorithm are geared toward general graphs. This results in a poor performance for graphs with grid-like topologies, such as graph 202 depicted in FIG. 2, because the memory bandwidth required when accessing data structures necessary to represent general graphs is often a bottleneck when computing the minimum cut.

Computation of minimum cut is important in many applications that employ discrete energy minimization to solve labeling problems. As illustrated in FIG. 2, the edges in graphs 202 and 204 are assigned some capacity. A capacity of a directed edge (p, q) may differ from the capacity of the reverse edge (q, p). An s/t cut C on a graph with two terminals is a partitioning of the nodes in the graph into two disjoint subsets S and T such that the source s is in S and the sink t is in T. For simplicity, s/t cuts are referred to herein as cuts. Graph 204 depicts one example of a cut in a graph. Any s/t cut partitions the nodes of graph 202 into disjoint groups each containing exactly one terminal. Therefore, any cut corresponds to some assignment of nodes to labels (terminals). If edge capacities are appropriately set based on parameters of an energy, a minimum cut will correspond to a labeling with the minimum value of this energy.

FIG. 3 illustrates data packing and subdivision into separate arrays on a 4-connected grid wherein each node is connected to its left, right, top and bottom neighbor (see, e.g., 302, 304, 306, 308, 310, and 314). As shown in FIG. 3, separate arrays can be allocated and used to store data for individual fields of all nodes, including residual capacities of the node's four outgoing edges 312, residual capacity 304 of the edge connecting the node to a terminal, saturation flags of the node's outgoing edges 306 packed together with the node's tree membership flag into a single field 307, the node's parent 310, index 308 of the edge connecting the node to its parent, and timestamp 314.

According to certain embodiments of the present methods, systems, and computer readable media, costly dynamic memory allocations are avoided by working inside a pre-allocated memory pool of a conservative size. Each node, such as node 416 in FIG. 4, has a unique index {0, . . . , N-1} and can be grouped with its four outgoing edges having fixed ordering. They are addressed by index {0, 1, 2, 3} as illustrated in FIG. 4 (see, e.g., indices 418, 420, 422, and 424 for adjacent nodes 416). Instead of storing pointers to adjacent nodes, certain embodiments of the present methods, systems, and computer readable media compute the indices of a node's neighbors on the fly, based on the node's index, as described in more detail below in the discussion of implicit branchless addressing. Another potential advantage of certain embodiments is that unlike traditional methods, they avoid storing pointers to reverse edges. The reverse of a node's outgoing edge is accessed as the neighbor's outgoing edge in the opposite direction. The index of an edge in the opposite direction is determined using a small lookup table REV = [2; 3; 0; 1] (see, e.g., FIG. 4). Thus, in an embodiment, for each edge, only the edge's residual capacity is stored (i.e., in main memory 808, secondary memory 810, or removable storage units 818; 822 depicted in FIG. 8). This simplification can be used even in cases when selected nodes or edges are missing. Embodiments can still represent the graph as perfectly 4-connected by assigning a residual capacity of zero to missing edges or to edges adjacent to missing nodes.

Next, according to an embodiment, the actual values of residual capacities are used only during the augmentation phase. In other phases, the only important information is whether a given edge has zero or non-zero residual capacity, i.e., whether it is saturated or not. Reading several bytes to obtain a single bit of information is wasteful and inefficient. Instead, in an exemplary embodiment, an additional saturation flag, sat, is stored for each edge. This binary flag indicates that the edge has zero residual capacity.

Certain exemplary embodiments of the present methods, systems, and computer readable media include a growth and adoption phase, wherein during the growth and adoption phases the saturation flags are read instead of full residual capacities. During the augmentation phase, the saturation flag of an edge is updated whenever the edge becomes either saturated or unsaturated. The additional cost of updating these flags is amortized by fetching less data from memory in the growth and adoption phases.

As the TREE flag and saturation flags {sat0, . . . , sat3} are often accessed at the same time, embodiments pack them together into a compact single byte structure TREE-SAT. In an embodiment, the first two bits are used to represent the three possible values of the TREE flag and the next four bits are occupied by the saturation flags {sat0, . . . , sat3}. The last two bits are unused. They are utilized in 6-connected 3D grid graphs. For graphs with higher connectivity, the TREE-SAT structure expands to two or more bytes. In an embodiment, pointers to arrays {rc0, . . . , rc3} are aggregated in the four element indirection table RC 302 provided in FIG. 3. The RC 302 table is used for indirect addressing of residual capacities using the edge index {0, 1, 2, 3}. The residual capacity of each edge is initialized to the edge's capacity. For nodes that are connected to both source and sink, an initial step is to try to push a saturating flow along the source-node-sink augmenting path. After this step, at most one of the two edges remains non-saturated. Residual capacity of the non-saturated edge is then stored as rcst. In an embodiment, after initialization, the original capacities of edges are completely discarded.
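For illustration, the packing of the TREE flag and the four saturation flags into a single TREE-SAT byte, and the update of a saturation flag when a residual capacity changes, might be sketched in Python as follows. This is a hedged sketch rather than the disclosed implementation: the numeric values chosen for the three TREE states, the placement of the TREE flag in the two lowest bits, and the helper names (pack_tree_sat, tree_of, is_saturated, set_residual) are assumptions made for this example. The rc argument stands in for the four-element indirection table of per-edge residual capacity arrays.

```python
# Hypothetical encoding of the three TREE values (the text only says there are three).
TREE_FREE, TREE_SOURCE, TREE_SINK = 0, 1, 2

TREE_MASK = 0b11  # two bits for the TREE flag (assumed to be the lowest two)
SAT_SHIFT = 2     # sat0..sat3 occupy the next four bits


def pack_tree_sat(tree, sat):
    """Pack a TREE value and four saturation flags into one byte."""
    byte = tree & TREE_MASK
    for edge, saturated in enumerate(sat):
        if saturated:
            byte |= 1 << (SAT_SHIFT + edge)
    return byte


def tree_of(byte):
    """Extract the TREE value from a TREE-SAT byte."""
    return byte & TREE_MASK


def is_saturated(byte, edge):
    """Read the saturation flag of outgoing edge 0..3 without touching rc."""
    return bool((byte >> (SAT_SHIFT + edge)) & 1)


def set_residual(rc, tree_sat, u, edge, value):
    """Store a new residual capacity and keep the saturation flag in sync.

    rc is a sequence of four arrays (one per edge index); tree_sat is a
    bytearray holding one TREE-SAT byte per node.
    """
    rc[edge][u] = value
    bit = 1 << (SAT_SHIFT + edge)
    if value == 0:
        tree_sat[u] |= bit           # edge became saturated
    else:
        tree_sat[u] &= ~bit & 0xFF   # edge is (again) unsaturated
```

In such a scheme only the augmentation phase would call set_residual, while the growth and adoption phases would call is_saturated on the one-byte field and never load the wider residual capacity arrays.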
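The on-the-fly computation of neighbor indices and the pointer-free access to reverse edges can likewise be sketched in Python. The sketch assumes the 8x8 blocked layout of the cache-aware memory layout section and a padded grid width that is a multiple of 8. The direction order LEFT, TOP, RIGHT, BOTTOM is an assumption (the description fixes only the lookup table REV), and the helper names are hypothetical. Python's conditional expression merely mirrors the selection logic; whether the selection is actually branchless depends on a compiled implementation using conditional moves.

```python
REV = (2, 3, 0, 1)  # from the description: REV[e] is the edge opposite to e

# Assumed direction order (not stated in the text); any order in which
# REV pairs opposite directions would serve equally well.
LEFT, TOP, RIGHT, BOTTOM = 0, 1, 2, 3


def node_index(x, y, w):
    """Blocked 8x8 layout index (w = padded grid width, a multiple of 8)."""
    return w * (y & ~7) + ((x & ~7) << 3) + ((y & 7) << 3) + (x & 7)


def y_offset(w):
    """Index distance between vertically adjacent nodes in different blocks."""
    return 8 * (w - 8 + 1)


def left(u):
    # lower three bits are 000 on the left boundary of a block
    return u - 1 if u & 0b000111 else u - 57


def right(u):
    # lower three bits are 111 on the right boundary of a block
    return u + 1 if (~u) & 0b000111 else u + 57


def top(u, yofs):
    # bits 3..5 are 000 on the top boundary of a block
    return u - 8 if u & 0b111000 else u - yofs


def bottom(u, yofs):
    # bits 3..5 are 111 on the bottom boundary of a block
    return u + 8 if (~u) & 0b111000 else u + yofs


def neighbor(u, edge, yofs):
    """Index of the node reached by following outgoing edge `edge` of node u."""
    if edge == LEFT:
        return left(u)
    if edge == RIGHT:
        return right(u)
    if edge == TOP:
        return top(u, yofs)
    return bottom(u, yofs)


def reverse_residual(rc, u, edge, yofs):
    """Residual capacity of the reverse of edge `edge` leaving node u.

    No reverse-edge pointer is stored: the reverse edge is the neighbor's
    outgoing edge in the opposite direction, REV[edge].
    """
    return rc[REV[edge]][neighbor(u, edge, yofs)]
```

At the outer boundary of the padded grid the computed index is not meaningful. The sketch assumes that such edges carry zero residual capacity and are therefore never followed, in line with the treatment of missing edges described above.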
Cache-AWare Memory Layout alWays 000 at the left boundary and higher three bits are
According to an embodiment, ?elds of a node are grouped alWays 111 at the bottom boundary (see, e.g., indices 630).
together and they are accessed by the nodes unique index
(see FIG. 3). The individual ?elds can be stored separately System Embodiment
using the Structure of Arrays (SoA) layout. For all nodes, the
values of a single ?eld are stored as a separate continuous FIG. 1 illustrates an example system 100 for optimiZing
array in memory (see, e.g., 305, 307, 309, 310, 312, and 314 computation of minimum cut in graphs according to an
in FIG. 3). With this layout, the data are naturally split into a embodiment of the invention. System 100 includes a grid
hot part and a coldpart. For example, When the augmenting optimiZed minimum cut solver 110 and client application
path is traversed to determine its minimal residual capacity, 160. In an embodiment client application 160 can be con?g
only the PARENT index 305, PRED index 309, and the ured to run on one or more client devices (not shoWn), that are
residual capacities need to be accessed. These indices and the coupled to the grid-optimized minimum cut solver 110 via a
residual capacities comprise the hot data. Other ?elds are not netWork (not shoWn). As Will be appreciated by persons
accessed, they comprise the cold data. Since the cold ?elds skilled in the relevant art(s), the netWork coupling the grid
are stored at different places in memory, they do not pollute optimiZed minimum cut solver 110 to one or more client
the caches. devices hosting client application 160 may be, but is not
The access pattern during tree groWth and path augmenta limited to, a Wireless or Wired public or private netWork, a
tion is irregular, but exhibits certain amount of spatial coher local area netWork (LAN), a Wide area netWork (WAN), or the
ence. As shoWn in FIG. 5, an embodiment of the present Internet.
methods, systems, and computer readable media exploits this 20 According to embodiments, system 100 depicted in FIG. 1
to improve caching behavior. As shoWn in FIG. 5, embodi utiliZes the folloWing elements: compact, static data struc
ments store each array in a blocked memory layout (500). The tures; a cache-aWare memory layout; and implicit branchless
grid 500 is divided into blocks of 8x8 nodes (see nodes 526). addressing. Each of these elements is described in detail in
Fields of nodes that are inside the same block are stored at sections folloWing the description of FIG. 1 beloW.
consecutive memory locations in a scan line order (see 528). 25 Grid-optimized minimum cut solver 110 includes an ini
Individual blocks are also arranged in a scan line order. tialiZation module 140, a Boykov-Kolmogorov (BK) algo
With this layout, a TREE-SAT ?eld for the Whole 8x8 rithm execution module 130, a speedup module 120, and a
block of nodes can ?t into single 64-byte cache line. In accor minimum cut output module 150. It is to be appreciated that
dance With an embodiment, the PRED ?eld also ?ts in a single the modules depicted in FIG. 1 may be implemented in hard
cache line. According to an embodiment, blocks of 2-byte and 30 Ware, softWare, ?rmWare or any combination thereof. Client
4-byte ?elds are spread over 2 and 4 cache lines. application 160 includes a domain-speci?c graph generator
This blocked layout can greatly improve the caching 164.
behavior. For example, When a TREE-SAT ?eld of some node According to an embodiment, the computation of mini
is accessed for the ?rst time, a cache miss Will occur and the mum cut, such as output minimum cut 168 shoWn in FIG. 1,
?eld is transferred to the cache along With ?elds of all nodes 35 is optimiZed through implementation of an e?icient variant of
lying in the same 8x8 block. If some neighboring node is BK algorithm. In an embodiment, the BK algorithm is opti
accessed next, it is likely it Will lie in the same block as the miZed for graphs With grid-like topologies, such as input
previous one. In this case, the neighbors TREE-SAT ?eld is grid-like graph 166 shoWn in FIG. 1. As shoWn in FIG. 1,
already in cache, Which leads to a cache hit. input grid-like graph 166 can be received from a client appli
The individual arrays are addressed by nodes index u. In 40 cation 160 comprising a domain-speci?c graph generator
blocked layout 500, the index of a node With grid coordinates 164.
x and y is computed as In the example embodiment depicted in FIG. 1, minimum
cut output module 150 is hosted by grid-optimized minimum
cut solver 110. In an alternative embodiment, minimum cut
Where W is a Width of the padded grid. This can be evaluated 45 output module 150 may be separate from and external to
e?iciently using logical shifts and bitWise conjunctions: grid-optimized minimum cut solver 110.
According to the example embodiment depicted in FIG. 1,
client application 160 may execute on a computing device
The grid 500 is padded With dummy nodes in each dimen remote from grid-optimized minimum cut solver 110. Such
sion, such that its extents are divisible by 8. Each array is 50 computing device may be for example, implemented as com
aligned on a 64-byte boundary. puter system 800 depicted in FIG. 8. The computing device
Implicit Branchless Addressing can be, but is not limited to a computer Workstation, mobile
To avoid stalls due to unpredicted branches, an embodi computing apparatus, or server that is remote from grid
ment of the present methods, systems, and computer readable optimiZed minimum cut solver 110. Alternatively, client
media replaces branching With conditional moves and small 55 application 160 may reside locally on the same computing
lookup tables. In an embodiment, the index of a left, right, top device With the grid-optimized minimum cut solver 110.
and bottom neighbor of a node With index u is computed as: In the example illustrated in FIG. 1, the optimiZed BK
left(u) :11 & 0001111, ? u l : u 57 algorithm is executed by the BK algorithm execution module
right(u) :( ~u) & 0001111, ? u +1 : u +57 130. In the embodiment depicted in FIG. 1, the BK algorithm
top(u) :11 & 1110001, ? u 8 : u Yofs 60 execution module 130 includes tree groWing module 132,
bottom(u) :( ~u) & 1110001, ? u +8 : u +Yofs path augmenting module 134, and orphan adopting module
Where YOfSI8. (W 8 +1). 136.
The binary constants are used to detect Whether the node Speedup module 120 includes node index generator 122,
With index u lies at the block s boundary. As illustrated in the array based graph and tree representation module 124, neigh
exemplary embodiment of FIG. 6, the six least signi?cant bits 65 bor node access module 126, reverse edge access module 128
of the nodes index share speci?c binary patterns at the and edge saturation tracking module 129. As illustrated in
blocks boundary. For example, the loWer three bits are FIG. 1, there are several data items 170 exchanged betWeen
US 8,533,139 B2
7 8
the sub-modules of speedup module 120, initialization module 140 and the sub-modules of the BK algorithm execution module 130. The exchange of specific data items 170 between the modules and sub-modules is described below with continued reference to FIG. 1.

As shown in FIG. 1, in an embodiment, the input grid-like graph 166 is received from the domain-specific graph generator 164 by the initialization module 140. After the initialization module 140 receives the input grid-like graph 166, it performs an initialization of the residual graph and search trees in cooperation with the speedup module 120. The initialization module 140 obtains nodes' indices based on their grid coordinates from the node index generator 122. The initialization module 140 sends nodes' grid coordinates to the node index generator 122. In response to receiving grid coordinates, the node index generator in turn generates and sends the node index to the initialization module 140. After the initialization is complete, the initialization module 140 passes control to the BK algorithm execution module 130.

With continued reference to FIG. 1, the BK algorithm execution module 130 determines the minimum cut in the input graph 166 by executing the computational steps of the BK algorithm. Each iteration of the BK algorithm comprises three phases: growing phase, augmenting phase and adopting phase. These phases are performed by the tree growing module 132, path augmenting module 134 and orphan adopting module 136.

During the minimum cut computation, modules 132, 134 and 136 read and modify information stored in nodes' fields. Access to these fields is provided by the array based graph and tree representation module 124. Upon receiving the index of a node from the BK algorithm execution module 130, the array based graph and tree representation module 124 returns a reference to the requested field back to the BK algorithm execution module 130. This reference can then be used by one of the modules 132, 134 or 136 to read or modify the value of the node's field.

During the minimum cut computation, modules 132, 134 and 136 also need access to neighboring nodes and reverse edges. Access to nodes' neighbors is provided by the neighbor node access module 126. The BK algorithm execution module 130 first sends the node index to the neighbor node access module 126, which in turn computes the index of the neighboring node and sends it back to the BK algorithm execution module 130.

Modules 132 and 136 query the saturation of the residual graph's edges during the minimum cut computation. The BK algorithm execution module 130 receives the edges' saturation status from the edge saturation tracking module 129. Module 136 also updates the saturation status of edges. The saturation status of an edge is changed by the edge saturation tracking module 129 in response to receiving an edge saturation update from the BK algorithm execution module 130.

In an embodiment, after determining the minimum cut, the BK algorithm execution module 130 passes control to the minimum cut output module 150.

In the example embodiment illustrated in FIG. 1, the minimum cut output module 150 forwards the output minimum cut 168 back to the client application 160.

Method for Speeding up the Minimum Cut Computation

FIG. 7 is a flowchart 700 illustrating steps involved in speeding up the minimum cut computation for graphs with grid-like topologies, in accordance with an exemplary embodiment of the present methods, systems, and computer readable media.

More particularly, flowchart 700 illustrates the steps by which optimized minimum cut computation is performed, as described above and depicted in FIGS. 1 and 3-6. Flowchart 700 is described with reference to the embodiments of FIGS. 1 and 3-6. However, flowchart 700 is not limited to those example embodiments. Note that the steps in the flowchart do not necessarily have to occur in the order shown.

The method begins at step 725 where an input grid-like graph is received. In an embodiment, this step comprises receiving input grid-like graph 166 from the domain-specific graph generator 164 described above with reference to FIG. 1. Step 725 can be performed by initialization module 140. After the input grid-like graph is received, the method proceeds to step 727.

In step 727, a size of a block and a size of the padded grid are determined. According to an embodiment, this step can be performed by speedup module 120. After the sizes of the block and padded grid are determined, the method proceeds to step 729.

In step 729, a memory pool is allocated for arrays and auxiliary data structures. According to embodiments, arrays 305, 307, 309, 310, 312 and 314 described above with reference to FIG. 3 are allocated in this step. In an embodiment, step 729 comprises allocating the compact and static data structures described above with reference to FIG. 3. After a memory pool is allocated for arrays and the auxiliary data structures, control is passed to step 731.

In step 731, for each node of the grid-like graph input in step 725, steps 733-741 are iterated. Thus, step 731 comprises repeating steps 733-741 for each node in the input grid-like graph. In embodiments, steps 733-741 can be performed by initialization module 140. Steps 733-741 are described in relation to a current node being processed in the input grid-like graph received in step 725. Each of these iterated steps is described below.

In step 733, an array index is computed for the current node. In accordance with an embodiment, this step can be performed by node index generator 122 described above with reference to FIG. 1. After the node's array index is computed, the method proceeds to step 735.

In step 735, the residual capacities of the current node's outgoing edges are initialized. According to an embodiment, the residual capacities (rc) are initialized to the values of the input graph edges' capacities. After initializing the residual capacities of the node's outgoing edges, the method proceeds to step 737.

In step 737, the path from a source terminal through the node to a sink terminal is augmented. According to an embodiment, this step can be performed by initialization module 140 in cooperation with speedup module 120 described above with reference to FIG. 1. After the path from a source terminal to a sink terminal through the current node is augmented, the method proceeds to step 739.

In step 739, the current node is activated if it remains connected to a terminal. In this step, if it is determined that the current node is still connected to a terminal, the node is activated and control is passed to step 741. If it is determined that the current node is no longer connected to a terminal, then the method proceeds to step 741 without activating the node.

In step 741, the current node's tree membership is initialized. In accordance with an embodiment, this step can be performed by initialization module 140 in cooperation with array based graph and tree representation module 124. After the node's tree membership is initialized, control is passed to step 743.

In step 743, the BK algorithm is executed. According to an embodiment, this step can be performed by the BK algorithm execution module 130. As shown in FIG. 7, step 743 comprises steps 745-753. Each of these steps is described below.
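Purely as an illustration, initialization steps 727 through 741 might be sketched in Python as follows. The sketch assumes a 2-dimensional 4-connected grid, 8×8 blocks, and a single signed terminal capacity per node. The names padded, node_index and initialize, the tree codes, and the edge_cap callback are hypothetical and are not taken from the specification.

```python
from array import array

BLOCK = 8                     # assumed block extent in each dimension
FREE, SOURCE, SINK = 0, 1, 2  # hypothetical tree-membership codes
K = 4                         # outgoing edges per node on a 4-connected grid


def padded(extent, block=BLOCK):
    """Round extent up to the next multiple of block (step 727)."""
    return (extent + block - 1) // block * block


def node_index(x, y, padded_width):
    """Blocked array index from grid coordinates (step 733, assumed layout)."""
    block = (y >> 3) * (padded_width >> 3) + (x >> 3)
    return (block << 6) + ((y & 7) << 3) + (x & 7)


def initialize(width, height, edge_cap, source_cap, sink_cap):
    pw, ph = padded(width), padded(height)
    n = pw * ph
    # Step 729: one array per field, sized once for the padded grid.
    rc = [array("i", [0]) * n for _ in range(K)]
    rc_terminal = array("i", [0]) * n
    tree = array("b", [FREE]) * n
    active, flow = [], 0
    for y in range(height):            # step 731: visit every input node
        for x in range(width):
            u = node_index(x, y, pw)   # step 733
            for e in range(K):         # step 735: copy input capacities
                rc[e][u] = edge_cap(x, y, e)
            s, t = source_cap[y][x], sink_cap[y][x]
            flow += min(s, t)          # step 737: source -> node -> sink
            rc_terminal[u] = s - t     # signed leftover (an assumption)
            if rc_terminal[u] != 0:    # step 739: still tied to a terminal
                active.append(u)
            # Step 741: membership follows the terminal that still has capacity.
            tree[u] = SOURCE if s > t else SINK if t > s else FREE
    return pw, ph, rc, rc_terminal, tree, active, flow
```

Dummy nodes introduced by padding keep zero capacities and remain free, so they never take part in the search.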
In step 745, the search trees are grown. In an embodiment, this step can be performed by tree growing module 132 when it is invoked by the BK algorithm execution module 130. The trees are grown by expanding active nodes to their neighbors. Indices of neighboring nodes can be retrieved from the neighbor node access module 126 based on the index of the expanded node. Search trees are grown to neighboring nodes that are connected to active nodes by non-saturated edges only. The saturation status of a node's outgoing edge can be retrieved from the edge saturation tracking module 129. When the saturation status of the reverse edge is queried instead, the reverse edge's index can be obtained from the reverse edge access module 128 first. Access to individual fields of each node can be provided by the array based graph and tree representation module 124. After the search trees are grown, control is passed to step 747.

In step 747, an evaluation is made regarding whether an augmenting path has been found. In this step, if it is determined that an augmenting path has not been found, this means that the minimum cut has been determined and control is passed to step 749 where the minimum cut is output. If it is determined that an augmenting path has been found, then control is passed to step 751.

In step 751, the path is augmented. According to an embodiment, step 751 can be performed by path augmenting module 134 when it is invoked by the BK algorithm execution module 130. Path augmentation is performed by traversing each tree to its root, decrementing residual capacities of edges in the path direction and incrementing residual capacities of reverse edges. Access to nodes' fields, which contain the residual capacities and the trees' structure, can be provided by the array based graph and tree representation module 124. When a reverse edge is accessed, its index is retrieved from the reverse edge access module 128 first. During augmentation, at least one of the edges along the path becomes saturated. The saturation status of these edges can be updated by the edge saturation tracking module 129. Nodes that are connected to their parents by saturated edges are orphaned. After the path is augmented, the method proceeds to step 753.

In step 753, orphan nodes are adopted. In accordance with an embodiment, this step can be performed by orphan adopting module 136. During adoption, a search for a new parent is performed for each orphaned node. The search tries to find the parent among the orphaned node's neighbors, which are connected by non-saturated edges and reside in the same tree as the orphaned node. Indices of a node's neighbors can be retrieved from the neighbor node access module 126. The edges' saturation status can be obtained from the edge saturation tracking module 129. If no parent was found, the node's tree membership is changed; otherwise the tree structure is updated. The trees' structure and the tree membership of each node are contained in nodes' fields. Access to these fields can be provided by the array based graph and tree representation module 124. After any orphan nodes are adopted, control is passed back to step 745.

In step 749, the minimum cut is output. In an embodiment, step 749 can be performed by minimum cut output module 150, which forwards the output minimum cut identified in step 747 to the client application 160. After the minimum cut is output, the method proceeds to step 755 where the memory pool allocated in step 729 is de-allocated and the method ends.

Example Computer System Implementation

Various aspects of the present methods, systems, and computer readable media can be implemented by software compiled in a process to form a specific purpose computer, firmware, hardware, or a combination thereof. FIG. 8 illustrates an example computer system 800 in which the present methods, systems, and computer readable media, or portions thereof, can be implemented as computer-readable code stored on a computer readable media that when read can carry out the functions and process identified herein. For example, system 100 of FIG. 1 and the methods illustrated by flowchart 700 of FIG. 7 can be implemented in computer system 800 using hardware, compiled software, firmware, non-transitory computer readable media having instructions stored thereon, or a combination thereof and may be implemented in one or more computer systems or other processing systems.

Various embodiments of the invention are described in terms of this example computer system 800.

After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.

A computer system 800 includes one or more processors, such as a processor 804. A processor 804 can be a special purpose or a general purpose processor. The processor 804 is connected to a communication infrastructure 806 (for example, a bus, or network).

The computer system 800 also includes a main memory 808, preferably random access memory (RAM), and may also include a secondary memory 810. The secondary memory 810 may include, for example, a hard disk drive 812, a removable storage drive 814, flash memory, a memory stick, and/or any similar non-volatile storage mechanism. The removable storage drive 814 may comprise a floppy disk drive, a magnetic tape drive, an optical disk drive, a flash memory, or the like. The removable storage drive 814 reads from and/or writes to a removable storage unit 815 in a well known manner. The removable storage unit 815 may comprise a floppy disk, magnetic tape, optical disk, etc. which is read by and written to by the removable storage drive 814. As will be appreciated by persons skilled in the relevant art(s), the removable storage unit 815 includes a non-transitory computer usable storage medium having stored therein computer software and/or data.

In alternative implementations, secondary memory 810 may include other similar means for allowing computer programs or other instructions to be loaded into the computer system 800. Such means may include, for example, a removable storage unit 822 and an interface 820. Examples of such means may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 822 and interfaces 820 which allow software and data to be transferred from the removable storage unit 822 to the computer system 800.

The computer system 800 may also include a communications interface 824. The communications interface 824 allows software and data to be transferred between computer system 800 and external devices. The communications interface 824 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, or the like. Software and data transferred via communications interface 824 are in the form of signals which may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 824. These signals are provided to communications interface 824 via a communications path 826. The communications path 826 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link or other communications channels.

In this document, the terms computer program medium, non-transitory computer readable medium, and computer
usable medium are used to generally refer to media such as removable storage unit 818, removable storage unit 822, and a hard disk installed in hard disk drive 812. Signals carried over communications path 826 can also embody the logic described herein. Computer program medium and computer usable medium can also refer to memories, such as main memory 808 and secondary memory 810, which can be memory semiconductors (e.g. DRAMs, etc.). These computer program products are means for providing software to the computer system 800.

Computer programs (also called computer control logic) are stored in the main memory 808 and/or the secondary memory 810. Computer programs may also be received via the communications interface 824. Such computer programs, when executed, enable the computer system 800 to implement the present methods, systems, and computer readable media as discussed herein. In particular, the computer programs, when executed, enable processor 804 to implement the processes of the present methods, systems, and computer readable media, such as the steps in the methods illustrated by flowchart 700 of FIG. 7 discussed above. Accordingly, such computer programs represent controllers of the computer system 800. Where the methods, systems, and computer readable media are implemented using software, the software may be stored in a computer program product and loaded into the computer system 800 using the removable storage drive 814, interface 820, hard drive 812, or communications interface 824.

The methods, systems, and computer readable media can also be implemented as computer program products comprising software stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein. Embodiments of the invention may employ any suitable computer useable or readable medium, known now or developed in the future. Examples of computer useable mediums include, but are not limited to, primary storage devices (e.g., any type of random access memory), secondary storage devices (e.g., hard drives, floppy disks, CD ROMS, ZIP disks, tapes, magnetic storage devices, optical storage devices, MEMS, nanotechnological storage device, etc.), and communication mediums (e.g., wired and wireless communications networks, local area networks, wide area networks, intranets, etc.).

Conclusion

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art(s) that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims. For example, in the above embodiments and description, the invention has been described with reference to particular examples, such as graphs having topology of a 2-dimensional 4-connected grid. It should be understood that the invention is not limited to these examples. The invention is applicable to any elements operating as described herein. Accordingly, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:

1. A method of speeding up computation of a minimum cut in a graph with a topology of an n-dimensional k-connected grid, wherein the computation is performed using a variant of the Boykov-Kolmogorov algorithm, the method comprising:
    determining a size of an n-dimensional rectangular block;
    padding the grid with a number of dummy nodes to make the extents of the grid evenly divisible by the size of the block in each dimension;
    dividing the grid into blocks of a determined size;
    representing a residual graph without explicit connectivity information, wherein the connectivity information is determined on the fly and is based on the graph's known regular structure;
    associating a set of fields with each node, wherein the fields comprise:
        residual capacities of the node's k outgoing edges;
        a residual capacity of an edge connecting the node to a terminal;
        a tree membership field indicating the node's membership to one of the search trees;
        an index of the node's parent node;
        an index of an edge connecting the node to its parent; and
        the node's timestamp;
    arranging the fields associated with each node into separate arrays, wherein each array corresponds to a single field and each array element corresponds to a single node;
    assigning an array index to each node based on the node's grid coordinates;
    defining a fixed ordering of the node's k outgoing edges;
    assigning an integer index to each of the node's k outgoing edges based on the fixed ordering;
    accessing the node's neighbors by computing the neighbors' array indices based on the array index of the node; and
    accessing the reverse edge of the node's outgoing edge by determining the integer index of the neighbor's outgoing edge in the opposite direction.

2. The method of claim 1, wherein the size of the block is determined such that the block's extent is a power of two in each dimension and the number of nodes inside the block is equal to the size of a cache line.

3. The method of claim 1, wherein the array indices are assigned to the nodes by:
    proceeding through the blocks in a scan line order; and
    assigning consecutive array indices to the nodes inside each block in a scan line order.

4. The method of claim 3, wherein the node's assigned array index is computed based on the node's grid coordinates by performing operations comprising: additions, multiplications, logical shifts, and bitwise conjunctions.

5. The method of claim 1, wherein the array indices of the node's neighbors are computed based on the node's array index by performing operations comprising: additions, bitwise negations, bitwise conjunctions, and conditional moves.

6. The method of claim 1, wherein the integer index of the reverse edge is determined based on the integer index of the edge using a lookup table.

7. The method of claim 1, wherein a saturation status of the k outgoing edges is represented by an additional k-bit saturation field, wherein the saturation field is updated whenever the saturation status of any of the outgoing edges changes.

8. The method of claim 7, wherein the saturation field is merged with the node's tree membership field and the merged field is arranged into a single array.

9. The method of claim 1, wherein the first element of each array is aligned at an address evenly divisible by the size of a cache line.

10. The method of claim 1, wherein the arrays and auxiliary data structures are stored in a pre-allocated memory pool of finite size.
11. A non-transitory computer readable storage medium having instructions stored thereon that, in response to execution by a computing device, cause the computing device to perform operations for speeding up computation of a minimum cut in a graph with a topology of an n-dimensional k-connected grid, wherein the computation is performed using a variant of the Boykov-Kolmogorov algorithm, comprising:
    determining a size of an n-dimensional rectangular block;
    padding the grid with a number of dummy nodes to make the extents of the grid evenly divisible by the size of the block in each dimension;
    dividing the grid into blocks of a determined size;
    representing a residual graph without explicit connectivity information, wherein the connectivity information is determined on the fly and is based on the graph's known regular structure;
    associating a set of fields with each node, wherein the fields comprise:
        residual capacities of the node's k outgoing edges;
        a residual capacity of an edge connecting the node to a terminal;
        a tree membership field indicating the node's membership to one of the search trees;
        an index of the node's parent node;
        an index of an edge connecting the node to its parent; and
        the node's timestamp;
    arranging the fields associated with each node into separate arrays, wherein each array corresponds to a single field and each array element corresponds to a single node;
    assigning an array index to each node based on the node's grid coordinates;
    defining a fixed ordering of the node's k outgoing edges;
    assigning an integer index to each of the node's k outgoing edges based on the fixed ordering;
    accessing the node's neighbors by computing the neighbors' array indices based on the array index of the node; and
    accessing the reverse edge of the node's outgoing edge by determining the integer index of the neighbor's outgoing edge in the opposite direction.

12. The computer readable storage medium of claim 11, wherein the size of the block is determined such that the block's extent is a power of two in each dimension and the number of nodes inside the block is equal to the size of a cache line.

13. The computer readable storage medium of claim 11, wherein the array indices of the node's neighbors are computed based on the node's array index by performing operations comprising: additions, bitwise negations, bitwise conjunctions, and conditional moves.

14. The computer readable storage medium of claim 11, wherein the integer index of the reverse edge is determined based on the integer index of the edge using a lookup table.

15. The computer readable storage medium of claim 11, wherein a saturation status of the k outgoing edges is represented by an additional k-bit saturation field, wherein the saturation field is updated whenever the saturation status of any of the outgoing edges changes.

16. The computer readable storage medium of claim 15, wherein the saturation field is merged with the node's tree membership field and the merged field is arranged into a single array.

17. The computer readable storage medium of claim 11, wherein the first element of each array is aligned at an address evenly divisible by the size of a cache line.

18. The computer readable storage medium of claim 11, wherein the arrays and auxiliary data structures are stored in a pre-allocated memory pool of finite size.

19. A minimum cut computation system comprising:
    a processor; and
    a memory arranged to store executable instructions to cause the processor to speed up computation of a minimum cut in a graph with a topology of an n-dimensional k-connected grid, wherein the computation is performed using a variant of the Boykov-Kolmogorov algorithm by performing operations comprising:
        determining a size of an n-dimensional rectangular block;
        padding the grid with a number of dummy nodes to make the extents of the grid evenly divisible by the size of the block in each dimension;
        dividing the grid into blocks of a determined size;
        representing a residual graph without explicit connectivity information, wherein the connectivity information is determined on the fly and is based on the graph's known regular structure;
        associating a set of fields with each node, wherein the fields comprise:
            residual capacities of the node's k outgoing edges;
            a residual capacity of an edge connecting the node to a terminal;
            a tree membership field indicating the node's membership to one of the search trees;
            an index of the node's parent node;
            an index of an edge connecting the node to its parent; and
            the node's timestamp;
        arranging the fields associated with each node into separate arrays, wherein each array corresponds to a single field and each array element corresponds to a single node;
        assigning an array index to each node based on the node's grid coordinates;
        defining a fixed ordering of the node's k outgoing edges;
        assigning an integer index to each of the node's k outgoing edges based on the fixed ordering;
        accessing the node's neighbors by computing the neighbors' array indices based on the array index of the node; and
        accessing the reverse edge of the node's outgoing edge by determining the integer index of the neighbor's outgoing edge in the opposite direction.

20. The system of claim 19, wherein the node's assigned array index is computed based on the node's grid coordinates by performing operations comprising: additions, multiplications, logical shifts, and bitwise conjunctions.

* * * * *
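For illustration only, the reverse edge lookup table and the k-bit saturation field merged with tree membership, as recited in the dependent claims above, might be sketched in Python as follows. The sketch assumes k = 4, an edge ordering of left, right, top, bottom, and the convention that a set bit means the edge is saturated. All names and the bit layout are hypothetical and are not taken from the specification.

```python
K = 4
LEFT, RIGHT, TOP, BOTTOM = range(K)   # assumed fixed ordering of edges
REVERSE = (RIGHT, LEFT, BOTTOM, TOP)  # lookup table: edge -> reverse edge
SAT_MASK = (1 << K) - 1               # low K bits hold saturation flags


def is_saturated(field, e):
    """True if outgoing edge e is flagged as saturated."""
    return bool((field >> e) & 1)


def set_saturated(field, e, saturated):
    """Return field with the flag of edge e set or cleared."""
    bit = 1 << e
    return field | bit if saturated else field & ~bit


def tree_of(field):
    """Tree membership is stored above the saturation bits."""
    return field >> K


def with_tree(field, tree):
    """Return field with its tree membership replaced."""
    return (field & SAT_MASK) | (tree << K)


def push(rc, merged, u, v, e, amount):
    """Push flow along edge e from u to neighbor v and refresh both flags."""
    r = REVERSE[e]
    rc[e][u] -= amount
    rc[r][v] += amount
    merged[u] = set_saturated(merged[u], e, rc[e][u] == 0)
    merged[v] = set_saturated(merged[v], r, rc[r][v] == 0)
```

Keeping the flags next to the tree membership means a growth or adoption step can test an edge without reading the residual capacity arrays, at the cost of updating the flags on every capacity change.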
