
Sparse Matrix Ordering

Semestral Project for the Combinatorial Optimization Course: Final Report
Ondřej Ivančík
ivancond@fel.cvut.cz

May 03, 2011

1 Formulation of the Problem

According to the requirements of the Combinatorial Optimization course, I have chosen the sparse matrix ordering problem. The goal is to find a permutation of the rows and columns of a matrix such that the number of multiplications and divisions needed to solve the sparse linear equation system is minimal.
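To make the goal concrete: in MATLAB, a simultaneous permutation of rows and columns is applied by indexing with a permutation vector. A minimal sketch (the matrix and the permutation below are arbitrary examples, not from the report):

    % Symmetric permutation of a sparse matrix by a permutation vector p.
    A = sparse([4 1 0; 1 4 1; 0 1 4]);  % a small symmetric sparse matrix
    p = [3 1 2];                        % some permutation of 1..n
    B = A(p, p);                        % rows and columns permuted together
    disp(full(B))                       % display the reordered matrix

Solving the permuted system B*y = b(p) then gives the solution of the original system up to the same permutation, since A(p,p)*x(p) = b(p).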

1.1 Ordering Schema

Node ordering is important in minimizing the number of multiplications and divisions (mul. and div.) required for both the L and U triangularization and the forward/backward substitution (FBS). A good ordering results in the addition of few fills to the triangular factors during the LU factorization (LUF) process. A fill is a non-zero element in the L or U matrix that was zero in the original matrix A. If A is a full matrix, α = (n³ − n)/3 mul. and div. are required for the LUF process and β = n² mul. and div. are required for the FBS. The number of mul. and div. required can be substantially reduced in a sparse matrix solution if a proper node ordering is used. [1, p. 111]

1.2 Example

The number of operations required for the solution of the system Ax = b is examined in Figure 1. Even though both matrices have the same number of non-zero elements, there is a significant computation reduction from a simple permutation of rows and columns. This is because of the fills that occur during LUF. A good ordering scheme is one in which the resulting matrix has a structure similar to the original matrix A, which means that the number of fills is minimal.

Figure 1: (a) Left-up arrow sparse matrix: the LUF requires α = 40 and the FBS β = 25 mul. and div., so α + β = 65 operations in total; the sparse matrix becomes full during LUF. (b) Right-down arrow sparse matrix: the LUF requires α = 8 and the FBS β = 13 mul. and div., so α + β = 21 operations in total; the sparse matrix remains sparse during LUF.
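The effect shown in Figure 1 is easy to reproduce in MATLAB by counting the nonzeros that the LU factorization creates. A rough sketch, assuming a 5x5 arrow-shaped test matrix (it counts fill-ins rather than the operation counts α and β):

    % Fill-in of an arrow matrix under the original and reversed ordering.
    n = 5;
    A = full(4*eye(n));
    A(1, :) = A(1, :) + 1;           % full first row ...
    A(:, 1) = A(:, 1) + 1;           % ... and column: left-up arrow
    p = n:-1:1;                      % reversal gives the right-down arrow
    [L1, U1] = lu(A);                % factorize the left-up arrow
    [L2, U2] = lu(A(p, p));          % factorize the right-down arrow
    fills1 = nnz(L1 + U1) - nnz(A);  % fill-ins without reordering
    fills2 = nnz(L2 + U2) - nnz(A);  % fill-ins after reordering
    fprintf('fill-ins: left-up %d, right-down %d\n', fills1, fills2);

For this matrix the left-up arrow fills in completely during elimination, while the right-down arrow produces no fill at all.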

1.3 Complexity

The problem of finding an optimal ordering is NP-complete [2], but several schemes have been developed that provide nearly optimal results.

2 Overview of the Methods [4]

There are many recent works about ordering schemes, because specific problems give rise to specific types of sparse matrices (band-diagonal, block triangular, block tridiagonal, ...) [3, p. 77]. The most used methods are described below. They can be divided into two categories according to how the elimination tree is built; most state-of-the-art ordering schemes for sparse matrices are a hybrid of a bottom-up method such as minimum degree and a top-down scheme such as George's nested dissection.

To explain the ordering methods, some notation must be given. Sparse matrices are represented as undirected graphs (the sparse matrix has the structure of the adjacency matrix of this graph). All schemes are described for the undirected graph G = (V, E), E ⊆ V × V, associated with the symmetric matrix S. Let v be a vertex of G; the set of vertices adjacent to v is denoted by adj_G(v).

2.1 Bottom-up Ordering

Bottom-up methods build the elimination tree from the leaves up to the root. In each iteration k a greedy heuristic is applied to G_{k−1} to select a vertex for elimination. This section briefly describes two of the most popular bottom-up algorithms, the minimum degree and the minimum deficiency ordering heuristics.

2.1.1 Minimum Degree Ordering

As mentioned above, at each iteration k the minimum degree (MD) algorithm eliminates a vertex v that minimizes deg_{G_{k−1}}(v) = |adj_{G_{k−1}}(v)|. The algorithm is a symmetric variant of the Markowitz scheme [5] and was first applied to sparse symmetric factorization by Tinney and Walker [6]. Over the years many enhancements have been proposed to the basic algorithm that have greatly improved its efficiency.
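As an illustration of this greedy loop, a naive MATLAB sketch on a dense elimination graph (the function name is mine, and none of the mentioned enhancements, such as quotient graphs, are included):

    function p = naive_min_degree(A)
    % Naive minimum degree ordering for a symmetric sparse matrix A.
    % Sketch only: real MD codes use quotient graphs and mass elimination.
    n = size(A, 1);
    G = full(logical(A)) & ~logical(eye(n));  % adjacency, empty diagonal
    alive = true(n, 1);
    p = zeros(n, 1);
    for k = 1:n
        deg = sum(G, 2);                    % current vertex degrees
        deg(~alive) = inf;                  % skip eliminated vertices
        [~, v] = min(deg);                  % vertex of minimum degree
        p(k) = v;
        nb = find(G(v, :));                 % neighbours of v
        G(nb, nb) = true;                   % eliminating v clique-connects
        G(sub2ind([n n], nb, nb)) = false;  % its neighbours (no self-loops)
        G(v, :) = false; G(:, v) = false;   % remove v from the graph
        alive(v) = false;
    end
    end

Calling p = naive_min_degree(A) and factorizing A(p, p) should then produce fewer fill-ins than factorizing A directly.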

2.1.2 Minimum Deficiency Fill

A less popular bottom-up scheme is the minimum deficiency or minimum local fill (MF) heuristic, in which the exact amount of fill is used to select a vertex for elimination. The minimum deficiency algorithm has received much less attention because of its prohibitive runtime.

2.2 Top-down Ordering

The most popular top-down scheme is George's nested dissection (ND) algorithm [7, 8]. The basic idea of this approach is to find a subset of vertices S in G whose removal partitions G into two subgraphs G(B) and G(W) with V = S ∪ B ∪ W and |B|, |W| ≤ α|V| for some 0 < α < 1. Such a partition of G is denoted by (S, B, W). The set S is called a vertex separator of G. If we order the vertices in S after the (black) vertices in B and the (white) vertices in W, no fill-edge can occur between B and W. Typically, the columns corresponding to S constitute a full off-diagonal block in the Cholesky factor; therefore, S should be small. Once S has been found, the algorithm is applied recursively to each connected component of G(B) and G(W) until a component consists of a single vertex or a clique. In this way the elimination tree is built from the root down to the leaves.

Graph partitioning heuristics are usually divided into construction and improvement heuristics. A construction heuristic takes the graph as input and computes an initial separator from scratch. An improvement heuristic tries to minimize the size of a separator through a sequence of elementary steps.

3 Multilevel Nested Dissection Method

As some ordering methods are already implemented in MATLAB as standard functions (colperm, symrcm, colamd, symamd, amd, dmperm), I decided to study and implement the nested dissection ordering method.

3.1 Nested Dissection

The fundamental part of this method is finding a graph partitioning (bisection) into two equal partitions that minimizes the edge cut. After that, the same procedure is performed recursively on the bisections. The skeleton is shown in Algorithm 1. The termination condition of the recursion is the size of the matrix A. For incomplete nested dissection methods, the function small_ordering can be some simpler ordering method; for complete nested dissection, there is instead a condition on whether the matrix A can be bisected, and if not, the trivial permutation is returned.

The first step is to create the graph structure G from the input sparse matrix A; in fact, A is the adjacency matrix of this graph. The function MultilevelBisection returns a vector indicating which partition each node belongs to: for an undirected graph G = (V, E), the bipartitions are subgraphs G1 and G2 of G with G1 ∩ G2 = ∅, and B(i) = 0 if V(i) ∈ G1 and B(i) = 1 if V(i) ∈ G2. The vertex separator S consists of the nodes that have some adjacent nodes in G1 and some in G2; removing these nodes splits the graph into two independent components. A great benefit of this separation is that the submatrices constructed from these components can be factorized in parallel (although for this purpose, k-way graph partitioning algorithms are better suited). The permutation is reconstructed after the recursion terminates (function permutation): the two bisected components of the graph are ordered first and the separators (dependent elements) last. However, the part of the matrix corresponding to the separators must be factorized only after the factorizations of the independent submatrices have been computed.

Algorithm 1 NestedDissection(A)
Require: Sparse matrix A
Ensure: Permutation vector P for matrix A
1: if is_small(A) then
2:   P ← small_ordering(A)
3:   return P
4: end if
5: G ← create_graph(A)
6: B ← MultilevelBisection(G)
7: S ← vertex_separator(G, B)
8: A1 ← first_submatrix(G, B, S)
9: P1 ← NestedDissection(A1)
10: A2 ← second_submatrix(G, B, S)
11: P2 ← NestedDissection(A2)
12: P ← permutation(P1, P2, S)
13: return P
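Algorithm 1 translates almost directly into MATLAB. A sketch, where multilevel_bisection and vertex_separator are assumed helper functions behaving as described in this section and the next (the recursion threshold and the symamd fallback are arbitrary choices of mine):

    function P = nested_dissection(A)
    % Sketch of Algorithm 1 for a symmetric sparse matrix A.
    n = size(A, 1);
    if n <= 8                            % is_small(A): stop the recursion
        P = symamd(A);                   % small_ordering: any simple scheme
        P = P(:);
        return
    end
    B = multilevel_bisection(A);         % assumed: n-by-1 vector of 0/1 labels
    S = vertex_separator(A, B);          % assumed: n-by-1 logical separator mask
    i1 = find(B == 0 & ~S);              % first independent component
    i2 = find(B == 1 & ~S);              % second independent component
    P1 = nested_dissection(A(i1, i1));   % order each component recursively
    P2 = nested_dissection(A(i2, i2));
    P = [i1(P1); i2(P2); find(S)];       % separator vertices come last
    end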

3.2 Multilevel Bisection

For computation speedup, the bisection is computed iteratively, from an initial partition of the coarsened graph up to the finer graphs, and the partition is refined at each step (though sometimes refining at every second or third step is enough). The body of MultilevelBisection in Algorithm 2 is simple. The coarsening of the graph is based on maximal independent edge matching, but several other methods could be used (Subsection 3.4). To find the initial bisection, spectral partitioning is used (Subsection 3.5). The refinement of the bisection partly depends on the method of the initial bisection; here the Rayleigh quotient iteration is used (Subsection 3.6).

Algorithm 2 MultilevelBisection(G)
Require: Graph G
Ensure: Bisection B of graph G
1: GO ← G {original graph}
2: while not is_small(G) do
3:   G ← coarse_graph(G)
4: end while
5: B ← initial_bisection(G)
6: while G ≠ GO do
7:   G ← uncoarse_graph(G)
8:   B ← refine_bisection(B)
9: end while
10: return B

3.3 Graph Coarsening

The function coarse_graph could be based on multiple coarsening methods; I have chosen the maximal independent edge matching described below. Edges that are matched are removed in the coarsening step: their end nodes are merged together, and the cost of the new node is the sum of the costs of the merged nodes. Edges that become parallel after the merge are also merged together, and their costs are summed. The mapping of the node and edge indices from the finer graph to the coarsened graph must be stored in order to interpolate the Fiedler vector later. A demonstration of the coarsening steps is shown in the image appendix (Figure 2).

3.4 Maximal Independent Edge Matching

This greedy algorithm sorts the edges descendingly by cost and puts an edge into the matching if both of its end nodes are open; it then closes these nodes and continues with the next edge of maximal cost. The procedure is given in Algorithm 3.

Algorithm 3 MIEM(E, V)
Require: Set of edges E and vertices V
Ensure: Maximal independent edge matching M
1: state(v_i) ← OPEN for all i ∈ {1, ..., n}
2: H ← E
3: M ← ∅
4: while H ≠ ∅ do
5:   e_max ← max_cost(H)
6:   n− ← adj−(e_max)
7:   n+ ← adj+(e_max)
8:   if state(n−) = OPEN ∧ state(n+) = OPEN then
9:     M ← M ∪ {e_max}
10:    state(n−) ← CLOSED
11:    state(n+) ← CLOSED
12:  end if
13:  H ← H \ {e_max}
14: end while
15: return M
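The same matching is compact in MATLAB when the graph is given as an edge list. A sketch, assuming an m-by-2 index array E and an m-by-1 cost vector c (both names are mine):

    function M = miem(E, c)
    % Maximal independent edge matching, sketch of Algorithm 3.
    % E(k, :) are the two end nodes of edge k, c(k) its cost.
    [~, order] = sort(c, 'descend');        % visit edges by decreasing cost
    open = true(max(E(:)), 1);              % every node starts OPEN
    M = false(size(E, 1), 1);               % logical mask of matched edges
    for k = order'                          % k runs over e_max candidates
        if open(E(k, 1)) && open(E(k, 2))   % both end nodes still OPEN?
            M(k) = true;                    % put the edge into the matching
            open(E(k, :)) = false;          % CLOSE both end nodes
        end
    end
    end

The matched edges are then E(M, :), and merging their end nodes yields the coarsened graph.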

3.5 Spectral Partitioning

The initial partitioning is computed from the eigenvector corresponding to the second smallest eigenvalue (called the algebraic connectivity) of the Laplace matrix of the graph. This eigenvector is also referred to as the Fiedler vector. The partition is obtained by splitting the vector at its mean value (Algorithm 4).

Algorithm 4 SpectralPartition(L)
Require: Laplace matrix L of a graph
Ensure: Partition of the graph into two partitions as equal as possible, minimizing the edge cut between them
1: v ← fiedler(L)
2: m ← mean(v)
3: if v_i < m then
4:   p_i ← 0
5: else
6:   p_i ← 1
7: end if
8: return p
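A MATLAB counterpart of Algorithm 4; a sketch using a dense eigendecomposition, which is adequate for small graphs (for large graphs, eigs with a shift would be preferable; the function name is mine, and A is assumed to have an empty diagonal):

    function p = spectral_partition(A)
    % Sketch of Algorithm 4. A is the adjacency matrix of the graph.
    L = diag(sum(A, 2)) - A;       % graph Laplacian L = D - A
    [V, D] = eig(full(L));         % dense eigendecomposition
    [~, idx] = sort(diag(D));      % eigenvalues in ascending order
    v = V(:, idx(2));              % Fiedler vector (2nd smallest eigenvalue)
    p = double(v >= mean(v));      % 0/1 side labels, split at the mean
    end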

3.6 Rayleigh Quotient Iteration

The Rayleigh quotient iteration (RQI) refines the interpolated eigenvector: the Fiedler vector obtained from a coarsened graph by the function fiedler in SpectralPartition is interpolated to a vector corresponding to the finer graph, and RQI refines it. Since the interpolation is a good guess, RQI terminates after a few steps.

Algorithm 5 RQI(v, L)
Require: Approximation v of the Fiedler vector and Laplace matrix L of a graph
Ensure: Refined Fiedler vector of the graph
1: λ ← vᵀLv
2: while (Lv − λv)ᵀ(Lv − λv) > eps² do
3:   solve (L − λI)x = v
4:   v ← x / ‖x‖
5:   λ ← vᵀLv
6: end while
7: return v
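Algorithm 5 also maps directly to MATLAB. A sketch, where tol plays the role of eps in the pseudocode; near convergence the shifted system becomes nearly singular, which RQI tolerates since only the direction of x matters:

    function v = rqi(v, L, tol)
    % Rayleigh quotient iteration, sketch of Algorithm 5: refines an
    % approximate (interpolated) Fiedler vector v of the Laplacian L.
    n = size(L, 1);
    v = v / norm(v);
    lambda = v' * L * v;                  % Rayleigh quotient of the guess
    r = L*v - lambda*v;                   % eigenpair residual
    while r' * r > tol^2                  % stop when the residual is tiny
        x = (L - lambda*speye(n)) \ v;    % shifted solve (step 3)
        v = x / norm(x);                  % normalize the iterate (step 4)
        lambda = v' * L * v;              % update the quotient (step 5)
        r = L*v - lambda*v;
    end
    end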

4 Performance

The performance of this ordering method depends on many parameters. I have compared the processor time of the MATLAB ordering methods and of my symnd (symmetric nested dissection) method. The results are very bad for symnd: it is about 1000 times slower than the other methods. On the other hand, the quality of the ordering is quite good.

The quality of the orderings can also be compared very easily: I have plotted the nonzero elements of the matrices and the nonzero elements after LU factorization. Some interesting visualizations and numbers are shown in Figures 3, 4 and 5.

5 Conclusion

I have studied the nested dissection ordering method and implemented it in MATLAB. Since it was a rather hard problem in some situations, MATLAB allowed me to debug my code and visualize the results very effectively.

I had to find a lot of information and read several papers about the algorithms used in symnd. I have found that graph partitioning is a very practical problem, not only in sparse matrix ordering. It must be solved in many parallel computations to find a mapping from tasks to resources: the number of tasks per resource must be as equal as possible, and the communication between resources (i.e. dependent variables) must be minimal. Another such problem is hardware/software codesign, where it must be specified which parts of a system will be mapped to hardware and which to software; some computation limit must be satisfied (parts rather mapped to hardware) while minimizing costs (parts rather mapped to software).

This implementation does not fully exploit the advantages of the nested dissection algorithm. The factorization could begin at the lowest level of the recursion (two independent submatrices and the dependent separator part, which must be computed after the two independent submatrices) and these values could then be used to factorize the bigger part of the matrix one level higher in the recursion. In my tests, the factorization is performed on the whole ordered matrix, which causes somewhat higher numbers of fill-ins. Even though the performance of symnd is very slow, it was good practice to implement such algorithms.

References

[1] M. L. Crow, Computational Methods for Electric Power Systems, CRC Press, second edition, 2009.

[2] M. Yannakakis, Computing the minimum fill-in is NP-complete, SIAM Journal on Algebraic and Discrete Methods, 2 (1981), pp. 77-79.

[3] W. H. Press et al., Numerical Recipes, Cambridge University Press, third edition, 2007.

[4] J. Schulze, Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods.

[5] H. M. Markowitz, The elimination form of the inverse and its application to linear programming, Management Science, 3 (1957), pp. 255-269.

[6] W. F. Tinney and J. W. Walker, Direct solutions of sparse network equations by optimally ordered triangular factorization, Proc. of the IEEE, 55 (1967), pp. 1801-1809.

[7] A. George and J. W. H. Liu, An automatic nested dissection algorithm for irregular finite element problems, SIAM J. Numer. Anal., 15:5 (1978), pp. 1053-1069.

[8] A. George and J. W. H. Liu, A fast implementation of the minimum degree algorithm using quotient graphs, ACM Trans. Math. Software, 6 (1980), pp. 337-358.

Image Appendix


Figure 2: Graph coarsening steps. Costs of nodes and edges are given in brackets. Green edges indicate the edge matching; these edges are removed in the next step, and the nodes incident with them are merged together.

[Figure 3 panels, fills after lu(A): no ordering 32, colperm 32, symrcm 35, dmperm 36, colamd 32, symamd 32, amd 32, symnd 32. Legend: blue = nnz in A, red = fills in lu(A).]

Figure 3: Different ordering methods on a small matrix. Blue: the ordered matrix; red: fill-ins.

Figure 4: Results on a bigger matrix; the shape is similar to amd, but with twice as many nonzeros.

[Figure 5 panels, fills after lu(A): no ordering 13482, colperm 30627, symrcm 13497, dmperm 13482, colamd 9642, symamd 6681, amd 6583, symnd 9716. Legend: blue = nnz in A, red = fills in lu(A).]

Figure 5: Another diagonally shaped matrix.

Figure 6: Same shape, different size.

