
Parallel Computing for Computational Mechanics

Project 3

November 29, 2013

Fig. 1: Coarse Mesh Pressure Plot

Fig. 2: Fine Mesh with Projected Pressure using MPI (16 Processors)

Parallelization of Code

The optimized projection code from Project 1 is parallelized using MPI. The round-robin communication is implemented as discussed in class and practised in the last homework. The data values at the fine-mesh nodes are initialized with an arbitrarily large value (1e20) to mark them as unprojected. This avoids repeating the projection for nodes that have already been mapped on previous processors. Some parts of the projection code are listed in Appendix A.
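A minimal sketch of this sentinel idea is given below; the helper names (UNPROJECTED, init_fine_data, needs_projection) are illustrative only and are not part of the project code, which is listed in Appendix A.

/* Sketch of the sentinel marking described above (illustrative names only). */
#include <stdlib.h>

#define UNPROJECTED 1e20   /* arbitrarily large value marking unprojected nodes */

/* Allocate the fine-mesh data array and mark every value as unprojected. */
static double *init_fine_data(int nNodes_fine, int ndf)
{
    double *data_fine = (double *) malloc((size_t) nNodes_fine * ndf * sizeof(double));
    for (int i = 0; i < nNodes_fine * ndf; i++)
        data_fine[i] = UNPROJECTED;
    return data_fine;
}

/* A node still needs projection only if no previous processor has
   overwritten its sentinel value (same 1e19 threshold as in Appendix A). */
static int needs_projection(const double *data_fine, int i, int ndf)
{
    return data_fine[i * ndf] > 1e19;
}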

[Plot: Speedup factor vs. number of processors; measured speedup shown against the ideal linear speedup]

Fig. 3: Speedup factor vs. No. of Processors

From Fig. 3, it can be observed that the speedup increases sublinearly up to around 75 processors. Thereafter, it flattens and drops after 120 processors. This behaviour occurs because beyond 75 processors the communication cost starts to dominate.

The following table shows the timings with an increasing number of processors. The code was run through the batch queue on the RWTH BCS cluster.

No. of Processors    Time (s)      Speedup
Serial               304.865613     1.00
1                    314.416346     0.97
2                    147.617687     2.07
4                     86.591866     3.52
6                     62.595602     4.87
8                     48.636833     6.27
10                    39.797718     7.66
12                    33.813570     9.02
14                    29.679973    10.27
16                    26.177833    11.65
18                    23.393518    13.03
20                    21.238808    14.35
22                    19.385921    15.73
24                    18.114679    16.83
26                    16.641899    18.32
28                    15.729858    19.38
30                    14.489544    21.04
32                    13.703405    22.25
34                    13.006781    23.44
36                    12.373219    24.64
38                    11.886773    25.65
40                    11.303027    26.97
42                    11.007674    27.70
44                    10.228624    29.80
46                    10.014772    30.44
48                     9.711025    31.39
50                     9.168055    33.25
52                     9.222781    33.06
54                     8.685567    35.10
56                     8.615141    35.39
58                     8.054651    37.85
60                     7.827556    38.95
62                     7.584920    40.19
64                     7.722410    39.48
72                     6.893956    44.22
80                     6.896045    44.21
96                     6.068668    50.24
120                    5.870825    51.93
128                    6.075566    50.18

Table 1: Timings with different numbers of processors
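For reference, the speedup values in Table 1 correspond to the serial runtime divided by the parallel runtime (this definition is inferred from the reported numbers):

\[ S(p) = \frac{T_{\text{serial}}}{T_p}, \qquad \text{e.g. } S(16) = \frac{304.87\,\text{s}}{26.18\,\text{s}} \approx 11.65 \]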

Timeline

The run for generating the Vampir trace was done on the Harpertown nodes, so the timings for 4 processors differ: on BCS the run takes around 86 seconds, while on Harpertown it takes around 55 seconds.

Fig. 4: Entire Timeline for 4 processors

Fig. 5: Communication pattern: Zoomed-in Timeline for 4 processors

Appendix A: Selected Parts of the MPI Projection Code


./Project03/src/mpi_code.c

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "functions.h"
#include "mpi.h"

#define MAXI(a, b) (((a) < (b)) ? (b) : (a))

/* Partition the mesh for each processor */
void mpi_partition(int *myrank, int *nprocs,
                   int *nNodes_coarse_l, int *nElem_coarse_l,
                   int *nNodes_fine_l, int *nElem_fine_l,
                   const int nNodes_coarse, const int nElem_coarse,
                   const int nNodes_fine, const int nElem_fine)
{
    // MPI: determine nprocs and myrank
    MPI_Comm_size(MPI_COMM_WORLD, nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, myrank);

    *nElem_coarse_l = (nElem_coarse - 1) / (*nprocs) + 1;
    if ((*myrank + 1) * (*nElem_coarse_l) > nElem_coarse)
        *nElem_coarse_l = MAXI(nElem_coarse - *myrank * (*nElem_coarse_l), 0);

    *nNodes_coarse_l = (nNodes_coarse - 1) / (*nprocs) + 1;
    if ((*myrank + 1) * (*nNodes_coarse_l) > nNodes_coarse)
        *nNodes_coarse_l = MAXI(nNodes_coarse - *myrank * (*nNodes_coarse_l), 0);

    *nElem_fine_l = (nElem_fine - 1) / (*nprocs) + 1;
    if ((*myrank + 1) * (*nElem_fine_l) > nElem_fine)
        *nElem_fine_l = MAXI(nElem_fine - *myrank * (*nElem_fine_l), 0);

    *nNodes_fine_l = (nNodes_fine - 1) / (*nprocs) + 1;
    if ((*myrank + 1) * (*nNodes_fine_l) > nNodes_fine)
        *nNodes_fine_l = MAXI(nNodes_fine - *myrank * (*nNodes_fine_l), 0);

    return;
}

/* Round-robin communication */
void mpi_communicate(double *mesh, double *data, int *nNodes, const int MAX_Nodes,
                     int ndf, int dim, const int myrank, const int nprocs)
{
    int MAXCOUNT_m = MAX_Nodes * dim;
    int MAXCOUNT_d = MAX_Nodes * ndf;

    // Attach buffer in case of MPI_Bsend
    int bufsize_m = MAXCOUNT_m * sizeof(double);
    int bufsize_d = MAXCOUNT_d * sizeof(double);
    // Buffer size = Mesh Buffer + Data Buffer + Node Buffer + MPI_BSEND_OVERHEAD
    int bufsize = bufsize_m + bufsize_d + 1 + MPI_BSEND_OVERHEAD;
    double *abuf = (double *) malloc(bufsize);

    // Determine previous and next proc rank for communication
    int prev = (myrank + nprocs - 1) % nprocs;
    int next = (myrank + 1) % nprocs;

    MPI_Buffer_attach(abuf, bufsize);

    // Round-robin communication loop. This is adapted from the last homework for this projection code.
    for (int iprocs = 0; iprocs < 2; iprocs++) {
        if (iprocs > 0) {
            MPI_Recv(mesh, MAXCOUNT_m, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(data, MAXCOUNT_d, MPI_DOUBLE, prev, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(nNodes, 1, MPI_INT, prev, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        if (iprocs < 1) {
            MPI_Bsend(mesh, MAXCOUNT_m, MPI_DOUBLE, next, 0, MPI_COMM_WORLD);
            MPI_Bsend(data, MAXCOUNT_d, MPI_DOUBLE, next, 1, MPI_COMM_WORLD);
            MPI_Bsend(nNodes, 1, MPI_INT, next, 2, MPI_COMM_WORLD);
        }
    }

    MPI_Buffer_detach(&abuf, &bufsize);

    return;
}

./Project03/src/mesh_io.c

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <math.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "mpi.h"
#include "functions.h"

/* Read fine mesh, connectivity and optionally scalar values files */
void Read_Fine_Mesh(char *mxyz_file,    // Input:  (X, Y, Z) coordinates file name
                    const int nNodes,   // Input:  Number of nodes
                    const int dim,      // Input:  Number of space dimensions
                    int myrank,         // Input:  Processor rank
                    double *mesh)       // Output: Array of coordinates of nodes
{
    // Read mxyz file - local file on each processor
    MPI_File fh_m;
    MPI_Datatype ftype_m;
    int off_m = myrank * dim * nNodes * 8;

    MPI_Type_contiguous(dim, MPI_DOUBLE, &ftype_m);
    MPI_Type_commit(&ftype_m);

    MPI_File_open(MPI_COMM_WORLD, mxyz_file, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh_m);
    MPI_File_set_view(fh_m, off_m, MPI_DOUBLE, ftype_m, "native", MPI_INFO_NULL);
    MPI_File_read(fh_m, mesh, nNodes, ftype_m, MPI_STATUS_IGNORE);

    if (isLittleEndian())
        swapbytes((char *) mesh, dim * nNodes, 8);

    MPI_File_close(&fh_m);
    MPI_Type_free(&ftype_m);

    return;
}

/* Read coarse mesh, scalar values files */
void Read_Coarse_Mesh(char *mxyz_file,    // Input:  (X, Y, Z) coordinates file name
                      char *data_file,    // Input:  Data file name [optional: put NULL if you don't want to specify a name]
                      const int nNodes,   // Input:  Number of nodes
                      const int nElem,    // Input:  Number of elements
                      const int nEn,      // Input:  Number of element nodes
                      const int dim,      // Input:  Number of space dimensions
                      const int ndf,      // Input:  Number of degrees of freedom
                      int myrank,         // Input:  Processor rank
                      double *mesh,       // Output: Array of coordinates of nodes
                      double *data)       // Output: Array of data values [optional: put NULL if you don't want to specify a name]
{
    // Read mxyz file - local file on each processor
    MPI_File fh_m;
    MPI_Datatype ftype_m;
    int off_m = myrank * dim * nEn * nElem * 8;

    MPI_Type_contiguous(dim * nEn, MPI_DOUBLE, &ftype_m);
    MPI_Type_commit(&ftype_m);

    MPI_File_open(MPI_COMM_WORLD, mxyz_file, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh_m);
    MPI_File_set_view(fh_m, off_m, MPI_DOUBLE, ftype_m, "native", MPI_INFO_NULL);
    MPI_File_read(fh_m, mesh, nElem, ftype_m, MPI_STATUS_IGNORE);

    if (isLittleEndian())
        swapbytes((char *) mesh, dim * nEn * nElem, 8);

    MPI_File_close(&fh_m);

    // Read data file - local file on each processor
    MPI_File fh_d;
    MPI_Datatype ftype_d;
    int off_d = myrank * ndf * nEn * nElem * 8;

    MPI_Type_contiguous(ndf * nEn, MPI_DOUBLE, &ftype_d);
    MPI_Type_commit(&ftype_d);

    MPI_File_open(MPI_COMM_WORLD, data_file, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh_d);
    MPI_File_set_view(fh_d, off_d, MPI_DOUBLE, ftype_d, "native", MPI_INFO_NULL);
    MPI_File_read(fh_d, data, nElem, ftype_d, MPI_STATUS_IGNORE);

    if (isLittleEndian())
        swapbytes((char *) data, ndf * nEn * nElem, 8);

    MPI_File_close(&fh_d);
    MPI_Type_free(&ftype_m);
    MPI_Type_free(&ftype_d);

    return;
}

/* Write mesh, connectivity and optionally scalar values files */
void Write_Data(double *data,
                const int nNodes,
                const int nElem,
                const int nEn,
                const int dim,
                const int ndf,
                const int myrank,
                const int nprocs,
                char *data_file)
{
    // Write data file - local file on each processor
    MPI_File fh_d;
    MPI_Datatype ftype_d;
    // After the round-robin communication, the data has travelled through each processor
    // and in the end resides on the previous processor.
    int off_d = ((myrank + 1) % nprocs) * ndf * nNodes * 8;

    MPI_Type_contiguous(ndf, MPI_DOUBLE, &ftype_d);
    MPI_Type_commit(&ftype_d);

    MPI_File_open(MPI_COMM_WORLD, data_file, MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh_d);
    MPI_File_set_view(fh_d, off_d, MPI_DOUBLE, ftype_d, "native", MPI_INFO_NULL);

    if (isLittleEndian())
        swapbytes((char *) data, ndf * nNodes, 8);

    MPI_File_write(fh_d, data, nNodes, ftype_d, MPI_STATUS_IGNORE);

    MPI_File_close(&fh_d);
    MPI_Type_free(&ftype_d);

    return;
}

./Project03/src/map_pressure.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "mpi.h"
#include "functions.h"

/* Map pressure values from coarse mesh to fine mesh */
void Map_Data(double *mesh_coarse,
              double *data_coarse,
              const int nElem_coarse,
              const int nEn,
              const int dim,
              const int ndf,
              double *mesh_fine,
              double *data_fine,
              int *nNodes_fine,
              const int MAX_Nodes,
              const int myrank,
              const int nprocs)
{
    // Local reference coordinates
    double xi, eta, zeta;
    double xbar, ybar, zbar, xbar1, ybar1, zbar1, xbar2, ybar2, zbar2, xbar3, ybar3, zbar3;
    double x4, y4, z4, reci_denom;
    int iproc = 0;
    // Mapping tolerance
    double tol = initTol;
    int i, j;

    // Calculate offsets for mesh and data arrays
    int ElemOff = dim * nEn, DataOff = ndf * nEn;

    for (i = 0; i < *nNodes_fine; i++) {
        // Condition to check if node is already mapped on previous processors
        if (data_fine[i * ndf] > 1e19) {
            for (j = 0; j < nElem_coarse; j++) {
                // mesh_coarse[{0:x / 1:y / 2:z} + dim*{node no.: 0/1/2/3} + ElemOff*{elem no.: j}]
                // Optimization: Access mesh_coarse only once and use the values 4 times in subsequent calculations
                x4 = mesh_coarse[0 + dim * 3 + ElemOff * j];
                y4 = mesh_coarse[1 + dim * 3 + ElemOff * j];
                z4 = mesh_coarse[2 + dim * 3 + ElemOff * j];

                // Optimization: Access mesh_coarse and exploit spatial locality
                xbar1 = mesh_coarse[0 + dim * 0 + ElemOff * j] - x4;
                ybar1 = mesh_coarse[1 + dim * 0 + ElemOff * j] - y4;
                zbar1 = mesh_coarse[2 + dim * 0 + ElemOff * j] - z4;

                xbar2 = mesh_coarse[0 + dim * 1 + ElemOff * j] - x4;
                ybar2 = mesh_coarse[1 + dim * 1 + ElemOff * j] - y4;
                zbar2 = mesh_coarse[2 + dim * 1 + ElemOff * j] - z4;

                xbar3 = mesh_coarse[0 + dim * 2 + ElemOff * j] - x4;
                ybar3 = mesh_coarse[1 + dim * 2 + ElemOff * j] - y4;
                zbar3 = mesh_coarse[2 + dim * 2 + ElemOff * j] - z4;

                xbar = mesh_fine[i * dim]     - x4;
                ybar = mesh_fine[i * dim + 1] - y4;
                zbar = mesh_fine[i * dim + 2] - z4;

                // Only one division to calculate reciprocal of common denominator
                reci_denom = 1.0 / (xbar1 * (ybar2 * zbar3 - ybar3 * zbar2)
                                  - xbar2 * (ybar1 * zbar3 - ybar3 * zbar1)
                                  + xbar3 * (ybar1 * zbar2 - ybar2 * zbar1));

                xi   = ((ybar2 * zbar3 - ybar3 * zbar2) * xbar + (xbar3 * zbar2 - xbar2 * zbar3) * ybar
                      + (xbar2 * ybar3 - xbar3 * ybar2) * zbar) * reci_denom;

                eta  = ((ybar3 * zbar1 - ybar1 * zbar3) * xbar + (xbar1 * zbar3 - xbar3 * zbar1) * ybar
                      + (xbar3 * ybar1 - xbar1 * ybar3) * zbar) * reci_denom;

                zeta = ((ybar1 * zbar2 - ybar2 * zbar1) * xbar + (xbar2 * zbar1 - xbar1 * zbar2) * ybar
                      + (xbar1 * ybar2 - xbar2 * ybar1) * zbar) * reci_denom;

                if ((xi >= 0.0 - tol) && (eta >= 0.0 - tol) && (zeta >= 0.0 - tol)
                    && ((1 - xi - eta - zeta) >= 0.0 - tol)) {
                    data_fine[i * ndf]     = data_coarse[0 + ndf * 0 + DataOff * j] * xi   + data_coarse[0 + ndf * 1 + DataOff * j] * eta
                                           + data_coarse[0 + ndf * 2 + DataOff * j] * zeta + data_coarse[0 + ndf * 3 + DataOff * j] * (1.0 - xi - eta - zeta);
                    data_fine[i * ndf + 1] = data_coarse[1 + ndf * 0 + DataOff * j] * xi   + data_coarse[1 + ndf * 1 + DataOff * j] * eta
                                           + data_coarse[1 + ndf * 2 + DataOff * j] * zeta + data_coarse[1 + ndf * 3 + DataOff * j] * (1.0 - xi - eta - zeta);
                    data_fine[i * ndf + 2] = data_coarse[2 + ndf * 0 + DataOff * j] * xi   + data_coarse[2 + ndf * 1 + DataOff * j] * eta
                                           + data_coarse[2 + ndf * 2 + DataOff * j] * zeta + data_coarse[2 + ndf * 3 + DataOff * j] * (1.0 - xi - eta - zeta);
                    data_fine[i * ndf + 3] = data_coarse[3 + ndf * 0 + DataOff * j] * xi   + data_coarse[3 + ndf * 1 + DataOff * j] * eta
                                           + data_coarse[3 + ndf * 2 + DataOff * j] * zeta + data_coarse[3 + ndf * 3 + DataOff * j] * (1.0 - xi - eta - zeta);
                    break;
                }
            }
        }

        tol = initTol;

        if (i == *nNodes_fine - 1) {
            if (iproc == nprocs - 1)
                break;

            // Before sending data, wait till each processor finishes its chunk
            MPI_Barrier(MPI_COMM_WORLD);
            mpi_communicate(mesh_fine, data_fine, nNodes_fine, MAX_Nodes, ndf, dim, myrank, nprocs);
            i = 0;

            if (myrank == 0)
                fprintf(stdout, "\n i=%d \t iproc=%d \t nprocs=%d \n", i, iproc, nprocs);

            iproc++;
        }
        // fprintf(stdout, "\n Mapping %lf %% done \n", (double)(i + 1) / (*nNodes_fine) * 100);
    }

    return;
}

Note: Please have a look at the files located in /src/ for the detailed implementation.

