Calculating Dispersion Derivatives in Fiber Optic Design


Linda Kaufman
Abstract: Maxwell's equations for modeling the guided waves in a circularly symmetric fiber lead to a family of partial differential equation eigenvalue systems. In fiber design one would like to determine the index profile, which enters Maxwell's equations, so that certain optical properties, some of which involve derivatives of the eigenvalues, are satisfied. In this paper we discuss how to determine derivatives of the eigenvalue problem and the gradients of the dispersion with respect to design parameters in the model.

I. INTRODUCTION

There are various fiber optic design methods which, given a refractive index profile n at specific wavelengths, use Maxwell's equations to produce such quantities as the effective area, the group delay, the bend loss, the dispersion, and/or the Petermann radii of the fiber. It makes sense to invert the process: determine the refractive index profile that satisfies a given set of properties rather than determining the properties from the index profile ([16]). The inversion process suggests parameterizing the profile by some parameters p and then using an optimization package such as [6] or [15] that invokes the fiber optics simulator to determine the p which satisfies specific criteria. The performance of these packages is usually enhanced if the function is locally smooth and if they are given analytic derivatives. Functions that represent numerically computed derivatives, such as the dispersion, are usually not sufficiently smooth. Our experience with using numerical differentiation to specify the derivatives of the dispersion suggests that numerical differentiation should be avoided. Thus the aim of this paper is to provide formulae for the analytic computation of the dispersion and its gradient with respect to the design parameters.

We provide two formulae for the derivative of the dispersion with respect to the design parameters. The first involves differentiating the formula for the dispersion with respect to the elements of p. For the second, because the dispersion is itself a derivative, one can interchange the order of differentiation, and this leads to a formula with fewer solves.

It should be emphasized that our research is a postprocessing step after Maxwell's equations have been used to determine a few eigenvalues and their corresponding modes. Our approach does not depend on whether the model is a simple one dimensional model, as in the example that we will refer to throughout this paper, or a full vectorial microstructured model. In fact, in the latter case, the differences between the formulae are more pronounced. We are not concerned whether one uses a finite element approach as in [3], [18], and in our
Linda Kaufman is a member of the Computer Science Department, William Paterson University and acknowledges the support of a CFR grant there.

example, a Galerkin approach, a plane-wave method [8], or a finite difference method [7]. The underlying algebraic problem may be linear or nonlinear. The problem may come from a straight fiber or one that has been wrapped around a spool as in [14]. We are also not concerned with exactly how one obtains the eigenvalues and modes of interest, only that they are computed as accurately as possible.

Our motivating example is a fiber that is perfectly straight, circular, and uniform along its length, so that Maxwell's equations for guided waves of the fiber can be reduced to a family in m of problems of the form
$$\left(\frac{1}{r}\frac{d}{dr}\left(r\frac{d}{dr}\right) + \omega^2 n^2(r) - \frac{m^2}{r^2}\right)x = \beta^2 x. \tag{1}$$
The index of refraction profile n(r) is in some regions a piecewise constant or linear function that can be parameterized by several design parameters relating to the widths and heights of each region. In (1) $\omega$ is a specified frequency and m is a specified mode number. As explained in [9], for a particular index profile n(r), one is interested in various integrals of the modes (the eigenvectors), the group delay $\tau_g = \partial\beta/\partial\omega$, the dispersion
$$D = \frac{\partial\tau_g}{\partial\lambda} = -\frac{\omega^2}{2\pi c}\,\frac{\partial^2\beta}{\partial\omega^2}, \tag{2}$$
and the dispersion slope
$$\frac{\partial D}{\partial\lambda}, \tag{3}$$
where $\lambda$ is the wavelength and $2\pi c = \lambda\omega$, where the frequency $\omega$ is measured in radians per second and c is the speed of light. If the index profile n is partially defined by a parameter p, then to use an optimization package to determine p, one might require such quantities as
$$\frac{\partial^3\beta}{\partial\omega^2\,\partial p}. \tag{4}$$

The finite element method converts the family of differential equations for the example in (1) to a family of symmetric generalized tridiagonal eigenvalue problems in $\omega$ and m. If, for a particular value of m, the boundary condition that comes from truncating the wave form at some radius beyond the core of the fiber is expressed in terms of the mth order modified Bessel function of the second kind, the eigenvalue problem can be expressed as
$$(A + s(\mu)e_q e_q^T)x = \mu M x, \tag{5}$$
where $\mu$ is the eigenvalue, $s(\mu)$ involves the appropriate Bessel functions, A is a $q \times q$ symmetric tridiagonal matrix for a finite element discretization, $e_q$ is the last column of the $q \times q$ identity matrix, and M is the positive definite mass matrix, which represents the inner product of the basis functions used in the finite element method [19]. In [10] several algorithms are described that may be used to solve (5). According to [13], if one uses a finite element method with a spacing $\delta$, then the eigenvalue $\mu$ satisfies
$$\beta^2 = \mu/\delta^2 + k_{cl}^2, \tag{6}$$
where $\beta$ is defined in (1) and $k_{cl}$ is the wave number of the cladding. In general the wave number is given by
$$k_{cl} = n\omega/c. \tag{7}$$
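As a small worked illustration of (6) and (7), the sketch below plugs in the numbers quoted for the sample problem in Section III (wavelength .85, c = 4.4/550, silica index 1.45291, eigenvalue 0.00136171). It assumes that the grid spacing in (6) equals 4.4/550, which is not stated explicitly in the text; the variable names are illustrative only. Under that assumption the computed values come out close to the reported $k_{cl}^2 \approx 115.346$ and $\beta \approx 11.6885$.

```python
import math

# Values quoted for the sample problem in Section III.
wavelength = 0.85        # operating wavelength
c = 4.4 / 550            # speed of light in grid units, as used with (7)
e_omega = 1.45291        # index of pure silicon dioxide at this frequency
mu = 0.00136171          # the only positive eigenvalue

# Assumption (not stated in the text): the spacing in (6) is 4.4/550.
delta = 4.4 / 550

omega = 2 * math.pi * c / wavelength        # from 2*pi*c = wavelength * omega
k_cl = e_omega * omega / c                  # equation (7) with n = e(omega)
beta = math.sqrt(mu / delta**2 + k_cl**2)   # equation (6)

print(f"omega  = {omega:.7f}")
print(f"k_cl^2 = {k_cl**2:.3f}")
print(f"beta   = {beta:.4f}")
```

Because $\mu/\delta^2$ is small compared with $k_{cl}^2$ here, $\beta$ is dominated by the cladding wave number, which is why accurate derivatives of $\mu$ matter for the dispersion.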

In this paper we will consider problems that have been reduced to
$$B(\mu, \omega)x = \mu M x \tag{8}$$
where $B(\mu, \omega) = A(\omega) + S(\mu, \omega)$, M is independent of $\omega$, and the dispersion can be expressed as a function of the second derivative of $\mu$. In our sample $S(\mu, \omega) = s(\mu)e_q e_q^T$. For many applications M will be the identity matrix. Because we are trying to produce analytic derivatives we assume that B and M can be differentiated with respect to frequency and with respect to the design parameters. Also, if the operator is nonlinear with respect to the eigenvalue, i.e., S is nonzero, then we assume S can be differentiated with respect to the eigenvalue. The underlying problem can be one or many dimensional and does not have to be derived solely from Maxwell's equations or come from the finite element approach. Our formulae were derived assuming the matrix B is real and symmetric and that M is independent of $\omega$. In the future we will try to extend our results to problems where B is unsymmetric or complex, to accommodate such methods as the imaginary beam projection method [17], and where M is dependent on $\omega$, if that is necessary.

In section II we will review formulae for derivatives of eigenvalues and indicate how the amount of computation can be reduced by noticing common subexpressions. For gradient calculations, interchanging the order of differentiation leads to various formulae with differing computational characteristics. In section III we compare the actual computation time for the competing formulae and implementations for our simple example.

II. ANALYTIC DERIVATIVES

With the formula for the dispersion given by (2), assume for our sample problem that $\beta$ is defined by (6), with $\delta$ the spacing. Let us use $\beta'$, $\mu'$, and $k_{cl}'$ to denote the derivatives of $\beta$, $\mu$, and $k_{cl}$ with respect to $\omega$. Differentiating (6) with respect to $\omega$ yields $2\beta\beta' = \mu'/\delta^2 + 2k_{cl}k_{cl}'$, and differentiating once more yields
$$2\beta\beta'' + 2(\beta')^2 = \mu''/\delta^2 + 2(k_{cl}')^2 + 2k_{cl}k_{cl}''.$$
Thus $\tau_g = \beta' = (\mu'/\delta^2 + 2k_{cl}k_{cl}')/(2\beta)$, and because $\lambda\omega = 2\pi c$ we have the following formula for the dispersion:
$$D = -\omega^2\left(\mu''/\delta^2 + 2(k_{cl}')^2 + 2k_{cl}k_{cl}'' - 2\tau_g^2\right)/(4\pi c\beta). \tag{9}$$
The terms involving $k_{cl}$ in (9) are rather easy to obtain if one assumes that at the cladding $n(r, \omega) = e(\omega)$, where $e(\omega)$ is the index of pure silicon dioxide, so that $k_{cl} = \omega e(\omega)/c$. Then $k_{cl}' = (e(\omega) + \omega e'(\omega))/c$ and $k_{cl}'' = (2e'(\omega) + \omega e''(\omega))/c$, which means that one needs either a differentiable formula for $e(\omega)$ or a program for calculating $e(\omega)$ on which one can use an automatic differentiator like ADIC [1]. In the rest of this section we concentrate on obtaining formulae for $\mu''$ and the gradient of this quantity with respect to the design parameters.

2.1 Calculation of $\mu'$ and $\mu''$

Let us continue with the shorthand $A'$ and $x'$ to denote the derivatives of the elements of A and x with respect to $\omega$. We will let $S'$ be the total derivative of $S(\mu)$ with respect to $\omega$, and $S_\omega$ and $S_\mu$ its partial derivatives with respect to $\omega$ and $\mu$, so that $S' = S_\omega + \mu' S_\mu$. Our initial aim is to obtain an expression for $\mu'$, to be used for the group delay. To simplify our notation let $B = A + S(\mu)$ and $\gamma = 1 - x^T S_\mu x$. Differentiating the equation $Bx = \mu M x$ from (5) with respect to $\omega$ yields
$$((\mu' M - A') - S_\omega - \mu' S_\mu)x + (\mu M - B)x' = 0. \tag{10}$$
Unfortunately (10) has both $\mu'$ and $x'$, but a simple trick, which we will use again and again, untangles the situation and gets rid of the $x'$ term. Because $x^T(\mu M - B) = 0$, multiplying (10) by $x^T$ implies
$$x^T(\mu' M - A' - S_\omega)x - \mu' x^T S_\mu x = 0. \tag{11}$$
If x is chosen such that $x^T M x = 1$, then
$$\mu' = x^T(A' + S_\omega)x/\gamma. \tag{12}$$
In our example in (1), where $S(\mu) = s(\mu)e_q e_q^T$, the derivative $s_\mu$ of s with respect to $\mu$ is always negative, so $\gamma = 1 - s_\mu (e_q^T x)^2$ is never zero.

In the case where B is complex Hermitian, so that if $B = C + iD$ then $B = B^H$ where $B^H = (C - iD)^T$, $x^H$ can be substituted for $x^T$ in (11) and (12) and the result will still hold. If M is dependent on $\omega$, the formula for $\mu'$ would be $\mu' = x^T(A' + S_\omega - \mu M')x/\gamma$. In [13] the group delay was computed using (12) with S = 0. In the program developed by Lenehan and Reid, the dispersion was determined numerically using the group delay data at various wavelengths. Thus (12) is not new, but its derivation was included here because the same technique will be used to derive analytic expressions for the dispersion derivatives.

To obtain $\mu''$ to be inserted into the formula for the dispersion in (9), we differentiate equation (10), which produces
$$((\mu'' M - A'') - (S_\omega)' - \mu'(S_\mu)' - \mu'' S_\mu)x + 2(\mu' M - B')x' + (\mu M - B)x'' = 0 \tag{13}$$
because M is independent of $\omega$. Multiplying equation (13) by $x^T$ and setting $\eta = x^T((S_\omega)' + \mu'(S_\mu)')x$ renders
$$\mu'' = (x^T A'' x - 2x^T(\mu' M - B')x' + \eta)/\gamma. \tag{14}$$
Differentiating the equation $x^T M x = 1$ with respect to $\omega$ yields
$$x'^T M x = 0, \tag{15}$$
since $M' = 0$, which with (14) suggests that $\mu''$ is given by
$$\mu'' = (x^T A'' x + 2x^T B' x' + \eta)/\gamma, \tag{16}$$

which calls for the determination of $x'$. From (10) and (12) one gets
$$(\mu M - B)x' = -(\mu' M - B')x. \tag{17}$$
Unfortunately $(\mu M - B)$ is singular, but if $\mu$ is not a multiple eigenvalue of B, the vector $x'$ can be determined uniquely by using the condition $x'^T M x = 0$. Again for complex Hermitian matrices, the formulae hold with $x^H$ substituted for $x^T$.

2.2 The gradient of the dispersion

Since our aim is to determine the parameters on which the index profile depends that give a prescribed dispersion, the derivatives of the dispersion with respect to these parameters are needed by most optimizers. Our frustrating experience in trying to determine a proper step size for numerical differentiation led to this quest for an analytic gradient. Referring to (9), we note that the terms with $k_{cl}$ are independent of the design parameters, and we thus are concerned with the $\mu''$ term. At first glance (16), (17), and (12) seem to require the derivative with respect to each of the design parameters of every scalar and vector appearing in them, including x and $x'$, but before we move forward, we should remember that our function is a derivative itself and that we can interchange the order of differentiation. Although interchanging the order yields equivalent formulae, the work involved in computing these formulae may be different. Thus we will look at several equivalent formulae for some of the quantities and compare them theoretically and computationally.

Let us use the convention that $\mu_k$ is the derivative of $\mu$ with respect to the kth design parameter, $A_k$ is the derivative of the A matrix with respect to the kth design parameter, $M_k$ is the derivative of M with respect to the kth design parameter, etc., with primes still denoting derivatives with respect to $\omega$, and try to derive an expression for $\mu''_k$. The total derivative of S with respect to the kth design parameter will be denoted by $S_k$ and its partial derivative by $S_{p_k}$. Differentiating the equation $Bx = \mu M x$ with respect to the kth parameter in the model yields
$$((\mu M - A)_k - S_{p_k} - \mu_k S_\mu)x + (\mu M - B)x_k = 0, \tag{18}$$
which when multiplied by $x^T$ eliminates the last term on the left hand side of the equation. Imposing the condition $x^T M x = 1$ implies
$$\mu_k = x^T(A_k + S_{p_k} - \mu M_k)x/\gamma, \tag{19}$$
with $\gamma = 1 - x^T S_\mu x$ as in (12). With the above formula for $\mu_k$ we can get $x_k$ from (18):
$$(\mu M - B)x_k = -(\mu_k M + \mu M_k - B_k)x. \tag{20}$$
Differentiating the equation $x^T M x = 1$ with respect to the kth design parameter yields the additional condition that
$$2x^T M x_k = -x^T M_k x. \tag{21}$$
With the formulae for $\mu'$ given in (12) and $\mu_k$ given in (19), let us try to derive a formula for $\mu'_k$. Differentiating (10) again with respect to the kth parameter yields
$$(\mu M - B)'_k x + (\mu M - B)' x_k + (\mu M - B)_k x' + (\mu M - B)x'_k = 0. \tag{22}$$
Multiplying (22) by $x^T$ eliminates the last term on the left hand side of the equation, leaving
$$x^T(\mu M - B)'_k x = -x^T(\mu M - B)' x_k - x^T(\mu M - B)_k x'. \tag{23}$$
Because $x^T(\mu M - B)' x_k = -x'^T(\mu M - B)x_k = x'^T(\mu M - B)_k x$, equation (23) is equivalent to
$$x^T(\mu M - B)'_k x = -2x^T(\mu M - B)' x_k, \tag{24}$$
which with the condition $M' = 0$ and conditions (11), (15), and (21) yields
$$\mu'_k = (x^T A'_k x + \eta + 2x^T B' x_k)/\gamma \tag{25}$$
where $\eta = x^T((S_{p_k})' + \mu_k (S_\mu)')x$. However, the formula given for $\mu'_k$ in (25) is not the only formula one can derive. Equation (23) is also equivalent to
$$x^T(\mu M - B)'_k x = -2x^T(\mu M - B)_k x', \tag{26}$$
which with the condition in (15) yields
$$\mu'_k = (x^T A'_k x + \eta + 2x^T B_k x' - 2\mu x^T M_k x' - \mu' x^T M_k x)/\gamma. \tag{27}$$
The formula in (27) does not contain any reference to $x_k$, which is included in the formula in (25). Unless $x_k$ is needed anyway, for example to compute the gradient of the effective area, the formula in (27) seems cheaper. In practice the formulae in (25) and in (27) agree only if the computed eigenvalue $\mu$ is sufficiently accurate.

Continuing on to $\mu''_k$, we are again faced with several different formulae. One formula for $\mu''_k$ comes from differentiating (24) with respect to $\omega$, which along with the conditions (11), (15), and (21) gives
$$\mu''_k = (x^T A''_k x + 2(x^T B'' x_k + x'^T B'_k x + x^T B' x'_k + x'^T B' x_k) + \eta)/\gamma, \tag{28}$$
where $\eta$ collects the terms other than $\mu''_k x^T S_\mu x$ in $x^T S(\mu)''_k x$. The formula for $\mu''_k$ in (28) requires the computation of $x'_k$, which can be obtained by rearranging (22), giving
$$(\mu M - B)x'_k = -(\mu M - B)'_k x - (\mu M - B)' x_k - (\mu M - B)_k x'. \tag{29}$$
Since $x'^T M x = 0$, one can add the condition $x^T M x'_k = -x'^T M x_k - x'^T M_k x$.

Another formula for $\mu''_k$ comes from differentiating (26) with respect to $\omega$, which gives
$$x^T(\mu M - B)''_k x = -4x^T(\mu M - B)'_k x' - 2x'^T(\mu M - B)_k x' - 2x^T(\mu M - B)_k x''. \tag{30}$$
Since $x^T M x' = 0$,
$$x^T M x'' = -x'^T M x', \tag{31}$$
and one can show that
$$\mu''_k = (x^T A''_k x + 4x^T B'_k x' + 2x'^T B_k x' + 2x^T B_k x'' + \eta - \mu'' x^T M_k x - 4\mu' x^T M_k x' - 2\mu(x'^T M_k x' + x^T M_k x''))/\gamma. \tag{32}$$
The vector $x''$ in (32) can be obtained by differentiating the formula $(\mu M - B)x' = -(\mu M - B)'x$ again with respect to $\omega$, giving
$$(\mu M - B)x'' = -2(\mu' M - B')x' - (\mu M - B)''x, \tag{33}$$
in addition to the condition in (31).

2.3 Operation counts and implementation details for dispersion calculations

Since every gradient calculation may involve 10 eigenvalues, 20 design parameters, and 30 combinations of m and $\omega$ in (1), the operation count for any one gradient evaluation might be important in practice. The first formula for $\mu''_k$ in (28) comes from differentiating with respect to the design parameters the formulae for the dispersion, but it can be the less efficient approach. The second formula in (32) comes from twice differentiating with respect to $\omega$ the formula for derivatives of the eigensystem with respect to the design parameters. Because
$$\frac{\partial}{\partial p_k}\frac{\partial^2\mu}{\partial\omega^2} = \frac{\partial^2}{\partial\omega^2}\frac{\partial\mu}{\partial p_k},$$
the two formulae are equivalent, but they involve matrix multiplications with different matrices, which can have different zero structures.

The relative efficiency of the various formulae given for the gradient of the dispersion depends primarily on the number of solves required. Certainly it is more cost effective to compute $x''$ once for all the design parameters and use the formula for $\mu'_k$ in (27) and $\mu''_k$ in (32) than to compute $x_k$ and $x'_k$ for each of the K design parameters and use (28). Since both formulae require x and $x'$, then for 20 design parameters, the formula for $\mu''_k$ in (32) requires 2 solves, while that in (28) requires 41 solves. For a 2 dimensional problem, the cost of the solves definitely dominates, and the algorithm in [11] preserves symmetry and is generally faster than unsymmetric Gaussian elimination.

The relative efficiency also depends on the zero structure of the matrices involved. If a 2 dimensional model gives rise to a finite difference five point model or a finite element 7 point model, that structure should be used. For the simple sample problem defined in (1), the B matrix is tridiagonal and $B'$ is diagonal. For the concentration parameters (heights and depths) $M_k$ is identically zero and $B_k$, $B'_k$, and $B''_k$ are only nonzero in the regions in which they define the concentrations.
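The formulae (12), (17), (19), and (27) can be checked numerically on a small synthetic problem. The sketch below is only an illustration under simplifying assumptions that are not the paper's fiber model: S = 0 (so $\gamma = 1$ and $\eta = 0$), a dense random symmetric $A(\omega, p)$ with a single design parameter p, and a diagonal $M(p)$ independent of $\omega$. All matrix names and the helper functions `eigpair` and `derivatives` are hypothetical. Note that only one solve, for $x'$, is needed, no matter how many design parameters there are.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 6

def sym(a):
    return (a + a.T) / 2

# Synthetic data (assumption): A(w, p) = A0 + w*A1 + w^2*A2 + p*(P0 + w*P1)
A0, A1, A2, P0, P1 = (sym(rng.standard_normal((q, q))) for _ in range(5))
m0 = 1.0 + rng.random(q)          # M(p) = diag(m0 + p*m1), positive definite
m1 = 0.1 * rng.random(q)
M_p = np.diag(m1)                 # M_k

def A(w, p):    return A0 + w * A1 + w**2 * A2 + p * (P0 + w * P1)
def A_w(w, p):  return A1 + 2 * w * A2 + p * P1      # A'
def A_p(w, p):  return P0 + w * P1                    # A_k
def A_wp(w, p): return P1                             # A'_k

def eigpair(w, p):
    """Largest eigenpair of A x = mu M x, scaled so that x^T M x = 1."""
    d = 1 / np.sqrt(m0 + p * m1)
    vals, vecs = np.linalg.eigh(d[:, None] * A(w, p) * d[None, :])
    return vals[-1], d * vecs[:, -1]

def derivatives(w, p):
    mu, x = eigpair(w, p)
    Mm = np.diag(m0 + p * m1)
    mu_w = x @ A_w(w, p) @ x                          # (12) with S = 0
    mu_p = x @ (A_p(w, p) - mu * M_p) @ x             # (19) with S = 0
    # (17): singular but consistent system. lstsq returns one solution;
    # the component along x is then removed so that x'^T M x = 0.
    C = mu * Mm - A(w, p)
    rhs = -(mu_w * Mm - A_w(w, p)) @ x
    xw = np.linalg.lstsq(C, rhs, rcond=1e-10)[0]
    xw -= (x @ Mm @ xw) * x
    # (27) with eta = 0 and gamma = 1: no x_k is required
    mu_wp = (x @ A_wp(w, p) @ x + 2 * x @ A_p(w, p) @ xw
             - 2 * mu * (x @ M_p @ xw) - mu_w * (x @ M_p @ x))
    return mu, mu_w, mu_p, mu_wp

# Compare with central finite differences
w, p, h = 0.7, 0.3, 1e-5
mu, mu_w, mu_p, mu_wp = derivatives(w, p)
fd_w = (eigpair(w + h, p)[0] - eigpair(w - h, p)[0]) / (2 * h)
fd_p = (eigpair(w, p + h)[0] - eigpair(w, p - h)[0]) / (2 * h)
fd_wp = (derivatives(w, p + h)[1] - derivatives(w, p - h)[1]) / (2 * h)
print("mu'   analytic vs FD:", mu_w, fd_w)
print("mu_k  analytic vs FD:", mu_p, fd_p)
print("mu'_k analytic vs FD:", mu_wp, fd_wp)
```

In a production code one would factor $(\mu M - B)$ once and reuse it rather than call a least squares routine; the least squares call is used here only to keep the sketch short.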
Looking at the formula for $\mu''_k$ in (32), it is obvious that the work in every term is either 0 or significantly reduced for these parameters. On the other hand, the work in only 2 out of the 5 terms for $\mu''_k$ in (28) is reduced, since $x_k$ and $x'_k$ can be full vectors.

One can suggest several ways of improving the efficiency of the implementation. Perhaps one of the most striking features of the formulae for $x'$ in (17), $x_k$ in (20), $x'_k$ in (29), and $x''$ in (33) is that they all involve solving a linear system with the same coefficient matrix $(\mu M - B)$. Thus one should create a decomposition of that matrix once and save it for many right hand sides. Expressions are sometimes repeated. For example, (16), (12), and (17) contain the vector $A'x$. Similarly, during the computation of the derivatives with respect to the design parameters, $A_k x$ appears in the formulae for $\mu_k$ in (19), $x_k$ in (20), and $\mu''_k$ in (32). The product $A'_k x$ appears in (32)

and (27). More striking is the appearance of Mk x 6 times in (27), (32), and (19), 3 of them in the form xT Mk x and twice in the form xT Mk x . Table 1: Dispersion algorithm and multiplication count 1. Set w = A x, w = A x,v = M x 3q 2. Set = w T x/, add s () to wq q 3. Solve (M B)x = v + w a. form right hand side q b. LU decomposition 4q c. Solve 4q 4. Make x T v = 0 2q 5. Set = (xT w + 2w T x + )/ 2q The cost of computing the dispersion for each eigenvalue is probably less than solving the nonlinear eigenvalue problem, but the cost of computing the gradient of the dispersion is substantially more. The operation counts in Tables 1 and 2 for the sample problem use Gaussian elimination. The multiplication count in Table 2 assumes the worst case scenario in which Ak for the Maxwell equations is tridiagonal with no nonzero rows . If one had 20 design parameters, Table 2 yields 446q multiplications using (32) for k and 686q multiplications using (28), an increase of 53 percent over (32). Table 2: Dispersion derivatives and Multiplication count Steps done once. Not done for each design parameter: 1-4. As in Table 1 15q 5. Set y = B x , z = w + 2y 2q 6. Set = (xT z + )/ q Add s () to last elements of z and w 7. Solve (M B)x = v 2 v + z 6q T 8.Make vT x = v x 2q Total 26q Steps done for the kth design parameter: 9. Set wk = Ak x, vk = Mk x, and = xT vk 5q T 10. Set k = (wk x )/ q Add sk () to last element of wk 11. Set wk = Ak x q Using (32) for k T 12. Let = vk x and yk = Bk x 2q 13. Set k = (xT (wk + 2yk ) 2 + )/ 2q Add sk () to last element of wk 14. Set k = (xT Ak x + 4x T wk 4 + T T 2wk x + 2x T yk 2(x T Mk x + vk x + )/. 8q Total for 9-14 21q Using (28) for k 12. Solve (M B)xk = k v vk + wk 5q T 13. Make xT v = vk x/2 2q k 14. Set t = A xk + wk and d = t + Bk x 4q 15. Set k = (xT z + )/ q Add sk () to last elements of d and t 16. Solve (M B)xk = d k v vk k v . 7q 17. Make vT xk = v T xk . 2q T T 18. Set k = (x Ak x + 2(w xk + wT xk + tT x ) + )/. 
5q
Total for 9-11 and 12-18 33q

2.4 Dispersion slope

The dispersion slope calculation is sometimes done when trying to design a dispersion compensating fiber to restore a signal that has spread itself over more wavelengths than desired. The dispersion slope requires $\mu'''$, which can be expressed in a number of ways stemming from differentiating (10) twice, giving
$$(\mu M - B)'''x + 3(\mu M - B)''x' + 3(\mu M - B)'x'' + (\mu M - B)x''' = 0. \tag{34}$$
Multiplying (34) by $x^T$ removes the last term on the left hand side from consideration, leaving us with
$$x^T(\mu M - B)'''x = -3x^T(\mu M - B)''x' - 3x^T(\mu M - B)'x'', \tag{35}$$
which leads us to the formula
$$\mu''' = (x^T A'''x + 3x^T B''x' + 3x^T B'x'' - 3\mu' x^T M x'' + \eta)/\gamma, \tag{36}$$
where $x''$ is given by (33) and $\eta$ has all the terms in $x^T S'''(\mu)x$ except $\mu''' x^T S_\mu x$. However, one can bypass the computation of $x''$ by combining (35), (33), and the fact that $x^T(\mu M - B)' = -x'^T(\mu M - B)$ to obtain
$$x^T(\mu M - B)'''x = -6x^T(\mu M - B)''x' - 6x'^T(\mu M - B)'x'. \tag{37}$$
Because $x^T M x = 1$ and $x^T M x' = 0$, (37) reduces to
$$\mu''' = (x^T A'''x + 6x^T B''x' + 6x'^T B'x' - 6\mu' x'^T M x' + \eta)/\gamma. \tag{38}$$
The significance of the formula for $\mu'''$ in (38) is that it does not involve $x''$. On the other hand, if one is computing $x''$ anyway in order to compute the gradient with respect to the design parameters, then the formula for $\mu'''$ in (36) is essentially a byproduct of the work for computing $x'''$.

2.5 The gradient of the dispersion slope

The derivative of the dispersion slope with respect to its parameters requires $\mu'''_k$. Our experience with the gradient of the dispersion suggests that it is more economical to compute $\mu'''_k$ by first differentiating the eigenvalue with respect to the design parameters and then taking its third derivative with respect to $\omega$. Differentiating (30) with respect to $\omega$ yields
$$x^T(\mu M - B)'''_k x = -6\left(x^T(\mu M - B)''_k x' + x^T(\mu M - B)'_k x'' + x'^T(\mu M - B)'_k x' + x'^T(\mu M - B)_k x''\right) - 2x^T(\mu M - B)_k x'''. \tag{39}$$

and xT M x + 3x M x = 0. The formulae in (38) and (40) are strikingly simple. To compute the dispersion slope using (38) one needs only x and x , both of which are also needed for the dispersion calculations. The formula in (40) given for the calculation of the derivative of the dispersion slope with respect to the design parameters does not require xk or xk ; it just involves x which can be done once and used for every design parameter. With the steps for computing and k given in Table 3, it costs less to compute these quantities than to use (28) for the gradient of the dispersion. . For the original problem in (1) for 20 parameters it would cost 579j operations to compute the gradient of the dispersion slope, much less than the alternative formula in (28). Table 3: Dispersion slope derivatives ands multiplications required Steps done once. Not done for each design parameter: 1. Proceed as in Table 2 steps 1-8. 24q 2. Additional one time costs a. Set w = A w q b. Set r = w + 3y + 3B x 3q c. Set = (xT r 3 vT x + )/ 2q d. Add s to last elements of w and r. e. Solve (I B)x = r x 3 x 3 x . 7q f. Make vT x = 3vT x . 2q Total 39q Steps done for the kth design parameter: 1. Proceed as in Table 2 steps 9-13. 11q 2. Compute k a. Set wk = Ak x, vk = Mk x , = x T vk 3q b. Set k = (xT wk + 4x T wk + 2x T yk + T T 2wk x 4 2( + vk x + )/. 5q Add sk to last element of wk T 3. Set k = xT Ak x + 2wk x + T T 6((wk + Bk x ) x + (wk + yk ) x ) + T T 6( + (vk x + ) + vkT x ) 2vk x )/ 8q Total 27q III. C OMPUTATIONAL
EVIDENCE

Imposing the conditions $x^T M x' = 0$, $x^T M x'' + x'^T M x' = 0$, and its derivative $3x'^T M x'' + x^T M x''' = 0$ removes the coefficients of the terms $\mu_k$, $\mu'_k$, and $\mu''_k$ from (39), leaving us with
$$\mu'''_k = \big(x^T A'''_k x - \mu''' x^T M_k x + 2x^T B_k x''' - 2\mu x^T M_k x''' - 6(\mu'' x^T M_k x' + \mu'(x^T M_k x'' + x'^T M_k x') + \mu x'^T M_k x'' - x^T B''_k x' - x^T B'_k x'' - x'^T B'_k x' - x'^T B_k x'') + \eta\big)/\gamma, \tag{40}$$
where $\eta$ has all terms in $x^T(S(\mu)_k)'''x$ except $\mu'''_k x^T S_\mu x$. Differentiating (10) twice with respect to $\omega$ suggests that $x'''$ in (40) can be determined by solving
$$(\mu M - B)x''' = -\left((\mu M - B)'''x + 3(\mu M - B)''x' + 3(\mu M - B)'x''\right). \tag{41}$$

To indicate the difference between the efficiencies of the two formulae for $\mu''_k$ on a somewhat realistic example, we augmented the problem that was posed in Lenahan and Friedrichsen [13] for m = 0, involving an initial region of silicon dioxide doped with germanium, by a sequence of four constant step regions of particular concentrations. We will show how the matrices and their derivatives can be efficiently formed for the same problem. We will also show that the theoretical differences between the two formulae for the dispersion gradient are reflected in practice and that it pays to exploit the sparsity structure of the derivative matrices.

The index profile for the core region had the form
$$n(r, \omega) = e(\omega) + C(r)\,h(\omega, \mathrm{sign}(C(r))) \tag{42}$$

where $e(\omega)$ is the index of pure silicon dioxide, C(r) denotes the dopant concentration, and h is a function of the Sellmeier coefficients [5] at $\omega$ and the reference frequency. In particular, given a reference wavelength $\lambda_0$ with corresponding frequency $\omega_0$, if C(r) was not a fluorine layer, we used
$$h(\omega) = (g(\omega) - e(\omega))\,e(\omega_0)/(g(\omega_0) - e(\omega_0))$$
where g is the index of refraction of doped silica of 13.4 per cent germanium, and if C(r) was a fluorine doped layer we used
$$h(\omega) = (f(\omega) - e(\omega))\,(f(\omega_0)/(f(\omega_0) - e(\omega_0)))$$
where f is the index of 1 per cent fluorine. In our example we considered 6 regions with C(r) in (42) defined by
$$C(r) = \begin{cases} \left(\dfrac{1 - 2\Delta(r/r_{l_1})^{\alpha}}{1 - 2\Delta}\right)^{1/2} - 1 & 0 \le r \le r_{l_1} \\ y_j & r_{l_{j-1}} < r \le r_{l_j};\ j = 2, \ldots, 5 \\ 0 & r_{l_5} < r \le r_{l_6} \end{cases} \tag{43}$$
where $r_{l_j}$ is the outer radius of the jth region.

As one is performing an optimization in which the widths of the regions are allowed to change, the number of gridpoints per region may change as the grid is adapted to the current problem, and the amount of work per gradient evaluation of the dispersion may change. To give one instance, we will assume a uniform grid was used in the finite element discretization and $r_{l_1}$ was set at 100 units, $r_{l_q} = r_{l_{q-1}} + 100$ for q = 2, 3, 4, 5, and $r_{l_6}$ at 600 units, but the radii $r_{l_2}$ through $r_{l_5}$ were considered parameters that could be varied. The four constant layers alternated between a layer of fluorine doped silicon dioxide of constant concentration and a layer of germanium doped silicon dioxide of constant concentration. The last layer was a cladding region of pure silicon dioxide.

Let $r_i$ denote the radius of the ith grid point. Let
$$m_i^- = (2r_i + r_{i-1})(r_i - r_{i-1})/6 \tag{44}$$
and
$$m_i^+ = (2r_i + r_{i+1})(r_{i+1} - r_i)/6. \tag{45}$$
The lumped diagonal mass matrix M determined from the finite element discretization has components $m_{i,i} = m_i^- + m_i^+$, and the subdiagonal of the A matrix has the form
$$a_{i+1,i} = a_{i,i+1} = (r_i + r_{i+1})/(2(r_i - r_{i-1})). \tag{46}$$
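As a quick sanity check of the element formulae for the lumped mass matrix and the off-diagonal of A, the sketch below evaluates them on a uniform grid. It assumes $r_i = i$ with $r_0 = 0$ and 600 interior points; the function name `lumped_mass_and_offdiag` is illustrative. On such a grid the expressions reduce to $m_{i,i} = i$ and $a_{i+1,i} = i + .5$.

```python
import numpy as np

def lumped_mass_and_offdiag(r):
    """r holds grid radii r_0..r_{n+1}; returns values for interior i = 1..n."""
    ri, rm, rp = r[1:-1], r[:-2], r[2:]
    m_minus = (2 * ri + rm) * (ri - rm) / 6     # contribution from the left element
    m_plus = (2 * ri + rp) * (rp - ri) / 6      # contribution from the right element
    m_diag = m_minus + m_plus                   # lumped diagonal mass entry m_{i,i}
    a_off = (ri + rp) / (2 * (ri - rm))         # off-diagonal a_{i,i+1}, as written in the text
    return m_minus, m_plus, m_diag, a_off

# Assumption: uniform grid with unit spacing, r_i = i for i = 0..601
r = np.arange(0, 602, dtype=float)
m_minus, m_plus, m_diag, a_off = lumped_mass_and_offdiag(r)
i = np.arange(1, 601)

print(np.allclose(m_diag, i), np.allclose(a_off, i + 0.5))
```

Keeping $m_i^-$ and $m_i^+$ separate, rather than only their sum, is useful later because the potential term weights the concentration at $r_i$ and at $r_{i+1}$ differently.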

On a uniform grid $m_{i,i} = i$ for i = 1, ..., 600 and $a_{i+1,i} = i + .5$. The diagonal of the A matrix is given by $a_{ii} = d_i + v_i(\omega)$, where $d_i$ takes care of the first term of the differential equation and is independent of $\omega$, so it does not enter into the $A'$ calculation, and $v_i$ takes care of the rest. We set $d_1 = -1.5$ and for i = 2, ..., 599
$$d_i = -(r_i + r_{i-1})/(2(r_i - r_{i-1})) + (r_i + r_{i+1})/(2(r_i - r_{i+1})), \tag{47}$$
which for a uniform grid means that $d_i = -2i$. The potential term has the form
$$v_i = \omega^2\left(m_i^-(n(r_i, \omega)^2 - e(\omega)^2) + m_i^+(n(r_{i+1}, \omega)^2 - e(\omega)^2)\right).$$
Since $n(r, \omega)^2 - e(\omega)^2 = C(r)^2 h(\omega, \mathrm{sign}(C(r)))^2 + 2C(r)e(\omega)h(\omega, \mathrm{sign}(C(r)))$, if we set
$$u_1(\omega, r_i) = \omega^2 h(\omega, \mathrm{sign}(C(r_i)))^2, \quad u_2(\omega, r_i) = \omega^2 e(\omega)h(\omega, \mathrm{sign}(C(r_i))),$$
$$t_{1,i} = m_i^- C(r_i)^2, \quad t_{2,i} = m_i^+ C(r_{i+1})^2, \quad t_{3,i} = 2m_i^- C(r_i), \quad t_{4,i} = 2m_i^+ C(r_{i+1}),$$
we can rewrite the potential term as
$$v_i = u_1(\omega, r_i)t_{1,i} + u_1(\omega, r_{i+1})t_{2,i} + u_2(\omega, r_i)t_{3,i} + u_2(\omega, r_{i+1})t_{4,i}. \tag{48}$$
Interior to layers, the dopant concentration would not change sign and one could reduce (48) to $v_i = u_1(\omega, r_i)(t_{1,i} + t_{2,i}) + u_2(\omega, r_i)(t_{3,i} + t_{4,i})$. Notice that $u_1$ and $u_2$ are dependent on $\omega$ and on the sign of the concentration. Also notice that the t quantities are independent of $\omega$, so that with (48) we have separated variables.

There are several advantages to (48). Once the t vectors are formed, to obtain $A'$ and $A''$ means just multiplying these vectors by the derivatives of the four scalar quantities composed of the $u_1$ and $u_2$ for positive and negative values of C. Thus
$$a'_{ii} = u'_1(\omega, r_i)t_{1,i} + u'_1(\omega, r_{i+1})t_{2,i} + u'_2(\omega, r_i)t_{3,i} + u'_2(\omega, r_{i+1})t_{4,i}. \tag{49}$$
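The separation of variables in (48) can be illustrated with a few lines of code. The sketch below compares the potential term computed directly from the index profile with the same term assembled from frequency-independent t vectors and frequency-dependent scalars. The grid, the two-layer concentration profile, and the helper `h_of` are assumptions made for illustration; the numerical values of the frequency, the silica index, and h for the two dopants are the ones quoted in the example of this section.

```python
import numpy as np

omega, e = 0.0591359, 1.45291      # frequency and silica index quoted in the example
h_ge, h_fl = 1.44493, 1.45288      # h for germanium and fluorine layers

def h_of(C):
    # Assumption: a negative concentration marks a fluorine layer
    return np.where(C < 0, h_fl, h_ge)

# Assumption: small uniform grid r_i = i with two constant layers of opposite sign
r = np.arange(0, 12, dtype=float)
C = np.where(r <= 5, -0.08, 0.02)
ri, rm, rp = r[1:-1], r[:-2], r[2:]
m_minus = (2 * ri + rm) * (ri - rm) / 6
m_plus = (2 * ri + rp) * (rp - ri) / 6
C_i, C_ip1 = C[1:-1], C[2:]

# Direct evaluation from n = e + C*h
n_i = e + C_i * h_of(C_i)
n_ip1 = e + C_ip1 * h_of(C_ip1)
v_direct = omega**2 * (m_minus * (n_i**2 - e**2) + m_plus * (n_ip1**2 - e**2))

# Separated form: t vectors depend only on the grid and the concentrations
t1, t2 = m_minus * C_i**2, m_plus * C_ip1**2
t3, t4 = 2 * m_minus * C_i, 2 * m_plus * C_ip1

def u1(Cv):
    return omega**2 * h_of(Cv)**2

def u2(Cv):
    return omega**2 * e * h_of(Cv)

v_sep = u1(C_i) * t1 + u1(C_ip1) * t2 + u2(C_i) * t3 + u2(C_ip1) * t4

print(np.allclose(v_direct, v_sep, rtol=1e-10, atol=0.0))
```

Since t1 through t4 never change with frequency, they can be stored once per grid and reused for every frequency, with only the scalar factors recomputed.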

Secondly, if for a single function and gradient evaluation, one keeps the same grid for say 10 values of , one can precompute the t vectors and just do a few scalar/vector multiplications for each A, A , and A . Thirdly, to nd the derivatives of v, v and v with respect to the design parameters really entails nding the derivatives of the t vectors with respect to the design vectors and substituting these derivatives for t in (48). Again these derivatives of the t vectors with respect to the design parameters can be computed once and saved for several values of and for computing Ak , Ak , and Ak for each . In our example the reference wave length was .80, the wave length was .85, which corresponded to a reference frequency of .0628319 and a frequency of 0.0591359 respectively using the formula that = 24.4/550 where 4.4 represents the core radius measured in microns and 550 indicates the gridpoint at that core radius. For this frequency, e() was 1.45291 and for a germanium layer, h() was 1.44493 and for a uorine layer, h() was 1.45288. Using c = 4.4/550 2 2 in (7),the value of kcl was 115.346 , the value of (kcl ) 2 was 3936.03, and the value of (kcl ) was 68151. The value of was set to 25, was set to .003 in (43) and the ys for consecutive layers were (-.08, .02, -.05, .01). Thus the refractive index prole looked like Figure 1. The only positive eigenvalue was 0.00136171, was 0.0611100, and was 1.44672. For this example s() was -22.6354, s () was -1.00023, and s () was 2.02763102. This value of gave a of 11.6885. With a conversion factor z = 10000/3, the group delay g was calculated as 2 z( + kcl )/(2) or 8151460. The dispersion was calculated 2 2 as z ( + (kcl ) 2(g /z)2 )/(4)) or 941988. Figure 2 indicates the only positive eigenvalue as a function of and the calculated value of as a function of wavelength. The rst six parameters dened concentrations (, , and y2 through y5 ) and the last four parameters dened widths of the the constant step regions. 
Fig. 1. Sample refractive index profile

Fig. 2. Eigenvalues and second derivative of eigenvalues as a function of wavelength for the profile of Figure 1

Fig. 3. The effect of changing the parameters on the dispersion

For the concentration parameters $M_k$ was identically zero and $A_k$, $A'_k$, and $A''_k$ were diagonal and only nonzero in the regions in which they defined the concentrations. For example, the derivative of A with respect to $y_2$ is diagonal and is given by
$$\frac{\partial A_{ii}}{\partial y_2} = \begin{cases} (r_{l_1} + 1/3)(u_1 y_2 + u_2) & \text{for } i = l_1 \\ 2i(u_1 y_2 + u_2) & \text{for } l_1 < i < l_2 \\ (r_{l_2} - 1/3)(u_1 y_2 + u_2) & \text{for } i = l_2 \\ 0 & \text{elsewhere,} \end{cases}$$
where $u_1 = u_1(\omega, \mathrm{sign}(C(r_{l_2})))$ and $u_2 = u_2(\omega, \mathrm{sign}(C(r_{l_2})))$.

Now consider differentiating A and M with respect to a parameter that represents the width of a layer ending with $r_{l_j}$, and let $r_i^{(j)}$ represent the derivative of $r_i$ with respect to that parameter. We then have
$$r_i^{(j)} = \begin{cases} 0 & \text{for } i < l_{j-1} \\ (i - l_{j-1} + 1)/(r_{l_j} - r_{l_{j-1}}) & \text{for } l_{j-1} < i \le l_j \\ 1 & \text{for } l_j < i \le l_5 \\ 1 - (i - l_5)/(r_{l_6} - r_{l_5}) & \text{for } i > l_5. \end{cases} \tag{50}$$
Determining the derivatives of M with respect to the width parameters involves differentiating (45) and (44) and substituting (50) where appropriate. Determining $A_k$ for the width parameters requires differentiating (46) and (47), and (45) and (44) in the formulae for v, and substituting (50) where appropriate.
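The piecewise formula for the derivative of the diagonal of A with respect to $y_2$ can be checked against a finite difference. The sketch below is an illustration under simplifying assumptions: a uniform grid $r_i = i$, layer 2 occupying $l_1 < r \le l_2$, a fixed concentration of the same sign elsewhere so that a single pair of scalars $u_1$, $u_2$ applies, and made-up values for $u_1$, $u_2$, $l_1$, and $l_2$. Only the potential part of the diagonal depends on $y_2$, so that is all the code builds.

```python
import numpy as np

# Illustrative values (assumptions, not taken from the paper)
L1, L2, N = 4, 8, 12
U1, U2 = 0.3, 0.5

def diag_potential(y2):
    """Potential part of a_ii on the grid r_i = i, i = 1..N."""
    i = np.arange(1, N + 1, dtype=float)
    m_minus = (3 * i - 1) / 6          # (2r_i + r_{i-1})(r_i - r_{i-1})/6 with r_i = i
    m_plus = (3 * i + 1) / 6           # (2r_i + r_{i+1})(r_{i+1} - r_i)/6 with r_i = i

    def conc(r):
        # Only layer 2 depends on y2; elsewhere a fixed concentration is used
        return np.where((r > L1) & (r <= L2), y2, 0.01)

    c_i, c_ip1 = conc(i), conc(i + 1)
    return (U1 * (m_minus * c_i**2 + m_plus * c_ip1**2)
            + 2 * U2 * (m_minus * c_i + m_plus * c_ip1))

def dA_dy2(y2):
    """Piecewise formula for the derivative of a_ii with respect to y2."""
    i = np.arange(1, N + 1, dtype=float)
    g = U1 * y2 + U2
    out = np.zeros(N)
    inner = (i > L1) & (i < L2)
    out[inner] = 2 * i[inner] * g
    out[i == L1] = (L1 + 1 / 3) * g
    out[i == L2] = (L2 - 1 / 3) * g
    return out

y2, h = 0.02, 1e-4
fd = (diag_potential(y2 + h) - diag_potential(y2 - h)) / (2 * h)
print(np.allclose(fd, dA_dy2(y2), atol=1e-8))
```

Because the potential is quadratic in $y_2$, the central difference is exact up to rounding, which makes this a convenient unit test for hand-coded derivative matrices.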

The gradient of was (-1.19e-10, -5.24e-6, -2.10e-3, 5.90e-1, -2.36e-3, -1.57e-5, 6.71e-2, 2.00e-1, 5.20e-4, -6.56e7). The gradient of the dispersion was ( -9.05e-5, -4.13, 1.08e3 , 2.91e5, -1.20e3 , -1.1 94e1 , 3.31e4, 9.89e4 , 2.73e2 , -5.01e-1 ) which suggested that changing y3 and the length of the third region would have the greatest impact in changing and the dispersion. The gradients computed via (32) and (28) agreed to 10 decimal places. In Figure 3 we see that changing parameter 4, corresponding to y3 has more of an impact than changing either parameter 8, the length of the third interval, or changing parameter 3, y2 , as our values of the gradient suggest. Because y2 is negative, making it less negative increases the value of the dispersion. Table 4 gives some of our computational experience on (1) for calculating the dispersion and gradient with respect to the 10 parameters using various formulae. The programs were run on a Sun Enterprise 450 system with four Ultra SPARC II processors using the g++ compiler and for all versions all the multidimensioned arrays were laid out similarly so that the inner loops had strides of 1. To obtain the times, each segment of the code was run 100 times and the times were divided by 100. None of the times for calculating the dispersion and solving the nonlinear eigenvalue problem included the time for creating the matrices. The eigenvalue time was inserted so that one could see that initially the dispersion calculation was a major contributor to the computation time. All the dispersion codes took advantage of the fact that A and Ak were tridiagonal and the other matrices were diagonal. The code for the dispersion gradients, cited in Table 4, which had the largest computational time was a C++ version of the FORTRAN program written by Lawrence Cowsar and Linda Kaufman in 1998-2000 at Bell Labs. 
The translated program used (28) for computing the dispersion gradient. It reused the matrix factorization across the solves but otherwise did not exploit the common subexpressions outlined in Table 2. Another reason for the increased computational time was that the code reverted to the standard eigenvalue formulation for the solves, with a coefficient matrix formed by shifting $G$ by a multiple of the identity, where $G = M^{-1/2} B M^{-1/2}$, and thus the program had to continually transform vectors back and forth between the standard domain and the generalized domain.

Our computational evidence supports the theory that using (32) is faster than using (28) for the dispersion gradient. Eliminating the solves for the gradient had a deeper impact in practice than in theory because we were removing sequential operations and keeping inner products that used optimized BLAS [2]. When we took sparsity into consideration, we used the fact that the nonzeroes were in consecutive rows. During the gradient calculations we considered a row nonzero if it was nonzero in either $M_k$ or $A_k$, and we used two one-dimensional arrays, one to indicate the first row of nonzeroes in $M_k$ or $A_k$ and the other to indicate the last row of nonzeroes in either $M_k$ or $A_k$. Accidental zeroes were not considered. The data indicate, as was pointed out in our discussions, that it paid to take into consideration the zero structure of $M_k$, $A_k$, and the related derivative matrices.

Table 4: Time (in ms) for dispersion gradient calculation

Computation                                              Time (ms)
Eigenvalue computation                                      12.9
Using (28) without common subexpressions                    43.3
Using (28) and Table 2 but not zero row structure           33.6
Using (32) and Table 2 but not zero row structure           15.8
Using (28), Table 2 and zero row structure                  25.4
Using (32), Table 2 and zero row structure                   7.0

IV. CONCLUSION

We have presented two formulae for computing the gradient of the dispersion and suggested an efficient algorithm for computing the slope of the dispersion. Our computational evidence on a sample problem supports our theoretical evidence that it is faster to use a formula for the gradient of the dispersion that first differentiates the original generalized eigenvalue problem with respect to the design parameters and then with respect to the frequency. We have shown that taking advantage of the zero structure of the matrices reduces the time for our simple example to the point that the computation of the eigenvalues dominates. On a two-dimensional problem the solve times should dominate the computation, and the differences between the times for the two gradient formulae should be even more pronounced.
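The zero-row bookkeeping behind the last two rows of Table 4 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code that was timed: the function names, the storage of each tridiagonal $A_k$ as a diagonal plus an off-diagonal list, the scalar `shift`, and the quadratic form $x^T (A_k - \mathrm{shift}\, M_k) x$ are all hypothetical stand-ins for the inner products that appear in the gradient formulae.

```python
def nonzero_row_range(m_diag, a_diag, a_off):
    """Return (first, last) rows that are nonzero in M_k or A_k.

    M_k is diagonal (m_diag); A_k is symmetric tridiagonal with diagonal
    a_diag and off-diagonal a_off, where a_off[i] couples rows i and i+1.
    Returns (0, -1), an empty range, if every row is zero.
    """
    n = len(m_diag)
    rows = [
        i for i in range(n)
        if m_diag[i] != 0.0
        or a_diag[i] != 0.0
        or (i > 0 and a_off[i - 1] != 0.0)
        or (i < n - 1 and a_off[i] != 0.0)
    ]
    return (rows[0], rows[-1]) if rows else (0, -1)


def quadratic_form(x, m_diag, a_diag, a_off, shift, first, last):
    """Compute x^T (A_k - shift * M_k) x using only rows first..last."""
    n = len(x)
    total = 0.0
    for i in range(first, last + 1):
        row = (a_diag[i] - shift * m_diag[i]) * x[i]
        if i > 0:
            row += a_off[i - 1] * x[i - 1]
        if i < n - 1:
            row += a_off[i] * x[i + 1]
        total += x[i] * row
    return total


# Hypothetical data: two parameters, each touching two consecutive rows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shift = 0.5
params = [
    # (m_diag, a_diag, a_off)
    ([0, 0, 2.0, 1.0, 0, 0], [0, 0, 1.0, 3.0, 0, 0], [0, 0, 0.5, 0, 0]),
    ([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1.0, 2.0], [0, 0, 0, 0, 0.25]),
]

# The two one-dimensional arrays: first and last nonzero row per parameter.
first_row, last_row = [], []
for m_diag, a_diag, a_off in params:
    first, last = nonzero_row_range(m_diag, a_diag, a_off)
    first_row.append(first)
    last_row.append(last)

# The restricted loops give the same values as loops over all rows.
restricted = [
    quadratic_form(x, *params[k], shift, first_row[k], last_row[k])
    for k in range(len(params))
]
full = [
    quadratic_form(x, *params[k], shift, 0, len(x) - 1)
    for k in range(len(params))
]
```

The row ranges depend only on the discretization and the parameterization, so they can be computed once and reused for every evaluation of the gradient.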
Although our computational results were given only for our motivating one-dimensional problem, the formulae are based only on the algebraic form given in (8), which does not stipulate whether the problem is one dimensional or comes from a holey fiber simulation. They are independent of whether (8) comes from a largely linear eigenvalue problem or a predominantly nonlinear one. It does not matter whether (8) comes from a finite element, a finite difference, a plane wave, or a Galerkin formulation. Our formulae are a postprocessing step, and users may choose their preferred way to solve the eigenvalue problem. We look forward to investigating how easily the formulae can be modified to handle an imaginary-distance beam propagation technique, which gives rise to complex matrices.

ACKNOWLEDGMENT

I thank Bill Reed, formerly of Bell Labs, for bringing the fiber optics problem to my attention; Lawrence Cowsar of Bell Labs for providing the program for determining a good discretization of the PDE and the optical properties of the fiber; and Edwin Torres, an undergraduate at William Paterson University, for assistance in programming the dispersion formulae.

REFERENCES


[1] C. Bischof, L. Roh, and A. Mauer-Oats, ADIC: an extensible automatic differentiation tool for ANSI-C, Software: Practice and Experience, Vol. 27, No. 12, 1997, pp. 1427-1456.
[2] www.netlib.org/blas/
[3] F. Brechet, J. Marcou, D. Pagnoux, and P. Roy, Complete analysis of the characteristics of propagation into photonic crystal fibers by the finite element method, Opt. Fiber Technol., Vol. 6, 2000, pp. 181-191.
[4] K.H. Burrell, Algorithm 484: Evaluation of the modified Bessel functions K0(Z) and K1(Z) for complex arguments, Communications of the ACM, Vol. 17, 1974, pp. 524-526.
[5] J.W. Fleming, Material Dispersion in Lightguide Glasses, Elect. Lett., Vol. 14, No. 11, May 1978, pp. 326-328.
[6] P.E. Gill, W. Murray, and M.A. Saunders, SNOPT: An SQP algorithm for large-scale constrained optimization, SIAM Journal on Optimization, Vol. 12, 2002, pp. 979-1006.
[7] S. Guo, F. Wu, S. Albin, H. Tai, and R. Rogowski, Loss and Dispersion analysis of microstructured fibers by finite difference method, Optics Express, Vol. 12, No. 15, 2004, pp. 3341-3352.
[8] S. Johnson and J. Joannopoulos, Block-iterative frequency-domain methods for Maxwell's equations in a planewave basis, Opt. Express, Vol. 8, 2001, pp. 173-190.
[9] S.V. Kartalopoulos, Introduction to DWDM Technology, IEEE Press, Piscataway, New Jersey, 2000.
[10] L. Kaufman, Eigenvalue Problems in Fiber Optics Design, SIAM J. on Matrix Analysis and Applications, January 2006, pp. 105-117.
[11] L. Kaufman, The retraction algorithm for factoring banded symmetric matrices, submitted to Numerical Linear Algebra with Applications, May 2006.
[12] P. Lancaster, On eigenvalues of matrices dependent on a parameter, Numerische Mathematik, Vol. 6, 1964, pp. 377-387.
[13] T.A. Lenahan, Calculation of Modes in an Optical Fiber using the Finite Element Method and EISPACK, The Bell System Technical Journal, Vol. 62, No. 9, November 1983, pp. 2663-2694.
[14] D. Marcuse, Field deformation and loss caused by curvature of optical fibers, J. Opt. Soc. Am., Vol. 66, No. 4, April 1972, pp. 311-320.
[15] http://www-neos.mcs.anl.gov/
[16] W. A. Reed, Fiber Design for the 21st Century, Optical Society of America, San Francisco, 1999.
[17] K. Saitoh and M. Koshiba, Full-vectorial imaginary-distance beam propagation method based on a finite element scheme: application to photonic crystal fibers, IEEE Journal of Quantum Electronics, Vol. 38, No. 7, 2002, pp. 927-933.
[18] K. Saitoh, M. Koshiba, T. Hasegawa, and E. Sasaoka, Chromatic dispersion control in photonic optical fibers: application to ultra-flattened dispersion, Optics Express, Vol. 11, No. 8, 2003, pp. 843-852.
[19] G. Strang and G.J. Fix, An Analysis of the Finite Element Method, Prentice Hall, Englewood Cliffs, 1973.
