
Convergence Rate Estimates


for the Conjugate Gradient Method
Igor Kaporin

Parallel Computing Lab.,


Dept. Applied Optimization Problems
A. A. Dorodnicyn Computing Center
Russian Academy of Sciences, Moscow, RUSSIA
The Rome-Moscow School of Matrix Methods
Moscow Part: September 10-15, 2012


Outline of the talk


• CG as the Optimum Krylov Subspace Method
• Spectral bound convergence rate estimates
• New estimates via the K-condition number
• The Preconditioned CG method
• Preconditioning via K-optimization


CG as the Optimum Krylov Subspace Method (1)

Consider a system of linear algebraic equations


    Ax = b,    x \in \mathbb{R}^n,    b \in \mathbb{R}^n,    A^T = A > 0,

where A is large, sparse, and not well-conditioned.
The CG approximations x_k to the solution x of the
linear system are constructed from the initial residual
r_0 = b - A x_0 in the form

    x_k = x_0 + r_0 \alpha_1^{(k)} + \dots + A^{k-1} r_0 \alpha_k^{(k)},

where the scalar coefficients are chosen such that

    \{\alpha_1^{(k)}, \dots, \alpha_k^{(k)}\} = \arg\min \|x - x_k\|_A.


CG as the Optimum Krylov Subspace Method (2)

The CG algorithm is as follows:


    r_0 = b - A x_0,    p_0 = r_0;
    for i = 0, 1, \dots:
        \alpha_i = r_i^T r_i / (p_i^T A p_i),
        x_{i+1} = x_i + p_i \alpha_i,
        r_{i+1} = r_i - A p_i \alpha_i,
        \beta_i = r_{i+1}^T r_{i+1} / (r_i^T r_i),
        p_{i+1} = r_{i+1} + p_i \beta_i.

Note that r_k = b - A x_k = \psi_k(A) r_0.
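
These recurrences translate directly into code. Below is a minimal NumPy sketch of the plain CG iteration; the function name, the residual-based stopping test, and the iteration cap are illustrative choices, not part of the algorithm above.

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-8, maxit=None):
    """Plain conjugate gradient for SPD A, following the recurrences above."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x                    # r_0 = b - A x_0
    p = r.copy()                     # p_0 = r_0
    rr = r @ r
    maxit = n if maxit is None else maxit
    for _ in range(maxit):
        Ap = A @ p
        alpha = rr / (p @ Ap)        # alpha_i = r_i^T r_i / p_i^T A p_i
        x += alpha * p               # x_{i+1} = x_i + p_i alpha_i
        r -= alpha * Ap              # r_{i+1} = r_i - A p_i alpha_i
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            break
        beta = rr_new / rr           # beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i
        p = r + beta * p             # p_{i+1} = r_{i+1} + p_i beta_i
        rr = rr_new
    return x
```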


CG as the Optimum Krylov Subspace Method (3)

Therefore, one has

    \psi_k(t) = 1 - \sum_{i=1}^{k} \alpha_i^{(k)} t^i,
    \qquad
    x - x_k = A^{-1} \psi_k(A) r_0,

and the solution is well approximated even for k \ll n,
a sufficient condition for which is \|\psi_k(A)\| \ll 1.
Indeed, by the optimality of \psi_k(A) one has

    \|x - x_k\|_A = \|\psi_k(A) r_0\|_{A^{-1}}
        \le \|\tilde\psi_k(A) r_0\|_{A^{-1}}
        \le \|\tilde\psi_k(A)\| \, \|r_0\|_{A^{-1}}
        = \|\tilde\psi_k(A)\| \, \|x - x_0\|_A,

where \tilde\psi_k(\lambda) is any polynomial such that

    \deg \tilde\psi_k \le k,    \qquad    \tilde\psi_k(0) = 1.


CG as the Optimum Krylov Subspace Method (4)

Since A is SPD, it holds (e.g., by the spectral decomposition of A) that

    \|\tilde\psi_k(A)\| = \max_{i=1,\dots,n} |\tilde\psi_k(\lambda_i)|,

where \lambda_i = \lambda_i(A) > 0 are the eigenvalues of A,
numbered in nondecreasing order.
Using different particular choices of \tilde\psi_k(\lambda), one can
construct various CG convergence estimates of the type

    \frac{\|x - x_k\|_A}{\|x - x_0\|_A}
        \le \max_{\lambda = \lambda_i} |\tilde\psi_k(\lambda)|
        \le \varphi(\lambda_1, \dots, \lambda_n),

the right-hand side of which in any case depends on
the spectrum of A.


Spectral bound convergence rate estimates (1)

The well-known standard result follows from


    \frac{\|x - x_k\|_A}{\|x - x_0\|_A}
        \le \max_{\lambda = \lambda_i} |\tilde\psi_k(\lambda)|
        \le \max_{\lambda_1 \le \lambda \le \lambda_n} |\tilde\psi_k(\lambda)|,

where \tilde\psi_k is expressed via a properly translated and
scaled kth degree Chebyshev polynomial T_k(\lambda) of the
1st kind, which yields

    \frac{\|x - x_k\|_A}{\|x - x_0\|_A}
        \le 1 \Big/ T_k\!\left(\frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\right)
        < 2 \exp\!\left(-2k\sqrt{\frac{\lambda_1}{\lambda_n}}\right).

Recall that

    T_k(z) = \frac{1}{2}\left[\left(z + \sqrt{z^2 - 1}\right)^{k}
                            + \left(z - \sqrt{z^2 - 1}\right)^{k}\right].
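
As a small numerical illustration (not part of the original slides), the helper below evaluates both quantities on the right: the exact Chebyshev factor via the explicit formula for T_k, and its exponential upper bound.

```python
import numpy as np

def chebyshev_factor(lam_min, lam_max, k):
    """Exact factor 1/T_k((lam_max+lam_min)/(lam_max-lam_min)) and its
    exponential upper bound 2*exp(-2*k*sqrt(lam_min/lam_max))."""
    z = (lam_max + lam_min) / (lam_max - lam_min)
    s = np.sqrt(z * z - 1.0)
    Tk = 0.5 * ((z + s)**k + (z - s)**k)
    return 1.0 / Tk, 2.0 * np.exp(-2.0 * k * np.sqrt(lam_min / lam_max))

print(chebyshev_factor(1.0, 100.0, 10))   # exact factor vs. its upper bound
```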


Spectral bound convergence rate estimates (2)

This estimate readily yields an iteration number bound


for the CG to converge with the relative precision \varepsilon:

    k \ge \left\lceil \frac{1}{2} \sqrt{\frac{\lambda_n}{\lambda_1}}
          \,\log\frac{2}{\varepsilon} \right\rceil.
Note that in practice, an a priori estimation of the spectral bounds (more precisely, a control of their ratio) may
be impossible to perform.
However, a posteriori estimates of \lambda_1 and \lambda_n are readily
available from the \alpha_i and \beta_i generated by the CG algorithm.
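
One standard way to obtain such a posteriori estimates (not spelled out on the slide) uses the Lanczos tridiagonal matrix implicitly built by CG: its entries are formed from the \alpha_i and \beta_i, and its extreme eigenvalues (Ritz values) approximate \lambda_1 and \lambda_n. A hedged NumPy sketch:

```python
import numpy as np

def extreme_ritz_values(alphas, betas):
    """A posteriori estimates of lambda_1 and lambda_n from the CG
    coefficients alpha_0..alpha_{k-1} and beta_0..beta_{k-2}: the extreme
    eigenvalues of the associated Lanczos tridiagonal matrix."""
    k = len(alphas)
    T = np.zeros((k, k))
    for i in range(k):
        T[i, i] = 1.0 / alphas[i]
        if i > 0:
            T[i, i] += betas[i - 1] / alphas[i - 1]
            T[i, i - 1] = T[i - 1, i] = np.sqrt(betas[i - 1]) / alphas[i - 1]
    ev = np.linalg.eigvalsh(T)
    return ev[0], ev[-1]
```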


Spectral bound convergence rate estimates (3)

Thus we have the estimate


    \frac{\|x - x_k\|_A}{\|x - x_0\|_A} < 2 \exp\!\left(-\frac{2k}{\sqrt{C(A)}}\right),

where

    C(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \ge 1

is the spectral condition number of an SPD matrix A.
In this case, for some \tau > 0,

    A \to \tau I    iff    C(A) \to 1.


New estimates via the K-condition number (1)

In the presented theory, a key role is played by the matrix
functional K(A) defined as

    K(A) = \frac{\left(n^{-1}\,\mathrm{trace}\,A\right)^{n}}{\det A}
         = \left(\frac{1}{n}\sum_{i=1}^{n}\lambda_i\right)^{n}
           \Big/ \prod_{i=1}^{n}\lambda_i
         = K(\lambda_1, \dots, \lambda_n).

The latter holds by the well-known property

    \mathrm{trace}\,A = \sum_{i=1}^{n} \lambda_i,
    \qquad
    \det A = \prod_{i=1}^{n} \lambda_i.
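
For illustration, K(A) can be evaluated directly from this definition; working with logarithms (slogdet) avoids overflow of the determinant for larger n. A small sketch of my own, not from the slides:

```python
import numpy as np

def k_cond(A):
    """K(A) = (trace(A)/n)^n / det(A) for SPD A, evaluated via logarithms."""
    n = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)
    assert sign > 0, "A must be SPD"
    return np.exp(n * np.log(np.trace(A) / n) - logdet)
```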


New estimates via the K-condition number (2)

In complete analogy with the spectral condition
number, for the K-condition number it holds (\tau > 0) that

    A \to \tau I    iff    K(A) \to 1,

when A is an SPD matrix. This is nothing but the
Arithmetic-Geometric Mean (AGM) inequality written
for \{\lambda_1, \dots, \lambda_n\}.
First we demonstrate an elementary proof of a (rather
rough but instructive) estimate for the decrease of
\|x - x_k\|_A in the CG method.


New estimates via the K-condition number (3)

Theorem 1. [Kaporin,Axelsson'00] Let A be SPD.


Then for any even k satisfying

    2 \log_2 K(A) < k < n

it holds

    \frac{\|x - x_k\|_A}{\|x - x_0\|_A} < \left( K(A)^{2/k} - 1 \right)^{k/2}.

Note: this bound is not sharp.
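
As a hedged illustration of how Theorem 1 translates into an iteration count, the snippet below searches for the smallest admissible even k at which the right-hand side drops below a prescribed \varepsilon; the function name and the brute-force search are my own choices.

```python
import numpy as np

def iterations_from_theorem1(K, eps, n):
    """Smallest even k with 2*log2(K) < k < n and (K**(2/k)-1)**(k/2) <= eps."""
    for k in range(2, n, 2):
        if k <= 2 * np.log2(K):      # Theorem 1 requires k > 2 log2 K(A)
            continue
        if (K**(2.0 / k) - 1.0)**(k / 2.0) <= eps:
            return k
    return None

print(iterations_from_theorem1(K=1e3, eps=1e-8, n=10**6))
```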


New estimates via the K-condition number (4)

Proof. Let k = 2m. Using the general estimate with

    \tilde\psi_k(t) = \prod_{i=1}^{m} (1 - t/\lambda_i)(1 - t/\lambda_{n+1-i}),

one readily gets

    \frac{\|x - x_k\|_A}{\|x - x_0\|_A}
        \le \max_{i=1,\dots,n} |\tilde\psi_k(\lambda_i)|
        \le \max_{i=m+1,\dots,n-m} |\tilde\psi_k(\lambda_i)|

        \le \prod_{i=1}^{m} \max_{\lambda_{m+1} \le t \le \lambda_{n-m}}
            \left| (1 - t/\lambda_i)(1 - t/\lambda_{n+1-i}) \right|

        \le \prod_{i=1}^{m} \left( K(\lambda_i, \lambda_{n+1-i}) - 1 \right).


New estimates via the K-condition number (5)

Using an obvious consequence of the AGM inequality,

    \left(\prod_{i=1}^{m} \theta_i\right)^{1/m}
  + \left(\prod_{i=1}^{m} (1 - \theta_i)\right)^{1/m} \le 1,
    \qquad 0 < \theta_i < 1,

with \theta_i = 1/K(\lambda_i, \lambda_{n+1-i}),
we obtain

    \frac{\|x - x_k\|_A}{\|x - x_0\|_A}
        \le \left( \left( \prod_{i=1}^{m} K(\lambda_i, \lambda_{n+1-i}) \right)^{1/m} - 1 \right)^{m},

and it only remains to prove that

    \prod_{i=1}^{m} K(\lambda_i, \lambda_{n+1-i}) \le K(A).


New estimates via the K-condition number (6)

The latter also follows from the AGM inequality:

    K(A) \prod_{i=1}^{n} \lambda_i
      = \left( \frac{1}{n} \sum_{i=1}^{n} \lambda_i \right)^{n}
      = \left( \frac{1}{n} \left[ \sum_{i=1}^{m} \frac{\lambda_i + \lambda_{n+1-i}}{2}
                                + \sum_{i=1}^{m} \frac{\lambda_i + \lambda_{n+1-i}}{2}
                                + \sum_{i=m+1}^{n-m} \lambda_i \right] \right)^{n}

      \ge \left( \prod_{i=1}^{m} \left( \frac{\lambda_i + \lambda_{n+1-i}}{2} \right)^{2} \right)
          \left( \prod_{i=m+1}^{n-m} \lambda_i \right)

      = \left( \prod_{i=1}^{m} K(\lambda_i, \lambda_{n+1-i}) \right)
        \prod_{i=1}^{n} \lambda_i.

Q.E.D.


New estimates via the K-condition number (7)

The corresponding iteration number bound is

    k < \left\lceil \frac{4 \log K(A) + 3 \log(\varepsilon^{-1})}
                         {\log\!\left( 4 + \log_{K(A)}(\varepsilon^{-1}) \right)} \right\rceil ;

for a given 0 < \varepsilon \le 1, this number of iterations yields

    \|x - x_k\|_A \le \varepsilon \|x - x_0\|_A.
This can be shown by setting t = K(A)^{2/k} in the inequality

    t - 1 \le (\gamma - 1)^{\gamma - 1} \, \gamma^{-\gamma} \, t^{\gamma},
    \qquad \gamma > 1, \quad t > 1.

Hence we establish a CG iteration number bound which
grows sublinearly with respect to \log\frac{1}{\varepsilon}.


New estimates via the K-condition number (8)

Concerning the roughness of the above estimate: an
unimprovable (best possible) bound was found, but it
relates to the reduction of another error norm:

Theorem 2. [Kaporin'92,94] Let A be SPD. Then
for any even k satisfying

    1 \le k < n

it holds

    \frac{\|r_k\|}{\|r_0\|} \le \left( K(A)^{1/k} - 1 \right)^{k/2}.

This means a nearly 2 times reduction of the above
iteration number bound.


New estimates via the K-condition number (9)

An example is shown in the Figure, where we consider
a matrix A of order n = 50 with the prescribed
eigenvalues \lambda_{n+1-j}(A) = n^2 + 1 - j^2, j = 1, \dots, n.
The dashed lines show the A^{-1}-norm of the residual and
its upper bound via C(A),
while the solid lines correspond to the Euclidean norm
of the residual and its estimate via K(A).
It is quite clear that the two solid lines behave much more
similarly to each other than the two dashed lines do.
Note that the considered example has isolated smaller
eigenvalues and clustered largest eigenvalues, which
is exactly the class of eigenvalue distributions we
prefer to deal with.
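
The experiment can be reproduced along the following lines; this is a sketch under stated assumptions: A is taken diagonal with the prescribed eigenvalues and the right-hand side is a fixed random vector, neither of which is specified on the slides.

```python
import numpy as np

n = 50
j = np.arange(1, n + 1)
lam = np.sort((n**2 + 1 - j**2).astype(float))   # prescribed spectrum, ascending
A = np.diag(lam)                                  # any SPD matrix with this spectrum
rng = np.random.default_rng(0)
b = rng.standard_normal(n)

# plain CG, recording the Euclidean and A^{-1}-norms of the residual
x = np.zeros(n); r = b.copy(); p = r.copy(); rr = r @ r
res2 = [np.sqrt(rr)]
resAinv = [np.sqrt(r @ np.linalg.solve(A, r))]
for _ in range(n):
    Ap = A @ p
    alpha = rr / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rr_new = r @ r
    res2.append(np.sqrt(rr_new))
    resAinv.append(np.sqrt(abs(r @ np.linalg.solve(A, r))))
    p = r + (rr_new / rr) * p
    rr = rr_new

# bounds: 2*exp(-2k/sqrt(C(A))) for the A^{-1}-norm (dashed lines in the figure),
# (K^{1/k}-1)^{k/2} for the Euclidean norm (solid lines, Theorem 2)
C = lam[-1] / lam[0]
K = np.exp(n * np.log(lam.mean()) - np.log(lam).sum())
for k in (10, 20, 30, 40):
    print(k, resAinv[k] / resAinv[0], 2 * np.exp(-2 * k / np.sqrt(C)),
          res2[k] / res2[0], (K**(1.0 / k) - 1.0)**(k / 2.0))
```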


New estimates via the K-condition number (10)

The A^{-1}- and H-norms of the residuals and
their upper bounds vs. the CG iteration number.


The Preconditioned CG method (1)

Obviously, if some condition number of the matrix A
is very large, then the CG method may require a huge
number of iterations to converge (especially in finite-precision
arithmetic), despite its optimality.
To overcome this drawback, a very simple but powerful
idea of preconditioning is applied.
Namely, let us substitute x = G^T y, where \det G \ne 0,
and solve the preconditioned linear system

    G A G^T y = G b

by the same CG method. This time, we will have the
preconditioned matrix M = G A G^T instead of A in every
formula above!


The Preconditioned CG method (2)

Using appropriate substitutions, and denoting H = G^T G,
we readily obtain the preconditioned CG method:

    r_0 = b - A x_0,    p_0 = H r_0;
    for i = 0, 1, \dots:
        \alpha_i = r_i^T H r_i / (p_i^T A p_i),
        x_{i+1} = x_i + p_i \alpha_i,
        r_{i+1} = r_i - A p_i \alpha_i,
        \beta_i = r_{i+1}^T H r_{i+1} / (r_i^T H r_i),
        p_{i+1} = H r_{i+1} + p_i \beta_i.

Here we have r_k = b - A x_k = \psi_k(A H) r_0.
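
A minimal NumPy sketch of these recurrences; the preconditioner is passed as a callable apply_H representing the action of H = G^T G on a vector, which is an illustrative interface rather than anything prescribed by the slides.

```python
import numpy as np

def pcg(A, b, apply_H, x0=None, tol=1e-8, maxit=None):
    """Preconditioned CG: the recurrences above, with apply_H(v) = H v."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x
    z = apply_H(r)                   # H r_0
    p = z.copy()
    rz = r @ z
    maxit = n if maxit is None else maxit
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)        # alpha_i = r_i^T H r_i / p_i^T A p_i
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_H(r)
        rz_new = r @ z
        beta = rz_new / rz           # beta_i = r_{i+1}^T H r_{i+1} / r_i^T H r_i
        p = z + beta * p             # p_{i+1} = H r_{i+1} + p_i beta_i
        rz = rz_new
    return x

# example: Jacobi preconditioning, H = (Diag(A))^{-1}
# x = pcg(A, b, lambda r: r / np.diag(A))
```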


The Preconditioned CG method (3)

Both of the above iteration number bounds are (nearly)
proportional to

    \sqrt{C(HA)}

in the standard (spectral-bound-based) CG theory, or to

    \log K(HA)

when using the nth power of the arithmetic-to-geometric
mean ratio of the spectrum of HA
(i.e., the K-condition number) for the same purposes.
Hence, the central problem in the PCG theory is:

    using matrices H that are cheap to multiply a vector by,
    reduce the condition number of HA as much as possible.

The Preconditioned CG method (4)

The K-optimization vs. the C-optimization:

• the (second) estimate via K is as sharp
  as the one via C;
• the estimates via K reflect the superlinear
  convergence of CG, while the one via C does not;
• generally, the K(HA)-optimization can be feasible,
  while the C(HA)-optimization may not be;
• K(HA)-optimization tends to cluster the
  spectrum of HA near the largest eigenvalues of HA,
  while C(HA)-optimization may not.


Preconditioning via K-optimization (1)

The simplest possible example is preconditioning
by a symmetric diagonal scaling (i.e., G = D):

    M = D A D.

It can be shown that K-optimality is attained for

    D_* = (\mathrm{Diag}(A))^{-1/2},

that is,

    D_* = \arg\min_{D} K(D A D),

where the minimum is taken over the set of all
SPD diagonal matrices D.
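
A quick numerical check of this claim (my own illustration, using an arbitrary SPD test matrix): K(DAD) for D_* = (Diag(A))^{-1/2} is compared against a few random positive diagonal scalings.

```python
import numpy as np

def k_cond(A):
    n = A.shape[0]
    return np.exp(n * np.log(np.trace(A) / n) - np.linalg.slogdet(A)[1])

rng = np.random.default_rng(1)
B = rng.standard_normal((20, 20))
A = B @ B.T + 20.0 * np.eye(20)              # a generic SPD test matrix

D_star = np.diag(1.0 / np.sqrt(np.diag(A)))
print("K(D*AD*) =", k_cond(D_star @ A @ D_star))
for _ in range(3):                           # a few random positive diagonal scalings
    D = np.diag(np.exp(rng.standard_normal(20)))
    print("K(DAD)  =", k_cond(D @ A @ D))
```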


Preconditioning via K-optimization (2)

Indeed, denoting \delta_i = (A)_{ii} (D)_{ii}^2, one has

    K(DAD) = \frac{\left(n^{-1}\sum_{i=1}^{n}\delta_i\right)^{n}}{\det(DAD)}
           \ge \frac{\prod_{i=1}^{n}\delta_i}{\det(DAD)}
           = K(D_* A D_*),

where D_* is defined above. The AGM inequality shows
that equality in the latter estimate is attained iff
\delta_i = \tau > 0.
Finally, under the natural restriction \tau = 1, one gets the
required equality D = D_*.


Preconditioning via K-optimization (3)

It must be stressed that this (so-called Jacobi) scaling
is not optimum in the sense of the spectral condition
number C(DAD) for arbitrary SPD matrices. The exceptions
are some special cases, e.g., consistently ordered matrices.
An example: the Toeplitz SPD tridiagonal matrix
T = tridiag[-1, 2, -1] is consistently ordered and therefore is
C-optimally scaled. Therefore, its inverse has
the same property, since for any SPD A it always holds that
C(A) = C(A^{-1}).
At the same time, T^{-1} has a non-constant diagonal and
therefore is not K-optimally scaled.
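
This is easy to verify numerically; the sketch below (sizes are my choice) builds T, inspects the diagonal of T^{-1}, and checks that Jacobi rescaling of T^{-1} lowers its K-condition number.

```python
import numpy as np

def k_cond(A):
    n = A.shape[0]
    return np.exp(n * np.log(np.trace(A) / n) - np.linalg.slogdet(A)[1])

n = 10
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiag[-1, 2, -1]
Tinv = np.linalg.inv(T)

print(np.diag(Tinv))                                      # non-constant diagonal
D = np.diag(1.0 / np.sqrt(np.diag(Tinv)))
print(k_cond(Tinv), ">", k_cond(D @ Tinv @ D))            # Jacobi scaling lowers K
```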


Preconditioning via K-optimization (4)

In a similar way, it can be shown that the Block Jacobi


preconditioning for an SPD matrix is K-optimum over
the set of all block-diagonal matrices with prescribed
structure.
However, we prefer to consider this structure later as a
particular case of a more general construction.


References
• [Kaporin'92] Explicitly preconditioned conjugate gradient
  method for the solution of nonsymmetric linear systems. Int.
  J. Computer Math., 40, 169-187, 1992.
• [Kaporin'94] New convergence results and preconditioning
  strategies for the conjugate gradient method. Numer. Linear
  Algebra with Appls., 1, no. 2, 179-210, 1994.
• [Kaporin,Axelsson'00] On the sublinear and superlinear rate
  of convergence of conjugate gradient methods. Numerical
  Algorithms, 25, 1-22, 2000.
