extremization problem
Narayanan Krishnamurthy1
for ECE-2671 taught by Dr. Simaan A. Marwan2
Abstract. The subject of this technical report is the evaluation of different unconstrained optimization methods that determine the extremum points of the given objective function(s). The analysis is done using Matlab, and assumes the same starting point and termination criteria for comparing the different iterative algorithms.
Key words. Gradient Methods, Method of Steepest Descent, Conjugate Direction Method, Newton-Raphson Method, Region Elimination Methods, Bisection Method, Equal Spacing Method, Fibonacci Search, Golden Search, Polynomial Approximation.
1 Department of Electrical and Computer Engineering, 2 Professor, ECE Department, University of Pittsburgh,
Pittsburgh, PA 15261
Contents
1 Introduction
4 Gradient Methods
4.1 Method of Steepest Descent
4.2 Newton-Raphson Method
4.3 Conjugate Gradient Method
5 Direct Search Methods
5.1 Golden Search Method
5.2 Quadratic Fit Method
7 Conclusion
References
Introduction
The study of methods for extremizing a function f(x) over the entire R^n is an unconstrained optimization problem. An important assumption for using most of the algorithms discussed here to minimize the function f(x) is that the function be differentiable over the chosen domain, and that partial derivatives of at least first order exist, namely the gradient of the function. Some algorithms also require the existence of second-order derivatives and positive definiteness of the Hessian - the second-order derivative matrix. These assumptions on the behavior of the function f(x) and its Taylor approximation are taken up next [3].
The First-Order Necessary Condition for a relative minimum point x^* of f(x) over R^n is that, for any feasible direction d (the feasible directions are the directions used in the neighborhood of a point to evaluate the function's behavior in that neighborhood), the following condition is satisfied:

\nabla f(x^*)^T d \geq 0          (2.1)

where

\nabla f(x^*) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]^T          (2.2)
The Second-Order Necessary Condition for a relative minimum point x^* of a twice differentiable function f(x) is:

\nabla f(x^*)^T d \geq 0          (2.3)

\text{if } \nabla f(x^*)^T d = 0, \text{ then } d^T \nabla^2 f(x^*) d \geq 0          (2.4)

where the condition \nabla f(x^*)^T d = 0 occurs if the extremum point x^* is an interior point with all possible feasible directions d in the function's domain R^n. \nabla^2 f(x^*) is the Hessian, or second-order derivative matrix:

\nabla^2 f(x^*) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]_{i,j}, \quad \text{where } 1 \leq i, j \leq n          (2.5)
The Second-Order Sufficient Conditions for a relative minimum point x^* of a twice differentiable function f(x) are:

\nabla f(x^*) = 0 \text{ and } \nabla^2 f(x^*) \text{ is positive definite}          (2.6)
The Taylor approximation of the function f based on the first and second variants of f is given by:

f(x) = f(x_0) + \langle \nabla f(x_0), (x - x_0) \rangle + \frac{1}{2} \langle (x - x_0), \nabla^2 f(x_0)(x - x_0) \rangle + \text{Higher Order Terms}          (2.7)

where \langle \cdot,\cdot \rangle is the notation for the inner product between vectors in R^n; for more details on vector spaces and the FONC/SONC and sufficiency conditions, refer to [2, 4].
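As a brief worked example of these conditions, consider the convex quadratic f(x) = x_1^2 + x_2^2 on R^2. Its gradient is \nabla f(x) = [2x_1, 2x_2]^T, which vanishes only at x^* = [0 0]^T, and its Hessian \nabla^2 f(x) = \mathrm{diag}(2, 2) is positive definite everywhere; hence x^* satisfies the FONC (2.1) with equality for every direction d, as well as the sufficient condition (2.6), and is a relative (in fact global) minimum.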
Two objective functions were provided to evaluate the performance of the various algorithms. The initial iteration guess x_0y_0 is chosen as [2 2]^T, and the termination condition of the iterative search for the relative minimum is ||\nabla f(x_ky_k)|| < \epsilon, where \epsilon = 0.05.

f_1(x_1, x_2) = (x_1^2 - x_2)^2 + (x_1 - 1)^2          (3.1)

f_2(x_1, x_2) = 100(x_1^2 - x_2)^2 + (x_1 - 1)^2          (3.2)

The objective functions f_1 and f_2 are common test functions used to test the convergence properties of any new algorithm. f_1 was introduced by Witte and Holst (1964), while f_2 was proposed by Rosenbrock (1960).
Iso-lines, or contours, are good for visualizing the gradient and the descent/ascent directions of the functions, and for locating the minima or maxima, especially when the function depends on just two variables (figure 1).
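As a quick check of these properties, the following minimal Matlab sketch (the handles f1, f2, g1 and H1 are illustrative definitions written out from equations 3.1 and 3.2, not the gradient_f1/hessian_f1 files of the appendix) verifies the optimality conditions of f_1 at [1 1]^T and evaluates the starting gradient norm at the initial guess [2 2]^T:

% Objective functions from equations 3.1 and 3.2 (illustrative handles)
f1 = @(x) (x(1)^2 - x(2))^2 + (x(1) - 1)^2;
f2 = @(x) 100*(x(1)^2 - x(2))^2 + (x(1) - 1)^2;
% Analytic gradient and Hessian of f1, derived by hand from eq. 3.1
g1 = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
H1 = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

xstar = [1; 1];  x0 = [2; 2];
disp(g1(xstar))          % FONC: gradient is [0; 0] at [1 1]'
disp(eig(H1(xstar)))     % SOSC: Hessian eigenvalues are positive at [1 1]'
disp(norm(g1(x0)))       % gradient norm at the initial guess [2 2]'

The printed gradient at [1 1]^T is the zero vector and the Hessian eigenvalues are positive, which is consistent with the relative minimum at [1 1]^T noted in the observations at the end of the report.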
[Figure 1: contour plots of the objective functions, (a) f_1(x_1,x_2) = (x_1^2 - x_2)^2 + (x_1 - 1)^2 and (b) f_2(x_1,x_2) = 100(x_1^2 - x_2)^2 + (x_1 - 1)^2.]
Gradient Methods
The gradient methods considered here generate iterates of the form

x_{k+1} = x_k - t_k A_k \nabla f(x_k)          (4.1)

where t_k is the step-size for the kth iteration. Note that the process of determining the minimum involves traversing along the negative direction of the gradient vector \nabla f(x_k). A_k is a linear transformation of the gradient vector, which provides more flexibility in exploring for the minima than just the line search provided by the gradient. In the case of the Newton-Raphson method, A_k is [\nabla^2 f(x_k)]^{-1}.
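As a rough illustration of equation 4.1 (grad_f and hess_f below are illustrative stand-ins for the gradient and Hessian of f_1, not the appendix routines), one update with A_k = I is a steepest descent step, while A_k = [\nabla^2 f(x_k)]^{-1} with t_k = 1 is a Newton-Raphson step:

% One update of the generic scheme x_{k+1} = x_k - t_k * A_k * grad f(x_k)
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
hess_f = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

xk = [2; 2];  tk = 0.15;
x_sd = xk - tk * eye(2) * grad_f(xk);        % A_k = I: steepest descent step
x_nr = xk - (hess_f(xk) \ grad_f(xk));       % A_k = inverse Hessian, t_k = 1: Newton-Raphson step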
4.1 Method of Steepest Descent

The constant and variable step-size steepest descent methods involve choosing a constant t_k, or one of multiple candidate t_k's, in equation 4.1 (with A_k = I). In the variable step-size method, at every generation of the next exploration point, the step from the chosen array that gives the minimum value of f evaluated at that point is selected. The optimal step-size method may be thought of as an extension of the variable step-size method in which the choice of step-size is taken to the limit, i.e. an exact line search is performed at every iteration.
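A minimal sketch of the variable step-size selection, assuming a helper along the lines of the check_min_t_f1 routine used in the appendix (the implementation below is illustrative, not the author's):

% Pick, from a small array of candidate steps, the one that minimizes f1
% at the trial point reached along the negative gradient.
f1     = @(x) (x(1)^2 - x(2))^2 + (x(1) - 1)^2;
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
steps  = [0.09 0.15 0.20];                    % candidate step sizes

xk = [2; 2];
fvals = arrayfun(@(t) f1(xk - t*grad_f(xk)), steps);
[~, idx] = min(fvals);                        % index of the best candidate step
xk_next = xk - steps(idx)*grad_f(xk);         % variable step-size update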
[Figure 2: steepest descent iterates overlaid on the function contours, panels (a) and (b).]
Convergence of f1 using a constant step size of 0.15:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2             -1.0762        1.9670    4.9648         1.7511
3             -0.9756        1.7244    4.4998         1.8067
4             -0.8351        1.4926    4.0000         1.8861
5             -0.6830        1.2541    3.4528         1.9889
6             -0.5009        1.0178    2.8408         2.1212
7             -0.2811        0.7877    2.1434         2.2640
8             -0.0163        0.5751    1.3633         2.3027
9              0.2830        0.4027    0.6182         1.9113
10             0.5529        0.3059    0.1999         0.8948
11             0.6871        0.3058    0.1256         0.3730
12             0.7124        0.3557    0.1058         0.3354
13             0.7338        0.4012    0.0897         0.3035
14             0.7532        0.4424    0.0765         0.2760
15             0.7708        0.4799    0.0656         0.2519
16             0.7867        0.5142    0.0565         0.2308
17             0.8013        0.5456    0.0488         0.2122
18             0.8145        0.5745    0.0423         0.1955
19             0.8267        0.6012    0.0368         0.1806
20             0.8379        0.6259    0.0321         0.1672
21             0.8482        0.6487    0.0280         0.1551
22             0.8578        0.6700    0.0246         0.1441
23             0.8666        0.6897    0.0216         0.1341
24             0.8748        0.7081    0.0190         0.1250
25             0.8823        0.7252    0.0167         0.1166
26             0.8894        0.7412    0.0147         0.1089
27             0.8960        0.7562    0.0130         0.1018
28             0.9021        0.7701    0.0115         0.0953
29             0.9078        0.7832    0.0102         0.0893
30             0.9132        0.7955    0.0090         0.0837
31             0.9182        0.8070    0.0080         0.0786
32             0.9229        0.8179    0.0071         0.0738
33             0.9273        0.8280    0.0063         0.0693
34             0.9314        0.8376    0.0056         0.0651
35             0.9352        0.8465    0.0050         0.0613
36             0.9389        0.8550    0.0044         0.0577
37             0.9423        0.8629    0.0040         0.0543
38             0.9455        0.8704    0.0035         0.0511
39             0.9485        0.8775    0.0031         0.0482
----------------------------------------------------------------------
The minima of f1 found after 39 iterations,
using constant step of 0.150000, gradient methods are x1=0.948498 x2=0.877480,
and the min value of function is =0.003144
----------------------------------------------------------------------
Convergence of f1 using variable steps chosen from 0.09, 0.15 and 0.20,
where the negative direction of the gradient is used as the line of exploration.
iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.3015        1.4738    0.1394         1.8044
3              1.1441        1.5134    0.0626         0.7662
4              1.2024        1.4766    0.0419         0.2635
5              1.1793        1.4710    0.0386         0.1617
6              1.1833        1.4389    0.0351         0.1989
7              1.1558        1.4273    0.0326         0.2138
8              1.1658        1.4109    0.0302         0.1373
9              1.1478        1.3901    0.0271         0.1506
10             1.1555        1.3610    0.0248         0.1983
11             1.1383        1.3564    0.0228         0.1215
12             1.1383        1.3321    0.0204         0.1324
13             1.1161        1.3175    0.0186         0.1682
14             1.1241        1.3046    0.0171         0.1038
15             1.1114        1.2882    0.0152         0.1067
16             1.1139        1.2670    0.0137         0.1232
17             1.0972        1.2591    0.0125         0.1206
18             1.1044        1.2426    0.0114         0.1175
19             1.0882        1.2357    0.0104         0.1137
20             1.0954        1.2203    0.0095         0.1094
21             1.0802        1.2141    0.0087         0.1047
22             1.0868        1.1999    0.0079         0.0996
23             1.0730        1.1943    0.0072         0.0942
24             1.0788        1.1814    0.0065         0.0887
25             1.0666        1.1761    0.0059         0.0832
26             1.0713        1.1646    0.0054         0.0777
27             1.0608        1.1595    0.0049         0.0724
28             1.0655        1.1458    0.0044         0.0889
29             1.0577        1.1439    0.0040         0.0510
30             1.0559        1.1339    0.0035         0.0494
----------------------------------------------------------------------
The minima of f1 found after 30 iterations,
using Variable Stepsizes chosen from =0.090,0.150,0.200, gradient
method are x1=1.055874 x2=1.133855,
and the min value of function is =0.003482
----------------------------------------------------------------------
Convergence of f2:
The function f2 did not converge at all for the constant and variable
step-size methods. Various initial points ranging over [-1 1], [-1.4 0.5]
and [2 2], and various step sizes ranging over 0.05, 0.09 and 0.15, were
tried in vain; the steepest descent algorithm could not locate a descent
direction to iterate on, and hence the failed output:

iteration      x1            x2        f2(x1,x2)      Norm(gradient)
not a descent direction ... exiting
The extremum could not be located after 2 iterations
4.2 Newton-Raphson Method

The Newton-Raphson method uses not only the first variant, but also the second variant information in its search for the minima. The Taylor approximation of a function given by expression 2.7 can be re-written (ignoring the higher order terms) as:
\nabla f(x) \approx \nabla f(x_k) + \nabla^2 f(x_k)(x - x_k)          (4.2)

[Figure 3: Newton-Raphson iterates overlaid on the function contours, panels (a) and (b).]
which is also clear from the fundamental definition of derivatives in calculus. Now, finding the root of \nabla f(x), we get:

x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)          (4.3)
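A minimal sketch of iteration 4.3 applied to f_1 (the gradient and Hessian handles are illustrative definitions derived from equation 3.1, not the appendix files):

% Newton-Raphson iteration on f1 from the starting point [2 2]'
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
hess_f = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

x = [2; 2];  e = 0.05;
for k = 1:50
    x = x - hess_f(x) \ grad_f(x);          % x_{k+1} = x_k - inv(Hessian)*gradient
    if norm(grad_f(x)) < e, break; end      % same termination rule as the report
end
fprintf('x = (%.4f, %.4f) after %d iterations\n', x(1), x(2), k);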
Convergence of f1:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.0593        0.5733    0.3046         2.6785
3              1.0310        1.0622    0.0010         0.0653
4              1.0000        0.9991    0.0000         0.0044
----------------------------------------------------------------------
The minima of f1 found after 4 iterations,
with initial pt 2.000000 2.000000, using Newton-Raphson(NR) method are x1=1.0, x2=1.0,
and the min value of function is =0.000001
----------------------------------------------------------------------
Convergence of f2:

iteration      x1            x2        f2(x1,x2)      Norm(gradient)
2              1.0012        0.0099    98.5152        444.3233
3              1.0012        1.0025    0.0000         0.0025
4.3 Conjugate Gradient Method

The conjugate gradient method uses a procedure similar to Gram-Schmidt's orthogonalization to generate Q-conjugate vectors [1]. That is, d_k is constructed as a linear combination of the d_{k-1} vectors, with the coefficient \beta_{k-1} chosen such that \langle d_k, Q d_{k-1} \rangle = 0. The conjugate gradient algorithm can be expressed as the following sequence of steps:
Initialization:

epsilon = 0.05                          %termination condition
r0      = gradient_f1(x0_y0)
H       = hessian_f1(x0_y0)
d0      = -r0
t0      = - <d0,r0> / <d0,(H d0)>
x1_y1   = x0_y0 + t0*d0
r1      = gradient_f1(x1_y1)
b0      = <r1,r1> / <r0,r0>
d1      = -r1 + b0*d0

General step k (repeated for k = 2, 3, ... while norm(gradient_f1(xk_yk)) >= epsilon):

H       = hessian_f1(xk-1_yk-1)
xk_yk   = xk-1_yk-1 + tk-1 * dk-1
rk      = gradient_f1(xk_yk)
bk-1    = <rk,rk> / <rk-1,rk-1>
dk      = -rk + bk-1 * dk-1
tk      = - <dk,rk> / <dk,(H dk)>
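The steps above can be collected into a compact Matlab sketch (again with illustrative gradient and Hessian handles rather than the gradient_f1/hessian_f1 files; whether the iteration converges depends on the function, as noted below for f_2):

% Conjugate gradient iteration for f1, following the listing above
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
hess_f = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

x = [2; 2];  e = 0.05;
r = grad_f(x);  d = -r;                      % d0 = -r0
for k = 1:100
    H = hess_f(x);
    t = -(d'*r) / (d'*(H*d));                % tk = -<dk,rk>/<dk,H dk>
    x = x + t*d;                             % xk = xk-1 + tk-1*dk-1
    r_new = grad_f(x);
    if norm(r_new) < e, break; end
    b = (r_new'*r_new) / (r'*r);             % bk-1 = <rk,rk>/<rk-1,rk-1>
    d = -r_new + b*d;                        % dk = -rk + bk-1*dk-1
    r = r_new;
end
fprintf('stopped at x = (%.4f, %.4f) after %d iterations\n', x(1), x(2), k);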
Convergence of f1:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.4975        2.1115    0.2646         1.7980
3              1.4275        2.1236    0.1901         0.4035
4              1.4032        2.1165    0.1843         0.2955
5              1.3242        1.9980    0.1649         0.8101
[Figure 4: conjugate gradient iterates overlaid on the function contours, panels (a) and (b).]
Convergence of f2:
The function f2 does not converge using the current implementation of the
conjugate gradient method. The gradient vector/Hessian becomes abnormally
large after a few iterations, due to precision errors in calculating the
gradient as the iterate approaches the minima. Note that the direction of
approach of the iterate before the abnormal gradient value is headed in the
right direction, i.e. towards the minima. A better approach would be to use
direct methods to calculate the next estimate, like the polynomial fit,
thereby circumventing the precision error in calculating the first and
second variants. The failed output:
iteration      x1            x2         f2(x1,x2)      Norm(gradient)
2              1.4813        2.0134     3.4993         113.955
3              1.4337        2.0290     0.2583         16.914
4              1.4257        2.0316     0.1814         1.5335
5              1.4249        2.0317     0.1807         0.2822
6              1.4203        2.0223     0.1792         2.3043
7              0.8654        0.4497     8.9710         119.390
8              4.4958        10.6666    9124.72        17279
The extremum could not be located after 8 iterations
Direct Search Methods

The direct search methods preclude the calculation of the computation-intensive gradient vectors or Hessian matrices. But, like most methods, they work best for some functions while their performance and convergence are dismal for others.
5.1 Golden Search Method

The Golden Search method uses the limiting properties of the Fibonacci sequence F_k as k \to \infty. The sequence and its limiting properties are:

F_0 = F_1 = 1          (5.1)

F_k = F_{k-1} + F_{k-2}          (5.2)

\lim_{k \to \infty} \frac{F_{k-1}}{F_{k+1}} \approx 0.382          (5.3)

\lim_{k \to \infty} \frac{F_k}{F_{k+1}} \approx 0.618          (5.4)

Thus, at every iteration of golden search the uncertainty interval (the region in which the minimum is located) is reduced by 38.2 percent, until the search converges to a sufficiently small region in which the minimum is found.
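A minimal one-dimensional sketch of this interval reduction, searching f_1 along the diagonal x_1 = x_2 = t between the interval endpoints [0 0]^T and [2 2]^T used in the appendix code (the restriction g below is an illustrative choice):

% Golden-section reduction of the uncertainty interval [0, 2] for g(t) = f1(t, t)
g = @(t) (t^2 - t)^2 + (t - 1)^2;    % f1 restricted to the line x1 = x2 = t
a = 0;  b = 2;  tol = 1e-3;
while (b - a) > tol
    p = a + 0.382*(b - a);           % interior points at the golden/Fibonacci ratios
    q = a + 0.618*(b - a);
    if g(p) > g(q)
        a = p;                       % minimum lies in [p, b]
    else
        b = q;                       % minimum lies in [a, q]
    end
end
fprintf('minimum of g near t = %.4f\n', (a + b)/2);

Each pass shrinks the bracket to 61.8 percent of its previous length, which is the 38.2 percent reduction described above.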
Results of convergence of f1 using Golden Search method:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
1              2.0000        2.0000    5.0000         18.4391
2              1.0968        1.0968    0.0207         0.6930
3              1.0968        1.0968    0.0207         0.6930
4              1.0968        1.0968    0.0207         0.6930
[Figure 5: Golden Search iterates overlaid on the function contours, panels (a) and (b).]
5.2 Quadratic Fit Method
The algorithm for fitting a quadratic polynomial to three known points a, b, c and the function values fa, fb, fc at these points is as follows [1]. d0 is any direction vector that is normalized such that ||d0|| = 1:

d0    = [1/sqrt(2) 1/sqrt(2)]^T      %(say)
x0_y0 = [2 2]^T                      %initial point
t0    = 0.15                         %(say) assumed initial step size
fa    = f(x0_y0)
fb    = f(x0_y0 + t0.*d0)
if fa < fb
    fc = f(x0_y0 - t0.*d0)
else
    fc = f(x0_y0 + 2*t0.*d0)
end
while norm(gradient_f1(xk_yk)) > e {

    the coefficients \alpha_1 and \alpha_2 of the fitted quadratic are obtained (by Cramer's
    rule) from the interpolation conditions \alpha_0 + \alpha_1 a + \alpha_2 a^2 = fa,
    \alpha_0 + \alpha_1 b + \alpha_2 b^2 = fb and \alpha_0 + \alpha_1 c + \alpha_2 c^2 = fc,
    and the minimizing step is

    t_k = -\frac{\alpha_1}{2\alpha_2}
        = \frac{1}{2} \, \frac{(b^2 - c^2) fa + (c^2 - a^2) fb + (a^2 - b^2) fc}
                            {(b - c) fa + (c - a) fb + (a - b) fc}

}
The essence of the above algorithm is in approximating the neighbourhood of f(x) with a quadratic polynomial:

f(x + t_k d_0) \approx \alpha_0 + \alpha_1 t_k + \alpha_2 t_k^2
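A minimal sketch of one such quadratic-fit step, using an illustrative handle for f_1 and the same d_0, x_0y_0 and t_0 as in the listing above:

% One quadratic-fit step along d0 from the starting point [2 2]'
f1 = @(x) (x(1)^2 - x(2))^2 + (x(1) - 1)^2;
d0 = [1/sqrt(2); 1/sqrt(2)];   x0 = [2; 2];   t0 = 0.15;

a = 0;    fa = f1(x0);
b = t0;   fb = f1(x0 + t0*d0);
if fa < fb
    c = -t0;                       % bracket on the other side of x0
else
    c = 2*t0;                      % extend the step further along d0
end
fc = f1(x0 + c*d0);

% vertex of the quadratic through (a,fa), (b,fb), (c,fc)
tk = 0.5 * ((b^2 - c^2)*fa + (c^2 - a^2)*fb + (a^2 - b^2)*fc) / ...
           ((b - c)*fa + (c - a)*fb + (a - b)*fc);
xk = x0 + tk*d0;
fprintf('tk = %.4f, new point (%.4f, %.4f)\n', tk, xk(1), xk(2));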
Results of convergence of f1 using quadratic approximation:
iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.1923        1.1923    0.0896         1.5476
3              1.0413        1.0413    0.0036         0.2754
4              1.0024        1.0024    0.0000         0.0151
----------------------------------------------------------------------
[Figure 6: quadratic-fit iterates overlaid on the function contours, panels (a) and (b).]
The programming experience, and visually seeing the convergence of the different algorithms from the program output or through plots, helped in understanding the different methods, viz. the gradient and direct search methods. Here are the observations of the exercise:

- The contour plots (figure 1) of f1 and f2 indicate that they have a relative minimum at [1 1]^T. This is also apparent from the expressions for the functions given by equations 3.1 and 3.2.

- The constant step-size steepest descent method on f1 converged to the minimum in 39 iterations, while the variable step-size method, though more complex than the first, was a clear improvement with convergence in 30 iterations. Note: if the sequence of steps for the variable step-size method were taken to the limit, it would be the optimal step-size method (ideally the best method, but computationally expensive).

- The Rosenbrock function f2 did not converge for the steepest descent and conjugate gradient methods. The former was unable to locate a descent direction to iterate upon; see the results of section 4.1. In the latter, though the algorithm heads in the right direction, i.e. towards the minimum, precision errors in calculating the gradient vector close to zero make the algorithm go awry, resulting in abnormally high gradients in the opposite direction - see figure 4 for the partial success of the current implementation.

- The Newton-Raphson method, like the conjugate gradient method, is computationally expensive, involving the calculation of the first and second variants of the functions at every iteration; nevertheless, it converged to the minimum with the least number of iterations of all the methods: f1 converged in 4 iterations, while f2 converged in 3 iterations.

- The direct search methods were the computationally least expensive methods, as they did not involve computations of gradients/Hessians. The complexity of the methods increases from Golden Search to quadratic fit: while the Golden Search relies on the limiting behavior of the Fibonacci sequence in determining the next iterate, the quadratic fit locates the next iterate by minimizing a quadratic polynomial fitted through the known values of the function evaluated at three known points.

- The quadratic fit method converged to the minimum in 4 iterations on the f1 test function, while it converged to the minimum in 6 iterations on the f2 test function.
Conclusion
References
[1] M. Aoki, Introduction to Optimization Techniques, The Macmillan Company, 1971.
[2] E. K. P. Chong and S. H. Zak, An Introduction to Optimization, Second Edition, Wiley-Interscience Series, 2001.
[3] D. G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley, 1972.
[4] S. A. Marwan, Lecture notes, ECE-2671, Fall 2005.
The code for testing the various algorithms on the f1 and f2 functions is identical; the only changes are in the calls to the appropriate functions. For example, the Newton-Raphson algorithm tested on f2 would call gradient_f2 instead of gradient_f1, and would similarly call the appropriate functions depending on whether the test is on f1 or f2. Since the code is identical, the code for the various algorithms tested on f1 is provided here in the annex:
A.1
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title('f(x_1,x_2) = (x_1^2 - x_2)^2 + (x_1 - 1)^2','Fontsize',18);
grid on
A.2
break;
end
disp(sprintf('%-4d\t\t%3.4f\t\t\t%3.4f\t\t%3.4f\t\t%3.4f',i,xk_yk(i,1),xk_yk(i,2), ...
    f1(xk_yk(i,:)),norm(gradient_f1(xk_yk(i,:)))));
if norm(gradient_f1(xk_yk(i,:))) < e
    disp('----------------------------------------------------------------------');
    disp(sprintf('The minima of f1 found after %d iterations,',i));
    disp(sprintf('using constant step of %f, gradient methods are x1=%f x2=%f,', ...
        t, xk_yk(i,1),xk_yk(i,2)));
    disp(sprintf('and the min value of function is =%f', f1(xk_yk(i,:))));
    disp('----------------------------------------------------------------------');
    break;
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2),Const Stepsize=%1.3f',t),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.3
x0_y0 = [2 2];
% termination condition
e = 0.05; N =1000;
% candidate step sizes
tf1(1) = 0.09; tf1(2) =0.15; tf1(3) =0.2;
t = check_min_t_f1(tf1(1),tf1(2),tf1(3),x0_y0);
xk_yk(1,:) = x0_y0 - tf1(t).*gradient_f1(x0_y0);
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
for i = 2:N
    t = check_min_t_f1(tf1(1),tf1(2),tf1(3),xk_yk(i-1,:));
    xk_yk(i,:) = xk_yk(i-1,:) - tf1(t).*gradient_f1(xk_yk(i-1,:));
    if f1(xk_yk(i,:)) > f1(xk_yk(i-1,:))
        disp('not a descent direction ... exiting');
        disp(sprintf('The extremum could not be located after %d iterations',i));
        break;
    end
    disp(sprintf('%-4d\t\t%3.4f\t\t\t%3.4f\t\t%3.4f\t\t%3.4f',i,xk_yk(i,1), ...
        xk_yk(i,2),f1(xk_yk(i,:)),norm(gradient_f1(xk_yk(i,:)))));
    if norm(gradient_f1(xk_yk(i,:))) < e
        disp('----------------------------------------------------------------------');
        disp(sprintf('The minima of f1 found after %d iterations,',i));
        disp(sprintf('using Variable Stepsizes chosen from =%1.3f,%1.3f,%1.3f, gradient method are x1=%f x2=%f,', ...
            tf1(1),tf1(2),tf1(3),xk_yk(i,1),xk_yk(i,2)));
        disp(sprintf('and the min value of function is =%f', f1(xk_yk(i,:))));
        disp('----------------------------------------------------------------------');
        break;
    end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2) Variable Stepsizes chosen from =%1.3f,%1.3f,%1.3f', ...
        tf1(1),tf1(2),tf1(3)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.4
%Newton-Raphson method, f1
clear;clc; echo off;
format long g;
% initial point
x0_y0 = [2 2];
% termination condition
e = 0.05; N =1000;
% constant step size
t = 1;
[T, H ] = is_positive_h(hessian_f1(x0_y0));
if T < 0
    disp('Unable to proceed with non positive hessian')
    return;
end
xk_yk(1,:) = x0_y0 - t.*gradient_f1(x0_y0)*inv(H);
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
for i = 2:N
    [T, H ] = is_positive_h(hessian_f1(xk_yk(i-1,:)));
    if T < 0
        disp('Unable to proceed with non positive hessian')
        break;
    end
    xk_yk(i,:) = xk_yk(i-1,:) - t.*gradient_f1(xk_yk(i-1,:))*inv(H);
%
%
%
%
%
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2), Newton-Raphson Initial pt %1.1f,%1.1f', ...
        x0_y0(1),x0_y0(2)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.5
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
[T, H ] = is_positive_h(hessian_f1(xk_yk(i-1,:)));
if T < 0
    disp('Unable to proceed with non positive hessian')
    break;
end
xk_yk(i,:) = xk_yk(i-1,:) + t(i-1).*d(i-1,:);
r(i,:) = gradient_f1(xk_yk(i,:));
%b(i-1) = r(i,:)*(H*d(i-1,:)')/(d(i-1,:)*(H*d(i-1,:)'));
b(i-1) = r(i,:)*r(i,:)'/(r(i-1,:)*r(i-1,:)');
d(i,:) = -r(i,:) + b(i-1).*d(i-1,:);
t(i) = - d(i,:)*r(i,:)'/(d(i,:)*(H*d(i,:)'));
%
%
%
%
%
break;
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2), Conjugate Gradient Initial pt %1.1f,%1.1f', ...
        x0_y0(1),x0_y0(2)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.6
q_fit_f1 = t;
end
disp('----------------------------------------------------------------------');
break;
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2) quadratic fit, initial pt:%1.1f,%1.1f', ...
        x0_y0(1),x0_y0(2)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.7
%Assume that the relative minima lies in the uncertainty region [0 0] and
%[2 2]; we will use the golden search method to reduce the uncertainty region
%and thus determine the minima of f1 and f2
clear;clc; echo off;
format long g;
% initial uncertainty interval
xk_yk(1,:) = [0 0];
xk_yk(2,:) = [2 2];
i =2; e = 0.05;
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
j = 1;
while i < 1000
    a1 = 0.382.*(xk_yk(i,:)-xk_yk(i-1,:)) + xk_yk(i-1,:);
    b1 = 0.618.*(xk_yk(i,:)-xk_yk(i-1,:)) + xk_yk(i-1,:);
    if f1(a1) > f1(b1)
        xk_yk(i+1,:) = a1; xk_yk(i+2,:) = xk_yk(i,:);
    else
        xk_yk(i+1,:) = xk_yk(i-1,:); xk_yk(i+2,:) = a1;
    end
    xkp_ykp(j,:) = xk_yk(i,:);
    i = i + 2;
    j = j + 1;
    disp(sprintf('%-4d\t\t%3.4f\t\t\t%3.4f\t\t%3.4f\t\t%3.4f',(i-2)/2,xk_yk(i,1), ...
        xk_yk(i,2),f1(xk_yk(i,:)),norm(gradient_f1(xk_yk(i,:)))));
    if norm(gradient_f1(xk_yk(i,:))) < e
        disp('----------------------------------------------------------------------');
        disp(sprintf('The minima of f1 found after %d iterations,',(i-2)/2));
        disp(sprintf('with initial pt %f %f, using Golden Search method are x1=%1.1f, x2=%1.1f,', ...
            xk_yk(2,1),xk_yk(2,2),xk_yk(i,1),xk_yk(i,2)));
        disp(sprintf('and the min value of function is =%f', f1(xk_yk(i,:))));
        disp('----------------------------------------------------------------------');
        break
    end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2), Golden-Search Initial pt %1.1f,%1.1f', ...
        xk_yk(2,1),xk_yk(2,2)),'Fontsize',18);
grid on
hold on;
convergence = [xkp_ykp];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');