extremization problem
Narayanan Krishnamurthy1
for ECE-2671 taught by Dr. Simaan A. Marwan2
Abstract. The subject of this technical report is the evaluation of different unconstrained optimization methods that determine the extremum points of the given objective function(s). The analysis is done using Matlab, and assumes the same starting point and termination criteria for comparing the different iterative algorithms.
Key words. Gradient Methods, Method of Steepest Descent, Conjugate Direction Method, Newton-Raphson Method, Region Elimination Methods, Bisection Method, Equal Spacing Method, Fibonacci Search, Golden Search, Polynomial Approximation.
1 Department of Electrical and Computer Engineering, 2 Professor, ECE Department, University of Pittsburgh,
Pittsburgh, PA 15261
Contents
1 Introduction
4 Gradient Methods
4.1 Method of Steepest Descent
4.2 Newton-Raphson Method
4.3 Conjugate Gradient Method
5 Direct Search Methods
5.1 Golden Search Method
5.2 Quadratic Fit Method
7 Conclusion
References
Introduction
The study of methods for extremizing a function f(x) over the entire R^n is an unconstrained optimization problem. An important assumption for using most of the algorithms discussed here to minimize the function f(x) is that the function be differentiable over the chosen domain, and that partial derivatives of at least first order exist, namely the gradient of the function. Some algorithms also require the existence of second-order derivatives and positive definiteness of the Hessian - the second-order derivative matrix. These assumptions on the behavior of the function f(x) and its Taylor approximation are taken up next [3].
The First-Order Necessary Condition for a relative minimum point x^* of f(x) over R^n is that, for any feasible direction d (the feasible directions are the directions used in the neighborhood of a point to evaluate the function's behavior in that neighborhood), the following condition is satisfied:

\nabla f(x^*)^T d \geq 0          (2.1)

where

\nabla f(x^*) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]^T          (2.2)
The Second-Order Necessary Condition for a relative minimum point x^* of a twice differentiable function f(x) is:

\nabla f(x^*)^T d \geq 0          (2.3)

\text{if } \nabla f(x^*)^T d = 0, \text{ then } d^T \nabla^2 f(x^*) d \geq 0          (2.4)

where the condition \nabla f(x^*)^T d = 0 occurs if the extremum point x^* is an interior point with all possible feasible directions d in the function's domain R^n. \nabla^2 f(x^*) is the Hessian, or second-order derivative matrix:

\nabla^2 f(x^*) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]_{i,j}, \quad \text{where } 1 \leq i, j \leq n          (2.5)
The Second-Order Sufficient Conditions for a relative minimum point x^* of a twice differentiable function f(x) are:

\nabla f(x^*) = 0 \text{ and } \nabla^2 f(x^*) \text{ is positive definite}          (2.6)
The Taylor approximation of the function f based on the first and second variants of f is given by:

f(x) = f(x_0) + \langle \nabla f(x_0), (x - x_0) \rangle + \frac{1}{2} \langle (x - x_0), \nabla^2 f(x_0)(x - x_0) \rangle + \text{Higher Order Terms}          (2.7)

where \langle \cdot,\cdot \rangle is the notation for the inner product between vectors in R^n; for more details on vector spaces and the FONC/SONC and sufficiency conditions, refer to [2, 4].
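As a brief worked example of these conditions, consider the convex quadratic f(x) = x_1^2 + x_2^2 on R^2. Its gradient is \nabla f(x) = [2x_1, 2x_2]^T, which vanishes only at x^* = [0 0]^T, and its Hessian \nabla^2 f(x) = \mathrm{diag}(2, 2) is positive definite everywhere; hence x^* satisfies the FONC (2.1) with equality for every direction d, as well as the sufficient condition (2.6), and is a relative (in fact global) minimum.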
Two objective functions were provided to evaluate the performance of the various algorithms. The initial iteration guess x_0y_0 is chosen as [2 2]^T, and the termination condition of the iterative search for the relative minimum is ||\nabla f(x_ky_k)|| < \epsilon, where \epsilon = 0.05.

f_1(x_1, x_2) = (x_1^2 - x_2)^2 + (x_1 - 1)^2          (3.1)

f_2(x_1, x_2) = 100(x_1^2 - x_2)^2 + (x_1 - 1)^2          (3.2)

The objective functions f_1 and f_2 are common test functions used to test the convergence properties of any new algorithm. f_1 was introduced by Witte and Holst (1964), while f_2 was proposed by Rosenbrock (1960).
Iso-lines, or contours, are good for visualizing the gradient and the descent/ascent directions of the functions, and for locating the minima or maxima, especially when the function depends on just two variables (figure 1).
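As a quick check of these properties, the following minimal Matlab sketch (the handles f1, f2, g1 and H1 are illustrative definitions written out from equations 3.1 and 3.2, not the gradient_f1/hessian_f1 files of the appendix) verifies the optimality conditions of f_1 at [1 1]^T and evaluates the starting gradient norm at the initial guess [2 2]^T:

% Objective functions from equations 3.1 and 3.2 (illustrative handles)
f1 = @(x) (x(1)^2 - x(2))^2 + (x(1) - 1)^2;
f2 = @(x) 100*(x(1)^2 - x(2))^2 + (x(1) - 1)^2;
% Analytic gradient and Hessian of f1, derived by hand from eq. 3.1
g1 = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
H1 = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

xstar = [1; 1];  x0 = [2; 2];
disp(g1(xstar))          % FONC: gradient is [0; 0] at [1 1]'
disp(eig(H1(xstar)))     % SOSC: Hessian eigenvalues are positive at [1 1]'
disp(norm(g1(x0)))       % gradient norm at the initial guess [2 2]'

The printed gradient at [1 1]^T is the zero vector and the Hessian eigenvalues are positive, which is consistent with the relative minimum at [1 1]^T noted in the observations at the end of the report.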
[Figure 1: contour plots of the objective functions, (a) f_1(x_1,x_2) = (x_1^2 - x_2)^2 + (x_1 - 1)^2 and (b) f_2(x_1,x_2) = 100(x_1^2 - x_2)^2 + (x_1 - 1)^2.]
Gradient Methods
The gradient methods considered here generate iterates of the form

x_{k+1} = x_k - t_k A_k \nabla f(x_k)          (4.1)

where t_k is the step-size for the kth iteration. Note that the process of determining the minimum involves traversing along the negative direction of the gradient vector \nabla f(x_k). A_k is a linear transformation of the gradient vector, which provides more flexibility in exploring for the minima than just the line search provided by the gradient. In the case of the Newton-Raphson method, A_k is [\nabla^2 f(x_k)]^{-1}.
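As a rough illustration of equation 4.1 (grad_f and hess_f below are illustrative stand-ins for the gradient and Hessian of f_1, not the appendix routines), one update with A_k = I is a steepest descent step, while A_k = [\nabla^2 f(x_k)]^{-1} with t_k = 1 is a Newton-Raphson step:

% One update of the generic scheme x_{k+1} = x_k - t_k * A_k * grad f(x_k)
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
hess_f = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

xk = [2; 2];  tk = 0.15;
x_sd = xk - tk * eye(2) * grad_f(xk);        % A_k = I: steepest descent step
x_nr = xk - (hess_f(xk) \ grad_f(xk));       % A_k = inverse Hessian, t_k = 1: Newton-Raphson step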
4.1 Method of Steepest Descent

The constant and variable step-size steepest descent methods involve choosing a constant t_k, or one of multiple candidate t_k's, in equation 4.1 (with A_k = I). In the variable step-size method, at every generation of the next exploration point, the step from the chosen array that gives the minimum value of f evaluated at that point is selected. The optimal step-size method may be thought of as an extension of the variable step-size method in which the choice of step-size is taken to the limit, i.e. an exact line search is performed at every iteration.
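A minimal sketch of the variable step-size selection, assuming a helper along the lines of the check_min_t_f1 routine used in the appendix (the implementation below is illustrative, not the author's):

% Pick, from a small array of candidate steps, the one that minimizes f1
% at the trial point reached along the negative gradient.
f1     = @(x) (x(1)^2 - x(2))^2 + (x(1) - 1)^2;
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
steps  = [0.09 0.15 0.20];                    % candidate step sizes

xk = [2; 2];
fvals = arrayfun(@(t) f1(xk - t*grad_f(xk)), steps);
[~, idx] = min(fvals);                        % index of the best candidate step
xk_next = xk - steps(idx)*grad_f(xk);         % variable step-size update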
[Figure 2: steepest descent iterates overlaid on the function contours, panels (a) and (b).]
Convergence of f1 using a constant step size of 0.15:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2             -1.0762        1.9670    4.9648         1.7511
3             -0.9756        1.7244    4.4998         1.8067
4             -0.8351        1.4926    4.0000         1.8861
5             -0.6830        1.2541    3.4528         1.9889
6             -0.5009        1.0178    2.8408         2.1212
7             -0.2811        0.7877    2.1434         2.2640
8             -0.0163        0.5751    1.3633         2.3027
9              0.2830        0.4027    0.6182         1.9113
10             0.5529        0.3059    0.1999         0.8948
11             0.6871        0.3058    0.1256         0.3730
12             0.7124        0.3557    0.1058         0.3354
13             0.7338        0.4012    0.0897         0.3035
14             0.7532        0.4424    0.0765         0.2760
15             0.7708        0.4799    0.0656         0.2519
16             0.7867        0.5142    0.0565         0.2308
17             0.8013        0.5456    0.0488         0.2122
18             0.8145        0.5745    0.0423         0.1955
19             0.8267        0.6012    0.0368         0.1806
20             0.8379        0.6259    0.0321         0.1672
21             0.8482        0.6487    0.0280         0.1551
22             0.8578        0.6700    0.0246         0.1441
23             0.8666        0.6897    0.0216         0.1341
24             0.8748        0.7081    0.0190         0.1250
25             0.8823        0.7252    0.0167         0.1166
26             0.8894        0.7412    0.0147         0.1089
27             0.8960        0.7562    0.0130         0.1018
28             0.9021        0.7701    0.0115         0.0953
29             0.9078        0.7832    0.0102         0.0893
30             0.9132        0.7955    0.0090         0.0837
31             0.9182        0.8070    0.0080         0.0786
32             0.9229        0.8179    0.0071         0.0738
33             0.9273        0.8280    0.0063         0.0693
34             0.9314        0.8376    0.0056         0.0651
35             0.9352        0.8465    0.0050         0.0613
36             0.9389        0.8550    0.0044         0.0577
37             0.9423        0.8629    0.0040         0.0543
38             0.9455        0.8704    0.0035         0.0511
39             0.9485        0.8775    0.0031         0.0482
----------------------------------------------------------------------
The minima of f1 found after 39 iterations,
using constant step of 0.150000, gradient methods are x1=0.948498 x2=0.877480,
and the min value of function is =0.003144
----------------------------------------------------------------------
Convergence of f1 using variable steps chosen from 0.09, 0.15 and 0.20,
where the negative direction of the gradient is used as the line of exploration.
iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.3015        1.4738    0.1394         1.8044
3              1.1441        1.5134    0.0626         0.7662
4              1.2024        1.4766    0.0419         0.2635
5              1.1793        1.4710    0.0386         0.1617
6              1.1833        1.4389    0.0351         0.1989
7              1.1558        1.4273    0.0326         0.2138
8              1.1658        1.4109    0.0302         0.1373
9              1.1478        1.3901    0.0271         0.1506
10             1.1555        1.3610    0.0248         0.1983
11             1.1383        1.3564    0.0228         0.1215
12             1.1383        1.3321    0.0204         0.1324
13             1.1161        1.3175    0.0186         0.1682
14             1.1241        1.3046    0.0171         0.1038
15             1.1114        1.2882    0.0152         0.1067
16             1.1139        1.2670    0.0137         0.1232
17             1.0972        1.2591    0.0125         0.1206
18             1.1044        1.2426    0.0114         0.1175
19             1.0882        1.2357    0.0104         0.1137
20             1.0954        1.2203    0.0095         0.1094
21             1.0802        1.2141    0.0087         0.1047
22             1.0868        1.1999    0.0079         0.0996
23             1.0730        1.1943    0.0072         0.0942
24             1.0788        1.1814    0.0065         0.0887
25             1.0666        1.1761    0.0059         0.0832
26             1.0713        1.1646    0.0054         0.0777
27             1.0608        1.1595    0.0049         0.0724
28             1.0655        1.1458    0.0044         0.0889
29             1.0577        1.1439    0.0040         0.0510
30             1.0559        1.1339    0.0035         0.0494
----------------------------------------------------------------------
The minima of f1 found after 30 iterations,
using Variable Stepsizes chosen from =0.090,0.150,0.200, gradient
method are x1=1.055874 x2=1.133855,
and the min value of function is =0.003482
----------------------------------------------------------------------
Convergence of f2:
The function f2 did not converge at all for the constant and variable
step-size methods. Various initial points ranging over [-1 1], [-1.4 0.5]
and [2 2], and various step sizes ranging over 0.05, 0.09 and 0.15, were
tried in vain; the steepest descent algorithm could not locate a descent
direction to iterate on, and hence the failed output:

iteration      x1            x2        f2(x1,x2)      Norm(gradient)
not a descent direction ... exiting
The extremum could not be located after 2 iterations
4.2 Newton-Raphson Method

The Newton-Raphson method uses not only the first variant, but also the second variant information in its search for the minima. The Taylor approximation of a function given by expression 2.7 can be re-written (ignoring the higher order terms) as:
\nabla f(x) \approx \nabla f(x_k) + \nabla^2 f(x_k)(x - x_k)          (4.2)

[Figure 3: Newton-Raphson iterates overlaid on the function contours, panels (a) and (b).]
which is also clear from the fundamental definition of derivatives in calculus. Now, finding the root of \nabla f(x), we get:

x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)          (4.3)
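A minimal sketch of iteration 4.3 applied to f_1 (the gradient and Hessian handles are illustrative definitions derived from equation 3.1, not the appendix files):

% Newton-Raphson iteration on f1 from the starting point [2 2]'
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
hess_f = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

x = [2; 2];  e = 0.05;
for k = 1:50
    x = x - hess_f(x) \ grad_f(x);          % x_{k+1} = x_k - inv(Hessian)*gradient
    if norm(grad_f(x)) < e, break; end      % same termination rule as the report
end
fprintf('x = (%.4f, %.4f) after %d iterations\n', x(1), x(2), k);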
Convergence of f1:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.0593        0.5733    0.3046         2.6785
3              1.0310        1.0622    0.0010         0.0653
4              1.0000        0.9991    0.0000         0.0044
----------------------------------------------------------------------
The minima of f1 found after 4 iterations,
with initial pt 2.000000 2.000000, using Newton-Raphson(NR) method are x1=1.0, x2=1.0,
and the min value of function is =0.000001
----------------------------------------------------------------------
Convergence of f2:

iteration      x1            x2        f2(x1,x2)      Norm(gradient)
2              1.0012        0.0099    98.5152        444.3233
3              1.0012        1.0025    0.0000         0.0025
4.3 Conjugate Gradient Method

The conjugate gradient method uses a procedure similar to Gram-Schmidt's orthogonalization to generate Q-conjugate vectors [1]. That is, d_k is constructed as a linear combination of the d_{k-1} vectors, with the coefficient \beta_{k-1} chosen such that \langle d_k, Q d_{k-1} \rangle = 0. The conjugate gradient algorithm can be expressed as the following sequence of steps:
Initialization:

epsilon = 0.05                          %termination condition
r0      = gradient_f1(x0_y0)
H       = hessian_f1(x0_y0)
d0      = -r0
t0      = - <d0,r0> / <d0,(H d0)>
x1_y1   = x0_y0 + t0*d0
r1      = gradient_f1(x1_y1)
b0      = <r1,r1> / <r0,r0>
d1      = -r1 + b0*d0

General step k (repeated for k = 2, 3, ... while norm(gradient_f1(xk_yk)) >= epsilon):

H       = hessian_f1(xk-1_yk-1)
xk_yk   = xk-1_yk-1 + tk-1 * dk-1
rk      = gradient_f1(xk_yk)
bk-1    = <rk,rk> / <rk-1,rk-1>
dk      = -rk + bk-1 * dk-1
tk      = - <dk,rk> / <dk,(H dk)>
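The steps above can be collected into a compact Matlab sketch (again with illustrative gradient and Hessian handles rather than the gradient_f1/hessian_f1 files; whether the iteration converges depends on the function, as noted below for f_2):

% Conjugate gradient iteration for f1, following the listing above
grad_f = @(x) [4*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1); -2*(x(1)^2 - x(2))];
hess_f = @(x) [12*x(1)^2 - 4*x(2) + 2, -4*x(1); -4*x(1), 2];

x = [2; 2];  e = 0.05;
r = grad_f(x);  d = -r;                      % d0 = -r0
for k = 1:100
    H = hess_f(x);
    t = -(d'*r) / (d'*(H*d));                % tk = -<dk,rk>/<dk,H dk>
    x = x + t*d;                             % xk = xk-1 + tk-1*dk-1
    r_new = grad_f(x);
    if norm(r_new) < e, break; end
    b = (r_new'*r_new) / (r'*r);             % bk-1 = <rk,rk>/<rk-1,rk-1>
    d = -r_new + b*d;                        % dk = -rk + bk-1*dk-1
    r = r_new;
end
fprintf('stopped at x = (%.4f, %.4f) after %d iterations\n', x(1), x(2), k);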
Convergence of f1:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.4975        2.1115    0.2646         1.7980
3              1.4275        2.1236    0.1901         0.4035
4              1.4032        2.1165    0.1843         0.2955
5              1.3242        1.9980    0.1649         0.8101
[Figure 4: conjugate gradient iterates overlaid on the function contours, panels (a) and (b).]
Convergence of f2:
The function f2 does not converge using the current implementation of the
conjugate gradient method. The gradient vector/Hessian becomes abnormally
large after a few iterations, due to precision errors in calculating the
gradient as the iterate approaches the minima. Note that the direction of
approach of the iterate before the abnormal gradient value is headed in the
right direction, i.e. towards the minima. A better approach would be to use
direct methods to calculate the next estimate, like the polynomial fit,
thereby circumventing the precision error in calculating the first and
second variants. The failed output:
iteration      x1            x2         f2(x1,x2)      Norm(gradient)
2              1.4813        2.0134     3.4993         113.955
3              1.4337        2.0290     0.2583         16.914
4              1.4257        2.0316     0.1814         1.5335
5              1.4249        2.0317     0.1807         0.2822
6              1.4203        2.0223     0.1792         2.3043
7              0.8654        0.4497     8.9710         119.390
8              4.4958        10.6666    9124.72        17279
The extremum could not be located after 8 iterations
Direct Search Methods

The direct search methods preclude the calculation of the computation-intensive gradient vectors or Hessian matrices. But, like most methods, they work best for some functions while their performance and convergence are dismal for others.
5.1 Golden Search Method

The Golden Search method uses the limiting properties of the Fibonacci sequence F_k as k \to \infty. The sequence and its limiting properties are:

F_0 = F_1 = 1          (5.1)

F_k = F_{k-1} + F_{k-2}          (5.2)

\lim_{k \to \infty} \frac{F_{k-1}}{F_{k+1}} \approx 0.382          (5.3)

\lim_{k \to \infty} \frac{F_k}{F_{k+1}} \approx 0.618          (5.4)

Thus, at every iteration of golden search the uncertainty interval (the region in which the minimum is located) is reduced by 38.2 percent, until the search converges to a sufficiently small region in which the minimum is found.
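A minimal one-dimensional sketch of this interval reduction, searching f_1 along the diagonal x_1 = x_2 = t between the interval endpoints [0 0]^T and [2 2]^T used in the appendix code (the restriction g below is an illustrative choice):

% Golden-section reduction of the uncertainty interval [0, 2] for g(t) = f1(t, t)
g = @(t) (t^2 - t)^2 + (t - 1)^2;    % f1 restricted to the line x1 = x2 = t
a = 0;  b = 2;  tol = 1e-3;
while (b - a) > tol
    p = a + 0.382*(b - a);           % interior points at the golden/Fibonacci ratios
    q = a + 0.618*(b - a);
    if g(p) > g(q)
        a = p;                       % minimum lies in [p, b]
    else
        b = q;                       % minimum lies in [a, q]
    end
end
fprintf('minimum of g near t = %.4f\n', (a + b)/2);

Each pass shrinks the bracket to 61.8 percent of its previous length, which is the 38.2 percent reduction described above.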
Results of convergence of f1 using Golden Search method:

iteration      x1            x2        f1(x1,x2)      Norm(gradient)
1              2.0000        2.0000    5.0000         18.4391
2              1.0968        1.0968    0.0207         0.6930
3              1.0968        1.0968    0.0207         0.6930
4              1.0968        1.0968    0.0207         0.6930
[Figure 5: Golden Search iterates overlaid on the function contours, panels (a) and (b).]
5.2 Quadratic Fit Method
The algorithm for fitting a quadratic polynomial to three known points a, b, c and the function values fa, fb, fc at these points is as follows [1]. d0 is any direction vector that is normalized such that ||d0|| = 1:

d0    = [1/sqrt(2) 1/sqrt(2)]^T      %(say)
x0_y0 = [2 2]^T                      %initial point
t0    = 0.15                         %(say) assumed initial step size
fa    = f(x0_y0)
fb    = f(x0_y0 + t0.*d0)
if fa < fb
    fc = f(x0_y0 - t0.*d0)
else
    fc = f(x0_y0 + 2*t0.*d0)
end
while norm(gradient_f1(xk_yk)) > e {

    the coefficients \alpha_1 and \alpha_2 of the fitted quadratic are obtained (by Cramer's
    rule) from the interpolation conditions \alpha_0 + \alpha_1 a + \alpha_2 a^2 = fa,
    \alpha_0 + \alpha_1 b + \alpha_2 b^2 = fb and \alpha_0 + \alpha_1 c + \alpha_2 c^2 = fc,
    and the minimizing step is

    t_k = -\frac{\alpha_1}{2\alpha_2}
        = \frac{1}{2} \, \frac{(b^2 - c^2) fa + (c^2 - a^2) fb + (a^2 - b^2) fc}
                            {(b - c) fa + (c - a) fb + (a - b) fc}

}
The essence of the above algorithm is in approximating the neighbourhood of f(x) with a quadratic polynomial:

f(x + t_k d_0) \approx \alpha_0 + \alpha_1 t_k + \alpha_2 t_k^2
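A minimal sketch of one such quadratic-fit step, using an illustrative handle for f_1 and the same d_0, x_0y_0 and t_0 as in the listing above:

% One quadratic-fit step along d0 from the starting point [2 2]'
f1 = @(x) (x(1)^2 - x(2))^2 + (x(1) - 1)^2;
d0 = [1/sqrt(2); 1/sqrt(2)];   x0 = [2; 2];   t0 = 0.15;

a = 0;    fa = f1(x0);
b = t0;   fb = f1(x0 + t0*d0);
if fa < fb
    c = -t0;                       % bracket on the other side of x0
else
    c = 2*t0;                      % extend the step further along d0
end
fc = f1(x0 + c*d0);

% vertex of the quadratic through (a,fa), (b,fb), (c,fc)
tk = 0.5 * ((b^2 - c^2)*fa + (c^2 - a^2)*fb + (a^2 - b^2)*fc) / ...
           ((b - c)*fa + (c - a)*fb + (a - b)*fc);
xk = x0 + tk*d0;
fprintf('tk = %.4f, new point (%.4f, %.4f)\n', tk, xk(1), xk(2));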
Results of convergence of f1 using quadratic approximation:
iteration      x1            x2        f1(x1,x2)      Norm(gradient)
2              1.1923        1.1923    0.0896         1.5476
3              1.0413        1.0413    0.0036         0.2754
4              1.0024        1.0024    0.0000         0.0151
----------------------------------------------------------------------
[Figure 6: quadratic-fit iterates overlaid on the function contours, panels (a) and (b).]
The programming experience, and visually seeing the convergence of the different algorithms from the program output or through plots, helped in understanding the different methods, viz. the gradient and direct search methods. Here are the observations of the exercise:

- The contour plots (figure 1) of f1 and f2 indicate that they have a relative minimum at [1 1]^T. This is also apparent from the expressions for the functions given by equations 3.1 and 3.2.

- The constant step-size steepest descent method on f1 converged to the minimum in 39 iterations, while the variable step-size method, though more complex than the first, was a clear improvement with convergence in 30 iterations. Note: if the sequence of steps for the variable step-size method were taken to the limit, it would be the optimal step-size method (ideally the best method, but computationally expensive).

- The Rosenbrock function f2 did not converge for the steepest descent and conjugate gradient methods. The former was unable to locate a descent direction to iterate upon; see the results of section 4.1. In the latter, though the algorithm heads in the right direction, i.e. towards the minimum, precision errors in calculating the gradient vector close to zero make the algorithm go awry, resulting in abnormally high gradients in the opposite direction - see figure 4 for the partial success of the current implementation.

- The Newton-Raphson method, like the conjugate gradient method, is computationally expensive, involving the calculation of the first and second variants of the functions at every iteration; nevertheless, it converged to the minimum with the least number of iterations of all the methods: f1 converged in 4 iterations, while f2 converged in 3 iterations.

- The direct search methods were the computationally least expensive methods, as they did not involve computations of gradients/Hessians. The complexity of the methods increases from Golden Search to quadratic fit: while the Golden Search relies on the limiting behavior of the Fibonacci sequence in determining the next iterate, the quadratic fit locates the next iterate by minimizing a quadratic polynomial fitted through the known values of the function evaluated at three known points.

- The quadratic fit method converged to the minimum in 4 iterations on the f1 test function, while it converged to the minimum in 6 iterations on the f2 test function.
Conclusion
References
[1] M. Aoki, Introduction to Optimization Techniques, The Macmillan Company, 1971.
[2] E. K. P. Chong and S. H. Zak, An Introduction to Optimization, Second Edition, Wiley-Interscience Series, 2001.
[3] D. G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley, 1972.
[4] S. A. Marwan, Lecture notes, ECE-2671, Fall 2005.
The code for testing the various algorithms on the f1 and f2 functions is identical; the only changes are in the calls to the appropriate functions. For example, the Newton-Raphson algorithm tested on f2 would call gradient_f2 instead of gradient_f1, and would similarly call the appropriate functions depending on whether the test is on f1 or f2. Since the code is identical, the code for the various algorithms tested on f1 is provided here in the annex:
A.1
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title('f(x_1,x_2) = (x_1^2 - x_2)^2 + (x_1 - 1)^2','Fontsize',18);
grid on
A.2
break;
end
disp(sprintf('%-4d\t\t%3.4f\t\t\t%3.4f\t\t%3.4f\t\t%3.4f',i,xk_yk(i,1),xk_yk(i,2), ...
    f1(xk_yk(i,:)),norm(gradient_f1(xk_yk(i,:)))));
if norm(gradient_f1(xk_yk(i,:))) < e
    disp('----------------------------------------------------------------------');
    disp(sprintf('The minima of f1 found after %d iterations,',i));
    disp(sprintf('using constant step of %f, gradient methods are x1=%f x2=%f,', ...
        t, xk_yk(i,1),xk_yk(i,2)));
    disp(sprintf('and the min value of function is =%f', f1(xk_yk(i,:))));
    disp('----------------------------------------------------------------------');
    break;
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2),Const Stepsize=%1.3f',t),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.3
x0_y0 = [2 2];
% termination condition
e = 0.05; N =1000;
% candidate step sizes
tf1(1) = 0.09; tf1(2) =0.15; tf1(3) =0.2;
t = check_min_t_f1(tf1(1),tf1(2),tf1(3),x0_y0);
xk_yk(1,:) = x0_y0 - tf1(t).*gradient_f1(x0_y0);
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
for i = 2:N
    t = check_min_t_f1(tf1(1),tf1(2),tf1(3),xk_yk(i-1,:));
    xk_yk(i,:) = xk_yk(i-1,:) - tf1(t).*gradient_f1(xk_yk(i-1,:));
    if f1(xk_yk(i,:)) > f1(xk_yk(i-1,:))
        disp('not a descent direction ... exiting');
        disp(sprintf('The extremum could not be located after %d iterations',i));
        break;
    end
    disp(sprintf('%-4d\t\t%3.4f\t\t\t%3.4f\t\t%3.4f\t\t%3.4f',i,xk_yk(i,1), ...
        xk_yk(i,2),f1(xk_yk(i,:)),norm(gradient_f1(xk_yk(i,:)))));
    if norm(gradient_f1(xk_yk(i,:))) < e
        disp('----------------------------------------------------------------------');
        disp(sprintf('The minima of f1 found after %d iterations,',i));
        disp(sprintf('using Variable Stepsizes chosen from =%1.3f,%1.3f,%1.3f, gradient method are x1=%f x2=%f,', ...
            tf1(1),tf1(2),tf1(3),xk_yk(i,1),xk_yk(i,2)));
        disp(sprintf('and the min value of function is =%f', f1(xk_yk(i,:))));
        disp('----------------------------------------------------------------------');
        break;
    end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2) Variable Stepsizes chosen from =%1.3f,%1.3f,%1.3f', ...
        tf1(1),tf1(2),tf1(3)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.4
%Newton-Raphson method, f1
clear;clc; echo off;
format long g;
% initial point
x0_y0 = [2 2];
% termination condition
e = 0.05; N =1000;
% constant step size
t = 1;
[T, H ] = is_positive_h(hessian_f1(x0_y0));
if T < 0
    disp('Unable to proceed with non positive hessian')
    return;
end
xk_yk(1,:) = x0_y0 - t.*gradient_f1(x0_y0)*inv(H);
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
for i = 2:N
    [T, H ] = is_positive_h(hessian_f1(xk_yk(i-1,:)));
    if T < 0
        disp('Unable to proceed with non positive hessian')
        break;
    end
    xk_yk(i,:) = xk_yk(i-1,:) - t.*gradient_f1(xk_yk(i-1,:))*inv(H);
%
%
%
%
%
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2), Newton-Raphson Initial pt %1.1f,%1.1f', ...
        x0_y0(1),x0_y0(2)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.5
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
[T, H ] = is_positive_h(hessian_f1(xk_yk(i-1,:)));
if T < 0
    disp('Unable to proceed with non positive hessian')
    break;
end
xk_yk(i,:) = xk_yk(i-1,:) + t(i-1).*d(i-1,:);
r(i,:) = gradient_f1(xk_yk(i,:));
%b(i-1) = r(i,:)*(H*d(i-1,:)')/(d(i-1,:)*(H*d(i-1,:)'));
b(i-1) = r(i,:)*r(i,:)'/(r(i-1,:)*r(i-1,:)');
d(i,:) = -r(i,:) + b(i-1).*d(i-1,:);
t(i) = - d(i,:)*r(i,:)'/(d(i,:)*(H*d(i,:)'));
%
%
%
%
%
break;
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2), Conjugate Gradient Initial pt %1.1f,%1.1f', ...
        x0_y0(1),x0_y0(2)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.6
q_fit_f1 = t;
end
disp('----------------------------------------------------------------------');
break;
end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2) quadratic fit, initial pt:%1.1f,%1.1f', ...
        x0_y0(1),x0_y0(2)),'Fontsize',18);
grid on
hold on;
convergence = [x0_y0; xk_yk];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');
A.7
%Assume that the relative minima lies in the uncertainty region [0 0] and
%[2 2]; we will use the golden search method to reduce the uncertainty region
%and thus determine the minima of f1 and f2
clear;clc; echo off;
format long g;
% initial uncertainty interval
xk_yk(1,:) = [0 0];
xk_yk(2,:) = [2 2];
i =2; e = 0.05;
disp('iteration      x1            x2        f1(x1,x2)      Norm(gradient)');
j = 1;
while i < 1000
    a1 = 0.382.*(xk_yk(i,:)-xk_yk(i-1,:)) + xk_yk(i-1,:);
    b1 = 0.618.*(xk_yk(i,:)-xk_yk(i-1,:)) + xk_yk(i-1,:);
    if f1(a1) > f1(b1)
        xk_yk(i+1,:) = a1; xk_yk(i+2,:) = xk_yk(i,:);
    else
        xk_yk(i+1,:) = xk_yk(i-1,:); xk_yk(i+2,:) = a1;
    end
    xkp_ykp(j,:) = xk_yk(i,:);
    i = i + 2;
    j = j + 1;
    disp(sprintf('%-4d\t\t%3.4f\t\t\t%3.4f\t\t%3.4f\t\t%3.4f',(i-2)/2,xk_yk(i,1), ...
        xk_yk(i,2),f1(xk_yk(i,:)),norm(gradient_f1(xk_yk(i,:)))));
    if norm(gradient_f1(xk_yk(i,:))) < e
        disp('----------------------------------------------------------------------');
        disp(sprintf('The minima of f1 found after %d iterations,',(i-2)/2));
        disp(sprintf('with initial pt %f %f, using Golden Search method are x1=%1.1f, x2=%1.1f,', ...
            xk_yk(2,1),xk_yk(2,2),xk_yk(i,1),xk_yk(i,2)));
        disp(sprintf('and the min value of function is =%f', f1(xk_yk(i,:))));
        disp('----------------------------------------------------------------------');
        break
    end
end
n = 30; %# of contours
format long g
[X,Y] = meshgrid(-1:.02:2,-1.4:.02:4);
Z = (X.^2 - Y).^2 + (X - ones(size(X))).^2;
%Then, generate a contour plot of Z.
[C,h] = contour(X,Y,Z,n);
clabel(C,h),xlabel('x_1','Fontsize',18),ylabel('x_2','Fontsize',18), ...
    title(sprintf('f_1(x_1,x_2), Golden-Search Initial pt %1.1f,%1.1f', ...
        xk_yk(2,1),xk_yk(2,2)),'Fontsize',18);
grid on
hold on;
convergence = [xkp_ykp];
%scatter(convergence(:,1),convergence(:,2));
plot(convergence(:,1),convergence(:,2),'-ro');