
Non-Linear Programming (NLP): Multivariable, Constrained

Benoît Chachuat <benoit@mcmaster.ca>
McMaster University, Department of Chemical Engineering
ChE 4G03: Optimization in Chemical Engineering

[Course map: Basic Concepts for Optimization (Parts I–III) — objective, constraints,
unconstrained and constrained optimization theory; Methods for Single-Variable
Unconstrained Optimization — 1-d linesearch, Newton's method; Methods for Multivariable
Unconstrained Optimization — search direction & linesearch combination; this lesson:
Methods for Multivariable Constrained Optimization — solving multivariable, constrained NLPs.]


Outline

Constrained Optimization: solution methods
  - Analytical solution methods
  - Numerical solution methods (this lesson):
      Seq. Unconstrained Prog. (penalty/barrier), Seq. Linear Prog. (SLP),
      Seq. Quadratic Prog. (SQP), Generalized Reduced Gradient (GRG)

For additional details, see Rardin (1998), Chapter 14.5-14.7
(also check: http://www.mpri.lsu.edu/textbook/Chapter6.htm)

Penalty Methods

Idea: Transform a constrained NLP into an unconstrained NLP

Consider the NLP problem

  minimize (over x ∈ IR^n):  f(x)
  subject to:  g_j(x) ≤ 0,  j = 1, ..., m_i
               h_j(x) = 0,  j = 1, ..., m_e

Penalty methods drop constraints and substitute new terms in the
objective function penalizing infeasibility:

  minimize (over x ∈ IR^n):  F(x) = f(x) + µ [ Σ_{j=1..m_e} p_j^e(x) + Σ_{j=1..m_i} p_j^i(x) ]

with µ > 0 the penalty multiplier and F the auxiliary function
Penalty Functions for Constrained NLPs

Inequality constraints g_j(x) ≤ 0:   p_j^i(x) = 0 if g_j(x) ≤ 0;  p_j^i(x) > 0 otherwise
Equality constraints h_j(x) = 0:     p_j^e(x) = 0 if h_j(x) = 0;  p_j^e(x) > 0 otherwise

Common choices:

  p_j^i(x) := max{0, g_j(x)}^γ,  γ ≥ 1          p_j^e(x) := |h_j(x)|^γ,  γ ≥ 1

Exact vs. Inexact Penalty Functions
  - If the unconstrained optimum of a penalty model F is feasible in the
    original NLP, it is also optimal in that NLP
  - If the unconstrained optimum of a penalty model F is optimal in the
    original NLP for some finite value of µ, the corresponding penalty
    function is said to be exact
  - If no such finite value of µ exists, it is said to be inexact (yields an
    optimum as µ → ∞ only)

Nonsquared penalty functions, γ = 1:
  - Such penalty functions are exact, under mild assumptions, for µ sufficiently large
  - But the auxiliary function F is nonsmooth even though g_j and h_j may be differentiable

Squared penalty functions, γ = 2:
  - The auxiliary function F is differentiable provided that g_j and h_j are differentiable
  - But such penalty functions are typically inexact
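To make the construction concrete, here is a minimal Python sketch (not part of the original
slides) that assembles the auxiliary function F from callables for f, the g_j and the h_j,
using the common penalty choices above; the exponent gamma switches between the nonsmooth,
exact case (γ = 1) and the smooth, typically inexact case (γ = 2).

```python
def make_auxiliary(f, gs, hs, mu, gamma=2):
    """Auxiliary function F(x) = f(x) + mu*(sum_j max{0, g_j(x)}^gamma + sum_j |h_j(x)|^gamma)."""
    def F(x):
        pen_ineq = sum(max(0.0, g(x))**gamma for g in gs)   # inequality penalties
        pen_eq = sum(abs(h(x))**gamma for h in hs)           # equality penalties
        return f(x) + mu * (pen_ineq + pen_eq)
    return F
```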

Constructing and Solving Penalty Models

Class Exercise: Consider the optimization problem

  min (over x):  f(x) := x
  s.t.           g(x) := 2 − x ≤ 0

  1. Solve this problem by inspection
  2. Construct a penalty model using a square penalty function, then solve
     the unconstrained NLP as a function of the penalty multiplier µ

[Figure: penalty term p(x) and auxiliary function f(x) + µ p(x) versus x,
for µ = 0.5, 1.5 and 5.]

Pros and Cons of Penalty Models

Pros:
  - Straightforward approach
  - Possible use of fast and robust algorithms for unconstrained NLP
    (e.g., BFGS quasi-Newton search)

Cons:
  - Large penalty multipliers lead to ill-conditioned penalty models
      - Subject to slow convergence (small steps)
      - Possible early termination (numerical errors)

In practice: Sequential Unconstrained Penalty Algorithm
  - Considers a sequence of increasing penalty parameters, µ^0 < µ^1 < ...
  - Solves each new optimization problem (µ^{k+1}) from the optimal
    solution obtained for the previous problem (x^k)
  - Produces a sequence of infeasible points, whose limit is an optimal
    solution to the original NLP (exterior penalty function approach)
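One possible numerical working of the class exercise (my own sketch, assuming the squared
penalty p(x) = max{0, 2 − x}^2): the auxiliary function F(x) = x + µ max{0, 2 − x}^2 has its
unconstrained minimizer at x(µ) = 2 − 1/(2µ), which approaches the true solution x* = 2 from
the infeasible side as µ grows, consistent with the squared penalty being inexact.

```python
from scipy.optimize import minimize_scalar

def F(x, mu):
    return x + mu * max(0.0, 2.0 - x)**2   # f(x) = x, squared penalty for g(x) = 2 - x <= 0

for mu in (0.5, 1.5, 5.0, 50.0):
    res = minimize_scalar(lambda x: F(x, mu), bounds=(-5.0, 5.0), method="bounded")
    print(f"mu = {mu:5.1f}   x(mu) = {res.x:.4f}   analytic 2 - 1/(2 mu) = {2 - 1/(2*mu):.4f}")
```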
Sequential Unconstrained Penalty Algorithm

Step 0: Initialization
  - Form penalty model; choose initial guess x^0, penalty multiplier µ^0 > 0,
    escalation factor β > 1, and stopping tolerance ε > 0; set k ← 0

Step 1: Unconstrained Optimization
  - Starting from x^k, solve the penalty optimization problem

      min (over x):  F(x) = f(x) + µ [ Σ_{j=1..m_e} p_j^e(x) + Σ_{j=1..m_i} p_j^i(x) ],

    with µ = µ^k, to produce x^{k+1}

Step 2: Stopping
  - If µ^k [ Σ_{j=1..m_e} p_j^e(x^{k+1}) + Σ_{j=1..m_i} p_j^i(x^{k+1}) ] < ε,
    stop — report x^{k+1} (approximate KKT point)

Step 3: Update
  - Enlarge the penalty parameter as µ^{k+1} ← βµ^k
  - Increment k ← k + 1 and return to step 1

Barrier Methods

Idea: Transform a constrained NLP into an unconstrained NLP

Consider the NLP problem with inequality constraints only

  minimize (over x ∈ IR^n):  f(x)
  subject to:  g_j(x) ≤ 0,  j = 1, ..., m_i

Barrier methods drop constraints and substitute new terms in the objective
function discouraging approach to the boundary of the feasible region:

  minimize (over x ∈ IR^n):  F(x) = f(x) + µ Σ_{j=1..m_i} b_j(x),   with b_j(x) → +∞ as g_j(x) ↗ 0,

with µ > 0 the barrier multiplier and F the auxiliary function
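A compact sketch of how the four steps of the sequential penalty algorithm might look in code
for the earlier one-variable exercise (the starting point, µ^0, β and ε are illustrative
choices, not values from the slides):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]
p = lambda x: max(0.0, 2.0 - x[0])**2        # squared penalty for g(x) = 2 - x <= 0

x, mu, beta, eps = np.array([0.0]), 1.0, 10.0, 1e-8   # Step 0: initialization
for k in range(30):
    F = lambda z: f(z) + mu * p(z)
    x = minimize(F, x, method="BFGS").x      # Step 1: unconstrained subproblem from x^k
    if mu * p(x) < eps:                      # Step 2: stopping test on the weighted penalty
        break
    mu *= beta                               # Step 3: enlarge the penalty parameter
print(k, x)                                  # infeasible iterates approaching x* = 2
```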

Barrier Functions for Inequality Constrained NLPs

Ideal barrier function for g_j(x) ≤ 0:
  b_j(x) = 0 if g_j(x) < 0;  b_j(x) = +∞ otherwise

Common barrier functions:

  b_j(x) := −1/g_j(x)            b_j(x) := −ln(−g_j(x))

Properties of Barrier Functions
  - The optimum of a barrier model can never equal the optimum of the
    original NLP model if µ > 0 and that optimum lies on the boundary
    of the feasible domain
  - However, as µ ց 0, the unconstrained optimum comes closer and
    closer to the constrained solution (as with penalty methods)

Constructing and Solving Barrier Models

Class Exercise: Consider the same optimization problem as previously

  min (over x):  f(x) := x
  s.t.           g(x) := 2 − x ≤ 0

  Construct a barrier model using the inverse barrier function, then
  solve the unconstrained NLP as a function of the barrier multiplier µ

[Figure: barrier term b(x) and auxiliary function f(x) + µ b(x) versus x,
for µ = 1.5, 0.5 and 0.1.]
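One possible working of this exercise (my own sketch, using the inverse barrier
b(x) = −1/g(x)): for strictly feasible x (x > 2), F(x) = x + µ/(x − 2), whose stationary
point is x(µ) = 2 + √µ. That point is feasible for every µ > 0 and tends to x* = 2 as µ ↘ 0.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def F(x, mu):
    return x - mu / (2.0 - x)    # f(x) = x plus inverse barrier -mu/g(x) for g(x) = 2 - x

for mu in (1.5, 0.5, 0.1, 1e-3):
    res = minimize_scalar(lambda x: F(x, mu), bounds=(2.0 + 1e-9, 8.0), method="bounded")
    print(f"mu = {mu:6.3f}   x(mu) = {res.x:.5f}   analytic 2 + sqrt(mu) = {2 + np.sqrt(mu):.5f}")
```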
Pros and Cons of Barrier Models

Pros:
  - Straightforward approach
  - Possible use of fast and robust algorithms for unconstrained NLP
    (e.g., BFGS quasi-Newton search)

Cons:
  - Small barrier multipliers lead to ill-conditioned barrier models
      - Subject to slow convergence (small steps)
      - Possible early termination (numerical errors)

In practice: Sequential Unconstrained Barrier Algorithm
  - Considers a sequence of decreasing, positive barrier parameters, µ^0 > µ^1 > ... > 0
  - Solves each new optimization problem (µ^{k+1}) from the optimal
    solution obtained for the previous problem (x^k)
  - Produces a sequence of feasible points, whose limit is an optimal
    solution to the original NLP (interior point approach)

Sequential Unconstrained Barrier Algorithm

Step 0: Initialization
  - Form barrier model; choose initial guess x^0, barrier multiplier µ^0 > 0,
    reduction factor 0 < β < 1, and stopping tolerance ε > 0; set k ← 0

Step 1: Unconstrained Optimization
  - Starting from x^k, solve the barrier optimization problem

      min (over x):  F(x) = f(x) + µ Σ_{j=1..m_i} b_j(x),

    with µ = µ^k, to produce x^{k+1}

Step 2: Stopping
  - If µ^k Σ_{j=1..m_i} b_j(x^{k+1}) < ε, stop — report x^{k+1} (approximate KKT point)

Step 3: Update
  - Decrease the barrier parameter as µ^{k+1} ← βµ^k
  - Increment k ← k + 1 and return to step 1
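For comparison with the penalty version, a minimal sketch of the barrier loop on the same toy
problem (again with illustrative µ^0, β, ε); note that every iterate stays strictly feasible
and the multiplier is decreased rather than increased.

```python
from scipy.optimize import minimize_scalar

b = lambda x: -1.0 / (2.0 - x)                  # inverse barrier for g(x) = 2 - x <= 0
mu, beta, eps = 1.0, 0.1, 1e-8                  # Step 0 (warm start not needed in 1-d)
for k in range(30):
    res = minimize_scalar(lambda x: x + mu * b(x),
                          bounds=(2.0 + 1e-12, 10.0), method="bounded")
    x = res.x                                   # Step 1: barrier subproblem
    if mu * b(x) < eps:                         # Step 2: stopping test
        break
    mu *= beta                                  # Step 3: reduce the barrier parameter
print(k, x)                                     # feasible iterates approaching x* = 2
```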

Sequential Linear Programming Methods

Idea: Develop a method for constrained NLP based on a sequence of LP approximations

  - Follows the improving-search paradigm:
      - Generate a search direction by formulating, then solving, an LP
        problem at each iteration
      - LP problems can be solved both reliably and efficiently
  - An LP solution is always obtained at a corner/extreme point of the feasible region:
      - A successful approach must consider extra bounds on the direction
        components: a "trust region" ±δ
      - The common approach is to bound the direction components with a
        "box" (or hypercube)

LP-based Search Direction: Principles

Consider the NLP problem

  minimize (over x ∈ IR^n):  f(x)
  subject to:  g_j(x) ≤ 0,  j = 1, ..., m_i
               h_j(x) = 0,  j = 1, ..., m_e

An LP-based search direction ∆x at a given point x̄ is determined by
linearizing the original NLP problem at x̄:

  minimize (over ∆x):  f(x̄) + ∇f(x̄)^T ∆x
  subject to:  g_j(x̄) + ∇g_j(x̄)^T ∆x ≤ 0,  j = 1, ..., m_i
               h_j(x̄) + ∇h_j(x̄)^T ∆x = 0,  j = 1, ..., m_e
               −δ_i ≤ ∆x_i ≤ δ_i,  i = 1, ..., n

Problem: The LP-based search direction could be infeasible!
Constructing and Solving Direction-Finding LP

Class Exercise: Consider the optimization problem

  min (over x):  f(x) := 2x1^2 + 2x2^2 − 2x1x2 − 4x1 − 6x2
  s.t.           g1(x) := 3x1^2 − 2x2 ≤ 0
                 g2(x) := x1 + 2x2 − 7 ≤ 0

  Formulate, then solve, the direction-finding LP at x^0 = (1/2, 1)^T, for δ = 1/2

[Figure: contours of f and the feasible region, with the trust-region box
around x^0 and the resulting direction ∆x.]

LP-based Search Direction: Penalty Approach

Feasibility of the LP-based search direction problem can be enforced by
softening the constraints with penalty terms in the LP objective:

  minimize (over ∆x, y, z±):  f(x̄) + ∇f(x̄)^T ∆x + µ [ Σ_{j=1..m_i} y_j + Σ_{j=1..m_e} (z_j⁺ + z_j⁻) ]
  subject to:  g_j(x̄) + ∇g_j(x̄)^T ∆x ≤ y_j,   y_j ≥ 0,   j = 1, ..., m_i
               h_j(x̄) + ∇h_j(x̄)^T ∆x = z_j⁺ − z_j⁻,   z_j⁺, z_j⁻ ≥ 0,   j = 1, ..., m_e
               −δ_i ≤ ∆x_i ≤ δ_i,   i = 1, ..., n

with µ > 0 a suitable (large enough) penalty multiplier
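A sketch of the direction-finding LP from the class exercise, at x^0 = (1/2, 1)^T with trust
region δ = 1/2, using scipy.optimize.linprog (the constant f(x^0) is dropped from the
objective since it does not change the minimizer). Both constraint linearizations can be
satisfied inside the box here, so no softening slacks are needed; the solver should return
roughly ∆x = (0.5, 0.5), with both box bounds active.

```python
import numpy as np
from scipy.optimize import linprog

x0 = np.array([0.5, 1.0])
grad_f = np.array([4*x0[0] - 2*x0[1] - 4, -2*x0[0] + 4*x0[1] - 6])   # = (-4, -3)

# Linearized constraints  g_j(x0) + grad g_j(x0)^T dx <= 0  rewritten as  A dx <= b
A = np.array([[6*x0[0], -2.0],                 # grad g1(x0)
              [1.0,      2.0]])                # grad g2(x0)
b = np.array([-(3*x0[0]**2 - 2*x0[1]),         # -g1(x0) = 1.25
              -(x0[0] + 2*x0[1] - 7.0)])       # -g2(x0) = 4.5

delta = 0.5
res = linprog(c=grad_f, A_ub=A, b_ub=b, bounds=[(-delta, delta)] * 2)
print(res.x)                                   # LP-based search direction dx
```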

SLP Algorithm — Minimize Problem

Step 0: Initialization
  - Choose initial guess x^0, initial step bound δ^0, penalty multiplier µ > 0,
    scalars 0 < ρ1 < ρ2 < 1 (e.g., ρ1 = 0.25, ρ2 = 0.75), step-bound
    adjustment parameter 0 < β < 1 (e.g., β = 0.5), and stopping
    tolerance ε > 0; set k ← 0

Step 1: LP-based Search Direction
  - Compute gradients ∇f(x^k), ∇g_j(x^k) and ∇h_j(x^k)
  - Solve the direction-finding LP,

      min (over ∆x, y, z±):  L := ∇f(x^k)^T ∆x + µ [ Σ_{j=1..m_i} y_j + Σ_{j=1..m_e} (z_j⁺ + z_j⁻) ]
      s.t.  g_j(x^k) + ∇g_j(x^k)^T ∆x ≤ y_j,   j = 1, ..., m_i
            h_j(x^k) + ∇h_j(x^k)^T ∆x = z_j⁺ − z_j⁻,   j = 1, ..., m_e
            −δ^k ≤ ∆x ≤ δ^k,   y, z± ≥ 0

    to produce ∆x^{k+1} and L^{k+1}

Step 2: Stopping
  - If ||∆x^{k+1}|| < ε, stop — report x^k (approximate KKT point)

Step 3: Step Sizes
  - Compute ∆F^{k+1} = F(x^k + ∆x^{k+1}) − F(x^k), with the merit function

      F(x) = f(x) + µ [ Σ_{j=1..m_i} max{0, g_j(x)} + Σ_{j=1..m_e} |h_j(x)| ]

  - If ∆F^{k+1} > 0 (no improvement), shrink: δ^k ← βδ^k; return to step 1
  - If ∆F^{k+1} > ρ1 L^{k+1} (small improvement), shrink: δ^k ← βδ^k
  - If ∆F^{k+1} < ρ2 L^{k+1} (good improvement), expand: δ^k ← (1/β) δ^k

Step 4: Update
  - Update x^{k+1} = x^k + ∆x^{k+1}; increment k ← k + 1; return to step 1
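A rough Python sketch of this SLP loop for the inequality-constrained exercise problem (no
equality constraints, so only the y slacks are needed); µ, ρ1, ρ2, β and the iteration cap
are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

f      = lambda x: 2*x[0]**2 + 2*x[1]**2 - 2*x[0]*x[1] - 4*x[0] - 6*x[1]
grad_f = lambda x: np.array([4*x[0] - 2*x[1] - 4, -2*x[0] + 4*x[1] - 6])
g      = lambda x: np.array([3*x[0]**2 - 2*x[1], x[0] + 2*x[1] - 7])
jac_g  = lambda x: np.array([[6*x[0], -2.0], [1.0, 2.0]])

mu, rho1, rho2, beta, eps = 10.0, 0.25, 0.75, 0.5, 1e-6
merit = lambda x: f(x) + mu * np.sum(np.maximum(0.0, g(x)))       # exact L1 merit function

x, delta = np.array([0.5, 1.0]), 0.5
for k in range(100):
    # Step 1: direction-finding LP in the variables (dx, y), with slacks y >= 0
    c = np.concatenate([grad_f(x), mu * np.ones(2)])
    A_ub = np.hstack([jac_g(x), -np.eye(2)])                      # g(x) + Jg dx - y <= 0
    res = linprog(c, A_ub=A_ub, b_ub=-g(x),
                  bounds=[(-delta, delta)] * 2 + [(0.0, None)] * 2)
    dx, L = res.x[:2], res.fun                                    # L^{k+1}: predicted change
    if np.linalg.norm(dx) < eps:                                  # Step 2: stopping
        break
    dF = merit(x + dx) - merit(x)                                 # Step 3: actual change
    if dF > 0:
        delta *= beta; continue                                   # no improvement: shrink, retry
    if dF > rho1 * L:
        delta *= beta                                             # small improvement: shrink
    elif dF < rho2 * L:
        delta /= beta                                             # good improvement: expand
    x = x + dx                                                    # Step 4: update iterate
print(k, x)
```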
Pros and Cons of SLP

Pros:
  - Functions well for mostly linear programs
  - Converges quickly if the solution lies on the constraints
  - Can rely on robust and efficient LP codes
  - No need for computing/estimating second-order derivatives

Cons:
  - Poor convergence for highly nonlinear programs
  - Slow convergence to optimal points not at constraints (interior)
  - Not available in general purpose modeling systems (GAMS, AMPL)

But:
  - Used often in some industries (petrochemical)
  - Available in commercial products tailored for specific applications in
    specific industries

Second-Order Methods

Goal: Incorporate second-order information to achieve faster convergence

First, consider NLPs with equality constraints only:

  minimize (over x ∈ IR^n):  f(x)
  subject to:  h_j(x) = 0,  j = 1, ..., m_e

At a regular optimal point x*, there exist Lagrange multipliers λ* such that

  0 = ∇L(x*, λ*) = [ ∇f(x*) − Σ_{j=1..m_e} λ_j* ∇h_j(x*) ;  h(x*) ]

where L(x, λ) = f(x) − Σ_{j=1..m_e} λ_j h_j(x)

Second-Order Methods (cont'd)

Idea: Solve the nonlinear system of (n + m_e) equations using a Newton-like iterative method

Newton's method to find y ∈ IR^n such that F(y) = 0:

  y^{k+1} = y^k − ∇F(y^k)^{-1} F(y^k);   y^0 given

With F := ∇L and y := (x, λ),

  [ ∇²_xx L(x^k, λ^k)   −∇h(x^k)^T ] [ ∆x^{k+1} ]       [ ∇f(x^k) ]
  [ ∇h(x^k)                0       ] [ λ^{k+1}  ]  = −  [ h(x^k)  ]

where ∆x^{k+1} := x^{k+1} − x^k

But, no distinction between local minima and local maxima!

Quadratic Programming

A constrained nonlinear program is a quadratic program, or QP, if its
objective function is quadratic and all its constraints are linear:

  minimize (over x ∈ IR^n):  c^T x + (1/2) x^T Q x
  subject to:  A_i x ≤ b_i
               A_e x = b_e

with Q ∈ IR^{n×n}, c ∈ IR^n, A_i ∈ IR^{m_i×n}, b_i ∈ IR^{m_i}, A_e ∈ IR^{m_e×n}, b_e ∈ IR^{m_e}

  - QPs are [strictly] convex programs provided that the matrix Q in the
    objective function is positive semi-definite [positive definite]
  - Like LPs, powerful and reliable techniques/codes are available to solve
    convex QPs, including very large-scale QPs
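A minimal numpy sketch of one Newton iteration on this system, illustrated on a small
equality-constrained example of my own (min x1^2 + x2^2 s.t. x1 + x2 − 2 = 0, which a single
step solves exactly since the objective is quadratic and the constraint linear):

```python
import numpy as np

def newton_kkt_step(x, lam):
    grad_f = 2.0 * x                               # gradient of f(x) = x1^2 + x2^2
    hess_L = 2.0 * np.eye(2)                       # Hessian of L (lam would enter if h were nonlinear)
    jac_h  = np.array([[1.0, 1.0]])                # Jacobian of h(x) = x1 + x2 - 2
    h      = np.array([x[0] + x[1] - 2.0])
    kkt = np.block([[hess_L, -jac_h.T],            # (n + m_e) x (n + m_e) Newton-KKT matrix
                    [jac_h, np.zeros((1, 1))]])
    sol = np.linalg.solve(kkt, -np.concatenate([grad_f, h]))
    return x + sol[:2], sol[2:]                    # new iterate and multiplier estimate

x_new, lam_new = newton_kkt_step(np.array([0.0, 0.0]), np.array([0.0]))
print(x_new, lam_new)                              # expected: x = (1, 1), lambda = 2
```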
Search Direction: QP-based Approach

Solutions (∆x^{k+1}, λ^{k+1}) to the direction-finding system

  [ ∇²_xx L(x^k, λ^k)   −∇h(x^k)^T ] [ ∆x^{k+1} ]       [ ∇f(x^k) ]
  [ ∇h(x^k)                0       ] [ λ^{k+1}  ]  = −  [ h(x^k)  ]

exactly match stationary points of the Lagrangian of the QP

  minimize (over ∆x):  ∇f(x^k)^T ∆x + (1/2) ∆x^T ∇²_xx L(x^k, λ^k) ∆x
  subject to:  h_j(x^k) + ∇h_j(x^k)^T ∆x = 0,  j = 1, ..., m_e

with λ^{k+1} corresponding to the QP Lagrange multipliers

Solution of this linear system provides: (i) the search direction ∆x^{k+1} at x^k;
(ii) estimates λ^{k+1} of the Lagrange multipliers

Search Direction: Problems with Inequality Constraints

Consider the general NLP:

  minimize (over x ∈ IR^n):  f(x)
  subject to:  g_j(x) ≤ 0,  j = 1, ..., m_i
               h_j(x) = 0,  j = 1, ..., m_e

The search direction ∆x^{k+1} at x^k can be obtained from:

  minimize (over ∆x):  ∇f(x^k)^T ∆x + (1/2) ∆x^T ∇²_xx L(x^k, ν^k, λ^k) ∆x
  subject to:  g_j(x^k) + ∇g_j(x^k)^T ∆x ≤ 0,  j = 1, ..., m_i
               h_j(x^k) + ∇h_j(x^k)^T ∆x = 0,  j = 1, ..., m_e

with L(x, ν, λ) = f(x) − ν^T g(x) − λ^T h(x)

Estimates λ^{k+1}, ν^{k+1} of the Lagrange/KKT multipliers correspond to
the QP Lagrange/KKT multipliers

Constructing and Solving Direction-Finding Problem

Class Exercise: Consider the optimization problem

  min (over x):  f(x) := 2x1^2 + 2x2^2 − 2x1x2 − 4x1 − 6x2
  s.t.           g1(x) := 3x1^2 − 2x2 ≤ 0
                 g2(x) := x1 + 2x2 − 7 ≤ 0

  Formulate, then solve, the direction-finding QP problem at x^0 = (1/2, 1)^T

  min (over ∆x):  [ 4x1^0 − 2x2^0 − 4 ;  −2x1^0 + 4x2^0 − 6 ]^T ∆x
                   + (1/2) ∆x^T [ 4 − 6ν1^0   −2 ;  −2   4 ] ∆x
  s.t.  3(x1^0)^2 − 2x2^0 + [ 6x1^0 ;  −2 ]^T ∆x ≤ 0
        x1^0 + 2x2^0 − 7 + [ 1 ;  2 ]^T ∆x ≤ 0

The QP depends on the KKT multiplier ν1 associated with g1

[Figure: contours and feasible region with the resulting direction ∆x for two
multiplier estimates, ν1^0 = 0, ν2^0 = 0 (left) and ν1^0 = −1, ν2^0 = 0 (right).]
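A sketch of solving this direction-finding QP numerically at x^0 = (1/2, 1)^T, here with the
multiplier estimate ν1^0 = 0 (so the Hessian block is [[4, −2], [−2, 4]]); scipy's SLSQP is
used purely as a convenient solver for this small convex QP with linearized constraints.

```python
import numpy as np
from scipy.optimize import minimize

x0, nu1 = np.array([0.5, 1.0]), 0.0
grad_f = np.array([4*x0[0] - 2*x0[1] - 4, -2*x0[0] + 4*x0[1] - 6])   # = (-4, -3)
H = np.array([[4.0 - 6.0*nu1, -2.0],                                 # Hessian of the Lagrangian
              [-2.0,           4.0]])

qp_obj = lambda dx: grad_f @ dx + 0.5 * dx @ H @ dx
cons = [  # linearized constraints, written as fun(dx) >= 0 as scipy expects
    {"type": "ineq", "fun": lambda dx: -(3*x0[0]**2 - 2*x0[1] + 6*x0[0]*dx[0] - 2*dx[1])},
    {"type": "ineq", "fun": lambda dx: -(x0[0] + 2*x0[1] - 7 + dx[0] + 2*dx[1])},
]
res = minimize(qp_obj, np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)   # QP-based search direction dx at x0
```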
Sequential Quadratic Programming Method

  - Follows the improving-search paradigm
  - Update search direction ∆x^{k+1} repeatedly via the solution of a QP subproblem
  - Linesearch can be performed along a given direction by using a
    suitable merit function that measures progress — Typical choice:

      F(x, µ) = f(x) + µ [ Σ_{j=1..m_i} max{0, g_j(x)} + Σ_{j=1..m_e} |h_j(x)| ]

    with a suitable penalty multiplier µ > 0
  - Possibility to construct an approximation D^k of the second-order
    derivatives ∇²_xx L(x^k, ν^k, λ^k) — e.g., based on a BFGS recursive scheme
      - Positive definiteness of D^k provides robustness
      - Reduces computational effort

SQP Algorithm — Minimize Problem

Step 0: Initialization
  - Choose initial guess x^0, initial multipliers λ^0 and ν^0 ≥ 0, positive
    definite matrix D^0, penalty multiplier µ > 0, and stopping tolerance
    ε > 0; set k ← 0

Step 1: QP-based Search Direction
  - Compute gradients ∇f(x^k), ∇g_j(x^k) and ∇h_j(x^k)
  - Solve the direction-finding QP,

      min (over ∆x):  ∇f(x^k)^T ∆x + (1/2) ∆x^T D^k ∆x
      s.t.  g_j(x^k) + ∇g_j(x^k)^T ∆x ≤ 0,  j = 1, ..., m_i
            h_j(x^k) + ∇h_j(x^k)^T ∆x = 0,  j = 1, ..., m_e

    to produce ∆x^{k+1}, λ^{k+1} and ν^{k+1}

Step 2: Stopping
  - If ||∆x^{k+1}|| < ε, stop — report x^k (approximate KKT point)

SQP Algorithm — Minimize Problem (cont'd)

Step 3: Linesearch
  - Solve the 1-d linesearch problem (at least approximately),
    min_{α≥0} ℓ(α) := F(x^k + α ∆x^{k+1}, µ), to compute the step α^{k+1}

Step 4: Update
  - Iterate: x^{k+1} ← x^k + α^{k+1} ∆x^{k+1}
  - BFGS: D^{k+1} ← D^k + (g g^T)/(g^T d) − (D^k d d^T D^k)/(d^T D^k d),
    with d = x^{k+1} − x^k, g = ∇L(x^{k+1}, ν^{k+1}, λ^{k+1}) − ∇L(x^k, ν^{k+1}, λ^{k+1})
  - Increment k ← k + 1 and return to step 1

SQP is usually much faster and more reliable than first-order methods
  - Analytical derivatives highly recommended for reliability
  - Method of choice for optimization of complex, first-principles models

Available in general purpose modeling systems (GAMS, AMPL)
  - Use within a modeling manager recommended
  - Often need to adjust parameters for good performance (more tuning!)

Used routinely in engineering optimization products
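For reference, the BFGS recursion of Step 4 is a one-liner in numpy (a sketch assuming d and
g have already been formed as above; no safeguard for the case g^T d ≤ 0 is included here).

```python
import numpy as np

def bfgs_update(D, d, g):
    """D_{k+1} = D + g g^T/(g^T d) - (D d)(D d)^T/(d^T D d)."""
    Dd = D @ d
    return D + np.outer(g, g) / (g @ d) - np.outer(Dd, Dd) / (d @ Dd)

# Tiny usage example with made-up step d and gradient difference g
print(bfgs_update(np.eye(2), np.array([0.1, 0.0]), np.array([0.2, 0.05])))
```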
