
B553 Lecture 7: Constrained Optimization, Lagrange Multipliers, and KKT Conditions

Kris Hauser February 2, 2012


Constraints on parameter values are an essential part of many optimization problems, and arise due to a variety of mathematical, physical, and resource limitations. Depending on their complexity, they can require significant work to handle, and constrained optimization algorithms are in general much more complex than their unconstrained counterparts. A constrained optimization is specified as a problem of the form

    min_{x ∈ R^n} f(x)                                (1)
    such that x ∈ S

where S ⊆ R^n denotes the subset of valid parameters, known as the feasible set (Figure 1). S must be a closed set to guarantee the existence of a minimum. Recall how in the univariate case of optimizing a function within some interval [a, b], we had to test the endpoints of the interval as well as the critical points in the interior (a, b) for optimality. In the multivariate constrained setting, the optimizer must not only consider the possibility that the optimum is a local minimum, but also that the optimum lies on the boundary of the feasible set (Figure 2). The challenge is that there are now an infinite number of points on the boundary of S. This lecture will introduce analytical techniques, Lagrange multipliers for equality constraints and the Karush-Kuhn-Tucker (KKT) conditions for inequalities, for identifying those critical points. Besides being analytically useful, these conditions are the starting point for most constrained optimization algorithms. Note that like other critical point tests these are only first-order conditions for optimality, and are therefore necessary but not sufficient for finding minima.

1 Common types of constraints

Several forms of constraints arise in practice. Here are some of the most common ones (Figure 3).

Bound constraints. Axis-aligned bound constraints take the form li ≤ xi ≤ ui for some lower and upper values li and ui, i = 1, . . . , n. These are some of the easiest constraints to incorporate.

Linear inequalities. Linear inequality constraints take the form Ax ≤ b for some m × n matrix A and a length-m vector b. Note that bound constraints are a special case of a linear inequality with

    A = [  I ]        (2)        b = [  u ]           (3)
        [ -I ]                       [ -l ]

where u and l are the vectors of upper and lower bounds, respectively.

Linear equalities. Linear equality constraints take the form Ax = b, where A and b have m rows. Note that this is usually an underdetermined system (otherwise S would consist of either a single point or the empty set). In theory these constraints can easily be removed by finding a representation that incorporates the nullspace of A, say x = x0 + N y, and converting the optimization over x into a smaller optimization over y. However, note that most optimization routines do not operate this way because of numerical errors in computing N.

Nonlinear constraints: general form. In general, constraints may be nonlinear. In this setting we can (usually) write the constraints in the following form:

    gi(x) = 0 for i = 1, . . . , m                    (4)
    hj(x) ≤ 0 for j = 1, . . . , p

where the gi and hj are continuous, differentiable scalar field functions. This is the form that we will assume for the rest of this class, because all prior constraint types are special cases.

Convex constraints. A convex set S satisfies the following property: for any two points x and y in S, the point

    (1 − u)x + uy                                     (5)

for u ∈ [0, 1] lies in S as well. In other words, the line segment between any two points in S must also lie in S. Later we will discuss efficient algorithms for solving problems in which the constraints gi and hj produce a convex feasible set, and the objective function f is also a convex function. In particular, we will show that descent methods converge to a global minimum. (Note that to achieve convexity, any equality constraints must be linear.)

Black-box constraints. Another type of constraint is a black box that can be queried to test whether a point x lies inside it. No other mathematical property, like the magnitude of feasibility violation, derivatives, or even smoothness, is necessarily provided. These constraints typically arise as a result of rather complex procedures (e.g., simulations, geometric algorithms, etc.) that do not have a convenient mathematical representation. These constraints are rarely considered in the numerical optimization literature, but often come up in large practical systems.
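As a small sketch (assuming NumPy is available; the helper name bounds_to_linear is invented for illustration), we can check numerically both the encoding (2)-(3) of bound constraints as Ax ≤ b and the convexity property (5) of the resulting feasible set:

```python
import numpy as np

def bounds_to_linear(l, u):
    """Encode l <= x <= u as A x <= b with A = [I; -I], b = [u; -l],
    as in equations (2)-(3)."""
    n = len(l)
    A = np.vstack([np.eye(n), -np.eye(n)])
    b = np.concatenate([u, -l])
    return A, b

l = np.array([-1.0, 0.0])
u = np.array([2.0, 3.0])
A, b = bounds_to_linear(l, u)

def feasible(x):
    return bool(np.all(A @ x <= b))

print(feasible(np.array([0.5, 1.0])))   # True: inside the box
print(feasible(np.array([2.5, 1.0])))   # False: violates x1 <= 2

# Spot-check definition (5): convex combinations of feasible
# points must stay feasible, since a box is a convex set.
rng = np.random.default_rng(0)
x, y = np.array([-0.5, 0.5]), np.array([2.0, 3.0])  # both feasible
assert all(feasible((1 - t) * x + t * y) for t in rng.uniform(0, 1, 100))
print("convex combinations stayed feasible")
```

The sampling only spot-checks convexity, of course; the definition quantifies over all pairs of points and all u ∈ [0, 1].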

2 First-Order Conditions of Local Optimality

We say that a feasible point x is a local minimum of the optimization problem (1) if f(x) is lower than the value of f at any other feasible point in some neighborhood of x. That is, x is a local minimum if x ∈ S and there exists a neighborhood of radius ε > 0 so that f(x) < f(y) for any y in {y ∈ S | 0 < d(x, y) < ε}. Unfortunately not all local minima are critical points of f, because we have to take into account how the constraints affect the neighborhood! We will show that there are alternative criteria that we can use to generate candidates for local minima.
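A one-dimensional toy illustration of this point (the function and interval are made up for illustration): for f(x) = x on S = [1, 2], the constrained minimum sits at the boundary point x = 1 even though the derivative never vanishes there.

```python
import numpy as np

# Toy problem: minimize f(x) = x over the closed interval S = [1, 2].
f = lambda x: x
df = lambda x: 1.0  # derivative of f; nonzero everywhere

xs = np.linspace(1.0, 2.0, 1001)   # dense grid over S
x_star = xs[np.argmin(f(xs))]
print(x_star)      # 1.0 -- the minimum lies on the boundary of S
print(df(x_star))  # 1.0 -- yet x_star is not a critical point of f
```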

2.1 Lagrange Multipliers

Let us suppose for the moment that there are no inequality constraints, and instead that we are addressing the general equality-constrained problem

    min_{x ∈ R^n} f(x)                                (6)
    such that gi(x) = 0 for i = 1, . . . , m.

We will assume that both f and the gi are differentiable.

With One Constraint. First let us consider the m = 1 case. The principle of Lagrange multipliers states that any local minimum or maximum x of (6) must simultaneously satisfy the following equations:

    ∇f(x) + λ ∇g1(x) = 0                              (7)
    g1(x) = 0

for some value of λ. The variable λ is known as the Lagrange multiplier. These equations are saying that at x, the gradient direction of f is a multiple of the gradient direction of g1, which is to say that they are parallel (Figure 4).

You might visualize this as follows. Imagine yourself standing at x, which satisfies g1(x) = 0. Any direction v that you can move in to instantaneously change the value of f will have a nonzero dot product with ∇f due to the properties of the directional derivative. The constraint g1, however, will stop you from moving in any direction unless it maintains g1(x) = 0, which is equivalent to saying that the dot product of v with ∇g1(x) must be zero. If ∇g1(x) is not a multiple of ∇f(x), then you can slide along the level set g1(x) = 0 in a direction v that has a nonzero dot product with ∇f(x) (Figure 5). In other words, x is not a minimum. On the other hand, if ∇g1(x) is a multiple of ∇f(x), then there is no such direction to move in, because any valid sliding direction will not change the value of f. In other words, the constraint g1 cancels out any kind of change that you could make in the value of f. It is important to note that there may be multiple points x that satisfy (7), each of which has a different Lagrange multiplier λ.

With Many Constraints. The following condition generalizes Lagrange multipliers to multiple constraints:

    ∇f(x) + λ1 ∇g1(x) + · · · + λm ∇gm(x) = 0
    g1(x) = 0
    . . .                                             (8)
    gm(x) = 0

where λ1, . . . , λm are the Lagrange multipliers. This equation is saying that at x, ∇f(x) ∈ Span({∇g1(x), . . . , ∇gm(x)}). The reason why this makes sense is that each of the constraints resists motion in the direction of its gradient. If ∇f lies in this span, then a motion in any direction that locally changes f will be completely nullified by the constraints. All local minima must satisfy (8). Conversely, if the two equations of (8) are satisfied then x must be a local minimum, maximum, or a sort of saddle point restricted to S. So this is a necessary, but not sufficient, condition for optimality.

Example. Suppose we wanted to find the closest points (x1, y1) and (x2, y2) on two unit circles, one centered at the origin and the other centered at (cx, cy). The optimization variable is x = (x1, y1, x2, y2) and the constrained minimization problem is

    min f(x) = (x1 − x2)^2 + (y1 − y2)^2
    such that g1(x) = x1^2 + y1^2 − 1 = 0             (9)
              g2(x) = (x2 − cx)^2 + (y2 − cy)^2 − 1 = 0

The method of Lagrange multipliers states that we need to find a variable x that satisfies the constraints, and multipliers λ1 and λ2, such that

    ∇f(x) + λ1 ∇g1(x) + λ2 ∇g2(x) = 0.                (10)

We can compute the following gradients:

    ∇f(x) = (2(x1 − x2), 2(y1 − y2), −2(x1 − x2), −2(y1 − y2)),    (11)
    ∇g1(x) = (2x1, 2y1, 0, 0),                        (12)
    ∇g2(x) = (0, 0, 2(x2 − cx), 2(y2 − cy)).          (13)

Putting these together, we have the two simultaneous sets of equations

    x1 − x2 + λ1 x1 = 0                               (14)
    y1 − y2 + λ1 y1 = 0

and

    x1 − x2 − λ2 (x2 − cx) = 0                        (15)
    y1 − y2 − λ2 (y2 − cy) = 0.

In other words, the vectors (x1 − x2, y1 − y2), (x1, y1), and (x2 − cx, y2 − cy) must all be parallel. With some rearrangement, it also means that (x1, y1) and (x2, y2) must be parallel to (cx, cy). Verify geometrically that all points on the circles that intersect the line through the origin and (cx, cy) are either local minima, local maxima, or saddle points of the squared distance function.

Interpreting Lagrange Multipliers. In some applications, like physics and economics, Lagrange multipliers have a meaningful interpretation. Consider the m = 1 case, and interpret the constraint as stating g1(x) = c with c = 0. The Lagrange multiplier λ at a (global) minimum x states how fast the minimum value of f would change if I were to relax the constraint by raising c at a constant rate; this rate of change is given by λ (Figure 6). For example, in constrained physical simulation the Lagrange multipliers produce the forces required to maintain each constraint.

Using Lagrange Multipliers in numerical optimization. If we define the following Lagrangian function on n + m variables
    L(x, λ1, . . . , λm) = f(x) + Σ_{i=1}^m λi gi(x),     (16)

then the constrained optimization problem can be cast as one of finding the critical points of L in R^{n+m}. More compactly, if we let λ = (λ1, . . . , λm), note that we would like to find a point (x, λ) such that

    ∇L(x, λ) = ( ∇x L(x, λ), ∇λ L(x, λ) )
              = ( ∇f(x) + Σ_{i=1}^m λi ∇gi(x),  g1(x), . . . , gm(x) )     (17)

equals zero. The importance of this is that we have converted a constrained optimization into an unconstrained root-finding problem! There do exist Newton-like techniques for solving multivariate root-finding problems. If f and the gi's are twice differentiable, we can use the iterative method

    (x_{t+1}, λ_{t+1}) = (x_t, λ_t) − ∇²L(x_t, λ_t)^{−1} ∇L(x_t, λ_t).     (18)

The Hessian of the Lagrangian is given by the following block matrix:

    ∇²L(x, λ) = [ H     G ]     with H = ∇²f(x) + Σ_{i=1}^m λi ∇²gi(x)     (19)
                [ G^T   0 ]     and  G = [∇g1(x) · · · ∇gm(x)].
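A minimal numerical sketch of this root-finding view (assuming NumPy and SciPy are available), applied to the two-circle example above with centers (0, 0) and (cx, cy) = (3, 0); the center values are chosen arbitrarily for illustration. For simplicity it uses SciPy's generic fsolve root-finder in place of the explicit Newton iteration (18):

```python
import numpy as np
from scipy.optimize import fsolve

# Solve grad L = 0 in the variables (x1, y1, x2, y2, lam1, lam2)
# for the two-circle example, equations (14)-(15) plus g1 = g2 = 0.
cx, cy = 3.0, 0.0  # center of the second circle (illustration values)

def grad_L(v):
    x1, y1, x2, y2, l1, l2 = v
    return [
        2*(x1 - x2) + 2*l1*x1,            # dL/dx1
        2*(y1 - y2) + 2*l1*y1,            # dL/dy1
        -2*(x1 - x2) + 2*l2*(x2 - cx),    # dL/dx2
        -2*(y1 - y2) + 2*l2*(y2 - cy),    # dL/dy2
        x1**2 + y1**2 - 1,                # g1(x) = 0
        (x2 - cx)**2 + (y2 - cy)**2 - 1,  # g2(x) = 0
    ]

v0 = [0.9, 0.1, 2.1, 0.1, 1.0, 1.0]  # rough initial guess
sol = fsolve(grad_L, v0)
print(sol[:4])  # close to the geometric answer (1, 0, 2, 0)
```

Note that, as the lecture warns, a root of ∇L is only a critical point; which critical point the solver converges to depends on the initial guess.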

2.2 Karush-Kuhn-Tucker Conditions

The KKT conditions extend the ideas of Lagrange multipliers to handle inequality constraints in addition to equality constraints. These conditions provide a first-order optimality condition for the problem:

    min_{x ∈ R^n} f(x)                                (20)
    such that gi(x) = 0 for i = 1, . . . , m
              hj(x) ≤ 0 for j = 1, . . . , p

where f and all the gi's and hj's are differentiable.

With one inequality. Let us start by assuming m = 0 and p = 1. The peculiar thing about inequalities is that they operate in essentially two regimes depending on whether they affect a critical point or not (Figure 7). If x is a local minimum of f(x) such that h1(x) < 0, then the constraint is satisfied for a neighborhood around x, and x is a local minimum of the constrained problem. On the other hand, there could be local minima at the boundary of the feasible set S, which consists of those points that satisfy h1(x) = 0. To find these critical points, we can treat h1 like an equality constraint and use the method of Lagrange multipliers. So, we must be aware of the following two cases:

1. ∇f(x) = 0 and h1(x) < 0.

2. h1(x) = 0 and there exists a Lagrange multiplier μ such that ∇f(x) + μ ∇h1(x) = 0.

A compact way of writing these two conditions, which will be very useful in a moment, is through the following set of equalities and inequalities:

    ∇f(x) + μ ∇h1(x) = 0
    h1(x) ≤ 0                                         (21)
    μ h1(x) = 0

in which the term μ h1(x) = 0 is known as the complementarity condition, which enforces either μ to be zero or h1(x) to be zero. If we are only interested in finding local minima, we can also include the constraint μ ≥ 0.

With many inequalities. To generalize this argument to p > 1, consider that each of the two cases outlined above can hold for each of the inequalities. So, we may potentially need to enumerate all partitions of the inequalities into those that are strictly satisfied and those that are met with equality, and find critical points for each subset. But there are 2^p possible subsets (Figure 8)! To express this condition, we can use the compact form as follows:

    ∇f(x) + μ1 ∇h1(x) + · · · + μp ∇hp(x) = 0
    hj(x) ≤ 0 for j = 1, . . . , p                    (22)
    μj hj(x) = 0 for j = 1, . . . , p

where μ1, . . . , μp are the KKT multipliers. For those critical points with hj(x) = 0, we say the inequality is active at x. If hj(x) < 0, then we say the inequality is inactive. Some of the first numerical methods that we present in this class perform combinatorial search through the possible subsets of active constraints.

General form. Equalities can be incorporated in a straightforward manner into the above equation, giving us the full set of KKT conditions:

    ∇f(x) + Σ_{i=1}^m λi ∇gi(x) + Σ_{j=1}^p μj ∇hj(x) = 0
    gi(x) = 0 for i = 1, . . . , m
    hj(x) ≤ 0 for j = 1, . . . , p                    (23)
    μj hj(x) = 0 for j = 1, . . . , p

where λ1, . . . , λm are the Lagrange multipliers and μ1, . . . , μp are the KKT multipliers. Note that the complementarity condition only needs to be satisfied on the inequalities.

Use of KKT conditions in analytical optimization. The KKT conditions can be used to analytically prove that a point is an optimum of a constrained problem. One drawback is that there are a combinatorial number of subsets of active inequalities, and in the absence of further information all of these subsets must be considered as candidates for generating the optimal critical point!

Use of KKT conditions in numerical optimization. Unfortunately, we are not able to use the KKT conditions to formulate an unconstrained root-finding problem like we did in the case of Lagrange multipliers. The reason is that the inequality constraints hj(x) ≤ 0 must be preserved, and there is no natural way to handle them in the root-finding methods we have observed so far. Instead, in most optimization software the KKT conditions are usually used as a first stage of verifying that a candidate point found by some algorithm is truly a critical point.
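The combinatorial active-set idea can be sketched on a toy instance (assuming NumPy is available; the problem data c and u are made up for illustration): minimize ||x − c||² subject to x ≤ u, enumerating all 2^p subsets of active inequalities and keeping those that pass the KKT tests (23) with μ ≥ 0.

```python
import itertools
import numpy as np

# Brute-force active-set search for: min ||x - c||^2 s.t. x <= u,
# i.e. m = 0 equalities and p = n inequalities h_j(x) = x_j - u_j <= 0.
# For each of the 2^p subsets of active constraints we solve the
# stationarity condition in closed form and test the KKT conditions.
c = np.array([2.0, 3.0])
u = np.array([1.0, 1.0])
p = len(c)

candidates = []
for active in itertools.product([False, True], repeat=p):
    # x_j = u_j on active constraints, else the unconstrained optimum c_j.
    x = np.where(active, u, c)
    # Stationarity 2(x_j - c_j) + mu_j = 0 gives mu_j on active constraints.
    mu = np.where(active, 2 * (c - u), 0.0)
    feasible = np.all(x <= u + 1e-12)            # h_j(x) <= 0
    complementarity = np.all(mu * (x - u) == 0)  # mu_j h_j(x) = 0
    if feasible and complementarity and np.all(mu >= 0):
        candidates.append((x, mu))

x, mu = candidates[0]
print(x)   # [1. 1.]
print(mu)  # [2. 4.]
```

Here only the all-active subset survives the tests, recovering the clamped point (1, 1); the positive multipliers indicate that both inequalities are pushing back against the objective. This enumeration is exponential in p, which is exactly why practical active-set methods search the subsets more cleverly.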

Exercises
1. The entropy of a discrete probability distribution (p1, . . . , pn) over n values is given by E(p1, . . . , pn) = −Σ_{i=1}^n pi ln pi. Of course, the probabilities must sum to 1. Find the probability distribution that maximizes entropy using Lagrange multipliers.

2. Find a simple way to compute the solution to the n-dimensional constrained optimization min ||x − c||^2 such that l ≤ x ≤ u, where l and u are bound constraints.

3. Write the KKT conditions for finding the closest point in a 2D triangle with vertices a, b, c (boundary inclusive) to the origin. Assume a, b, c are given in counterclockwise order. What is the significance of the KKT multipliers? What does it mean if none of them are nonzero? One? Two? More?

