A. Banerji
July 26, 2015
Chapter 1
Introduction
1.1
Some Examples
We briefly introduce our framework for optimization, and then discuss some
preliminary concepts and results that we'll need to analyze specific problems.
Our optimization examples can all be couched in the following general
framework:
Suppose V is a vector space and S ⊆ V. Suppose F : V → ℝ. We wish to
find x* ∈ S s.t. F(x*) ≥ F(x) for all x ∈ S, or x_* ∈ S s.t. F(x_*) ≤ F(x) for all x ∈ S.
x* and x_* are respectively called a maximum and a minimum of F on S.
In different applications, V can be finite- or infinite-dimensional. The
latter need more sophisticated optimization tools such as optimal control; we
will keep that sort of stuff in abeyance for now. In our applications, F will
be continuous, and pretty much also differentiable; often twice continuously
differentiable. S will be specified most often using constraints.
Example 1 Let U : ℝᵏ₊ → ℝ be a utility function, and p1, ..., pk, I be positive
prices and wealth. Maximize U s.t. xi ≥ 0, i = 1, ..., k, and ∑_{i=1}^k pi xi ≡
p·x ≤ I.
Here, the objective function is U, and the constraints define the budget set.
(A second example, with V infinite-dimensional, maximizes a discounted utility
integral ∫_0^T e^{−ρt} u(c(t)) dt subject to constraints.)
1.1.1
F(ω) = {(x1, x2) | x1, x2 ≥ 0, x1 + x2 ≤ ω}
the feasible set or set of feasible allocations.
An allocation (y1, y2) Pareto dominates (x1, x2) if
ui(yi) ≥ ui(xi), i = 1, 2, with strict inequality for some i.
An allocation (x1 , x2 ) is Pareto optimal if there is no feasible allocation
that Pareto dominates it.
Let a ∈ (0, 1) and consider the social welfare function U(x1, x2, a) ≡
a·u1(x1) + (1 − a)·u2(x2). Then if (z1, z2) is any allocation that solves
Max U(x1, x2, a) s.t. (x1, x2) ∈ F(ω), it is a Pareto optimal allocation. For, if (z1, z2) is in this set of solutions but is not Pareto optimal, then there is a feasible allocation (y1, y2) s.t.
u1(y1) ≥ u1(z1), u2(y2) ≥ u2(z2), with strict inequality for at least one of these. Multiplying the first inequality by a, the second by (1 − a), and adding, we get
U(y1, y2, a) > U(z1, z2, a), contradicting that (z1, z2) is a maximizer.
If we assume that the utility functions ui(xi) are concave, then the converse
holds: every Pareto optimal allocation is a solution to this problem for some weight a.
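The welfare-maximization argument above can be sanity-checked numerically. The snippet below is my own illustration, not from the notes: it assumes u1 = u2 = √ (concave), endowment ω = 1, and weight a = 1/4, maximizes welfare over a grid of feasible allocations, and confirms that no feasible grid allocation Pareto dominates the welfare maximizer.

```python
# Check that a welfare maximizer is Pareto optimal on a grid of allocations.
# Illustrative assumptions (mine): u1 = u2 = sqrt, endowment w = 1, a = 0.25.
import math

def feasible(step=0.01, w=1.0):
    """All grid allocations (x1, x2) with x1, x2 >= 0 and x1 + x2 <= w."""
    n = int(w / step)
    return [(i * step, j * step)
            for i in range(n + 1) for j in range(n + 1)
            if i * step + j * step <= w + 1e-9]

u1 = u2 = math.sqrt
a = 0.25

allocs = feasible()
# Welfare maximizer on the grid (FOC predicts x1 = 0.1, x2 = 0.9 here).
z1, z2 = max(allocs, key=lambda p: a * u1(p[0]) + (1 - a) * u2(p[1]))

# No feasible allocation should Pareto dominate (z1, z2).
dominators = [(y1, y2) for (y1, y2) in allocs
              if u1(y1) >= u1(z1) and u2(y2) >= u2(z2)
              and (u1(y1) > u1(z1) or u2(y2) > u2(z2))]
print(z1, z2, len(dominators))
```

The empty dominator list mirrors the proof: a dominating allocation would need y1 ≥ z1 and y2 ≥ z2 with one strict, which is infeasible when (z1, z2) exhausts the endowment.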
1.2
We will now discuss some concepts that we will need, such as the compactness
of the set S above, and the continuity and differentiability of the objective
function F. We will work in normed linear spaces. In the absence of any
other specification, the space we will be in is ℝⁿ with the Euclidean norm
||x|| = (∑_{i=1}^n xi²)^{1/2}. (There's a bunch of other norms that would work
equally well. Recall that a norm in ℝⁿ is defined to be a function assigning
to each vector x a non-negative real number ||x||, s.t. (i) for all x, ||x|| ≥ 0,
with ||x|| = 0 iff x = 0 (0 being the zero vector); (ii) if c ∈ ℝ, ||cx|| = |c| ||x||;
(iii) ||x + y|| ≤ ||x|| + ||y||. The last requirement, the triangle inequality,
follows for the Euclidean norm from the Cauchy-Schwarz inequality.)
One example in the previous section used another normed linear space,
namely the space of bounded continuous functions defined on an interval
of real numbers, with the sup norm. But in further work in this part of
the course, we will stick to using finite dimensional spaces. Some of the
concepts below apply to both finite and infinite dimensional spaces, so we
will sometimes call the underlying space V . But mostly, it will help to think
of V as simply ℝⁿ, and to visualize stuff in ℝ².
We will measure distance between vectors using ||x − y|| = (∑_{i=1}^n (xi − yi)²)^{1/2}.
This is our intuitive notion of distance, using Pythagoras' theorem. Furthermore, it satisfies the three properties of a metric, viz., (i) ||x − y|| ≥ 0, with
equality iff x = y; (ii) ||x − y|| = ||y − x||; (iii) ||x − z|| ≤ ||x − y|| + ||y − z||.
Note that property (iii) for the metric follows from the triangle
inequality for the norm, since ||x − z|| = ||(x − y) + (y − z)|| ≤ ||x − y|| + ||y − z||.
Open and Closed Sets
Let ε > 0 and x ∈ V. The open ball centered at x with radius ε is defined
as
B(x, ε) = {y : ||x − y|| < ε}
We see that if V = ℝ, B(x, ε) is the open interval (x − ε, x + ε). If
V = ℝ², it is an open disk centered at x. The boundary of the disk is traced
out using Pythagoras' theorem.
Exercise 1 Show that ||x − y|| defined by max{|x1 − y1|, . . . , |xn − yn|}, for
all x, y ∈ ℝⁿ, is a metric (i.e. satisfies the three requirements of a metric). In
the space ℝ², sketch B(0, 1), the open ball centered at 0, the origin, of radius
1, in this metric.
Let S ⊆ V. x is an interior point of S if B(x, ε) ⊆ S, for some ε > 0. S
is an open set if all points of S are interior points. On the other hand, S is
a closed set iff Sᶜ is an open set.
Example. Open in ℝ vs. open in ℝ².
There is an alternative, equivalent, convenient way to define closed sets.
x is an adherent point of S, or adheres to S, if every B(x, ε) contains a point
of S. S is closed iff it contains all its adherent points.
Theorem 2 (x^k) → x in ℝⁿ iff for every i ∈ {1, . . . , n}, the coordinate
sequence (x_i^k) → x_i.
Proof. Since
(x_i^k − x_i)² ≤ ∑_{j=1}^n (x_j^k − x_j)²,
taking square roots implies |x_i^k − x_i| ≤ ||x^k − x||; so for every k ≥ N s.t.
||x^k − x|| < ε, |x_i^k − x_i| < ε.
Conversely, if all the coordinate sequences converge to the coordinates
of the point x, then there exists a positive integer N s.t. k ≥ N implies
|x_i^k − x_i| < ε/√n for every coordinate i. Squaring, adding across all i and
taking square roots, we have ||x^k − x|| < ε.
Several convergence results that appear to be true are in fact so. For
instance, (x^k) → x, (y^k) → y implies (x^k + y^k) → (x + y). Indeed, there
exists N s.t. k ≥ N implies ||x^k − x|| < ε/2 and ||y^k − y|| < ε/2. So
||(x^k + y^k) − (x + y)|| = ||(x^k − x) + (y^k − y)|| ≤ ||x^k − x|| + ||y^k − y|| (by the
triangle inequality), and this is less than ε/2 + ε/2 = ε.
Exercise 3 Let (ak ) and (bk ) be sequences of real numbers that converge to a
and b respectively. Then the product sequence (ak bk ) converges to the product
ab.
Closed sets can be characterized in terms of convergent sequences as follows.
Lemma 2 A set S is closed if and only if for every sequence (x^k) lying in
S, x^k → x implies x ∈ S.
Proof. Suppose S is closed. Take any sequence (x^k) in S that converges to a
point x. Then every B(x, ε) contains a member x^k of the sequence.
So, x adheres to S. Since S is closed, it must contain this adherent point x.
Conversely, suppose the set S has the property that whenever (x^k) ⊆ S
converges to x, x ∈ S. Take a point y that adheres to S. Take the successively
smaller open balls B(y, 1/k), k = 1, 2, 3, .... We can find, in each such open
ball, a point y^k from the set S (since y adheres to S). These points need not
all be distinct, but since the open balls have radii converging to 0, y^k → y.
Thus by the convergence property of S, y ∈ S. So, any adherent point y of
S actually belongs to S.
Related Results
1. If (a_k) is a sequence of real numbers all greater than or equal to 0,
and a_k → a, then a ≥ 0. The reason is that for all k, a_k ∈ [0, ∞), which is a
closed set and hence must contain the limit a.
2. Sup and Inf.
Let S ⊆ ℝ. u is an upper bound of S if u ≥ a for every a ∈ S. s is the
supremum or least upper bound of S (called sup S) if s is an upper bound
of S, and s ≤ u for every upper bound u of S.
We say that a set S of real numbers is bounded above if there exists an
upper bound, i.e. a real number M s.t. a ≤ M, ∀a ∈ S. The most important
property of a supremum, which we'll by and large take here as given, is the
following:
Completeness Property of Real Numbers: Every non-empty set S of real numbers that is bounded above has a supremum.
For a short discussion of this property, see the Appendix.
Note that sup S may or may not belong to S.
Examples. S = (0, 1), D = [0, 1], K = set of all numbers in the sequence
1 − 1/2ⁿ, n = 1, 2, 3, .... The supremum of all these sets is 1, and this does not
belong to S or to K.
When sup S belongs to S, it is called the maximum of S, for obvious
reasons. Another important property of suprema is the following.
Lemma 3 For every ε > 0, there exists a number a ∈ S s.t. a > sup S − ε.
Note that this means that sup S is an adherent point of S.
Proof. Suppose that for some ε > 0, there is no number a ∈ S s.t. a >
sup S − ε. So, every a ∈ S must then satisfy a ≤ sup S − ε. But then,
sup S − ε is an upper bound of S that is less than sup S. This implies that
sup S is not in fact the supremum of S. Contradiction.
Digression.
The next few results, up to and including Cantor's intersection theorem,
explore another bunch of ideas that begins with the idea of a supremum.
This digression is meant for those interested. Otherwise, please skip to the
discussion about Compact Sets.
Theorem 5 Every bounded and increasing sequence of real numbers converges (to its supremum).
Proof. Let a ≡ sup(x_n). Take any ε > 0. By the above discussion, there
exists some x_N ∈ (a − ε, a]. And since (x_n) is an increasing sequence, we
have that for all k ≥ N, x_k ∈ (a − ε, a]. So (x_n) → a.
A similar conclusion holds for decreasing bounded sequences. And:
Theorem 6 Every sequence of real numbers has a monotone subsequence.
Proof. For the sequence (x_k), let A_n = {x_k | k ≥ n}, n = 1, 2, ....
If any one of these sets A_n does not have a maximum, we can pull out an
increasing subsequence. For instance, suppose A_1 does not have a max. Then
let x_{k1} = x_1. Let x_{k2} be the first member of the sequence (x_k) that is greater
than x_{k1}, and so on.
On the other hand, if all the A_n have maxes, then we can pull out a decreasing
subsequence. Let x_{k1} = max A_1, x_{k2} = max A_{k1+1}, x_{k3} = max A_{k2+1},
and so on.
It follows from the above two theorems that we have
Theorem 7 Bolzano-Weierstrass Theorem.
Every bounded sequence of real numbers has a convergent subsequence.
Finally, as an application to the ideas of monotone sequences, we have
Theorem 8 Cantor's Nested Intervals Theorem.
If [a1, b1] ⊇ [a2, b2] ⊇ . . . is a nested sequence of closed intervals, then
the intersection ∩_{n=1}^∞ [a_n, b_n] is non-empty.
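The "all A_n have maxes" branch in the proof of Theorem 6 is effectively an algorithm. The sketch below (my own, for finite lists, where every tail does have a max) runs that construction: repeatedly take the maximum of the remaining tail, producing a non-increasing subsequence.

```python
# Construct a non-increasing subsequence as in the proof of Theorem 6:
# x_{k1} = max A_1, x_{k2} = max A_{k1+1}, and so on. For a finite list
# every tail A_n has a max, so the decreasing branch always applies.
def decreasing_subsequence(xs):
    """Return a non-increasing subsequence of xs built greedily from tail maxima."""
    out, start = [], 0
    while start < len(xs):
        tail = xs[start:]
        m = max(tail)
        idx = start + tail.index(m)  # first position achieving the tail max
        out.append(m)
        start = idx + 1              # continue strictly after that position
    return out

seq = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
sub = decreasing_subsequence(seq)
print(sub)
```

Each selected index is strictly larger than the previous one, so the result really is a subsequence; monotonicity follows because each later tail max cannot exceed the earlier one.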
Compact Sets.
Suppose (x_n) is a sequence in V. (Note the change in notation, from
superscript to subscript. This is just by the way; most places have this
subscript notation, but Rangarajan Sundaram at times has the superscript
notation in order to leave subscripts to denote co-ordinates of a vector.)
Let m(k) be an increasing function from the natural numbers to the
natural numbers. So, l > n implies m(l) > m(n). A subsequence (x_{m(k)}) of
(x_n) is an infinite sequence whose kth member is the m(k)th member of the
original sequence.
Give an example. The idea is that to get a subsequence from (x_n), you
strike out some members, keeping the remaining members' positions the
same.
Fact. If a sequence (x_n) converges to x, then all its subsequences converge
to x.
Proof. Take an arbitrary ε > 0. There exists N s.t. n ≥ N implies
||x_n − x|| < ε. Since m(k) ≥ k, this implies, for any subsequence (x_{m(k)}), that
k ≥ N implies ||x_{m(k)} − x|| < ε.
However, if a sequence does not converge anywhere, it can still have (lots
of) subsequences that converge. For example, let (x_n) ≡ ((−1)ⁿ), n = 1, 2, ....
Then, (x_n) does not converge; but the subsequences (y_m) = −1, −1, −1, ....
and (z_m) = 1, 1, 1, ... both converge, to different limits. (Such points are
called limit points of the mother sequence (x_n).)
Compact sets have a property related to this fact.
Definition 2 A set S ⊆ V is compact if every sequence (x_n) in S has a
subsequence that converges to a point in S.
Chapter 2
Existence of Optima
2.1
Weierstrass Theorem
Chapter 3
Unconstrained Optima
3.1
Preliminaries
Recall that f : ℝ → ℝ is differentiable at x if there is a number a s.t.
lim_{y→x} [f(y) − f(x) − a(y − x)] / (y − x) = 0    (1)
By limit equal to 0 as y → x, we require that the limit be 0 w.r.t. all
sequences (y_n) s.t. y_n → x. a turns out to be the unique number equal to
the slope of the tangent to the graph of f at the point x. We denote a by
the notation f′(x). We can rewrite Equation (1) as follows:
lim_{y→x} |f(y) − f(x) − a(y − x)| / |y − x| = 0    (2)
Note that this means the numerator tends to zero faster than does the
denominator.
We can use this way of defining differentiability for more general functions.
A function f : ℝⁿ → ℝᵐ is differentiable at x if there is an m × n matrix A s.t.
lim_{y→x} ||f(y) − f(x) − A(y − x)|| / ||y − x|| = 0
In the one variable case, the existence of a gives the existence of a tangent;
in the more general case, the existence of the matrix A gives the existence of
tangents to the graphs of the m component functions f = (f1, ..., fm), each
of those functions being from ℝⁿ to ℝ. In other words this definition has
to do with the best linear affine approximation to f at the point x. To see
this in a way equivalent to the above definition, put h = y − x in the above
definition, so y = x + h. Then in the 1-variable case, from the numerator,
f(x + h) is approximated by the affine function f(x) + ah = f(x) + f′(x)h. In
the general case, f(x + h) is approximated by the affine function f(x) + Ah.
It can be shown that (w.r.t. the standard bases in ℝⁿ and ℝᵐ), the matrix
A equals Df(x), the m × n matrix of partial derivatives of f evaluated at the
point x. To see this, take the slightly less general case of a function f : ℝⁿ →
ℝ. If f is differentiable at x, there exists a 1 × n matrix A = (a11, . . . , a1n)
satisfying the definition above: i.e.
lim_{h→0} |f(x + h) − f(x) − Ah| / ||h|| = 0
In particular, the above must hold if we choose h = (0, .., 0, t, 0, .., 0) with
hj = t → 0. That is,
lim_{t→0} |f(x1, .., xj + t, .., xn) − f(x1, .., xj, .., xn) − a1j·t| / |t| = 0
But from the limit on the LHS, we know that a1j must equal the partial
derivative ∂f(x)/∂xj.
We refer to Df(x) as the derivative of f at x; Df itself is a function mapping each x ∈ ℝⁿ to an m × n matrix.
Df(x) =
[ ∂f1(x)/∂x1 . . . ∂f1(x)/∂xn ]
[ . . .                       ]
[ ∂fm(x)/∂x1 . . . ∂fm(x)/∂xn ]
Here,
∂fi(x)/∂xj = lim_{t→0} [fi(x1, .., xj + t, ..., xn) − fi(x1, .., xj, ..., xn)] / t
We want to also represent the partial derivative in different notation: Let
ej = (0, .., 0, 1, 0, ..., 0) be the unit vector in ℝⁿ on the jth axis. Then,
∂fi(x)/∂xj = lim_{t→0} [fi(x + t·ej) − fi(x)] / t
That is, the partial of fi w.r.t. xj, evaluated at the point x, is looking at
essentially a function of 1 variable: we take the surface (graph)
of the function fi, and slice it parallel to the jth axis, s.t. point x is contained
on this slice/plane; we'll get a function pasted on this plane; its derivative
is the relevant partial derivative.
To be more precise about this one-variable function pasted on the slice/plane,
note that the single variable t ∈ ℝ is first mapped to a vector x + t·ej ∈ ℝⁿ, and
then that vector is mapped to a real number fi(x + t·ej). So, let φ : ℝ → ℝⁿ
be defined by φ(t) = x + t·ej, for all t ∈ ℝ. Then the one-variable function
we're looking for is g : ℝ → ℝ defined by g(t) = fi(φ(t)), for all t ∈ ℝ; it's
the composition of fi and φ.
In addition to slicing the surface of functions that map from ℝⁿ to ℝ
in the directions of the axes, we can slice them in any direction and get
a function pasted on the slicing plane. This is the notion of a directional
derivative.
Recall that if x ∈ ℝⁿ, and h ∈ ℝⁿ, then the set of all points that can
be written as x + th, for some t ∈ ℝ, comprises the line through x in the
direction of h.
See figure (drawn in class).
Definition 6 The directional derivative of a function f : ℝⁿ → ℝ at x ∈ ℝⁿ,
in the direction h ∈ ℝⁿ, denoted Df(x; h), is
Df(x; h) = lim_{t→0⁺} [f(x + th) − f(x)] / t
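Definition 6 can be checked numerically. The snippet below is my own illustration (the function f and the point are assumptions, not from the text): for a differentiable f, the one-sided difference quotient at small t > 0 should approach the dot product of the gradient with h.

```python
# Finite-difference check of the directional derivative definition:
# Df(x; h) = lim_{t->0+} [f(x + t h) - f(x)] / t, which for differentiable f
# equals grad f(x) . h. Example function: f(x) = x1^2 + 3 x1 x2.
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

def directional_derivative(f, x, h, t=1e-7):
    """One-sided difference quotient at a small t > 0."""
    xt = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(xt) - f(x)) / t

x, h = [1.0, 2.0], [1.0, 1.0]
grad = [2 * x[0] + 3 * x[1], 3 * x[0]]            # analytic gradient: (8, 3)
analytic = sum(g * hi for g, hi in zip(grad, h))  # grad . h = 11
numeric = directional_derivative(f, x, h)
print(analytic, numeric)
```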
3.2
Interior Optima
Take sequences z_k → x*, with z_k > x*, and y_k → x*, with y_k < x*. The
difference quotients [f(z_k) − f(x*)]/(z_k − x*) lie in (−∞, 0], and the quotients
[f(y_k) − f(x*)]/(y_k − x*) lie in [0, ∞). Taking limits preserves these inequalities since (−∞, 0] and [0, ∞) are
closed sets and the ratio sequences lie in these closed sets. So,
f′(x*) ≤ 0 ≤ f′(x*),
so f′(x*) = 0.
Step 2. Suppose n > 1. Take any jth axis direction, and let g : ℝ → ℝ
be defined by g(t) = f(x* + t·ej). Note that g(0) = f(x*). Now, since x* is a
local max of f, f(x*) ≥ f(x* + t·ej) for |t| smaller than some cutoff value: i.e.,
g(0) ≥ g(t) for |t| smaller than this cutoff value, i.e., g(0) is a local interior
maximum (since t < 0 and t > 0 are both allowed). g is differentiable
at 0 since g(t) = f(φ(t)), f is differentiable at x*, and φ is
differentiable at t = 0 (here, φ(t) = x* + t·ej, so Dφ(t) = ej, ∀t). So, g is
differentiable at 0, g′(0) = 0, and by the Chain Rule,
g′(0) = Df(x*)ej = ∂f(x*)/∂xj = 0.
Note that this is necessary but not sufficient for a local max or min, e.g.
f (x) = x3 has a vanishing first derivative at x = 0, which is not a local
optimum.
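The cubic counterexample can be made concrete. This snippet (mine, matching the f(x) = x³ example above) checks numerically that the derivative vanishes at 0 while f takes both signs in every neighbourhood of 0, so 0 is not a local optimum.

```python
# f(x) = x^3: f'(0) = 0 (FONC holds), yet 0 is not a local max or min,
# since f is negative just left of 0 and positive just right of 0.
def f(x):
    return x ** 3

def fprime(x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

eps = 1e-3
print(fprime(0.0), f(eps), f(-eps))
```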
Second Order Conditions
Chapter 4
Optimization with Equality
Constraints
4.1
Introduction
The following example illustrates the principle of no arbitrage underlying a maximum. A more general illustration, with more than 1 constraint,
requires a little bit of the machinery of linear inequalities, which we'll not
cover. The idea here is that the Lagrange multiplier captures how the constraint is distributed across the variables.
Example 1. Suppose x* solves Max U(x1, x2) s.t. I − p1x1 − p2x2 = 0,
and suppose x* >> 0.
Then reallocating a small amount of income from one good to the other
does not increase utility. Say income dI > 0 is shifted from good 2 to good 1.
So dx1 = (dI/p1) > 0 and dx2 = −(dI/p2) < 0. Note that this reallocation
satisfies the budget constraint, since
p1(x1 + dx1) + p2(x2 + dx2) = I
The change in utility is dU = U1 dx1 + U2 dx2
= [(U1/p1) − (U2/p2)]dI ≤ 0, since the change in utility cannot be positive
at a maximum. Therefore,
(U1/p1) − (U2/p2) ≤ 0    (1)
Similarly, dI > 0 shifted from good 1 to good 2 does not increase utility,
so that
[−(U1/p1) + (U2/p2)]dI ≤ 0, or
−(U1/p1) + (U2/p2) ≤ 0    (2)
Together, (1) and (2) give (U1/p1) = (U2/p2); this common marginal utility of income is the Lagrange multiplier.
4.2
Stacking the k constraint gradients gives
Dg(x) =
[ Dg1(x) ]
[ . . .  ]
[ Dgk(x) ]
So Dg(x) is a k × n matrix.
The theorem below provides a necessary condition for a local max or
local min. Note that x* is a local max (resp. min) of f on the constraint set
{x ∈ ℝⁿ | gi(x) = 0, i = 1, . . . , k} if f(x*) ≥ f(x) (resp. f(x*) ≤ f(x)) for all x ∈ U
s.t. gi(x) = 0, i = 1, . . . , k, for some open set U containing x*. Thus x* is
a Max on the set S = U ∩ {x ∈ ℝⁿ | gi(x) = 0, i = 1, . . . , k}.
The condition is only a necessary condition; so there could be points x⁰ that meet the condition and yet are
not even local max or min.
Example. Max f(x, y) = x³ + y³, s.t. g(x, y) = x − y = 0. Here the
contour set Cg(0) is the 45-degree line in the x-y plane. By taking larger
and larger positive values of x and y on this contour set, we get higher and
higher f(x, y). So f does not have a global max on the constraint set. But
if we mechanically crank out the Lagrangean FONCs as follows:
Max x³ + y³ + λ(x − y)
FONC: 3x² + λ = 0
3y² − λ = 0
x − y = 0. So x = y = λ = 0 is a solution. But (x*, y*) = (0, 0)
is neither a local max nor a local min. Indeed, f(0, 0) = 0, whereas for
(x, y) = (ε, ε), ε > 0, f(ε, ε) = 2ε³ > 0, and for (x, y) = (ε, ε), ε < 0,
f(ε, ε) = 2ε³ < 0.
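This pathology is easy to verify numerically. The snippet below (my own) evaluates f along the constraint x = y on either side of the Lagrangean critical point (0, 0) and confirms that f changes sign there, so (0, 0) is neither a local max nor a local min.

```python
# On the constraint x = y, f(x, y) = x^3 + y^3 reduces to 2 e^3 at (e, e),
# which is positive for e > 0 and negative for e < 0: the FONC solution
# (0, 0) is not a local optimum.
def f(x, y):
    return x ** 3 + y ** 3

e = 1e-2  # small step along the constraint x = y
above, at, below = f(e, e), f(0.0, 0.0), f(-e, -e)
print(above, at, below)
```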
Pathology 2. The CQ is violated at the optimum.
In this case, the FONCs need not be satisfied at the global optimum.
Example. Max f(x, y) = −y s.t. g(x, y) = y³ − x² = 0.
Let us first find the solution using native intelligence. Then we'll show
that the CQ fails at the optimum, and that the usual Lagrangean method
is a disaster. Finally, we'll show that the general form of the equation in the
Theorem of Lagrange, which does NOT assume that the CQ holds at the
optimum, works.
The constraint is y³ = x², and since x² is nonnegative, so must y³ be.
Therefore, y ≥ 0. Maximizing −y s.t. y ≥ 0 implies y = 0 at the max.
So y³ = x² = 0, so x = 0. So f attains its global max at (x, y) = (0, 0).
Dg(x, y) = (−2x, 3y²) = (0, 0) at (x, y) = (0, 0). So rank(Dg(x, y)) =
0 < k = 1 at the optimum; the CQ fails at this point. Using the Lagrangean
method, we get the following FONC:
(∂f/∂x) + λ(∂g/∂x) = 0, that is −2λx = 0    (1)
(∂f/∂y) + λ(∂g/∂y) = 0, that is −1 + 3λy² = 0    (2)
(∂L/∂λ) = 0, that is −x² + y³ = 0    (3)
Eq.(1) implies either λ = 0 or x = 0. But λ = 0 contradicts Eq.(2), and
x = 0 implies, from Eq.(3), that y = 0, which again contradicts Eq.(2). So the
FONCs have no solution at all, even though a global max exists.
Dx(t) is in the direction of the tangent to the curve x(t), so the equation
above implies that Dgi(x(t)) is orthogonal to it. (Seen as a vector rather
than a matrix, we write this as the gradient ∇gi(x(t)).) (As an application,
notice how this geometry implies the first order condition MRSxy = px/py in
a two-good utility maximization in which both goods are consumed at the U
max.)
In the second-order conditions, we check the definiteness or semi-definiteness
of the second derivative or Hessian D²L(x*, λ*) w.r.t. all vectors x that are
orthogonal to the gradient of each constraint. This approximates vectors
close to x* that satisfy each gi(x) = 0.
Since L(x, λ) = f(x) + ∑_{i=1}^k λi gi(x),
D²L(x, λ) = D²f(x) + ∑_{i=1}^k λi D²gi(x), an n × n matrix:
D²L(x, λ) =
[ f11(x) + ∑_{i=1}^k λi gi11(x) . . . f1n(x) + ∑_{i=1}^k λi gi1n(x) ]
[ . . .                                                             ]
[ fn1(x) + ∑_{i=1}^k λi gin1(x) . . . fnn(x) + ∑_{i=1}^k λi ginn(x) ]
is the second derivative of L w.r.t. the x variables. Note that D²L(x, λ)
is symmetric, so we may work with its quadratic form.
Stacking the rows Dg1(x*), ..., Dgk(x*) gives Dg(x*).
So the set of all vectors x that are orthogonal to all the gradient vectors
of the constraint functions at x* is the Null Space of Dg(x*), N(Dg(x*)) =
{x ∈ ℝⁿ | Dg(x*)x = 0_{k×1}}.
Theorem 17 Suppose there exists (x*_{n×1}, λ*_{k×1}) such that Rank(Dg(x*)) = k
and Df(x*) + ∑_{i=1}^k λi* Dgi(x*) = 0.
BH(L*) =
[ 0 (k × k)           Dg(x*) (k × n)       ]
[ [Dg(x*)]ᵀ (n × k)   D²L(x*, λ*) (n × n)  ]
an (n + k) × (n + k) matrix. BH(L*; k + n − r) is the matrix obtained by deleting the last r rows and
columns of BH(L*).
BH_π(L*) will denote a variant in which the permutation π has been
applied to (i) both rows and columns of D²L(x*, λ*) and (ii) only the columns
of Dg(x*) and only the rows of [Dg(x*)]ᵀ, which is the transpose of Dg(x*).
Theorem 18 (1a) xᵀD²L(x*, λ*)x ≤ 0, for all x ∈ N(Dg(x*)), iff for all
permutations π of {1, . . . , n}, we have:
(−1)^{n−r} det(BH_π(L*; n + k − r)) ≥ 0, r = 0, 1, . . . , n − k − 1.
(1b) xᵀD²L(x*, λ*)x ≥ 0, for all x ∈ N(Dg(x*)), iff for all permutations π
of {1, . . . , n}, we have:
(−1)ᵏ det(BH_π(L*; k + n − r)) ≥ 0, r = 0, 1, . . . , n − k − 1.
(2a) xᵀD²L(x*, λ*)x < 0, for all nonzero x ∈ N(Dg(x*)), iff (−1)^{n−r} det(BH(L*; n +
k − r)) > 0, r = 0, 1, . . . , n − k − 1.
(2b) xᵀD²L(x*, λ*)x > 0, for all nonzero x ∈ N(Dg(x*)), iff (−1)ᵏ det(BH(L*; n +
k − r)) > 0, r = 0, 1, . . . , n − k − 1.
Note. (1) For the negative definite or semidefiniteness subject to constraints cases, the determinant of the bordered Hessian with the last r rows and
columns deleted must be of the same sign as (−1)^{n−r}. The sign of (−1)^{n−r}
switches with each successive increase in r from r = 0 to r = n − k − 1. So the
corresponding bordered Hessians switch signs. In the usual textbook case of
2 variables and one constraint, n = 2, k = 1, n − k − 1 = 0, so we just need to check
the sign for r = 0, that is, the sign of the determinant of the big bordered
Hessian. You should be clear about what this sign should be if it is to be a
sufficient condition for a strict local max or min. For the necessary condition,
we need to check signs (≥ or ≤ 0) for one permuted matrix as well, in this
case. What is this permuted matrix?
(2) As in the unconstrained case, the sufficiency conditions do not require
checking weak inequalities for permuted matrices.
(3) In the p.s.d. and p.d. cases, the signs of the principal minors must
be all positive, if the number k of constraints is even, and all negative, if k
is odd.
(4) If we know that a global max or min exists, where the CQ is satisfied,
and we get a unique solution x* ∈ ℝⁿ that solves the FONC, then we may
use a second order condition to check whether it is a max or a min. However,
weak inequalities demonstrating n.s.d. or p.s.d. (subject to constraints) of
D²L(x*, λ*) do not imply a max or min; these are necessary conditions. Strict
inequalities are useful; they imply a (strict) max or min. If, however, a global
max or min exists, the CQ is satisfied everywhere, and there is more than
one solution of the FONC, then the one giving the highest value of f(x) is
the max. In this case, we don't need second order conditions to conclude
that it is the global max.
VII.4. Two Examples
Example 1.
A consumer with income I > 0 faces prices p1 > 0, p2 > 0, and wishes
to maximize U (x1 , x2 ) = x1 x2 . So the problem is: Max x1 x2 s.t. x1 0,
x2 0, and p1 x1 + p2 x2 I.
To be able to use the Theorem of Lagrange, we need equality constraints.
Now, it is easy to see that if (x1*, x2*) solves the above problem, then (i)
(x1*, x2*) >> (0, 0). If xi* = 0 for some i, then utility equals zero; clearly, we can
do better by allocating some income to the purchase of each good; and (ii)
the budget constraint binds at (x1*, x2*). For if p1x1* + p2x2* < I, then we can
allocate some of the remaining income to both goods, and increase utility
further.
We conclude from this that a solution (x1*, x2*) will also be a solution to
the problem
Max x1x2 s.t. x1 > 0, x2 > 0, and p1x1 + p2x2 = I.
That is, Maximize U(x1, x2) = x1x2 over the set S = ℝ²₊₊ ∩ {(x1, x2) | I −
p1x1 − p2x2 = 0}. Since the budget set in this problem is compact and the
utility function is continuous, U attains a maximum on the budget set (by
Weierstrass' Theorem). Moreover, we argued above that at such a maximum
x*, xi* > 0, i = 1, 2, and the budget constraint binds. So, x* ∈ S.
Furthermore, Dg(x) = (−p1, −p2), so Rank(Dg(x)) = 1 at all points in
the budget set. So the CQ is met. Therefore, the global max will be among
the critical points of L(x1, x2, λ) = x1x2 + λ(I − p1x1 − p2x2).
FONC:
∂L/∂x1 = x2 − λp1 = 0    (1)
∂L/∂x2 = x1 − λp2 = 0    (2)
∂L/∂λ = I − p1x1 − p2x2 = 0    (3)
These give x1* = I/(2p1), x2* = I/(2p2), λ* = I/(2p1p2).
D²L(x*, λ*) = D²U(x*) + λ*D²g(x*). Here
D²U(x*) =
[ U11(x*) U12(x*) ]   [ 0 1 ]
[ U21(x*) U22(x*) ] = [ 1 0 ]
and D²g(x*) is the 2 × 2 zero matrix (g is linear). So
D²L(x*, λ*) =
[ 0 1 ]
[ 1 0 ]
Now evaluate the quadratic form zᵀD²L(x*, λ*)z = 2z1z2 at any (z1, z2)
that is orthogonal to Dg(x*) = (−p1, −p2). So, −p1z1 − p2z2 = 0, or
z1 = −(p2/p1)z2. For such (z1, z2), zᵀD²L(x*, λ*)z = −(2p2/p1)z2² < 0, so
D²L(x*, λ*) is negative definite relative to vectors orthogonal to the gradient
of the constraint, and x* is therefore a strict local max.
You've probably seen the computation below. I provide it here anyway,
even though it is unnecessary, and we've done the second-order exercise above
using the quadratic form.
BH(L*) =
[ 0           Dg(x*)       ]   [  0   −p1  −p2 ]
[ [Dg(x*)]ᵀ   D²L(x*, λ*)  ] = [ −p1   0    1  ]
                               [ −p2   1    0  ]
det(BH(L*)) = 2p1p2 > 0. This is the sign of (−1)ⁿ = (−1)². Therefore,
there is a strict local max at x*.
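Example 1 can be verified end-to-end with concrete numbers. The snippet below is my own check (p1 = 2, p2 = 3, I = 12 are illustrative assumptions): it confirms the FONC solution x1* = I/(2p1), x2* = I/(2p2), λ* = I/(2p1p2), and that det(BH(L*)) = 2p1p2 > 0, the strict-local-max sign.

```python
# Numeric check of the consumer example: FONC residuals should be zero and
# the bordered Hessian determinant should equal 2*p1*p2 > 0.
p1, p2, I = 2.0, 3.0, 12.0
x1, x2 = I / (2 * p1), I / (2 * p2)
lam = I / (2 * p1 * p2)

# FONC residuals: x2 - lam*p1, x1 - lam*p2, I - p1*x1 - p2*x2.
foc = (x2 - lam * p1, x1 - lam * p2, I - p1 * x1 - p2 * x2)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

BH = [[0.0, -p1, -p2],
      [-p1, 0.0, 1.0],
      [-p2, 1.0, 0.0]]
print(foc, det3(BH))
```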
Example 2.
Find global maxima and minima of f(x, y) = x² − y² on the unit circle
in ℝ², i.e., on the set {(x, y) ∈ ℝ² | g(x, y) ≡ 1 − x² − y² = 0}.
Constrained maxima and minima exist, by Weierstrass' theorem, as f
is continuous and the unit circle is closed and bounded. Bounded, as it is
entirely contained in, say, B(0, 2). Closed as well: visually, we can see that
the constraint set contains all its adherent points. More formally, suppose
(x_k, y_k), k = 1, 2, ..., is a sequence of points on the unit circle converging to the limit
(x, y). Since g is continuous, and (x_k, y_k) → (x, y), we have g(x_k, y_k) →
g(x, y). Since g(x_k, y_k) = 0 ∀k, their limit is 0, i.e. g(x, y) = 0, or (x, y) is on
the unit circle, and so the unit circle is closed.
Constraint Qualification: Dg(x, y) = (−2x, −2y). The rank of this row
matrix is zero only at (x, y) = (0, 0). But the origin does not satisfy the
constraint. Everywhere on the constraint, at least one of x or y is not zero,
and the rank of Dg(x, y) is 1.
So, the max and min will be solutions to the FOCs of the usual Lagrangean.
L(x, y, λ) = x² − y² + λ(1 − x² − y²)
FOC.
2x − 2λx = 0    (1)
−2y − 2λy = 0    (2)
x² + y² = 1    (3)
(1) and (2) imply 2x(1 − λ) = 0 and −2y(1 + λ) = 0 respectively. Suppose
λ ≠ 1 or −1. Then (x, y) = (0, 0), violating (3). If λ = 1, y = 0, so x² = 1,
and so on. So the four solutions (x, y, λ) to the FOCs form the solution set
{(1, 0, 1), (−1, 0, 1), (0, 1, −1), (0, −1, −1)}. Evaluating the function values at
these points, we have that f has a constrained max at (1, 0) and (−1, 0) and
constrained min at (0, 1) and (0, −1).
Although unnecessary, let's practice second-order conditions for this example. Df(x, y) = (2x, −2y), Dg(x, y) = (−2x, −2y).
D²f(x, y) =
[ 2  0 ]
[ 0 −2 ]
D²g(x, y) =
[ −2  0 ]
[  0 −2 ]
At the maximizers, λ* = 1, so
D²L(x*, λ*) = D²f + λ*D²g =
[ 0  0 ]
[ 0 −4 ]
A vector z = (z1, z2) orthogonal to Dg(±1, 0) = (∓2, 0) has z1 = 0, i.e. z = (0, y).
For such z, the quadratic form is zᵀD²L(x*, λ*)z = −4y² < 0 for all y ≠ 0. So negative definiteness (relative to the constraint) holds, and we
are at a strict local max.
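Example 2 also has a direct check. The snippet below (mine) parametrizes the unit circle as (cos t, sin t), so that f = cos²t − sin²t = cos 2t, and confirms on a fine grid that the constrained max value is 1 (attained at (±1, 0)) and the min is −1 (attained at (0, ±1)).

```python
# Evaluate f(x, y) = x^2 - y^2 on a fine grid of unit-circle points; the
# extreme values should match the Lagrangean solutions: max 1, min -1.
import math

pts = [(math.cos(2 * math.pi * k / 3600), math.sin(2 * math.pi * k / 3600))
       for k in range(3600)]
vals = [x * x - y * y for (x, y) in pts]
print(max(vals), min(vals))
```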
Some Derivatives
(1). Let I : ℝⁿ → ℝⁿ be defined by I(x) = x, ∀x ∈ ℝⁿ. In component
function notation, we have I(x) = (I1(x), . . . , In(x)) = (x1, . . . , xn). So,
DIi(x) = ei, i.e. the vector with 1 in the ith place and zeros elsewhere. So,
DI(x) = I_{n×n}, the identity matrix.
By similar work, we can show that if f(x) = Ax, where A is an m × n matrix, then Df(x) = A. Indeed, the jth component function fj(x) = aj1x1 +
. . . + ajnxn, so its matrix of partial derivatives with respect to x1, . . . , xn is
Dfj(x) = (aj1 . . . ajn).
(2). Let f : ℝⁿ → ℝᵐ and g : ℝⁿ → ℝᵐ. By way of convention, consider
f(x) and g(x) to be column vectors, and consider the function h : ℝⁿ → ℝ
defined by h(x) = g(x)ᵀf(x) = ∑_{i=1}^m gi(x)fi(x). Then, stacking the rows
Df1(x), . . . , Dfm(x) into the m × n matrix Df(x) (and similarly for Dg(x)),
the product rule reads
Dh(x) = g(x)ᵀDf(x) + f(x)ᵀDg(x).
We take a step back and derive this in a more expanded fashion. Since
h(x) = ∑_{i=1}^m fi(x)gi(x), its partial derivative with respect to xj is:
∂h(x)/∂xj = ∑_{i=1}^m [gi(x)·∂fi(x)/∂xj + fi(x)·∂gi(x)/∂xj]
Collecting these partials for j = 1, . . . , n into a row vector gives the formula above.
As an application, let h(x) = xᵀx. Then Dh(x) equals xᵀDI(x) + xᵀDI(x) =
xᵀI + xᵀI = 2xᵀ.
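The application Dh(x) = 2xᵀ for h(x) = xᵀx is easy to confirm by finite differences. This snippet (mine, with an arbitrary test point) compares a central-difference gradient against 2x.

```python
# Finite-difference check that for h(x) = x^T x, the derivative is 2 x^T.
def h(x):
    return sum(xi * xi for xi in x)

def grad_fd(h, x, eps=1e-6):
    """Central-difference gradient (as a row vector) of h at x."""
    g = []
    for j in range(len(x)):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        g.append((h(xp) - h(xm)) / (2 * eps))
    return g

x = [1.0, -2.0, 0.5]
g_num = grad_fd(h, x)
print(g_num)  # should be close to 2x = [2.0, -4.0, 1.0]
```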
On the Chain Rule
We saw an example (in the proof of the 1st order condition in unconstrained optimization) of the Chain Rule at work; youve seen this before.
Namely, if h : ℝ → ℝⁿ and f : ℝⁿ → ℝ are differentiable at the relevant
points, then the composition g(t) = f(h(t)) is differentiable at t and
g′(t) = Df(h(t))Dh(t) = ∑_{j=1}^n [∂f(h(t))/∂xj]·hj′(t)
You may have encountered this before in the notation f(h1(t), . . . , hn(t)),
with some use of total differentiation. Similarly, suppose h :
ℝᵖ → ℝⁿ and f : ℝⁿ → ℝᵐ are differentiable at the relevant points; then the
composition g(x) = f(h(x)), g : ℝᵖ → ℝᵐ, is differentiable at x, and
Dg(x) = Df(h(x))Dh(x)
Here, on the RHS an m × n matrix multiplies an n × p matrix, to result
in the m × p matrix on the LHS.
The intuition for the Chain Rule is perhaps this. Let z = h(x). If x
changes by dx, the first-order change in z is dz = Dh(x)dx. The first-order
change in f (z) is then Df (z)dz. Substituting for dz, the first-order change
in f (h(x)) equals [Df (h(x))Dh(x)] dx.
In the formula, things are actually quite similar to the familiar case.
The (i, j)th element of the matrix Dg(x) is ∂gi(x)/∂xj, where gi is the ith
component function of g and xj is the jth variable. Since this is equal to the
dot product of the ith row of Df(h(x)) and the jth column of Dh(x), we have
∂gi(x)/∂xj = ∑_{k=1}^n [∂fi(h(x))/∂hk]·[∂hk(x)/∂xj]
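The matrix form Dg(x) = Df(h(x))Dh(x) can be checked on a small example. The snippet below is my own (the functions are assumptions): with h(x1, x2) = (x1 + x2, x1·x2) and f(z1, z2) = z1² + z2, so that g(x) = (x1 + x2)² + x1·x2, it compares the chain-rule product against a finite-difference gradient.

```python
# Chain-rule check: Dg(x) = Df(h(x)) Dh(x), here a (1x2) times (2x2) product.
def g(x1, x2):
    return (x1 + x2) ** 2 + x1 * x2   # g = f o h

def chain_rule_grad(x1, x2):
    Df = [2 * (x1 + x2), 1.0]         # Df evaluated at h(x), a 1x2 matrix
    Dh = [[1.0, 1.0], [x2, x1]]       # Dh(x), a 2x2 matrix
    return [Df[0] * Dh[0][j] + Df[1] * Dh[1][j] for j in range(2)]

def grad_fd(x1, x2, eps=1e-6):
    """Central-difference gradient of g."""
    return [(g(x1 + eps, x2) - g(x1 - eps, x2)) / (2 * eps),
            (g(x1, x2 + eps) - g(x1, x2 - eps)) / (2 * eps)]

cr = chain_rule_grad(1.0, 2.0)
fd = grad_fd(1.0, 2.0)
print(cr, fd)  # both approximately [8.0, 7.0]
```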
f′(x*) = −Fx/Fy
where Fx, Fy are the partial derivatives of F, evaluated at (x*, y*). The
marginal rate of substitution between the two goods (LHS) equals the ratio of
the marginal utilities (RHS). In fact, when we say "under some assumptions
on F", one of the assumptions is that Fy evaluated at (x*, y*) is not zero.
The mnemonic for getting the derivative: from F(x, y) = c, we totally
differentiate to get Fx dx + Fy dy = 0, and rearrange to get dy/dx = −Fx/Fy.
(2). Comparative Statics.
We then move to the vector case by analogy. Suppose
F(x, y) = c
where x is an n-vector, y an m-vector, and c a given m-vector. Let (x*, y*)
solve F(x*, y*) = c. Think of x as exogenous, so this is a set of m
equations in the m endogenous variables y.
π1 = P(q1 + q2)q1 − c1q1
and
π2 = P(q1 + q2)q2 − c2q2
If profits are concave in own output, then the first-order conditions below
characterize the Cournot-Nash equilibrium (q1*, q2*).
∂π1/∂q1 = P′(q1 + q2)q1 + P(q1 + q2) − c1 = 0
∂π2/∂q2 = P′(q1 + q2)q2 + P(q1 + q2) − c2 = 0
Writing the left-hand sides as F1(q1, q2, c1, c2) and F2(q1, q2, c1, c2),
DFq(.) =
[ ∂F1/∂q1  ∂F1/∂q2 ]
[ ∂F2/∂q1  ∂F2/∂q2 ]
For brevity, let P′ and P″ be the derivative and second derivative of P(.)
evaluated at the equilibrium. Then
DFq(.) =
[ P″q1 + 2P′   P″q1 + P′  ]
[ P″q2 + P′    P″q2 + 2P′ ]
The determinant of this matrix works out to be
(P′)² + P′(P″(q1 + q2) + 2P′) > 0, since P′ < 0 and the concavity in own
output condition is assumed to be met. So the implicit function theorem can
be applied. Notice also that
DFc(.) =
[ −1  0 ]
[  0 −1 ]
Thus we can work out Dq*(c), the changes in equilibrium outputs as a
result of changes in unit costs. It would be a good exercise for you to work
these out, and sign these.
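A concrete version of this comparative-statics exercise (my own worked case, with illustrative numbers): take linear inverse demand P(Q) = a − bQ, so P′ = −b and P″ = 0. Then DFq = [[−2b, −b], [−b, −2b]], DFc = −I, and Dq*(c) = −[DFq]⁻¹DFc = [DFq]⁻¹, which can be compared with the closed-form equilibrium qi* = (a − 2ci + cj)/(3b).

```python
# Cournot comparative statics with linear demand P(Q) = a - b*Q.
a, b = 10.0, 1.0          # demand intercept and slope (illustrative)
c1, c2 = 2.0, 3.0         # unit costs (illustrative)

# Closed-form equilibrium for linear demand: q_i = (a - 2*c_i + c_j)/(3b).
q1 = (a - 2 * c1 + c2) / (3 * b)
q2 = (a - 2 * c2 + c1) / (3 * b)
P = a - b * (q1 + q2)
foc = (-b * q1 + P - c1, -b * q2 + P - c2)  # both residuals should be 0

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# P' = -b, P'' = 0, so DF_q = [[-2b, -b], [-b, -2b]] and Dq*(c) = [DF_q]^{-1}.
DFq = [[-2 * b, -b], [-b, -2 * b]]
Dq = inv2(DFq)
print(foc, Dq)  # own-cost effects negative, cross-cost effects positive
```

The signs come out as economic intuition suggests: dq1/dc1 = −2/(3b) < 0 and dq2/dc1 = 1/(3b) > 0.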
Proof of the Theorem of Lagrange
Before the formal proof, note that we'll use the tangency of the contour
sets of the objective and the constraint approach, which in other words uses
the implicit function theorem. For example, consider maximizing F(x1, x2)
s.t. G(x1, x2) = 0. If G1 ≠ 0 (this is the constraint qualification in this
case), we have at a tangency point of contour sets, G1 f'(x2) + G2 = 0 (where
x1 = f(x2) is the implicit function that keeps the points (x1, x2) on the
constraint); so f'(x2) = −G2/G1.
On the other hand, if we vary x2 and adjust x1 to stay on the constraint,
the function value F(x1, x2) = F(f(x2), x2) does not increase; therefore locally
around the optimum, F1 f'(x2) + F2 = 0. Substituting, F1(−G2/G1) +
F2 = 0. If we now put
λ = −F1/G1,
we have both F1 + λG1 = 0 by definition, and F2 + λG2 = 0: the two
FONC.
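A minimal numeric instance of this tangency argument (the functions F and G below are my own illustration, not from the text):

```python
# Maximize F(x1, x2) = x1*x2 subject to G(x1, x2) = x1 + x2 - 2 = 0.
# The maximum is at x* = (1, 1); gradients there:

F1, F2 = 1.0, 1.0        # dF/dx1 = x2, dF/dx2 = x1, evaluated at (1, 1)
G1, G2 = 1.0, 1.0        # dG/dx1, dG/dx2

# Implicit function x1 = f(x2) along the constraint: f'(x2) = -G2/G1.
fprime = -G2 / G1
# Stationarity of F along the constraint: F1*f'(x2) + F2 = 0.
assert abs(F1 * fprime + F2) < 1e-12

# The multiplier lambda = -F1/G1 then delivers both FONC:
lam = -F1 / G1
assert abs(F1 + lam * G1) < 1e-12
assert abs(F2 + lam * G2) < 1e-12
print(lam)               # -1.0
```

The two assertions are exactly the two first-order conditions F1 + λG1 = 0 and F2 + λG2 = 0 derived above.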
The Proof:
Without loss of generality, let the first k columns of Dg(x*) be linearly
independent, so the leading k × k submatrix of Dg(x*) is nonsingular. We
write x = (w, z), with w being the first k coordinates of x and z being the
last (n − k) coordinates. So showing the existence of λ (a 1 × k vector) that
solves
Df(x*) + λDg(x*) = 0
is the same as showing that the two equations below hold for this λ; the
equations are of dimension 1 × k and 1 × (n − k) respectively:
Dfw(w*, z*) + λDgw(w*, z*) = 0
(*)
Dfz(w*, z*) + λDgz(w*, z*) = 0
(**)
Since Dgw(w*, z*) is square and of full rank, Eq.(*) yields
λ = −Dfw(w*, z*)[Dgw(w*, z*)]⁻¹
(***)
Indeed, using the Chain Rule on V(a) ≡ f(x*(a), a), we have V'(a) =
Dfx(x*, a)Dx*(a) + ∂f(x*, a)/∂a. But because x* is an interior Max, Dfx(x*, a) =
0 (the 1 × n zero vector). So V'(a) = ∂f/∂a.
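A quick numerical check of this unconstrained envelope result, on an example of my own choosing (f(x, a) = −(x − a)² + a²):

```python
# f(x, a) = -(x - a)**2 + a**2 has an interior max in x at x*(a) = a,
# so V(a) = a**2, and V'(a) should equal the partial df/da holding x fixed,
# evaluated at (x*, a): here df/da = 2*(x - a) + 2*a = 2*a at x = x*.

a0 = 1.5
x_star = a0                          # argmax over x
df_da = 2 * (x_star - a0) + 2 * a0   # partial derivative, x held fixed at x*

def V(a):
    return a ** 2                    # value function: f(x*(a), a)

h = 1e-6
V_prime = (V(a0 + h) - V(a0 - h)) / (2 * h)
print(V_prime, df_da)                # both approximately 3.0
```

The term Dfx(x*, a)Dx*(a) drops out precisely because Dfx vanishes at the interior maximum, which is why only the direct partial ∂f/∂a survives.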
Now suppose we want to maximize an objective function f(x), which does
not depend on a, but subject to a constraint g(x, a) = a − G(x) = 0 that
does depend on a. Under nice conditions, at the Max,
Df(x*) + λDg(x*, a) = 0
(i)
Also note that if a changes, the value of g(x*(a), a) must continue to be
zero, so
Dg(x*, a)Dx*(a) + ∂g/∂a = 0
(ii)
Now, V(a) ≡ f(x*(a)), so V'(a) = Df(x*)Dx*(a). Using (i) to substitute for
Df(x*), this equals −λDg(x*, a)Dx*(a), which equals, using (ii),
λ∂g/∂a = λ. So here, V'(a) = λ: the value of the multiplier at the optimum
is the rate of change of the objective with respect to the parameter a being
relaxed.
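The result V'(a) = λ can be sketched numerically on a toy problem of my own choosing (maximize x1x2 subject to a − x1 − x2 = 0):

```python
# Maximize f(x) = x1*x2 subject to g(x, a) = a - x1 - x2 = 0.
# Solution: x* = (a/2, a/2), so V(a) = a**2/4 and V'(a) = a/2.
# FONC Df(x*) + lam*Dg(x*) = 0 reads (x2, x1) - lam*(1, 1) = 0, so lam = a/2.

a0 = 4.0
lam = a0 / 2                 # multiplier at the optimum

def V(a):
    return (a / 2) * (a / 2) # value function f(x*(a))

h = 1e-6
V_prime = (V(a0 + h) - V(a0 - h)) / (2 * h)
print(V_prime, lam)          # both approximately 2.0
```

Here a is the total resource, and λ = a/2 is exactly the marginal value of relaxing that resource, as the envelope argument predicts.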
Suppose now that we have an objective function f(x, a) to maximize
subject to g(x, a) = 0. Along similar lines, we can show that V'(a) = ∂f/∂a +
λ∂g/∂a, i.e., the direct effect of a on the Lagrangian function. As an exercise,
please derive Roy's Identity using the indirect utility function V(p, I).
Chapter 5
Optimization with Inequality
Constraints
5.1
Introduction
We use Kuhn-Tucker theory to address optimization problems with inequality
constraints. The main result is a first-order necessary condition
that is somewhat different from that of the Theorem of Lagrange; one main
difference is that the conditions gi(x) = 0, i = 1, . . . , k in the Theorem
of Lagrange are replaced by the complementary slackness conditions
λi gi(x) = 0, i = 1, . . . , k in Kuhn-Tucker theory.
In order to motivate this difference, let us discuss a simple setting. Consider
an objective function f : R² → R. We want to maximize f(x) or
f(x1, x2) over all x ∈ R² that satisfy G(x) ≤ a, where G : R² → R. We will
alternatively write g(x) = a − G(x) ≥ 0. For this example, let us assume
that G(x) is strictly increasing. We can view a as the total resource available,
such as the total income available for spending on goods. Draw a picture.
A maximum x* can occur either in the interior (i.e. G(x*) < a, or g(x*) >
0), or at the boundary (G(x*) = a, or g(x*) = 0). If it happens in the
interior, it implies Df(x*) = 0. If it happens on the boundary, it must
be that reducing the parameter value a does not increase f(x); for whatever
vector x you choose as maximizer after the reduction of a was available before,
at the higher value of a, and was not chosen as the maximizer. Consider then
setting up the Lagrangian
5.2
Kuhn-Tucker Theory
Stack the gradients of the binding constraints as
Dg*(x) =
[ Dg^(i1)(x) ]
[     ⋮      ]
[ Dg^(il)(x) ]
where i1, . . . , il are the indexes of the binding
constraints. So Dg*(x) is an l × n matrix.
We now state FONC for the problem. The Theorem below is a consolidation
of the Fritz John and the Kuhn-Tucker Theorems.
Theorem 21 (The Kuhn-Tucker (KT) Theorem). Let f : Rn → R, and
gi : Rn → R, i = 1, . . . , k be C¹ functions. Suppose x* is a Maximum of f
on the set S = U ∩ {x ∈ Rn | gi(x) ≥ 0, i = 1, . . . , k}, where U is an open set
in Rn. Then there exist real numbers μ, λ1, . . . , λk, not all zero, such that
μDf(x*) + Σ_{i=1}^{k} λi Dgi(x*) = 0 (the 1 × n zero vector).
Moreover, if gi(x*) > 0 for some i, then λi = 0.
If, in addition, Rank Dg*(x*) = l, then we may take μ to be equal to 1.
Furthermore, λi ≥ 0, i = 1, . . . , k, and λi > 0 for some i implies gi(x*) = 0.
Suppose the constraint qualification, Rank Dg*(x*) = l, is met at the
optimum. Then the KT equations are the following (n + k) equations in the
n + k variables x1, . . . , xn, λ1, . . . , λk:
λi gi(x*) = 0, i = 1, . . . , k; λi ≥ 0, gi(x*) ≥ 0 with complementary
slackness.
(1)
Df(x*) + Σ_{i=1}^{k} λi Dgi(x*) = 0
(2)
If x* is a local minimum of f on S, then −f attains a local maximum
at x*. Thus for minimization, while Eq.(1) stays the same, Eq.(2) changes
to
−Df(x*) + Σ_{i=1}^{k} λi Dgi(x*) = 0
(2')
Equations (1) and (2) are known as the Kuhn-Tucker conditions.
Note finally that the conditions of the Kuhn-Tucker Theorem are
not sufficient conditions for local optima; there may be points that satisfy
Equations (1) and (2) (or (2')) without being local optima. For example, you
may check that for the problem
Max f(x) = x³ s.t. g(x) = x ≥ 0, the values x* = λ* = 0 satisfy the KT
FONC (1) and (2) for a local maximum but do not yield a maximum.
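The check suggested in the text can be done mechanically; the script below is a sketch of that verification:

```python
# Non-sufficiency example: max f(x) = x**3 s.t. g(x) = x >= 0.
# x* = 0 with lambda* = 0 satisfies the KT FONC but is not a maximum.

def f(x):
    return x ** 3

x_star, lam = 0.0, 0.0
df = 3 * x_star ** 2             # f'(0) = 0

# KT FONC: f'(x*) + lam*g'(x*) = 0, and lam*g(x*) = 0, lam >= 0, g(x*) >= 0.
assert abs(df + lam * 1.0) < 1e-12
assert lam * x_star == 0.0 and lam >= 0.0 and x_star >= 0.0

# ...yet feasible points do strictly better, so x* = 0 is no maximum:
assert f(1.0) > f(x_star)
print("x* = 0 satisfies the FONC, but f(1) =", f(1.0), "> f(0) =", f(x_star))
```

Indeed f(x) = x³ is unbounded above on x ≥ 0, so the problem has no maximum at all, yet (0, 0) passes the first-order test.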
5.3
Suppose the constraint does not bind at the maximum; then we don't have
to check a CQ. But suppose it does. That is, suppose the optimum occurs
at x = 3. Dg(x) = −3(3 − x)² = 0 at x = 3. The CQ fails here. You could
check that the KT FONC will not isolate the maximum. In fact, in this baby
example, it is easy to see that x = 3 is the max: since (3 − x)³ ≥ 0 iff (3 − x) ≥ 0,
we may work with the latter constraint function, with which the CQ does not
fail. It is a good exercise to visualize f(x) and see that x = 3 is the maximum,
rather than merely cranking out the algebra now.
Alternatively, we may use the more general FONCs stated in the theorem:
μDf(x) + λDg(x) = 0, with μ, λ not both zero.
μ(6x² − 6x) + λ(−3(3 − x)²) = 0
(1)
(3 − x)³ ≥ 0, with strict inequality implying λ = 0.
(2)
If (3 − x)³ > 0, then λ = 0, which from Eq.(1) implies either μ = 0, which
violates the FONC, or 6x² − 6x = 0, i.e. x = 0 or x = 1. We have f(0) = 0
and f(1) = −1.
On the other hand, if (3 − x)³ = 0, that is x = 3, then Eq.(1) implies
μ = 0, so it must be that λ > 0. At x = 3, f(x) = 27. So x = 3 is the
maximum.
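A sketch of this example in code. Note the objective f(x) = 2x³ − 3x² is my inference from the derivative 6x² − 6x and the values f(1) = −1, f(3) = 27 appearing above, so treat it as an assumption:

```python
# CQ-failure example, assuming f(x) = 2*x**3 - 3*x**2 (inferred from
# f'(x) = 6*x**2 - 6*x), with constraint g(x) = (3 - x)**3 >= 0.

def f(x):
    return 2 * x ** 3 - 3 * x ** 2

def Dg(x):
    return -3 * (3 - x) ** 2

# The constraint qualification fails at the boundary point x = 3:
assert Dg(3.0) == 0.0

# Candidates from the general (Fritz John) FONC: x = 0, 1 (where f' = 0), and x = 3.
candidates = [0.0, 1.0, 3.0]
values = {x: f(x) for x in candidates}
print(values)   # f(0) = 0, f(1) = -1, f(3) = 27, so the max is at x = 3
```

Comparing the objective across the FONC candidates isolates x = 3, matching the conclusion above.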
Two Simple Utility Maximization Problems
Example 1. This is a real baby example meant purely for illustration.
No one expects you to use the heavy Kuhn-Tucker machinery for such simple
problems. In this example, one expects instead that you would use reasoning
about the marginal utility per rupee ratios (U1/p1), (U2/p2) to solve the
problem.
Max U(x1, x2) = x1 + x2, over the set {x = (x1, x2) ∈ R² | x1 ≥ 0, x2 ≥
0, I − p1x1 − p2x2 ≥ 0}, where I > 0, p1 > 0 and p2 > 0 are given.
So there are 3 inequality constraints:
g1(x1, x2) = x1 ≥ 0, g2(x1, x2) = x2 ≥ 0, and
g3(x1, x2) = I − p1x1 − p2x2 ≥ 0
At the maximum x*, any combination of these three could bind; so there
are 8 possibilities. However, since U is strictly increasing, the budget constraint
binds at the maximum (g3(x*) = 0). Moreover, g1(x*) = g2(x*) = 0 is
not possible, since consuming 0 of both goods gives utility equal to 0, which
is clearly not a maximum.
So we have to check just three possibilities out of the eight.
Case (1): g1(x*) > 0, g2(x*) > 0, g3(x*) = 0
Case (2): g1(x*) = 0, g2(x*) > 0, g3(x*) = 0
Case (3): g1(x*) > 0, g2(x*) = 0, g3(x*) = 0
Before using the KT conditions, we verify that (i) a global max exists
(here, because the utility function is continuous and the budget set is compact),
and that (ii) the CQ holds at all 3 relevant combinations of binding
constraints described above.
Indeed, for Case (1), Dg*(x) = Dg3(x) = (−p1, −p2), so Rank[Dg*(x)] =
1, so the CQ holds.
For Case (2),
Dg*(x) = [ Dg1(x) ] = [  1     0  ]
         [ Dg3(x) ]   [ −p1   −p2 ]
so Rank[Dg*(x)] = 2.
For Case (3),
Dg*(x) = [ Dg2(x) ] = [  0     1  ]
         [ Dg3(x) ]   [ −p1   −p2 ]
so Rank[Dg*(x)] = 2.
Thus for the maximum x*, there exists a λ* such that (x*, λ*) will be a
solution to the KT FONCs. Of course, there could be other (x, λ)'s that are
solutions as well, but a simple comparison of U(x) for all candidate solutions
will isolate for us the Maximum.
L(x, λ) = x1 + x2 + λ1x1 + λ2x2 + λ3(I − p1x1 − p2x2)
The KT conditions are
λ1(∂L/∂λ1) = λ1x1 = 0, λ1 ≥ 0, x1 ≥ 0, with CS
(1)
λ2(∂L/∂λ2) = λ2x2 = 0, λ2 ≥ 0, x2 ≥ 0, with CS
(2)
λ3(∂L/∂λ3) = λ3(I − p1x1 − p2x2) = 0, λ3 ≥ 0, I − p1x1 − p2x2 ≥ 0, with
CS
(3)
∂L/∂x1 = 1 + λ1 − λ3p1 = 0
(4)
∂L/∂x2 = 1 + λ2 − λ3p2 = 0
(5)
Since we don't know which of the three cases selects the constraints that
bind at the maximum, we must try all three.
Case (1). Since x1 > 0, x2 > 0, (1) and (2) imply λ1 = λ2 = 0. Plugging
these into Eqs. (4) and (5), we have 1 = λ3p1 = λ3p2. This implies λ3 > 0. (Also
note that this is consistent with the fact that since utility is strictly increasing,
relaxing the budget constraint will increase utility; so the marginal utility
of income, λ3 > 0.) Thus λ3p1 = λ3p2 implies p1 = p2.
So if at a local max both x1 and x2 are strictly positive, then it must be
that their prices are equal. All (x1, x2) that solve Eq. (3) are solutions. The
utility in any such case equals
x1 + (I − p1x1)/p2 = I/p, where p = p1 = p2. Note that in this case,
(U1/p1) = (U2/p2) = 1/p.
Case (2). x1 = 0 implies, from Eq. (3), that x2 = I/p2. Since this is
greater than 0, Eq. (2) implies λ2 = 0. Hence from Eq. (5), λ3p2 = 1.
Since λ1 ≥ 0, Eqs. (4) and (5) imply λ3p1 = 1 + λ1 ≥ 1 = λ3p2. Moreover,
since λ3 > 0, this implies p1 ≥ p2.
That is, if it is the case that at the maximum, x1 = 0, x2 > 0, then it must
be that p1 ≥ p2. Note that in this case, (U2/p2) = (1/p2) ≥ (U1/p1) = (1/p1).
For completeness' sake, Eq. (5) implies λ3 = 1/p2. So from Eq. (4),
λ1 = (p1/p2) − 1. So the unique critical point of L(x, λ) is
(x*, λ*) = (x1, x2, λ1, λ2, λ3) = (0, I/p2, (p1/p2) − 1, 0, 1/p2).
Case (3). This case is similar, and we get that x2 = 0, x1 > 0 occurs only
if p1 ≤ p2. We have
(x*, λ*) = (I/p1, 0, 0, (p2/p1) − 1, 1/p1).
We see that which of the cases applies depends upon the price ratio p1/p2.
If p1 = p2, then all three cases are relevant, and all (x1, x2) ∈ R²+ such that the
budget constraint binds are utility maxima. But if p1 > p2, then only Case (2)
applies, because if Case (1) had applied, we would have had p1 = p2, and
if Case (3) had applied, that would have implied p1 ≤ p2. The solution to
the KT conditions in that case is the utility maximum. Similarly, if p1 < p2,
only Case (3) applies.
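As a sanity check on the case analysis, a brute-force sweep of the budget line for one price configuration of my own choosing (p1 = 2 > p2 = 1, I = 10) should land on the Case (2) corner (0, I/p2):

```python
# Grid check of Example 1: U = x1 + x2 with p1 = 2 > p2 = 1, I = 10.
# Case (2) predicts the maximum at (0, I/p2) = (0, 10) with utility 10.

p1, p2, I = 2.0, 1.0, 10.0

best, best_u = None, -1.0
steps = 200
for i in range(steps + 1):
    x1 = (I / p1) * i / steps        # sweep x1 along the budget line
    x2 = (I - p1 * x1) / p2          # spend all income (g3 binds)
    u = x1 + x2
    if u > best_u:
        best, best_u = (x1, x2), u

print(best, best_u)                  # (0.0, 10.0) 10.0
```

Since U1/p1 = 1/2 < U2/p2 = 1 at these prices, every rupee is better spent on good 2, which is exactly why the sweep never improves on the corner.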
Example 2. Max U(x1, x2) = x1/(1 + x1) + x2/(1 + x2), s.t. x1 ≥ 0,
x2 ≥ 0, p1x1 + p2x2 ≤ I.
Check that the indifference curves are downward sloping, convex, and that
they cut the axes (show all this). This last is due to the additive form of the
utility function, and may result in 0 consumption of one of the goods at the
utility maximum.
Exactly as in Example 1, we are assured that a global max exists, that
the CQ is met at the optimum, and that there are only 3 relevant cases of
binding constraints to check.
The Kuhn-Tucker conditions are:
λ1(∂L/∂λ1) = λ1x1 = 0, λ1 ≥ 0, x1 ≥ 0, with CS
(1)
λ2(∂L/∂λ2) = λ2x2 = 0, λ2 ≥ 0, x2 ≥ 0, with CS
(2)
λ3(∂L/∂λ3) = λ3(I − p1x1 − p2x2) = 0, λ3 ≥ 0, I − p1x1 − p2x2 ≥ 0, with
CS
(3)
∂L/∂x1 = 1/(1 + x1)² + λ1 − λ3p1 = 0
(4)
∂L/∂x2 = 1/(1 + x2)² + λ2 − λ3p2 = 0
(5)
Case (1). x1 > 0, x2 > 0 implies λ1 = λ2 = 0. Eq. (4) implies λ3 > 0, so
that Eqs. (4) and (5) give (1 + x2)/(1 + x1) = (p1/p2)^(1/2).
Using Eq. (3), which gives x2 = (I − p1x1)/p2, in the above, we get
(p2 + I − p1x1)/(p2(1 + x1)) = (p1/p2)^(1/2), so simple computations yield
x1 = (I + p2 − (p1p2)^(1/2))/(p1 + (p1p2)^(1/2)),
x2 = (I + p1 − (p1p2)^(1/2))/(p2 + (p1p2)^(1/2)),
λ3 = 1/(p1(1 + x1)²).
x1 > 0, x2 > 0 implies I > (p1p2)^(1/2) − p1 and I > (p1p2)^(1/2) − p2. If either of
these fails, then we are not in the regime of Case (1).
Case (2). x1 = 0 with Eq. (3) implies x2 = I/p2. Since this is positive,
λ2 = 0, so Eq. (5) implies λ3 = 1/((1 + I/p2)²p2) = p2/(p2 + I)².
λ1 = λ3p1 − 1 (from x1 = 0 and Eq. (4)), so
λ1 = p1p2/(p2 + I)² − 1. For this to be ≥ 0, it is required that
p1p2/(p2 + I)² ≥ 1, that is, I ≤ (p1p2)^(1/2) − p2.
Utility equals x2/(1 + x2) = I/(p2 + I).
(x1, x2, λ1, λ2, λ3) = (0, I/p2, −1 + p1p2/(p2 + I)², 0, p2/(p2 + I)²).
Case (3). By symmetry, the solution is
(x1, x2, λ1, λ2, λ3) = (I/p1, 0, 0, −1 + p1p2/(p1 + I)², p1/(p1 + I)²)
and for this Case to hold it is necessary that p1p2/(p1 + I)² ≥ 1, or
I ≤ (p1p2)^(1/2) − p1.
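The Case (1) closed forms can be spot-checked numerically; the prices and income below are my own choices satisfying both interiority conditions:

```python
# Verify the Case (1) closed forms of Example 2 with p1 = 4, p2 = 1, I = 10
# (chosen so that I > (p1*p2)**0.5 - p1 and I > (p1*p2)**0.5 - p2).

p1, p2, I = 4.0, 1.0, 10.0
s = (p1 * p2) ** 0.5                 # (p1*p2)^(1/2) = 2

x1 = (I + p2 - s) / (p1 + s)         # = 1.5
x2 = (I + p1 - s) / (p2 + s)         # = 4.0

# The budget binds, and the tangency (1+x2)/(1+x1) = (p1/p2)^(1/2) holds:
assert abs(p1 * x1 + p2 * x2 - I) < 1e-9
assert abs((1 + x2) / (1 + x1) - (p1 / p2) ** 0.5) < 1e-9
print(x1, x2)                        # 1.5 4.0
```

Both KT requirements of Case (1) — the binding budget and the equality of marginal utility per rupee across goods — come out exactly at the closed-form solution.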
5.4
Miscellaneous
(1) For problems where some constraints are of the form gi(x) = 0, and
others of the form gj(x) ≥ 0, only the latter give rise to Kuhn-Tucker-like
complementary slackness conditions (λj ≥ 0, gj(x) ≥ 0, λj gj(x) = 0).
(2) If the objective to be maximized, f, and the constraints gi, i = 1, . . . , k
(where constraints are of the form gi(x) ≥ 0) are all concave functions, and
if Slater's constraint qualification holds (i.e., there exists some x ∈ Rn s.t.
gi(x) > 0, i = 1, . . . , k), then the Kuhn-Tucker conditions become both
necessary and sufficient for a global max.
(3) Suppose f and all the gi's are quasiconcave. Then the Kuhn-Tucker
conditions are almost sufficient for a global max: an x* and λ* that satisfy
the Kuhn-Tucker conditions indicate that x* is a global max provided that,
in addition to the above, either Df(x*) ≠ 0 or f is concave.
Appendix
Completeness Property of Real Numbers