
The story becomes more complicated when we take higher-order derivatives of multivariate functions. The interpretation of the first derivative remains the same, but there are now two kinds of second-order derivatives to consider.

First, there is the direct second-order derivative. In this case, the multivariate function is differentiated once, with respect to an independent variable, holding all other variables constant. Then the result is differentiated a second time, again with respect to the same independent variable. In a function such as z = f(x, y), there are two direct second-order partial derivatives, as indicated by the following examples of notation:

f_xx = ∂²z/∂x²    and    f_yy = ∂²z/∂y²

These second derivatives can be interpreted as the rates of change of the two slopes of
the function z.

Now the story gets a little more complicated. The cross-partials, f_xy and f_yx, are defined in the following way. First, take the partial derivative of z with respect to x. Then take the derivative again, but this time take it with respect to y, holding x constant. Spatially, think of the cross-partial as a measure of how the slope (the change in z with respect to x) changes when the y variable changes. The following are examples of notation for cross-partials:

f_xy = ∂²z/∂y∂x    and    f_yx = ∂²z/∂x∂y

We'll discuss economic meaning further in the next section, but for now, we'll just
show an example, and note that in a function where the cross-partials are continuous,
they will be identical. For the following function:
Take the first and second partial derivatives.

Now, starting with the first partials, find the cross partial derivatives:

Note that the cross partials are indeed identical, a fact that will be very useful to us in
future optimization sections.
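The equality of the cross-partials can also be checked numerically. The sketch below uses f(x, y) = x^3 * y^2 as a stand-in example (an illustrative assumption, since the text's own example function is not shown here) and approximates both cross-partials with nested central differences:

```python
# Numerically check that the cross-partials f_xy and f_yx agree for a
# sample function f(x, y) = x**3 * y**2 (illustrative choice, not the
# example from the text).

def f(x, y):
    return x**3 * y**2

def fxy(x, y, h=1e-4):
    # d/dy of df/dx, via nested central differences
    fx = lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (fx(x, y + h) - fx(x, y - h)) / (2 * h)

def fyx(x, y, h=1e-4):
    # d/dx of df/dy, via nested central differences
    fy = lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (fy(x + h, y) - fy(x - h, y)) / (2 * h)

# Analytically, f_xy = f_yx = 6*x**2*y; at (2, 3) that is 72.
print(fxy(2.0, 3.0), fyx(2.0, 3.0))
```

Because this f has continuous second partials, the two estimates agree to within the finite-difference error.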

Now that we have the brief discussion on limits out of the way, we can proceed to taking derivatives of functions of more than one variable. Before we actually start taking derivatives of functions of more than one variable, let's recall an important interpretation of derivatives of functions of one variable.

Recall that given a function of one variable, y = f(x), the derivative, f'(x), represents the rate of change of the function as x changes. This is an important interpretation of derivatives, and we are not going to want to lose it with functions of more than one variable. The problem with functions of more than one variable is that there is more than one variable. In other words, what do we do if we only want one of the variables to change, or if we want more than one of them to change? In fact, if we're going to allow more than one of the variables to change, there are then going to be an infinite number of ways for them to change. For instance, one variable could be changing faster than the other variable(s) in the function. Notice as well that it will be completely possible for the function to be changing differently depending on how we allow one or more of the variables to change.

We will need to develop ways, and notations, for dealing with all of these cases. In this section
we are going to concentrate exclusively on only changing one of the variables at a time, while
the remaining variable(s) are held fixed. We will deal with allowing multiple variables to change
in a later section.
Because we are going to only allow one of the variables to change, taking the derivative will now become a fairly simple process. Let's start off this discussion with a fairly simple function.

Let's start with a function f(x, y) and determine the rate at which the function is changing at a point, (a, b), if we hold y fixed and allow x to vary and if we hold x fixed and allow y to vary.

We'll start by looking at the case of holding y fixed and allowing x to vary. Since we are interested in the rate of change of the function at (a, b) and are holding y fixed, this means that we are going to always have y = b (if we didn't have this then eventually y would have to change in order to get to the point). Doing this will give us a function involving only x's, and we can define a new function as follows,

g(x) = f(x, b)

Now, this is a function of a single variable, and at this point all that we are asking is to determine the rate of change of g(x) at x = a. In other words, we want to compute g'(a), and since this is a function of a single variable we already know how to do that. The rate of change of the function at (a, b) if we hold y fixed and allow x to vary is g'(a).

We will call g'(a) the partial derivative of f(x, y) with respect to x at (a, b), and we will denote it in the following way,

f_x(a, b) = g'(a)

Now, let's do it the other way. We will now hold x fixed and allow y to vary. We can do this in a similar way. Since we are holding x fixed, it must be fixed at x = a, and so we can define a new function of y and then differentiate this as we've always done with functions of one variable.

Here is the work for this,

h(y) = f(a, y)        h'(b)

In this case we call h'(b) the partial derivative of f(x, y) with respect to y at (a, b), and we denote it as follows,

f_y(a, b) = h'(b)

Note that these two partial derivatives are sometimes called the first order partial derivatives.
Just as with functions of one variable we can have derivatives of all orders. We will be looking
at higher order derivatives in a later section.

Note that the notation for partial derivatives is different than that for derivatives of functions of a
single variable. With functions of a single variable we could denote the derivative with a single
prime. However, with partial derivatives we will always need to remember the variable that we
are differentiating with respect to and so we will subscript the variable that we differentiated with
respect to. We will shortly be seeing some alternate notation for partial derivatives as well.

Note as well that we usually don't use the (a, b) notation for partial derivatives. The more standard notation is to just continue to use (x, y). So, the partial derivatives from above will more commonly be written as,

f_x(x, y) and f_y(x, y)

Now, as this quick example has shown, taking derivatives of functions of more than one variable is done in pretty much the same manner as taking derivatives of a single variable. To compute f_x(x, y), all we need to do is treat all the y's as constants (or numbers) and then differentiate the x's as we've always done. Likewise, to compute f_y(x, y), we will treat all the x's as constants and then differentiate the y's as we are used to doing.

Before we work any examples, let's get the formal definition of the partial derivative out of the way, as well as some alternate notation.

Since we can think of the two partial derivatives above as derivatives of single-variable functions, it shouldn't be too surprising that the definition of each is very similar to the definition of the derivative for single-variable functions. Here are the formal definitions of the two partial derivatives we looked at above:

f_x(a, b) = lim_{h→0} [f(a + h, b) − f(a, b)] / h

f_y(a, b) = lim_{h→0} [f(a, b + h) − f(a, b)] / h
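These limit definitions translate directly into code: picking a small h gives a numerical approximation of each partial derivative. The function f(x, y) = x^2 * y^3 below is an illustrative assumption, not an example from the text:

```python
# Approximate the partial derivatives straight from their limit
# definitions, using a small but finite h.  Stand-in example function:
# f(x, y) = x**2 * y**3.

def f(x, y):
    return x**2 * y**3

def fx(a, b, h=1e-6):
    # f_x(a, b) ~ (f(a + h, b) - f(a, b)) / h
    return (f(a + h, b) - f(a, b)) / h

def fy(a, b, h=1e-6):
    # f_y(a, b) ~ (f(a, b + h) - f(a, b)) / h
    return (f(a, b + h) - f(a, b)) / h

# Analytically f_x = 2*x*y**3 and f_y = 3*x**2*y**2, so at (1, 2):
print(fx(1.0, 2.0))  # close to 16
print(fy(1.0, 2.0))  # close to 12
```

Shrinking h drives the approximation toward the true limit, up to floating-point round-off.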

Now let's take a quick look at some of the possible alternate notations for partial derivatives. Given the function z = f(x, y), the following are all equivalent notations:

f_x(x, y) = f_x = ∂f/∂x = ∂z/∂x = z_x

f_y(x, y) = f_y = ∂f/∂y = ∂z/∂y = z_y

For the fractional notation for the partial derivative, notice the difference between the partial derivative symbol ∂ and the ordinary derivative symbol d from single-variable calculus.

Okay, now let's work some examples. When working these examples, always keep in mind that we need to pay very close attention to which variable we are differentiating with respect to. This is important because we are going to treat all other variables as constants and then proceed with the derivative as if it were a function of a single variable. If you can remember this, you'll find that doing partial derivatives is not much more difficult than doing derivatives of functions of a single variable, as we did in Calculus I.

Partial Derivatives: a Body Weight Analogy


Calories consumed and calories burned have an impact on our weight. Let's say that our weight, u,
depended on the calories from food eaten, x, and the amount of physical exertion we do, y. If we
only regulated our eating while doing the same exercise every day, we could ask how does u change
when we vary only x. Likewise, we could keep x constant and take note of how u varies when we
change y. This would be like keeping a constant daily diet while changing how much we exercise.
This idea of change with respect to one variable while keeping other variables constant is at the
heart of the partial derivative.

Defining the Partial Derivative


When we write u = u(x,y), we are saying that we have a function, u, which depends on two independent variables: x and y. We can consider the change in u with respect to either of these two independent variables by using the partial derivative. The partial derivative of u with respect to x is written as

∂u/∂x

What this means is to take the usual derivative, but only x will be the variable. All other variables will be treated as constants. We can also determine how u changes with y when x is held constant. This is the partial of u with respect to y. It is written as

∂u/∂y

For example, if u = y * x^2, then

∂u/∂x = 2xy

Likewise, we can differentiate with respect to y and treat x as a constant:

∂u/∂y = x^2

The rule for partial derivatives is that we differentiate with respect to one variable while keeping all
the other variables constant. As another example, find the partial derivatives of u with respect
to x and with respect to y for

To do this example, we will need the derivative of an exponential,

d/dx e^x = e^x

and the derivative of a cosine,

d/dx cos(x) = −sin(x)

Thus,

and
Taking the Partial Derivative of a Partial Derivative
So far we have defined and given examples for first-order partial derivatives. Second-order partial derivatives are simply the partial derivative of a first-order partial derivative. We can have four second-order partial derivatives:

∂²u/∂x², ∂²u/∂x∂y, ∂²u/∂y∂x, ∂²u/∂y²

Continuing with our first example of u = y * x^2,

∂²u/∂x² = ∂/∂x (2xy) = 2y

and

∂²u/∂y² = ∂/∂y (x^2) = 0

Likewise, the mixed partials are

∂²u/∂y∂x = ∂/∂y (2xy) = 2x

and

∂²u/∂x∂y = ∂/∂x (x^2) = 2x

And, in our second example,

and

The mixed partial derivatives become


and

Note that for the functions we will consider (those whose mixed partials are continuous), we will always get

∂²u/∂x∂y = ∂²u/∂y∂x

This can be used to check our work.

Extending the Idea of Partial Derivatives


What if the variables x and y also depend on other variables? For example, we could have x = x(s,t) and y = y(s,t).

Then, by the chain rule,

∂u/∂s = (∂u/∂x)(∂x/∂s) + (∂u/∂y)(∂y/∂s)

and

∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)
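A quick numerical check of the chain rule, using stand-in functions u = x^2 + y^2 with x = s*t and y = s + t (illustrative assumptions, not from the text):

```python
# Verify the multivariate chain rule numerically for an example choice
# of u(x, y), x(s, t), y(s, t).

def u(x, y):
    return x**2 + y**2

def x_of(s, t):
    return s * t

def y_of(s, t):
    return s + t

def du_ds_chain(s, t):
    x, y = x_of(s, t), y_of(s, t)
    # du/ds = (du/dx)(dx/ds) + (du/dy)(dy/ds) = 2x * t + 2y * 1
    return 2 * x * t + 2 * y * 1

def du_ds_numeric(s, t, h=1e-6):
    # Central difference of the composed function with respect to s
    g = lambda s: u(x_of(s, t), y_of(s, t))
    return (g(s + h) - g(s - h)) / (2 * h)

print(du_ds_chain(2.0, 3.0), du_ds_numeric(2.0, 3.0))  # both near 46
```

The analytic chain-rule value and the direct finite difference of the composed function agree.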

Unconstrained optimization problems consider the problem of minimizing an objective function that depends on real variables with no restrictions on their values. Mathematically, let x ∈ ℝⁿ be a real vector with n ≥ 1 components and let f: ℝⁿ → ℝ be a smooth function. Then, the unconstrained optimization problem is

min_x f(x).

Unconstrained optimization problems arise directly in some applications but they also arise indirectly from
reformulations of constrained optimization problems. Often it is practical to replace the constraints of an optimization
problem with penalized terms in the objective function and to solve the problem as an unconstrained problem.

Algorithms
An important aspect of continuous optimization (constrained and unconstrained) is whether the functions are smooth,
by which we mean that the second derivatives exist and are continuous. There has been extensive study and
development of algorithms for the unconstrained optimization of smooth functions. At a high level, algorithms for
unconstrained minimization follow this general structure:

Choose a starting point x₀.

Beginning at x₀, generate a sequence of iterates {x_k}, k = 0, 1, 2, …, with non-increasing function values f(x_k), until a solution point with sufficient accuracy is found or until no further progress can be made.

To generate the next iterate x_{k+1}, the algorithm uses information about the function at x_k and possibly earlier iterates.
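As a minimal sketch of this general structure, the loop below uses fixed-step gradient descent (one simple way of generating iterates with decreasing f; the function, step size, and tolerance are illustrative assumptions):

```python
# Generic unconstrained minimization loop: start at x0, generate
# iterates with decreasing f until the gradient is small enough.
# Here f(x) = x[0]**2 + x[1]**2, whose minimizer is the origin.

def f(x):
    return x[0]**2 + x[1]**2

def grad(x):
    return [2 * x[0], 2 * x[1]]

def minimize(x0, step=0.1, tol=1e-8, max_iter=1000):
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:     # sufficient accuracy
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]  # next iterate x_{k+1}
    return x

x_star = minimize([3.0, -4.0])
print(x_star)  # near [0, 0]
```

Real codes replace the fixed step with a line search or trust region, as discussed below.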

Newton's Method

Newton's Method gives rise to a wide and important class of algorithms that require computation of the gradient
vector

∇f(x) = (∂f/∂x₁, …, ∂f/∂xₙ)ᵀ

and the Hessian matrix

∇²f(x) = [∂²f/∂xᵢ∂xⱼ].

Although the computation or approximation of the Hessian can be a time-consuming operation, there are many
problems for which this computation is justified.

Wikipedia Link to Newton's Method in Optimization



Newton's method forms a quadratic model of the objective function around the current iterate. The model function is
defined by

q_k(s) = f(x_k) + ∇f(x_k)ᵀ s + ½ sᵀ ∇²f(x_k) s.

In the basic Newton method, the next iterate is obtained from the minimizer of q_k(s). When the Hessian matrix, ∇²f(x_k), is positive definite, the quadratic model has a unique minimizer that can be obtained by solving the symmetric n × n linear system:

∇²f(x_k) s_k = −∇f(x_k).

The next iterate is then

x_{k+1} = x_k + s_k.
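A bare-bones two-variable Newton iteration following exactly these steps. The convex test function f(x, y) = x^4 + y^4 + x^2 + y^2 is an illustrative assumption (its minimizer is the origin), and each Newton system is solved by Cramer's rule since it is only 2×2:

```python
# Basic Newton's method: solve H s = -g at each iterate, then step
# x_{k+1} = x_k + s_k.

def grad(x, y):
    return (4 * x**3 + 2 * x, 4 * y**3 + 2 * y)

def hessian(x, y):
    # Hessian of f(x, y) = x**4 + y**4 + x**2 + y**2 (diagonal here)
    return ((12 * x**2 + 2, 0.0),
            (0.0, 12 * y**2 + 2))

def newton(x, y, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        if max(abs(gx), abs(gy)) < tol:
            break
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        # Solve H s = -g by Cramer's rule for the 2x2 system
        sx = (-gx * d + gy * b) / det
        sy = (-gy * a + gx * c) / det
        x, y = x + sx, y + sy        # x_{k+1} = x_k + s_k
    return x, y

print(newton(1.0, 1.0))  # converges rapidly toward (0, 0)
```

With a positive definite Hessian throughout, no line-search or trust-region safeguard is needed for this example.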

Convergence is guaranteed if the starting point is sufficiently close to a local minimizer x* at which the Hessian is positive definite. Moreover, the rate of convergence is quadratic, that is,

‖x_{k+1} − x*‖ ≤ β ‖x_k − x*‖²

for some positive constant β. In most circumstances, however, the basic Newton method has to be modified to achieve convergence. There are two fundamental strategies for moving from x_k to x_{k+1}: line search and trust region. Most algorithms follow one of these two strategies.

The line-search method modifies the search direction to obtain another downhill, or descent, direction for f. It then tries different step lengths along this direction until it finds a step that not only decreases f but also achieves at least a small fraction of this direction's potential.
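A sketch of a backtracking line search capturing this idea: shrink the step until the decrease in f is at least a small fraction c of what the directional derivative predicts (the Armijo condition). The quadratic f and the steepest-descent direction are illustrative assumptions:

```python
# Backtracking line search: accept the first step length t for which
# f(x + t*d) <= f(x) + c * t * (grad(x) . d), halving t until it holds.

def f(x):
    return x[0]**2 + 10 * x[1]**2

def grad(x):
    return [2 * x[0], 20 * x[1]]

def backtrack(x, d, c=1e-4, shrink=0.5, t=1.0):
    fx = f(x)
    # Directional derivative along d (negative for a descent direction)
    slope = sum(gi * di for gi, di in zip(grad(x), d))
    while f([xi + t * di for xi, di in zip(x, d)]) > fx + c * t * slope:
        t *= shrink
    return t

x = [1.0, 1.0]
d = [-g for g in grad(x)]        # descent direction: negative gradient
t = backtrack(x, d)
x_new = [xi + t * di for xi, di in zip(x, d)]
print(t, f(x_new) < f(x))  # accepted step length and confirmed decrease
```

For any descent direction the loop terminates, since the Armijo condition always holds for small enough t.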

Wikipedia Link to Line Search


The trust-region methods use the original quadratic model function, but they constrain the new iterate to stay
in a local neighborhood of the current iterate. To find the step, it is necessary to minimize the quadratic function
subject to staying in this neighborhood, which is generally ellipsoidal in shape.
Wikipedia Link to Trust Region


Line-search and trust-region techniques are suitable if the number of variables n is not too large, because the cost per iteration is of order n³. Codes for problems with a large number of variables tend to use truncated Newton methods, which usually settle for an approximate minimizer of the quadratic model.

Wikipedia Link to Truncated Newton Method


Methods with Hessian Approximations

If computing the exact Hessian matrix is not practical, the same algorithms can be used with a reasonable
approximation of the Hessian matrix. Two types of methods use approximations to the Hessian in place of the exact
Hessian.

One approach is to use difference approximations to the exact Hessian. Difference approximations exploit
the fact that each column of the Hessian can be approximated by taking the difference between two instances
of the gradient vector evaluated at two nearby points. For sparse Hessians, it is often possible to approximate
many columns of the Hessian with a single gradient evaluation by choosing the evaluation points judiciously.
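The column-by-column idea can be sketched as follows. Each column j of the Hessian is approximated by differencing the gradient at x and at x + h·e_j; the test function is an arbitrary smooth example, not one from the text:

```python
# Finite-difference Hessian: column j ~ (grad(x + h*e_j) - grad(x)) / h.

def grad(x):
    # gradient of f(x) = x0**2 * x1 + x1**3
    return [2 * x[0] * x[1], x[0]**2 + 3 * x[1]**2]

def hessian_fd(x, h=1e-6):
    n = len(x)
    g0 = grad(x)
    cols = []
    for j in range(n):
        xp = list(x)
        xp[j] += h                       # perturb coordinate j only
        gj = grad(xp)
        cols.append([(gj[i] - g0[i]) / h for i in range(n)])
    # cols[j][i] holds H[i][j]; transpose into row-major form
    return [[cols[j][i] for j in range(n)] for i in range(n)]

H = hessian_fd([1.0, 2.0])
print(H)  # exact Hessian is [[2*x1, 2*x0], [2*x0, 6*x1]] = [[4, 2], [2, 12]]
```

Note that n extra gradient evaluations suffice for a dense Hessian; sparsity can reduce this further, as described above.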

Quasi-Newton Methods build up an approximation to the Hessian by keeping track of the gradient
differences along each step taken by the algorithm. Various conditions are imposed on the approximate
Hessian. For example, its behavior along the step just taken is forced to mimic the behavior of the exact
Hessian, and it is usually kept positive definite.
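One concrete instance of these conditions is the (unsafeguarded) BFGS update, sketched below: the updated matrix satisfies the secant condition B⁺s = y, i.e. it mimics the true Hessian along the step just taken. The vectors s and y here are illustrative stand-ins, not data from the text:

```python
# BFGS update: B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(s^T y).
# By construction, B+ s = y (the secant condition).

def mat_vec(B, v):
    return [sum(B[i][j] * v[j] for j in range(len(v))) for i in range(len(B))]

def bfgs_update(B, s, y):
    Bs = mat_vec(B, s)
    sBs = sum(si * bi for si, bi in zip(s, Bs))
    sy = sum(si * yi for si, yi in zip(s, y))
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / sy
             for j in range(n)] for i in range(n)]

B = [[1.0, 0.0], [0.0, 1.0]]   # start from the identity
s = [1.0, 0.5]                 # step just taken
y = [2.0, 3.0]                 # gradient difference along that step
B_new = bfgs_update(B, s, y)
print(mat_vec(B_new, s))  # secant condition: equals y = [2, 3]
```

When sᵀy > 0, as here, the update also preserves positive definiteness, which is why practical codes enforce that condition on accepted steps.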

Wikipedia Link to Quasi-Newton Method


Other Methods for Unconstrained Optimization

There are two other approaches for unconstrained problems that are not so closely related to Newton's method.

Nonlinear conjugate gradient methods are motivated by the success of the linear conjugate gradient method
in minimizing quadratic functions with positive definite Hessians. They use search directions that combine the
negative gradient direction with another direction, chosen so that the search will take place along a direction not
previously explored by the algorithm. At least, this property holds for the quadratic case, for which the minimizer
is found exactly within just n iterations. For nonlinear problems, performance is problematic, but these methods
do have the advantage that they require only gradient evaluations and do not use much storage.
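The quadratic case that motivates these methods can be demonstrated directly: linear conjugate gradients on f(x) = ½xᵀAx − bᵀx with A positive definite finds the minimizer (the solution of Ax = b) in at most n steps. The matrix A and vector b below are an arbitrary small example:

```python
# Conjugate gradients on a quadratic with positive definite Hessian A:
# exact convergence within n = 2 iterations (up to round-off).

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [0.0, 0.0]
r = [bi - ai for bi, ai in zip(b, mat_vec(A, x))]  # residual = -gradient
d = list(r)                                        # first direction
for _ in range(len(b)):                            # at most n iterations
    Ad = mat_vec(A, d)
    alpha = dot(r, r) / dot(d, Ad)                 # exact line search
    x = [xi + alpha * di for xi, di in zip(x, d)]
    r_new = [ri - alpha * adi for ri, adi in zip(r, Ad)]
    beta = dot(r_new, r_new) / dot(r, r)           # combine with old direction
    d = [rn + beta * di for rn, di in zip(r_new, d)]
    r = r_new

print(x, mat_vec(A, x))  # A x should equal b = [1, 2]
```

Note the low storage: only a few vectors are kept, and only matrix-vector (in the nonlinear case, gradient) evaluations are needed.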

Wikipedia Link to Nonlinear Conjugate Gradient Method


The nonlinear Simplex method (not to be confused with the simplex method for linear programming) requires
neither gradient nor Hessian evaluations. Instead, it performs a pattern search based only on function values.
Because it makes little use of information about f, it typically requires a great many iterations to find a solution
that is even in the ballpark. It can be useful when f is nonsmooth or when derivatives are impossible to find, but
it is unfortunately often used when one of the algorithms above would be more appropriate.

Related Problems
The Nonlinear Least-Squares Problem is a special case of unconstrained optimization. It arises in many practical problems, especially in data-fitting applications. The objective function f has the form

f(x) = ½ Σ_{j=1}^{m} r_j(x)²,

where each r_j is a smooth function from ℝⁿ to ℝ. The special form of f and its derivatives has been exploited to develop efficient algorithms for minimizing f.
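One way this structure is exploited is the Gauss-Newton step, which solves JᵀJ s = −Jᵀr with J the Jacobian of the residuals, avoiding second derivatives of the r_j. The residuals below are an arbitrary linear example (an illustrative assumption), so a single step reaches the minimizer:

```python
# Gauss-Newton step for f(x) = 0.5 * sum_j r_j(x)**2:
# solve the 2x2 normal equations J^T J s = -J^T r.

def residuals(x):
    return [x[0] - 1.0, x[0] + x[1] - 3.0, x[1] - 2.0]

def jacobian(x):
    return [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # rows are dr_j/dx

def gauss_newton_step(x):
    r, J = residuals(x), jacobian(x)
    a = sum(J[k][0] * J[k][0] for k in range(3))   # (J^T J)[0][0]
    b = sum(J[k][0] * J[k][1] for k in range(3))   # (J^T J)[0][1]
    d = sum(J[k][1] * J[k][1] for k in range(3))   # (J^T J)[1][1]
    g0 = -sum(J[k][0] * r[k] for k in range(3))    # (-J^T r)[0]
    g1 = -sum(J[k][1] * r[k] for k in range(3))    # (-J^T r)[1]
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

x = [0.0, 0.0]
s = gauss_newton_step(x)
x = [xi + si for xi, si in zip(x, s)]
print(x, residuals(x))  # minimizer (1, 2), with all residuals zero
```

For genuinely nonlinear residuals, the same step is taken repeatedly, usually with a line search or trust region.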

The problem of solving a system of Nonlinear Equations is related to unconstrained optimization in that a number of algorithms for nonlinear equations proceed by minimizing a sum of squares. It often arises in problems involving physical systems. In nonlinear equations, there is no objective function to optimize; instead, the goal is to find values of the variables that satisfy a set of n equality constraints.

In mathematical optimization, constrained optimization (in some contexts called constraint optimization) is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables. The objective function is either a cost function or energy function, which is to be minimized, or a reward function or utility function, which is to be maximized. Constraints can be either hard constraints, which set conditions for the variables that are required to be satisfied, or soft constraints, which have some variable values that are penalized in the objective function if, and based on the extent that, the conditions on the variables are not satisfied.

General form

A general constrained minimization problem may be written as follows:

min f(x)
subject to g_i(x) = c_i for i = 1, …, n
and h_j(x) ≥ d_j for j = 1, …, m

where g_i(x) = c_i and h_j(x) ≥ d_j are constraints that are required to be satisfied; these are called hard constraints.

In some problems, often called constraint optimization problems, the objective function is actually the sum of cost functions, each of which penalizes the extent (if any) to which a soft constraint (a constraint which is preferred but not required to be satisfied) is violated.
Solution methods

Many unconstrained optimization algorithms can be adapted to the constrained case, often via
the use of a penalty method. However, search steps taken by the unconstrained method may be
unacceptable for the constrained problem, leading to a lack of convergence. This is referred to
as the Maratos effect.[1]

Equality constraints

If the constrained problem has only equality constraints, the method of Lagrange multipliers can
be used to convert it into an unconstrained problem whose number of variables is the original
number of variables plus the original number of equality constraints. Alternatively, if the
constraints are all equality constraints and are all linear, they can be solved for some of the
variables in terms of the others, and the former can be substituted out of the objective function,
leaving an unconstrained problem in a smaller number of variables.
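A tiny instance of the substitution idea: minimize f(x, y) = x^2 + y^2 subject to the linear equality x + y = 1 by solving the constraint for y and substituting it out, leaving an unconstrained problem in x alone. (The functions and the one-dimensional solver are illustrative assumptions.)

```python
# Substitute y = 1 - x into f(x, y) = x**2 + y**2, then minimize the
# resulting one-variable function with a golden-section search.

def f_reduced(x):
    y = 1.0 - x              # the constraint x + y = 1 solved for y
    return x**2 + y**2

def golden_min(fn, lo, hi, tol=1e-10):
    # Golden-section search for the minimizer of a unimodal function
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if fn(c) < fn(d):
            b = d
        else:
            a = c
    return (a + b) / 2

x = golden_min(f_reduced, -5.0, 5.0)
print(x, 1.0 - x)  # constrained minimizer (0.5, 0.5)
```

The same substitution works whenever all constraints are linear equalities, as the paragraph above notes.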

Inequality constraints

With inequality constraints, the problem can be characterized in terms of the geometric optimality conditions, the Fritz John conditions, and the Karush-Kuhn-Tucker conditions, under which simple problems may be solvable.

Linear programming

If the objective function and all of the hard constraints are linear, then the problem is a linear
programming problem. This can be solved by the simplex method, which usually works
in polynomial time in the problem size but is not guaranteed to, or by interior point
methods which are guaranteed to work in polynomial time.

Quadratic programming

If all the hard constraints are linear but the objective function is quadratic, the problem is
a quadratic programming problem. It can still be solved in polynomial time by the ellipsoid
method if the objective function is convex; otherwise the problem is NP hard.

Constraint optimization problems

Branch and bound

Constraint optimization can be solved by branch and bound algorithms. These are backtracking
algorithms storing the cost of the best solution found during execution and using it to avoid part
of the search. More precisely, whenever the algorithm encounters a partial solution that cannot
be extended to form a solution of better cost than the stored best cost, the algorithm backtracks,
instead of trying to extend this solution.
Assuming that cost is to be maximized, the efficiency of these algorithms depends on how the cost that can be obtained from extending a partial solution is evaluated. Indeed, if the algorithm can backtrack from a partial solution, part of the search is skipped. The lower the estimated cost, the better the algorithm, as a lower estimated cost is more likely to be lower than the best cost of a solution found so far.

On the other hand, this estimated cost cannot be lower than the effective cost that can be
obtained by extending the solution, as otherwise the algorithm could backtrack while a solution
better than the best found so far exists. As a result, the algorithm requires an upper bound on
the cost that can be obtained from extending a partial solution, and this upper bound should be
as small as possible.
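A minimal branch-and-bound sketch in this spirit: maximize the total value of items chosen under a weight limit (a tiny knapsack, used purely as an illustration), pruning any partial solution whose optimistic upper bound cannot beat the best solution found so far:

```python
# Branch and bound: store the best cost found so far, and backtrack
# whenever a partial solution's upper bound cannot beat it.

values = [6, 5, 4]
weights = [3, 2, 3]
capacity = 5

best = 0

def bound(i, value):
    # Optimistic upper bound: pretend every remaining item fits
    return value + sum(values[i:])

def search(i, weight, value):
    global best
    if weight > capacity:
        return                      # infeasible partial solution
    if i == len(values):
        best = max(best, value)     # complete solution: update incumbent
        return
    if bound(i, value) <= best:
        return                      # prune: cannot beat the incumbent
    search(i + 1, weight + weights[i], value + values[i])  # take item i
    search(i + 1, weight, value)                           # skip item i

search(0, 0, 0)
print(best)  # best value 11: items 0 and 1 (weight 3 + 2 = 5)
```

The tighter (smaller) the `bound` function, the more of the search tree is skipped, matching the discussion above.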

A variation of this approach called Hansen's method uses interval methods.[2] It inherently
implements rectangular constraints.

First-choice bounding functions

One way of evaluating this upper bound for a partial solution is to consider each soft constraint separately. For each soft constraint, the maximal possible value over all assignments to the unassigned variables is assumed. The sum of these values is an upper bound because the soft constraints cannot assume a higher value. It is not exact, because the maximal values of different soft constraints may derive from different assignments: one soft constraint may be maximal for one assignment of the unassigned variables while another constraint is maximal for a different assignment.

Russian doll search

This method[3] runs a branch-and-bound algorithm on n problems, where n is the number of variables. Each such problem is the subproblem obtained by dropping a sequence of variables from the original problem, along with the constraints containing them. After the problem on a subset of the variables is solved, its optimal cost can be used as an upper bound while solving the other problems.

In particular, the cost estimate of a solution with some variables still unassigned is added to the cost that derives from the evaluated variables. Virtually, this corresponds to ignoring the evaluated variables and solving the problem on the unassigned ones, except that the latter problem has already been solved. More precisely, the cost of soft constraints containing both assigned and unassigned variables is estimated as above (or using an arbitrary other method); the cost of soft constraints containing only unassigned variables is instead estimated using the optimal solution of the corresponding subproblem, which is already known at this point.

There is a similarity between the Russian Doll Search method and dynamic programming. Like dynamic programming, Russian Doll Search solves sub-problems in order to solve the whole problem. But whereas dynamic programming directly combines the results obtained on sub-problems to get the result of the whole problem, Russian Doll Search only uses them as bounds during its search.

Bucket elimination

The bucket elimination algorithm can be adapted for constraint optimization. A given variable can indeed be removed from the problem by replacing all soft constraints containing it with a new soft constraint. The cost of this new constraint is computed assuming a maximal value for every value of the removed variable. Formally, if x is the variable to be removed, C_1, …, C_k are the soft constraints containing it, and y_1, …, y_m are their variables except x, the new soft constraint is defined by:

C(y_1, …, y_m) = max_x Σ_i C_i(x, y_1, …, y_m)

Bucket elimination works with an (arbitrary) ordering of the variables. Every variable is associated with a bucket of constraints; the bucket of a variable contains all constraints having the variable as the highest in the ordering. Bucket elimination proceeds from the last variable to the first. For each variable, all constraints of its bucket are replaced as above to remove the variable. The resulting constraint is then placed in the appropriate bucket.
