
Introduction to Optimization

Course Notes for CO 250/CM 340


Winter 2012
© Department of Combinatorics and Optimization
University of Waterloo
December 19, 2011
Contents

1 Introduction
  1.1 Linear Programming
    1.1.1 A production planning example
    1.1.2 Multiperiod models
  1.2 Integer Programming
    1.2.1 Maximum Weight Matching
    1.2.2 Shortest Paths
    1.2.3 Scheduling
  1.3 Nonlinear Programming
    1.3.1 Pricing a DVD Player
    1.3.2 Portfolio Optimization
    1.3.3 Finding a closest point feasible in an LP
    1.3.4 Finding a central feasible solution of an LP
  1.4 Overview of the course
  1.5 Further reading and notes

2 Solving linear programs
  2.1 Possible outcomes
    2.1.1 Infeasible linear programs
    2.1.2 Linear programs with optimal solutions
    2.1.3 Unbounded linear programs
  2.2 Standard equality form
  2.3 A Simplex iteration
  2.4 Bases and canonical forms
    2.4.1 Bases
    2.4.2 Canonical forms
  2.5 The Simplex algorithm
    2.5.1 An example with an optimal solution
    2.5.2 An unbounded example
    2.5.3 Formalizing the procedure
  2.6 Finding feasible solutions
    2.6.1 Reducibility
    2.6.2 The Two Phase method - an example
    2.6.3 Consequences
  2.7 Pivoting
  2.8 Further reading and notes

3 Computational complexity
  3.1 A motivating example
  3.2 Fast and slow algorithms
    3.2.1 The big O notation
    3.2.2 Input size and running time
    3.2.3 The running time function
    3.2.4 Polynomial versus exponential algorithms
    3.2.5 The case of the Simplex
  3.3 Hard problems
    3.3.1 Decision problems - some examples
    3.3.2 Polynomial reducibility
    3.3.3 The P and NP-complete classes

4 Introduction to duality
  4.1 A first example: Shortest paths
  4.2 Bounding the optimal value of a linear program
  4.3 The shortest-path example revisited
  4.4 A second example: Minimum vertex cover
  4.5 Duals of general linear programs
    4.5.1 Finding duals of general LPs
  4.6 A third example: Maximum weight matching
  4.7 A fourth example: Network flow
  4.8 A fifth example: Scheduling
  4.9 The duality theorem
    4.9.1 Integer programs and the duality gap
    4.9.2 Linear programs and the duality theorem
  4.10 Complementary Slackness
  4.11 Complementary Slackness and combinatorial examples
    4.11.1 Matching
    4.11.2 Shortest paths
    4.11.3 Scheduling
  4.12 Further reading and notes

5 Solving Integer Programs
  5.1 Shortest paths
    5.1.1 An algorithm for shortest paths
  5.2 Maximum weight matching algorithm
    5.2.1 Hall's condition
    5.2.2 An optimality condition
    5.2.3 The Hungarian matching algorithm
  5.3 Cutting planes
    5.3.1 General scheme
    5.3.2 Valid inequalities
    5.3.3 Cutting plane and simplex
  5.4 Branch & Bound
    5.4.1 A discussion
  5.5 Further reading and notes

6 Geometry of optimization
  6.1 Feasible solutions to linear programs and polyhedra
  6.2 Convexity
  6.3 Extreme points
  6.4 Geometric interpretation of Simplex algorithm
  6.5 Cutting planes
  6.6 A geometric interpretation of optimality
  6.7 Further reading and notes

7 Nonlinear optimization
  7.1 What is a nonlinear program?
  7.2 Nonlinear programs are hard
    7.2.1 NP-hardness
    7.2.2 Hard small dimensional instances
  7.3 Convexity
    7.3.1 Convex functions and epigraphs
    7.3.2 Level sets and feasible region
  7.4 Relaxing convex NLPs
    7.4.1 Subgradients
    7.4.2 Supporting halfspaces
  7.5 Optimality conditions for the differentiable case
    7.5.1 Sufficient conditions for optimality
    7.5.2 Differentiability and gradients
    7.5.3 A Karush-Kuhn-Tucker Theorem
  7.6 Optimality conditions for Lagrangians
  7.7 Further reading and notes
Chapter 1
Introduction
Optimization problems are abundant in everyday life, and all of us face such problems frequently, although we may not always be aware of the fact! Obvious examples are, for instance, the use of your GPS to find a shortest route from your home to your work place in the morning, or the scheduling of trains on the rail connections between Waterloo and Toronto. There are, however, many more, less obvious examples. How, for example, does the Region of Waterloo determine the structure of its electricity network? How are the schedules for buses determined? And how does a company set the price for a newly developed product? All of these questions are optimization problems, and in this chapter we show that many of these questions admit a mathematical formulation.
1.1 Linear Programming
In this section we introduce linear programming, an efficient and versatile method to optimize (minimize or maximize) a linear function subject to a system of linear inequality and/or equality constraints. As unimpressive as this may sound, linear programming is a very powerful tool that is used in practice to solve instances of optimization problems arising in applications in most branches of industry. In fact, a recent survey (see also [21]) of Fortune 500 firms shows that 85% of all respondents use linear programming in their operations. The roots of linear programming can be traced back at least a couple of hundred years to the work of Fourier on solutions of systems of linear inequalities. The name linear programming, however, was coined only recently, in the late 1930s and early 1940s, when the Russian mathematician Leonid Kantorovich and the American mathematician George Dantzig formally defined the underlying techniques. George Dantzig also developed the Simplex algorithm, which to this date remains one of the most popular methods to solve linear programs. Dantzig, who worked as a mathematical advisor for the U.S. Air Force, initially applied linear programming to solve logistical problems arising in the military. It did, however, not take long for industry to realize the technique's potential, and its use is widespread today. In this section we will see two typical examples of optimization problems that can be solved via linear programming.
1.1.1 A production planning example
Production planning problems are probably among the most frequent applications of linear
programming. In a typical such application, we are given a company that produces a number
of different products from a set of resources. Producing a unit of a product requires the
use of a certain amount of each resource, and can be sold at a certain price on the market.
The company has limited amounts of each of the resources available, and is interested in
maximizing its revenue from sales. How much of each product should be produced?
Here's a more concrete example. WaterTech is a company that produces four products, requiring time on two machines and two types (skilled and unskilled) of labour. The amount of machine time and labour (in hours) needed to produce a unit of each product and the sales prices in dollars per unit of each product are given in the following table:

Product   Machine 1   Machine 2   Skilled Labour   Unskilled Labour   Sales Price ($)
1         11          4           8                7                  300
2         7           6           5                8                  260
3         6           5           5                7                  220
4         5           4           6                4                  180
Each month, 700 hours are available on machine 1 and 500 hours on machine 2. Each
month, WaterTech can purchase up to 600 hours of skilled labour at $8 per hour and up to 650
hours of unskilled labour at $6 per hour. The company wants to determine how much of each product it should produce each month in order to maximize its profit (i.e., revenue from sales minus labour costs).
We can model this problem as a linear program. WaterTech must determine how much of each product to produce; it is therefore natural to define a variable $x_i$ for each $i \in \{1, 2, 3, 4\}$ for the number of units of product $i$ to produce. As part of the planning process the company must also decide on the number of hours of skilled and unskilled labour that it wants to purchase. We therefore introduce decision variables $y_s$ and $y_u$ for the number of purchased hours of skilled and unskilled labour, respectively.
Objective function. Deciding on a production plan now amounts to finding values for variables $x_1, \ldots, x_4$, $y_s$ and $y_u$. Once the values for these variables have been found, WaterTech's profit is easily expressed by the following linear function
\[
\underbrace{300x_1 + 260x_2 + 220x_3 + 180x_4}_{\text{Profit from Sales}} \;-\; \underbrace{(8y_s + 6y_u)}_{\text{Labour Costs}},
\]
and the company wants to maximize this quantity.
Constraints. Not every production plan is feasible, of course. For example, from the problem description, we easily see that we cannot produce more than $700/11 \approx 64$ units of product 1. More generally, the total amount of time needed on machine 1 for a given production plan is given by
\[ 11x_1 + 7x_2 + 6x_3 + 5x_4, \]
and this must not exceed the available 700 hours of time on that machine. Thus, we obtain the following constraint:
\[ 11x_1 + 7x_2 + 6x_3 + 5x_4 \le 700. \tag{1.1} \]
In a similar way, we derive a constraint for machine 2:
\[ 4x_1 + 6x_2 + 5x_3 + 4x_4 \le 500. \tag{1.2} \]
Analogously, once we decide on how much of each product should be produced, we know how much skilled and unskilled labour is needed. Naturally, we need to make sure that enough hours of each type of labour are purchased. The following two constraints enforce this:
\[ 8x_1 + 5x_2 + 5x_3 + 6x_4 \le y_s \tag{1.3} \]
\[ 7x_1 + 8x_2 + 7x_3 + 4x_4 \le y_u. \tag{1.4} \]
Finally, we need to add constraints that force each of the variables to take on only non-negative values, as well as constraints that limit the number of hours of skilled and unskilled labour purchased. Combining the objective function with (1.1)-(1.4) gives the following linear program:
\[
\begin{array}{rl}
\max & 300x_1 + 260x_2 + 220x_3 + 180x_4 - 8y_s - 6y_u \\
\text{s.t.} & 11x_1 + 7x_2 + 6x_3 + 5x_4 \le 700 \\
 & 4x_1 + 6x_2 + 5x_3 + 4x_4 \le 500 \\
 & 8x_1 + 5x_2 + 5x_3 + 6x_4 \le y_s \\
 & 7x_1 + 8x_2 + 7x_3 + 4x_4 \le y_u \\
 & y_s \le 600 \\
 & y_u \le 650 \\
 & x_1, x_2, x_3, x_4, y_u, y_s \ge 0.
\end{array}
\]
In Chapter 2, we will see how to solve the above linear program. At this point we just state its optimal solution, i.e., the feasible production plan with maximum profit; it turns out to be
\[ x_1 = 16\tfrac{2}{3}, \quad x_2 = 50, \quad x_3 = 0, \quad x_4 = 33\tfrac{1}{3}, \quad y_s = 583\tfrac{1}{3}, \quad y_u = 650, \]
achieving a total profit of $15433\tfrac{1}{3}$. As the above example shows, solutions to a linear program need not be integer-valued. Depending on the application, fractional solution values may or may not make sense. For example, in the above production example, it may or may not be possible to produce a fractional number of units of a product. Unfortunately, integrality can generally not be enforced using linear programming. We will return to this question in Section 1.2.
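Readers who wish to experiment can check this solution numerically. The following is a minimal sketch (not part of the course material) using Python and SciPy's linprog routine, assuming SciPy is installed; since linprog minimizes, the profit coefficients are negated.

\begin{verbatim}
from scipy.optimize import linprog

# variable order: [x1, x2, x3, x4, ys, yu]; profits negated for minimization
c = [-300, -260, -220, -180, 8, 6]
A_ub = [[11, 7, 6, 5,  0,  0],   # machine 1: at most 700 hours
        [ 4, 6, 5, 4,  0,  0],   # machine 2: at most 500 hours
        [ 8, 5, 5, 6, -1,  0],   # skilled labour used <= ys
        [ 7, 8, 7, 4,  0, -1]]   # unskilled labour used <= yu
b_ub = [700, 500, 0, 0]
bounds = [(0, None)] * 4 + [(0, 600), (0, 650)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)     # approx. (16.67, 50, 0, 33.33, 583.33, 650)
print(-res.fun)  # approx. 15433.33
\end{verbatim}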
1.1.2 Multiperiod models
KWOil is a local supplier of heating oil. The company has been around for many years, and knows its home turf. In particular, KWOil has developed a dependable model to forecast future demand for oil. For each of the following four months, the company expects the following amounts of demand for heating oil.

Month            1      2      3      4
Demand (litres)  5000   8000   9000   6000

At the beginning of each of the 4 months, KWOil may purchase heating oil from a regional supplier at the current market rate. The following table shows the projected price per litre at the beginning of each of these months.

Month            1      2      3      4
Price ($/litre)  .75    .72    .92    .90
KWOil has a small storage tank on its facility. The tank can hold up to 4000 litres of oil, and currently (at the beginning of month 1) contains 2000 litres. The company wants to know how much oil it should purchase at the beginning of each of the 4 months such that it satisfies the projected customer demand at the minimum possible total cost.

Once again, we will attempt to model the problem as a linear program. KWOil needs to decide how much oil to purchase at the beginning of each of the four months. We therefore introduce decision variables $p_i$ for $i \in \{1, 2, 3, 4\}$ denoting the number of litres of oil purchased at the beginning of month $i$. We also introduce variables $t_i$ for each $i \in \{1, 2, 3, 4\}$ to denote the number of litres of heating oil in the company's tank at the beginning of month $i$.
Objective function. Given the variables defined above, it is straightforward to write down the cost incurred by KWOil. The objective function of KWOil's problem is
\[ \min\ .75p_1 + .72p_2 + .92p_3 + .90p_4. \tag{1.5} \]
Constraints. What constraints does KWOil face, or, put differently, when do the values $p_i$ and $t_i$ describe a feasible oil purchasing pattern? Well, in each month $i$, the company needs to have enough heating oil available to satisfy customer demand. The amount of available oil at the beginning of month 1, for example, is comprised of two parts: the $p_1$ litres of oil purchased in month 1, and the $t_1$ litres contained in the tank. The sum of these two quantities needs to cover the demand in month 1, and the excess is stored in the local tank. Hence, we obtain the following constraint:
\[ p_1 + t_1 = 5000 + t_2. \tag{1.6} \]
We obtain similar constraints for months 2 and 3:
\[ p_2 + t_2 = 8000 + t_3, \tag{1.7} \]
\[ p_3 + t_3 = 9000 + t_4. \tag{1.8} \]
Finally, in order to satisfy the demand in month 4, we need to satisfy the constraint
\[ p_4 + t_4 \ge 6000. \tag{1.9} \]
Notice that each of the variables $t_i$ for $i \in \{2, 3, 4\}$ appears in two of the constraints (1.6)-(1.9). The constraints are therefore linked by the $t$-variables. Such linkage is a typical feature of multiperiod models.

We now obtain the entire LP for the KWOil problem by combining (1.5)-(1.9), and by adding upper bounds and initialization constraints for the tank contents, as well as non-negativity constraints:
\[
\begin{array}{rll}
\min & .75p_1 + .72p_2 + .92p_3 + .90p_4 \\
\text{s.t.} & p_1 + t_1 = 5000 + t_2 \\
 & p_2 + t_2 = 8000 + t_3 \\
 & p_3 + t_3 = 9000 + t_4 \\
 & p_4 + t_4 \ge 6000 \\
 & t_1 = 2000 \\
 & 0 \le t_i \le 4000 & i \in \{2, 3, 4\} \\
 & p_i \ge 0 & i \in \{1, 2, 3, 4\}.
\end{array}
\]
Solving this LP yields
\[ p^*_1 = 3000, \quad p^*_2 = 12000, \quad p^*_3 = 5000, \quad p^*_4 = 6000, \]
\[ t^*_1 = 2000, \quad t^*_2 = 0, \quad t^*_3 = 4000, \quad t^*_4 = 0, \]
corresponding to a total purchasing cost of \$20890. Not surprisingly, this solution suggests taking advantage of the low oil prices in month 2, while no oil should be stored during month 3, where the prices are higher.
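The KWOil model can be checked the same way as the WaterTech model. In the sketch below (again assuming SciPy is available), the balance constraints (1.6)-(1.8) are rewritten with all variables on the left-hand side, and the fixed initial tank content is imposed through the variable bounds.

\begin{verbatim}
from scipy.optimize import linprog

# variable order: [p1, p2, p3, p4, t1, t2, t3, t4]
c = [0.75, 0.72, 0.92, 0.90, 0, 0, 0, 0]
A_eq = [[1, 0, 0, 0, 1, -1,  0,  0],   # p1 + t1 - t2 = 5000
        [0, 1, 0, 0, 0,  1, -1,  0],   # p2 + t2 - t3 = 8000
        [0, 0, 1, 0, 0,  0,  1, -1]]   # p3 + t3 - t4 = 9000
b_eq = [5000, 8000, 9000]
A_ub = [[0, 0, 0, -1, 0, 0, 0, -1]]    # -(p4 + t4) <= -6000
b_ub = [-6000]
bounds = [(0, None)] * 4 + [(2000, 2000)] + [(0, 4000)] * 3

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)    # approx. (3000, 12000, 5000, 6000, 2000, 0, 4000, 0)
print(res.fun)  # approx. 20890
\end{verbatim}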
1.2 Integer Programming
In the production planning example in Section 1.1.1 we saw that solutions to linear programs are in general not guaranteed to be integer-valued. It is not hard to imagine applications for which fractional variable values are not desirable. It may, for example, not be possible to produce a fraction of a product. In this case, we would like to find an optimal solution that is integer-valued. In general, we obtain an integer program from a linear program by requiring any subset of its variables to be integer-valued. For example, requiring all variables in the production example to be integer-valued yields the following integer program:
\[
\begin{array}{rl}
\max & 300x_1 + 260x_2 + 220x_3 + 180x_4 - 8y_s - 6y_u \\
\text{s.t.} & 11x_1 + 7x_2 + 6x_3 + 5x_4 \le 700 \\
 & 4x_1 + 6x_2 + 5x_3 + 4x_4 \le 500 \\
 & 8x_1 + 5x_2 + 5x_3 + 6x_4 \le y_s \\
 & 7x_1 + 8x_2 + 7x_3 + 4x_4 \le y_u \\
 & y_s \le 600 \\
 & y_u \le 650 \\
 & x_1, x_2, x_3, x_4, y_u, y_s \ge 0 \\
 & x_1, x_2, x_3, x_4, y_u, y_s \text{ integer.}
\end{array}
\]
In Chapter 5 we will discuss algorithms for solving integer programs. Here, we discuss some examples of integer programs of particular importance.
1.2.1 Maximum Weight Matching
One frequently occurring type of problem in practice is the so-called assignment or matching problem. Consider the following example arising in the subdivision of the wireless frequency spectrum. Imagine a government, say that of Canada, that owns 4 licenses, $L_1, L_2, L_3$ and $L_4$, for disjoint portions of the wireless frequency spectrum, and suppose that there are four large companies, $C_1, C_2, C_3$ and $C_4$, that are interested in purchasing a license. Assume, slightly unrealistically, that the government knows the maximum amount $v_{ij}$ that company $C_i$ is willing to pay for license $L_j$, for each pair $i, j \in \{1, 2, 3, 4\}$; $v_{ij}$ is sometimes called the valuation of company $C_i$ for license $L_j$. Valuations (in millions of dollars) for the four companies and licenses are given in the following table.
                      Licenses
Companies     L1     L2     L3     L4
   C1         10     20     10     10
   C2          5     30     10      5
   C3         20     10     10     10
   C4         10     10     20     10

Table 1.1: Valuations (in millions of dollars) for frequency licenses.
The government needs to assign licenses to each of the companies such that

[a] each company obtains at most one license, and

[b] each license is given to at most one company.

Afterwards, each company is charged a license fee proportional to its valuation for the assigned license. Having the welfare of its taxpayers in mind, the government's goal is to maximize the total revenue achieved in the sale of these licenses. Canada therefore wants to find an assignment of licenses to companies that maximizes the total valuation.
We can model this problem as an integer program with variables $x_{ij}$ for each company-license pair $C_i$ and $L_j$. We intend variable $x_{ij}$ to take on value 1 if license $L_j$ is assigned to company $C_i$, and $x_{ij} = 0$ otherwise. Such variables are called indicator variables.

Objective function. Using these variables, the objective of maximizing the total valuation is now easily expressed. Given an assignment $(x_{ij})_{i,j}$, the pair $C_i, L_j$ contributes $v_{ij}x_{ij}$ to the total valuation. We therefore obtain the following objective function:
\[ \max \sum_{i,j \in \{1,2,3,4\}} v_{ij}\,x_{ij}. \tag{1.10} \]
Constraints. It remains to specify constraints that enforce the above conditions [a] and [b]. Let us consider condition [a] first. Given an assignment $(x_{ij})_{ij}$, we easily see that
\[ \sum_{j \in \{1,2,3,4\}} x_{ij} \]
is the number of licenses assigned to company $i$. We therefore enforce condition [a] by writing
\[ \sum_{j \in \{1,2,3,4\}} x_{ij} \le 1 \qquad i \in \{1, 2, 3, 4\}. \tag{1.11} \]
Very similarly, we can enforce condition [b] by writing
\[ \sum_{i \in \{1,2,3,4\}} x_{ij} \le 1 \qquad j \in \{1, 2, 3, 4\}. \tag{1.12} \]
Adding bounds and integrality constraints to (1.10)-(1.12) yields the following integer program:
\[
\begin{array}{rll}
\max & \displaystyle\sum_{i,j \in \{1,2,3,4\}} v_{ij}\,x_{ij} \\
\text{s.t.} & \displaystyle\sum_{j \in \{1,2,3,4\}} x_{ij} \le 1 & i \in \{1, 2, 3, 4\} \\
 & \displaystyle\sum_{i \in \{1,2,3,4\}} x_{ij} \le 1 & j \in \{1, 2, 3, 4\} \\
 & 0 \le x_{ij} \le 1 & i, j \in \{1, 2, 3, 4\} \\
 & x_{ij} \text{ integer} & i, j \in \{1, 2, 3, 4\}.
\end{array}
\tag{1.13}
\]
Solving the integer program for the valuations given in Table 1.1 yields the following solution of value \$80 million: $x_{14} = x_{22} = x_{31} = x_{43} = 1$, and $x_{ij} = 0$ otherwise.
Graphs
Assignment problems and their solutions can be nicely visualized by graphs. A graph $G = (V, E)$ is a combinatorial object defined by a set $V$ of vertices and a set $E$ of pairs of vertices, the so-called edges of $G$. An edge between vertices $u, v \in V$ is denoted by $uv$, and we say that $uv$ is incident to $u$ and $v$.
[Figure: the assignment graph for the frequency example, with company vertices $c_1, \ldots, c_4$ on one side and license vertices $l_1, \ldots, l_4$ on the other; each edge $c_il_j$ is labeled with the valuation $v_{ij}$ from Table 1.1, and thick edges mark the computed assignment.]
The graph for the frequency assignment example above has one vertex for each company and license. Let $c_i$ be the vertex for company $C_i$, and let $l_j$ be the vertex for license $L_j$. With this, we have
\[ V = \{c_1, c_2, c_3, c_4, l_1, l_2, l_3, l_4\}. \]
We add an edge for each possible assignment: the edge $c_il_j$ corresponds to the assignment of license $L_j$ to company $C_i$:
\[ E = \{c_il_j : i, j \in \{1, 2, 3, 4\}\}. \]
Pictorially, we use filled circles for vertices and lines for edges. In the figure, each vertex is labeled with its identifier, and each edge is labeled with the valuation for the corresponding assignment. Thick edges indicate the assignment computed by our integer programming solver.
The graph has another useful property: its vertex set $V$ can be partitioned into disjoint sets
\[ C = \{c_1, c_2, c_3, c_4\} \quad\text{and}\quad L = \{l_1, l_2, l_3, l_4\}, \]
and each edge $e \in E$ has one endpoint in $C$ and one in $L$. Such graphs are called bipartite.
Graphs provide a useful tool for visualizing instances of certain problems (as we have seen for the assignment problem). More than that, sometimes the fact that a problem can be modeled as a graph allows us to design particularly efficient algorithms. It turns out that this is the case for the assignment problem.

A subset of edges $M \subseteq E$ is called a matching if no two edges in $M$ share an endpoint; i.e., for each vertex $v \in V$, $M$ contains at most one edge that is incident to $v$. Hence, in our example, a feasible assignment of licenses to companies (one that satisfies conditions [a] and [b]) corresponds to a matching in the graph representing the instance. E.g., the thick edges in the figure above form a matching corresponding to the assignment found in our solution.
In the maximum-weight matching problem, we are given a graph $G = (V, E)$ and a non-negative weight $w_e \in \mathbb{R}$ for each edge $e \in E$, and the goal is to find a matching $M$ in $G$ of maximum total weight; i.e., we want to find a matching $M$ that maximizes
\[ w(M) = \sum_{e \in M} w_e. \]
In the frequency assignment problem we could define the weight of edge $c_il_j$ to be the valuation $v_{ij}$ of company $C_i$ for license $L_j$. The problem of finding a maximum-valuation assignment that satisfies [a] and [b] then reduces to finding a maximum-weight matching in the corresponding assignment graph for these weights.
We will return to the matching problem later in these notes. In particular, in Chapter 5 we will discuss an efficient algorithm for the maximum-weight matching problem in bipartite graphs.
1.2.2 Shortest Paths
The shortest path problem is another fundamental optimization problem that most of us encounter (and solve) frequently: starting in geographic location A, we wish to travel to location B. Since we are frugal travellers, we wish to choose a route of minimum total length. It turns out that we can once again use an integer program to solve this problem.
[Figure: a graph of the relevant part of the Watertown street map, with vertices $s, a, b, \ldots, l, t$ at street intersections (the streets include Columbia Ave, Fischer-Hallman Rd, Westmount Rd, King St, Albert St, Erb St, Victoria St and University Ave); thick edges mark a route from $s$ to $t$.]

Let us illustrate this approach using a more concrete example. Suppose you are visiting the city of Watertown. You are staying with a friend who lives close to the intersection of Fischer-Hallman Road and Columbia Avenue, and you are planning on visiting the famous Cannery District, which is located close to the intersection of King and Victoria Streets. The figure shows a graph for relevant parts of the city map. Vertices of this graph correspond to street intersections, and edges connecting two vertices represent the street segments connecting the corresponding intersections. The figure shows the two intersections of interest labeled as $s$ and $t$. We are looking for a route from $s$ to $t$ that (since we are frugal) visits no vertex more than once. The thick edges in the figure show such a route.
Let us rephrase the problem in the language of graph theory. We are given a graph $G = (V, E)$ representing the street network. For each edge $e \in E$, we also have a non-negative cost $c_e$ that equals the length of the street segment corresponding to $e$. We are looking for a sequence of edges
\[ P = v_1v_2,\ v_2v_3,\ v_3v_4,\ \ldots,\ v_{k-2}v_{k-1},\ v_{k-1}v_k \]
such that $v_i \in V$ and $v_iv_{i+1} \in E$ for all $i$, where $v_1 = s$ and $v_k = t$. Furthermore, we want $v_i \ne v_j$ for all $i \ne j$. Such a sequence $P$ is called an $st$-path. Among all such paths we want one with the smallest total length; i.e., we want to find an $st$-path $P$ that minimizes
\[ \sum (c_e : e \text{ is an edge of } P). \]
Our integer program for this problem will have a variable $x_e$ for each edge $e \in E$. We want $x_e$ to have value 1 if edge $e$ is part of the output $st$-path, and $x_e = 0$ otherwise. It remains to characterize the set of $\{0, 1\}$-vectors $x$ that correspond to $st$-paths.
[Figure: the same street-map graph, with the vertex set $S = \{s, a, b, c, f\}$ circled; the edges of the cut leaving $S$ are drawn dashed and thick.]

We make the following intuitive observation: consider the set $S = \{s, a, b, c, f\}$ in the figure. Any $st$-path must leave the set $S$ at some point; i.e., any such path must contain at least one of the edges in the set
\[ \{al, bm, cd, ci, kg, fg\} \tag{1.14} \]
(depicted as dashed, thick lines). In fact, this is true more generally. Consider any set $S \subseteq V$
with $s \in S$ and $t \notin S$. Then any $st$-path $P$ must contain an edge $e$ from the set
\[ \delta(S) = \{uv \in E : u \in S,\ v \notin S\}. \]
Such a set $\delta(S)$ is called an $st$-cut. For example, (1.14) shows the $st$-cut $\delta(S)$ for $S = \{s, a, b, c, f\}$. Indeed, we have the following useful fact, whose proof we omit in this course.

Theorem 1. Let $G = (V, E)$ be a graph, and $s, t \in V$ two vertices. There is an $st$-path in $G$ if and only if $\delta(S) \ne \emptyset$ for all $st$-cuts $\delta(S)$.

Thus, if $P$ is an $st$-path, then $P$ has at least one edge from any $st$-cut $\delta(S)$. Conversely, if $P \subseteq E$ is a set of edges that intersects every $st$-cut, then $P$ contains an $st$-path. We therefore obtain the following integer programming formulation:
\[
\begin{array}{rll}
\min & \displaystyle\sum_{e \in E} c_e x_e \\
\text{s.t.} & \sum (x_e : e \in \delta(S)) \ge 1 & \text{for all } st\text{-cuts } \delta(S) \\
 & 0 \le x_e \le 1 & e \in E \\
 & x_e \text{ integer} & e \in E.
\end{array}
\tag{1.15}
\]
Notice that the number of $st$-cuts in a graph can be very large; it can be exponential in $|V|$! The integer program above therefore possibly has many constraints. Nevertheless, we will revisit this problem in Chapter 5 and present an efficient algorithm to solve it.
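Although formulation (1.15) may have exponentially many constraints, the shortest-path problem itself is computationally easy. As an aside, here is a minimal sketch of Dijkstra's classical algorithm for non-negative edge costs (in Python, on a small made-up graph); this is one standard way to solve the problem directly, not the algorithm developed in Chapter 5.

\begin{verbatim}
import heapq

def dijkstra(adj, s, t):
    """Shortest st-path for non-negative edge costs.
    adj maps each vertex to a list of (neighbour, cost) pairs."""
    dist, prev = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                     # stale heap entry, skip it
        for v, cost in adj[u]:
            if d + cost < dist.get(v, float('inf')):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    path = [t]                           # walk predecessors back to s
    while path[-1] != s:
        path.append(prev[path[-1]])
    return dist[t], path[::-1]

# a small hypothetical instance (each entry gives neighbour and cost)
adj = {'s': [('a', 2), ('b', 5)], 'a': [('s', 2), ('b', 1), ('t', 6)],
       'b': [('s', 5), ('a', 1), ('t', 2)], 't': [('a', 6), ('b', 2)]}
print(dijkstra(adj, 's', 't'))           # (5, ['s', 'a', 'b', 't'])
\end{verbatim}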
1.2.3 Scheduling
Imagine that your to-do list for today has $n$ tasks $T = \{1, \ldots, n\}$, and that each task $i \in T$ has a certain start time $s_i$ and a time $t_i$ by which it needs to be completed. Note that task $i$ has duration $t_i - s_i$. For simplicity, let us assume that all start and finish times are integer. In addition, each task $i$ comes with a non-negative profit $p_i$ that you obtain if the task is completed. At each time, you can only work on one task, and once you start a task, you need to complete it without interrupting it.
The figure below shows an instance of this problem with 7 tasks. The x-axis is annotated by the hours of your work day, and each of the tasks in your list is represented by a bar. Each bar is annotated by two values: the number of the corresponding task at the front, and its reward in the middle of the bar. The problem is now to choose a subset of the tasks of largest total profit that can be feasibly completed in a day.

Which tasks should we choose? We could, for example, choose 1, 3, 5 and 7 for a total profit of 11, or we could choose tasks 2, 4 and 7, for a profit of 12. Is this the optimal solution?

[Figure 1.1: A scheduling example. Each task is labeled by its number and profit. Seven tasks are drawn as bars against the hours 8am-6pm; tasks 1-7 carry profits 2, 4, 3, 5, 3, 6 and 3, respectively.]
In order to solve this problem, we model it as an integer linear program. For each task $i \in \{1, \ldots, 7\}$ we introduce an indicator variable $x_i$ that takes on value 1 if we decide to work on task $i$, and $x_i = 0$ otherwise.

Objective function. For each task $i$, we obtain profit $p_i$ if task $i$ is executed; i.e., the profit we obtain for task $i$ is given by $p_ix_i$, and the objective function is therefore
\[ \max\ 2x_1 + 4x_2 + 3x_3 + 5x_4 + 3x_5 + 6x_6 + 3x_7. \tag{1.16} \]
Constraints. A schedule is feasible if and only if, at any point in time, at most one job is executed. For example, at 8am, at most one of the jobs 1 and 2 can be executed. This can be expressed by the constraint
\[ x_1 + x_2 \le 1. \tag{8am} \]
In a similar fashion, we obtain constraints for each of the remaining times of the day:
\[
\begin{array}{ll}
x_1 + x_2 \le 1 & \text{(9am)} \\
x_2 + x_3 \le 1 & \text{(10am)} \\
x_3 \le 1 & \text{(11am)} \\
x_3 + x_4 \le 1 & \text{(noon)} \\
x_4 + x_5 + x_6 \le 1 & \text{(1pm)} \\
x_5 + x_6 \le 1 & \text{(2pm)} \\
x_5 + x_6 \le 1 & \text{(3pm)} \\
x_6 + x_7 \le 1 & \text{(4pm)} \\
x_7 \le 1. & \text{(5pm)}
\end{array}
\]
Clearly, some of the constraints in the above list are redundant. E.g., constraints (8am) and (9am) are identical, and so are (2pm) and (3pm). One constraint in each of these two pairs can clearly be omitted in a formulation. Similarly, constraint (11am) is implied by constraint (10am) (why?), and can also be dropped. The same reasoning shows that (5pm) is implied by (4pm). Finally, we add bound and integrality constraints and obtain the following integer program:
\[
\begin{array}{rll}
\max & 2x_1 + 4x_2 + 3x_3 + 5x_4 + 3x_5 + 6x_6 + 3x_7 \\
\text{s.t.} & x_1 + x_2 \le 1 \\
 & x_2 + x_3 \le 1 \\
 & x_3 + x_4 \le 1 \\
 & x_4 + x_5 + x_6 \le 1 \\
 & x_5 + x_6 \le 1 \\
 & x_6 + x_7 \le 1 \\
 & 0 \le x_i \le 1 & i \in \{1, \ldots, 7\} \\
 & x_i \text{ integer} & i \in \{1, \ldots, 7\}.
\end{array}
\tag{1.17}
\]
Solving this integer program confirms that tasks 2, 4 and 7 form an optimal schedule.
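One way to perform this check yourself (an aside, assuming SciPy version 1.9 or later) is with SciPy's mixed-integer solver milp; the six non-redundant constraints above become the rows of a matrix.

\begin{verbatim}
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

c = -np.array([2, 4, 3, 5, 3, 6, 3])   # milp minimizes, so negate profits
A = np.array([[1, 1, 0, 0, 0, 0, 0],   # 8am/9am:  tasks 1, 2
              [0, 1, 1, 0, 0, 0, 0],   # 10am:     tasks 2, 3
              [0, 0, 1, 1, 0, 0, 0],   # noon:     tasks 3, 4
              [0, 0, 0, 1, 1, 1, 0],   # 1pm:      tasks 4, 5, 6
              [0, 0, 0, 0, 1, 1, 0],   # 2pm/3pm:  tasks 5, 6
              [0, 0, 0, 0, 0, 1, 1]])  # 4pm:      tasks 6, 7
res = milp(c, constraints=LinearConstraint(A, ub=np.ones(6)),
           integrality=np.ones(7), bounds=Bounds(0, 1))
print(res.x, -res.fun)                  # x2 = x4 = x7 = 1, profit 12
\end{verbatim}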
1.3 Nonlinear Programming
So far, we have studied problems in which the objective function as well as the left-hand sides of all constraints were linear functions of the decision variables. While many practical problems have a linear formulation, many others do not. If we allow the objective function or the left-hand sides of constraints to be non-linear functions, we obtain a non-linear program (NLP). In this section, we will see some examples of NLPs.
1.3.1 Pricing a DVD Player
Company Dibson is a local producer of DVD players. Its newest model, the Dibson BR-1, will be sold in three regions: 1, 2, and 3. Dibson is considering a price between \$50 and \$70 for the device, but is uncertain as to which exact price it should choose. Naturally, the company wants to maximize its revenue, and that depends on the demand. The demand, on the other hand, is a function of the price. Dibson has done a considerable amount of market research, and has reliable estimates for the demand for the new product at three different prices in each of the three regions. The following table summarizes this.

Price ($)   Region 1   Region 2   Region 3
50          130        90         210
60          125        80         190
70          80         20         140

The company wants to model the demand in region $i$ as a function $d_i(p)$ of the unit price. Dibson believes that the demand within a region can be modelled by a quadratic function, and it uses the data from the above table to obtain the following quadratic demand functions:
\[
\begin{aligned}
d_1(p) &= -0.2p^2 + 21.5p - 445 \\
d_2(p) &= -0.25p^2 + 26.5p - 610 \\
d_3(p) &= -0.15p^2 + 14.5p - 140.
\end{aligned}
\]
The demand functions for the three regions are plotted in the accompanying figure. Which price $p \in [50, 70]$ should Dibson choose in order to maximize its revenue?

We can find an answer to this question by solving a non-linear program. Our NLP has only one variable, $p$, the price of the unit.
Objective function. Given $p$, we can use the given demand functions to determine the demand in each of the three regions. As Dibson wants to maximize its revenue, we have the following objective function:
\[ \max\ p\sum_{i=1}^{3} d_i(p) = \max\ p\,\big[(-0.2p^2 + 21.5p - 445) + (-0.25p^2 + 26.5p - 610) + (-0.15p^2 + 14.5p - 140)\big]. \tag{1.18} \]
Adding the simple bound constraints for $p$, we obtain the following NLP:
\[
\begin{array}{rl}
\max & p\displaystyle\sum_{i=1}^{3} d_i(p) \\
\text{s.t.} & 50 \le p \le 70.
\end{array}
\]
Using a non-linear programming solver, we find that a price of $p \approx 57.9976$ maximizes Dibson's revenue. The revenue at this price is approximately \$23872.80.
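Since the NLP has a single variable, any bounded one-dimensional optimizer will do. A minimal sketch follows (assuming SciPy); minimize_scalar minimizes, so the revenue is negated.

\begin{verbatim}
from scipy.optimize import minimize_scalar

def revenue(p):
    d1 = -0.2  * p**2 + 21.5 * p - 445
    d2 = -0.25 * p**2 + 26.5 * p - 610
    d3 = -0.15 * p**2 + 14.5 * p - 140
    return p * (d1 + d2 + d3)

res = minimize_scalar(lambda p: -revenue(p), bounds=(50, 70),
                      method='bounded')
print(res.x, -res.fun)   # approx. 57.9976 and 23872.80
\end{verbatim}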
1.3.2 Portfolio Optimization
Another important application of non-linear programming is portfolio optimization. In such a problem, one usually has a fixed amount of capital that one wishes to invest into several investment options. The usual goal is to maximize the return from this investment while ensuring that the risk of the constructed portfolio is small. Usually, the stocks that have the highest potential returns are at the same time the most volatile, and hence the riskiest, stocks to invest in. The risk-averse investor will therefore have to strike a balance between return and risk. One way to achieve this balance is to attempt to minimize the overall volatility while guaranteeing a minimum expected return.

More concretely, imagine you had \$500 available for investment into three different stocks 1, 2, and 3. In the following, we let $S_i$ be the random variable for the annual return on \$1 invested into stock $i$. Assume that we know the expected annual return and its variance for each of the given stocks. The following table shows expected values and variances for the given variables.
$i$   $E[S_i]$   $\operatorname{var}[S_i]$
1     .1         .2
2     .13        .1
3     .08        .15

In addition to the information above, we are also given the following covariances:
\[ \operatorname{cov}(S_1, S_2) = .03, \quad \operatorname{cov}(S_1, S_3) = .04, \quad\text{and}\quad \operatorname{cov}(S_2, S_3) = .01. \]
The goal is now to invest the given \$500 into the three stocks such that the expected return is at least \$50, and such that the total variance of our investment is minimized. The problem can be cast as a quadratic optimization problem. For this, we introduce a variable $x_i$ denoting the amount of money invested into stock $i$, for all $i \in \{1, 2, 3\}$.

Objective function. The goal is to minimize the total variance
\[ \operatorname{var}\Big(\sum_{i=1}^{3} S_i x_i\Big) \tag{1.19} \]
of the return of the investment. Using standard formulæ from statistics, we see that (1.19) can be rewritten as
\[ \Big(\sum_{i=1}^{3} \operatorname{var}(S_i)\,x_i^2\Big) + 2\operatorname{cov}(S_1, S_2)\,x_1x_2 + 2\operatorname{cov}(S_1, S_3)\,x_1x_3 + 2\operatorname{cov}(S_2, S_3)\,x_2x_3, \tag{1.20} \]
and minimizing this expression is the objective function of our non-linear program.
Constraints. The constraints are once again straightforward. An investment strategy given by $x_1$, $x_2$, and $x_3$ is feasible if a) exactly \$500 are invested, and b) the expected return of the investment is at least \$50. These two constraints can be written as follows:
\[ \sum_{i=1}^{3} x_i = 500, \tag{1.21} \]
\[ \sum_{i=1}^{3} E[S_i]\,x_i \ge 50. \tag{1.22} \]
We can now combine (1.20)-(1.22) into the following non-linear program:
\[
\begin{array}{rl}
\min & .2x_1^2 + .1x_2^2 + .15x_3^2 + .06x_1x_2 + .08x_1x_3 + .02x_2x_3 \\
\text{s.t.} & x_1 + x_2 + x_3 = 500 \\
 & .1x_1 + .13x_2 + .08x_3 \ge 50 \\
 & x_1, x_2, x_3 \ge 0.
\end{array}
\]
Solving this non-linear program yields the following investment plan:
\[ x_1 \approx 78.54, \quad x_2 \approx 259.96, \quad x_3 \approx 161.5, \]
with total variance approximately 14983.41.
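This is a quadratic program, and for an instance of this size a general-purpose solver suffices. The sketch below (assuming SciPy and NumPy) writes the objective as $x^T\Sigma x$, where $\Sigma$ is the covariance matrix assembled from the data above.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

Sigma = np.array([[0.20, 0.03, 0.04],    # covariance matrix of S1, S2, S3
                  [0.03, 0.10, 0.01],
                  [0.04, 0.01, 0.15]])
mu = np.array([0.10, 0.13, 0.08])        # expected returns per dollar

cons = [{'type': 'eq',   'fun': lambda x: x.sum() - 500},  # invest $500
        {'type': 'ineq', 'fun': lambda x: mu @ x - 50}]    # return >= $50
res = minimize(lambda x: x @ Sigma @ x, x0=np.full(3, 500 / 3),
               bounds=[(0, None)] * 3, constraints=cons)
print(res.x, res.fun)   # approx. (78.5, 260.0, 161.5), variance 14983.4
\end{verbatim}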
1.3.3 Finding a closest point feasible in an LP
Suppose we are given $\bar{x} \in \mathbb{R}^n$ as well as an $m$-by-$n$ matrix $A$ and $b \in \mathbb{R}^m$. We are asked to find a point in the set
\[ \{x \in \mathbb{R}^n : Ax \le b\} \]
that is closest to $\bar{x}$. This problem can be formulated as
\[
\begin{array}{rl}
\min & \|x - \bar{x}\| \\
\text{subject to} & Ax \le b.
\end{array}
\]
Recall that $\|x\| := \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$. Even though the above problem is equivalent to
\[
\begin{array}{rl}
\min & \|x - \bar{x}\|^2 \\
\text{subject to} & Ax \le b,
\end{array}
\]
the objective function is still nonlinear.
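A minimal sketch of this projection problem (assuming SciPy; the two-dimensional instance is made up for illustration):

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def closest_feasible_point(A, b, xbar):
    """Minimize ||x - xbar||^2 over the set {x : Ax <= b}."""
    cons = {'type': 'ineq', 'fun': lambda x: b - A @ x}  # b - Ax >= 0
    res = minimize(lambda x: np.sum((x - xbar) ** 2), x0=xbar,
                   constraints=cons)
    return res.x

# project the point (2, 2) onto the halfplane x1 + x2 <= 2
A, b = np.array([[1.0, 1.0]]), np.array([2.0])
print(closest_feasible_point(A, b, np.array([2.0, 2.0])))  # approx. (1, 1)
\end{verbatim}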
1.3.4 Finding a central feasible solution of an LP
Suppose we are given an $m$-by-$n$ matrix $A$ and $b \in \mathbb{R}^m$. We are interested in the solutions $x$ satisfying
\[ Ax \le b. \]
However, we are worried about violating constraints and, as a result, we want a solution that is as far away as possible from getting close to violating any of the constraints. Then one option is to solve the optimization problem
\[
\begin{array}{rl}
\max & \displaystyle\sum_{i=1}^{m} \ln\Big(b_i - \sum_{j=1}^{n} A_{ij}x_j\Big) \\
\text{subject to} & Ax \le b.
\end{array}
\]
Another option is to solve the optimization problem
\[
\begin{array}{rl}
\min & \displaystyle\sum_{i=1}^{m} \frac{1}{b_i - \sum_{j=1}^{n} A_{ij}x_j} \\
\text{subject to} & Ax \le b.
\end{array}
\]
Indeed, we can pick any suitable function which is finite valued and monotone decreasing for positive arguments, and which tends to infinity as its argument goes to zero with positive values. Note that the functions used above as building blocks, $f_1 : \mathbb{R} \to \mathbb{R}$, $f_1(x) := \frac{1}{x}$, and $f_2 : \mathbb{R} \to \mathbb{R}$, $f_2(x) := -\ln(x)$, both have these desired properties.

Further note that the objective functions of both of these problems are nonlinear. Therefore, they are both nonlinear optimization problems.
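For illustration, the first ("log-barrier") option can be sketched as follows (assuming SciPy; the box instance is made up, and the derivative-free solver is chosen only for brevity):

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def central_solution(A, b, x0):
    """Maximize sum_i ln(b_i - sum_j A_ij x_j), i.e. minimize its
    negation, starting from a strictly feasible point x0."""
    def neg_log_barrier(x):
        slack = b - A @ x
        if np.any(slack <= 0):
            return np.inf              # outside the feasible region
        return -np.sum(np.log(slack))
    return minimize(neg_log_barrier, x0=x0, method='Nelder-Mead').x

# the box 0 <= x1, x2 <= 1 written as Ax <= b; its most central point
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
print(central_solution(A, b, np.array([0.4, 0.4])))  # approx. (0.5, 0.5)
\end{verbatim}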
1.4 Overview of the course
In this chapter we have seen a number of examples of optimization problems arising in practice. In each of these cases, we have developed a concise mathematical model that captures the respective problem. Clearly, modeling a problem in mathematical terms is nice and theoretically satisfying, but it will not by itself satisfy a practitioner, who is after all interested in solutions! For each of the given examples in this chapter, we also provided a solution. But how do we find these solutions? The techniques to solve these problems depend on the type of model considered. The simplest of these models is linear programming, and an algorithm to solve linear programs is the Simplex algorithm, which we describe in Chapter 2. In Chapter 3 we try to address the following question: what is a good/efficient algorithm? In particular, we will see that, unlike the case for linear programs, there is little hope of finding an algorithm that is guaranteed to be efficient for general integer programs. A central theme in optimization is the notion of duality. We explore this topic in the context of linear programming in Chapter 4. Heuristics for solving general integer programs are presented in Chapter 5. We will also develop an efficient algorithm for solving the maximum weight matching problem that is based on the theory of duality. Chapter 6 interprets much of the material defined in the previous chapters through the lens of geometry. Finally, in Chapter 7, we explain how some of the concepts introduced for linear programs extend to non-linear programs.
1.5 Further reading and notes
We just saw some examples of modeling a given problem as an optimization problem. An important part of the mathematical modeling process is to prove the correctness of the mathematical model constructed. In this course, typically, our starting point will be a well-described statement of the problem with clearly stated data. Then, we take this description of the problem and define variables, the objective function and the constraints. Once the mathematical model is constructed, we prove that the mathematical model we constructed is correct, i.e., that it exactly represents the given problem statement. For many examples of such mathematical models, see the book [21].

In real applications, we have to go through a lot of preparation to arrive at a clear description of the problem and the data. In some applications (actually, in almost all applications), some of the data will be uncertain. There are more advanced tools to deal with such situations (see, for instance, the literature on Robust Optimization, starting with the book [1]).

Many of the subjects that we introduced in this chapter have a whole course dedicated to them. For example, the portfolio optimization example is an instance of the Markowitz model. It was proposed by Harry Markowitz in the 1950s. For his work in the area, Markowitz received the Nobel Prize in Economics in 1990. For applications of optimization in financial mathematics, see the books [2, 6]. For further information on scheduling, see [15].
Chapter 2
Solving linear programs
2.1 Possible outcomes
Consider a linear program (P) with variables $x_1, \ldots, x_n$. An assignment of values to each of $x_1, \ldots, x_n$ is a feasible solution if the constraints of (P) are satisfied. We can view a feasible solution to (P) as a vector $x = (x_1, \ldots, x_n)^T$. Given a vector $x$, by the value of $x$ we mean the value of the objective function of (P) for $x$. Suppose (P) is a maximization problem. Then a vector $x$ is an optimal solution if it is a feasible solution and no feasible solution has larger value. The value of the optimal solution is the optimal value. By definition, a linear program has only one optimal value; however, it may have many optimal solutions. When solving a linear program, we will be satisfied with finding any optimal solution. Suppose (P) is a minimization problem. Then a vector $x$ is an optimal solution if it is a feasible solution and no feasible solution has smaller value.

If a linear program (P) has a feasible solution then it is said to be feasible; otherwise it is infeasible. Suppose (P) is a maximization problem and for every real number $\alpha$ there is a feasible solution to (P) which has value greater than $\alpha$; then we say that (P) is unbounded. In other words, (P) is unbounded if we can find feasible solutions of arbitrarily high value. Suppose (P) is a minimization problem and for every real number $\alpha$ there is a feasible solution to (P) which has value smaller than $\alpha$; then we say that (P) is unbounded. Unbounded linear programs are easy to construct. Try to construct one yourself (hint: you can do this for a linear program with a single variable).
We have identified three possible outcomes for a linear program (P), namely:

1. it is infeasible,

2. it has an optimal solution,

3. it is unbounded.

Clearly, these outcomes are mutually exclusive (i.e., no two can occur at the same time). We will show that in fact exactly one of these outcomes must occur (this is a form of the Fundamental Theorem of Linear Programming, which will be established at the end of this chapter). We will now illustrate each of these outcomes with a different example.
2.1.1 Infeasible linear programs
If we are interested in knowing whether a linear program (P) is infeasible, it suffices to consider the constraints, as feasibility does not depend on the objective function. Suppose that the constraints of (P) are as follows:
\[ 4x_1 + 10x_2 - 6x_3 - 2x_4 = 6 \tag{2.1} \]
\[ -2x_1 + 2x_2 - 4x_3 + x_4 = 5 \tag{2.2} \]
\[ -7x_1 - 2x_2 + 4x_4 = 3 \tag{2.3} \]
\[ x_1, x_2, x_3, x_4 \ge 0. \tag{2.4} \]
Your boss asked you to solve the linear program (P). After some effort, you convinced yourself that (P) is in fact infeasible. In an ideal world your boss might just take your word for it, but she expects you to back up your claim by providing a proof. You certainly could not claim to have tried all possible sets of values for $x_1, x_2, x_3, x_4$ and checked that none satisfy all of (2.1)-(2.4), as there are an infinite number of possibilities.
Here is a way of convincing your boss that there is no solution to (2.1)-(2.4). The first step is to create a new equation by combining equations (2.1), (2.2) and (2.3). We pick some values $y_1, y_2, y_3$. Then we multiply equation (2.1) by $y_1$, equation (2.2) by $y_2$ and equation (2.3) by $y_3$, and add the resulting equations together. If we choose the values $y_1 = 1$, $y_2 = -2$ and $y_3 = 1$, we obtain the equation
\[ x_1 + 4x_2 + 2x_3 = -1. \tag{2.5} \]
Let us proceed by contradiction and suppose that there is in fact a solution $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4$ to (2.1)-(2.4). Then clearly $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4$ must satisfy (2.5), as it satisfies each of (2.1), (2.2) and (2.3). As $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4$ are all non-negative, it follows that $\bar{x}_1 + 4\bar{x}_2 + 2\bar{x}_3 \ge 0$. But this contradicts constraint (2.5). Hence, our hypothesis that there was in fact a solution to (2.1)-(2.4) must be false.

The vector $y = (1, -2, 1)^T$ is the kind of proof that should satisfy your boss, as she can easily check there is no solution with the help of this vector. Of course, this proof will only work for an appropriate choice of $y$, and we have not told you how to find such a vector at this juncture. We will derive an algorithm that either finds a feasible solution, or finds a vector $y$ that proves that no feasible solution exists.
We wish to generalize this argument, but before this can be achieved, we need to become comfortable with matrix notation. Equations (2.1)-(2.3) can be expressed as
\[ Ax = b \]
where
\[
A = \begin{pmatrix} 4 & 10 & -6 & -2 \\ -2 & 2 & -4 & 1 \\ -7 & -2 & 0 & 4 \end{pmatrix}, \quad
b = \begin{pmatrix} 6 \\ 5 \\ 3 \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \tag{2.6}
\]
Then $Ax$ is a vector with three components. Components 1, 2 and 3 of the vector $Ax$ correspond, respectively, to the left-hand sides of equations (2.1), (2.2) and (2.3). For a vector $y = (y_1, y_2, y_3)^T$, $y^T(Ax)$ is the scalar product of $y$ and $Ax$, and it consists of multiplying the left-hand side of equation (2.1) by $y_1$, the left-hand side of equation (2.2) by $y_2$ and the left-hand side of equation (2.3) by $y_3$, and adding the resulting expressions. Also, $y^Tb$ is the scalar product of $y$ and $b$, and it consists of multiplying the right-hand side of equation (2.1) by $y_1$, the right-hand side of equation (2.2) by $y_2$ and the right-hand side of equation (2.3) by $y_3$, and adding the resulting values. Thus
\[ y^T(Ax) = y^Tb \]
is the equation obtained by multiplying equation (2.1) by $y_1$, (2.2) by $y_2$, (2.3) by $y_3$, and by adding the resulting equations together. Note that in the previous relation we may omit the parentheses because of associativity, i.e., $y^T(Ax) = (y^TA)x$. For instance, if we choose $y_1 = 1$, $y_2 = -2$ and $y_3 = 1$, then we obtain
\[
(1, -2, 1)\begin{pmatrix} 4 & 10 & -6 & -2 \\ -2 & 2 & -4 & 1 \\ -7 & -2 & 0 & 4 \end{pmatrix} x = (1, -2, 1)\begin{pmatrix} 6 \\ 5 \\ 3 \end{pmatrix},
\]
and after simplifying,
\[ (1, 4, 2, 0)\,(x_1, x_2, x_3, x_4)^T = -1, \]
which is equation (2.5). We then observed that all the coefficients on the left-hand side of (2.5) are non-negative, i.e., that $(1, 4, 2, 0) \ge \mathbf{0}^T$, or equivalently that $y^TA \ge \mathbf{0}^T$. We also observed that the right-hand side of (2.5) is negative, or equivalently that $y^Tb < 0$. (Note, $0$ denotes the number zero and $\mathbf{0}$ the column vector whose entries are all zero.) These two facts implied that there is no solution to (2.1)-(2.4), i.e., that $Ax = b$, $x \ge \mathbf{0}$ has no solution.
Let us generalize this argument to an arbitrary matrix $A$ and vector $b$. We will assume that the matrices and vectors have appropriate dimensions so that the matrix relations make sense. This remark holds for all subsequent statements.

Proposition 2. Let $A$ be a matrix and $b$ be a vector. Then the system
\[ Ax = b, \quad x \ge \mathbf{0} \]
has no solution if there exists a vector $y$ such that

1. $y^TA \ge \mathbf{0}^T$, and

2. $y^Tb < 0$.

Note that if $A$ has $m$ rows then the vector $y$ must have $m$ components. The matrix equation $y^TAx = y^Tb$ is obtained by multiplying, for every $i \in \{1, \ldots, m\}$, row $i$ of $A$ by $y_i$ and adding all the resulting equations together.
Proof of Proposition 2. Let us proceed by contradiction and suppose that there exists a solution $\bar{x}$ to $Ax = b$, $x \ge \mathbf{0}$, and that we can find $y$ such that $y^TA \ge \mathbf{0}^T$ and $y^Tb < 0$. Since $A\bar{x} = b$ is satisfied, we must also satisfy $y^TA\bar{x} = y^Tb$. Since $y^TA \ge \mathbf{0}^T$ and $\bar{x} \ge \mathbf{0}$, it follows that $y^TA\bar{x} \ge 0$. Then $0 \le y^TA\bar{x} = y^Tb < 0$, a contradiction.

We call a vector $y$ which satisfies conditions (1) and (2) of Proposition 2 a certificate of infeasibility. To convince your boss that a particular system $Ax = b$, $x \ge \mathbf{0}$ has no solution, it suffices to exhibit a certificate of infeasibility. Note, while we have argued that such a certificate will be sufficient to prove infeasibility, it is not clear at all that for every infeasible system there exists a certificate of infeasibility. The fact that this is indeed so is a deep result which is known as Farkas' Lemma.
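Checking a certificate of infeasibility is entirely mechanical, which is precisely what makes it convincing. A minimal sketch (assuming NumPy) for the system (2.1)-(2.4):

\begin{verbatim}
import numpy as np

A = np.array([[ 4, 10, -6, -2],
              [-2,  2, -4,  1],
              [-7, -2,  0,  4]])
b = np.array([6, 5, 3])
y = np.array([1, -2, 1])

print(y @ A)   # (1, 4, 2, 0) >= 0, so y^T A >= 0^T
print(y @ b)   # -1 < 0, hence Ax = b, x >= 0 has no solution
\end{verbatim}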
2.1.2 Linear programs with optimal solutions
Consider the linear program
\[ \max\{z(x) = c^Tx : Ax = b,\ x \ge \mathbf{0}\} \]
where
\[
A = \begin{pmatrix} 4 & 2 & -1 & 1 & 0 \\ 3 & 9 & 2 & 1 & 1 \\ 5 & 5 & 1 & 0 & 3 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 7 \\ 7 \end{pmatrix}, \quad
c = \begin{pmatrix} 8 \\ 14 \\ 1 \\ 4 \\ -1 \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}. \tag{2.7}
\]
Suppose you are told that $(0, 0, 1, 3, 2)^T$ is an optimal solution to (2.7). Again you have to report to your boss and convince her that this solution is indeed optimal. It is easy to verify that $(0, 0, 1, 3, 2)^T$ is feasible and that the corresponding value is 11. However, you cannot claim to have tried every possible feasible solution and verified that in every case the value is less than or equal to 11, as there are an infinite number of feasible solutions.

To convince your boss, we will prove that $z(\bar{x}) \le 11$ for every feasible solution $\bar{x}$. In other words, no feasible solution has value that exceeds 11. Since $(0, 0, 1, 3, 2)^T$ has value equal to 11, it must be optimal. We seemingly traded one problem for another, however. How can we show that $z(\bar{x}) \le 11$ for every feasible solution $\bar{x}$, or, more generally, show for some number $\beta$ that $z(\bar{x}) \le \beta$ for every feasible solution $\bar{x}$?
To do this, we pick values $y_1$, $y_2$ and $y_3$ and create a new equation which is obtained by multiplying the first constraint of $Ax = b$ by $y_1$, the second constraint by $y_2$, and the third by $y_3$, and by adding the resulting equations. This can be expressed as $y^TAx = y^Tb$, or equivalently as
\[
(y_1, y_2, y_3)\begin{pmatrix} 4 & 2 & -1 & 1 & 0 \\ 3 & 9 & 2 & 1 & 1 \\ 5 & 5 & 1 & 0 & 3 \end{pmatrix} x = (y_1, y_2, y_3)\begin{pmatrix} 2 \\ 7 \\ 7 \end{pmatrix}. \tag{2.8}
\]
For instance, if we choose values $y_1 = 1$, $y_2 = 3$ and $y_3 = 0$, we obtain the equation
\[ (13, 29, 5, 4, 3)\,x = 23, \]
or equivalently,
\[ 0 = 23 - (13, 29, 5, 4, 3)\,x. \]
This equation holds for every feasible solution of (2.7). Thus, adding this equation to the objective function $z(x) = (8, 14, 1, 4, -1)x$ will not change the value of the objective function for any of the feasible solutions. The resulting objective function is
\[ z(x) = (8, 14, 1, 4, -1)x + 23 - (13, 29, 5, 4, 3)x = 23 + (-5, -15, -4, 0, -4)x. \]
Let $\bar{x}$ be any feasible solution. As $\bar{x} \ge \mathbf{0}$ and $(-5, -15, -4, 0, -4) \le \mathbf{0}^T$, we have $(-5, -15, -4, 0, -4)\bar{x} \le 0$. Hence, $z(\bar{x}) \le 23$. Thus, we have proved that no feasible solution has value larger than 23. Note, this is not quite sufficient to prove that $(0, 0, 1, 3, 2)^T$ is optimal. It shows, however, that the optimum value for (2.7) is between 11 and 23. It is at least 11, as we have a feasible solution with that value, and it cannot exceed 23 by the previous argument. We let the students verify that by choosing the vector $y = (2, 2, -1)^T$ in (2.8) and proceeding as above, we get an upper bound of 11, therefore proving that $(0, 0, 1, 3, 2)^T$ is indeed optimal.
We will derive an algorithm that, given a linear program for which an optimal solution exists, will find a feasible solution $\bar{x}$ with value $\beta$ and a vector $y$ that proves that no feasible solution has value greater than $\beta$. Such a pair $\bar{x}$ and $y$ would satisfy your boss.

Let us generalize the previous argument. Let $A$ be a matrix with $m$ rows and consider the following linear program:
\[ \max\{z(x) = c^Tx : Ax = b,\ x \ge \mathbf{0}\}. \tag{P} \]
We want to show that, for some value $\beta$, $z(\bar{x}) \le \beta$ for every feasible solution $\bar{x}$. We proceed as in the numerical example. We first choose a vector $y = (y_1, y_2, \ldots, y_m)^T$ and create a new equation,
\[ y^TAx = y^Tb. \]
This last equation is obtained from $Ax = b$ by multiplying the 1st equation by $y_1$, the 2nd by $y_2$, the 3rd by $y_3$, etc., and by adding all of the resulting equations together. This equation can be rewritten as
\[ 0 = y^Tb - y^TAx, \]
which holds for every feasible solution $x$ of (P). Thus, adding this equation to the objective function $z(x) = c^Tx$ will not change the value of the objective function for any of the feasible solutions. The resulting objective function is
\[ z(x) = y^Tb + c^Tx - y^TAx = y^Tb + (c^T - y^TA)x. \tag{2.9} \]
Suppose that, because of the choice of $y$, we have $c^T - y^TA \le \mathbf{0}^T$. Let $\bar{x}$ be any feasible solution. As $\bar{x} \ge \mathbf{0}$, we have that $(c^T - y^TA)\bar{x} \le 0$. It then follows by (2.9) that $z(\bar{x}) \le y^Tb$. Thus, we have shown that for all $y$ such that $c^T - y^TA \le \mathbf{0}^T$, the value $y^Tb$ is an upper bound on the value of the objective function. Finally, note that the condition $c^T - y^TA \le \mathbf{0}^T$ is equivalent to $y^TA \ge c^T$.
c Department of Combinatorics and Optimization, University of Waterloo Winter 2012
2.1. POSSIBLE OUTCOMES 36
This previous argument proves the following result.
Proposition 3. For any vector y such that y
T
A c
T
, the value y
T
b is an upper bound on the
value of the objective function for any feasible solution of (P). In particular, if x is a feasible
solution of (2.8) and c
T
x = y
T
b then x is an optimal solution of (P).
Given an optimal solution x of (P), we call a vector y which satises y
T
A c
T
and c
T
x =
y
T
b a certicate of optimality. To convince your boss that x is an optimal solution, it sufces
to exhibit a certicate of optimality. While we have argued that such a certicate will be
sufcient to prove optimality, it is not clear at all that for every linear program with an optimal
solution there exists such a certicate. The fact that this is so, is a deep result which is known
as the duality theorem.
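For readers who wish to experiment, checking a proposed certificate is a one-line computation.
The following Python sketch (the function name is ours, not part of the notes, and the data is
example (2.7) with the signs as reconstructed above) verifies the two conditions of Proposition 3
for $y = (2, 2, -1)^T$:

```python
import numpy as np

def is_certificate_of_optimality(A, b, c, x_bar, y, tol=1e-9):
    """Check the conditions of Proposition 3 for max{c^T x : Ax = b, x >= 0}."""
    feasible = np.allclose(A @ x_bar, b) and np.all(x_bar >= -tol)  # x_bar feasible
    bound_ok = np.all(y @ A >= c - tol)                             # y^T A >= c^T
    values_match = abs(c @ x_bar - y @ b) <= tol                    # c^T x_bar = y^T b
    return feasible and bound_ok and values_match

A = np.array([[4, 2, -1, 1, 0],
              [3, 9,  2, 1, 1],
              [5, 5,  1, 0, 3]], dtype=float)
b = np.array([2, 7, 7], dtype=float)
c = np.array([8, 14, 1, 4, -1], dtype=float)
x_bar = np.array([0, 0, 1, 3, 2], dtype=float)
y = np.array([2, 2, -1], dtype=float)

print(is_certificate_of_optimality(A, b, c, x_bar, y))  # True: value 11 is optimal
```

Note that the check runs in time proportional to the size of $A$; no optimization is needed to
verify the certificate, which is the whole point.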
2.1.3 Unbounded linear programs

Consider the linear program
\[
\max\{\, z(x) = c^T x : Ax = b,\ x \ge 0 \,\}
\]
where
\[
A = \begin{pmatrix}
1 & 1 & -3 & 1 & 2 \\
0 & 1 & -2 & 2 & -2 \\
-2 & -1 & 4 & 1 & 0
\end{pmatrix}
\quad
b = \begin{pmatrix} 7 \\ -2 \\ -3 \end{pmatrix}
\quad
c = \begin{pmatrix} -1 \\ 0 \\ 3 \\ 7 \\ 1 \end{pmatrix}
\quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}. \tag{2.10}
\]
This linear program is unbounded. How can you convince your boss of this fact? We will
define a family of feasible solutions $x(t)$ for all real numbers $t \ge 0$ and show that as $t$ tends to
infinity, so does the value of $x(t)$. This will show that (2.10) is indeed unbounded.
We define for every $t \ge 0$,
\[
x(t) = \bar x + td
\]
where
\[
\bar x = (2, 0, 0, 1, 2)^T \quad\text{and}\quad d = (1, 2, 1, 0, 0)^T.
\]
For instance, when $t = 0$ then $x(t) = (2, 0, 0, 1, 2)^T$, and when $t = 2$, $x(t) = (4, 4, 2, 1, 2)^T$.
We claim that for every $t \ge 0$, $x(t)$ is feasible for (2.10). Let us first check that the equations
$Ax = b$ hold for every $x(t)$. You can verify that $A\bar x = b$ and that $Ad = 0$. Then we have,
\[
Ax(t) = A(\bar x + td) = A\bar x + A(td) = \underbrace{A\bar x}_{=b} + t\underbrace{Ad}_{=0} = b
\]
as required. We also need to verify that $x(t) \ge 0$ for every $t \ge 0$. Note that $\bar x, d \ge 0$, hence
$x(t) = \bar x + td \ge td \ge 0$, as required. Let us investigate what happens to the objective function
for $x(t)$ as $t$ increases. Observe that $c^T d = 2 > 0$; then
\[
c^T x(t) = c^T(\bar x + td) = c^T \bar x + c^T(td) = c^T \bar x + t\,c^T d = c^T \bar x + 2t,
\]
and as $t$ tends to infinity so does $c^T x(t)$. Hence, we have proved that the linear program is in
fact unbounded.

Given $\bar x$ and $d$, your boss can easily verify that the linear program is unbounded. We have
not told you how to find such a pair of vectors $\bar x$ and $d$. We will want an algorithm that detects
if a linear program is unbounded and, when it is, provides us with the vectors $\bar x$ and $d$.

Let us generalize the previous argument. Let $A$ be an $m \times n$ matrix, $b$ a vector with $m$
components, and $c$ a vector with $n$ components. The vector of variables $x$ has $n$ components.
Consider the linear program
\[
\max\{\, c^T x : Ax = b,\ x \ge 0 \,\}. \tag{P}
\]
We leave the proof of the following proposition to the reader, as the argument is essentially
the one which we outlined in the above example.

Proposition 4. Suppose there exists a feasible solution $\bar x$ of (P) and a vector $d$ such that,

1. $Ad = 0$,
2. $d \ge 0$,
3. $c^T d > 0$.

Then (P) is unbounded.

We call a pair of vectors $\bar x, d$ as in the previous proposition a certificate of unboundedness. We
will show that there exists a certificate of unboundedness for every unbounded linear program.
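As with optimality, a proposed certificate of unboundedness is cheap to check. Here is a short
Python sketch (helper name ours; the data of (2.10) is as reconstructed above) that verifies the
three conditions of Proposition 4 together with feasibility of $\bar x$:

```python
import numpy as np

def is_certificate_of_unboundedness(A, b, c, x_bar, d, tol=1e-9):
    """Check the conditions of Proposition 4 for max{c^T x : Ax = b, x >= 0}."""
    feasible = np.allclose(A @ x_bar, b) and np.all(x_bar >= -tol)  # x_bar feasible
    in_kernel = np.allclose(A @ d, 0)                               # A d = 0
    nonneg_dir = np.all(d >= -tol)                                  # d >= 0
    improving = c @ d > tol                                         # c^T d > 0
    return feasible and in_kernel and nonneg_dir and improving

A = np.array([[ 1,  1, -3, 1,  2],
              [ 0,  1, -2, 2, -2],
              [-2, -1,  4, 1,  0]], dtype=float)
b = np.array([7, -2, -3], dtype=float)
c = np.array([-1, 0, 3, 7, 1], dtype=float)
x_bar = np.array([2, 0, 0, 1, 2], dtype=float)
d = np.array([1, 2, 1, 0, 0], dtype=float)

print(is_certificate_of_unboundedness(A, b, c, x_bar, d))  # True
```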
2.2 Standard equality form
A linear program is said to be in Standard Equality Form (SEF) if it is of the form:
\[
\max\{\, c^T x + \bar z : Ax = b,\ x \ge 0 \,\},
\]
where $\bar z$ denotes some constant. In other words, a linear program is in SEF if it satisfies the
following conditions:

1. it is a maximization problem,
2. other than the non-negativity constraints, all constraints are equations,
3. every variable has a non-negativity constraint.

Here is an example,
\[
\begin{array}{rl}
\max & (1, -2, 4, -4, 0, 0)x + 3 \\
\text{subject to} &
\begin{pmatrix}
1 & 5 & 3 & -3 & 0 & -1 \\
2 & -1 & 2 & -2 & 1 & 0 \\
1 & 2 & -1 & 1 & 0 & 0
\end{pmatrix} x
= \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\
& x \ge 0.
\end{array} \tag{2.11}
\]
We will develop an algorithm that, given a linear program (P) in SEF, will either: prove that
(P) is infeasible by exhibiting a certificate of infeasibility, or prove that (P) is unbounded by
exhibiting a certificate of unboundedness, or find an optimal solution and show that it is indeed
optimal by exhibiting a certificate of optimality.

However, not every linear program is in SEF. Given a linear program (P) which is not in
SEF, we wish to convert (P) into a linear program (P') in SEF and apply the algorithm to
(P') instead. Of course we want the answer for (P') to give us some meaningful answer for
(P). More precisely, what we wish is for (P) and (P') to satisfy the following relationships:

1. (P) is infeasible if and only if (P') is infeasible,
2. (P) is unbounded if and only if (P') is unbounded,
3. given any optimal solution of (P') we can construct an optimal solution of (P).

Linear programs that satisfy the relationships (1), (2) and (3) are said to be equivalent.
We now illustrate on an example how to convert an arbitrary linear program into an equiv-
alent linear program in SEF. Note, we will leave it to the reader to verify that at each step we
do indeed get an equivalent linear program:
\[
\begin{array}{rl}
\min & (-1, 2, -4)(x_1, x_2, x_3)^T \\
\text{subject to} &
\begin{pmatrix}
1 & 5 & 3 \\
2 & -1 & 2 \\
1 & 2 & -1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\begin{array}{c} \ge \\ \le \\ = \end{array}
\begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\
& x_1, x_2 \ge 0.
\end{array} \tag{2.12}
\]
The notation used here means that $\ge$, $\le$, and $=$ refer to the 1st, 2nd, and 3rd constraints respec-
tively. We want to convert (2.12) into a linear program in SEF. We will proceed step by step.
First note that (2.12) is a minimization problem. We can replace $\min (-1, 2, -4)(x_1, x_2, x_3)^T$
by $\max (1, -2, 4)(x_1, x_2, x_3)^T$, or more generally $\min c^T x$ by $\max -c^T x$. The resulting linear
program is as follows,
\[
\begin{array}{rl}
\max & (1, -2, 4)(x_1, x_2, x_3)^T \\
\text{subject to} &
\begin{pmatrix}
1 & 5 & 3 \\
2 & -1 & 2 \\
1 & 2 & -1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\begin{array}{c} \ge \\ \le \\ = \end{array}
\begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\
& x_1, x_2 \ge 0.
\end{array} \tag{2.13}
\]
We do not have the condition that $x_3 \ge 0$ as part of the formulation in (2.13). We call such
a variable free. We might be tempted to simply add the constraint $x_3 \ge 0$ to the formulation.
However, by doing so we may change the optimal solution, as it is possible for instance that all
optimal solutions to (2.13) satisfy $x_3 < 0$. The idea here is to express $x_3$ as the difference of
two non-negative variables, say $x_3 = x_3^+ - x_3^-$ where $x_3^+, x_3^- \ge 0$. Let us rewrite the objective
function with these new variables,
\begin{align*}
(1, -2, 4)(x_1, x_2, x_3)^T &= x_1 - 2x_2 + 4x_3 \\
&= x_1 - 2x_2 + 4(x_3^+ - x_3^-) \\
&= x_1 - 2x_2 + 4x_3^+ - 4x_3^- \\
&= (1, -2, 4, -4)(x_1, x_2, x_3^+, x_3^-)^T.
\end{align*}
Let us rewrite the left hand side of the constraints with these new variables,
\begin{align*}
\begin{pmatrix}
1 & 5 & 3 \\
2 & -1 & 2 \\
1 & 2 & -1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
&= x_1 \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}
+ x_2 \begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix}
+ x_3 \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} \\
&= x_1 \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}
+ x_2 \begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix}
+ (x_3^+ - x_3^-) \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} \\
&= x_1 \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}
+ x_2 \begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix}
+ x_3^+ \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}
+ x_3^- \begin{pmatrix} -3 \\ -2 \\ 1 \end{pmatrix} \\
&= \begin{pmatrix}
1 & 5 & 3 & -3 \\
2 & -1 & 2 & -2 \\
1 & 2 & -1 & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \end{pmatrix}.
\end{align*}
In general, for a linear program with variables $x = (x_1, \ldots, x_n)^T$, if we have a variable $x_i$ where
$x_i \ge 0$ is not part of the formulation (i.e. a free variable), we introduce variables $x_i^+, x_i^- \ge 0$
and define $x' = (x_1, \ldots, x_{i-1}, x_i^+, x_i^-, x_{i+1}, \ldots, x_n)^T$. Replace the objective function $c^T x$ by
$c'^T x'$ where $c' = (c_1, \ldots, c_{i-1}, c_i, -c_i, c_{i+1}, \ldots, c_n)^T$. If the left hand side of the constraints is of the
form $Ax$, where $A$ is a matrix with columns $A_1, \ldots, A_n$, replace $Ax$ by $A'x'$ where $A'$ is the matrix
which consists of columns $A_1, \ldots, A_{i-1}, A_i, -A_i, A_{i+1}, \ldots, A_n$.
The new linear program is as follows,
\[
\begin{array}{rl}
\max & (1, -2, 4, -4)(x_1, x_2, x_3^+, x_3^-)^T \\
\text{subject to} &
\begin{pmatrix}
1 & 5 & 3 & -3 \\
2 & -1 & 2 & -2 \\
1 & 2 & -1 & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \end{pmatrix}
\begin{array}{c} \ge \\ \le \\ = \end{array}
\begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\
& x_1, x_2, x_3^+, x_3^- \ge 0.
\end{array} \tag{2.14}
\]
Let us replace the constraint $2x_1 - x_2 + 2x_3^+ - 2x_3^- \le 4$ in (2.14) by an equality constraint. We
introduce a new variable $x_4$ where $x_4 \ge 0$ and we rewrite the constraint as
$2x_1 - x_2 + 2x_3^+ - 2x_3^- + x_4 = 4$. The variable $x_4$ is called a slack variable. More generally, given a constraint of
the form $\sum_{i=1}^n a_i x_i \le \beta$, we can replace it by $\sum_{i=1}^n a_i x_i + x_{n+1} = \beta$, where $x_{n+1} \ge 0$.
The resulting linear program is as follows,
\[
\begin{array}{rl}
\max & (1, -2, 4, -4, 0)(x_1, x_2, x_3^+, x_3^-, x_4)^T \\
\text{subject to} &
\begin{pmatrix}
1 & 5 & 3 & -3 & 0 \\
2 & -1 & 2 & -2 & 1 \\
1 & 2 & -1 & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \\ x_4 \end{pmatrix}
\begin{array}{c} \ge \\ = \\ = \end{array}
\begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\
& x_1, x_2, x_3^+, x_3^-, x_4 \ge 0.
\end{array} \tag{2.15}
\]
Let us replace the constraint $x_1 + 5x_2 + 3x_3^+ - 3x_3^- \ge 5$ in (2.15) by an equality constraint.
We introduce a new variable $x_5$, where $x_5 \ge 0$, and we rewrite the constraint as
$x_1 + 5x_2 + 3x_3^+ - 3x_3^- - x_5 = 5$. The variable $x_5$ is also called a slack variable. More generally, given a
constraint of the form $\sum_{i=1}^n a_i x_i \ge \beta$, we can replace it by $\sum_{i=1}^n a_i x_i - x_{n+1} = \beta$, where $x_{n+1} \ge 0$.
The resulting linear program is as follows,
\[
\begin{array}{rl}
\max & (1, -2, 4, -4, 0, 0)(x_1, x_2, x_3^+, x_3^-, x_4, x_5)^T \\
\text{subject to} &
\begin{pmatrix}
1 & 5 & 3 & -3 & 0 & -1 \\
2 & -1 & 2 & -2 & 1 & 0 \\
1 & 2 & -1 & 1 & 0 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\
& x_1, x_2, x_3^+, x_3^-, x_4, x_5 \ge 0.
\end{array}
\]
Note that after relabeling the variables $x_1, x_2, x_3^+, x_3^-, x_4, x_5$ as $x_1, x_2, x_3, x_4, x_5, x_6$, we obtain the
linear program (2.11). We leave it to the reader to verify that the aforementioned transformations
are sufficient to convert any linear program into a linear program in SEF.
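The two transformations above are mechanical, so here is a small Python sketch of the
conversion (the representation and the function name are ours; a sketch, not part of the notes).
It assumes the constraint senses are given as strings and free variables by index:

```python
import numpy as np

def to_sef(c, A, b, senses, free_vars, maximize=True):
    """Convert max/min{c^T x : Ax (senses) b} into SEF:
    max{c'^T x' : A' x' = b, x' >= 0}.
    senses[i] in {'<=', '>=', '='}; free_vars lists 0-indexed free variables."""
    c, A = np.array(c, float), np.array(A, float)
    if not maximize:                      # min c^T x  ->  max -c^T x
        c = -c
    # Split each free variable x_i = x_i^+ - x_i^-: duplicate and negate column.
    for i in sorted(free_vars, reverse=True):
        A = np.hstack([A[:, :i + 1], -A[:, [i]], A[:, i + 1:]])
        c = np.concatenate([c[:i + 1], [-c[i]], c[i + 1:]])
    # One slack variable per inequality: +x for '<=', -x for '>='.
    for row, s in enumerate(senses):
        if s == '=':
            continue
        col = np.zeros((A.shape[0], 1))
        col[row, 0] = 1.0 if s == '<=' else -1.0
        A = np.hstack([A, col])
        c = np.concatenate([c, [0.0]])
    return c, A, np.array(b, float)

# Example (2.12): min (-1,2,-4)x with constraints >=, <=, = and x3 free.
c, A, b = to_sef([-1, 2, -4],
                 [[1, 5, 3], [2, -1, 2], [1, 2, -1]],
                 [5, 4, 2],
                 senses=['>=', '<=', '='], free_vars=[2], maximize=False)
print(c)  # [ 1. -2.  4. -4.  0.  0.]
print(A)  # the matrix of (2.11), with the two slack columns in row order
```

The output matches (2.11) up to the order in which the two slack columns are appended.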
2.3 A Simplex iteration

Consider the following linear program in SEF,
\[
\begin{array}{rl}
\max & z(x) = (2, 3, 0, 0, 0)x \\
\text{subject to} &
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix} \\
& x_1, x_2, x_3, x_4, x_5 \ge 0,
\end{array} \tag{2.16}
\]
where $x = (x_1, x_2, x_3, x_4, x_5)^T$. Because (2.16) has a special form, it is easy to verify that $\bar x =
(0, 0, 6, 10, 4)^T$ is a feasible solution with value $z(\bar x) = 0$. Let us try to find a feasible solution
$x$ with larger value. Since the objective function is $z(x) = 2x_1 + 3x_2$, by increasing the value of
$x_1$ or $x_2$ we will increase the value of the objective function. Let us try to increase the value of
$x_1$ while keeping $x_2$ equal to zero. In other words, we look for a new feasible solution $x$ where
$x_1 = t$ for some $t \ge 0$ and $x_2 = 0$. The matrix equation will tell us which values we need for
$x_3, x_4, x_5$. We have,
\begin{align*}
\begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}
= \begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} x
&= x_1 \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}
+ x_2 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
+ x_3 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ x_4 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
+ x_5 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \\
&= t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}
+ 0 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
+ \begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix}.
\end{align*}
Thus,
\[
\begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}
- t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}. \tag{2.17}
\]
The larger we pick $t \ge 0$, the more we will increase the objective function. How large can we
choose $t$ to be? We simply need to make sure that $x_3, x_4, x_5 \ge 0$, i.e. that,
\[
t \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}
\le \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}.
\]
Thus, $t \le 6$ and $2t \le 10$, i.e. $t \le \frac{10}{2}$. Note that $-t \le 4$ does not impose any bound on how
large $t$ can be. Picking the largest possible $t$ yields 5. We can summarize this computation as,
\[
t = \min\left\{ \frac{6}{1},\ \frac{10}{2},\ - \right\}.
\]
Replacing $t = 5$ in (2.17) yields $x' := (5, 0, 1, 0, 9)^T$ with $z(x') = 10$.
What happens if we try to apply the same approach again? We could try to increase the
value of $x'_2$, but in order to do this we would have to decrease the value of $x'_1$, which might
decrease the objective function. The same strategy no longer seems to work! We were able
to carry out the computations the first time around because the linear program (2.16) was in a
suitable form for the original solution $\bar x$. However, (2.16) is not in a suitable form for the new
solution $x'$. In the next section, we will show that we can rewrite (2.16) so that the resulting
linear program is in a form that allows us to carry out the kind of computations outlined for
$\bar x$. By repeating this type of computation and rewriting the linear program at every step, we
will arrive at an algorithm for solving linear programs which is known as the Simplex algorithm.
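The single improving step just performed is easy to code for an LP in this special form. The
following Python sketch (names ours) performs the ratio test and the update, and reproduces
$x' = (5, 0, 1, 0, 9)^T$:

```python
import numpy as np

def improve_once(A, b, basic, k):
    """One informal improving step for max{c^T x : Ax=b, x>=0} when the columns
    in `basic` form an identity matrix and x_k is raised from 0 (ratio test)."""
    d = A[:, k]                                   # how the basic variables change
    t = min(b[i] / d[i] for i in range(len(b)) if d[i] > 0)  # largest feasible step
    x = np.zeros(A.shape[1])
    x[k] = t
    x[basic] = b - t * d                          # update the basic variables
    return x

A = np.array([[1, 1, 1, 0, 0],
              [2, 1, 0, 1, 0],
              [-1, 1, 0, 0, 1]], dtype=float)
b = np.array([6, 10, 4], dtype=float)
c = np.array([2, 3, 0, 0, 0], dtype=float)

x_new = improve_once(A, b, basic=[2, 3, 4], k=0)  # raise x_1 (index 0)
print(x_new, c @ x_new)   # [5. 0. 1. 0. 9.] 10.0
```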
2.4 Bases and canonical forms
2.4.1 Bases
Consider an $m \times n$ matrix $A$ where the rows of $A$ are linearly independent. We will denote
column $j$ of $A$ by $A_j$. Let $J$ be a subset of the column indices (i.e. $J \subseteq \{1, \ldots, n\}$); we define
$A_J$ to be the matrix formed by columns $A_j$ for all $j \in J$ (where the columns appear in the order
given by their corresponding indices). We say that a set of column indices $B$ forms a basis if
the matrix $A_B$ is a square non-singular matrix. Equivalently, a basis corresponds to a maximal
subset of linearly independent columns. Consider for instance,
\[
A = \begin{pmatrix}
2 & 1 & 2 & 1 & 0 & 0 \\
1 & 0 & -1 & 2 & 1 & 0 \\
3 & 0 & 3 & 1 & 0 & 1
\end{pmatrix}.
\]
Then $B = \{2, 5, 6\}$ is a basis as the matrix $A_B$ is the identity matrix. Note that $B = \{1, 2, 3\}$
and $\{1, 5, 6\}$ are also bases, while $B = \{1, 3\}$ is not a basis (as $A_B$ is not square in this case)
and neither is $B = \{1, 3, 5\}$ (as $A_B$ is singular in this case). We will denote by $N$ the set of
column indices not in $B$. Thus, $B$ and $N$ will always denote a partition of the column indices
of $A$.
Suppose that in addition to the matrix $A$ we have a vector $b$ with $m$ components, and
consider the system of equations $Ax = b$. For instance, say,
\[
\begin{pmatrix}
2 & 1 & 2 & 1 & 0 & 0 \\
1 & 0 & -1 & 2 & 1 & 0 \\
3 & 0 & 3 & 1 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}. \tag{2.18}
\]
Variables $x_j$ are said to be basic when $j \in B$ and non-basic otherwise. The vector which is
formed by the basic variables is denoted by $x_B$ and the vector which is formed by the non-basic
variables is $x_N$. We assume that the components in $x_B$ appear in the same order as $A_B$ and that
the components in $x_N$ appear in the same order as $A_N$. For instance, for (2.18), $B = \{1, 5, 6\}$ is
a basis; then $N = \{2, 3, 4\}$ and $x_B = (x_1, x_5, x_6)^T$, $x_N = (x_2, x_3, x_4)^T$.

The following easy observation will be used repeatedly,
\[
Ax = \sum_{j=1}^{n} x_j A_j = \sum_{j \in B} x_j A_j + \sum_{j \in N} x_j A_j = A_B x_B + A_N x_N.
\]
A vector $\bar x$ is a basic solution of $Ax = b$ for a basis $B$ if the following conditions hold:

1. $A\bar x = b$, and
2. $\bar x_N = 0$.

Suppose $\bar x$ is such a basic solution; then
\[
b = A\bar x = A_B \bar x_B + \underbrace{A_N \bar x_N}_{=0} = A_B \bar x_B.
\]
Since $A_B$ is non-singular it has an inverse, and we have $\bar x_B = A_B^{-1} b$. In particular this shows,

Remark 5. Every basis is associated with a unique basic solution.
For (2.18) and basis $B = \{1, 5, 6\}$, the unique basic solution $\bar x$ is $\bar x_2 = \bar x_3 = \bar x_4 = 0$ and,
\[
\begin{pmatrix} \bar x_1 \\ \bar x_5 \\ \bar x_6 \end{pmatrix}
= \begin{pmatrix}
2 & 0 & 0 \\
1 & 1 & 0 \\
3 & 0 & 1
\end{pmatrix}^{-1}
\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}.
\]
Thus, $\bar x = (1, 0, 0, 0, 0, -2)^T$. A basic solution $\bar x$ is feasible if $\bar x \ge 0$. A basis $B$ is feasible if the
corresponding basic solution is feasible. If a basic solution (resp. a basis) is not feasible then
it is infeasible. For instance, the basis $B = \{1, 5, 6\}$ is infeasible as the corresponding basic
solution $\bar x$ has negative entries, such as $\bar x_6 = -2$. When $B = \{2, 5, 6\}$, then as $A_B$ is the identity
matrix, the corresponding basic solution is $\bar x = (0, 2, 0, 0, 1, 1)^T$, which is feasible as all entries
are non-negative.
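Remark 5 translates directly into a computation: solve $A_B x_B = b$ and pad with zeros. A brief
numpy sketch (helper name ours), checked against (2.18):

```python
import numpy as np

def basic_solution(A, b, B):
    """Basic solution of Ax = b for basis B (given as a 0-indexed column list)."""
    x = np.zeros(A.shape[1])
    x[B] = np.linalg.solve(A[:, B], b)   # x_B = A_B^{-1} b, x_N = 0
    return x

A = np.array([[2, 1, 2, 1, 0, 0],
              [1, 0, -1, 2, 1, 0],
              [3, 0, 3, 1, 0, 1]], dtype=float)
b = np.array([2, 1, 1], dtype=float)

print(basic_solution(A, b, [0, 4, 5]))  # basis {1,5,6}: [1. 0. 0. 0. 0. -2.], infeasible
print(basic_solution(A, b, [1, 4, 5]))  # basis {2,5,6}: [0. 2. 0. 0. 1. 1.], feasible
```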
The Simplex algorithm will solve linear programs in SEF by considering the bases of the
matrix $A$, where $Ax = b$ are the equations that define all the equality constraints of the linear
program. Of course, if the rows of $A$ are not linearly independent then $A$ will not have any
basis. We claim, however, that we may assume without loss of generality that the rows of $A$ are
indeed linearly independent. For otherwise, you can prove using elementary linear algebra
that one of the following two possibilities must occur:

1. The system $Ax = b$ has no solution,
2. The system $Ax = b$ has a redundant constraint.

If (1) occurs, then the linear program is infeasible and we can stop. If (2) occurs, then we can
eliminate a redundant constraint. We repeat the procedure until all rows of $A$ are linearly inde-
pendent. Hence, throughout this chapter we will always assume (without stating it explicitly)
that the rows of the matrix defining the left hand side of the equality constraints in a linear
program in SEF are linearly independent.
2.4.2 Canonical forms
Let us restate the linear program (2.16) we were trying to solve in Section 2.3,
\[
\begin{array}{rl}
\max & z(x) = (2, 3, 0, 0, 0)x \\
\text{subject to} &
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix} \\
& x_1, x_2, x_3, x_4, x_5 \ge 0.
\end{array} \tag{2.16}
\]
Observe that $B = \{3, 4, 5\}$ is a basis. The corresponding basic solution is given by $\bar x = (0, 0, 6, 10, 4)^T$.
Note that $\bar x$ is the feasible solution with which we started the iteration. The basis
$B$ has the property that $A_B$ is an identity matrix. In addition, the objective function has the
property that $c_B = (c_3, c_4, c_5)^T = (0, 0, 0)^T$. These are the two properties that allowed us to
find a new feasible solution $x' = (5, 0, 1, 0, 9)^T$ with larger value. We let the students verify
that $x'$ is a basic solution for the basis $B = \{1, 3, 5\}$. Clearly, for that new basis $B$, $A_B$ is not an
identity matrix, and $c_B$ is not the 0 vector. Since these properties are no longer satisfied for $x'$,
we could not carry the computation further.
This motivates the following definition: consider the following linear program in SEF,
\[
\max\{\, c^T x + \bar z : Ax = b,\ x \ge 0 \,\}, \tag{P}
\]
where $\bar z$ is a constant, and let $B$ be a basis of $A$. We say that (P) is in canonical form for $B$ if
the following conditions are satisfied:

(C1) $A_B$ is an identity matrix,

(C2) $c_B = 0$.
We will show that given any basis $B$, we can rewrite the linear program (P) so that it is in
canonical form. For instance, $B = \{1, 2, 4\}$ is a basis of the linear program (2.16), which can be
rewritten as the following equivalent linear program,
\[
\begin{array}{rl}
\max & z(x) = 17 + \left(0, 0, -\tfrac{5}{2}, 0, -\tfrac{1}{2}\right)x \\
\text{subject to} &
\begin{pmatrix}
1 & 0 & 1/2 & 0 & -1/2 \\
0 & 1 & 1/2 & 0 & 1/2 \\
0 & 0 & -3/2 & 1 & 1/2
\end{pmatrix} x
= \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix} \\
& x \ge 0.
\end{array} \tag{2.19}
\]
How did we get this linear program?
Let us first rewrite the equations of (2.16) so that condition (C1) is satisfied for $B = \{1, 2, 4\}$.
We left multiply the equations by the inverse of $A_B$ (where $Ax = b$ denotes the equality
constraints of (2.16) and where $B = \{1, 2, 4\}$),
\[
\begin{pmatrix}
1 & 1 & 0 \\
2 & 1 & 1 \\
-1 & 1 & 0
\end{pmatrix}^{-1}
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} x
=
\begin{pmatrix}
1 & 1 & 0 \\
2 & 1 & 1 \\
-1 & 1 & 0
\end{pmatrix}^{-1}
\begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}.
\]
The resulting equations are exactly the set of equations in the linear program (2.19).
Let us rewrite the objective function of (2.16) so that condition (C2) is satisfied (still for
basis $B = \{1, 2, 4\}$). First, we generate an equation obtained by multiplying the first equation
of (2.16) by $y_1$, multiplying the second equation of (2.16) by $y_2$, the third equation by $y_3$, and
adding each of the corresponding constraints together (where the values of $y_1, y_2, y_3$ are yet to
be decided). The resulting equation can be written as,
\[
(y_1, y_2, y_3)
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} x
= (y_1, y_2, y_3)
\begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix},
\]
or equivalently as,
\[
0 = (y_1, y_2, y_3)
\begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}
- (y_1, y_2, y_3)
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} x.
\]
Since this equation holds for every feasible solution $x$, we can add this previous constraint to
the objective function of (2.16), namely $z(x) = (2, 3, 0, 0, 0)x$. The resulting objective function
is,
\[
z(x) = (y_1, y_2, y_3)
\begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}
+ \left[ (2, 3, 0, 0, 0) - (y_1, y_2, y_3)
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} \right] x \tag{$*$}
\]
which is of the form $z(x) = \bar z + \bar c^T x$. For (C2) to be satisfied we need $\bar c_1 = \bar c_2 = \bar c_4 = 0$. We
need to choose $y_1, y_2, y_3$ accordingly. Namely, we need,
\[
(2, 3, 0) - (y_1, y_2, y_3)
\begin{pmatrix}
1 & 1 & 0 \\
2 & 1 & 1 \\
-1 & 1 & 0
\end{pmatrix}
= 0^T,
\]
or equivalently,
\[
(y_1, y_2, y_3)
\begin{pmatrix}
1 & 1 & 0 \\
2 & 1 & 1 \\
-1 & 1 & 0
\end{pmatrix}
= (2, 3, 0);
\]
by taking the transpose on both sides of the equation we get
\[
\begin{pmatrix}
1 & 1 & 0 \\
2 & 1 & 1 \\
-1 & 1 & 0
\end{pmatrix}^T
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}.
\]
By solving the system, we get the unique solution $y = \left(\tfrac{5}{2}, 0, \tfrac{1}{2}\right)^T$. By substituting $y$ in ($*$) we
obtain,
\[
z(x) = 17 + \left(0, 0, -\tfrac{5}{2}, 0, -\tfrac{1}{2}\right)x,
\]
which is the objective function of (2.19).
Let us formalize these observations. Consider a linear program in SEF,
\[
\max\{\, c^T x + \bar z : Ax = b,\ x \ge 0 \,\},
\]
where $\bar z$ is a constant. By definition, $A_B$ is non-singular, hence $A_B^{-1}$ exists.
We claim that to achieve condition (C1) it suffices to replace $Ax = b$ by,
\[
A_B^{-1} Ax = A_B^{-1} b. \tag{2.20}
\]
First observe that $Ax = A_B x_B + A_N x_N$, thus
\begin{align*}
A_B^{-1} Ax &= A_B^{-1}(A_B x_B + A_N x_N) \\
&= A_B^{-1} A_B x_B + A_B^{-1} A_N x_N \\
&= x_B + A_B^{-1} A_N x_N.
\end{align*}
In particular, the columns corresponding to $B$ in the left hand side of (2.20) form an identity
matrix, as required. Moreover, we claim that the set of solutions to $Ax = b$ is equal to the set
of solutions to (2.20). Clearly, every solution to $Ax = b$ is also a solution to $A_B^{-1} Ax = A_B^{-1} b$, as
these equations are linear combinations of the equations $Ax = b$. Moreover, every solution to
$A_B^{-1} Ax = A_B^{-1} b$ is also a solution to $A_B A_B^{-1} Ax = A_B A_B^{-1} b$, but this equation is simply $Ax = b$,
proving the claim.
Let us consider condition (C2). Let $B$ be a basis of $A$. Suppose $A$ has $m$ rows; then for any
vector $y = (y_1, \ldots, y_m)^T$ the equation,
\[
y^T Ax = y^T b
\]
can be rewritten as,
\[
0 = y^T b - y^T Ax.
\]
Since this equation holds for every feasible solution, we can add this constraint to the objective
function $z(x) = c^T x + \bar z$. The resulting objective function is,
\[
z(x) = y^T b + \bar z + (c^T - y^T A)x. \tag{2.21}
\]
Let $\bar c^T := c^T - y^T A$. For (C2) to be satisfied we need $\bar c_B = 0$ and need to choose $y$ accordingly.
Namely, we want that
\[
\bar c_B^T = c_B^T - y^T A_B = 0^T,
\]
or equivalently that,
\[
y^T A_B = c_B^T.
\]
By taking the transpose on both sides we get,
\[
A_B^T y = c_B.
\]
Note that the inverse operation and the transpose operation commute, hence $(A_B^{-1})^T = (A_B^T)^{-1}$.
Therefore, we will write $A_B^{-T}$ for $(A_B^{-1})^T$. Hence, the previous relation can be rewritten as,
\[
y = A_B^{-T} c_B. \tag{2.22}
\]
We have shown the following; see (2.20), (2.21), (2.22),

Proposition 6. Suppose a linear program,
\[
\max\{\, z(x) = c^T x + \bar z : Ax = b,\ x \ge 0 \,\}
\]
and a basis $B$ of $A$ are given. Then, the following linear program is an equivalent linear
program in canonical form for the basis $B$,
\[
\begin{array}{rl}
\max & z(x) = y^T b + \bar z + (c^T - y^T A)x \\
\text{subject to} & A_B^{-1} Ax = A_B^{-1} b \\
& x \ge 0,
\end{array}
\]
where $y = A_B^{-T} c_B$.
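Proposition 6 is mechanical enough to code directly. The numpy sketch below (function name
ours) returns the rewritten data and reproduces the canonical form (2.19) of example (2.16)
for $B = \{1, 2, 4\}$:

```python
import numpy as np

def canonical_form(A, b, c, z_bar, B):
    """Rewrite max{c^T x + z_bar : Ax=b, x>=0} in canonical form for basis B
    (0-indexed), following Proposition 6."""
    AB = A[:, B]
    y = np.linalg.solve(AB.T, c[B])      # y = A_B^{-T} c_B
    A_new = np.linalg.solve(AB, A)       # A_B^{-1} A
    b_new = np.linalg.solve(AB, b)       # A_B^{-1} b
    c_new = c - A.T @ y                  # c^T - y^T A
    return A_new, b_new, c_new, z_bar + y @ b

A = np.array([[1, 1, 1, 0, 0],
              [2, 1, 0, 1, 0],
              [-1, 1, 0, 0, 1]], dtype=float)
b = np.array([6, 10, 4], dtype=float)
c = np.array([2, 3, 0, 0, 0], dtype=float)

A2, b2, c2, z2 = canonical_form(A, b, c, 0.0, [0, 1, 3])  # B = {1, 2, 4}
print(z2)   # 17.0
print(c2)   # [ 0.   0.  -2.5  0.  -0.5]
print(b2)   # [1. 5. 3.]
```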
2.5 The Simplex algorithm
2.5.1 An example with an optimal solution
Let us continue the example (2.16) which we first started in Section 2.3. At the end of the first
iteration, we obtained the feasible solution $\bar x = (5, 0, 1, 0, 9)^T$, which is a basic solution for the
basis $B = \{1, 3, 5\}$. Using the formulae in Proposition 6, we can rewrite the linear program so
that it is in canonical form for that basis. We obtain,
\[
\max\{\, z(x) = 10 + c^T x : Ax = b,\ x \ge 0 \,\},
\]
where
\[
A = \begin{pmatrix}
1 & 1/2 & 0 & 1/2 & 0 \\
0 & 1/2 & 1 & -1/2 & 0 \\
0 & 3/2 & 0 & 1/2 & 1
\end{pmatrix}
\quad
b = \begin{pmatrix} 5 \\ 1 \\ 9 \end{pmatrix}
\quad
c = \begin{pmatrix} 0 \\ 2 \\ 0 \\ -1 \\ 0 \end{pmatrix}. \tag{2.23}
\]
Let us try to find a feasible solution $x$ with value larger than $\bar x$. Recall, $B$ and $N$ partition the
column indices of $A$, i.e. $N = \{2, 4\}$. Since the linear program is in canonical form, $c_B = 0$.
Therefore, to increase the objective function value, we must select $k \in N$ such that $c_k > 0$ and
increase the component $x_k$ of $\bar x$. In this case, our choice for $k$ is $k = 2$. Therefore, we set $x_2 = t$
for some $t \ge 0$. For all $j \in N$ where $j \ne k$ we keep component $x_j$ of $\bar x$ equal to zero. It means
in this case that $x_4 = 0$. The matrix equation will tell us what values we need to choose for
$x_B = (x_1, x_3, x_5)^T$. Following the same argument as in Section 2.3 we obtain that,
\[
x_B = \begin{pmatrix} x_1 \\ x_3 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 5 \\ 1 \\ 9 \end{pmatrix}
- t \begin{pmatrix} 1/2 \\ 1/2 \\ 3/2 \end{pmatrix}, \tag{2.24}
\]
which implies, as $x_B \ge 0$, that,
\[
t \begin{pmatrix} 1/2 \\ 1/2 \\ 3/2 \end{pmatrix}
\le \begin{pmatrix} 5 \\ 1 \\ 9 \end{pmatrix}. \tag{2.25}
\]
The largest possible value for $t$ is given by,
\[
t = \min\left\{ \frac{5}{1/2},\ \frac{1}{1/2},\ \frac{9}{3/2} \right\} = 2. \tag{2.26}
\]
Note that (2.25) can be written as $tA_k \le b$ (where $k = 2$). Thus $t$ is obtained by taking the
smallest ratio between the entry $b_i$ and entry $A_{ik}$ for all $A_{ik} > 0$.

Replacing $t = 2$ in (2.24) yields $(4, 2, 0, 0, 6)^T$. We redefine $\bar x$ to be $(4, 2, 0, 0, 6)^T$ and we
now have $z(\bar x) = 14$. This vector $\bar x$ is a basic solution (see Exercise 1.23). Since $\bar x_1, \bar x_2, \bar x_5 > 0$,
the basis corresponding to $\bar x$ must contain each of $1, 2, 5$. As each basis of $A$ contains
exactly 3 basic elements, it follows that the new basis must be $\{1, 2, 5\}$. We can rewrite the
linear program so that it is in canonical form for that basis and repeat the same process.

To start the new iteration it suffices to know the new basis $\{1, 2, 5\}$. We obtained $\{1, 2, 5\}$
from the old basis $\{1, 3, 5\}$ by adding element 2 and removing element 3. We will say that
2 entered the basis and 3 left the basis. Thus it suffices at each iteration to establish which
element enters and which element leaves the basis. If we set $x_k = t$ where $k \in N$, element $k$
will enter the basis. If some basic variable $x_\ell$ is decreased to 0, then we can select $\ell$ to leave
the basis. In (2.26) the minimum was attained for the second term. Thus in (2.24) the second
component of $x_B$ will be set to zero, i.e. $x_3$ will be set to zero and 3 will leave the basis.
Let us proceed with the next iteration. Using the formulae in Proposition 6 we can rewrite
the linear program so that it is in canonical form for the basis $B = \{1, 2, 5\}$. We get,
\[
\max\{\, z(x) = 14 + c^T x : Ax = b,\ x \ge 0 \,\},
\]
where
\[
A = \begin{pmatrix}
1 & 0 & -1 & 1 & 0 \\
0 & 1 & 2 & -1 & 0 \\
0 & 0 & -3 & 2 & 1
\end{pmatrix}
\quad
b = \begin{pmatrix} 4 \\ 2 \\ 6 \end{pmatrix}
\quad
c = \begin{pmatrix} 0 \\ 0 \\ -4 \\ 1 \\ 0 \end{pmatrix}. \tag{2.27}
\]
Here we have $N = \{3, 4\}$. Let us first choose which element $k$ enters the basis. We want $k \in N$
and $c_k > 0$. The only choice is $k = 4$. We compute $t$ by taking the smallest ratio between entry
$b_i$ and entry $A_{ik}$ (where $k = 4$) for all $i$ where $A_{ik} > 0$, namely,
\[
t = \min\left\{ \frac{4}{1},\ -,\ \frac{6}{2} \right\} = 3,
\]
where $-$ indicates that the corresponding entry $A_{ik}$ is not positive. The minimum was
attained for the ratio $\frac{6}{2}$, i.e. the 3rd row. It corresponds to the third component of $x_B$. As the
3rd element of $B$ is 5, we will have $x_5 = 0$ for the new solution. Hence, 5 will be leaving the
basis and the new basis will be $\{1, 2, 4\}$.
Let us proceed with the next iteration. Using the formulae in Proposition 6 we can rewrite
the linear program so that it is in canonical form for the basis $B = \{1, 2, 4\}$. We get,
\[
\begin{array}{rl}
\max & z(x) = 17 + \left(0, 0, -\tfrac{5}{2}, 0, -\tfrac{1}{2}\right)x \\
\text{subject to} &
\begin{pmatrix}
1 & 0 & 1/2 & 0 & -1/2 \\
0 & 1 & 1/2 & 0 & 1/2 \\
0 & 0 & -3/2 & 1 & 1/2
\end{pmatrix} x
= \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix} \\
& x \ge 0.
\end{array} \tag{2.28}
\]
The basic solution is $\bar x := (1, 5, 0, 3, 0)^T$ and $z(\bar x) = 17$. We have $N = \{3, 5\}$. We want $k \in N$
and $c_k > 0$. However, $c_3 = -\tfrac{5}{2}$ and $c_5 = -\tfrac{1}{2}$, so there is no such choice. We claim that this
occurs because the current basic solution is optimal.
Let $x'$ be any feasible solution; then,
\[
z(x') = 17 + \underbrace{\left(0, 0, -\tfrac{5}{2}, 0, -\tfrac{1}{2}\right)}_{\le 0}
\underbrace{x'}_{\ge 0} \le 17.
\]
Thus 17 is an upper bound for the value of any feasible solution. It follows that $\bar x$ is an optimal
solution.
Recall that the original formulation (2.16) of the linear program was given as,
\[
\max\{\, c^T x : Ax = b,\ x \ge 0 \,\},
\]
where
\[
A = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix}
\quad
b = \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}
\quad
c = \begin{pmatrix} 2 \\ 3 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]
To compute the formulation (2.28) we used the vector,
\[
y = A_B^{-T} c_B
= \begin{pmatrix}
1 & 1 & 0 \\
2 & 1 & 1 \\
-1 & 1 & 0
\end{pmatrix}^{-T}
\begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}
= \begin{pmatrix} 5/2 \\ 0 \\ 1/2 \end{pmatrix}
\]
as defined in Proposition 6. Observe that $y$ is the certificate of optimality defined in Proposi-
tion 3, as one can readily check that $A^T y \ge c$ and $y^T b = c^T \bar x$. Hence the algorithm found an
optimal solution $\bar x$ and a certificate of optimality $y$.
2.5.2 An unbounded example
Consider the following linear program,
\[
\max\{\, z(x) = c^T x : Ax = b,\ x \ge 0 \,\}
\]
where
\[
A = \begin{pmatrix}
-2 & 4 & 1 & 0 & 1 \\
-3 & 7 & 0 & 1 & 1
\end{pmatrix}
\quad
b = \begin{pmatrix} 1 \\ 3 \end{pmatrix}
\quad
c = \begin{pmatrix} -1 \\ 3 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
\]
It is in canonical form for the basis $B = \{3, 4\}$. Then $N = \{1, 2, 5\}$. Let us choose which
element $k$ enters the basis. We want $k \in N$ and $c_k > 0$. We have choices $k = 2$ and $k = 5$. Let
us select 5. We compute $t$ by taking the smallest ratio between entry $b_i$ and entry $A_{ik}$ for all $i$
where $A_{ik} > 0$. Namely,
\[
\min\left\{ \frac{1}{1},\ \frac{3}{1} \right\}.
\]
The minimum is attained for the ratio $\frac{1}{1}$, which corresponds to the first row. The first basic
variable is 3. Thus 3 is leaving the basis. Hence, the new basis is $B = \{4, 5\}$.

Using the formulae in Proposition 6 we can rewrite the linear program so that it is in
canonical form for the basis $B = \{4, 5\}$. We get,
\[
\max\{\, z(x) = 1 + c^T x : Ax = b,\ x \ge 0 \,\},
\]
where
\[
A = \begin{pmatrix}
-1 & 3 & -1 & 1 & 0 \\
-2 & 4 & 1 & 0 & 1
\end{pmatrix}
\quad
b = \begin{pmatrix} 2 \\ 1 \end{pmatrix}
\quad
c = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \\ 0 \end{pmatrix}.
\]
Here $N = \{1, 2, 3\}$. Let us choose which element $k$ enters the basis. We want $k \in N$ and $c_k > 0$.
The only possible choice is $k = 1$. We compute $t$ by taking the smallest ratio between entry $b_i$
and entry $A_{ik}$ for all $i$ where $A_{ik} > 0$. However, as $A_k \le 0$, this is not well defined. We claim
that this occurs because the linear program is unbounded.

The new feasible solution $x(t)$ is defined by setting $x_1(t) = t$ for some $t \ge 0$ and $x_2 = x_3 = 0$.
The matrix equation $Ax = b$ tells us which values to choose for $x_B(t)$, namely (see the argument
in Section 2.3),
\[
x_B(t) = \begin{pmatrix} x_4(t) \\ x_5(t) \end{pmatrix}
= \begin{pmatrix} 2 \\ 1 \end{pmatrix}
- t \begin{pmatrix} -1 \\ -2 \end{pmatrix}.
\]
Thus, we have
\[
x(t) = \begin{pmatrix} t \\ 0 \\ 0 \\ 2 + t \\ 1 + 2t \end{pmatrix}
= \underbrace{\begin{pmatrix} 0 \\ 0 \\ 0 \\ 2 \\ 1 \end{pmatrix}}_{:=\,\bar x}
+\, t \underbrace{\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 2 \end{pmatrix}}_{:=\,d}.
\]
Then $\bar x$ is feasible, $Ad = 0$, $d \ge 0$, and $c^T d = 1 > 0$. Hence, $\bar x, d$ form a certificate of unbound-
edness.
2.5.3 Formalizing the procedure
Let us formalize the Simplex procedure described in the previous sections. At each step we
have a feasible basis, and we either detect unboundedness or attempt to find a new feasible
basis where the associated basic solution has larger value than the basic solution for the current
basis.

In light of Proposition 6 we can assume that the linear program is in canonical form for a
feasible basis $B$, i.e. that it is of the form,
\[
\begin{array}{rl}
\max & z(x) = \bar z + c_N^T x_N \\
\text{subject to} & x_B + A_N x_N = b \\
& x \ge 0,
\end{array} \tag{P}
\]
where $\bar z$ is some real value and $b \ge 0$.

Let $\bar x$ be the basic solution for basis $B$, i.e. $\bar x_N = 0$ and $\bar x_B = b$.

Remark 7. If $c_N \le 0$ then $\bar x$ is an optimal solution to (P).
Proof. Suppose $c_N \le 0$. Note that $z(\bar x) = \bar z + c_N^T \bar x_N = \bar z + c_N^T 0 = \bar z$. For any feasible solution $x'$
we have $x' \ge 0$. As $c_N \le 0$, it implies that $c^T x' = c_N^T x'_N \le 0$. It follows that $z(x') \le \bar z$, i.e. $\bar z$ is
an upper bound for (P). As $z(\bar x) = \bar z$, the result follows.
Now suppose that for some $k \in N$ we have $c_k > 0$. We define $x'_N$, which depends on some
parameter $t \ge 0$, as follows:
\[
x'_j = \begin{cases} t & \text{if } j = k, \\ 0 & \text{if } j \in N \setminus \{k\}. \end{cases}
\]
Now we need to satisfy $x'_B + A_N x'_N = b$. Thus,
\[
x'_B = b - A_N x'_N = b - \sum_{j \in N} x'_j A_j = b - x'_k A_k = b - tA_k. \tag{2.29}
\]
Remark 8. If $A_k \le 0$ then the linear program is unbounded.

Proof. Suppose $A_k \le 0$. Then for all $t \ge 0$ we have $x'_B = b - tA_k \ge 0$. Hence, $x'$ is feasible.
Moreover,
\[
z(x') = \bar z + c_N^T x'_N = \bar z + \sum_{j \in N} c_j x'_j = \bar z + c_k x'_k = \bar z + c_k t.
\]
Since $c_k > 0$, $z(x')$ goes to infinity as $t$ goes to infinity.
Thus, we may assume that $A_{ik} > 0$ for some row index $i$. We need to choose $t$ so that
$x'_B \ge 0$. It follows from (2.29) that
\[
tA_k \le b,
\]
or equivalently, for every row index $i$ for which $A_{ik} > 0$ we must have,
\[
t \le \frac{b_i}{A_{ik}}.
\]
Hence, the largest value $t$ for which $x'$ remains non-negative is given by
\[
t = \min\left\{ \frac{b_i}{A_{ik}} : A_{ik} > 0 \right\}.
\]
Note that since $A_k \le 0$ does not hold, this is well defined. Let $r$ denote the index $i$ where the
minimum is attained in the previous equation. Then (2.29) implies that the $r$th entry of $x'_B$ will
be zero. Let $\ell$ denote the $r$th basic variable of $B$. Note that since we order the components
of $x_B$ in the same order as $B$, the $r$th component of $x'_B$ is the basic variable $x'_\ell$. It follows that
$x'_\ell = 0$. Choose
\[
B' = B \cup \{k\} \setminus \{\ell\}, \qquad N' = \{1, \ldots, n\} \setminus B'.
\]
We let the reader verify that $x'_{N'} = 0$. It can be readily checked that $B'$ is a basis. Then $x'$ must
be a basic solution for the basis $B'$.
Let us summarize the Simplex procedure for
\[
\max\{\, c^T x : Ax = b,\ x \ge 0 \,\}. \tag{P}
\]

Simplex Algorithm.

input: Linear program (P) and a feasible basis $B$.
output: An optimal solution of (P) or a certificate proving that (P) is unbounded.

Step 1: Rewrite (P) so that it is in canonical form for the basis $B$.
Let $\bar x$ be the basic feasible solution for $B$.

Step 2: If $c_N \le 0$, stop; $\bar x$ is optimal.
Select $k \in N$ such that $c_k > 0$.

Step 3: If $A_k \le 0$, stop; (P) is unbounded.
Let $r$ be any index $i$ where the following minimum is attained,
\[
t = \min\left\{ \frac{b_i}{A_{ik}} : A_{ik} > 0 \right\}.
\]
Let $\ell$ be the $r$th basis element.
Set $B := B \cup \{k\} \setminus \{\ell\}$.

Step 4: Go to Step 1.
Note, we have argued that if the algorithm terminates, then it provides us with a correct
solution. However, the algorithm as it is described need not stop. Suppose that at every step
the quantity $t > 0$. Then at every step the objective function will increase; moreover, it
is clear that at no iteration will the objective function decrease. Hence, in
that case we never visit the same basis twice. As there are clearly only a finite number of
bases, this would guarantee that the algorithm terminates. However, it is possible that at every
iteration the quantity $t = 0$. Then at the start of the next iteration we get a new basis, but the
same basic solution. After a number of iterations, it is possible that we revisit the same basis.
If we repeat this forever, the algorithm will not terminate.

There are a number of easy refinements to the version of the Simplex algorithm we de-
scribed that will guarantee termination. The easiest to state is as follows: throughout the
Simplex iterations, in Step 2, among all $j \in N$ with $c_j > 0$, choose $k := \min\{\, j \in N : c_j > 0 \,\}$;
also, in Step 3, define $t$ as before and, among all rows $i$ with $A_{ik} > 0$ and $\frac{b_i}{A_{ik}} = t$, choose
$r$ so that the $r$th basis element $\ell$ is as small as possible. This rule is known as the smallest
subscript rule or Bland's rule.

Theorem 9. The Simplex procedure with the smallest subscript rule is guaranteed to termi-
nate.

We will omit the proof of this result in these notes.
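To make the whole procedure concrete, here is a compact Python sketch of the Simplex
algorithm with the smallest subscript rule (function name ours; it uses dense linear algebra
and a small tolerance, so it is an illustration of Steps 1-4 above, not a production solver):

```python
import numpy as np

def simplex(A, b, c, B):
    """Simplex for max{c^T x : Ax=b, x>=0} from a feasible basis B (0-indexed),
    with the smallest subscript (Bland) rule.
    Returns ('optimal', x) or ('unbounded', (x, d))."""
    m, n = A.shape
    while True:
        # Step 1: canonical form for B (Proposition 6).
        AB_inv = np.linalg.inv(A[:, B])
        A_bar, b_bar = AB_inv @ A, AB_inv @ b
        y = np.linalg.solve(A[:, B].T, c[B])
        c_bar = c - A.T @ y
        x = np.zeros(n)
        x[B] = b_bar
        # Step 2: optimality test; smallest entering subscript.
        entering = [j for j in range(n) if j not in B and c_bar[j] > 1e-9]
        if not entering:
            return 'optimal', x
        k = min(entering)
        # Step 3: unboundedness test, then the ratio test.
        if np.all(A_bar[:, k] <= 1e-9):
            d = np.zeros(n)
            d[k] = 1.0
            d[B] = -A_bar[:, k]          # x + t*d is feasible for all t >= 0
            return 'unbounded', (x, d)
        rows = [i for i in range(m) if A_bar[i, k] > 1e-9]
        t = min(b_bar[i] / A_bar[i, k] for i in rows)
        # Among rows attaining t, leave with the smallest basis element.
        leave = min(B[i] for i in rows if abs(b_bar[i] / A_bar[i, k] - t) <= 1e-9)
        B[B.index(leave)] = k            # Step 4: repeat with the new basis.

A = np.array([[1, 1, 1, 0, 0],
              [2, 1, 0, 1, 0],
              [-1, 1, 0, 0, 1]], dtype=float)
b = np.array([6, 10, 4], dtype=float)
c = np.array([2, 3, 0, 0, 0], dtype=float)

print(simplex(A, b, c, B=[2, 3, 4]))  # ('optimal', [1. 5. 0. 3. 0.])
```

On example (2.16) this visits the bases $\{3,4,5\}, \{1,3,5\}, \{1,2,5\}, \{1,2,4\}$, exactly the
iterations worked out above.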
2.6 Finding feasible solutions
The Simplex algorithm requires a feasible basis as part of its input. We will describe in this
section how to proceed when we are not given such a feasible basis. We will first take a step
back and look at the problem in the more general context of reducible problems.
2.6.1 Reducibility
In Section 1.1.2 we saw an inventory problem (the KWoil problem) with three time periods.
We showed that this problem can be formulated as a linear program. Moreover, it is easy to see
that if we generalize this inventory problem to one with an arbitrary number of time periods, it
can still be formulated as a linear program. Thus, assuming that we have an efficient algorithm
to solve linear programs, we can find an efficient algorithm to solve any of these inventory
problems. We say that this class of inventory problems is reducible to linear programming. The
meaning of efficient will be formalized in Chapter 3. More generally, consider two classes of
optimization problems, say A and B. If, given an efficient algorithm to solve all instances of
problem B, we can solve every instance of problem A efficiently, then we say A is reducible
to B.

Consider the following linear program in SEF,
\[
\max\{\, c^T x : Ax = b,\ x \ge 0 \,\} \tag{P}
\]
and let us define the following three optimization problems,
Problem A.
Either,
1. prove that (P) has no feasible solution, or
2. prove that (P) is unbounded, or
3. find an optimal solution to (P).
Problem B.
Either,
1. prove that (P) has no feasible solution, or
2. find a feasible solution to (P).
Problem C.
Given a feasible solution x, either,
1. prove that (P) is unbounded, or
2. find an optimal solution to (P).
The problem we wish to solve is problem A. Note that the Simplex procedure essentially
solves problem C (with the difference that we need a feasible basic solution rather than an
arbitrary feasible solution; this will be addressed in Section 2.6.2). We will show the following
result,

Proposition 10. Problem B is reducible to problem C.

Suppose you have an algorithm to solve C and wish to solve A. You proceed as follows:
Proposition 10 implies that you can solve problem B. If we find that (P) has no feasible solution,
we can stop, as we have solved A. Otherwise, we obtain a feasible solution $\bar x$ of (P). Using this
solution $\bar x$ we can use our algorithm for C to either deduce that (P) is unbounded or
to find an optimal solution to (P). Hence, we have solved A as well. Hence, Proposition 10
implies that

Proposition 11. Problem A is reducible to problem C.

It remains to prove Proposition 10.
Let us first proceed on an example and suppose that (P) is the following linear program
with variables $x_1, x_2, x_3, x_4$,
\[
\begin{array}{rl}
\max & (1, 2, 1, 3)x \\
\text{subject to} &
\begin{pmatrix}
-1 & -5 & -2 & -1 \\
2 & 9 & 0 & 3
\end{pmatrix} x
= \begin{pmatrix} -7 \\ 13 \end{pmatrix} \\
& x \ge 0.
\end{array}
\]
Using an algorithm for problem C, we wish to find a feasible solution for (P) if it exists, or
show that (P) has no feasible solution. Let us first rewrite the constraints of (P) by multiplying
every equation where the right hand side is negative by $-1$; we obtain,
\[
\begin{pmatrix}
1 & 5 & 2 & 1 \\
2 & 9 & 0 & 3
\end{pmatrix} x
= \begin{pmatrix} 7 \\ 13 \end{pmatrix}
\qquad x = (x_1, x_2, x_3, x_4)^T \ge 0.
\]
We now define the following auxiliary linear program,
\[
\begin{array}{rl}
\min & x_5 + x_6 \\
\text{subject to} &
\begin{pmatrix}
1 & 5 & 2 & 1 & 1 & 0 \\
2 & 9 & 0 & 3 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 7 \\ 13 \end{pmatrix} \\
& x = (x_1, x_2, x_3, x_4, x_5, x_6)^T \ge 0.
\end{array} \tag{Q}
\]
Variables $x_5, x_6$ are the auxiliary variables. Observe that $B = \{5, 6\}$ is a basis, and that the cor-
responding basic solution $\bar x = (0, 0, 0, 0, 7, 13)^T$ is feasible, since all entries are non-negative.
Note that this is the case since we made sure that the right hand sides of the constraints of
(Q) are all non-negative. We can use the algorithm for problem C to solve (Q). Note that 0
is a lower bound for (Q), hence (Q) is not unbounded. It follows that the algorithm for C
will find an optimal solution. In this case the optimal solution is $x' = (2, 1, 0, 0, 0, 0)^T$. Since
$x'_5 = x'_6 = 0$, it follows that $(2, 1, 0, 0)^T$ is a feasible solution to (P). Hence, we have solved
problem B.
Consider a second example and suppose that (P) is the following linear program,
\[
\begin{array}{rl}
\max & (6, 1, 1)x \\
\text{subject to} &
\begin{pmatrix}
5 & 1 & 1 \\
1 & 1 & 2
\end{pmatrix} x
= \begin{pmatrix} 1 \\ 5 \end{pmatrix} \\
& x = (x_1, x_2, x_3)^T \ge 0.
\end{array}
\]
The corresponding auxiliary problem is
\[
\begin{array}{rl}
\min & x_4 + x_5 \\
\text{subject to} &
\begin{pmatrix}
5 & 1 & 1 & 1 & 0 \\
1 & 1 & 2 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 1 \\ 5 \end{pmatrix} \\
& x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0.
\end{array} \tag{Q}
\]
An optimal solution to (Q) is $x' = (0, 0, 1, 0, 3)^T$, which has value 3. We claim that (P) has no
feasible solution in this case. Suppose for a contradiction that there was a feasible solution
$\bar x_1, \bar x_2, \bar x_3$ to (P). Then $x = (\bar x_1, \bar x_2, \bar x_3, 0, 0)^T$ would be a feasible solution to (Q) of value 0,
contradicting the fact that $x'$ is optimal.
Let us summarize these observations. We consider
\[
\max\{\, c^T x : Ax = b,\ x \ge 0 \,\} \tag{P}
\]
where $A$ has $m$ rows and $n$ columns. We may assume that $b \ge 0$, as we can multiply any
equation by $-1$ without changing the problem. We construct the auxiliary problem,
\[
\begin{array}{rl}
\min & w = x_{n+1} + \cdots + x_{n+m} \\
\text{subject to} &
\begin{pmatrix} A & I \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_{n+m} \end{pmatrix} = b \\
& (x_1, \ldots, x_{n+m})^T \ge 0.
\end{array} \tag{Q}
\]
We leave the proof of the next remark as an easy exercise (follow the argument outlined in the
aforementioned examples),

Remark 12. Let $x' = (x'_1, \ldots, x'_{n+m})^T$ be an optimal solution to (Q), with value $w$.

1. If $w = 0$ then $(x'_1, \ldots, x'_n)^T$ is a feasible solution to (P).
2. If $w > 0$ then (P) is infeasible.

We can now prove Proposition 10. Construct from (P) the auxiliary problem (Q). Find an
optimal solution $x'$ for (Q) using the algorithm for problem C. Then (P) has a feasible solution
if and only if $w = 0$, as indicated in the previous Remark.

Note, problem A is more general than problem B. However, it is an easy consequence
of Theorem 25 that in fact problem A can be reduced to problem B. In other words, for
linear programs, if we have an algorithm that can find feasible solutions (when they exist),
we can derive an algorithm that can find optimal solutions (when they exist).
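The construction of (Q) is also easy to code. The sketch below (names ours) assumes the
`simplex` sketch from Section 2.5.3 is available, negates the auxiliary objective so that it
becomes a maximization, and reads off the answer as in Remark 12:

```python
import numpy as np

def phase_one(A, b):
    """Find a feasible solution of Ax = b, x >= 0 via the auxiliary problem (Q),
    or return None if (P) is infeasible (Remark 12)."""
    m, n = A.shape
    A, b = A.copy(), b.copy()
    A[b < 0], b[b < 0] = -A[b < 0], -b[b < 0]           # first make b >= 0
    A_aux = np.hstack([A, np.eye(m)])                   # (A | I)
    c_aux = np.concatenate([np.zeros(n), -np.ones(m)])  # max -(x_{n+1}+...+x_{n+m})
    status, x = simplex(A_aux, b, c_aux, B=list(range(n, n + m)))
    if x[n:].sum() > 1e-9:    # optimal value w > 0
        return None           # (P) is infeasible
    return x[:n]              # feasible for (P)

A = np.array([[5, 1, 1], [1, 1, 2]], dtype=float)
b = np.array([1, 5], dtype=float)
print(phase_one(A, b))   # None: this is the infeasible example above
```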
2.6.2 The Two Phase method - an example
We illustrate the method presented in the previous section on an example. During Phase I,
we look for a basic feasible solution (if one exists), and during Phase II, we find an optimal
solution (if one exists), starting from the basic feasible solution obtained during Phase I.
Consider,
\[
\begin{array}{rl}
\max & (2, -1, 2)x \\
\text{subject to} &
\begin{pmatrix}
1 & 2 & -1 \\
1 & -1 & 1
\end{pmatrix} x
= \begin{pmatrix} 1 \\ 3 \end{pmatrix} \\
& x = (x_1, x_2, x_3)^T \ge 0.
\end{array} \tag{P}
\]
PHASE I.
We construct the auxiliary problem,
\[
\begin{array}{rl}
\max & (0, 0, 0, -1, -1)x \\
\text{subject to} &
\begin{pmatrix}
1 & 2 & -1 & 1 & 0 \\
1 & -1 & 1 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 1 \\ 3 \end{pmatrix} \\
& x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0.
\end{array} \tag{Q}
\]
Note that the objective function is equivalent to $\min x_4 + x_5$. $B = \{4, 5\}$ is a feasible basis;
however, (Q) is not in canonical form for $B$. We could use the formulae in Proposition 6 to
rewrite (Q) in canonical form, but a simpler approach is to add each of the two equations of
(Q) to the objective function. The resulting linear program is,
\[
\begin{array}{rl}
\max & (2, 1, 0, 0, 0)x - 4 \\
\text{subject to} &
\begin{pmatrix}
1 & 2 & -1 & 1 & 0 \\
1 & -1 & 1 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 1 \\ 3 \end{pmatrix} \\
& x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0.
\end{array}
\]
Solving this linear program using the Simplex algorithm starting from $B = \{4, 5\}$, we obtain
the optimal basis $B = \{1, 3\}$. The canonical form for $B$ is,
\[
\begin{array}{rl}
\max & (0, 0, 0, -1, -1)x \\
\text{subject to} &
\begin{pmatrix}
1 & 1/2 & 0 & 1/2 & 1/2 \\
0 & -3/2 & 1 & -1/2 & 1/2
\end{pmatrix} x
= \begin{pmatrix} 2 \\ 1 \end{pmatrix} \\
& x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0.
\end{array}
\]
The basic solution corresponding to $B$ is $\bar x = (2, 0, 1, 0, 0)^T$, which has value 0. It follows
from Remark 12 that $(2, 0, 1)^T$ is a feasible solution for (P). Moreover, $(2, 0, 1)^T$ is the basic
solution for basis $B$ of (P), hence $B$ is a feasible basis of (P). Note, it is always true that the
feasible solution we construct after solving (Q) using the Simplex procedure will be a basic
solution of (P). It need not be the case that $B$ be a basis of (P), however.
PHASE II.
We can use the formulae in Proposition 6 to rewrite (P) in canonical form for the basis $B =
\{1, 3\}$. Note that to obtain the constraints we can use the constraints of (Q) (omitting the
auxiliary variables). We obtain,
\[
\begin{array}{rl}
\max & (0, 1, 0)x + 6 \\
\text{subject to} &
\begin{pmatrix}
1 & 1/2 & 0 \\
0 & -3/2 & 1
\end{pmatrix} x
= \begin{pmatrix} 2 \\ 1 \end{pmatrix} \\
& x = (x_1, x_2, x_3)^T \ge 0.
\end{array}
\]
Solving this linear program using the Simplex algorithm starting from $B = \{1, 3\}$, we obtain
the optimal basis $B = \{2, 3\}$. The canonical form for $B$ is,
\[
\begin{array}{rl}
\max & (-2, 0, 0)x + 10 \\
\text{subject to} &
\begin{pmatrix}
2 & 1 & 0 \\
3 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 4 \\ 7 \end{pmatrix} \\
& x \ge 0.
\end{array}
\]
Then the basic solution $(0, 4, 7)^T$ is an optimal solution for (P).
2.6.3 Consequences
Suppose that we are given an arbitrary linear program (P) in SEF. Let us run the Two Phase
method for (P) using the smallest subscript rule. Theorem 9 implies that the Two Phase
method will terminate. The following now is a consequence of our previous discussion,
Theorem 13 (Fundamental Theorem of LP (SEF)).
Let (P) be an LP problem in SEF. If (P) does not have an optimal solution, then (P) is either
infeasible or unbounded. Moreover,
1. if (P) is feasible then (P) has a basic feasible solution;
2. if (P) has an optimal solution, then (P) has a basic feasible solution that is optimal.
Since we can convert any LP problem into SEF while preserving the main property of the LP,
the above theorem yields the following result for LP problems in any form.
Theorem 14 (Fundamental Theorem of LP).
Let (P) be an LP problem. Then exactly one of the following holds:
1. (P) is infeasible.
2. (P) is unbounded.
3. (P) has an optimal solution.
2.7 Pivoting
The Simplex algorithm requires us to reformulate the problem in canonical form for every ba-
sis. We can describe the computation required between two consecutive iterations in a compact
way. We describe this next. Let $T$ be an $m \times n$ matrix and consider $(i, j)$, where $i \in \{1, \ldots, m\}$
and $j \in \{1, \ldots, n\}$, such that $T_{i,j} \ne 0$. We say that matrix $T'$ is obtained from $T$ by pivoting on
element $(i, j)$ if $T'$ is defined as follows: for every row index $k$,
\[
\mathrm{row}_k(T') =
\begin{cases}
\dfrac{1}{T_{i,j}}\,\mathrm{row}_k(T) & \text{if } k = i, \\[2mm]
\mathrm{row}_k(T) - \dfrac{T_{k,j}}{T_{i,j}}\,\mathrm{row}_i(T) & \text{if } k \ne i.
\end{cases}
\]
We illustrate this on an example. Consider the matrix,
\[
T = \begin{pmatrix}
2 & 2 & 1 & 0 & 3 \\
0 & 3 & -2 & 3 & -5 \\
3 & -1 & 2 & 1 & 5
\end{pmatrix}.
\]
Let us compute the matrix $T'$ obtained from $T$ by pivoting on element $(2, 3)$. (We will use the
convention that elements on which we pivot are surrounded by a square.) We get
\[
T' = \begin{pmatrix}
2 & 7/2 & 0 & 3/2 & 1/2 \\
0 & -3/2 & 1 & -3/2 & 5/2 \\
3 & 2 & 0 & 4 & 0
\end{pmatrix}.
\]
Observe that the effect of pivoting on element $(i, j)$ is to transform column $j$ of the matrix $T$
into the vector $e_i$ (the vector where all the entries are zero except for entry $i$, which is equal to
1).

The students should verify that $T' = YT$ where,
\[
Y = \begin{pmatrix}
1 & 1/2 & 0 \\
0 & -1/2 & 0 \\
0 & 1 & 1
\end{pmatrix}.
\]
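The pivoting rule transcribes directly into code. A short Python sketch (function name ours;
rows and columns are 0-indexed there, so the pivot on element $(2, 3)$ above becomes
`pivot(T, 1, 2)`):

```python
import numpy as np

def pivot(T, i, j):
    """Return the matrix obtained from T by pivoting on element (i, j)."""
    T2 = T.astype(float).copy()
    T2[i] = T[i] / T[i, j]                        # row_i(T') = row_i(T) / T_{i,j}
    for k in range(T.shape[0]):
        if k != i:                                 # other rows: eliminate column j
            T2[k] = T[k] - (T[k, j] / T[i, j]) * T[i]
    return T2

T = np.array([[2, 2, 1, 0, 3],
              [0, 3, -2, 3, -5],
              [3, -1, 2, 1, 5]], dtype=float)
print(pivot(T, 1, 2))   # reproduces T' above
```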
Consider the system of equations,
\[
\begin{pmatrix}
2 & 2 & 1 & 0 \\
0 & 3 & -2 & 3 \\
3 & -1 & 2 & 1
\end{pmatrix} x
= \begin{pmatrix} 3 \\ -5 \\ 5 \end{pmatrix}. \tag{2.30}
\]
We can represent this system by the matrix $T$. Namely, $T$ is obtained from the coefficients
of the left hand side by adding an extra column corresponding to the right hand side. Given
$T'$ we may construct a system of equations where we do the aforementioned operations in
reverse; namely, all but the last column of $T'$ correspond to the left hand side and
the last column corresponds to the right hand side. Then we get,
\[
\begin{pmatrix}
2 & 7/2 & 0 & 3/2 \\
0 & -3/2 & 1 & -3/2 \\
3 & 2 & 0 & 4
\end{pmatrix} x
= \begin{pmatrix} 1/2 \\ 5/2 \\ 0 \end{pmatrix}. \tag{2.31}
\]
Since $T' = YT$, it follows that equation (2.31) is obtained from equation (2.30) by left multi-
plying by the matrix $Y$. Observe that the matrix $Y$ is non-singular, hence the set of solutions
for (2.31) is the same as for (2.30). Hence, we used pivoting to derive an equivalent system of
equations.
We can proceed in a similar way in general. Given a system $Ax = b$, we construct a matrix
$T$ by adding to $A$ an extra column corresponding to $b$, i.e. $T = (A \,|\, b)$. Let $T' = (A' \,|\, b')$ be
obtained from $T$ by pivoting. Then $T' = YT$ for some non-singular matrix $Y$. It follows that
$A' = YA$ and $b' = Yb$. So, in particular, $Ax = b$ and $A'x = b'$ have the same set of solutions.
We say that $T$ is the tableau representing the system $Ax = b$.
To show how this discussion relates to the Simplex algorithm, let us revisit example (2.16),
\[
\begin{array}{rl}
\max & z = 0 + (2, 3, 0, 0, 0)x \\
\text{subject to} &
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 1 & 0 & 1 & 0 \\
-1 & 1 & 0 & 0 & 1
\end{pmatrix} x
= \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix} \\
& x_1, x_2, x_3, x_4, x_5 \ge 0.
\end{array}
\]
Let us express both the equations and the objective function as a system of equations,
\[
\begin{pmatrix}
1 & -2 & -3 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 \\
0 & 2 & 1 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} z \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 0 \\ 6 \\ 10 \\ 4 \end{pmatrix}. \tag{2.32}
\]
Note, the first constraint of (2.32) states that $z - 2x_1 - 3x_2 = 0$, i.e. that $z = 2x_1 + 3x_2$, which
is the objective function. Let $T^1$ be the tableau representing the system (2.32), namely,
\[
T^1 = \begin{pmatrix}
1 & -2 & -3 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 6 \\
0 & 2 & 1 & 0 & 1 & 0 & 10 \\
0 & -1 & 1 & 0 & 0 & 1 & 4
\end{pmatrix}.
\]
Note, for readability one may separate the $z$ column and the right hand side by vertical bars,
and separate the objective function from the equality constraints by a horizontal bar. Let us
index the column of $z$ by element 0 and the constraint corresponding to the objective function
by 0 as well. Thus the first row and first column of $T^1$ are row and column zero. Observe that
the linear program is in canonical form for $B = \{3, 4, 5\}$, as the columns of $T^1$ indexed by
$\{0, 3, 4, 5\}$ form an identity matrix.
In general, given the linear program,
\[
\max\{\, z = \bar z + c^T x : Ax = b,\ x \ge 0 \,\} \tag{P}
\]
we construct the tableau,
\[
T = \begin{pmatrix}
1 & -c^T & \bar z \\
0 & A & b
\end{pmatrix}
\]
and the linear program (P) is in canonical form for basis $B$ exactly when the columns of $T$
indexed by $B \cup \{0\}$ form an identity matrix.
Let us try to solve the linear program working with the tableau $T^1$ only. We select as
an entering variable $k \in N$ such that $c_k > 0$. In terms of $T^1$ this means that we are selecting a column
$k \in \{1, \ldots, 5\}$ where $T^1_{0,k}$ is smaller than 0. We can select column 1 or 2; say we select column
$k = 1$. Note that $b$ corresponds to column 6 (rows 1 to 3) of $T^1$. We select the row index
$i \in \{1, 2, 3\}$ minimizing the ratio $T^1_{i,6} / T^1_{i,k}$ over the rows where $T^1_{i,k} > 0$, i.e. we consider
\[
\min\left\{ \frac{6}{1},\ \frac{10}{2},\ - \right\}
\]
where the minimum is attained for row $i = 2$. Let us now pivot on the element $(i, k) = (2, 1)$.
We obtain the following tableau,
\[
T^2 = \begin{pmatrix}
1 & 0 & -2 & 0 & 1 & 0 & 10 \\
0 & 0 & 1/2 & 1 & -1/2 & 0 & 1 \\
0 & 1 & 1/2 & 0 & 1/2 & 0 & 5 \\
0 & 0 & 3/2 & 0 & 1/2 & 1 & 9
\end{pmatrix}.
\]
Since we pivoted on $(2, 1)$, column 1 of $T^2$ will have a 1 in row 2 and all other elements
will be zero. Since row 2 has zeros in columns 0, 3, 5, these columns will be unchanged in
$T^2$. It follows that columns 0, 1, 3, 5 will form a permutation matrix in $T^2$. We could permute
rows 1, 2 of $T^2$ so that the columns indexed by 0, 1, 3, 5 form an identity matrix, and therefore
$T^2$ would represent an LP in canonical form for the basis $B = \{1, 3, 5\}$. However, reordering the
rows will prove unnecessary in this procedure. The Simplex procedure consists of selecting a
column $j$, selecting a row $i$, and pivoting on $(i, j)$.
We state the remaining sequence of tableaus obtained to solve (2.16).
\[
T^3 = \begin{pmatrix}
1 & 0 & 0 & 4 & -1 & 0 & 14 \\
0 & 0 & 1 & 2 & -1 & 0 & 2 \\
0 & 1 & 0 & -1 & 1 & 0 & 4 \\
0 & 0 & 0 & -3 & 2 & 1 & 6
\end{pmatrix}
\]
and finally,
\[
T^4 = \begin{pmatrix}
1 & 0 & 0 & 5/2 & 0 & 1/2 & 17 \\
0 & 0 & 1 & 1/2 & 0 & 1/2 & 5 \\
0 & 1 & 0 & 1/2 & 0 & -1/2 & 1 \\
0 & 0 & 0 & -3/2 & 1 & 1/2 & 3
\end{pmatrix}.
\]
The corresponding objective function is now $z = 17 - \tfrac{5}{2}x_3 - \tfrac{1}{2}x_5$. Hence the basic solution
$\bar x = (1, 5, 0, 3, 0)^T$ is optimal.
2.8 Further reading and notes
The classical reference for the Simplex algorithm is the book by Dantzig [7]. The word "sim-
plex" is the name of a simple geometric object which generalizes a triangle (in $\mathbb{R}^2$) and a
tetrahedron (in $\mathbb{R}^3$) to arbitrary dimensions. Dantzig presents in his book an attractive ge-
ometric interpretation of the algorithm (which uses simplices) to suggest that the algorithm
would be efficient in practice. This is certainly worthwhile reading after this introductory
course is completed.

We saw that we can replace a free variable by the difference of two new non-negative
variables. Another way of handling free variables is to find an equation which contains the
free variable, isolate the free variable in the equation, use this identity elsewhere in the
LP, record the variable and the equation on the side, and eliminate the free variable and the
equation from the original LP. This latter approach reduces the number of variables and
constraints in the final LP problem and is more suitable in many situations.

As we already hinted when discussing finite termination and the smallest subscript
rule, there are many ways of choosing entering and leaving variables (these are called pivot
rules). In fact, there is a rule based on perturbation ideas, called the lexicographic rule,
which only restricts the choice of the leaving variable and also ensures finite convergence of
the Simplex algorithm. There are some theoretical and some computational issues which can
be addressed through proper choices of entering and leaving variables. See the books [4, 18].

In implementing the Simplex algorithm, it is important that the linear algebra is done in
an efficient and numerically stable way. Even though we derived various formulae involving
inverses of matrices, in practice $A_B^{-1}$ is almost never formed explicitly. For various tech-
niques that exploit sparsity in the data and solve linear systems of equations in a numerically
stable way, see [10].
Chapter 3
Computational complexity
An algorithm is a formal procedure that describes how to solve a problem. In the previous
chapter we saw an example of an algorithm, namely the (two-phase) Simplex algorithm.
It takes as input a linear program in standard equality form and either returns an optimal
solution, or detects that the linear program is infeasible or unbounded. Two basic properties
we require of an algorithm are: correctness and termination. By correctness, we mean that
the algorithm is always accurate when it claims that we have a particular outcome. One way
to ensure this is to require that the algorithm provide a certificate, i.e. a proof, to justify its
answers. By termination, we mean that the algorithm will stop after a finite number of steps.

In this section we will define the running time of an algorithm. We will distinguish be-
tween fast and slow algorithms and will explain why this distinction is of critical importance.
Finally, we will discuss the possible existence of inherently hard classes of problems for which
it is unlikely that any fast algorithm exists. (For much more information on the topic, have a
look at the recent textbooks by Kleinberg and Tardos [13], and Sipser [17].)
3.1 A motivating example
Suppose we are given an $m \times n$ matrix $A$ and a vector $b$ with $m$ entries. The LP feasibility problem asks to either find a solution to $Ax \le b$, or to show that there is none. (We showed in Proposition 10 that this can be solved by the Simplex algorithm.) The 0,1 feasibility problem asks to either find a solution to $Ax \le b$ with $x \in \{0,1\}^n$, i.e. all variables take value 0 or 1, or to show that no such solution exists. Unlike the LP feasibility problem, the 0,1 feasibility problem is a finite problem. Indeed, as every variable can take 2 possible values, and since we have $n$ variables, there are $2^n$ possible assignments of values to the variables. Hence, to solve the 0,1 feasibility problem we could try all $2^n$ possible assignments of 0,1 values to $x$ and check for each whether $x$ satisfies the inequalities $Ax \le b$. This procedure is an algorithm for the 0,1 feasibility problem, as correctness and termination are trivially satisfied in this case.

This is, however, not a satisfactory algorithm. One drawback is that it is very slow. Suppose for instance that we have $n = 100$ variables. We need to enumerate $2^{100}$ possible assignments of values to the variables. Assuming that we have implemented our algorithm on a computer that can enumerate a million such assignments every second, it would take over $4 \cdot 10^{16}$ years. According to current estimates the universe is 13.75 billion years old, so the running time would be nearly three million times the age of the universe! Clearly, this is not a practical procedure by any reasonable standard. This illustrates the fact that brute force enumeration is not going to be a sensible strategy in general. There is a further shortcoming of this algorithm. Suppose that the algorithm states that there is no 0,1 solution to the system $Ax \le b$. How could we convince anyone that this is indeed the case? The algorithm provides no help, and anyone wanting to verify this fact would have to solve the problem from scratch.
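To make the brute-force procedure concrete, here is a minimal sketch in Python (our illustration, not part of the original notes); it assumes the data are given as a NumPy matrix $A$ and vector $b$.

import itertools
import numpy as np

def zero_one_feasible(A, b):
    """Brute-force search: try all 2^n vectors x in {0,1}^n
    and return the first one satisfying Ax <= b, or None."""
    m, n = A.shape
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        if np.all(A @ x <= b):
            return x
    return None  # no 0,1 solution exists

# Example: x1 + x2 <= 1 and -x1 - x2 <= -1 force exactly one of x1, x2 to be 1.
A = np.array([[1, 1], [-1, -1]])
b = np.array([1, -1])
print(zero_one_feasible(A, b))  # prints [0 1]

The loop runs up to $2^n$ times, which is exactly the exponential behaviour discussed above.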
3.2 Fast and slow algorithms
Our brute-force algorithm for the 0,1 feasibility problem is an example of a slow algorithm. In this section we define formally what we mean by fast and slow algorithms.
3.2.1 The big O notation
Before we can proceed further, we require a few definitions. Consider functions $f, g : \mathbb{N} \to \mathbb{N}$, where $\mathbb{N}$ denotes the set of all positive integers, i.e. $\{1, 2, 3, 4, \ldots\}$. We write:
a) $f = O(g)$ if there exist constants $c_1, c_2$ such that for all $n \ge c_1$, $f(n) \le c_2\, g(n)$;

b) $f = \Omega(g)$ if there exist constants $c_1, c_2$ such that for all $n \ge c_1$, $f(n) \ge c_2\, g(n)$;

c) $f = \Theta(g)$ if $f = O(g)$ and $f = \Omega(g)$.
Thus $f = O(g)$ means that for large $n$, some fixed multiple of $g(n)$ is an upper bound on $f(n)$. Similarly, $f = \Omega(g)$ means that for large $n$, some fixed multiple of $g(n)$ is a lower bound on $f(n)$. Hence, if $f = \Theta(g)$ then $f$ behaves like $g$ asymptotically.
Example 1. Suppose for instance that $f(n) = 2n^3 + 3n^2 + 2$. Then $f = O(n^3)$: for all $n \ge 4$ we have $3n^2 + 2 \le n^3$, hence in particular
\[ 2n^3 + (3n^2 + 2) \le 3n^3. \]
(We apply the definition with $c_1 = 4$ and $c_2 = 3$.) Similarly, we can show that $f = \Omega(n^3)$. Hence, $f = \Theta(n^3)$. In general, one can show that if $f(n)$ is a polynomial of degree $k$, i.e.
\[ f(n) = \sum_{i=1}^{k} a_i n^i \]
for constants $a_1, \ldots, a_k$ with $a_k > 0$, then $f = \Theta(n^k)$.
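For completeness, here is a short sketch (ours, not in the original notes) of the $\Omega(n^3)$ direction claimed in the example.

% A sketch of f = \Omega(n^3) for f(n) = 2n^3 + 3n^2 + 2:
% since 3n^2 + 2 > 0 for every n, we have, for all n >= 1,
\[
  f(n) = 2n^3 + 3n^2 + 2 \;\ge\; 2n^3,
\]
% so the definition of \Omega holds with c_1 = 1 and c_2 = 2.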
3.2.2 Input size and running time
It will be important to distinguish between a problem and an instance of that problem. For instance, if our problem is 0,1 feasibility, an instance of that problem is described by a matrix $A$ and a vector $b$, and solving that instance means checking whether there is a 0,1 solution to $Ax \le b$ for that specific matrix $A$ and vector $b$. When we say that an algorithm solves the 0,1 feasibility problem, we mean that we have an algorithm that works for every instance.

The input size of a given instance is the number of bits we need to store the data in a computer. The way the data is encoded is not important as long as we avoid storing numbers in base 1. (To store a number that can take values between 1 and $\ell$ in base 1 we require $\ell$ bits; using a base $p \ge 2$ we only require about $\log_p(\ell)$ bits, an exponential difference in storage requirements.) We define the running time of an algorithm as the number of primitive operations that are needed to complete the computation, where a primitive operation is one that can be completed in a short fixed amount of time on a computer (i.e., a constant number of cycles on the processor). This may include for instance:

- arithmetic operations like $+$, $-$, $\times$, $/$ involving numbers of a bounded size,
- assignments,
- loops, or
- conditional statements.
Example 2. Consider a simple algorithm that multiplies two square integer matrices.

1: Input: Matrices $A, B \in \mathbb{N}^{n \times n}$.
2: Output: Matrix $C = AB$.
3: Initialize $C := 0$.
4: for $1 \le i \le n$ do
5:   for $1 \le j \le n$ do
6:     $C_{ij} := 0$
7:     for $1 \le l \le n$ do
8:       $C_{ij} := C_{ij} + A_{il} B_{lj}$
9:     end for
10:   end for
11: end for
12: return $C$
Let us assume that we are in the case where all numbers in the matrices $A$ and $B$ are within a fixed range. Then every one of these numbers can be stored using a fixed number of bits, say $k$. Thus the total number of bits required to store the data $A, B$ (i.e. the input size) is $2n^2 k = \Theta(n^2)$.

The main work of the algorithm is done in the two main loops in steps 4 and 5. In particular, steps 6–9 are executed $n^2$ times, once for each pair $1 \le i, j \le n$. Each such execution needs $n+1$ assignments, $n$ additions and $n$ multiplications. Thus, the entire algorithm (modulo the initialization) needs
\[ n^2 (3n + 1) = 3n^3 + n^2 = \Theta(n^3) \]
primitive operations in total.
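The following Python sketch (ours, not part of the notes) mirrors the pseudocode above and counts the primitive operations of steps 6–9, so the $n^2(3n+1)$ count can be checked empirically.

def matmul_with_count(A, B):
    """Multiply two n x n matrices as in Example 2 and count the
    assignments, additions and multiplications in steps 6-9."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            C[i][j] = 0          # 1 assignment
            ops += 1
            for l in range(n):
                # 1 multiplication, 1 addition, 1 assignment per pass
                C[i][j] = C[i][j] + A[i][l] * B[l][j]
                ops += 3
    return C, ops

_, ops = matmul_with_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(ops)  # n^2 (3n + 1) = 4 * 7 = 28 for n = 2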
3.2.3 The running time function
Of course we expect algorithms to have longer running times as the size of the instance increases. Hence, we will express the running time as a function $f$ of the input size $s$. Consider Example 2 again. For an instance $A, B$ of the problem (where $A, B$ are $n \times n$ matrices) the size $s$ of the instance is $\Theta(n^2)$ and the running time is $\Theta(n^3)$. Thus, we have the running time
\[ f(s) = \Theta(s^{3/2}). \]
In the case of Example 2 the running time is always going to be the same for every instance of the same size $s$, so there is no ambiguity when we talk about running time. This need not be the case in general, however. Consider for instance the Simplex algorithm. We may be given as input a problem that is already in canonical form for an optimal basis, or we may be given a problem of the same input size that requires many steps of the Simplex. How should we define the running time in this context? We take a worst-case view and define the running time function $f$ as follows:

for any input size $s$, $f(s)$ is the longest running time over all instances of size $s$.
3.2.4 Polynomial versus exponential algorithms
We say that an algorithm is polynomial-time if for some constant $k$ we have $f(s) = O(s^k)$, where $f$ is the running time function; i.e., for any instance the running time of the algorithm is bounded by some polynomial of the size of the instance. An algorithm is exponential-time if for some constant $k > 1$ we have $f(s) = \Omega(k^s)$; i.e., for the worst instance the running time of the algorithm is at least some exponential function of the size of that instance.
We call a polynomial-time algorithm a fast algorithm and an exponential-time algorithm a slow algorithm. According to our definitions, the simple-minded algorithm described in Section 3.1 for the 0,1 feasibility problem is slow, and the algorithm for multiplying two matrices is fast.

To motivate the notions of fast and slow, suppose that your computer is capable of executing 1 million primitive instructions per second. Assume that you have an algorithm that has running time $f(s)$ for an input of size $s$. The following table shows the actual running time of this algorithm on your computer for an input of size $s = 100$, depending on $f(s)$. Clearly the slow, i.e. exponential-time, algorithms (the rightmost two) will be of little use in this case.
$f(s)$ | $s$     | $s \log_2(s)$ | $s^2$   | $s^3$ | $1.5^s$      | $2^s$
Time   | < 1 sec | < 1 sec       | < 1 sec | 1 sec | 12,892 years | $4 \cdot 10^{16}$ years
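The entries of this table are easy to reproduce; the following small Python script (ours, not the notes') computes them for $s = 100$ at $10^6$ operations per second.

import math

s, ops_per_sec = 100, 10**6
seconds_per_year = 365.25 * 24 * 3600

for name, f in [("s", s), ("s log2 s", s * math.log2(s)),
                ("s^2", s**2), ("s^3", s**3),
                ("1.5^s", 1.5**s), ("2^s", 2.0**s)]:
    secs = f / ops_per_sec
    print(f"{name:>9}: {secs:.3g} sec (~{secs / seconds_per_year:.3g} years)")
# 1.5^s gives roughly 1.3e4 years and 2^s roughly 4e16 years,
# matching the table above.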
Let us consider the impact of improved hardware on the ability of fast and slow algorithms to solve large instances. Suppose that in 1970, Prof. Brown was running a (fast) algorithm with running time $f(s) = s^3$ and that he was able to solve a problem of size 50 in one minute on a computer. Forty years later, in 2010, Prof. Brown has a new computer that is a million times faster than the one he had in 1970. What, then, is the largest instance he can solve in one minute running the same algorithm? If $m$ denotes the number of primitive operations that the computer could run in one minute in 1970, we must have $50^3 \le m$. We claim that Prof. Brown can solve problems of size $100 \cdot 50 = 5000$ with his 2010 computer. This is because for such a problem the running time is $(100 \cdot 50)^3 = 100^3 \cdot 50^3 \le 10^6 m$. Hence, the size of the largest instance that can be solved has been multiplied by a factor of 100.

On the other hand, suppose that in 1970, Prof. Brown was running a (slow) algorithm with running time $f(s) = 2^s$ and that he was able to solve a problem of size at most 50 in one minute. As the largest instance he could solve in 1970 had size 50, we must have had $2^{51} > m$. We claim that Prof. Brown can only solve problems of size at most $50 + 20$ with his new computer. This is because for a problem of size $50 + 21$ the running time is
\[ 2^{50+21} = 2^{51} \cdot 2^{20} > m \cdot 2^{20} > m \cdot 10^6, \]
as $2^{20} > 10^6$. Hence, the size of the largest instance that can be solved has only been increased by a constant amount!
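As a quick sanity check (our addition), the following lines verify both claims numerically.

m = 50**3  # operations per minute of the 1970 machine (fast case)
# Fast algorithm f(s) = s^3: the 2010 machine does 10^6 * m ops/minute.
assert (100 * 50)**3 <= 10**6 * m   # size 5000 is solvable

# Slow algorithm f(s) = 2^s: size 50 fit in a 1970 minute but
# size 51 did not, so m < 2^51.
m_slow = 2**51 - 1
assert 2**(50 + 21) > 10**6 * m_slow  # size 71 exceeds a 2010 minute
print("both claims check out")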
3.2.5 The case of the Simplex
What about the Simplex algorithm? Dantzig introduced this algorithm in the late 1940s, and while it performed well empirically, no proof or disproof of its efficiency existed until the early 1970s. In 1972, Klee and Minty showed that there are instances for which the Simplex algorithm takes an exponential (in the input size) number of iterations to terminate. Therefore, the Simplex algorithm is not a fast algorithm according to our definition. Recall, however, that our analysis is a worst-case analysis, and while it is true that some instances require an exponential number of iterations, it can be shown that the Simplex is in fact very fast for a typical instance.

In 1979, the Soviet mathematician L.G. Khachiyan proved that the Ellipsoid Method (originally proposed by Shor as well as Nemirovski and Yudin for a class of well-behaved non-linear optimization problems) can be implemented so that it becomes a polynomial-time algorithm to solve linear programs, thereby showing that there is a fast algorithm for linear programming. While this discovery received a great deal of attention (among others, an article in the New York Times), the excitement about its practical impact quickly dissipated. While the Ellipsoid Method outperforms the Simplex method in a worst-case scenario (and the Ellipsoid Method is very useful in the development of optimization theory), the Simplex algorithm is much faster for typical instances. Since Khachiyan's discovery, many variants and other polynomial algorithms for linear programming have been developed, and some of them are competitive with the Simplex method on typical instances. There are also many variants of the Simplex Method. For many of these variants, the negative results of Klee and Minty can be extended (proving that such variants are exponential-time algorithms in the worst case). However, it is still an open problem whether there exists an efficient way of choosing entering/leaving variables in a Simplex algorithm so that the number of iterations of the Simplex algorithm can be bounded by a polynomial function of the input size.
3.3 Hard problems
In the previous section we distinguished between fast and slow algorithms for a particular problem. In this section we turn our attention to the following question: given a particular problem, can we hope to have a fast algorithm to solve that problem?
3.3.1 Decision problems - some examples
A problem where the answer is either YES or NO is known as a decision problem. We have already seen some examples of decision problems.

LP feasibility.
Given: Matrix $A$ and vector $b$.
Question: Is there a solution $x$ to $Ax \le b$?

0,1 feasibility.
Given: Matrix $A$ and vector $b$.
Question: Is there a solution $x \in \{0,1\}^n$ to $Ax \le b$?
Let $x_1, x_2, \ldots, x_n$ denote Boolean variables, so $x_j$ is either TRUE or FALSE. A literal is either a variable $x_j$ or its complement $\bar{x}_j$. A clause is a disjunction of a finite collection of literals (e.g., clause $C_j$ can be $(x_5 \vee \bar{x}_3 \vee x_2 \vee x_{10})$). A formula is a conjunction of a finite collection of clauses. For example,
\[ (x_5 \vee \bar{x}_3 \vee x_2 \vee x_{10}) \wedge (\bar{x}_1) \wedge (x_2 \vee x_4 \vee x_8 \vee x_9 \vee x_7). \]
A formula is satisfied by an assignment of TRUE/FALSE values to its variables if every clause is satisfied. For instance, in the above example, we can satisfy the formula by assigning $x_1$ FALSE, $x_2$ TRUE, $x_3$ FALSE, and all other variables any TRUE/FALSE value.
Satisfiability.
Given: A Boolean formula.
Question: Is there an assignment of values to the variables that satisfies the formula?
We can reformulate optimization problems as decision problems. For instance, given a graph with non-negative weights and vertices $s, t$, rather than asking for the length of the shortest $st$-path, we can consider the following decision problem.

Shortest $st$-path.
Given: Graph $G$, vertices $s, t$, non-negative edge weights $w$ and real number $k$.
Question: Is there an $st$-path of $G$ of length at most $k$?
3.3.2 Polynomial reducibility
We revisit the idea of reducibility introduced in Section 2.6.1. Consider two decision problems, say A and B. If, given a polynomial-time algorithm to solve all instances of problem B, we can solve every instance of problem A in polynomial time, then we say A is polynomially-reducible to B.
Example 3. For instance, suppose that you had a polynomial-time algorithm to solve the 0,1 feasibility problem. You could then proceed as follows to solve Satisfiability. Given a formula with clauses $C_1, C_2, \ldots, C_m$ and variables $x_1, x_2, \ldots, x_n$, define the following 0,1 feasibility problem (we can multiply all constraints by $-1$ to obtain $\le$ constraints):
\[ \sum_{j : x_j \in C_i} x_j \;+\; \sum_{j : \bar{x}_j \in C_i} (1 - x_j) \;\ge\; 1 \qquad (i \in \{1, \ldots, m\}) \qquad (3.1) \]
\[ x_j \in \{0, 1\} \qquad (j \in \{1, \ldots, n\}) \qquad (3.2) \]
Then it can be readily checked that there is a satisfying assignment for the formula if and only if there is a solution to (3.1), (3.2). Hence, Satisfiability is polynomially-reducible to 0,1 feasibility.
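As an illustration (our own sketch, not from the notes), the reduction of Example 3 is easy to implement: represent each clause as a list of signed variable indices, and emit one row of $A$ and one entry of $b$ per clause, in $Ax \le b$ form.

import numpy as np

def sat_to_01_feasibility(clauses, n):
    """Encode a CNF formula as 0,1 feasibility in the form Ax <= b.

    Each clause is a list of nonzero ints: +j means literal x_j,
    -j means its complement (variables are 1-indexed).
    Constraint (3.1), sum of x_j over positive literals plus sum of
    (1 - x_j) over negated literals >= 1, is multiplied by -1.
    """
    A = np.zeros((len(clauses), n))
    b = np.zeros(len(clauses))
    for i, clause in enumerate(clauses):
        negated = sum(1 for lit in clause if lit < 0)
        for lit in clause:
            A[i, abs(lit) - 1] = -1 if lit > 0 else 1
        b[i] = negated - 1  # right-hand side after multiplying by -1
    return A, b

# (x1 v ~x2) ^ (~x1 v x2) becomes Ax <= b with x in {0,1}^2:
A, b = sat_to_01_feasibility([[1, -2], [-1, 2]], n=2)
print(A, b)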
3.3.3 The P and NP-complete classes
The class P is the class of all decision problems for which there exists a polynomial-time algorithm. Another important class of decision problems is the NP-complete class. To avoid technical difficulties we shall not give a formal definition; we point out, however, a key property:

for any two NP-complete decision problems $Q_1$ and $Q_2$, we have that $Q_2$ is polynomially-reducible to $Q_1$ and that $Q_1$ is polynomially-reducible to $Q_2$.

(Here we say that a problem "is NP-complete" to mean that it belongs to the class of NP-complete problems.) In particular, this implies that if there exists a polynomial-time algorithm to solve any problem that is NP-complete, then there exists a polynomial-time algorithm to solve every problem that is NP-complete. Since there are hundreds of NP-complete problems and nobody knows how to solve any of them in polynomial time, the prevailing view is that no polynomial-time algorithm exists for such problems, though no one has been able to give a proof of that conjecture. Indeed, it is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute to carry a US$1,000,000 prize for the first correct solution.

We say that a decision problem Q is NP-hard if some NP-complete problem is polynomially-reducible to Q. Satisfiability is known to be NP-complete. Since we proved in Example 3 that Satisfiability is polynomially-reducible to 0,1 feasibility, it follows that 0,1 feasibility is NP-hard. Suppose you are asked by your boss to find a polynomial-time algorithm to solve a decision problem. In spite of your hard work, you are unable to find such an algorithm. You could of course admit failure to your boss; a much better option would be for you to prove that the problem is NP-hard (by exhibiting a polynomial reduction from an NP-complete problem). This should at least convince your boss that firing you is unlikely to help her, as no one knows a polynomial-time algorithm for this problem (and quite possibly none exists).

While it is conjectured that the classes P and NP-complete are disjoint, problems with very similar formulations appear in both classes, as the following table indicates:
Polynomial | NP-complete
(1) LP feasibility. | 0,1 feasibility.
(2) Given graph $G = (V, E)$, vertices $s, t$, $c \in \mathbb{N}^E$ and $k \in \mathbb{N}$: is there an $st$-path of length $\le k$? | Given graph $G = (V, E)$, vertices $s, t, u$, $c \in \mathbb{N}^E$ and $k \in \mathbb{N}$: is there an $st$-path of length $\le k$ using $u$?
(3) Given graph $G = (V, E)$ and $k \in \mathbb{N}$: is there a matching of cardinality $k$? | Given a hypergraph $(V, \mathcal{E})$ and $k \in \mathbb{N}$: is there a matching of cardinality $k$?
Row (1) of the table shows that restricting the variables to 0,1 values makes the problem difficult. With (2) we see that adding the seemingly innocuous condition that the $st$-path visit a prescribed vertex $u$ is sufficient to make the problem difficult. A hypergraph is a pair $(V, \mathcal{E})$ where $V$ is the set of vertices and $\mathcal{E}$ is the set of hyperedges, which are subsets of the vertices. Thus a graph is a hypergraph where every hyperedge has cardinality 2. A matching is a set of pairwise disjoint hyperedges. Row (3) indicates that finding a maximum cardinality matching in a hypergraph is likely to be hard, while a maximum cardinality matching in a graph can be found by a fast algorithm.
Chapter 4
Introduction to duality
Suppose your boss asked you to solve an optimization problem and that you were able to find a feasible solution $x^*$ for it. Your boss now naturally asks you: "Is $x^*$ optimal?" Suppose that you know the answer to the question, maybe because you're a genius, or maybe because you worked for days on this problem. But in the end, you will have to convince your boss that your answer is correct, and he is not a genius and too busy to work for days on the problem. If the answer is no, there is an easy way to do this: just show your boss a better solution. If the answer is yes, we want a similarly easy way to convince your boss that every feasible solution has objective value at most that of $x^*$.

In the previous chapter, we showed that when the Simplex algorithm finds an optimal basic solution $x^*$ it also provides a proof that no solution has value larger than that of $x^*$. In this chapter, we introduce the concept of duality, which allows us to generate succinct certificates of optimality for optimal solutions to linear programs. We can also obtain such certificates of optimality for special types of integer programs. Duality, as we will see, is even more powerful than that, and can be used in the design of algorithms to solve optimization problems.
4.1 A first example: Shortest paths

Let us start with an example. Recall the shortest path problem from Chapter 1, where we are given a graph $G = (V, E)$, non-negative lengths $c_e$ for all edges $e \in E$, and two specific vertices $s, t \in V$. An $st$-path $P$ is a sequence of edges
\[ v_1 v_2,\; v_2 v_3,\; \ldots,\; v_{k-2} v_{k-1},\; v_{k-1} v_k \]
in $E$ such that $v_1 = s$, $v_k = t$, and $v_i \ne v_j$ for all $i \ne j$. We are looking for an $st$-path of minimum total length $\sum (c_e : e \in P)$.
[Figure: a graph with vertices s, a, b, c, d, t and edge lengths 3, 2, 1, 2, 2, 4, 1, 4; a thick $st$-path of total length 7 is highlighted.]
The figure shows an instance of this problem. Each of the edges in the graph is labeled by its length. The thick black edges in the graph form an $st$-path $P$ of total length 7. Is this a shortest path? The answer is yes, but how could you convince your boss?
[Figure: the same graph with four moats around s, labeled $(\{s\}, 3)$, $(\{s,a\}, 1)$, $(\{s,a,c\}, 2)$, and $(\{s,a,b,c,d\}, 1)$.]
Here is a nice way of accomplishing this. The figure shows the same graph as before together with a set of four moats, each of which separates $s$ from $t$. Each moat is labeled by a pair $(S_i, y_i)$, where

1. $S_i$ is the set of vertices of $V$ that are on the $s$-side of the moat, and
2. $y_i$ is the width of the moat.
We say that an edge $uv \in E$ crosses a moat $S_i$ if $u$ is in $S_i$ and $v$ is outside, in $V \setminus S_i$. For example, edge $ab$ in the figure crosses moats $\{s, a\}$ and $\{s, a, c\}$. We say that a collection $S = \{(S_1, y_1), \ldots, (S_q, y_q)\}$ of moats and their widths is feasible if

(a) each moat separates $s$ from $t$; i.e., $s \in S_i$ and $t \in V \setminus S_i$ for all $i$, and

(b) each edge crosses moats of total width no larger than its length; i.e.,
\[ \sum (y_i : u \in S_i,\; v \in V \setminus S_i) \;\le\; c_{uv} \]
for all $uv \in E$.

One easily verifies that the moat system in the above example is feasible.
Proposition 15. Let $\{(S_i, y_i)\}_{i=1}^q$ be a feasible collection of moats together with their widths. The length of any $st$-path $P$ in $G$ is then at least $y_1 + \cdots + y_q$.
Proof. Let $P = v_1 v_2, v_2 v_3, \ldots, v_{k-1} v_k$ be an $st$-path in $G$. For each $1 \le j \le k-1$, let $I_j$ be the set of indices of moats that are crossed by edge $v_j v_{j+1}$. Feasibility then implies that $\sum_{i \in I_j} y_i \le c_{v_j v_{j+1}}$ for all $j$, and therefore
\[ \sum_{j=1}^{k-1} \sum_{i \in I_j} y_i \;\le\; \sum_{j=1}^{k-1} c_{v_j v_{j+1}} \;=\; c(P). \]
Again by feasibility, each moat separates $s$ from $t$, and Theorem 1 implies that $P$ must contain at least one edge that crosses $S_i$, for all $i$. Thus, the left-hand side of the above inequality is at least $\sum_{i=1}^{q} y_i$. This finishes the proof of the proposition.
The collection of moats given in the example has total width 7, and by Proposition 15 this is therefore a lower bound on the length of a shortest $st$-path in $G$. On the other hand, the path $P$ depicted in the figure has total length 7, and it must therefore be optimal. The family of moats and widths in this example is a certificate of optimality. Such a certificate is obviously nice to have; for one thing, it will convince your boss that your path is indeed a shortest $st$-path.
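Checking such a certificate mechanically is straightforward. The Python sketch below is ours; since the exact adjacencies of the figure are not recoverable from the text, the instance used here is a consistent reconstruction with the same moats, the same edge lengths, and the same optimal value 7.

def moats_feasible(edges, moats, s, t):
    """edges: dict {(u, v): length}; moats: list of (set S_i, width y_i).
    True iff every moat separates s from t and every edge crosses
    moats of total width at most its length."""
    for S, y in moats:
        if s not in S or t in S:
            return False
    for (u, v), c in edges.items():
        crossed = sum(y for S, y in moats if (u in S) != (v in S))
        if crossed > c:
            return False
    return True

edges = {('s', 'a'): 3, ('a', 'c'): 1, ('a', 'b'): 4, ('c', 'b'): 2,
         ('c', 'd'): 2, ('b', 'd'): 2, ('b', 't'): 4, ('d', 't'): 1}
moats = [({'s'}, 3), ({'s', 'a'}, 1), ({'s', 'a', 'c'}, 2),
         ({'s', 'a', 'b', 'c', 'd'}, 1)]
print(moats_feasible(edges, moats, 's', 't'),
      "lower bound =", sum(y for _, y in moats))  # True, lower bound = 7

In this reconstruction the path $sa, ac, cd, dt$ has length $3 + 1 + 2 + 1 = 7$, matching the lower bound, so it is shortest.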
4.2 Bounding the optimal value of a linear program
Let us now consider the linear program
\[ \min\{ z(x) = c^T x : Ax \ge b,\; x \ge 0 \}, \qquad (4.1) \]
where
\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 20 \\ 18 \\ 8 \end{pmatrix}, \qquad c = \begin{pmatrix} 2 \\ 3 \end{pmatrix}. \]
Since the objective function is min, and the constraints are $Ax \ge b$, this LP is not in SEF. It is easy to find feasible solutions; e.g., the vectors $(8, 16)^T$, $(4, 14)^T$ and $(5, 13)^T$ all satisfy the constraints of (4.1). Their objective values are 64, 50, and 49, respectively, and hence the first two candidate solutions are clearly not optimal. We claim that $(5, 13)^T$ is optimal, and will now develop a succinct argument to show this.
now develop a succinct argument to show this.
The argument is similar to the one in Section 2.1.2. We will prove that z( x) 49 for every
feasible solution x. Since (5, 13)
T
has value 49, it must then be optimal. How can we nd
(and prove) such a lower bound?
Let us pick non-negative values y
1
, y
2
and y
3
and create a new inequality by multiplying
the rst inequality of Ax b by y
1
, the second inequality by y
2
, the third by y
3
and by adding
the resulting inequalities. This can be expressed as y
T
Ax y
T
b or equivalently as,
(y
1
, y
2
, y
3
)
_
_
_
2 1
1 1
1 1
_
_
_
x (y
1
, y
2
, y
3
)
_
_
_
20
18
8
_
_
_
. (4.2)
It is important to observe that we must require that y is non-negative in order to preserve the
direction of the inequalities. If we choose values, y
1
= 0, y
2
= 2 and y
3
= 1 we obtain the
inequality,
(1, 3)x 44
or equivalently,
0 44(1, 3)x.
This inequality holds for every feasible solution of (4.1). Adding this inequality to the objec-
tive function z(x) = (2, 3)x yields,
z(x) (2, 3)x +44(1, 3)x = 44+(1, 0)x.
Let x be any feasible solution. As x 0 and (1, 0) 0
T
we have (1, 0) x 0. Hence, z( x) 44.
Thus, we have proved that no feasible solution has value smaller than 44. Note, this is not
c Department of Combinatorics and Optimization, University of Waterloo Winter 2012
CHAPTER 4. INTRODUCTION TO DUALITY 87
quite sufcient to prove that (5, 13)
T
is optimal. It shows however, that the optimum value for
(4.1) is between 44 and 49. It is at most 49, as we have a feasible solution with that value, and
it cannot be smaller than 44 by the previous argument.
Let us search for $y_1, y_2, y_3 \ge 0$ in a systematic way. We rewrite (4.2) as
\[ 0 \;\ge\; (y_1, y_2, y_3) \begin{pmatrix} 20 \\ 18 \\ 8 \end{pmatrix} - (y_1, y_2, y_3) \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix} x \]
and add it to the objective function $z(x) = (2, 3)\, x$ to obtain
\[ z(x) \;\ge\; (y_1, y_2, y_3) \begin{pmatrix} 20 \\ 18 \\ 8 \end{pmatrix} + \left( (2, 3) - (y_1, y_2, y_3) \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix} \right) x. \qquad (4.3) \]
Suppose that we pick $y_1, y_2, y_3 \ge 0$ such that
\[ (2, 3) - (y_1, y_2, y_3) \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix} \;\ge\; 0. \]
Then for any feasible solution $\bar{x}$, inequality (4.3) implies that
\[ z(\bar{x}) \;\ge\; (y_1, y_2, y_3) \begin{pmatrix} 20 \\ 18 \\ 8 \end{pmatrix}. \]
For a minimization problem, the larger the lower bound the better. Thus, the best possible lower bound for (4.1) that we can achieve using the above argument is given by the optimal value of the following linear program:
\[ \begin{array}{ll} \max & (y_1, y_2, y_3) \begin{pmatrix} 20 \\ 18 \\ 8 \end{pmatrix} \\ \text{subject to} \\ & (2, 3) - (y_1, y_2, y_3) \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix} \ge 0 \\ & y_1, y_2, y_3 \ge 0, \end{array} \]
which we can rewrite as
\[ \begin{array}{ll} \max & (20, 18, 8)\, y \\ \text{subject to} \\ & \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix}^T y \le \begin{pmatrix} 2 \\ 3 \end{pmatrix} \\ & y \ge 0. \end{array} \qquad (4.4) \]
Solving this linear program gives
\[ y_1 = 0, \quad y_2 = \tfrac{5}{2}, \quad \text{and} \quad y_3 = \tfrac{1}{2}, \]
and this solution has objective value 49. Since solution $(5, 13)^T$ has value 49, it is optimal. Linear program (4.4) is called the dual of the original LP (4.1). The original LP (4.1) is referred to as the primal.
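One can verify this primal–dual pair numerically. The sketch below is ours; it assumes SciPy is available and uses scipy.optimize.linprog to solve (4.1) and (4.4), confirming that both optimal values equal 49.

import numpy as np
from scipy.optimize import linprog

A = np.array([[2, 1], [1, 1], [-1, 1]])
b = np.array([20, 18, 8])
c = np.array([2, 3])

# Primal (4.1): min c^T x s.t. Ax >= b, x >= 0 (flip signs for <=).
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None))
# Dual (4.4): max b^T y s.t. A^T y <= c, y >= 0 (negate to minimize).
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(0, None))

print(primal.x, primal.fun)   # approx [5, 13] and 49
print(dual.x, -dual.fun)      # approx [0, 2.5, 0.5] and 49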
Let us generalize the previous argument and consider the following primal linear program:
\[ \min\{ c^T x : Ax \ge b,\; x \ge 0 \}. \qquad (4.5) \]
We first choose a vector $y \ge 0$ and create a new inequality,
\[ y^T A x \ge y^T b. \]
This inequality is obtained from $Ax \ge b$ by multiplying the 1st inequality by $y_1$, the 2nd by $y_2$, the 3rd by $y_3$, etc., and adding all of the resulting inequalities together. It can be rewritten as
\[ 0 \ge y^T b - y^T A x, \]
which holds for every feasible solution $x$ of (4.5). Thus, adding this inequality to the objective function $z(x) = c^T x$ yields
\[ z(x) \;\ge\; y^T b + c^T x - y^T A x \;=\; y^T b + (c^T - y^T A)\, x. \qquad (4.6) \]
Suppose that, because of the choice of $y$, $c^T - y^T A \ge 0^T$. Let $\bar{x}$ be any feasible solution. As $\bar{x} \ge 0$ we have $(c^T - y^T A)\, \bar{x} \ge 0$. It then follows by (4.6) that $z(\bar{x}) \ge y^T b$. Thus, we have shown that for all $y \ge 0$ such that $c^T - y^T A \ge 0^T$, the value $y^T b$ is a lower bound on the value of the objective function. Finally, note that the condition $c^T - y^T A \ge 0^T$ is equivalent to $y^T A \le c^T$, i.e. to $A^T y \le c$.

The best lower bound we can get in this way is the optimal value of
\[ \max\{ b^T y : A^T y \le c,\; y \ge 0 \}. \qquad (4.7) \]
This is the dual of the linear program (4.5).
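Since (4.7) is obtained purely mechanically from the data $(c, A, b)$, a one-line helper suffices; this little sketch (ours) returns the dual data for an LP of form (4.5).

import numpy as np

def dual_of_ge_form(c, A, b):
    """Dual of min{c^T x : Ax >= b, x >= 0} as in (4.7):
    max{b^T y : A^T y <= c, y >= 0}. Returns the dual's
    (objective vector, constraint matrix, right-hand side)."""
    return b, A.T, c

obj, M, rhs = dual_of_ge_form(c=np.array([2, 3]),
                              A=np.array([[2, 1], [1, 1], [-1, 1]]),
                              b=np.array([20, 18, 8]))
# max obj^T y subject to M y <= rhs, y >= 0 -- exactly (4.4).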
Example 4. Consider the problem
\[ \begin{array}{ll} \min & (5, 6, 8, 4, 1)\, x \\ \text{subject to} \\ & \begin{pmatrix} 2 & 1 & 1 & 1 & 0 \\ 1 & 3 & 1 & 1 & 1 \\ 2 & 0 & 3 & 1 & 1 \end{pmatrix} x \ge \begin{pmatrix} 1 \\ 9 \\ 6 \end{pmatrix} \\ & x \ge 0. \end{array} \]
Its dual is given by
\[ \begin{array}{ll} \max & (1, 9, 6)\, y \\ \text{subject to} \\ & \begin{pmatrix} 2 & 1 & 2 \\ 1 & 3 & 0 \\ 1 & 1 & 3 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} y \le \begin{pmatrix} 5 \\ 6 \\ 8 \\ 4 \\ 1 \end{pmatrix} \\ & y \ge 0. \end{array} \]
By construction, the objective value of any feasible solution $y$ to (4.7) provides a lower bound on the objective value of any feasible solution $x$ to (4.5). This is summarized in the following theorem. Consider
\[ \min\{ c^T x : Ax \ge b,\; x \ge 0 \} \qquad (P) \]
\[ \max\{ b^T y : A^T y \le c,\; y \ge 0 \} \qquad (D) \]

Theorem 16 (Weak Duality). For every feasible solution $x$ of (P) and every feasible solution $y$ of (D) we have $b^T y \le c^T x$.

Proof. Let $x$ be a feasible solution of (P) and let $y$ be a feasible solution of (D). Then
\[ b^T y = y^T b \;\le\; y^T (A x) = (A^T y)^T x \;\le\; c^T x. \]
The first inequality follows from the fact that $y \ge 0$ and $Ax \ge b$. The second inequality follows from the fact that $A^T y \le c$ and $x \ge 0$.
We have already used the following immediate consequence of this theorem.

Corollary 17. If $x^*$ is a feasible solution of (P), $y^*$ is a feasible solution of (D), and $c^T x^* = b^T y^*$, then $x^*$ is an optimal solution of the primal, and $y^*$ is an optimal solution of the dual.

Proof. Since $c^T x$ cannot be smaller than $b^T y^*$ for any feasible solution $x$ of (P), and since $x^*$ achieves this bound, $x^*$ is optimal. The same argument applies to $y^*$.
4.3 The shortest-path example revisited
Let us return to the shortest path problem from Section 4.1, where we were given a graph $G = (V, E)$, lengths $c_e \ge 0$ for all edges $e \in E$, and two vertices $s, t \in V$. The figure below shows a somewhat simpler instance of this problem. Like before, vertices and edges are labeled by their name and length, respectively.
[Figure: a graph with vertices s, a, b, t and edges sa, sb, ab, at, bt of lengths 3, 4, 1, 2, 2.]
We recall the integer programming formulation for this problem from Section 1.2.2: the IP has a variable $x_e$ for each edge $e \in E$; the variable takes on value 1 if the corresponding edge is part of the selected $st$-path, and 0 otherwise. There is a constraint for each $st$-cut. Recall that, for a subset of vertices $S$, $\delta(S)$ denotes the set of all edges that have one endpoint in $S$ and one in its complement $V \setminus S$; $\delta(S)$ is an $st$-cut if $S$ contains $s$ but not $t$. In the above example, the list of such sets $S$ is as follows:
\[ S = \{s\},\; \{s, a\},\; \{s, b\},\; \{s, a, b\}, \]
and the following table lists the corresponding $st$-cuts:

$S$          | $\delta(S)$
$\{s\}$      | sa, sb
$\{s, a\}$   | sb, ab, at
$\{s, b\}$   | sa, ab, bt
$\{s, a, b\}$ | at, bt
With this we can write down the integer program:
\[ \begin{array}{ll} \min & (3, 4, 1, 2, 2)\,(x_{sa}, x_{sb}, x_{ab}, x_{at}, x_{bt})^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_{sa} \\ x_{sb} \\ x_{ab} \\ x_{at} \\ x_{bt} \end{pmatrix} \ge \mathbf{1} \\ & 0 \le x \le 1,\; x \text{ integer.} \end{array} \qquad (4.8) \]
Once again, we obtain the linear programming relaxation of (4.8) by ignoring the condition that $x$ is integer. We may also drop the upper-bound constraints $x \le 1$, as in an optimal solution no variable will ever take on a value greater than 1 (we encourage the reader to convince herself that this is indeed true). The dual of this LP has a variable $y_S$ for every $st$-cut $\delta(S)$, and a constraint for each edge $e \in E$.
\[ \begin{array}{ll} \max & \mathbf{1}^T (y_{\{s\}}, y_{\{s,a\}}, y_{\{s,b\}}, y_{\{s,a,b\}})^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} y_{\{s\}} \\ y_{\{s,a\}} \\ y_{\{s,b\}} \\ y_{\{s,a,b\}} \end{pmatrix} \le \begin{pmatrix} 3 \\ 4 \\ 1 \\ 2 \\ 2 \end{pmatrix} \\ & y \ge 0 \end{array} \qquad (4.9) \]
Constraint 1 of the dual states that $y_{\{s\}} + y_{\{s,b\}} \le 3$. The variables on the left-hand side of this inequality correspond to the $st$-cuts $\delta(\{s\})$ and $\delta(\{s, b\})$. Notice that these are precisely the $st$-cuts that contain edge $sa$! The right-hand side is the length of edge $sa$, and hence this constraint restricts the total value of the variables of $st$-cuts that contain $sa$ to the length of edge $sa$.
[Figure: the graph with moats $(\{s\}, 3)$, $(\{s,a\}, 1)$, and $(\{s,a,b\}, 1)$.]
Solving (4.9) gives the following optimal solution of value 5:
\[ y_{\{s\}} = 3, \quad y_{\{s,a\}} = 1, \quad \text{and} \quad y_{\{s,a,b\}} = 1. \]
The figure illustrates this solution. It is drawn suggestively, and should remind the reader of the moat certificate from Section 4.1. Each $st$-cut $\delta(S)$ corresponds in a natural way to a moat separating $S$ from $V \setminus S$. For example, the moat corresponding to $\{s, a\}$ separates $s$ and $a$ from $b$ and $t$. Variable $y_S$ is the width of moat $S$. The system of moats and widths corresponding to the above dual solution is
\[ (\{s\}, 3),\; (\{s, a\}, 1),\; (\{s, a, b\}, 1), \]
and is easily checked to be feasible: each edge $e$ crosses moats of total width at most $c_e$. Proposition 15 therefore shows that the length of any $st$-path must be at least the sum of the moat widths, i.e. 5. This is equal to the length of the path $sa, at$, and we therefore conclude that it is a shortest $st$-path.
As we will see, it is no coincidence that the dual solution $y$ corresponds to a feasible set of moat widths. In fact, the dual constraints enforce this! For example, consider edge $at$; the dual constraint for this edge states
\[ y_{\{s,a\}} + y_{\{s,a,b\}} \le 2. \]
The left-hand side is the sum of the widths of the moats that are crossed by $at$, and the right-hand side is the cost of the edge.
In a general instance of the shortest path problem, we are given a graph $G = (V, E)$, vertices $s, t \in V$, and lengths $c_e \ge 0$ for all $e \in E$. The primal linear program is then given by
\[ \begin{array}{ll} \min & \sum (c_e x_e : e \in E) \\ \text{subject to} \\ & \sum (x_e : e \in \delta(S)) \ge 1 \quad (\delta(S) \text{ is an } st\text{-cut}) \\ & x \ge 0 \end{array} \qquad (4.10) \]
We can rewrite (4.10) in matrix form as
\[ \min\{ c^T x : Ax \ge \mathbf{1},\; x \ge 0 \}, \]
where $c$ is the vector of edge lengths, and the matrix $A$ is defined as follows:

1. the rows of $A$ are indexed by the $st$-cuts $\delta(S)$ in $G$,
2. the columns are indexed by the edges $e \in E$, and
3. for every row $\delta(S)$ and column $e$, the corresponding entry of $A$ is 1 if $e$ is an edge in $\delta(S)$, and 0 otherwise.

The dual of (4.10) is given by
\[ \max\{ \mathbf{1}^T y : A^T y \le c,\; y \ge 0 \}. \]
Let us try to understand this dual. There is a variable $y_S$ for every $st$-cut $\delta(S)$, and a constraint for every edge $e \in E$. Consider the constraint for edge $e \in E$. The right-hand side of this constraint is the cost $c_e$ of the edge, and the left-hand side corresponds to column $e$ of $A$. This column has a 1 for every $st$-cut $\delta(S)$ that contains $e$. Thus, the left-hand side of this constraint is the sum of the dual variables $y_S$ of the $st$-cuts $\delta(S)$ that contain $e$. We can therefore rewrite the dual of (4.10) as follows:
\[ \begin{array}{ll} \max & \sum (y_S : \delta(S) \text{ is an } st\text{-cut}) \\ \text{subject to} \\ & \sum (y_S : e \in \delta(S)) \le c_e \quad (e \in E) \\ & y \ge 0 \end{array} \qquad (4.11) \]
In the language of moats, the constraint for edge $e$ says that the total width of all moats that are crossed by $e$ is at most $c_e$. Thus, the set of feasible solutions to (4.11) is exactly the set of feasible widths of moats. In other words, $\{(S_i, y_{S_i})\}_{i=1}^q$ is a feasible collection of moats and widths if and only if $(y_{S_1}, \ldots, y_{S_q})^T$ is feasible for (4.11). On the other hand, let $P$ be an $st$-path, and define $x$ by letting $x_e = 1$ whenever $e$ is an edge of $P$, and $x_e = 0$ otherwise. Then $x$ is feasible for (4.10), and Theorem 16 yields
\[ \sum_i y_{S_i} \;\le\; \sum_e c_e x_e \;=\; c(P). \]
We have used duality and the weak duality theorem to obtain a second proof of Proposition 15! To summarize, we have just seen a purely mechanical procedure to derive lower bounds on the length of a shortest $st$-path. Algorithm 1 summarizes what we have done.

Algorithm 1 A procedure to derive bounds on the length of a shortest $st$-path
1: Formulate the shortest path problem as an integer program (IP).
2: Obtain the linear programming relaxation (P) of (IP).
3: Compute the dual (D) of (P).
4: The value of any feasible solution of (D) is a lower bound for the optimal value of (P), and hence also for that of (IP).
We will now apply this procedure to a second minimization problem.
4.4 A second example: Minimum vertex cover
Given a graph $G = (V, E)$, a set $C \subseteq V$ is a vertex cover of $G$ if for every edge $uv \in E$ at least one of $u$ and $v$ is a member of $C$; i.e., formally, $C$ is a vertex cover if $C \cap \{u, v\} \ne \emptyset$ for all $uv \in E$. The goal in the minimum vertex cover problem is to find a vertex cover $C$ with the smallest number of vertices. In the following we will often say that a vertex $v$ covers an edge $e$ if $v$ is an endpoint of $e$.
[Figure: a graph with vertices a, b, c, d, e, f, g and edges ad, ae, af, be, bg, ce, cf; vertices a, b, c are dark, and d, e, f, g are light.]
Consider the instance depicted in the figure. It is not too hard to verify that both the set of dark vertices and the set of light vertices form vertex covers. Since there are four light vertices and three dark ones, the set of light vertices cannot possibly be a minimum vertex cover. But is the set of dark vertices a minimum vertex cover? It turns out that the answer is yes. We will now prove this by following the procedure described in Algorithm 1.
Step 1. Let us first formulate the vertex cover problem as an integer program. We introduce an indicator variable $x_v$ for every vertex $v \in V$. Variable $x_v$ takes on value 1 if we include vertex $v$ in our vertex cover, and $x_v = 0$ otherwise. The vertex cover corresponding to solution $x$ is therefore given by $C_x = \{v : x_v = 1\}$, and it is feasible if
\[ \{u, v\} \cap C_x \ne \emptyset, \quad \text{or equivalently} \quad x_u + x_v \ge 1, \]
for all $uv \in E$. This leads to the following integer programming formulation:
\[ \begin{array}{ll} \min & \mathbf{1}^T x \\ \text{subject to} \\ & x_u + x_v \ge 1 \quad (\text{for all } uv \in E) \\ & x \ge 0,\; x \text{ integer.} \end{array} \qquad (4.12) \]
For the instance given above, writing out this integer program gives
\[ \begin{array}{ll} \min & (1, 1, 1, 1, 1, 1, 1)\,(x_a, x_b, x_c, x_d, x_e, x_f, x_g)^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_a \\ x_b \\ x_c \\ x_d \\ x_e \\ x_f \\ x_g \end{pmatrix} \ge \mathbf{1} \\ & x \ge 0,\; (x_a, x_b, x_c, x_d, x_e, x_f, x_g)^T \text{ integer.} \end{array} \qquad (4.13) \]
Step 2. Find the linear programming relaxation of the IP. This step is easy: we just need to drop the integrality condition from the statement of the integer program. For example, the LP relaxation of (4.13) is given by
\[ \begin{array}{ll} \min & (1, 1, 1, 1, 1, 1, 1)\,(x_a, x_b, x_c, x_d, x_e, x_f, x_g)^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_a \\ x_b \\ x_c \\ x_d \\ x_e \\ x_f \\ x_g \end{pmatrix} \ge \mathbf{1} \\ & (x_a, x_b, x_c, x_d, x_e, x_f, x_g)^T \ge 0. \end{array} \qquad (4.14) \]
Step 3. Compute the dual of the LP relaxation. We do this by simply applying the formula in (4.7) to (4.14):
\[ \begin{array}{ll} \max & \mathbf{1}^T y \\ \text{subject to} \\ & \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} y_{ad} \\ y_{ae} \\ y_{af} \\ y_{be} \\ y_{bg} \\ y_{ce} \\ y_{cf} \end{pmatrix} \le \mathbf{1} \\ & (y_{ad}, y_{ae}, y_{af}, y_{be}, y_{bg}, y_{ce}, y_{cf})^T \ge 0. \end{array} \qquad (4.15) \]
Solving (4.15) gives the solution
\[ y_{ad} = y_{be} = y_{cf} = 1, \]
and $y_e = 0$ for all other edges $e$. This solution is indicated by thick black edges in the figure at the beginning of this section. Notice that the value of $y$ is 3, and hence, by Theorem 16 and the arguments of the previous section, no vertex cover can have fewer than 3 vertices. On the other hand, $\{a, b, c\}$ is a vertex cover of size 3, and hence, by Corollary 17, it is an optimal vertex cover for the given instance.
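A quick script (ours) confirms both halves of this argument: $\{a, b, c\}$ covers every edge, and $\{ad, be, cf\}$ is a matching, so by weak duality 3 is optimal.

edges = ["ad", "ae", "af", "be", "bg", "ce", "cf"]
cover = {"a", "b", "c"}
matching = ["ad", "be", "cf"]

# Every edge has an endpoint in the cover.
assert all(e[0] in cover or e[1] in cover for e in edges)
# The matching edges are pairwise vertex-disjoint.
assert len({v for e in matching for v in e}) == 2 * len(matching)
# Any cover needs one distinct vertex per matching edge:
print(len(matching), "<= size of every vertex cover <=", len(cover))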
Let us repeat Step 3 for a general instance, and hence compute the dual of the LP relaxation of (4.12). In order to do this, we first rewrite the LP in matrix form as
\[ \min\{ \mathbf{1}^T x : Ax \ge \mathbf{1},\; x \ge 0 \}. \qquad (4.16) \]
The matrix $A$ is as follows:

1. the rows of $A$ are indexed by the edges of $G$,
2. the columns of $A$ are indexed by the vertices of $G$,
3. $A_{ev}$ has value 1 if $v$ is an endpoint of $e$, and $A_{ev} = 0$ otherwise.
The dual of (4.16) is as follows:
\[ \max\{ \mathbf{1}^T y : A^T y \le \mathbf{1},\; y \ge 0 \}. \]
Let us try to understand what this formulation says. There is one dual variable $y_e$ for each edge $e \in E$, and one constraint for each vertex $v \in V$. The constraint for vertex $v \in V$ simply restricts the sum of the variables $y_e$ over edges with one end in $v$ to be at most 1. We can therefore rewrite the dual as follows:
\[ \begin{array}{ll} \max & \mathbf{1}^T y \\ \text{subject to} \\ & \sum (y_e : e \in \delta(v)) \le 1 \quad (\text{for all } v \in V) \\ & y \ge 0 \end{array} \qquad (4.17) \]
Note that a 0,1-vector $y$ is feasible for (4.17) if and only if $y$ is the incidence vector of a matching in $G$; i.e., the set
\[ \{ e \in E : y_e = 1 \} \]
is a matching in $G$. The value of such a solution $y$ is then simply the size of the matching. We therefore obtain the following immediate consequence of Theorem 16.
Corollary 18. Let $G = (V, E)$ be a graph, $C \subseteq V$ a vertex cover, and $M \subseteq E$ a matching in $G$. Then the size of $C$ must be at least the size of $M$.

The preceding corollary gives us a simple lower bound on the size of any vertex cover. We obtained this lower bound by following the procedure given in Algorithm 1.
4.5 Duals of general linear programs
In Section 4.2 we showed how to compute the dual of linear programs of the form
\[ \min\{ c^T x : Ax \ge b,\; x \ge 0 \}. \]
We will now see how to compute duals of linear programs in other forms. Let us consider a primal linear program in SEF (standard equality form):
\[ \max\{ c^T x : Ax = b,\; x \ge 0 \}. \qquad (P) \]
We showed in Proposition 3 that for any $y$ such that $y^T A \ge c^T$, the value $b^T y$ is an upper bound for (P). The best upper bound we can get in this way is the optimal value of
\[ \min\{ b^T y : A^T y \ge c \}. \qquad (D) \]
Note that $y$ is a free variable in this LP. The linear program (D) is called the dual of (P).
Example 5. The dual of
\[ \begin{array}{ll} \max & (5, 6, 8, 4, 1)\, x \\ \text{subject to} \\ & \begin{pmatrix} 2 & 1 & 1 & 1 & 0 \\ 1 & 3 & 1 & 1 & 1 \\ 2 & 0 & 3 & 1 & 1 \end{pmatrix} x = \begin{pmatrix} 1 \\ 9 \\ 6 \end{pmatrix} \\ & x \ge 0 \end{array} \]
is given by
\[ \begin{array}{ll} \min & (1, 9, 6)\,(y_1, y_2, y_3)^T \\ \text{subject to} \\ & \begin{pmatrix} 2 & 1 & 2 \\ 1 & 3 & 0 \\ 1 & 1 & 3 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \ge \begin{pmatrix} 5 \\ 6 \\ 8 \\ 4 \\ 1 \end{pmatrix}. \end{array} \]
4.5.1 Finding duals of general LPs
To find the dual of an arbitrary linear program (P) we can proceed as follows:

Algorithm 2 Finding the dual of a general linear program.
1: Rewrite the given LP (P) as an equivalent LP (P′) in SEF.
2: Find the dual (D′) of (P′).
3: Simplify (D′) to obtain the dual (D) of (P).
Let us apply the above scheme to find the dual of (D). We first rewrite the LP in SEF. The objective function is $\min b^T y$, i.e. $-\max -b^T y$. The constraints $A^T y \ge c$ are equivalent to $-A^T y \le -c$. Using slack variables we rewrite this as $-A^T y + z = -c$ where $z \ge 0$. We replace the free variables $y$ by non-negative variables $y^+, y^-$ with $y = y^+ - y^-$. The resulting LP is
\[ \begin{array}{ll} \max & (-b^T, b^T, 0)\,(y^+, y^-, z)^T \\ \text{subject to} \\ & \begin{pmatrix} -A^T & A^T & I \end{pmatrix} \begin{pmatrix} y^+ \\ y^- \\ z \end{pmatrix} = -c \\ & y^+, y^-, z \ge 0. \end{array} \]
We now take the dual to obtain
\[ \begin{array}{ll} \min & -c^T x \\ \text{subject to} \\ & \begin{pmatrix} -A \\ A \\ I \end{pmatrix} x \ge \begin{pmatrix} -b \\ b \\ 0 \end{pmatrix}. \end{array} \]
We can rewrite $\min -c^T x$ as $-\max c^T x$. The constraints $-Ax \ge -b$ and $Ax \ge b$ are together equivalent to $Ax = b$, and $Ix \ge 0$ says $x \ge 0$. Thus, this linear program is simply (P). Hence, we have proved that the dual of the dual of (P) is (P) itself. This holds in general:
Remark 19. The dual of the dual of a linear program is the original linear program.

In particular, linear programs come in pairs, and in duality statements we can interchange the roles of the primal and the dual. Let us give a second example of an application of Algorithm 2. Consider the following LP:
\[ \begin{array}{ll} \max & (1, 2, 4)\,(x_1, x_2, x_3)^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 5 & 3 \\ 2 & 1 & 2 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \begin{matrix} \ge \\ \le \\ = \end{matrix} \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\ & x_1, x_2 \ge 0 \quad (x_3 \text{ is free}). \end{array} \qquad (4.18) \]
It was shown in Section 2.2 that this linear program is equivalent to the following LP in SEF:
\[ \begin{array}{ll} \max & (1, 2, 4, -4, 0, 0)\,(x_1, x_2, x_3^+, x_3^-, x_4, x_5)^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 5 & 3 & -3 & 0 & -1 \\ 2 & 1 & 2 & -2 & 1 & 0 \\ 1 & 2 & 1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} \\ & x_1, x_2, x_3^+, x_3^-, x_4, x_5 \ge 0. \end{array} \]
The dual of that linear program is now given by
\[ \begin{array}{ll} \min & (5, 4, 2)\,(y_1, y_2, y_3)^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 2 & 1 \\ 5 & 1 & 2 \\ 3 & 2 & 1 \\ -3 & -2 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \ge \begin{pmatrix} 1 \\ 2 \\ 4 \\ -4 \\ 0 \\ 0 \end{pmatrix}. \end{array} \]
Notice first that the third and fourth constraints together are equivalent to the equality constraint
\[ (3, 2, 1)\, y = 4. \]
Moreover, the last two constraints say that $y_1$ must be non-positive and $y_2$ must be non-negative. Hence, we can simplify this linear program and rewrite it as
\[ \begin{array}{ll} \min & (5, 4, 2)\,(y_1, y_2, y_3)^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 2 & 1 \\ 5 & 1 & 2 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \begin{matrix} \ge \\ \ge \\ = \end{matrix} \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} \\ & y_1 \le 0,\; y_2 \ge 0 \quad (y_3 \text{ is free}). \end{array} \qquad (4.19) \]
Now compare the primal LP (4.18) to its dual (4.19). The primal LP is a maximization problem, and the dual is a minimization problem. As before, each of the primal constraints has an associated dual variable. Specifically, a $\le$ constraint in the primal translates into a non-negative dual variable (e.g., the second constraint of (4.18) and variable $y_2$), a $\ge$ constraint in the primal translates into a non-positive dual variable (e.g., the first constraint of (4.18) and $y_1$), and an equality constraint gives rise to a free dual variable (e.g., constraint number 3 in (4.18) and $y_3$). Similarly, non-negative primal variables yield $\ge$ constraints in the dual (e.g., primal variable $x_1$ and the first constraint of (4.19)), and free primal variables give rise to equality constraints in the dual (e.g., variable $x_3$ and the third dual constraint). The following table summarizes these correspondences and gives us a direct way of finding the dual of any linear program.
Primal (max $c^T x$, $Ax \;?\; b$, $x \;?\; 0$) | Dual (min $b^T y$, $A^T y \;?\; c$, $y \;?\; 0$)
$\le$ constraint  | $\ge 0$ variable
$=$ constraint    | free variable
$\ge$ constraint  | $\le 0$ variable
$\ge 0$ variable  | $\ge$ constraint
free variable     | $=$ constraint
$\le 0$ variable  | $\le$ constraint
In using the above table, if our primal problem is a maximization problem, then we read the table from left to right; otherwise (our primal problem is a minimization problem) we read the table from right to left. In the earlier chapters we applied various transformations to LP problems which led to equivalent LP problems. However, some of these transformations change even the number of variables and constraints in the given LP. Therefore, it is important that students work through many exercises to get comfortable with how such transformations affect the dual LP.
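The table can be implemented directly. The sketch below is our own illustration (the string encoding of senses is our convention, not the notes'); it dualizes a maximization LP given the row senses and variable senses.

# Reading the table left to right for a max primal:
# constraint sense -> dual variable sense, and
# variable sense  -> dual constraint sense.
CON_TO_VAR = {"<=": ">=0", "=": "free", ">=": "<=0"}
VAR_TO_CON = {">=0": ">=", "free": "=", "<=0": "<="}

def dualize_max(c, A, b, con_senses, var_senses):
    """Dual of: max c^T x s.t. (Ax)_i <sense_i> b_i, x_j <sense_j>.
    Returns the data of: min b^T y with senses taken from the table."""
    AT = [list(col) for col in zip(*A)]  # transpose of A
    dual_var_senses = [CON_TO_VAR[s] for s in con_senses]
    dual_con_senses = [VAR_TO_CON[s] for s in var_senses]
    return b, AT, c, dual_con_senses, dual_var_senses

# LP (4.18): senses (>=, <=, =) with x1, x2 >= 0 and x3 free.
out = dualize_max([1, 2, 4], [[1, 5, 3], [2, 1, 2], [1, 2, 1]],
                  [5, 4, 2], [">=", "<=", "="], [">=0", ">=0", "free"])
print(out)  # reproduces the data of (4.19): y1 <= 0, y2 >= 0, y3 free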
Finally, we note that the weak duality theorem (Theorem 16) extends naturally to general linear programs and their duals. Suppose that (P) is an arbitrary linear program with a maximization objective, and let (D) be its minimization dual (obtained through Algorithm 2). We obtain the following general weak duality theorem.

Theorem 20 (Weak Duality). Let $x$ and $y$ be feasible solutions for (P) and (D), respectively. Then the objective value of $x$ is at most that of $y$.

The proof of this theorem is similar to that of Theorem 16. We leave the details to the reader. The theorem has several important consequences.
Corollary 21. Consider a general linear program (P) with a maximization objective and its dual (D). Then the following holds:

1. If (P) is unbounded then (D) is infeasible.
2. If (D) is unbounded then (P) is infeasible.
3. If $x^*$ is a feasible solution of (P), $y^*$ is a feasible solution of (D), and the value of $x^*$ equals that of $y^*$, then $x^*$ is an optimal solution of the primal and $y^*$ is an optimal solution of the dual.

Proof. (1) If (D) has a feasible solution $y$ then the objective function value of (P) is bounded above by the value of $y$; in particular, (P) is not unbounded. (2) If (P) has a feasible solution $x$ then the objective function value of (D) is bounded below by the value of $x$; in particular, (D) is not unbounded. (3) Since the value of $x$ cannot be larger than that of $y^*$ for any feasible solution $x$ of (P), and since $x^*$ achieves this bound, $x^*$ is optimal. The same argument applies to $y^*$.
4.6 A third example: Maximum weight matching
Recall the maximum-weight matching problem from Chapter 1: given a graph $G = (V, E)$ and a non-negative weight $w_e$ for each edge $e \in E$, we call a set $M \subseteq E$ of edges a matching if no two edges in $M$ share a common vertex.

[Figure 4.1: A matching instance. The graph has vertices 1 through 6 and edges 12, 14, 16, 23, 34, 45, 56 with weights 2, 4, 7, 4, 5, 5, 3; the solid black edges form a matching.]
The goal is to find a matching $M$ of maximum total weight $w(M) := \sum_{e \in M} w_e$. Figure 4.1 shows an instance of this problem. The edges are labeled by their weights (the graph is drawn approximately to scale, and the Euclidean lengths of the edges are roughly proportional to their weights), and the vertices are labeled by their numbers. The solid black edges clearly form a matching $M^*$, and its weight is $4 + 5 + 7 = 16$. This turns out to be a maximum-weight matching. Let us once again apply the procedure from Algorithm 1 to prove this.
We start by formulating the matching problem as an integer program. We have a variable $x_{uv}$ for each edge $uv \in E$, where $x_{uv} = 1$ if $uv$ is in the matching and $x_{uv} = 0$ otherwise. For every vertex $u$ we can select at most one edge from $\delta(u)$. Thus the integer program is as follows:
\[ \begin{array}{ll} \max & w^T x \\ \text{subject to} \\ & \sum \left( x_{uv} : uv \in \delta(u) \right) \le 1 \quad (\text{for all } u \in V) \\ & x \ge 0,\; x \text{ integer.} \end{array} \qquad (4.20) \]
Let us denote by (P) the linear programming relaxation; i.e., (P) is obtained from the above integer program by dropping the condition that $x$ be integer. We wish to compute the dual of (P). We can rewrite (P) in matrix form as
\[ \max\{ w^T x : Ax \le \mathbf{1},\; x \ge 0 \}, \qquad (P) \]
where $\mathbf{1}$ denotes the vector of all ones. The matrix $A$ is as follows:

1. the rows of $A$ are indexed by the vertices of $G$,
2. the columns of $A$ are indexed by the edges of $G$,
3. row $u$ of $A$ is the 0,1 vector with ones in the positions corresponding to the edges in $\delta(u)$;
4. equivalently, the column of $A$ corresponding to edge $uv$ has exactly two 1-entries (in rows $u$ and $v$), and is 0 otherwise.

The dual of (P) is as follows:
\[ \min\{ \mathbf{1}^T y : A^T y \ge w,\; y \ge 0 \}. \qquad (D) \]
Let us try to understand what this formulation says. There is one dual variable $y_u$ for every vertex $u$ of the graph, and one constraint for every edge $uv$ of the graph. Note that $\mathbf{1}^T y$ is simply $\sum_{u \in V} y_u$. What is constraint $uv$ of $A^T y \ge w$? The right-hand side is simply $w_{uv}$. The coefficients of the left-hand side correspond to row $uv$ of $A^T$, i.e. to column $uv$ of $A$. It follows from item 4 of the above discussion of the matrix $A$ that the left-hand side of constraint $uv$ is $y_u + y_v$. Hence, we can rewrite (D) as follows:
\[ \begin{array}{ll} \min & \sum_{u \in V} y_u \\ \text{subject to} \\ & y_u + y_v \ge w_{uv} \quad (\text{for all } uv \in E) \\ & y_u \ge 0 \quad (\text{for all } u \in V). \end{array} \qquad (4.21) \]
In the special case of the matching instance of Figure 4.1, the dual becomes
\[ \begin{array}{ll} \min & \mathbf{1}^T y \\ \text{subject to} \\ & \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix} \ge \begin{pmatrix} 2 \\ 4 \\ 7 \\ 4 \\ 5 \\ 5 \\ 3 \end{pmatrix} \\ & (y_1, y_2, y_3, y_4, y_5, y_6) \ge 0. \end{array} \qquad (4.22) \]
Solving this LP yields the optimal solution $y^* = (4, 1, 3, 2, 3, 3)^T$ of value 16. Note that this is exactly equal to the weight of the matching $M^*$ depicted in Figure 4.1. Thus, it follows from Corollary 21 that $M^*$ is a maximum-weight matching.
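Verifying this certificate takes a few lines. The sketch below is ours; it takes the solid matching to be the edges 16, 23, 45, which is consistent with the weights $7 + 4 + 5 = 16$ quoted in the text.

edges = {(1, 2): 2, (1, 4): 4, (1, 6): 7, (2, 3): 4,
         (3, 4): 5, (4, 5): 5, (5, 6): 3}
y = {1: 4, 2: 1, 3: 3, 4: 2, 5: 3, 6: 3}
matching = [(1, 6), (2, 3), (4, 5)]

# Dual feasibility: y_u + y_v >= w_uv for every edge.
assert all(y[u] + y[v] >= w for (u, v), w in edges.items())
# The matching weight equals the dual value, so both are optimal.
assert sum(edges[e] for e in matching) == sum(y.values()) == 16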
[Figure: the matching instance with a disk of radius $y_u$ drawn at each vertex $u$.]
We will now take a second look at the matching dual (4.21), and we will argue that its feasible solutions have a nice geometric interpretation. Consider placing a disk of radius $y_u \ge 0$ at each vertex $u \in V$, and call a collection of radii $\{y_u\}_{u \in V}$ feasible if every edge $uv \in E$ is covered by the disks at its endpoints; i.e., for each edge $uv \in E$, we want $y_u + y_v \ge w_{uv}$. The figure shows such a collection of disks for our instance. With this definition, a set of radii $\{y_u\}_{u \in V}$ is feasible if and only if $y$ is feasible for (4.21). Theorem 20 now has the following intuitive reformulation.
Proposition 22. Let $\{y_u\}_{u \in V}$ be a feasible set of radii, and let $M$ be an arbitrary matching in $G$. Then the weight $w(M)$ of $M$ is at most the sum of the radii $\sum_{u \in V} y_u$.
4.7 A fourth example: Network flow
Suppose you and your friend own computers that are connected by some computer network, and you are interested in finding the maximum rate of transmission between the two of you. In other words, you want to find out the maximum number of data bits per second that the network allows you to send between each other. For simplicity we make the (probably slightly unrealistic) assumption that the network is symmetric; i.e., the transmission rate from you to your friend equals that from your friend to you.

The above question would be easy to answer if the network consisted of a single link connecting the two of you, but this is not the case; the given network connects many computers and other devices such as routers and switches. In the following, we will show how linear programming can be used to find the maximum transmission rate. Furthermore, we will use duality to derive useful upper bounds on this rate.

We can model the given network as a graph $G = (V, E)$ whose vertices correspond to the devices connected by our network, and each of whose edges $qr \in E$ represents a physical link connecting the entities corresponding to vertices $q$ and $r$. Furthermore, we assume that we know the maximum transmission rate $c_e$ for each link $e \in E$. Vertex $s$ corresponds to your computer and vertex $t$ corresponds to your friend's computer.
The following figure shows such an example. Vertices $u$ and $v$ are intermediate switches. The number next to each edge $e$ is its capacity $c_e$, which represents in this case the maximum transmission rate (in megabytes/sec) possible for the link.

[Figure: a graph with vertices s, u, v, t and edges su, sv, uv, ut, vt of capacities 40, 20, 10, 20, 40.]
A bit sent from $s$ to $t$ travels along a sequence of edges, visiting each vertex of $G$ at most once. For example, in the given graph, a bit could first traverse edge $su$, followed by edges $uv$ and $vt$. We will call sequences like $su, uv, vt$ $st$-paths. In the example, we have the following $st$-paths:
\[ P_1 = \{su, uv, vt\} \qquad P_2 = \{sv, vu, ut\} \]
\[ P_3 = \{su, ut\} \qquad\;\; P_4 = \{sv, vt\}. \]
Sending $x_j$ megabytes along path $P_j$ consumes $x_j$ megabytes of the capacity of each of the edges on $P_j$. Suppose now that we are simultaneously sending $x_j$ megabytes per second along path $P_j$, for all $1 \le j \le 4$. The total amount transmitted between $s$ and $t$ is $x_1 + x_2 + x_3 + x_4$. For every edge $e \in E$, no more than $c_e$ megabytes per second are allowed to travel along $e$. For instance, consider edge $su$: $P_1$ and $P_3$ are the two $st$-paths using $su$, so the total amount transmitted across $su$ is $x_1 + x_3$. It follows that the following constraint must be satisfied:
\[ x_1 + x_3 \le 40. \]
Similarly we obtain constraints for the edges $sv$, $uv$, $ut$, $vt$, respectively:
\[ x_2 + x_4 \le 20 \qquad x_1 + x_2 \le 10 \qquad x_2 + x_3 \le 20 \qquad x_1 + x_4 \le 40. \]
Hence the following linear program finds the maximum transmission rate between $s$ and $t$:
\[ \begin{array}{ll} \max & \mathbf{1}^T (x_1, x_2, x_3, x_4)^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} \le \begin{pmatrix} 40 \\ 20 \\ 10 \\ 20 \\ 40 \end{pmatrix} \\ & x \ge 0. \end{array} \qquad (4.23) \]
The dual of (4.23) has one variable $y_e$ for each edge $e$ of $G$ and one constraint for each $st$-path $P$:
\[ \begin{array}{ll} \min & (40, 20, 10, 20, 40)\,(y_{su}, y_{sv}, y_{uv}, y_{ut}, y_{vt})^T \\ \text{subject to} \\ & \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_{su} \\ y_{sv} \\ y_{uv} \\ y_{ut} \\ y_{vt} \end{pmatrix} \ge \mathbf{1} \\ & y \ge 0. \end{array} \qquad (4.24) \]
Constraint 1 of the dual states that $y_{su} + y_{uv} + y_{vt} \ge 1$. In other words, the sum of $y_e$ over the edges $e$ of the $st$-path $P_1$ is at least one. Suppose that $y^*$ is a feasible solution to the dual (4.24) where all the entries of $y^*$ are either 0 or 1. Let $C^*$ be the set of edges $e$ for which $y^*_e = 1$. Then constraint 1 of the dual implies that $C^*$ contains at least one edge of $P_1$. Similarly, constraints 2, 3 and 4 of the dual imply that $C^*$ contains at least one edge of $P_2$, $P_3$ and $P_4$, respectively. In other words, $C^*$ is a set which intersects every $st$-path of $G$. We have seen such sets before, in Section 4.3: they are called $st$-cuts! Removing the edges of $C^*$ from the graph disconnects $s$ from $t$.

The value of the dual (4.24) at $y^*$ is simply $\sum_{e \in C^*} c_e$, i.e. the sum of the capacities of all the edges in the $st$-cut $C^*$. By Corollary 21 we obtain immediately that the maximum transmission rate between $s$ and $t$ is at most equal to the minimum capacity of an $st$-cut.

In our example, one such $st$-cut of capacity 50 is given by $C^* = \{sv, uv, ut\}$, and hence the transmission rate between $s$ and $t$ is no more than 50 megabytes per second. This rate is realized by the primal solution $x^* = (10, 0, 20, 20)$, which can easily be verified to satisfy the constraints of (4.23). If $y^*$ is the incidence vector of $C^*$ then $x^*$ and $y^*$ are optimal solutions for the primal and the dual, respectively. Hence, $x^*$ is an optimal solution for the transmission problem and $C^*$ is a minimum $st$-cut.
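The whole example can be checked with a few lines of Python (ours, again assuming SciPy is available); we solve (4.23) and compare its value with the capacity of the cut $C^*$.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1, 0, 1, 0],   # su
              [0, 1, 0, 1],   # sv
              [1, 1, 0, 0],   # uv
              [0, 1, 1, 0],   # ut
              [1, 0, 0, 1]])  # vt
c = np.array([40, 20, 10, 20, 40])

# (4.23): max 1^T x s.t. Ax <= c, x >= 0 (negate for scipy's min).
res = linprog(-np.ones(4), A_ub=A, b_ub=c, bounds=(0, None))
cut_capacity = 20 + 10 + 20   # C* = {sv, uv, ut}
print(-res.fun, cut_capacity)  # both 50: the flow value meets the cut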
In a general instance of the transmission problem, we are given a graph $G = (V, E)$ with vertices $s$ and $t$ where $s \ne t$, and every edge $e$ has a capacity $c_e \ge 0$. For every $st$-path $P$ we have a variable $x_P$ that indicates the amount sent across the network along $P$. The primal LP is then given by
\[ \begin{array}{ll} \max & \sum \left( x_P : P \text{ is an } st\text{-path} \right) \\ \text{subject to} \\ & \sum \left( x_P : P \text{ is an } st\text{-path using } e \right) \le c_e \quad (e \text{ is an edge of } G) \\ & x_P \ge 0 \quad (P \text{ is an } st\text{-path}). \end{array} \qquad (4.25) \]
Solutions to linear programs of this form are called st-flows, and an optimal solution a maximum st-flow. We can rewrite (4.25) in matrix form as

max { 1^T x : Ax ≤ c, x ≥ 0 },

where c is the vector of capacities and where the matrix A is defined as follows:

1. the rows of A are indexed by the edges of G,

2. the columns of A are indexed by the st-paths of G,

3. for every row e and column P, the corresponding entry of A is 1 if e is an edge of P and is 0 otherwise.

The dual of (4.25) is given by

min { c^T y : A^T y ≥ 1, y ≥ 0 }.
Let us try to understand what this formulation says. There is one dual variable y_e for every edge e and there is one constraint for every st-path P. Let us try to understand one such constraint. The right-hand side of that constraint is 1. The coefficients of the left-hand side correspond to row P of A^T, i.e. to column P of A. Column P of A has a 1 for each edge e that is in P. Thus, the left-hand side of that constraint is the sum of the y_e over every edge e that is in P. Therefore, we can rewrite the dual as follows:

min  Σ ( c_e y_e : e is an edge of G )

subject to

     Σ ( y_e : e is an edge of P )  ≥  1    (P is an st-path of G)
     y_e ≥ 0                                (e is an edge of G).
                                                              (4.26)
Let C* be any st-cut of G. Define y* as follows:

    y*_e = 1 if e ∈ C*, and y*_e = 0 otherwise.

Then the objective function of (4.26) evaluated at y* is the sum of the capacities of the edges in C*. We call this quantity the capacity of the st-cut C*. Moreover, y* is clearly feasible for (4.26). Hence, by the corollary of Theorem 16, it follows that c^T y* is an upper bound for (4.25). Hence, we have shown the following result.
Remark 23. For any graph G = (V, E) with vertices s, t where s ≠ t and capacities c ≥ 0, the value of any st-flow is at most equal to the capacity of any st-cut.

The aforementioned statement is intuitively obvious. In fact, a much stronger result holds, namely:

Theorem 24 (Max-Flow Min-Cut Theorem). For any graph G = (V, E) with vertices s, t where s ≠ t and capacities c ≥ 0, the value of the maximum st-flow is equal to the capacity of the minimum st-cut.

The proof of this result is outside the scope of this course.
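As an aside, general-purpose graph libraries compute both quantities directly. Here is a small sketch (assuming the networkx package is available; each undirected edge is modeled by two opposite arcs) that confirms the theorem on our example:

import networkx as nx

G = nx.DiGraph()
for u, v, cap in [("s", "u", 40), ("s", "v", 20), ("u", "v", 10),
                  ("u", "t", 20), ("v", "t", 40)]:
    # model the undirected edge uv by two opposite arcs of the same capacity
    G.add_edge(u, v, capacity=cap)
    G.add_edge(v, u, capacity=cap)

flow_value, _ = nx.maximum_flow(G, "s", "t")
cut_value, _ = nx.minimum_cut(G, "s", "t")
print(flow_value, cut_value)  # both equal 50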
4.8 A fifth example: Scheduling

Recall the scheduling example from Chapter 1.2.3, where we are given a set of n tasks T = {1, . . . , n}. Each task i ∈ T has a start time s_i ∈ N, an end time t_i ∈ N, and a non-negative profit p_i that we incur if we complete the task. A feasible schedule is a subset S ⊆ T of jobs such that the intervals [s_i, t_i) and [s_j, t_j) are disjoint for any pair of distinct tasks i, j ∈ S. The goal is to find a feasible schedule which maximizes the total profit.

An example is provided by the figure below. This figure shows the hours of your work day, and represents the tasks in T as rectangles. Each task is labeled by a unique number (at its front). Its reward is indicated in the middle of the corresponding rectangle. An example of a feasible schedule is the set {1, 3, 5, 7}, which has a total profit of 12.
[Figure 4.2: A scheduling example over the hours 8am–6pm. Each task is labeled by its number and profit; the bottom row of the figure shows a dual solution.]
We shall derive an upper bound on the possible profit of feasible schedules following the by now familiar strategy given in Algorithm 1. Let us start by finding an IP formulation. For each task i ∈ {1, . . . , 7}, we create a variable x_i, where x_i takes value 1 if task i is one of the tasks that we intend to complete, and takes value 0 otherwise. We add a constraint for every hour 8am ≤ h ≤ 5pm, and enforce that at most one of the tasks overlapping the time window (h, h+1) is chosen. For example, for h = 8am, the constraint would be

x_1 + x_3 ≤ 1,

and for h = 1pm, we obtain

x_4 + x_5 + x_6 ≤ 1.

This yields the following integer program, whose objective is to maximize the total profit.
max  (2, 4, 3, 5, 3, 6, 3)(x_1, x_2, x_3, x_4, x_5, x_6, x_7)^T

subject to

    [ 1 1 0 0 0 0 0 ]
    [ 1 1 0 0 0 0 0 ]
    [ 0 1 1 0 0 0 0 ]
    [ 0 0 1 0 0 0 0 ]
    [ 0 0 1 1 0 0 0 ]
    [ 0 0 0 1 1 1 0 ]  x  ≤  1
    [ 0 0 0 0 1 1 0 ]
    [ 0 0 0 0 1 1 0 ]
    [ 0 0 0 0 0 1 1 ]
    [ 0 0 0 0 0 0 1 ]

    x ≥ 0, x integer.                                         (4.27)
Let (P) be the linear programming relaxation of (4.27); i.e., (P) is obtained from (4.27) by ignoring the condition that x be integer. The dual of (P) has a variable for each of the 10 constraints of the above LP; i.e., it has variables y_h for each time h ∈ {8, 9, 10, 11, 12, 1, 2, 3, 4, 5} (where 8 is 8am, 9 is 9am, 12 is noon, etc.).
min  1^T (y_8, y_9, y_10, y_11, y_12, y_1, y_2, y_3, y_4, y_5)^T

subject to

    [ 1 1 0 0 0 0 0 0 0 0 ]        [ 2 ]
    [ 1 1 1 0 0 0 0 0 0 0 ]        [ 4 ]
    [ 0 0 1 1 1 0 0 0 0 0 ]        [ 3 ]
    [ 0 0 0 0 1 1 0 0 0 0 ]  y  ≥  [ 5 ]
    [ 0 0 0 0 0 1 1 1 0 0 ]        [ 3 ]
    [ 0 0 0 0 0 1 1 1 1 0 ]        [ 6 ]
    [ 0 0 0 0 0 0 0 0 1 1 ]        [ 3 ]

    y ≥ 0,                                                    (4.28)

where y = (y_8, y_9, y_10, y_11, y_12, y_1, y_2, y_3, y_4, y_5)^T.
We want to rewrite the LP relaxation of (4.27) and its dual (4.28) in a more intuitive way. To achieve this, we require additional notation. First, let us introduce the placeholder H for the set of times {8am, . . . , 5pm}. For each time h ∈ H, we then let T_h be the set of tasks in T that overlap the time window (h, h+1). For each task j ∈ T, we also let H_j be the set of hours during which j needs to be executed. The linear program equivalent to the LP relaxation of (4.27) is shown below on the left, and the LP equivalent to the dual (4.28) on the right:

max  p^T x                              min  1^T y
subject to                              subject to
Σ ( x_j : j ∈ T_h ) ≤ 1   (h ∈ H)       Σ ( y_h : h ∈ H_j ) ≥ p_j   (j ∈ T)
x ≥ 0,                                  y ≥ 0.
Let us first develop some intuition for the dual constraints and variables. Suppose that, in order for task j ∈ T to be processed, you need to pay (someone) p_j dollars, and that this payment is due right after the task has been processed, i.e., right after time t_j. In order to accomplish this, you would want to set aside money in time so that you never incur a deficit. This can be accomplished by the following savings plan. For each time h ∈ H, set aside y_h dollars in such a way that you set aside at least p_j dollars during the execution time of task j ∈ T, for all j. In other words, you want your savings y to satisfy

Σ ( y_h : h ∈ H_j ) ≥ p_j

for each task j ∈ T. The disjointness of the tasks in any feasible schedule together with the above condition ensure that you will never run out of money. Moreover, the total amount of money you set aside is at least the total profit of any feasible schedule.
The dual solution y* given in the bottom row in Figure 4.2 shows such a savings plan; for any task j ∈ T, the savings accrued in the time interval given by s_j and t_j are at least the profit p_j of the task. E.g., consider job 6, which runs from 1pm onward. The savings during this time are

y*_1 + y*_2 + y*_3 + y*_4 = 3 + 0 + 0 + 3 = 6,

and this is exactly the profit of task 6. The value of that dual solution is given by 1^T y* = 2 + 1 + 1 + 0 + 2 + 3 + 0 + 0 + 3 + 0 = 12. Hence, 12 is an upper bound for the integer program (4.27), i.e. for the profit obtained from the scheduling problem. Consider the candidate solution depicted in Figure 4.2, consisting of tasks 2, 4 and 7 of total profit p* = 12 (the corresponding feasible solution for (4.27) is given by the vector x* = (0, 1, 0, 1, 0, 0, 1)^T). It follows that selecting tasks 2, 4 and 7 yields an optimal schedule.
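The optimality argument above can be replayed numerically. The following sketch (not from the notes; numpy assumed, with the task-hour incidence rows taken from the matrix in (4.28)) checks that x* and y* are feasible and have equal objective values:

import numpy as np

# Rows: tasks 1..7; columns: hours 8am, ..., 5pm (as in the matrix of (4.28)).
A = np.array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]])
p = np.array([2, 4, 3, 5, 3, 6, 3])
x_star = np.array([0, 1, 0, 1, 0, 0, 1])        # tasks 2, 4 and 7
y_star = np.array([2, 1, 1, 0, 2, 3, 0, 0, 3, 0])

print(p @ x_star, y_star.sum())        # both 12, so both are optimal
print(bool(np.all(A @ y_star >= p)))   # True: y* is dual feasible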
4.9 The duality theorem

4.9.1 Integer programs and the duality gap

We have seen a considerable variety of optimization problems throughout this section: the shortest path problem in Section 4.1, the vertex cover problem in Section 4.4, the matching problem in Section 4.6, the st-flow problem in Section 4.7, and the scheduling problem in Section 4.8. For each of these problems we outlined a systematic way of constructing bounds on the value of an optimal solution. We applied this method to specific example instances, and were always able to find feasible solutions that had the same value as the bound we constructed, thus proving that the feasible solution was indeed optimal. Will this always work? The answer is no, as we illustrate with the following instance of the matching problem. The graph G consists of the three edges uv, vw, uw of a triangle on vertices u, v, w. Each edge has weight 1. A matching consists of the empty set or a single edge. Thus M = {uv} is a maximum matching and it has weight 1.
To find an upper bound for the possible value of a matching we use the linear program (4.21), which corresponds to the following in this example:

min  y_u + y_v + y_w

subject to

    [ 0 1 1 ]   [ y_u ]      [ 1 ]
    [ 1 0 1 ]   [ y_v ]  ≥   [ 1 ]
    [ 1 1 0 ]   [ y_w ]      [ 1 ]

    y_u, y_v, y_w ≥ 0.
It can be readily checked that the unique optimal solution to that linear program is y_u = y_v = y_w = 1/2, which has value 3/2 (we represented the corresponding disks in the figure). However, the weight of the largest matching is 1; thus there is a gap between the weight of the largest matching and the value of the smallest upper bound. This gap is known as the duality gap. We will see in Section 5.2 that this does not occur when the graph is bipartite.
4.9.2 Linear programs and the duality theorem

Throughout this section, (P) will denote an arbitrary linear program which maximizes its objective function, and (D) will denote its dual. Recall that the Weak Duality Theorem 20 says that for any pair x, y of primal and dual feasible solutions, the value of x is at most that of y. Suppose x* is an optimal solution to (P); can we find a solution y* to (D) whose value is equal to that of x*? In other words, given an optimal solution to a linear program, is there a way of deriving an upper bound from the dual that proves that the optimal solution is indeed optimal? This is in fact the case. We now give the precise statement.

Theorem 25. (Strong Duality) Suppose x* is an optimal solution to (P). Then there exists an optimal solution y* to (D). Moreover, the value of x* is equal to that of y*.

Another way of stating the same result is that for linear programs, unlike integer programs (as seen in the previous section), there is no duality gap. The above statement of the duality theorem looks deceivingly innocuous. However, this theorem is a central result in optimization theory, and has many important consequences. It guarantees the existence of concise optimality certificates. There are many examples of optimization problems for which such certificates are unlikely to exist.
Let us outline the proof of this theorem for linear programs in SEF; i.e., suppose

max { c^T x : Ax = b, x ≥ 0 }    (P)
min { b^T y : A^T y ≥ c }.       (D)

We may assume (P) has an optimal solution. If we run the Simplex procedure (with the Two-Phase method, if necessary, see Section 2.6.2), at the end we get an optimal basis B. Let us rewrite (P) in the canonical form for that basis B. We get the following linear program (see Proposition 6):

max  z = ȳ^T b + c̄^T x
subject to
     x_B + A_B^{-1} A_N x_N = A_B^{-1} b
     x ≥ 0,

where

ȳ = (A_B^{-1})^T c_B    and    c̄^T = c^T − ȳ^T A.

Let x* be the basic solution for B, i.e.

x*_B = A_B^{-1} b,    x*_N = 0.

The value of the basic solution x* is

z = c^T x* = ȳ^T b + c̄^T x* = b^T ȳ + c̄_N^T x*_N + c̄_B^T x*_B = b^T ȳ,

since x*_N = 0 and c̄_B = 0. Since the Simplex procedure stopped, we must have c̄ ≤ 0, i.e.

c^T − ȳ^T A ≤ 0,

which we can rewrite as

A^T ȳ ≥ c.

It follows that ȳ is feasible for (D). Since c^T x* = b^T ȳ, we have proved Theorem 25. Well, actually, we did not! We cheated by assuming that we knew that the Simplex algorithm always terminates. This is indeed the case as long as we choose the entering and leaving variables carefully (see Theorem 9). However, it is not trivial to show that this is indeed the case. We omit the proof of that result in these notes.
There is an equivalent way of stating the strong duality theorem which is very helpful in many applications:

Theorem 26. (Strong Duality, feasibility-based statement) Suppose both (P) and (D) are feasible. Then (P) and (D) both have optimal solutions. Moreover, their optimal objective values are the same.

Proof. Suppose (P) and (D) are both feasible. Then by the Fundamental Theorem of LP (Theorem 14), (P) is either unbounded or has an optimal solution. Since (D) is feasible, by the corollary of the Weak Duality Theorem (Theorem 16), (P) cannot be unbounded. Therefore, (P) has an optimal solution x*. Now, the rest of the claim follows from the previous theorem (Theorem 25).
Using this form of the duality theorem, Corollary 21, and the Fundamental Theorem of LP (Theorem 14), we can classify the possibilities (infeasible, optimal, unbounded) for a primal-dual pair of linear programs. On the face of it, the three possibilities for each LP would lead to nine possibilities for the pair, but certain of these really cannot occur. We obtain the following table of possibilities.

Dual \ Primal       optimal solution    unbounded         infeasible
optimal solution    can occur (1)       impossible (2)    impossible (3)
unbounded           impossible (4)      impossible (5)    can occur (6)
infeasible          impossible (7)      can occur (8)     can occur (9)
For example, (5) cannot occur since by Corollary 21, unboundedness of the primal implies infeasibility of the dual. Similarly, (3) cannot occur since by Theorem 25, if the primal has an optimal solution then so does the dual. One can similarly explain the other impossible entries of the table. Each "can occur" entry can be explained by an example. The only one for which this is not obvious is (9). Here we take advantage of the fact that infeasibility depends only on the constraints, and not on the objective function. So, the problem

max  c_1 x_1 + c_2 x_2

subject to

    [  1  −1 ]   [ x_1 ]      [ −2 ]
    [ −1   1 ]   [ x_2 ]  ≤   [  1 ]

    x ≥ 0,

is infeasible regardless of the values of c_1 and c_2 (adding the two constraints gives 0 ≤ −1). To give an example for (9) one needs only to choose c_1 and c_2 such that the dual is infeasible. We leave this as an exercise.
4.10 Complementary Slackness

Consider a linear program (P) in standard equality form and its dual (D); i.e.,

max { c^T x : Ax = b, x ≥ 0 }    (P)
min { b^T y : A^T y ≥ c }.       (D)

Let us first reprove the Weak Duality relation (Theorem 20) for (P) and (D). Namely, let x be feasible for (P) and let y be feasible for (D). Then

c^T x ≤ (A^T y)^T x = y^T (Ax) = y^T b = b^T y,               (4.29)

where the inequality follows from the fact that x ≥ 0 and that A^T y ≥ c. Let us analyze when the inequality is satisfied with equality, i.e., when c^T x = b^T y. This occurs exactly when

c^T x = (A^T y)^T x,    or equivalently    (A^T y − c)^T x = 0.
Note that

(A^T y − c)^T x = Σ_{j=1}^{n} (A_j^T y − c_j) x_j,            (4.30)

where A_j denotes the j-th column of A. Since A_j^T y − c_j ≥ 0 and since x_j ≥ 0, every term in the sum in (4.30) is non-negative. Thus, the sum is zero if and only if every term (A_j^T y − c_j) x_j = 0. Hence, equation (4.29) holds with equality if and only if, for all j ∈ {1, . . . , n},

A_j^T y − c_j = 0    or    x_j = 0.                           (4.31)

Note that the "or" in the above condition is not an "either ... or": it includes the possibility of both conditions being true. Condition (4.31) is called a complementary slackness condition. It states that, at optimality, for each variable x_j of the primal (P), either x_j = 0 or the associated dual constraint A_j^T y ≥ c_j is satisfied with equality. Combining these observations with the Weak and Strong Duality theorems, we obtain the following result.
Theorem 27. (Complementary Slackness Theorem) Let x* be a feasible solution to (P) and let y* be a feasible solution to (D). Then the following statements are equivalent:

1. x* is optimal for (P) and y* is optimal for (D);

2. c^T x* = b^T y*;

3. the complementary slackness conditions hold for x* and y*.

Proof. The Weak Duality Theorem (Theorem 20) implies that if (2) holds then (1) holds. The Strong Duality Theorem (Theorem 25) implies that if (1) holds then (2) holds. Finally, the previous discussion shows that (2) is equivalent to (3).
Thus, the complementary slackness conditions provide necessary and sufficient conditions for a pair of solutions to be optimal for (P) and (D). Note that if the complementary slackness conditions fail for x* and y*, we cannot deduce that x* is not optimal for (P), nor can we deduce that y* is not optimal for (D). We can only conclude that x* and y* cannot both be optimal.

Let us demonstrate the usefulness of complementary slackness by an example.

Example 6. We can sometimes use complementary slackness to check whether a solution for an LP is optimal. Consider the following linear program.
max  (1, 2, 1, 2)(x_1, x_2, x_3, x_4)^T

subject to

    [ 1  1  −1  −2 ]        [  5 ]
    [ 2  3   1   1 ]  x  =  [ 13 ]

    x ≥ 0.                                                    (4.32)
Two friends of yours claim to know an optimal solution for this LP; the first of your friends claims that x^1 = (6, 0, 1, 0)^T is an optimal solution, and the other claims the same for x^2 = (2, 3, 0, 0)^T. Which one (if any) is right? Both solutions are easily checked to be feasible for (4.32). Theorem 27 now tells us that, in order to prove that x^i, for i ∈ {1, 2}, is an optimal solution for (4.32), we need to find a feasible solution y^i for the LP's dual that, together with x^i, satisfies the complementary slackness condition stated in (4.31). The dual of (4.32) is given by
min  (5, 13)(y_1, y_2)^T

subject to

    [  1   2 ]              [ 1 ]
    [  1   3 ]   [ y_1 ]    [ 2 ]
    [ −1   1 ]   [ y_2 ] ≥  [ 1 ]
    [ −2   1 ]              [ 2 ]
                                                              (4.33)
Consider the first candidate solution x^1. Suppose that x^1 is optimal. Since x^1_1 > 0 and x^1_3 > 0, the complementary slackness conditions imply that constraints 1 and 3 of the dual (4.33) must be satisfied with equality; i.e.,

    [  1  2 ]   [ y_1 ]      [ 1 ]
    [ −1  1 ]   [ y_2 ]  =   [ 1 ].                           (4.34)

The unique solution to this system is y^1_1 = −1/3 and y^1_2 = 2/3. However, this solution does not satisfy the second constraint of (4.33). Hence, the assumption that x^1 was optimal was not correct.
Let us consider your second friend's solution. Suppose that x^2 is optimal. Since x^2_1 > 0 and x^2_2 > 0, the complementary slackness conditions imply that constraints 1 and 2 of the dual (4.33) must hold with equality; i.e.,

    [ 1  2 ]   [ y_1 ]      [ 1 ]
    [ 1  3 ]   [ y_2 ]  =   [ 2 ].

The unique solution to this system is given by y^2 = (−1, 1)^T. This solution also satisfies the remaining two constraints of the dual (4.33). Hence, x^2 and y^2 are feasible and satisfy the complementary slackness conditions. It follows that x^2 is optimal for (4.32) and that y^2 is optimal for (4.33).
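The procedure in Example 6 is easily mechanized. The following sketch (numpy assumed; the helper name certify is ours, not from the notes) solves the dual equations forced tight by complementary slackness and then tests the remaining dual constraints:

import numpy as np

A = np.array([[1.0, 1, -1, -2],
              [2.0, 3, 1, 1]])
b = np.array([5.0, 13])
c = np.array([1.0, 2, 1, 2])

def certify(x):
    """Return True iff x, together with the y forced by complementary
    slackness, certifies optimality."""
    assert np.allclose(A @ x, b) and np.all(x >= 0)  # primal feasibility
    tight = x > 0  # these dual constraints must hold with equality
    y, *_ = np.linalg.lstsq(A[:, tight].T, c[tight], rcond=None)
    return bool(np.all(A.T @ y >= c - 1e-9))         # is y dual feasible?

print(certify(np.array([6.0, 0, 1, 0])))  # False: x^1 is not optimal
print(certify(np.array([2.0, 3, 0, 0])))  # True:  x^2 is optimal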
We proved Theorem 27 for the case when the linear program (P) is in SEF. In fact, it holds for general pairs of linear programs (P) and (D). Of course, we need to explain what we mean by complementary slackness for a general linear program. Every variable x_j of the primal (P) is associated with constraint j of the dual. Every variable y_i of the dual (D) is associated with constraint i of the primal. By the complementary slackness conditions we mean that:

1. for every variable x_j of (P), either x_j = 0 or the associated constraint j of (D) is satisfied with equality;

2. for every variable y_i of (D), either y_i = 0 or the associated constraint i of (P) is satisfied with equality.

Note that if a variable x_j is free, then the associated constraint j of (D) is an equality constraint; hence the condition is trivially satisfied. Thus, we only need to consider non-free variables x_j. A similar observation holds for the dual variables y_i.
Example 7. Consider the following linear program:

max  (12, 26, 20) x

subject to

     x_1 + 2x_2 +  x_3  ≤   2
    4x_1 + 6x_2 + 5x_3  ≤   2
    2x_1 −  x_2 + 3x_3  =  13

    x_1 ≥ 0, x_3 ≤ 0.                                         (4.35)

Its dual is given by

min  (2, 2, 13) y

subject to

     y_1 + 4y_2 + 2y_3  ≥  12
    2y_1 + 6y_2 −  y_3  =  26
     y_1 + 5y_2 + 3y_3  ≤  20

    y_1 ≥ 0, y_2 ≥ 0.                                         (4.36)
Consider x* = (5, −3, 0)^T and y* = (0, 4, −2)^T. It can be readily checked that x* is feasible for (4.35) and that y* is feasible for (4.36). The complementary slackness conditions are:

1. x*_1 = 0 or y*_1 + 4y*_2 + 2y*_3 = 12;

2. x*_3 = 0 or y*_1 + 5y*_2 + 3y*_3 = 20;

3. y*_1 = 0 or x*_1 + 2x*_2 + x*_3 = 2;

4. y*_2 = 0 or 4x*_1 + 6x*_2 + 5x*_3 = 2.

As x*_3 = 0, condition (2) holds. As y*_1 = 0, condition (3) holds. As y*_1 + 4y*_2 + 2y*_3 = 12, condition (1) holds. As 4x*_1 + 6x*_2 + 5x*_3 = 2, condition (4) holds. Hence, x* is optimal for (4.35) and y* is optimal for (4.36).
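Again, the conditions of Example 7 can be evaluated mechanically; a brief sketch (numpy assumed):

import numpy as np

A = np.array([[1.0, 2, 1],
              [4.0, 6, 5],
              [2.0, -1, 3]])
b = np.array([2.0, 2, 13])
c = np.array([12.0, 26, 20])
x = np.array([5.0, -3, 0])
y = np.array([0.0, 4, -2])

print(c @ x, b @ y)  # equal objective values, here -18
print(x[0] == 0 or np.isclose(A[:, 0] @ y, c[0]))  # condition 1
print(x[2] == 0 or np.isclose(A[:, 2] @ y, c[2]))  # condition 2
print(y[0] == 0 or np.isclose(A[0] @ x, b[0]))     # condition 3
print(y[1] == 0 or np.isclose(A[1] @ x, b[1]))     # condition 4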
4.11 Complementary Slackness and combinatorial examples

4.11.1 Matching

The figure below shows a matching instance. The number next to each vertex is its label. The length of edge uv indicates its weight w_uv. The set of edges M = {13, 24} corresponds to a matching. The radius of the disk around vertex u is its dual value y_u. The radius of the disk around vertex 5 is zero, i.e. y_5 = 0.

[Figure: a matching instance on vertices 1, . . . , 5 with a disk of radius y_u around each vertex u.]
Recall that, for a graph G = (V, E) with weights w_e ≥ 0 for every e ∈ E, we say that the set {y_u}_{u∈V} is feasible if y_u ≥ 0 for every vertex u, and if for every edge uv we have y_u + y_v ≥ w_uv. In the previous example, it means that for every edge uv the disks at the ends of edge uv cover the entire edge. We proved in Proposition 22 that y_1 + y_2 + y_3 + y_4 + y_5 is an upper bound on the weight of any matching. Hence, to prove that M is a maximum weight matching, it suffices to show that w(M) = y_1 + y_2 + y_3 + y_4 + y_5. Note that the disks at the ends of edge 13 do not overlap; i.e., w_13 = y_1 + y_3. Similarly, the disks at the ends of 24 do not overlap; i.e., w_24 = y_2 + y_4. Moreover, y_5 = 0; thus we have indeed w(M) = w_13 + w_24 = y_1 + y_2 + y_3 + y_4 + y_5, and M is a maximum weight matching. Note that in the previous argument we did not need to compute the value of w(M) and of y_1 + y_2 + y_3 + y_4 + y_5; we simply checked that the following conditions are satisfied:

(M1) for every edge uv of M we have y_u + y_v = w_uv;

(M2) for every vertex u which is not the end of some edge of M, we have y_u = 0.
This works in general, as the following proposition indicates.

Proposition 28. (Optimality conditions for matching) Let G = (V, E) be a graph with weights w_e ≥ 0 for every e ∈ E. Suppose that M is a matching of G and suppose that {y_u}_{u∈V} is feasible. Then M is a maximum weight matching if conditions (M1) and (M2) are satisfied.

It is not hard to show this proposition directly. However, we will show that this result can be obtained mechanically, by writing the complementary slackness conditions for the matching problem.
weight matching problem. Its linear programming relaxation is,
max w
T
x
subject to

_
x
uv
: uv (u)
_
1 (for all u V)
x 0.
(4.37)
The dual of that linear program was given in (4.21). We restate it here for convenience,
min

uV
y
u
subject to
y
u
+y
v
w
uv
(for all uv E)
y
u
0 (for all u V).
(4.38)
Variable x_uv of the primal (4.37) is associated with the constraint y_u + y_v ≥ w_uv of the dual. Variable y_u of the dual (4.38) is associated with the constraint Σ ( x_uv : uv ∈ δ(u) ) ≤ 1 of the primal. Thus the complementary slackness conditions are:

(M1') For every edge uv such that x_uv > 0, we have y_u + y_v = w_uv.

(M2') For every vertex u such that Σ ( x_uv : uv ∈ δ(u) ) < 1, we have y_u = 0.
Suppose M is a matching of G and that {y_u}_{u∈V} is feasible. We will prove Proposition 28. We assume that conditions (M1) and (M2) are satisfied and will show that M is a maximum weight matching. Let us define x* where, for every e ∈ E,

    x*_e = 1 if e ∈ M, and x*_e = 0 otherwise.

Since M is a matching, x* is a feasible solution to the primal (4.37). We defined {y_u}_{u∈V} feasible to mean that y is feasible for the dual (4.38). Thus x* and y are a pair of feasible solutions for the primal and the dual respectively. Suppose x*_uv > 0 for some edge uv. Then uv ∈ M. Since we satisfy condition (M1), we have y_u + y_v = w_uv; hence condition (M1') holds. Suppose Σ ( x*_uv : uv ∈ δ(u) ) < 1. Then u is not the end of any edge of M. Since we satisfy condition (M2), we have y_u = 0; hence condition (M2') holds. Thus x* and y satisfy the complementary slackness conditions. It follows from Theorem 27 that

w(M) = w^T x* = Σ_{u∈V} y_u.

Hence, M is a maximum weight matching.
4.11.2 Shortest paths

The figure below shows the shortest path instance from Section 4.1. The set of thick edges in the figure forms a path P connecting s and t, and we argued that it is in fact a shortest such path.

[Figure: the shortest path instance on vertices s, a, b, c, d, t, together with a collection of moats.]

The figure on the right also shows a collection of moats and their widths:

S = ({s}, 3), ({s, a}, 1), ({s, a, c}, 2), ({s, a, b, c, d}, 1).

Recall that we called such a collection feasible if the total width of the moats crossed by any edge is no more than the edge's length. We then proved in Proposition 15 that any st-path in G must have length at least the sum of the widths of such moats. All this shows that, in order to prove that a given path is shortest among all st-paths, it suffices to show that its length is equal to the total width of the given moat collection.

In the above example, note that edge sa crosses a moat of width 3, and this is exactly the cost of this edge. Similarly, ac, cb, and bt cross moats of total width equal to their lengths. Moreover, each of the moats {s}, {s, a}, {s, a, c}, {s, a, b, c, d} is crossed by exactly one of the edges of P. These facts together imply that
c(P) = Σ_i y_i = 3 + 1 + 2 + 1 = 7,

and hence P is a shortest path. The above argument checked the following two conditions: given a feasible moat collection (S_i, y_i), i = 1, . . . , q, and a candidate st-path P, we have

(S1) Σ ( y_i : e crosses S_i ) = c_e for all e ∈ P, and

(S2) δ(S_i) contains exactly one edge of P, for each i.
Proposition 29. Let G = (V, E) be a graph with edge lengths c_e ≥ 0 for all e ∈ E, and s, t ∈ V. Suppose P is an st-path, and (S_1, y_1), . . . , (S_q, y_q) is a feasible collection of moats. Then P is a shortest st-path if conditions (S1) and (S2) above are satisfied.
We will now see that we can obtain this proposition directly from complementary slackness. In Section 4.3, we saw the following primal LP for the shortest path problem:

min  Σ ( c_e x_e : e ∈ E )
subject to
     Σ ( x_e : e ∈ δ(S) ) ≥ 1    (δ(S) is an st-cut)
     x ≥ 0                                                    (4.39)

We state its dual once again as well:

max  Σ ( y_S : δ(S) is an st-cut )
subject to
     Σ ( y_S : e ∈ δ(S) ) ≤ c_e    (e ∈ E)
     y ≥ 0                                                    (4.40)
Variable x_e of (4.39) is associated with the constraint Σ ( y_S : e ∈ δ(S) ) ≤ c_e of (4.40). Variable y_S of the dual (4.40) is associated with the constraint Σ ( x_e : e ∈ δ(S) ) ≥ 1 of (4.39). Thus, the complementary slackness conditions are:

(S1') For every edge e such that x_e > 0, we have Σ ( y_S : e ∈ δ(S) ) = c_e.

(S2') For every st-cut δ(S) with y_S > 0, we have Σ ( x_e : e ∈ δ(S) ) = 1.
Suppose P is an st-path, and (S_i, y_i), i = 1, . . . , q, is a feasible moat collection. We will prove Proposition 29. We assume that conditions (S1) and (S2) are satisfied, and will show that P is a shortest st-path. Let us define x as follows:

    x_e = 1 if e ∈ P, and x_e = 0 otherwise,

for all e ∈ E. Since P is an st-path, x is a feasible solution for (4.39). We already argued that the set of feasible moat widths y is feasible for (4.40). Thus, x and y form a pair of primal and dual feasible solutions. Suppose x_e > 0; then we must satisfy condition (S1), and hence

Σ ( y_i : e crosses S_i ) = c_e;

hence condition (S1') holds. On the other hand, suppose that y_i > 0 for some i. Then condition (S2) implies that δ(S_i) contains exactly one edge of P, and thus (S2') holds as well. It therefore follows from Theorem 27 that

c(P) = c^T x = Σ_S y_S.

Therefore P is a shortest st-path.
4.11.3 Scheduling

The figure below shows the scheduling instance discussed earlier in this chapter. Recall that each task is marked by its number (at its front), and its profit in the center.

[Figure: the scheduling instance of Figure 4.2, with the dual solution shown in the bottom row.]
Below are the primal and dual linear programs for the problem that we derived earlier:

max  p^T x                              min  1^T y
subject to                              subject to
Σ_{j∈T_h} x_j ≤ 1   (h ∈ H)             Σ_{h∈H_j} y_h ≥ p_j   (j ∈ T)
x ≥ 0,                                  y ≥ 0.

Recall that H is the set of times {8am, . . . , 5pm}, T_h is the set of tasks that overlap the time interval (h, h+1), and H_j contains the hours during which task j is executed.
The set S* of shaded tasks in the figure corresponds to a feasible solution x* = (0, 1, 0, 1, 0, 0, 1)^T of the primal. A feasible solution y* = (2, 1, 1, 0, 2, 3, 0, 0, 3, 0)^T of the dual is given below the graph. Call a task j an equality task if

Σ_{h∈H_j} y*_h = p_j;

i.e., if the savings throughout task j's lifetime equal the task's profit. The complementary slackness condition for the primal variables x_j can then be restated as follows:

j ∈ S* implies that j is an equality task,

for all j ∈ T. This can be checked to be true in this case, as the duals below each of the shaded tasks in the figure add up to the task's profit.

Similarly, the complementary slackness conditions for the dual variables y_h say that if y*_h > 0 then S* must contain a job that is executed at time h; i.e., we can only save money at time h if there is a task that is being processed at that time.
Exercise 1. Consider the network flow problem in Section 4.7. Write the complementary slackness conditions for the linear program (4.25) and its dual. Explain what these conditions mean for a pair of solutions x* and y*, where x* is an st-flow and y* is a {0, 1}-valued solution corresponding to an st-cut.
4.12 Further reading and notes

After seeing the beautiful applications of duality theory to shortest path, vertex cover, matching, network flow and scheduling problems, it should be clear to the reader that we have only scratched the surface in terms of similar applications and interpretations of dual variables. For various applications of LP duality see the book [16].

A very commonly used example is the class of production planning problems formulated as max { c^T x : Ax ≤ b, x ≥ 0 }. Then, the dual variables correspond to shadow prices of resources. These are internal prices which depend on the current optimal solution (hence on the optimal production plan). For each resource i, the i-th component of an optimal dual solution y* answers the important question: what would be the rate of increase in the total optimal profit, per unit increase in the availability of resource i? In addition to such strong connections to mathematical economics, linear programming also has close historical ties to economics. During the early 1900s, in the area of mathematical economics, Leontief [1905–1999] and others were working on various problems that had connections to linear optimization. One of the most notable models is the input-output system. Suppose there are n major sectors of a national economy (e.g., construction, labor, energy, timber and wood, paper, banking, iron and steel, food, real estate, . . .). Let a_ij denote the amount of inputs from sector i required to produce one unit of the product of sector j (everything is in the same units, dollars). Let A ∈ R^{n×n} denote the matrix of these coefficients. (A is called the input-output matrix.) Now, given b ∈ R^n, where b_i represents the outside demand for the output of sector i, we solve the linear system of equations Ax + b = x. If (A − I) is non-singular, we get a unique solution x determining the output of each sector. Indeed, to have a viable system, we need every x_j to be nonnegative. Otherwise, the economy requires some imports and/or some kind of outside intervention to function properly. Leontief proposed this model in the 1930s. Then in 1973 he won the Nobel Prize in Economics for it.

Kantorovich [1912–1986] won the Nobel Prize in Economics in 1975. Koopmans [1910–1985], who had worked on the Transportation Problem (a generalization of the maximum-weight bipartite matching problem) in the 1940s as well as on input-output analysis (similar to Leontief's interests) for production systems, shared the Nobel Prize in Economics in 1975 with Kantorovich. According to Dantzig, the Transportation Problem was formulated (and a solution method proposed) by Hitchcock in 1941. Later, the problem was also referred to as the Hitchcock-Koopmans Transportation Problem.

Just as Dantzig was developing his ideas on Linear Programming, von Neumann [1903–1957] was independently working on Game Theory. There is a very nice connection between LP duality theory and Game Theory for Two-Person Zero-Sum Games. We will comment on this some more following the chapter on non-linear optimization.

Duality theory for linear programming has close ties to work on solving systems of linear inequalities. Such connections go back at least to Joseph Fourier (recall Fourier series and Fourier transforms) [1768–1830]. Fourier's paper from 1826 addresses systems of linear inequalities. The result known as Farkas' Lemma came during the early 1900s. Hermann Minkowski [1864–1909] also had similar results in his pioneering work on convex geometry. There are many related results involving different forms of inequalities, some using strict inequalities, for example, theorems by Gordan and Stiemke. See [16].
Chapter 5

Solving Integer Programs

Algorithms to solve integer programs fall into two distinct categories:

1. problem-specific algorithms, and

2. general purpose algorithms.

The first class of algorithms takes advantage of the special structure of the problem considered. A seminal example is the shortest path problem, where we are given a graph G, a non-negative cost for each of its edges, and two specific vertices s and t in V. The goal is to find a minimum-cost path connecting s and t. We will see that there is a polynomial-time algorithm to solve this problem in Section 5.1. As a second, somewhat more sophisticated example, we revisit the maximum weight matching problem. Just like for the shortest path problem, there is a polynomial-time algorithm for this problem. We will see a procedure for the special case where the graph is bipartite in Section 5.2. The second class of algorithms are algorithms for solving general integer programs. As Integer Programming is NP-hard (see Chapter 3), one cannot expect that such an algorithm will always find an optimal solution in polynomial time. The running time of these algorithms can be exponential in the worst case. However, they can be quite fast for many instances, and are capable of solving many large-scale, real-life problems. These algorithms follow two general strategies: the first attempts to reduce integer programs to linear programs; this is known as the cutting plane approach and will be described in Section 5.3. The other strategy is a divide-and-conquer approach known as branch and bound, which will be discussed in Section 5.4. In practice both strategies are combined under the heading of branch and cut. This remains the preferred approach for all general purpose commercial codes.
5.1 Shortest paths

The shortest-path problem has been one of the recurring examples in this book; we introduced it and its integer programming formulation in Chapter 1.2.2, and discussed shortest-path duality in depth in the previous chapter. The main goal here will be to develop an efficient algorithm for the problem. Recall that in an instance of the problem, we are given a graph G = (V, E), special vertices s, t ∈ V, and edge lengths c_e ≥ 0 for all e ∈ E. The goal is to find a shortest st-path in G; i.e., we want to find a sequence of edges in E of the form

sv_1, v_1v_2, v_2v_3, . . . , v_{k−2}v_{k−1}, v_{k−1}v_k, v_k t,

where the v_i are vertices in V, for all i. The goal is to minimize the total length

c_{sv_1} + c_{v_1v_2} + . . . + c_{v_k t}

of the path. We had seen an integer programming formulation for the problem in Chapter 1.2.2; recall the relaxation of this IP from Chapter 4.3:

min  Σ ( c_e x_e : e ∈ E )
subject to
     Σ ( x_e : e ∈ δ(S) ) ≥ 1    (δ(S) is an st-cut)
     x ≥ 0                                                    (5.1)
Its dual has a variable y_S for every st-cut δ(S) in G:

max  Σ ( y_S : δ(S) is an st-cut )
subject to
     Σ ( y_S : e ∈ δ(S) ) ≤ c_e    (e ∈ E)
     y ≥ 0                                                    (5.2)
We observe that LP (5.1) has a small number of variables, but a large number of constraints: there is one constraint for each st-cut δ(S), and the number of such cuts could be exponential in the input size. Solving the LP with the Simplex algorithm is therefore clearly not efficient: we cannot even hope to write down the initial canonical LP efficiently! Nevertheless we will present an algorithm that will compute an optimal solution x* to (5.1), and a corresponding dual solution y* to (5.2) such that

c^T x* = 1^T y*.

Moreover, we will see that there is also an st-path P* such that x*_e = 1 if e is an edge of P* and x*_e = 0 otherwise. Thus, our algorithm will prove that, in the case of shortest st-paths, there is no duality gap (see also Chapter 4.9.1). The complementary slackness conditions from Chapter 4.11.2 will be used crucially in our algorithm. We therefore restate them here:

(S1) for each edge e ∈ E with x_e > 0, we have Σ ( y_S : e ∈ δ(S) ) = c_e;

(S2) for each st-cut δ(S) with y_S > 0, we have Σ ( x_e : e ∈ δ(S) ) = 1.
Theorem 27 shows that if x* and y* are feasible for (5.1) and (5.2), respectively, and if they satisfy (S1) and (S2), then x* and y* are primal and dual optimal solutions. Let us rephrase these two conditions in a convenient way: call an edge e ∈ E an equality edge for some feasible dual y if

Σ ( y_S : e ∈ δ(S) ) = c_e,

i.e., if the dual constraint for edge e is satisfied with equality. We define the set of equality edges for y:

E(y) = { e ∈ E : Σ ( y_S : e ∈ δ(S) ) = c_e }.
Suppose now that y* is some feasible dual, and P* is a path that consists of equality edges only. We can then define x* by letting

    x*_e = 1 if e ∈ P*, and x*_e = 0 otherwise,

for every e ∈ E. By definition, x* is feasible for (5.1), and thus we have

Σ ( x*_e : e ∈ δ(S) ) ≥ 1,                                    (5.3)

for all st-cuts δ(S). Moreover, since P* has only equality edges, condition (S1) is clearly satisfied. In order to show that P* is indeed a shortest st-path, it now suffices to show that the inequality in (5.3) holds with equality whenever y*_S > 0.
5.1.1 An algorithm for shortest paths

The algorithm proceeds in iterations, where the ultimate goal is to compute a pair of feasible primal and dual solutions that satisfy the complementary slackness conditions (S1) and (S2). The algorithm maintains a feasible dual solution at all times; initially, this is just y = 0, and as the algorithm progresses, a carefully chosen set of dual variables is increased.

In order to describe the algorithmic details, we first have to introduce some more notation. A sequence of edges

v_1v_2, v_2v_3, . . . , v_{k−1}v_k, v_kv_1,

where v_i ∈ V for all i, is called a cycle. Let T = (W, F) be a graph; the following are equivalent:

[T1] T is a tree;

[T2] T has a unique path between any two vertices u, v ∈ W;

[T3] T is connected and has no cycles, where a graph is called connected if it has at least one path between every pair of vertices;

[T4] T is connected, and |F| = |W| − 1; i.e., the number of edges in a tree is one less than the number of vertices.
Let y be a feasible solution for the shortest path dual (5.2). In the algorithm, we will write sl_y(e) for the slack of the dual constraint corresponding to edge e ∈ E; i.e., we define

sl_y(e) = c_e − Σ ( y_S : e ∈ δ(S) ),

and we omit the index y when the dual is clear from the context. We are ready to describe our algorithm.
Algorithm 3 An algorithm for the shortest-path problem.

Input: Graph G = (V, E), costs c_e ≥ 0 for all e ∈ E, s, t ∈ V, G has an st-path
Output: A shortest st-path P*

1: y_S := 0 for all st-cuts δ(S)
2: Tree T := (U_1, E_1) where U_1 = {s} and E_1 = ∅
3: i := 1
4: while t ∉ U_i do
5:   Let e_i = u_iv_i be an edge in δ(U_i) of smallest slack; assume that u_i ∈ U_i and v_i ∈ V ∖ U_i.
6:   y_{U_i} := sl(e_i)
7:   U_{i+1} := U_i ∪ {v_i}, E_{i+1} := E_i ∪ {u_iv_i}
8:   i := i + 1
9: end while
10: return Unique st-path P* in final tree T.
The algorithm maintains the following key invariants throughout its execution:

[I1] (U_i, E_i) is a tree in every iteration i = 1, 2, . . .;

[I2] y is a feasible dual solution for (5.2); and

[I3] the edges in E_i are equality edges, for all i.
Let us look more closely at one iteration i of the main while loop in the algorithm. At the start of the loop, we are given a tree T = (U_i, E_i). The algorithm first checks whether t ∈ U_i. Let us assume that this is not the case, and δ(U_i) is therefore an st-cut. We now attempt to increase y_{U_i} by as much as we can without violating dual feasibility. Notice that y_{U_i} is a variable in the dual constraint of edge e if and only if e ∈ δ(U_i). Thus, in order to calculate the maximum feasible increase of y_{U_i}, it suffices to determine an edge of smallest slack among all edges in δ(U_i). Let e_i ∈ δ(U_i) be such an edge. Observe that increasing y_{U_i} by sl(e_i) makes e_i an equality edge, and we can therefore add it to our tree. Furthermore, as e_i ∈ δ(U_i), one endpoint of the edge is not in U_i. Thus, adding e_i to E_i will not create a cycle. We summarize our findings in the following proposition.
Proposition 30. Let T = (U_i, E_i) be the tree maintained at the beginning of the algorithm's while loop, and suppose that e_i = u_iv_i is an edge of smallest slack in δ(U_i). Then (U_i ∪ {v_i}, E_i ∪ {u_iv_i}) is a tree of equality edges.
Suppose now that t ∈ U_i at the beginning of iteration i. In this case, T has an st-path P* by [T2]. Invariant [I3] furthermore ensures that P* has only equality edges, and thus (S1) is satisfied. We show later that (S2) holds as well, and hence P* is a shortest st-path by Theorem 27.

Before completing the correctness proof, let us illustrate the algorithm with an example.
Example 8. Consider the graph in Figure 5.1.(i). We start with y = 0, and the initial tree is T = (U_1, E_1) with U_1 = {s} and E_1 = ∅. We compute the slacks of the edges in δ(U_1):

sl(sa) = 6,    sl(sb) = 2,    sl(sc) = 4,

and therefore e_1 = sb, and we let y_{U_1} = 2. The new tree is T = (U_2, E_2) where U_2 = {s, b} and E_2 = {sb}. Vertex t is still not in U_2, and hence we compute the slacks of all edges in δ(U_2):

sl(sa) = 6 − y_{{s}} = 4
sl(sc) = 4 − y_{{s}} = 2
sl(bc) = 1
sl(bt) = 5.
[Figure 5.1: Example for the shortest-path algorithm. Panels (i)–(v) show the graph on vertices s, a, b, c, t with the growing cuts U_1, . . . , U_4 and the tree edges highlighted.]
The algorithm selects edge e_2 = bc, and increases y_{U_2} to 1. The new tree is (U_3, E_3) where U_3 = {s, b, c} and E_3 = {sb, bc}. The tree is highlighted in red in the figure. In the next iteration, the algorithm determines the edge e_3 = ca as that of smallest slack, and adds it to the tree. At the same time it lets y_{U_3} = 1. In a final step, we let U_4 = {s, a, b, c} and y_{U_4} = 1. The unique st-path in the resulting tree is a shortest path in the underlying graph.
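Because every dual variable raised by Algorithm 3 belongs to one of the nested cuts U_1 ⊆ U_2 ⊆ . . ., the slack of an edge leaving the current tree depends only on how much dual weight has been raised since its tree endpoint entered. The sketch below (not from the notes) exploits this; the small test graph at the bottom is hypothetical, chosen only to mimic the flavour of Example 8:

def shortest_path(edges, s, t):
    """edges: dict mapping frozenset({u, v}) -> cost c_e >= 0."""
    U, parent = {s}, {s: None}
    grown = 0.0          # total dual weight y_{U_1} + ... + y_{U_i} so far
    entered = {s: 0.0}   # value of 'grown' when each vertex entered U
    while t not in U:
        # pick the edge of smallest slack in delta(U); an edge uv with
        # u in U crosses every cut raised since u entered, so its slack
        # is c_e - (grown - entered[u]).
        slack, u, v = min((c - (grown - entered[u]), u, v)
                          for e, c in edges.items()
                          for u, v in (tuple(e), tuple(e)[::-1])
                          if u in U and v not in U)
        grown += slack                     # y_{U_i} := sl(e_i)
        U.add(v); parent[v] = u; entered[v] = grown
    path = [t]                             # unique st-path in the tree
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

# hypothetical test data (not the graph of Figure 5.1)
edges = {frozenset(e): c for e, c in
         [(("s", "a"), 6), (("s", "b"), 2), (("s", "c"), 4),
          (("b", "c"), 1), (("c", "a"), 1), (("a", "t"), 1), (("b", "t"), 5)]}
print(shortest_path(edges, "s", "t"))  # ['s', 'b', 'c', 'a', 't']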
Let us prove that the algorithm is correct in two steps. First, we show that it terminates
with an st-path if G has one.
Proposition 31. Suppose that G contains an st-path. Then the main while loop of Algorithm 3 is executed at most |V| times, and at the end, the tree of equality edges contains an st-path.

Proof. Consider iteration i of the while loop, and let y be the dual at its beginning. Suppose U_i does not contain t. By construction s ∈ U_i, and thus δ(U_i) is an st-cut. Thus, δ(U_i) must contain at least one edge of any st-path, and can therefore not be empty. This implies that there is an edge u_iv_i in δ(U_i) of smallest slack that will be added to our tree. Since the endpoint v_i is not in U_i, the new tree has strictly more vertices, and the main while loop of the algorithm can therefore not be executed more than |V| times.
Define x*_e = 1 if e ∈ P*, and x*_e = 0 otherwise, and let y* be the dual at the end of the algorithm. Clearly, x* and y* are feasible for the primal and dual linear programs, respectively. Furthermore, complementary slackness condition (S1) is clearly satisfied by construction. It remains to show that condition (S2) also holds.
Proposition 32. Let P* be the path returned by Algorithm 3, and let y* be the dual at the end of the algorithm. Then y*_S > 0 only if P* contains exactly one edge of δ_G(S).

Proof. Let U_1, . . . , U_k be the vertex sets of the trees constructed in the algorithm. We have y*_S > 0 only if S = U_i for some 1 ≤ i ≤ k. For the sake of contradiction, assume that δ(U_i) contains at least two edges of P*. We deliberately choose the two such edges u_1v_1 and u_2v_2 that were added earliest by the algorithm. Moreover, we assume that u_1v_1 was added before u_2v_2.
[Figure: the cut U_i with the path P* crossing it at u_1v_1 and u_2v_2, together with the paths P′ and P″.]
Following the usual notation, we let u_1 and u_2 be vertices in U_i, and v_1 and v_2 be vertices in V ∖ U_i. Let T̃ = (Ũ, Ẽ) be the tree maintained by the algorithm just before u_1v_1 was added.

We first argue that y_{U_i} could not have been increased after adding u_1v_1. To see this, observe that u_1v_1 is an equality edge once it is part of the tree. Increasing a variable y_S where the corresponding st-cut δ(S) contains equality edges yields a violation of dual feasibility. This implies that U_i ⊆ Ũ, and thus, by [T2], T̃ has a unique u_1u_2-path; we call this path P′ and indicate it in the figure by a dashed line connecting u_1 and u_2.
Now consider the segment of P* that connects edges u_1v_1 and u_2v_2. We refer to this path as P″, and the figure shows it as a thick black curve. We claim that at least some of the edges of P″ were not part of Ẽ. In order to see this, consider the edge v_1w of P″ that is incident to v_1. Since v_1 was not a member of Ũ, the edge v_1w can certainly not have been in Ẽ either. This suffices to show that the paths P′ and P″ are distinct, and hence

Q = P′ ∪ P″ ∪ {u_1v_1, u_2v_2}

contains a cycle. Clearly, Q cannot be part of the edge-set of any tree generated by the algorithm, and this is a contradiction.

This completes the proof of correctness of the shortest path algorithm.
5.2 Maximum weight matching algorithm

Let G = (V, E) be a graph. Recall that M ⊆ E is a matching if every vertex u is the end of at most one edge of M. A matching M is perfect if every vertex u is the end of exactly one edge of M. In the following example, the bold edges of the graph on the left form a matching which is not perfect, whereas the bold edges of the graph on the right form a perfect matching.

In this section, we will describe a procedure that, given a bipartite graph G = (V, E) with non-negative weights w, will find a maximum weight matching. We first observe that it is sufficient to develop an algorithm for finding (if one exists) a maximum weight perfect matching.

Proposition 33. If we have a polynomial-time algorithm to find a maximum weight perfect matching in a bipartite graph, we can derive a polynomial-time algorithm to find a maximum weight matching in a bipartite graph.
Proof. Let G = (V, E) be a bipartite graph with edge weights w ≥ 0. Suppose we wish to find a maximum weight matching M of G. Since G is bipartite, there is a partition U, W of V such that every edge of G has one end in U and the other in W. We may assume that |U| ≤ |W|. Construct a new graph G′ = (V′, E′) with weights w′ from G and w by adding new vertices to U until U and W have the same number of vertices, and by adding dummy edges ij with weights w′_ij = 0 for every ij ∉ E. Thus G′ is a complete bipartite graph. Now, let us use the algorithm to find a maximum weight perfect matching M′ in G′. Let M be obtained from M′ by removing all dummy edges. Since M′ is a matching, so is M. Since dummy edges have weight zero, w(M) = w′(M′). We claim that M is a maximum weight matching of G. For otherwise there exists a matching M̃ of G where w(M) < w(M̃). Since G′ is a complete bipartite graph, we can extend M̃ to a perfect matching M̃′ of G′ of weight at least w(M̃). But then w′(M̃′) ≥ w(M̃) > w(M) = w′(M′), which contradicts the fact that M′ was a maximum weight perfect matching of G′. We leave it as an exercise to show that if the algorithm for finding a maximum weight perfect matching is polynomial, then so is the resulting algorithm for finding a maximum weight matching.
In light of the previous proposition, we shall therefore focus our attention on an algorithm for finding a maximum weight perfect matching in a bipartite graph.
5.2.1 Hall's condition

We first wish to characterize which bipartite graphs have a perfect matching. Throughout this section, G = (V, E) is bipartite with bipartition U, W; i.e., V is the disjoint union of U and W and every edge has one end in U and one end in W. A necessary condition for G to have a perfect matching is that |U| = |W|. However, this is not sufficient, as the following example illustrates.

[Figure: a bipartite graph with U = {a, b, c, d} and W = {e, f, g, h}, in which the set S = {a, b, c} has only the neighbors {e, f}.]

In this example, the set of all neighbors of {a, b, c} is {e, f}. Let M be any matching of G. Since each of e, f is the end of at most one edge in M, there are at most 2 edges of M with ends in {a, b, c}. Hence, one of a, b, c is not the end of an edge of M, and M is not a perfect matching.
Let us proceed more generally. For a graph G = (V, E) and a subset S ⊆ V, we denote the set of all neighbors of S by N_G(S); i.e., N_G(S) := {r : sr ∈ E and s ∈ S}. When there is no ambiguity, we omit the index G and write N(S) for N_G(S). Suppose that there exists a set S ⊆ U such that |S| > |N(S)|. Then, for any matching M of G, there are at most |N(S)| edges with ends in S; hence at least one vertex of S is not the end of an edge in M, and M is not a perfect matching. We call a set S ⊆ U such that |N(S)| < |S| a deficient set. Thus we have argued that if G contains a deficient set, then G has no perfect matching. The following result states that the converse is true.

Theorem 34. (Hall's Theorem) Let G = (V, E) be a bipartite graph with bipartition U, W where |U| = |W|. Then there exists a perfect matching M if and only if G has no deficient set. Moreover, there exists an efficient (polynomial-time) algorithm that, given G, will either find a perfect matching M or find a deficient set S.

We shall omit the proof of the previous result and the description of the associated algorithm.
5.2.2 An optimality condition

Proposition 28 gives a sufficient condition for a matching to have maximum weight. We want to derive the analogous result for perfect matchings. Consider a graph G = (V, E) with (arbitrary) weights w_e for every e ∈ E. Let us formulate the maximum weight perfect matching problem as an integer program:

max  w^T x
subject to
     Σ ( x_ij : ij ∈ δ(i) ) = 1    (for all i ∈ V)
     x ≥ 0, x integer.                                        (5.4)

This only differs from (4.20) in that all constraints (other than non-negativity) are equality constraints. Let us find the dual of the linear programming relaxation of (5.4):

min  Σ_{i∈V} y_i
subject to
     y_i + y_j ≥ w_ij    (for all ij ∈ E)
     y_i free            (for all i ∈ V).                     (5.5)
This only differs from (4.21) in that the dual variables y_i are free. This arises from the fact that the constraints of the linear programming relaxation of (5.4) (other than non-negativity) are equality constraints (see the rules for finding general duals in Section 4.5). The complementary slackness conditions for the linear programming relaxation of (5.4) and for (5.5) are:

If x_ij > 0 then y_i + y_j = w_ij    (for all ij ∈ E).        (5.6)

Let us say that y is feasible if for every edge ij we have y_i + y_j ≥ w_ij; i.e., if y is feasible for (5.5). Given feasible y, we say that an edge ij of G is an equality edge for y if y_i + y_j = w_ij. Suppose M is a perfect matching of G and every edge of M is an equality edge. Let us define x* where, for every e ∈ E,

    x*_e = 1 if e ∈ M, and x*_e = 0 if e ∉ M.

Since M is a perfect matching, x* is a feasible solution to the linear programming relaxation of (5.4). Thus x* and y are a pair of feasible solutions for the primal and the dual respectively. Suppose x*_ij > 0 for some edge ij. Then ij ∈ M. Therefore, ij is an equality edge; i.e., y_i + y_j = w_ij. It follows that the complementary slackness conditions (5.6) hold. It follows from Theorem 27 that x* is an optimal solution to the linear programming relaxation of (5.4). Hence, in particular, it is an optimal solution to (5.4); i.e., M is a maximum weight perfect matching. Thus we have proved:

Proposition 35. Let G = (V, E) be a graph with weights w_e for every e ∈ E. Suppose that M is a perfect matching of G and suppose that y is feasible. Then M is a maximum weight perfect matching if every edge of M is an equality edge.
5.2.3 The Hungarian matching algorithm

The algorithm proceeds as follows. At each step we have a feasible solution y to the dual (5.5). To get an initial feasible dual solution, let μ denote the value of the maximum edge weight. Then setting y_v := μ/2, for all v ∈ V, gives a feasible solution for the dual. Given a dual feasible solution y, we construct a graph H as follows: H has the same set of vertices as G, and the edges of H consist of all edges of G that are equality edges. We know from Theorem 34 that we can either find a perfect matching M in H, or that H has a deficient set S. In the former case, Proposition 35 implies that M is a maximum weight perfect matching of G, in which case we can stop the algorithm. Otherwise, we define a new feasible dual solution y′ as follows:

    y′_v := y_v − ε    for v ∈ S,
    y′_v := y_v + ε    for v ∈ N_H(S),
    y′_v := y_v        otherwise.

We wish to choose ε ≥ 0 as large as possible such that y′ is feasible for (5.5); i.e., for every edge uv we need to satisfy y′_u + y′_v − w_uv ≥ 0. As y is feasible for (5.5), y_u + y_v − w_uv ≥ 0. There are four possible cases for an edge uv with u ∈ U and v ∈ W (see figure):

1. u ∉ S, v ∉ N_H(S);

2. u ∉ S, v ∈ N_H(S);

3. u ∈ S, v ∈ N_H(S);

4. u ∈ S, v ∉ N_H(S).

Then:
Case 1. y
/
u
+y
/
v
w
uv
= y
u
+y
v
w
uv
0.
Case 2. y
/
u
+y
/
v
w
uv
= y
u
+(y
v
+) w
uv
y
u
+y
v
w
uv
0.
Case 3. y
/
u
+y
/
v
w
uv
= (y
u
) +(y
v
+) w
uv
= y
u
+y
v
w
uv
0.
Algorithm 4 Hungarian Algorithm
Input: Bipartite graph $G = (V, E)$ with bipartition $U, W$ where $|U| = |W|$, and weights $w$.
Output: A maximum weight perfect matching $M$ or a deficient set $S$.
1: $y_i := \frac{1}{2} \max\{w_e : e \in E\}$, for all $i \in V$.
2: loop
3:   Construct graph $H$ with vertices $V$ and edges $\{uv \in E : y_u + y_v = w_{uv}\}$
4:   if $H$ has a perfect matching $M$ then
5:     stop ($M$ is a maximum weight perfect matching of $G$)
6:   end if
7:   Let $S \subseteq U$ be a deficient set for $H$
8:   if there are no edges of $G$ between $S$ and $W \setminus N_H(S)$ then
9:     stop ($S$ is a deficient set of $G$)
10:  end if
11:  $\varepsilon := \min\{y_u + y_v - w_{uv} : uv \in E,\ u \in S,\ v \in W \setminus N_H(S)\}$
12:  $y_v := \begin{cases} y_v - \varepsilon & \text{for } v \in S \\ y_v + \varepsilon & \text{for } v \in N_H(S) \\ y_v & \text{otherwise.} \end{cases}$
13: end loop
Thus, we only need to worry about Case 4. Note, we may assume that there is such an edge,
for otherwise $S$ is a deficient set for $G$ and we may stop as $G$ has no perfect matching. We
want $0 \le y'_u + y'_v - w_{uv} = (y_u - \varepsilon) + y_v - w_{uv}$, i.e. $\varepsilon \le y_u + y_v - w_{uv}$. Note that since $uv$ is not
an edge of $H$, it is not an equality edge. Hence, $y_u + y_v - w_{uv} > 0$. Thus, we can choose $\varepsilon$ as
follows:

\[
\varepsilon = \min\{y_u + y_v - w_{uv} : uv \in E,\ u \in S,\ v \in W \setminus N_H(S)\}
\]

and $\varepsilon > 0$. Note that $\mathbf{1}^T y - \mathbf{1}^T y' = \varepsilon(|S| - |N_H(S)|) > 0$. Hence, the new dual solution $y'$ has
a lower (better) value than $y$. See Algorithm 4 for a formal description of the procedure.
Note that we have shown that at each step we either find a maximum weight perfect matching
or we improve the objective function value of the dual solution $y$.
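The loop just described can be made concrete in a short program. The following Python sketch is our own illustration (none of these function names come from the notes): it assumes the complete bipartite graph is given as an $n \times n$ weight matrix, uses Kuhn's augmenting-path algorithm as the matching subroutine promised by Theorem 34, and performs the dual update with $\varepsilon$ as above.

def max_matching(eq):
    # Kuhn's augmenting-path algorithm: a maximum matching in the
    # bipartite equality graph eq (an n x n boolean matrix).
    n = len(eq)
    match_v = [None] * n                 # match_v[v] = u matched to v

    def augment(u, seen):
        for v in range(n):
            if eq[u][v] and v not in seen:
                seen.add(v)
                if match_v[v] is None or augment(match_v[v], seen):
                    match_v[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, set())
    return match_v

def deficient_set(eq, match_v, n):
    # S = all U-vertices reachable from an exposed U-vertex by an
    # alternating path in H; as the matching is maximum, |N_H(S)| < |S|.
    match_u = [None] * n
    for v, u in enumerate(match_v):
        if u is not None:
            match_u[u] = v
    S = {u for u in range(n) if match_u[u] is None}
    NS, frontier = set(), set(S)
    while frontier:
        new_vs = {v for u in frontier for v in range(n)
                  if eq[u][v] and v not in NS}
        NS |= new_vs
        frontier = {match_v[v] for v in new_vs
                    if match_v[v] is not None} - S
        S |= frontier
    return S, NS

def hungarian(w):
    n = len(w)
    yU = [max(map(max, w)) / 2.0] * n    # initial duals y_v = (1/2) max w_e
    yW = list(yU)
    while True:
        eq = [[abs(yU[u] + yW[v] - w[u][v]) < 1e-9 for v in range(n)]
              for u in range(n)]
        match_v = max_matching(eq)
        if all(u is not None for u in match_v):
            return [(match_v[v], v) for v in range(n)]  # optimal matching
        S, NS = deficient_set(eq, match_v, n)
        eps = min(yU[u] + yW[v] - w[u][v]               # Case 4 edges only
                  for u in S for v in range(n) if v not in NS)
        for u in S:
            yU[u] -= eps
        for v in NS:
            yW[v] += eps

With rational weights each update decreases the dual value $\mathbf{1}^T y$ by a positive amount, so the loop terminates; the efficient implementations mentioned in Section 5.5 additionally avoid recomputing the matching from scratch at every iteration.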
We will now illustrate the algorithm on an example. The next figure indicates the graph
and the dual at the end of the initialization and at the end of the first and second iteration.
The number next to each edge $e$ is the cost $w_e$. The number next to each vertex $v$ is the dual
variable $y_v$. The shaded edges are the edges in $M$.

[Figure: the example graph on vertices $a, b, c$ and $d, e, f$, shown after Initialization, Iteration 1, and Iteration 2, with the edge costs, the dual values at each stage, and a deficient set $S$ marked in the first two stages.]

For the initialization we picked $y_v = \frac{1}{2} \cdot 4 = 2$ for every $v$, as the largest weight is 4. For
iteration 1, we computed

\[
\varepsilon = \min\{\, y_a + y_e - w_{ae},\ y_b + y_e - w_{be},\ y_b + y_f - w_{bf} \,\}
= \min\{\, 2+2-3,\ 2+2-1,\ 2+2-0 \,\} = 1.
\]
For iteration 2, we computed

\[
\varepsilon = \min\{\, y_b + y_f - w_{bf},\ y_c + y_f - w_{cf} \,\}
= \min\{\, 1+2-0,\ 2+2-2 \,\} = 2.
\]

At the end of iteration 2, the graph $H$ consisting of the shaded edges has a perfect matching
$\{ae, bd, cf\}$. This matching is a maximum weight perfect matching of $G$.
Exercise 2. Find a maximum weight perfect matching for the complete bipartite graphs G
with the following edge-weights:
g h i j
a 2 7 1 2
b 3 4 3 2
d 6 5 5 5
f 2 6 2 3
and
i j k l m
a 13 26 20 7 30
b 15 30 22 9 32
d 21 35 26 14 38
f 13 27 18 7 29
g 26 38 30 19 43
5.3 Cutting planes
We will solve integer programs by solving a sequence of linear programs.
5.3.1 General scheme
Suppose you wish to solve the following integer program (IP):

\[
\begin{array}{rrl}
\max & 3x_1 + 10x_2 & \\
\text{subject to} & & \\
& x_1 + 4x_2 \le 8 & (5.7) \\
& x_1 + x_2 \le 4 & (5.8) \\
& x_1, x_2 \ge 0 & (5.9) \\
& x_1, x_2 \ \text{integer.} & (5.10)
\end{array}
\]
As we do not know how to deal with the integrality constraints, we shall simply ignore them,
initially. Thus, we solve the linear program relaxation (LP1) obtained by removing constraints
(5.10) from (IP). We obtain the optimal solution $x' = (8/3, 4/3)^T$. Unfortunately, $x'$ is not
integral, thus it is not a feasible solution to (IP).

We wish to find a valid inequality (*) which satisfies the following properties:

1. (*) is valid for (IP), i.e. every feasible solution of (IP) satisfies (*),

2. $x'$ does not satisfy (*).

An inequality that satisfies both (1) and (2) is called a cutting plane for $x'$. We will discuss
in the upcoming section how to find such a cutting plane, but for the time being, let us ignore
that problem and suppose that we are simply given such an inequality,

\[
x_1 + 3x_2 \le 6. \tag{*}
\]

We add (*) to the system (LP1) to obtain a new linear program (LP2). Note that because (*)
satisfies property (1), (LP2) remains a relaxation of (IP). Moreover, property
(2) of (*) implies that $x'$ is not feasible for (LP2), so in particular the optimal solution to (LP2)
will be distinct from $x'$.

In our case (LP2) has an optimal solution $x'' = (0, 2)^T$. Note that $x''$ is integral. Since it
maximizes the objective function over all solutions of (LP2) it also maximizes the objective
function over all solutions of (IP), hence $x''$ is optimal for (IP).
We shall now formalize the procedure. We write ⪋ to mean any one of the following symbols:
$\le$, $=$, $\ge$. Thus, the notation $Ax ⪋ b$ simply means that $Ax$ and $b$ determine a system of
linear constraints, where each of the constraints can be any one of $\le$, $=$ or $\ge$. Suppose we wish
to solve the integer program,

\[
\max\ c^T x \quad \text{subject to} \quad Ax ⪋ b, \quad x \ge 0,\ x \ \text{integer.} \tag{IP}
\]
Remark 36. Let (P) be the linear programming relaxation of (IP). Then

1. the optimal value of (P) is an upper bound on the optimal value of (IP),

2. if $x$ is an optimal solution to (P) that is integer, then $x$ is an optimal solution for (IP).

Proof. (1) follows from the fact that every feasible solution to (IP) is a feasible solution to
(P). Moreover, (2) follows from (1): an integral optimal solution of (P) is feasible for (IP),
and its value attains the upper bound from (1).
Algorithm 5 Cutting plane Algorithm.
1: loop
2:   Solve the linear programming relaxation $\max\{c^T x : Ax ⪋ b\}$.
3:   Let $\bar{x}$ be the optimal solution.
4:   if $\bar{x}$ is integral then
5:     stop ($\bar{x}$ is an optimal solution to (IP))
6:   end if
7:   Find a cutting plane $a^T x \le \beta$ for $\bar{x}$
8:   Add the constraint $a^T x \le \beta$ to the system $Ax ⪋ b$
9: end loop
A formal statement of the cutting plane method can be found in Algorithm 5. Note that
the scheme outlined here is for pure integer programs; however, it can be adapted to mixed
integer programs in a straightforward way. We did not discuss the possibility that any of the
LP relaxations may be unbounded or infeasible.
5.3.2 Valid inequalities
Let $F$ be a subset of $\mathbb{R}^n$. We say that an inequality $a^T x \le \beta$ is valid for $F$ if every point in $F$
satisfies the inequality. For instance, suppose that $F$ is the set of solutions to a system of
inequalities $Ax \le b$; then a valid inequality for $F$ can be obtained by multiplying each of the
constraints of $Ax \le b$ by some non-negative amount, and by adding the resulting constraints
together (see Chapter 2.1.1).

Suppose now that $F$ is the set of all integer points in the set $P := \{x : Ax \le b,\ x \ge 0\}$.
We outline a way of generating a valid inequality for $F$. Recall that given a real number
$\alpha$, $\lfloor \alpha \rfloor$ denotes the largest integer value smaller than or equal to $\alpha$. For instance $\lfloor 3.7 \rfloor = 3$ and
$\lfloor -1.6 \rfloor = -2$.

1. Choose a valid inequality for $P$:
\[
\sum_{j=1}^{n} a_j x_j \le \beta.
\]
As $P \supseteq F$, this inequality is clearly valid for $F$.

2. Since $\bar{x}_j \ge 0$ for any $\bar{x} \in P$, we have that $\lfloor a_j \rfloor \bar{x}_j \le a_j \bar{x}_j$ for all $j$. Hence, the inequality
\[
\sum_{j=1}^{n} \lfloor a_j \rfloor x_j \le \beta
\]
remains valid for $P$, hence for $F$.

3. Since for any $\bar{x} \in F$ the quantity $\sum_{j=1}^{n} \lfloor a_j \rfloor \bar{x}_j$ is integer, the following inequality is valid for $F$:
\[
\sum_{j=1}^{n} \lfloor a_j \rfloor x_j \le \lfloor \beta \rfloor. \tag{*}
\]

We say that the constraint (*) is a Chvátal–Gomory cut of $P$, or a CG-cut for short. We have
shown that any CG-cut of $P$ is valid for $F$. Note that in general (*) will not be valid for $P$.
We shall illustrate this procedure on two concrete examples.
Example 9. Suppose that $F$ is the set of solutions to the system of constraints given by (5.7)–
(5.10). Multiplying constraints (5.7) and (5.8) by 2/3 and 1/3 respectively and adding the
resulting inequalities together we obtain,

\[
\frac{2}{3}(x_1 + 4x_2) + \frac{1}{3}(x_1 + x_2) \le \frac{16}{3} + \frac{4}{3},
\]

or equivalently,

\[
x_1 + 3x_2 \le \frac{20}{3}.
\]
It follows that the following CG constraint (*) is valid for $F$:

\[
x_1 + 3x_2 \le \left\lfloor \frac{20}{3} \right\rfloor = 6.
\]

Now, we can easily verify that (*) was indeed a cutting plane for $x'$ in the previous section.
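The three-step recipe above is mechanical enough to script. The following Python sketch is our own illustration; it works with exact rationals, because in floating point the multipliers 2/3 and 1/3 would sum to 0.999…, and flooring that would silently give the wrong coefficient.

import math
from fractions import Fraction as F

def cg_cut(A, b, u):
    # Combine the rows of Ax <= b with multipliers u >= 0, then round
    # down every coefficient and the right hand side (steps 1-3 above).
    m, n = len(A), len(A[0])
    a = [sum(u[i] * A[i][j] for i in range(m)) for j in range(n)]
    beta = sum(u[i] * b[i] for i in range(m))
    return [math.floor(aj) for aj in a], math.floor(beta)

# Constraints (5.7) and (5.8) with multipliers 2/3 and 1/3:
A = [[1, 4], [1, 1]]
b = [8, 4]
print(cg_cut(A, b, [F(2, 3), F(1, 3)]))  # ([1, 3], 6): the cut x1 + 3x2 <= 6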
Example 10. Consider the graph $G$ given below. Assign a variable $x_{ij}$ for every edge $ij$. Let
$F$ be the set of solutions to the following system of constraints,

\[
\begin{array}{rl}
x_{12} + x_{14} + x_{15} \le 1 & (5.11) \\
x_{12} + x_{23} \le 1 & (5.12) \\
x_{23} + x_{34} + x_{36} + x_{37} \le 1 & (5.13) \\
x_{14} + x_{34} + x_{45} + x_{47} \le 1 & (5.14) \\
x_{15} + x_{45} \le 1 & (5.15) \\
x_{36} + x_{67} \le 1 & (5.16) \\
x_{37} + x_{47} + x_{67} \le 1 & (5.17) \\
x \ge 0 & (5.18) \\
x \ \text{integer.} & (5.19)
\end{array}
\]

[Figure: the graph $G$ on vertices $1, \ldots, 7$, with the odd set $S = \{1, 2, 3, 4, 5\}$ highlighted.]
As seen in Section 4.6, the points in $F$ correspond exactly to the matchings of $G$. Let us
derive a CG-cut for this case. First, we add all the constraints which correspond to vertices in
the set $S = \{1, 2, 3, 4, 5\}$, i.e. constraints (5.11)–(5.15), and multiply the resulting constraint
by 1/2. This yields,

\[
x_{12} + x_{23} + x_{34} + x_{45} + x_{15} + x_{14} + \frac{1}{2}(x_{36} + x_{37} + x_{47}) \le \frac{5}{2}.
\]

After rounding down the coefficients of the left hand side we obtain,

\[
x_{12} + x_{23} + x_{34} + x_{45} + x_{15} + x_{14} \le \frac{5}{2},
\]

and finally the CG-constraint is obtained by rounding down the right hand side,

\[
x_{12} + x_{23} + x_{34} + x_{45} + x_{15} + x_{14} \le \left\lfloor \frac{5}{2} \right\rfloor = 2.
\]
More generally, it is shown in Section 4.6 that for any graph $G = (V, E)$, if we define $F$ to be
the set of solutions to,

\[
\sum\bigl( x_{ij} : ij \in \delta(i) \bigr) \le 1 \quad (i \in V), \qquad x \ge 0,\ x \ \text{integer,}
\]

then the points in $F$ correspond exactly to the matchings of $G$. We leave it as an exercise to
show that for any set $S \subseteq V$ where $|S|$ is odd, the constraint

\[
\sum\bigl( x_{ij} : ij \in E,\ i \in S,\ j \in S \bigr) \le \frac{|S| - 1}{2}
\]

is valid for $F$.
5.3.3 Cutting plane and simplex
We have yet to show how to find cutting planes. Let us revisit the example from Section 5.3.1.
We first put the problem in standard equality form by introducing slack variables $x_3$ and $x_4$,

\[
\max\ 3x_1 + 10x_2 \quad \text{subject to} \quad
\begin{pmatrix} 1 & 4 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} 8 \\ 4 \end{pmatrix}, \quad x \ge 0.
\tag{LP1}
\]

The reformulation corresponding to the optimal basis is given by,

\[
\max\ 64/3 + (0, 0, -7/3, -2/3)^T x \quad \text{subject to} \quad
\begin{pmatrix} 1 & 0 & -1/3 & 4/3 \\ 0 & 1 & 1/3 & -1/3 \end{pmatrix}
x = \begin{pmatrix} 8/3 \\ 4/3 \end{pmatrix}, \quad x \ge 0.
\tag{LP2}
\]
The corresponding basic solution $\bar{x} = (8/3, 4/3, 0, 0)^T$ is not integral. We will use the refor-
mulation (LP2) to find a cutting plane. Consider any constraint of (LP2) where the right hand
side is fractional. In this case we have a choice, and we select the first constraint, i.e.

\[
x_1 - \frac{1}{3} x_3 + \frac{4}{3} x_4 = \frac{8}{3}. \tag{5.20}
\]

As this constraint holds with equality it also holds with $\le$. Let us derive from the resulting
inequality the corresponding CG-constraint. We obtain,

\[
x_1 + \left\lfloor -\frac{1}{3} \right\rfloor x_3 + \left\lfloor \frac{4}{3} \right\rfloor x_4
= x_1 - x_3 + x_4 \le \left\lfloor \frac{8}{3} \right\rfloor = 2.
\]

Hence,

\[
x_1 - x_3 + x_4 \le 2 \tag{5.21}
\]

is a valid inequality for the original (IP). Moreover, $\bar{x}$ does not satisfy (5.21) as,

\[
\bar{x}_1 - \bar{x}_3 + \bar{x}_4 = \frac{8}{3} - 0 + 0 > 2.
\]

Hence, it is a cutting plane for $\bar{x}$.

Note that (5.21) is equivalent to the inequality $x_1 + 3x_2 \le 6$ in Example 9. To see this,
substitute in (5.21) for $x_3$ and $x_4$ the expressions given by the constraints in (LP1), namely
$x_3 = 8 - x_1 - 4x_2$ and $x_4 = 4 - x_1 - x_2$.
Let us generalize this argument. Consider an integer program,

\[
\max\ c^T x \quad \text{subject to} \quad Ax = b, \quad x \ge 0,\ x \ \text{integer.} \tag{IP}
\]

We solve the linear programming relaxation (P) of (IP) using the simplex procedure. If (P) has
no solution, then neither does (IP). Suppose we obtain an optimal basis $B$ of (P). We rewrite
(P) in canonical form for $B$ to obtain a linear program of the form,

\[
\max\ \bar{z} + \bar{c}_N^T x_N \quad \text{subject to} \quad x_B + \bar{A}_N x_N = \bar{b}, \quad x \ge 0.
\]

The corresponding basic solution $\bar{x}$ is given by $\bar{x}_B = \bar{b}$ and $\bar{x}_N = 0$. If $\bar{b}$ is integer, then so is $\bar{x}$,
and $\bar{x}$ is an optimal solution to (IP). Thus, we may assume that $\bar{b}_i$ is fractional for some index
$i$. Constraint $i$ is of the form

\[
x_i + \sum_{j \in N} \bar{A}_{ij} x_j = \bar{b}_i. \tag{5.22}
\]

As this constraint holds with equality it also holds with $\le$. Let us derive from the resulting
inequality the corresponding CG-constraint. We obtain,

\[
x_i + \sum_{j \in N} \lfloor \bar{A}_{ij} \rfloor x_j \le \lfloor \bar{b}_i \rfloor. \tag{5.23}
\]
Remark 37. Constraint (5.23) is a cutting plane for the basic solution $\bar{x}$.

Proof. Since $\bar{x}_j = 0$ for all $j \in N$, the left hand side of (5.23) evaluated at $\bar{x}$ is $\bar{x}_i = \bar{b}_i$. As $\bar{b}_i$
is fractional, $\bar{b}_i > \lfloor \bar{b}_i \rfloor$. Thus the left hand side is larger than the right hand side for $\bar{x}$.
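Extracting such a cut from a row of the canonical form is a one-line computation. A small sketch of our own, with the row data copied from the first constraint of (LP2):

import math
from fractions import Fraction as F

def gomory_cut(row, rhs):
    # row: {nonbasic index j: Abar_ij}. Returns the data of (5.23),
    # read as  x_i + sum_j coeffs[j] * x_j <= rhs_floor.
    return {j: math.floor(a) for j, a in row.items()}, math.floor(rhs)

# x1 - (1/3) x3 + (4/3) x4 = 8/3 yields the cut (5.21):
coeffs, rhs = gomory_cut({3: F(-1, 3), 4: F(4, 3)}, F(8, 3))
print(coeffs, rhs)   # {3: -1, 4: 1} 2, i.e. x1 - x3 + x4 <= 2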
5.4 Branch & Bound
In this section, we introduce a solution method for integer programs, called branch & bound.
It is a divide and conquer approach for solving linear integer programming problems. We
illustrate some of the main ideas using a production example.
Every year, the University of Waterloo readies itself for the annual homecoming festivi-
ties. The university expects hundreds of alumni and their families to visit its campus for the
occasion, and campus stores in particular expect to be busier than usual. Two items promise
to be in especially high demand during these times: the university's infamous Pink Tie, as
well as the (almost) equally popular Pink Bow-Tie. With only a few weeks to go until home-
coming, it is high time to replenish stocks for these two items. The university manufactures
its ties from its own special cloth of which it currently has 45 ft$^2$ in stock. For 10 pink bow
ties, it needs 9 ft$^2$ of cloth, and producing 10 pink ties requires 11 ft$^2$ of the raw material. In
addition to cloth, the manufacturing process also requires labour, which is particularly short
during the next few weeks: only a total of 6 work days are available. Manufacturing of the
ties is done in batches of 10. Producing 10 pink bow ties takes 1.5 days, and producing 10
pink ties takes a day. The university expects a profit of $6 for a pink bow tie, and a profit of
$5 for a pink tie. How much of each product should it produce?
Just like in Chapter 1, we can formulate this problem as an integer program. We introduce
variables $x$ and $y$ for the number of batches of 10 bow-ties and ties to produce. The following
integer program has constraints restricting the available labour and raw material, and its
objective function is the total profit for the given production vector.

\[
\begin{array}{rll}
\max & 60x + 50y & (5.24) \\
\text{subject to} & 1.5x + y \le 6 & \\
& 9x + 11y \le 45 & \\
& x, y \ge 0 & \\
& x, y \ \text{integer.} & (5.25)
\end{array}
\]

How can we solve this integer program? Let us use what we know: linear program-
ming! We drop the integrality constraints (5.25) and obtain the linear programming relaxation
(5.24-LP) of (5.24). Remark 36(2) states that if we solve the LP relaxation of an integer
program and obtain a solution all of whose variables are integers, then it is also an optimal so-
lution of the original IP! Unfortunately, solving (5.24-LP) gives the solution $x = 2.8$, $y = 1.8$
of value 258, which has fractional components. Remark 36 implies, however, that no integral
feasible solution for (5.24) can have value greater than 258, and hence we have found an
upper bound on the maximum value of any feasible solution of the original IP.
We will now use the fractional solution to partition the feasible region. In the following,
[Figure: Subproblem 1 ($x = 2.8$, $y = 1.8$, value 258) branching via $x \ge 3$ to Subproblem 2 ($x = 3$, $y = 1.5$, value 255) and via $x \le 2$ to Subproblem 3.]

Figure 5.2: The branch & bound tree after two iterations.
we let Subproblem 1 denote the original LP relaxation of the IP. We observe that every feasible
solution for (5.24) must have either $x \le 2$ or $x \ge 3$, and that the current fractional solution does
not satisfy either one of these constraints. Thus, we now branch on variable $x$ and create two
additional subproblems:

Subproblem 2: Subproblem 1 + Constraint $x \ge 3$.

Subproblem 3: Subproblem 1 + Constraint $x \le 2$.

Neither of the two subproblems contains the point $(2.8, 1.8)$, and the optimal solution for the
original LP relaxation can therefore not re-occur when solving either one of the two subprob-
lems. We now choose any one of the above two subproblems to process next. Arbitrarily, we
pick Subproblem 2. Solving the problem yields the optimal solution $x = 3$ and $y = 1.5$ with
value 255. This solution is still not integral (as $y$ is fractional), but 255 gives us a new, tighter
upper bound on the maximum value of any integral feasible solution for this subproblem.

The subproblem structure explored by the branch & bound algorithm is depicted in Figure
5.2. Each of the subproblems generated so far is shown as a box; these boxes are referred to as
branch & bound nodes. The two nodes for Subproblems 2 and 3 are connected to their parent
node, and the corresponding edges are labeled with the constraints that were
added to Subproblem 1. The entire figure is commonly known as the branch & bound tree
generated by the algorithm.
The optimal solution for Subproblem 2 is still not integral as the value of $y$ is fractional.
We branch on $y$, and generate two more subproblems:

Subproblem 4: Subproblem 2 + Constraint $y \le 1$.

Subproblem 5: Subproblem 2 + Constraint $y \ge 2$.

Running the Simplex algorithm on Subproblem 5, we quickly find that the problem is
infeasible. Hence, this subproblem has no fractional feasible solution, and thus no integral
one either. Solving Subproblem 4, we find the solution $x = 3\frac{1}{3}$, $y = 1$ of value 250. We
generate two more subproblems by branching on $x$:

Subproblem 6: Subproblem 4 + Constraint $x \le 3$.

Subproblem 7: Subproblem 4 + Constraint $x \ge 4$.

Solving Subproblem 6 yields the integral solution $x = 3$, $y = 1$ of value 230. Solving Subprob-
lem 7 gives the integral solution $x = 4$, $y = 0$ of value 240, which is the current best. Figure 5.3
shows the current status.
shows the current status.
So far, we have found a feasible integral solution of value 240 for subproblem 7. On the
other hand, we know from the LP solution at subproblem 1 that no fractional (and hence no
integral) feasible solution can have value bigger than 258. We continue exploring the tree at
subproblem 3. Solving the subproblem yields solution x = 2 and y = 2.45 and value 242.73.
Branching on y gives two more subproblems.
Subproblem 8 Subproblem 3 + Constraint y 2.
Subproblem 9 Subproblem 3 + Constraint y 3.
Solving subproblem 8 gives integral solution x = 2, y = 2 of value 220; this is inferior to
the solution of subproblem 7 that we had previously found. Solving subproblem 9 gives the
fractional solution x = 4/3, and y = 3 of value 230. Now observe the following: the optimal
value of this subproblem is 230, and hence no integral solution for this subproblem has value
[Figure: the tree of Subproblems 1–7 with branches $x \ge 3$, $x \le 2$, $y \le 1$, $y \ge 2$, $x \le 3$, $x \ge 4$ and the LP values listed above; Subproblem 5 is infeasible.]

Figure 5.3: The branch & bound tree after three iterations.
greater than that. On the other hand, we have already found an integral solution of value 240,
and thus will not find a better solution within this subproblem. We therefore might as
well stop branching here!

Formally, whenever the optimal value of the current subproblem is at most the value of the
best integral solution that we have already found, then we may stop branching at the current
node. We say: we prune the branch of the branch & bound tree at the current node.

We are done as no unexplored branches of the branch & bound tree remain. The final
tree is shown in Figure 5.4. The optimal solution to the original IP (5.24) is therefore $x = 4$,
$y = 0$ and achieves a value of 240. Therefore, in order to optimize profit, the university is best
advised to invest all resources into the production of pink bow-ties!
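The whole walk through Subproblems 1–9 can be automated in a few lines. The sketch below is our own illustration (it assumes SciPy's linprog is available and, like our description, it does not handle unbounded relaxations); it branches on variable bounds depth-first and ends with the incumbent $x = 4$, $y = 0$ of value 240.

import math
from scipy.optimize import linprog

c = [-60, -50]                       # linprog minimizes, so negate profits
A = [[1.5, 1], [9, 11]]
b = [6, 45]

best_val, best_x = -math.inf, None
stack = [((0, None), (0, None))]     # per-variable (lower, upper) bounds

while stack:
    bounds = stack.pop()
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    if res.status != 0:              # infeasible subproblem: prune
        continue
    val, x = -res.fun, res.x
    if val <= best_val:              # LP bound no better than incumbent: prune
        continue
    frac = [j for j in range(len(x)) if abs(x[j] - round(x[j])) > 1e-6]
    if not frac:                     # integral solution: new incumbent
        best_val, best_x = val, [round(xj) for xj in x]
        continue
    j = frac[0]                      # branch on the first fractional variable
    lo, hi = bounds[j]
    down, up = list(bounds), list(bounds)
    down[j] = (lo, math.floor(x[j]))
    up[j] = (math.ceil(x[j]), hi)
    stack += [tuple(down), tuple(up)]

print(best_x, best_val)              # [4, 0] 240.0

Pushing both children and popping the most recent one reproduces the depth-first exploration style discussed in the next section.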
5.4.1 A discussion
We conclude this section with a brief discussion. The branch & bound algorithm discussed
here can be viewed as a smart enumeration algorithm: it uses linear programming in a smart
[Figure: the complete tree of Subproblems 1–9 with all branching constraints and LP values; Subproblem 9 ($x = 4/3$, $y = 3$, value 230) is marked PRUNED.]

Figure 5.4: The final branch & bound tree.
way, partitioning the space of feasible solutions into mutually exclusive and exhaustive re-
gions. For example, in the pink tie example discussed above, it is instructive to see that any
feasible integral solution occurs in one of the subproblems corresponding to the leaves of the
tree in Figure 5.4. The algorithm is smart as it uses the optimal LP value of a subproblem to
possibly prune it. Sometimes, such pruning can save exploring a large number of potential
solutions hidden in a subproblem.

The algorithm as described is quite flexible in many ways, and in our description, we
made many arbitrary choices. For example, if we solve a subproblem, and the solution has
more than one fractional variable, how do we decide which variable to branch on? And once
we have branched on a variable, which of the generated problems do we solve next? The
depth-first search style exploration chosen in our example, where newly generated subproblems
are explored first, is popular as it leads to integral solutions quickly. However, many other
strategies have been analyzed.

Why do we not simply present the strategy that works best? Well, such a strategy likely
does not exist. This relates to the discussion we started in Chapter 3. Solving integer programs
in general is NP-hard, and very likely, no efficient algorithm exists. In practice, however,
branch & bound is nearly always superior to simple enumeration of feasible solutions, and is
used in some form or another in most commercial codes.

We conclude with one last comment regarding implementation. In this section, we reduced
the problem of finding an optimal production plan for a tie production problem to that of
solving a series of 9 linear programming problems. The reader may notice that these problems
are extremely similar! That is, branching on a variable merely adds a single constraint to an LP
for which we know an optimal basic solution. Can we use such an optimal basic solution for
a parent problem to compute solutions for the generated subproblems faster? The answer is
yes! The so-called dual Simplex algorithm applies in situations where a single constraint is
added to a linear program, and where the old optimal solution is rendered infeasible by this
new constraint. In many cases, the algorithm reoptimizes quickly from the given infeasible
starting point.
5.5 Further reading and notes
The shortest path algorithm presented in Section 5.1 is equivalent to the well-known algorithm
by Dijkstra [8]. This algorithm has several very efficient implementations using sophisticated
heap data structures; e.g. see the work by Fredman and Tarjan [9].

The Hungarian Algorithm can be implemented very efficiently so that it runs in polynomial
time in n, independent of the weights w! For details of the implementation and a complexity
analysis, see [5].

The algorithms presented in Sections 5.1 and 5.2 are examples of so-called primal-dual
algorithms. The primal-dual technique has more recently also been used successfully for
literally hundreds of NP-hard optimization problems. A good introduction to this topic, which
is beyond the scope of these notes, can be found in the books of Vazirani [19], and Williamson
and Shmoys [20].

When we study integer programs or mixed integer programs, we usually assume that the
data are rational numbers (no irrational number is allowed in the data). This is a reasonable
assumption. Besides, allowing irrational numbers in data can cause some difficulties, e.g.,
(IP) may have an optimal solution while the LP relaxation is unbounded (consider the
problem $\max\{x_1 : x_1 - \sqrt{2}\, x_2 = 0,\ x_1, x_2 \ \text{integer}\}$). Or, we may need infinitely many cutting
planes to reach a solution of the IP (to construct an example for this, suitably modify the
previous example).

In practice, for hard IP problems, we use much more sophisticated rules for branching,
pruning, choosing the next subproblem to solve, etc. Moreover, as we mentioned before, we
strategy so that instead of guaranteeing an optimal solution to the IP, we lower our standards
and strive for generating feasible solutions of the IP that are provably within, say, 10% of
the optimum value (or 2% of the optimum value, etc.). For further background in integer
programming, see, for instance, [22].
Chapter 6
Geometry of optimization
In this chapter we introduce a number of geometric concepts and will interpret much of the
material defined in the previous chapters through the lens of geometry. Questions that we will
address include: what can we say about the shape of the set of solutions to a linear program?
How are basic feasible solutions distinguished from the set of all feasible solutions? What does
that say about the simplex algorithm? Is there a geometric interpretation of complementary
slackness?
6.1 Feasible solutions to linear programs and polyhedra
Consider the following linear program,

\[
\max\ (c_1, c_2)\, x \quad \text{s.t.} \quad
\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} x \le
\begin{pmatrix} 3 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix}
\quad
\begin{matrix} (1) \\ (2) \\ (3) \\ (4) \\ (5) \end{matrix}
\tag{6.1}
\]

We represented the set of all feasible solutions to (6.1) in the following figure. The set of
all points $(x_1, x_2)^T$ satisfying constraint (2) with equality corresponds to line (2). The set of
[Figure: the shaded feasible region in the $(x_1, x_2)$-plane bounded by lines (1)–(5).]

Figure 6.1: Feasible region
all points satisfying constraint (2) corresponds to all points on or below line (2). A similar
argument holds for constraints (1), (3), (4) and (5). Hence, the set of all feasible solutions of
(6.1) is the shaded region (called the feasible region). Looking at examples in $\mathbb{R}^2$ as above
can be somewhat misleading, however. In order to get the right geometric intuition we need
to introduce a number of definitions. Given a vector $x = (x_1, \ldots, x_n)^T$, the Euclidean norm $\|x\|$
of $x$ is defined as $\sqrt{x_1^2 + \cdots + x_n^2}$, and $\|x\|$ is the length of the vector $x$, i.e. the distance of the
point $x$ from the origin.
Remark 38. Let $a, b \in \mathbb{R}^n$. Then $a^T b = \|a\| \|b\| \cos(\theta)$, where $\theta$ is the angle between $a$ and
$b$. Therefore, for every pair of nonzero vectors $a, b$, we have

$a^T b = 0$ if and only if $a, b$ are orthogonal,

$a^T b > 0$ if and only if the angle between $a, b$ is less than $90°$,

$a^T b < 0$ if and only if the angle between $a, b$ is larger than $90°$.
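A quick numeric illustration of the remark (our own; the helper name is invented):

import math

def angle(a, b):
    # angle between a and b in degrees, via a^T b = ||a|| ||b|| cos(theta)
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    return math.degrees(math.acos(dot / (norm(a) * norm(b))))

print(angle([1, 0], [0, 1]))    # 90 degrees:  a^T b = 0, orthogonal
print(angle([1, 0], [1, 1]))    # 45 degrees:  a^T b = 1 > 0
print(angle([1, 0], [-1, 1]))   # 135 degrees: a^T b = -1 < 0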
Let $a$ be a non-zero vector with $n$ components and let $\beta \in \mathbb{R}$. We define

1. $H := \{x \in \mathbb{R}^n : a^T x = \beta\}$ is a hyperplane, and

2. $F := \{x \in \mathbb{R}^n : a^T x \le \beta\}$ is a halfspace.
Consider the following inequality,

\[
a^T x \le \beta. \tag{*}
\]

Hence, $H$ is the set of points satisfying constraint (*) with equality and $F$ is the set of points
satisfying constraint (*). Suppose that $\bar{x} \in H$ and let $x$ be any other point in $H$. Then $a^T x =
a^T \bar{x} = \beta$. Equivalently, $a^T (x - \bar{x}) = 0$, i.e. $a$ and $x - \bar{x}$ are orthogonal. This implies (1) in the
following remark; we leave (2) as an exercise.

Remark 39. Let $\bar{x} \in H$.

1. $H$ is the set of points $x$ for which $a$ and $x - \bar{x}$ are orthogonal,

2. $F$ is the set of points $x$ for which $a$ and $x - \bar{x}$ form an angle of at least $90°$.
We illustrate the previous remark in the following figure. The line is the hyperplane $H$ and
the shaded region is the halfspace $F$.

[Figure: the hyperplane $H$ with normal vector $a$ drawn at $\bar{x}$, and the halfspace $F$ on the far side of $H$ from $a$.]
In $\mathbb{R}^2$ a hyperplane is a line, i.e. a 1-dimensional object. What about in $\mathbb{R}^n$? Consider the
hyperplane $H := \{x \in \mathbb{R}^n : a^T x = 0\}$. Then $H$ is a vector space and we know how to define its
dimension. Recall that for any $m \times n$ matrix $A$ we have the relation,

\[
\dim\{x : Ax = 0\} + \operatorname{rank}(A) = n.
\]

It follows that $\dim\{x : a^T x = 0\} + \operatorname{rank}(a^T) = n$. Since by definition $a \ne 0$, $\operatorname{rank}(a^T) = 1$, i.e.
$\dim(H) = \dim(\{x : a^T x = 0\}) = n - 1$. Hence, hyperplanes are $(n-1)$-dimensional objects.
For any $m \times n$ matrix $A$ and vector $b$, we say that $P := \{x \in \mathbb{R}^n : Ax \le b\}$ is a polyhedron.
Note that the set of solutions to any one of the inequalities of $Ax \le b$ is a halfspace. Thus,
equivalently, we could define a polyhedron to be the intersection of a finite number of halfs-
paces. Given an inequality $a^T x \ge \beta$ we can rewrite it as $-a^T x \le -\beta$, and given an equation
$a^T x = \beta$ we can rewrite it as $a^T x \le \beta$ and $-a^T x \le -\beta$. Hence, any set of linear constraints
can be rewritten as $Ax \le b$ for some matrix $A$ and some vector $b$. It follows in particular that
the set of feasible solutions to a linear program is a polyhedron. In the following section we
will investigate geometric properties of polyhedra.
6.2 Convexity
Let $x^{(1)}, x^{(2)}$ be two points in $\mathbb{R}^n$. We define the line through $x^{(1)}$ and $x^{(2)}$ to be the set of points

\[
\bigl\{ \lambda x^{(1)} + (1 - \lambda) x^{(2)} : \lambda \in \mathbb{R} \bigr\}.
\]

We define the line segment with ends $x^{(1)}$ and $x^{(2)}$ to be the set of points

\[
\bigl\{ \lambda x^{(1)} + (1 - \lambda) x^{(2)} : 0 \le \lambda \le 1 \bigr\}.
\]

Observe that the aforementioned definitions correspond to the commonly used notions of lines
and line segments. A subset $C$ of $\mathbb{R}^n$ is said to be convex if for every pair of points $x^{(1)}$ and
$x^{(2)}$ in $C$ the line segment with ends $x^{(1)}, x^{(2)}$ is included in $C$.

The shaded region in figure (i), contained in $\mathbb{R}^2$, is convex, as is the shaded region (iii),
contained in $\mathbb{R}^3$. The shaded regions corresponding to figures (ii) and (iv) are not convex. We
prove this in either case by exhibiting two points $x^{(1)}, x^{(2)}$ inside the shaded region for which
the line segment with ends $x^{(1)}, x^{(2)}$ is not completely included in the shaded region.

Remark 40. Halfspaces are convex.

Proof. Let $H$ be a halfspace, i.e. $H = \{x : a^T x \le \beta\}$ for some vector $a \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$.
Let $x^{(1)}, x^{(2)} \in H$. Let $x$ be an arbitrary point in the line segment between $x^{(1)}$ and $x^{(2)}$, i.e.
[Figure: four regions (i)–(iv); (i) and (iii) are convex, while in (ii) and (iv) a line segment with ends $x^{(1)}, x^{(2)}$ in the region leaves the region.]
$x = \lambda x^{(1)} + (1 - \lambda) x^{(2)}$ for some $\lambda \in [0, 1]$. We need to show that $x \in H$. We have

\[
a^T x = a^T \bigl( \lambda x^{(1)} + (1 - \lambda) x^{(2)} \bigr)
= \underbrace{\lambda}_{\ge 0} \underbrace{a^T x^{(1)}}_{\le \beta}
+ \underbrace{(1 - \lambda)}_{\ge 0} \underbrace{a^T x^{(2)}}_{\le \beta}
\le \lambda \beta + (1 - \lambda) \beta = \beta.
\]

Hence, $x \in H$ as required.
Remark 41. For every $j \in J$ let $C_j$ denote a convex set. Then the intersection

\[
C := \bigcap \{ C_j : j \in J \}
\]

is convex. Note that $J$ can be infinite.

Proof. Let $x^{(1)}$ and $x^{(2)}$ be two points that are in $C$. Then for every $j \in J$, $x^{(1)}, x^{(2)} \in C_j$ and,
since $C_j$ is convex, the line segment between $x^{(1)}$ and $x^{(2)}$ is in $C_j$. It follows that the line
segment between $x^{(1)}$ and $x^{(2)}$ is in $C$. Hence, $C$ is convex.
Remarks 40 and 41 imply immediately that,

Proposition 42. Polyhedra are convex.

Note that the unit ball is an example of a convex set that is not a polyhedron. It is the
intersection of an infinite number of halfspaces (hence, convex) but cannot be expressed as
the intersection of a finite number of halfspaces.
6.3 Extreme points
We say that a point x is properly contained in a line segment if it is in the line segment but it
is distinct from its ends. Consider a convex set C and let x be a point of C. We say that x is an
extreme point of C if no line segment that properly contains x is included in C. Equivalently,
Remark 43. $x \in C$ is not an extreme point of $C$ if and only if

\[
x = \lambda x^{(1)} + (1 - \lambda) x^{(2)}
\]

for distinct points $x^{(1)}, x^{(2)} \in C$ and $\lambda$ with $0 < \lambda < 1$.

[Figure: (i) a convex region with six extreme points marked; (ii) a convex region with a non-extreme point $x$ on a line segment with ends $x^{(1)}, x^{(2)}$.]

In figures (i) and (ii) the shaded regions represent convex sets included in $\mathbb{R}^2$. In (i) we
indicate each of the six extreme points by small dark circles. Note that for (ii) every point in
the boundary of the shaded figure is an extreme point. This shows in particular that a convex
set can have an infinite number of extreme points. In figure (ii) we illustrate why the point $x$
is not extreme by exhibiting a line segment with ends $x^{(1)}, x^{(2)}$, which are contained in the
shaded figure, that properly contains $x$.
Next, we present a theorem that characterizes the extreme points of a polyhedron. We
first need to introduce some notation and definitions. Let $Ax \le b$ be a system of inequalities
and let $\bar{x}$ denote a solution to $Ax \le b$. We say that a constraint $a^T x \le \beta$ of $Ax \le b$ is tight
for $\bar{x}$ if $a^T \bar{x} = \beta$. Such constraints are also called active in part of the literature. We denote
the set of all inequalities among $Ax \le b$ that are tight for $\bar{x}$ by $A^{=} x \le b^{=}$.

Theorem 44. Let $P = \{x \in \mathbb{R}^n : Ax \le b\}$ be a polyhedron and let $\bar{x} \in P$. Let $A^{=} x \le b^{=}$ be the
set of tight constraints for $\bar{x}$. Then $\bar{x}$ is an extreme point of $P$ if and only if $\operatorname{rank}(A^{=}) = n$.
We will illustrate this theorem on the polyhedron $P$ that is the feasible region of the linear
program (6.1); see Figure 6.1. Suppose $\bar{x} = (1, 2)^T$. We can see in that figure that $\bar{x}$ is an
extreme point. Let us verify that this is what the previous theorem also indicates. Constraints
(1) and (2) are tight, hence,

\[
A^{=} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\]

It follows that $\operatorname{rank}(A^{=}) = 2 = n$, hence $\bar{x}$ is indeed an extreme point. Suppose $\bar{x} = (0, 1)^T$. We
can see in that figure that $\bar{x}$ is not an extreme point. Let us verify that this is what the previous
theorem also indicates. Constraint (4) is the only tight constraint, hence,

\[
A^{=} = \begin{pmatrix} -1 & 0 \end{pmatrix}.
\]

It follows that $\operatorname{rank}(A^{=}) = 1 < n$, hence $\bar{x}$ is not an extreme point.
Proof of Theorem 44. Assume that $\operatorname{rank}(A^{=}) = n$. We will show that $\bar{x}$ is an extreme point.
Suppose for a contradiction this is not the case. Then there exist (see Remark 43) $x^{(1)}, x^{(2)} \in P$,
where $x^{(1)} \ne x^{(2)}$, and $\lambda$ with $0 < \lambda < 1$, for which $\bar{x} = \lambda x^{(1)} + (1 - \lambda) x^{(2)}$. Thus,

\[
b^{=} = A^{=} \bar{x} = A^{=} \bigl( \lambda x^{(1)} + (1 - \lambda) x^{(2)} \bigr)
= \underbrace{\lambda}_{> 0} \underbrace{A^{=} x^{(1)}}_{\le b^{=}}
+ \underbrace{(1 - \lambda)}_{> 0} \underbrace{A^{=} x^{(2)}}_{\le b^{=}}
\le \lambda b^{=} + (1 - \lambda) b^{=} = b^{=}.
\]

Hence, we have equality throughout, which implies that $A^{=} x^{(1)} = A^{=} x^{(2)} = b^{=}$. As $\operatorname{rank}(A^{=}) = n$,
there is a unique solution to $A^{=} x = b^{=}$. Therefore, $\bar{x} = x^{(1)} = x^{(2)}$, a contradiction.

Assume that $\operatorname{rank}(A^{=}) < n$. We will show that $\bar{x}$ is not an extreme point. Since $\operatorname{rank}(A^{=}) < n$
there exists a non-zero vector $d$ such that $A^{=} d = 0$. Pick $\varepsilon > 0$ small and define

\[
x^{(1)} := \bar{x} + \varepsilon d \quad \text{and} \quad x^{(2)} := \bar{x} - \varepsilon d.
\]

Hence, $\bar{x} = \frac{1}{2} x^{(1)} + \frac{1}{2} x^{(2)}$ and $x^{(1)}, x^{(2)}$ are distinct. It follows that $\bar{x}$ is in the line segment
between $x^{(1)}$ and $x^{(2)}$. It remains to show that $x^{(1)}, x^{(2)} \in P$ for $\varepsilon > 0$ small enough. Observe
first that

\[
A^{=} x^{(1)} = A^{=} (\bar{x} + \varepsilon d) = \underbrace{A^{=} \bar{x}}_{= b^{=}} + \varepsilon \underbrace{A^{=} d}_{= 0} = b^{=}.
\]

Similarly, $A^{=} x^{(2)} = b^{=}$. Let $a^T x \le \beta$ be any of the inequalities of $Ax \le b$ that is not in $A^{=} x \le b^{=}$.
It follows that for $\varepsilon > 0$ small enough,

\[
a^T (\bar{x} + \varepsilon d) = \underbrace{a^T \bar{x}}_{< \beta} + \varepsilon\, a^T d \le \beta,
\]

hence $x^{(1)} \in P$ and, by the same argument, $x^{(2)} \in P$ as well.
Consider the following polyhedron,

\[
P = \left\{ x \in \mathbb{R}^4 :
\begin{pmatrix} 1 & 3 & 1 & 0 \\ 2 & 2 & 0 & 1 \end{pmatrix} x
= \begin{pmatrix} 2 \\ 1 \end{pmatrix},\ x \ge 0 \right\}.
\]
Note that $\bar{x} = (0, 0, 2, 1)^T$ is a basic feasible solution. We claim that $\bar{x}$ is an extreme point of
$P$. To be able to apply Theorem 44 we need to rewrite $P$ as the set of solutions to $Ax \le b$ for
some matrix $A$ and vector $b$. This can be done by choosing,

\[
A = \begin{pmatrix}
1 & 3 & 1 & 0 \\
2 & 2 & 0 & 1 \\
-1 & -3 & -1 & 0 \\
-2 & -2 & 0 & -1 \\
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 2 \\ 1 \\ -2 \\ -1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]

Let $A^{=} x \le b^{=}$ be the set of tight constraints for $\bar{x}$; then

\[
A^{=} = \begin{pmatrix}
1 & 3 & 1 & 0 \\
2 & 2 & 0 & 1 \\
-1 & -3 & -1 & 0 \\
-2 & -2 & 0 & -1 \\
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0
\end{pmatrix}
\]

and it can be readily checked that the first two and last two rows of $A^{=}$ form a set of 4 linearly
independent rows. Hence $\operatorname{rank}(A^{=}) = 4 = n$. This implies (as $\bar{x}$ is feasible), by Theorem 44, that
$\bar{x}$ is an extreme point of $P$. Using the idea outlined in the previous example we leave it as an
exercise to prove the following theorem, which relates basic feasible solutions (for problems
in standard equality form) to extreme points,

Theorem 45. Let $A$ be a matrix whose rows are linearly independent and let $b$ be a vector.
Let $P = \{x : Ax = b,\ x \ge 0\}$ and let $\bar{x} \in P$. Then $\bar{x}$ is an extreme point of $P$ if and only if $\bar{x}$ is a
basic solution of $Ax = b$.
6.4 Geometric interpretation of Simplex algorithm
Consider the following linear program,

\[
\begin{array}{rrl}
\max\ z = & 2x_1 + 3x_2 & \\
\text{s.t.} & & \\
& 2x_1 + x_2 \le 10 & (1) \\
& x_1 + x_2 \le 6 & (2) \\
& -x_1 + x_2 \le 4 & (3) \\
& x_1, x_2 \ge 0. &
\end{array}
\tag{P}
\]
In Figure 6.2 we indicate the feasible region of this linear program as well as a feasible
solution $x^* = (1, 5)^T$. The line $z = 17$ indicates the set of all vectors for which the objective
function evaluates to 17. The points above this line have objective value greater than 17 and the
points below this line have value less than 17. Since there are no points in the feasible region
above the line $z = 17$, it follows that $x^*$ is an optimal solution. As the line $z = 17$ intersects the
feasible region in only one point, namely $x^*$, this also implies that $x^*$ is the unique optimal
solution.

[Figure: the feasible region of (P) bounded by lines (1), (2), (3) and the nonnegativity constraints, the point $x^*$, and the level line of the objective through $x^*$.]

Figure 6.2: Feasible region and optimal solution
Let us use the simplex algorithm to find this optimal solution $x^*$. We first need to refor-
mulate this problem in standard equality form. This can be achieved by introducing slack
variables $x_3, x_4, x_5$ for constraints (1), (2) and (3) of (P) respectively. We obtain,

\[
\begin{array}{rrl}
\max\ z = & 2x_1 + 3x_2 & \\
\text{s.t.} & & \\
& 2x_1 + x_2 + x_3 = 10 & (1) \\
& x_1 + x_2 + x_4 = 6 & (2) \\
& -x_1 + x_2 + x_5 = 4 & (3) \\
& x_1, x_2, x_3, x_4, x_5 \ge 0. &
\end{array}
\tag{$\tilde{P}$}
\]

Given any point $x = (x_1, x_2)^T$ we define,

\[
\tilde{x} := \begin{pmatrix} x_1 \\ x_2 \\ 10 - 2x_1 - x_2 \\ 6 - x_1 - x_2 \\ 4 + x_1 - x_2 \end{pmatrix},
\]

i.e. the components $\tilde{x}_3, \tilde{x}_4, \tilde{x}_5$ are defined as the values of the slacks of the constraints (1), (2)
and (3) respectively of (P). Thus, $x$ is feasible for (P) if and only if $\tilde{x}$ is feasible for ($\tilde{P}$). Suppose
$x$ is a feasible solution of (P) which is not an extreme point of the feasible region. Then $x$
is properly contained in the line segment with ends $x^{(1)}, x^{(2)}$, where $x^{(1)}, x^{(2)}$ are feasible for
(P), i.e. $x^{(1)} \ne x^{(2)}$ and there exists $\lambda$ such that $0 < \lambda < 1$ and $x = \lambda x^{(1)} + (1 - \lambda) x^{(2)}$. It
can be readily checked that $\tilde{x} = \lambda \tilde{x}^{(1)} + (1 - \lambda) \tilde{x}^{(2)}$. Hence, $\tilde{x}$ is properly contained in the line
segment with ends $\tilde{x}^{(1)}, \tilde{x}^{(2)}$. In particular, $\tilde{x}$ is not an extreme point of the feasible region of ($\tilde{P}$).
Conversely, if $\tilde{x}$ is not an extreme point for ($\tilde{P}$) then $x$ is not an extreme point for (P). Hence,

Remark 46. $x$ is an extreme point of the feasible region of (P) if and only if $\tilde{x}$ is an extreme
point of the feasible region of ($\tilde{P}$).
Starting in Section 2.3 we solved the linear program ($\tilde{P}$). The following table summarizes
the sequence of bases and basic solutions we obtained,
ITERATION   BASIS       $\tilde{x}^T$          $x^T$
    1       {3, 4, 5}   (0, 0, 10, 6, 4)   (0, 0)
    2       {1, 4, 5}   (5, 0, 0, 1, 9)    (5, 0)
    3       {1, 2, 5}   (4, 2, 0, 0, 6)    (4, 2)
    4       {1, 2, 3}   (1, 5, 3, 0, 0)    (1, 5)
At each step, $\tilde{x}$ is a basic solution. It follows from Theorem 45 that $\tilde{x}$ must be an extreme point
of the feasible region of ($\tilde{P}$). Hence, by Remark 46, $x$ must be an extreme point of the feasible
region of (P). We illustrate this in Figure 6.3. Each of $x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}$ is an extreme point,
and the simplex moves from one extreme point to another adjacent extreme point.

[Figure: the feasible region of (P) with the extreme points $x^{(1)} = (0,0)$, $x^{(2)} = (5,0)$, $x^{(3)} = (4,2)$, $x^{(4)} = (1,5)$ visited in turn.]

Figure 6.3: Sequence of extreme points visited by simplex

In this example, at each iteration, we move to a different basic feasible solution. The simplex
goes from one feasible basis to another feasible basis at each iteration. It is possible, however,
that the corresponding basic solutions for two successive bases are the same. Thus the simplex
algorithm can keep the same feasible basic solution for a number of iterations. With a suitable
rule for the choice of entering and leaving variables the simplex will eventually move to a
different basic solution; see Section 2.5.3.
6.5 Cutting planes
Recall the example we considered in Section 5.3.1. The integer program we need to solve is,

\[
\max\ 3x_1 + 10x_2 \quad \text{subject to} \quad
x_1 + 4x_2 \le 8, \quad
x_1 + x_2 \le 4, \quad
x_1, x_2 \ge 0, \quad
x_1, x_2 \ \text{integer.}
\]

In Figure 6.4 (left) we indicate the feasible region of the LP relaxation (P1) of the integer
program, as well as the optimal solution $x^{(1)} = (8/3, 4/3)^T$ of (P1). Since $x^{(1)}$ is not integral,

[Figure: left, the feasible region of (P1) with $x^{(1)}$; right, the smaller feasible region of (P2) after adding the cut (*), with $x^{(2)}$.]

Figure 6.4: Cutting plane algorithm

we add a cutting plane (*): $x_1 + 3x_2 \le 6$ to the LP relaxation to obtain the LP relaxation (P2)
indicated in Figure 6.4 (right). Note that it is easy to see in the figure that (*) is indeed a
cutting plane: it is not satisfied by $x^{(1)}$, and every integer point which is feasible for (P1)
satisfies (*). The optimal solution of the LP relaxation (P2) is $x^{(2)} = (0, 2)^T$. As $x^{(2)}$ is
integer, it must be an optimal solution to the IP.
6.6 A geometric interpretation of optimality
Given a linear program (P) and a feasible solution $\bar{x}$, we will give a geometric characterization
of when $\bar{x}$ is an optimal solution for (P). We will see that this geometric statement is equivalent
to the complementary slackness conditions.

Before we can state this result we will need a number of preliminary definitions. Let
$a_1, \ldots, a_k$ be a set of vectors in $\mathbb{R}^n$. We define the cone generated by $a_1, \ldots, a_k$ to be the set

\[
C = \left\{ \sum_{i=1}^{k} \lambda_i a_i : \lambda_i \ge 0 \text{ for all } i = 1, \ldots, k \right\},
\]

i.e., $C$ is the set of all points that can be obtained by multiplying each of $a_1, \ldots, a_k$ by a
non-negative number and adding all of the resulting vectors together. We denote $C$ by
$\operatorname{cone}\{a_1, \ldots, a_k\}$. Consider the figure below: we represent each of the vectors $a_1, a_2, a_3$
by an arrow from the origin, and the infinite region containing $a_3$, bounded by the half lines
from the origin determined by $a_1$ and $a_2$ respectively, is $\operatorname{cone}\{a_1, a_2, a_3\}$. Note, we define the
cone generated by the empty set to be the zero vector.

[Figure: vectors $a_1, a_2, a_3$ drawn from the origin; the cone they generate is the region between the half lines through $a_1$ and $a_2$.]
Let $P = \{x : Ax \le b\}$ be a polyhedron and let $\bar{x} \in P$. Let $J(\bar{x})$ be the row indices of $A$
corresponding to the tight constraints of $Ax \le b$ for $\bar{x}$. We define the cone of tight constraints
for $\bar{x}$ to be the cone $C$ generated by the rows of $A$ corresponding to the tight constraints, i.e.
$C = \operatorname{cone}\{\operatorname{row}_i(A)^T : i \in J(\bar{x})\}$. Consider for instance,

\[
\max\ (3, 1)\, x \quad \text{s.t.} \quad
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix} x \le
\begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix}.
\quad
\begin{matrix} (1) \\ (2) \\ (3) \end{matrix}
\tag{6.2}
\]

Then $\bar{x} = (2, 1)^T$ is a feasible solution to (6.2). The tight constraints are constraints (1) and
(2). Hence, the cone of tight constraints for $\bar{x}$ is,

\[
C := \operatorname{cone}\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}.
\]

The objective function for (6.2) is $z = (3, 1)\, x$. Note that $(3, 1)^T \in C$ as,

\[
\begin{pmatrix} 3 \\ 1 \end{pmatrix} = \bar{y}_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \bar{y}_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\tag{6.3}
\]

for $\bar{y}_1 = 2$ and $\bar{y}_2 = 1$. This can also be seen in the following figure, which represents the
feasible region of (6.2) and the cone $C$.

[Figure: the feasible region of (6.2), the tight constraints (1) and (2) at $\bar{x} = (2, 1)^T$, and the cone $C$ containing the objective vector $(3, 1)^T$.]

Figure 6.5: Geometric interpretation of optimality
We claim this implies that $\bar{x}$ is an optimal solution. The dual of (6.2) is given by,

\[
\min\ (2, 3, 2)\, y \quad \text{subject to} \quad
\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} y =
\begin{pmatrix} 3 \\ 1 \end{pmatrix}, \quad y \ge 0,
\tag{6.4}
\]

and the complementary slackness conditions state that,

(CS1) $y_1 = 0$ or $x_1 = 2$,

(CS2) $y_2 = 0$ or $x_1 + x_2 = 3$,

(CS3) $y_3 = 0$ or $x_2 = 2$.
We have $\bar{y}_1 = 2$ and $\bar{y}_2 = 1$; set $\bar{y}_3 = 0$. Then (6.3) implies that,

\[
\begin{pmatrix} 3 \\ 1 \end{pmatrix}
= \bar{y}_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}
+ \bar{y}_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \bar{y}_3 \begin{pmatrix} 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \end{pmatrix}.
\]

Hence, $\bar{y} = (\bar{y}_1, \bar{y}_2, \bar{y}_3)^T$ is a feasible solution to (6.4). Since (1) and (2) are tight constraints
for $\bar{x}$ in (6.2), conditions (CS1) and (CS2) hold. Since $\bar{y}_3 = 0$, condition (CS3) holds. Hence,
$\bar{x}$ is feasible for (6.2), $\bar{y}$ is feasible for its dual (6.4), and $\bar{x}, \bar{y}$ satisfy the complementary
slackness conditions. It follows from Theorem 27 that $\bar{x}$ is an optimal solution to (6.2). Thus
we proved that, as the vector of the objective function is in the cone generated by the tight
constraints for $\bar{x}$, the feasible solution $\bar{x}$ is an optimal solution. This argument generalizes as
the following theorem indicates.
Theorem 47. Let $\bar{x}$ be a feasible solution to

\[
\max\{c^T x : Ax \le b\}. \tag{P}
\]

Then $\bar{x}$ is an optimal solution to (P) if and only if $c$ is in the cone of tight constraints for $\bar{x}$.

Proof. The dual of (P) is given by

\[
\min\{b^T y : A^T y = c,\ y \ge 0\}. \tag{D}
\]

Let $m$ denote the number of rows of $A$. The complementary slackness conditions state that,

(CS) $y_i = 0$ or $\operatorname{row}_i(A)\, x = b_i$, for every $i \in \{1, \ldots, m\}$.

Suppose $c$ is in the cone of the tight constraints for $\bar{x}$. Let $J \subseteq \{1, \ldots, m\}$ be the row indices
of $A$ corresponding to the tight constraints of $Ax \le b$ for $\bar{x}$. Then $c \in \operatorname{cone}\{\operatorname{row}_i(A)^T : i \in J\}$, i.e.
there exist $\bar{y}_i \ge 0$ for all $i \in J$ such that $c = \sum_{i \in J} \bar{y}_i \operatorname{row}_i(A)^T$. For each $i \in \{1, \ldots, m\} \setminus J$
set $\bar{y}_i = 0$. Then,

\[
c = \sum_{i \in J} \bar{y}_i \operatorname{row}_i(A)^T = \sum_{i=1}^{m} \bar{y}_i \operatorname{row}_i(A)^T = A^T \bar{y},
\]

where $\bar{y} = (\bar{y}_1, \ldots, \bar{y}_m)^T$. As $\bar{y} \ge 0$, $\bar{y}$ is feasible for (D). Finally, by construction, if $\bar{y}_i > 0$ then
$i \in J$ and $\operatorname{row}_i(A)\, \bar{x} = b_i$. Thus (CS) holds. It follows from Theorem 27 that $\bar{x}$ is optimal for
(P).

Assume that $\bar{x}$ is optimal for (P). It follows from Theorem 27 that there exists a feasible
solution $\bar{y}$ for (D) such that (CS) holds for $\bar{x}$ and $\bar{y}$. Define $J := \{i : \bar{y}_i > 0\}$. Then

\[
c = A^T \bar{y} = \sum_{i=1}^{m} \bar{y}_i \operatorname{row}_i(A)^T = \sum_{i \in J} \bar{y}_i \operatorname{row}_i(A)^T.
\]

Hence, $c \in \operatorname{cone}\{\operatorname{row}_i(A)^T : i \in J\}$. For $i \in J$, $\bar{y}_i > 0$ and by (CS) $\operatorname{row}_i(A)\, \bar{x} = b_i$, i.e.
$\operatorname{row}_i(A)$ corresponds to a tight constraint of (P). Hence $c$ is in the cone of the tight constraints
for $\bar{x}$.
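The condition of Theorem 47 can also be tested computationally: asking whether $c$ lies in the cone of the tight rows is a non-negative least squares problem. A sketch of our own (assuming SciPy), confirming the optimality of $\bar{x} = (2, 1)^T$ in (6.2):

import numpy as np
from scipy.optimize import nnls

def c_in_cone_of_tight_rows(A, b, c, xbar, tol=1e-8):
    A, b = np.asarray(A, float), np.asarray(b, float)
    c, xbar = np.asarray(c, float), np.asarray(xbar, float)
    tight = np.isclose(A @ xbar, b, atol=tol)
    if not tight.any():
        return np.allclose(c, 0)       # cone of the empty set is {0}
    y, resid = nnls(A[tight].T, c)     # min ||(A=)^T y - c|| over y >= 0
    return resid <= tol                # zero residual <=> c is in the cone

A = [[1, 0], [1, 1], [0, 1]]           # the system of (6.2)
b = [2, 3, 2]
print(c_in_cone_of_tight_rows(A, b, [3, 1], [2, 1]))   # True: (2,1) optimal

The recovered multipliers are exactly the dual values $\bar{y}_1 = 2$, $\bar{y}_2 = 1$ used in the example above.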
The optimality condition described by this theorem generalizes to a special class of non-
linear optimization problems. The corresponding theorem is known as the Karush-Kuhn-Tucker
Theorem and is one of the main topics of the next chapter.
6.7 Further reading and notes
In Chapter 2, we hinted at another geometric interpretation of the Simplex algorithm, due
to Dantzig, which gave the method its name. Dantzig's interpretation looks at the column
geometry in the primal (in this chapter, we considered the cones of certain subsets of rows of
A) by considering objects equivalent to cones of subsets of columns of A. For further details,
see [7].

Some of our discussion in this chapter is related to a beautiful area of mathematics: convex
geometry. In this area, there are very elegant theorems which relate to the theory of linear
optimization, e.g., Carathéodory's Theorem and Helly's Theorem. For further details, see [4].
Another fundamental theorem is the Hyperplane Separation Theorem, which states that any
convex set and any point not in the given convex set can be separated by a hyperplane. This
result is strong enough to imply results like Farkas' Lemma and the Strong Duality Theorem.
For further details, see [14].
Chapter 7
Nonlinear optimization
In this chapter, we define nonlinear programs (NLP). We will show that solving general NLPs
is likely to be difficult, even when the problem has small size. One reason is that the feasible
region (the set of all feasible solutions) of an NLP is not always convex. We therefore turn
our attention to the special case where the feasible region is convex. We discuss optimality
conditions in this chapter.

Key concepts covered in this chapter include: convex functions, epigraphs, subgradients,
Lagrangians, and the Karush-Kuhn-Tucker Theorem.
7.1 What is a nonlinear program?
A NonLinear Program (NLP) is an optimization problem of the following form:

\[
\begin{array}{rl}
\min & z = f(x) \\
\text{s.t.} & g_1(x) \le 0, \\
& g_2(x) \le 0, \\
& \quad \vdots \\
& g_m(x) \le 0,
\end{array}
\tag{7.1}
\]

where $f : \mathbb{R}^n \to \mathbb{R}$, and $g_i : \mathbb{R}^n \to \mathbb{R}$, for every $i \in \{1, 2, \ldots, m\}$.
Example 11. Suppose $n = 2$ and $m = 4$ in (7.1) and that for $x = (x_1, x_2)^T \in \mathbb{R}^2$ we have
$f(x) := x_2$, and

\[
g_1(x) := -x_1^2 - x_2 + 2, \quad
g_2(x) := x_2 - \frac{3}{2}, \quad
g_3(x) := x_1 - \frac{3}{2}, \quad
g_4(x) := -x_1 - 2.
\]

The feasible region is a subset of $\mathbb{R}^2$. It corresponds to the union of the two shaded regions
in Figure 7.1. For instance, $g_1(x) = -x_1^2 - x_2 + 2 \le 0$, or equivalently, $x_2 \ge 2 - x_1^2$. Thus the
solution to the first constraint of the NLP is the set of all points above the curve indicated by
$g_1$ in the figure. As we are trying to find, among all points $x = (x_1, x_2)^T$ in the feasible region,
the one that minimizes $f(x) = x_2$, the unique optimal solution will be the point $a = (-2, -2)^T$.

Observe that the feasible region is not convex; indeed it is not even connected (i.e. there
does not exist a continuous curve contained in the feasible region joining any two points of
the feasible region).
[Figure: the two disconnected shaded regions bounded by the parabola $g_1$ and the lines $g_2$, $g_3$, $g_4$, with the global optimum $a$ and a locally optimal point $b$ marked.]

Figure 7.1: Feasible region for Example 11
Example 12. Suppose $n = 2$ and $m = 3$ in (7.1) and that for $x = (x_1, x_2)^T \in \mathbb{R}^2$ we have
$f(x) := -x_1 - x_2$, and

\[
g_1(x) := -x_1 + x_2^2, \quad
g_2(x) := -x_2 + x_1^2, \quad
g_3(x) := -x_1 + \frac{1}{2}.
\]
The feasible region is a subset of $\mathbb{R}^2$. It corresponds to the shaded region in Figure 7.2.
For instance, $g_2(x) = -x_2 + x_1^2 \le 0$, or equivalently, $x_2 \ge x_1^2$. Thus the solution to the second
constraint of the NLP is the set of all points above the curve indicated by $g_2$ in the figure.
For $g_1$ we interchange the roles of $x_1$ and $x_2$ in $g_2$. We will prove that the feasible solution
$a = (1, 1)^T$ is an optimal solution to the NLP. Observe that the feasible region is convex in
this case.
[Figure: the convex shaded region between the parabolas $g_1$ and $g_2$, to the right of the line $g_3$, with the point $a = (1, 1)^T$ marked.]

Figure 7.2: Feasible region for Example 12
Observe that if in (7.1) every function $f$ and $g_i$ is affine (that is, a function of the form
$a^T x + \beta$ for a given vector $a$ and a given constant $\beta$), then we have $f(x) = c^T x + c_0$ and
$g_i(x) = a_i^T x - b_i$ for every $i \in \{1, 2, \ldots, m\}$, and we see that our nonlinear optimization problem
(7.1) becomes a linear optimization problem. Thus, NLPs generalize linear programs, but as
we will show in the next section they are much harder to solve than linear programs.
7.2 Nonlinear programs are hard
7.2.1 NP-hardness
Nonlinear optimization programs can be very hard in general. For instance, suppose that for
every $j \in \{1, \ldots, n\}$ we have the constraints,

\[
x_j^2 - x_j \le 0, \qquad -x_j^2 + x_j \le 0.
\]

Then the constraints define the same feasible region as the quadratic equations $x_j^2 = x_j$, for
every $j \in \{1, 2, \ldots, n\}$. Therefore, the feasible region of these constraints is exactly the 0,1
vectors in $\mathbb{R}^n$. Now, if we also add the constraints $Ax \le b$, we deduce that every 0,1 integer
programming problem can be formulated as an NLP. In other words, 0,1 integer program-
ming is reducible to nonlinear programming. Moreover, this reduction is clearly polynomial.
Therefore, solving an arbitrary instance of an NLP is at least as hard as solving an arbitrary
instance of a 0,1 integer programming problem. In particular, as 0,1 feasibility is NP-hard
(see Section 3.3.3), so is nonlinear programming feasibility.
7.2.2 Hard small dimensional instances
We might be tempted to think that if the number of variables in the NLP is very small, perhaps
then solving the NLP would be easy. However, this is not the case, as we will illustrate.

In 1637, Pierre de Fermat conjectured the following result.

Theorem 48. There do not exist integers $x, y, z \ge 1$ and an integer $n \ge 3$ such that $x^n + y^n = z^n$.

Fermat wrote his conjecture in the margin of a journal, and claimed to have a proof of this
result, but that it was too large to fit in the margin. This conjecture became known as Fermat's
Last Theorem. The first accepted proof of this result was published in 1995, some 358 years
after the original problem was proposed. We will show that a very simple looking
NLP with only four variables has the following key property: the optimal objective value of
zero is attained if and only if Fermat's Last Theorem is false. Hence, solving this particular
NLP is at least as hard as proving Fermat's Last Theorem! In this NLP, see (7.1), we have 4
variables ($n = 4$) and 4 constraints ($m = 4$). For $x = (x_1, x_2, x_3, x_4)^T \in \mathbb{R}^4$,

\[
f(x) := \left( x_1^{x_4} + x_2^{x_4} - x_3^{x_4} \right)^2
+ \sin^2(\pi x_1) + \sin^2(\pi x_2) + \sin^2(\pi x_3) + \sin^2(\pi x_4),
\]

and

\[
g_1(x) := 1 - x_1, \quad g_2(x) := 1 - x_2, \quad g_3(x) := 1 - x_3, \quad g_4(x) := 3 - x_4.
\]

First observe that the feasible region of this NLP is given by,

\[
S := \{x \in \mathbb{R}^4 : x_1 \ge 1,\ x_2 \ge 1,\ x_3 \ge 1,\ x_4 \ge 3\}.
\]

Note that $f(x)$ is a sum of squares. Therefore, $f(x) \ge 0$ for every $x \in \mathbb{R}^4$, and it is equal to zero
if and only if every term in the sum is zero, i.e.,

\[
x_1^{x_4} + x_2^{x_4} = x_3^{x_4} \quad \text{and} \quad
\sin(\pi x_1) = \sin(\pi x_2) = \sin(\pi x_3) = \sin(\pi x_4) = 0.
\]
The latter string of equations is equivalent to $x_j$ being integer for every $j \in \{1, 2, 3, 4\}$. More-
over, the feasibility conditions require $x_1 \ge 1$, $x_2 \ge 1$, $x_3 \ge 1$, $x_4 \ge 3$. Therefore, $f(\bar{x}) = 0$ for
some $\bar{x} \in S$ if and only if $\bar{x}_j$ is a positive integer for every $j$ with $\bar{x}_4 \ge 3$, and $\bar{x}_1^{\bar{x}_4} + \bar{x}_2^{\bar{x}_4} = \bar{x}_3^{\bar{x}_4}$.
That is, if and only if Fermat's Last Theorem is false. Surprisingly, it is not difficult to prove
(and it is well known) that the infimum of $f$ over $S$ is zero. Thus, the difficulty here lies
entirely in knowing whether the infimum can be attained.
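To stress the point: evaluating $f$ is trivial, while deciding whether its infimum over $S$ is attained encodes Fermat's Last Theorem. A direct transcription (our own):

import math

def f(x1, x2, x3, x4):
    return ((x1 ** x4 + x2 ** x4 - x3 ** x4) ** 2
            + sum(math.sin(math.pi * t) ** 2 for t in (x1, x2, x3, x4)))

print(f(3, 4, 5, 2))   # essentially 0, but (3,4,5,2) is infeasible: x4 < 3
print(f(1, 1, 1, 3))   # essentially 1: the square term is (1 + 1 - 1)^2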
We just argued that some nonlinear optimization problems can be very hard even if the
number of variables is very small (e.g., at most 10) or even if the nonlinearity is bounded
(e.g., at most quadratic functions). However, carefully isolating the special nice structures
in some classes of nonlinear programming problems and exploiting these structures allows us
to solve many large-scale nonlinear programs in practice.
7.3 Convexity
Consider an NLP of the form given in (7.1) and denote by $S$ the feasible region. We say that
$\bar{x} \in S$ is locally optimal if for some positive $d \in \mathbb{R}$ we have that $f(\bar{x}) \le f(x)$ for every $x \in S$
where $\|x - \bar{x}\| \le d$, i.e. no feasible solution of the NLP that is within distance $d$ of $\bar{x}$ has better
value than $\bar{x}$. We sometimes call an optimal solution to the NLP a globally optimal solution.

It is easy to verify that if $S$ is convex, then locally optimal solutions are globally optimal.
However, when $S$ is not convex, we can have locally optimal solutions that are not globally
optimal. This is illustrated in Example 11. There, $b$ is locally optimal, yet $a \ne b$ is the only
globally optimal solution.
A natural scheme for solving an optimization problem is as follows: find a feasible solu-
tion, and then repeatedly, either (i) show that the current feasible solution is globally optimal,
using some optimality criteria, or (ii) try to find a better feasible solution (here better might
mean one with better value, for instance, though this may not always be possible). The simplex
algorithm for linear programming follows this scheme. Both steps (i) and (ii) may become
difficult when the feasible region is not convex. We will therefore turn our attention to the
case where the feasible region is convex. In this section, we establish sufficient conditions for
the feasible region of an NLP to be convex (see Remark 51).
7.3.1 Convex functions and epigraphs
We say that the function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if for every pair of points $x^{(1)}, x^{(2)} \in \mathbb{R}^n$ and for
every $\lambda \in [0, 1]$,
$$f\left(\lambda x^{(1)} + (1 - \lambda) x^{(2)}\right) \le \lambda f\left(x^{(1)}\right) + (1 - \lambda) f\left(x^{(2)}\right).$$
In other words, $f$ is convex if for any two points $x^{(1)}, x^{(2)}$, the unique linear function on the
line segment between $x^{(1)}$ and $x^{(2)}$ that agrees with $f$ at $x^{(1)}$ and $x^{(2)}$ dominates
the function $f$ on that segment. An example of a convex function is given in Figure 7.3. An example of a
non-convex function is given in Figure 7.4.
Figure 7.3: Convex function

Figure 7.4: Non-convex function

Example 13. Consider the function $f : \mathbb{R} \to \mathbb{R}$ where $f(x) = x^2$. Let $a, b \in \mathbb{R}$ be arbitrary, and
consider an arbitrary $\lambda \in [0, 1]$. To prove that $f$ is convex we need to verify¹ that
$$[\lambda a + (1 - \lambda) b]^2 \overset{?}{\le} \lambda a^2 + (1 - \lambda) b^2.$$
Clearly, we may assume that $\lambda \notin \{0, 1\}$, i.e. that $0 < \lambda < 1$. After expanding and simplifying
the terms, it suffices to verify that
$$\lambda(1 - \lambda)\, 2ab \overset{?}{\le} \lambda(1 - \lambda)(a^2 + b^2),$$
or equivalently, as $\lambda(1 - \lambda) > 0$, that $a^2 + b^2 - 2ab \ge 0$, which is clearly the case as $a^2 + b^2 - 2ab = (a - b)^2$, and the square of any number is non-negative.
The concepts of convex functions and convex sets are closely related through the notion of the
epigraph of a function. Given $f : \mathbb{R}^n \to \mathbb{R}$, define the epigraph of $f$ as
$$\mathrm{epi}(f) := \left\{ (\mu, x) \in \mathbb{R} \times \mathbb{R}^n : f(x) \le \mu \right\}.$$
In both Figures 7.5 and 7.6 we represent a function $f : \mathbb{R} \to \mathbb{R}$ and its corresponding epigraph
(represented as the shaded region going to infinity in the up direction). The following result
relates convex functions and convex sets.

¹ Applying the definition is not the only way to prove that a function is convex. In this particular case, for
instance, we can compute its second derivative and observe that it is non-negative.
Proposition 49. Let $f : \mathbb{R}^n \to \mathbb{R}$. Then $f$ is convex if and only if $\mathrm{epi}(f)$ is a convex set.

Observe in the previous proposition that the epigraph lives in an $(n+1)$-dimensional space.
The function in Figure 7.5 is convex as its epigraph is convex. However, the function in
Figure 7.6 is not convex as its epigraph is not convex.
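A small sketch of the forward direction of Proposition 49 for $f(x) = x^2$ (our own illustration; the sampling ranges are arbitrary): convex combinations of sampled epigraph points should remain in the epigraph.

```python
import random

def in_epi(mu, x):
    # (mu, x) lies in epi(f) for f(x) = x^2 iff f(x) <= mu.
    return x ** 2 <= mu + 1e-9

for _ in range(10_000):
    x1, x2 = random.uniform(-10, 10), random.uniform(-10, 10)
    mu1 = x1 ** 2 + random.uniform(0, 5)  # any value at or above f(x1)
    mu2 = x2 ** 2 + random.uniform(0, 5)
    lam = random.random()
    assert in_epi(lam * mu1 + (1 - lam) * mu2, lam * x1 + (1 - lam) * x2)
print("epi(f) passed the sampled convexity test")
```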
Figure 7.5: Convex epigraph

Figure 7.6: Non-convex epigraph
Proof of Proposition 49. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is convex. Let $(\mu_1, u), (\mu_2, v) \in \mathrm{epi}(f)$ and
$\lambda \in [0, 1]$. We have
$$f(\lambda u + (1 - \lambda) v) \le \lambda f(u) + (1 - \lambda) f(v) \le \lambda \mu_1 + (1 - \lambda) \mu_2,$$
which implies $(\lambda \mu_1 + (1 - \lambda) \mu_2,\ \lambda u + (1 - \lambda) v) \in \mathrm{epi}(f)$. Note that in the above, the first inequality
uses the convexity of $f$ and the second inequality uses the facts $\lambda \ge 0$, $(1 - \lambda) \ge 0$ and
$(\mu_1, u), (\mu_2, v) \in \mathrm{epi}(f)$.
Now, suppose that $\mathrm{epi}(f)$ is convex. Let $u, v \in \mathbb{R}^n$ and $\lambda \in [0, 1]$. Then $(f(u), u), (f(v), v) \in \mathrm{epi}(f)$. Hence,
$$\lambda \, (f(u), u) + (1 - \lambda) \, (f(v), v) \in \mathrm{epi}(f).$$
This implies (by the definition of $\mathrm{epi}(f)$) that $f(\lambda u + (1 - \lambda) v) \le \lambda f(u) + (1 - \lambda) f(v)$. Therefore, $f$ is convex.
7.3.2 Level sets and feasible region
Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $\beta \in \mathbb{R}$. We call the set
$$\{ x \in \mathbb{R}^n : g(x) \le \beta \}$$
a level set of the function $g$.
Remark 50. Every level set of a convex function is a convex set.
In Figure 7.7 we represent a convex function with a convex level set. In Figure 7.8 we represent
a non-convex function with a non-convex level set. We leave it as an exercise to show,
however, that it is possible to have a non-convex function where every level set is convex.
Figure 7.7: Convex level set

Figure 7.8: Non-convex level set
Proof of Remark 50. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $\beta \in \mathbb{R}$. We need to show
that $S := \{x \in \mathbb{R}^n : g(x) \le \beta\}$ is convex. Let $x^{(1)}, x^{(2)} \in S$ and let $\lambda \in [0, 1]$. Then
$$g\left(\lambda x^{(1)} + (1 - \lambda) x^{(2)}\right) \le \lambda\, g\left(x^{(1)}\right) + (1 - \lambda)\, g\left(x^{(2)}\right) \le \lambda \beta + (1 - \lambda) \beta = \beta,$$
where the first inequality follows from the fact that $g$ is a convex function, and the second uses
$\lambda \ge 0$, $1 - \lambda \ge 0$, $g(x^{(1)}) \le \beta$ and $g(x^{(2)}) \le \beta$. It follows that
$\lambda x^{(1)} + (1 - \lambda) x^{(2)} \in S$. Hence, $S$ is convex as required.
Consider the NLP defined in (7.1). We say that it is a convex NLP if $g_1, \ldots, g_m$ and $f$ are
all convex functions. It follows in that case from Remark 50 that for every $i \in \{1, \ldots, m\}$ the
level set $\{x \in \mathbb{R}^n : g_i(x) \le 0\}$ is a convex set. Since the intersection of convex sets is a convex set
(see Remark 41), we deduce that the feasible region
$$\{x \in \mathbb{R}^n : g_1(x) \le 0,\ g_2(x) \le 0,\ \ldots,\ g_m(x) \le 0\} \qquad (7.2)$$
is convex as well. Hence,

Remark 51. The feasible region of a convex NLP is a convex set.
When $g_1, \ldots, g_m$ are all affine functions, the feasible region (7.2) is a polyhedron. Moreover,
in that case the functions are clearly convex. Hence, the previous result implies in
particular that every polyhedron is a convex set, which was the statement of Proposition 42.
7.4 Relaxing convex NLPs
Consider an NLP of the form (7.1) and let $\bar{x}$ be a feasible solution. We say that constraint
$g_j(x) \le 0$ is tight for $\bar{x}$ if $g_j(\bar{x}) = 0$ (see also Section 6.3). We will show that under the right
circumstances, we can replace a tight constraint in a convex NLP by a linear constraint such
that the resulting NLP is a relaxation of the original NLP (see Corollary 53). This will allow
us in Section 7.5 to use our optimality conditions for linear programs to derive a sufficient
condition for a feasible solution to be an optimal solution to a convex NLP. The key concepts
that will be needed in this section are those of subgradients and supporting halfspaces.
7.4.1 Subgradients
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $\bar{x} \in \mathbb{R}^n$. We say that $s \in \mathbb{R}^n$ is a subgradient of $f$
at $\bar{x}$ if for every $x \in \mathbb{R}^n$ the following inequality holds,
$$f(\bar{x}) + s^T (x - \bar{x}) \le f(x).$$
Denote by $h(x)$ the function $f(\bar{x}) + s^T (x - \bar{x})$. Observe that $h(x)$ is an affine function ($\bar{x}$ is a
constant). Moreover, we have that $h(\bar{x}) = f(\bar{x})$ and $h(x) \le f(x)$ for every $x \in \mathbb{R}^n$. Hence, the
function $h(x)$ is an affine function that provides a lower bound on $f(x)$ and approximates $f(x)$
around $\bar{x}$. See Figure 7.9.
Figure 7.9: Subgradient
Example 14. Consider $g : \mathbb{R}^2 \to \mathbb{R}$ where for every $x = (x_1, x_2)^T$ we have $g(x) = x_2^2 - x_1$. It
can be readily checked (see Example 13) that $g$ is convex. We claim that $s := (-1, 2)^T$ is a
subgradient of $g$ at $\bar{x} = (1, 1)^T$. We have $h(x) := g(\bar{x}) + s^T(x - \bar{x}) = 0 + (-1, 2)\left(x - (1, 1)^T\right) = -x_1 + 2x_2 - 1$. We need to verify that $h(x) \le g(x)$ for every $x \in \mathbb{R}^2$, i.e. that
$$-x_1 + 2x_2 - 1 \overset{?}{\le} x_2^2 - x_1,$$
or equivalently that $x_2^2 - 2x_2 + 1 = (x_2 - 1)^2 \ge 0$, which clearly holds as the square of any
number is non-negative.
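The claim can also be corroborated numerically (our own Python sketch; the sampling range is arbitrary): $h$ should agree with $g$ at $\bar{x}$ and never exceed $g$.

```python
import random

def g(x1, x2):
    return x2 ** 2 - x1

def h(x1, x2):
    # h(x) = g(xbar) + s^T (x - xbar) with xbar = (1, 1) and s = (-1, 2).
    return 0 + (-1) * (x1 - 1) + 2 * (x2 - 1)

assert h(1, 1) == g(1, 1)          # h touches g at xbar
for _ in range(10_000):            # and never exceeds g
    x1, x2 = random.uniform(-50, 50), random.uniform(-50, 50)
    assert h(x1, x2) <= g(x1, x2) + 1e-9
print("h underestimates g at every sampled point")
```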
7.4.2 Supporting halfspaces
Recall the definitions of hyperplanes and halfspaces from Chapter 6.1. Consider a convex set
$C \subseteq \mathbb{R}^n$ and let $\bar{x} \in C$. We say that the halfspace $F := \{x \in \mathbb{R}^n : s^T x \le \beta\}$ (where $s \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$)
is a supporting halfspace of $C$ at $\bar{x}$ if the following conditions hold:
1. $C \subseteq F$;
2. $s^T \bar{x} = \beta$, i.e. $\bar{x}$ is on the hyperplane that defines the boundary of $F$.
In Figure 7.10 we represent a convex set $C \subseteq \mathbb{R}^2$. For the point $x^{(1)} \in C$ there is a unique
supporting halfspace. For the point $x^{(2)} \in C$ there are infinitely many different supporting
halfspaces; we represent two of these.
Figure 7.10: Supporting halfspace
The following remark relates subgradients and supporting halfspaces.
Remark 52. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a convex function, let $\bar{x} \in \mathbb{R}^n$ be such that $g(\bar{x}) = 0$, and let $s \in \mathbb{R}^n$
be a subgradient of $g$ at $\bar{x}$. Denote by $C$ the level set $\{x \in \mathbb{R}^n : g(x) \le 0\}$ and by $F$ the halfspace
$\{x \in \mathbb{R}^n : g(\bar{x}) + s^T(x - \bar{x}) \le 0\}$.² Then $F$ is a supporting halfspace of $C$ at $\bar{x}$.
Proof. We need to verify conditions (1) and (2) of supporting halfspaces. (1) Let $x' \in C$. Then
$g(x') \le 0$. Since $s$ is a subgradient of $g$ at $\bar{x}$, we have $g(\bar{x}) + s^T(x' - \bar{x}) \le g(x')$. It follows that $g(\bar{x}) + s^T(x' - \bar{x}) \le 0$, i.e. $x' \in F$. Thus $C \subseteq F$. (2) The boundary hyperplane of $F$ is given by $s^T x = s^T \bar{x} - g(\bar{x})$, and $s^T \bar{x} = s^T \bar{x} - g(\bar{x})$ as $g(\bar{x}) = 0$.
² $F$ is clearly a halfspace, since we can rewrite it as $\{x : s^T x \le s^T \bar{x} - g(\bar{x})\}$ and $s^T \bar{x} - g(\bar{x})$ is a constant.
This last remark is illustrated in Figure 7.11. We consider the function $g : \mathbb{R}^2 \to \mathbb{R}$, where
$g(x) = x_2^2 - x_1$, and the point $\bar{x} = (1, 1)^T$. We saw in Example 14 that the vector $s = (-1, 2)^T$
is a subgradient of $g$ at $\bar{x}$. Then $F = \{x \in \mathbb{R}^2 : -x_1 + 2x_2 - 1 \le 0\}$. We see in the figure that
$F$ is a supporting halfspace of $C$ at $\bar{x}$, as predicted.
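A numerical illustration of Remark 52 with this data (our own sketch; it samples points rather than proving the containment $C \subseteq F$):

```python
import random

def in_C(x1, x2):                  # level set C = {x : g(x) <= 0}
    return x2 ** 2 - x1 <= 0

def in_F(x1, x2):                  # halfspace F = {x : -x1 + 2*x2 - 1 <= 0}
    return -x1 + 2 * x2 - 1 <= 1e-9

assert -1 + 2 * 1 - 1 == 0         # xbar = (1, 1) lies on the boundary of F
for _ in range(100_000):           # every sampled point of C should lie in F
    x1, x2 = random.uniform(0, 100), random.uniform(-11, 11)
    if in_C(x1, x2):
        assert in_F(x1, x2)
print("no sampled point of C fell outside F")
```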
We deduce the following useful tool from the previous remark.

Corollary 53. Consider an NLP of the form given in (7.1). Let $\bar{x}$ be a feasible solution and
suppose that constraint $g_i(x) \le 0$ is tight for some $i \in \{1, \ldots, m\}$. Suppose $g_i$ is a convex
function that has a subgradient $s$ at $\bar{x}$. Then the NLP obtained by replacing the constraint $g_i(x) \le 0$
by the linear constraint $s^T x \le s^T \bar{x} - g_i(\bar{x})$ is a relaxation of the original NLP.
Figure 7.11: Subgradient and supporting halfspace
7.5 Optimality conditions for the differentiable case
In this section we consider convex NLPs of the form (7.1) that satisfy the additional condition
that all of the functions $f$ and $g_1, \ldots, g_m$ are differentiable. In that setting we can characterize
(see Theorem 56) when a feasible solution $\bar{x}$ is an optimal solution (assuming the existence of
a Slater point).
7.5.1 Sufficient conditions for optimality
We claim that it is sufficient to consider NLPs where the objective function is linear, i.e. of
the form
$$\min\ z = c^T x \quad \text{s.t.} \quad g_1(x) \le 0,\ g_2(x) \le 0,\ \ldots,\ g_m(x) \le 0. \qquad (7.3)$$
This is because problem (7.1) is reducible to problem (7.3). To prove this fact, we can proceed
as follows: given an NLP of the form (7.1), introduce a new variable $x_{n+1}$ and add the constraint
$f(x) \le x_{n+1}$ to obtain the NLP
$$\max\ z = -x_{n+1} \quad \text{s.t.} \quad f(x) - x_{n+1} \le 0,\ g_1(x) \le 0,\ g_2(x) \le 0,\ \ldots,\ g_m(x) \le 0.$$
This NLP is of the form (7.3) (maximizing $-x_{n+1}$ is just minimizing $z = x_{n+1}$), and minimizing
$x_{n+1}$ is equivalent to minimizing $f(x)$.
Let $\bar{x}$ be a feasible solution to the NLP (7.3) and assume that it is a convex NLP. Let
us derive sufficient conditions for $\bar{x}$ to be an optimal solution to (7.3). Define $J(\bar{x}) := \{i : g_i(\bar{x}) = 0\}$. That is, $J(\bar{x})$ is the index set of all constraints that are tight at $\bar{x}$. Suppose that
for every $i \in J(\bar{x})$ we have a subgradient $s_i$ of the function $g_i$ at the point $\bar{x}$. Then we construct
a linear programming relaxation of the NLP as follows: first we omit every constraint that is
not tight at $\bar{x}$, and for every $i \in J(\bar{x})$ we replace (see Corollary 53) the constraint $g_i(x) \le 0$ by
the linear constraint $s_i^T x \le s_i^T \bar{x} - g_i(\bar{x})$. Since the objective function is given by $\min c^T x$ we
can rewrite it as $\max -c^T x$. The resulting linear program is thus
$$\max\ z = -c^T x \quad \text{s.t.} \quad s_i^T x \le s_i^T \bar{x} - g_i(\bar{x}) \ \text{ for all } i \in J(\bar{x}). \qquad (7.4)$$
Theorem 47 says that $\bar{x}$ is an optimal solution to (7.4) when $-c$ is in the cone of the tight
constraints, i.e. if $-c \in \mathrm{cone}\{s_i : i \in J(\bar{x})\}$; and since (7.4) is a relaxation of (7.3), $\bar{x}$ is then
an optimal solution to (7.3) as well. Hence, we proved the following.
Proposition 54. Consider the NLP (7.3) and assume that $g_1, \ldots, g_m$ are convex functions. Let
$\bar{x}$ be a feasible solution and suppose that for all $i \in J(\bar{x})$ we have a subgradient $s_i$ at $\bar{x}$. If
$-c \in \mathrm{cone}\{s_i : i \in J(\bar{x})\}$ then $\bar{x}$ is an optimal solution.
Thus we have sufficient conditions for optimality. Theorem 56, which we will state in the next
section, essentially says that when the subgradients are unique, then these conditions are also
necessary. We illustrate the previous proposition on an example.
Figure 7.12: Linear programming relaxation of NLP
Example 12 continued. Let us use the previous proposition to show that the point $\bar{x} = (1, 1)^T$
is an optimal solution to the NLP given in Example 12. In this case $J(\bar{x}) = \{1, 2\}$, and the
feasible region of the linear programming relaxation (7.4) corresponds to the shaded region
in Figure 7.12. In Example 14 we showed that the subgradient of $g_1$ at $\bar{x}$ is $(-1, 2)^T$. Similarly,
we can show that the subgradient of $g_2$ at $\bar{x}$ is $(2, -1)^T$. In this example we have $c = (-1, -1)^T$,
thus Proposition 54 asks us to verify that
$$-c = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \overset{?}{\in} \mathrm{cone}\left\{ \begin{pmatrix} -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \end{pmatrix} \right\},$$
which is the case since
$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 1 \begin{pmatrix} -1 \\ 2 \end{pmatrix} + 1 \begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$
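The cone-membership test above amounts to solving a small linear system and checking nonnegativity of the multipliers. A sketch in Python with numpy (our own illustration, using the data of the running example):

```python
import numpy as np

s1 = np.array([-1.0, 2.0])    # subgradient of g1 at xbar (Example 14)
s2 = np.array([2.0, -1.0])    # subgradient of g2 at xbar
c = np.array([-1.0, -1.0])    # objective vector of min c^T x

# Solve lam1*s1 + lam2*s2 = -c and check the multipliers are nonnegative.
lam = np.linalg.solve(np.column_stack([s1, s2]), -c)
print(lam)                    # [1. 1.]
assert np.all(lam >= 0)       # -c lies in cone{s1, s2}, so xbar is optimal
```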
7.5.2 Differentiability and gradients
Let $f : \mathbb{R}^n \to \mathbb{R}$ and $\bar{x} \in \mathbb{R}^n$ be given. If there exists $s \in \mathbb{R}^n$ such that
$$\lim_{h \to 0} \frac{f(\bar{x} + h) - f(\bar{x}) - s^T h}{\|h\|_2} = 0,$$
we say that $f$ is differentiable at $\bar{x}$ and call the vector $s$ the gradient of $f$ at $\bar{x}$. We denote $s$ by
$\nabla f(\bar{x})$. We will use the following without proof.

Proposition 55. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $\bar{x} \in \mathbb{R}^n$. If the gradient $\nabla f(\bar{x})$
exists then it is a subgradient of $f$ at $\bar{x}$.
Note that in the above definition of the gradient, $h$ varies over all vectors in $\mathbb{R}^n$. Under some
slightly more favorable conditions, we can obtain the gradient $\nabla f(\bar{x})$ via partial derivatives of
$f$ at $\bar{x}$. For example, suppose that for every $j \in \{1, 2, \ldots, n\}$, the partial derivatives $\frac{\partial f}{\partial x_j}$ exist
and are continuous at every $x \in \mathbb{R}^n$. Then
$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)^T,$$
and the gradient of $f$ at $\bar{x}$ is given by $\nabla f(\bar{x})$.
Example 14 continued. Consider $g : \mathbb{R}^2 \to \mathbb{R}$ where for every $x = (x_1, x_2)^T$ we have $g(x) = x_2^2 - x_1$. In Example 14 we gave a proof that $(-1, 2)^T$ is a subgradient of $g$ at $\bar{x} = (1, 1)^T$. We
give an alternate proof based on the previous discussion. The partial derivatives of $g$ exist
and $\nabla g(x) = (-1, 2x_2)^T$. Evaluating at $\bar{x}$ we deduce that $(-1, 2)^T$ is the gradient of $g$ at $\bar{x}$.
Since $g$ is convex, Proposition 55 implies that $(-1, 2)^T$ is also a subgradient of $g$ at $\bar{x}$.
7.5.3 A Karush-Kuhn-Tucker Theorem
To state the optimality theorem we need a notion of a strictly feasible point for the NLP. More
rigorously, we say that the NLP has a Slater point if there exists $x'$ such that $g_i(x') < 0$ for
every $i \in \{1, \ldots, m\}$, i.e. every inequality is satisfied strictly by $x'$. For instance, in Example 12
the point $(\frac{3}{4}, \frac{3}{4})^T$ is a Slater point.

We can now state our optimality theorem.
Theorem 56 (Karush-Kuhn-Tucker Theorem based on the gradients). Consider a convex NLP
of the form (7.1) that has a Slater point. Let $\bar{x} \in \mathbb{R}^n$ be a feasible solution and assume that
$f, g_1, g_2, \ldots, g_m$ are differentiable at $\bar{x}$. Then $\bar{x}$ is an optimal solution of the NLP if and only if
$$-\nabla f(\bar{x}) \in \mathrm{cone}\left\{ \nabla g_i(\bar{x}) : i \in J(\bar{x}) \right\}. \qquad (\star)$$
We illustrate the previous theorem in Figure 7.13. In that example, the tight constraints for $\bar{x}$
are $g_1(x) \le 0$ and $g_2(x) \le 0$. We indicate the cone formed by $\nabla g_1(\bar{x})$ and $\nabla g_2(\bar{x})$ (translated to
have $\bar{x}$ as its origin). In this example, $-\nabla f(\bar{x})$ is in that cone, hence the feasible solution $\bar{x}$ is
in fact optimal.
Figure 7.13: Karush-Kuhn-Tucker Theorem based on gradients
Suppose that in Theorem 56 the function $f(x)$ is the linear function $c^T x$, i.e. the NLP is of
the form (7.3). Then $\nabla f(\bar{x}) = c$ and the sufficiency part of Theorem 56 follows immediately from
Proposition 54, i.e. we have shown that if condition $(\star)$ holds then $\bar{x}$ is indeed an optimal
solution. The essence of the theorem is to prove the reverse direction, i.e. that $\bar{x}$ will only be
optimal when $(\star)$ holds. This is where the existence of a Slater point comes into play. Observe
that when $f$ and $g_1, \ldots, g_m$ are all affine functions, Theorem 56 becomes the optimality
theorem for linear programs (Theorem 47).
Theorem 47 was a restatement of the Complementary Slackness Theorem 27. Similarly,
we leave it as an exercise to check that condition $(\star)$ can be restated as follows: there exists
$y \in \mathbb{R}^m_+$ such that the following conditions hold:
$$\nabla f(\bar{x}) + \sum_{i=1}^m y_i \nabla g_i(\bar{x}) = 0 \qquad \text{(Dual feasibility, together with } y \in \mathbb{R}^m_+\text{)}$$
$$y_i \, g_i(\bar{x}) = 0, \quad i \in \{1, 2, \ldots, m\} \qquad \text{(Complementary Slackness)}$$
7.6 Optimality conditions for Lagrangians
Recall the motivation for the derivation of the LP dual: the desire to derive the best
bounds on the optimal objective value. We can try to apply a similar idea to the NLP by taking
nonnegative linear combinations of the constraints. Suppose we want to prove that for a given
real number $z$, there does not exist a feasible solution $x$ of the NLP with objective value less than $z$.
Then it suffices to find nonnegative coefficients $y_i$ so that every feasible solution $x$ of the NLP
satisfies
$$f(x) + \sum_{i=1}^m y_i g_i(x) \ge z.$$
That is, the function on the left-hand side is bounded below by $z$. To get the strongest result, we
would choose $y$, among all nonnegative vectors, so as to make the resulting bound as large as
possible. Next, we can take $z$ to be the optimal value of the NLP (if it exists). When we do that, to
make the LHS equal to the optimal value of the NLP at an optimal solution $\bar{x}$, it is clear that we
must set $y_i = 0$ whenever $g_i(\bar{x}) < 0$ (i.e., Complementary Slackness holds).
Let us define the Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ for the NLP as
$$L(x, y) := f(x) + \sum_{i=1}^m y_i g_i(x).$$
Note that the Lagrangian encodes all the information about the problem NLP:

• Setting all $y$ variables to zero in $L(x, y)$ we obtain $f(x)$. That is, $L(x, 0) = f(x)$ for all $x$.

• Setting $y$ to unit vectors and using the above, we can obtain all the constraint functions. That is, for every $i \in \{1, 2, \ldots, m\}$, $L(x, e_i) - L(x, 0) = g_i(x)$.

• Perturbing $y$ along a unit vector, we have the same result for every $y$: $L(x, y + e_i) - L(x, y) = g_i(x)$.
Theorem 57 (Karush-Kuhn-Tucker Theorem based on the Lagrangian). Consider a convex
NLP of the form (7.1) that has a Slater point. Then a feasible solution $\bar{x}$ is an optimal solution
if and only if there exists $\bar{y} \in \mathbb{R}^m_+$ such that the following conditions hold:
$$L(\bar{x}, y) \le L(\bar{x}, \bar{y}) \le L(x, \bar{y}) \quad \forall x \in \mathbb{R}^n,\ \forall y \in \mathbb{R}^m_+ \qquad \text{(Saddle-point)}$$
$$\bar{y}_j \, g_j(\bar{x}) = 0, \quad j \in \{1, 2, \ldots, m\} \qquad \text{(Complementary Slackness)}$$
We illustrate the saddle-point condition of the Lagrangian in Figure 7.14. Among all points
$(x, \bar{y})$, the point $(\bar{x}, \bar{y})$ minimizes $L(x, \bar{y})$. Among all points $(\bar{x}, y)$, the point $(\bar{x}, \bar{y})$ maximizes
$L(\bar{x}, y)$.

Figure 7.14: Saddle point function

Now, let us see a proof that if $\bar{x} \in \mathbb{R}^n$ satisfies the conditions of the above theorem for some
$\bar{y}$, then it is an optimal solution of the NLP in the convex case. First, we prove that $\bar{x}$ is feasible.
We saw that the Lagrangian encodes all the information on the constraints of the NLP, and we can
extract this information using unit vectors. So, let us fix $i \in \{1, 2, \ldots, m\}$ and apply the first
inequality in the saddle-point condition using $y := \bar{y} + e_i$:
$$0 \ge L(\bar{x}, \bar{y} + e_i) - L(\bar{x}, \bar{y}) = g_i(\bar{x}).$$
Hence, $\bar{x}$ is feasible in the NLP.
Next, we use the second inequality in the saddle-point condition and Complementary Slackness.
We have
$$L(\bar{x}, \bar{y}) \le L(x, \bar{y}) \quad \forall x \in \mathbb{R}^n,$$
which is equivalent to
$$f(\bar{x}) + \sum_{i=1}^m \bar{y}_i g_i(\bar{x}) \le f(x) + \sum_{i=1}^m \bar{y}_i g_i(x) \quad \forall x \in \mathbb{R}^n.$$
Consider all feasible $x$ (that is, $g_i(x) \le 0$ for every $i$). Since $\bar{y}_i \ge 0$ for every $i$, we have
$\sum_{i=1}^m \bar{y}_i g_i(x) \le 0$. Hence, our conclusion becomes
$$f(\bar{x}) + \sum_{i=1}^m \bar{y}_i g_i(\bar{x}) \le f(x) \quad \forall x \text{ feasible in the NLP}.$$
Using the fact that Complementary Slackness holds at $(\bar{x}, \bar{y})$, we observe that $\sum_{i=1}^m \bar{y}_i g_i(\bar{x})$ is
zero, and we conclude
$$f(\bar{x}) \le f(x) \quad \forall x \text{ feasible in the NLP}.$$
Indeed, this, combined with the feasibility of $\bar{x}$, means that $\bar{x}$ is optimal in the NLP.
The last two theorems can be strengthened by replacing the assumption on the existence
of a Slater point with weaker, but more technical, assumptions.
Moreover, the last two theorems can be generalized to NLP problems that are not convex.
In the general case, the first theorem only characterizes local optimality.
7.7 Further reading and notes
For a further introduction to nonlinear optimization, see Peressini et al. [14] and Boyd and
Vandenberghe [3]. There are also algorithms designed for linear optimization using insights
gained from developing the theory of nonlinear convex optimization (see Khachiyan [12],
Karmarkar [11] and Ye [23]).
There are historical as well as theoretical connections to game theory. Suppose two players
P and D are playing a two-person zero-sum game. Player P's strategies are denoted by
a vector $x \in \mathbb{R}^n$ and player D's strategies are denoted by a nonnegative vector $y \in \mathbb{R}^m$. The players
choose a strategy (without knowing each other's choices) and reveal them simultaneously.
Based on the vectors $x$ and $y$ that they reveal, Player P pays Player D $\left[f(x) + \sum_{i=1}^m y_i g_i(x)\right]$
dollars (if this quantity is negative, Player D pays the absolute value of it to Player P). Player
P's problem is
$$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m_+} L(x, y) = \min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m_+} \left[ f(x) + \sum_{i=1}^m y_i g_i(x) \right],$$
and player D's problem is
$$\max_{y \in \mathbb{R}^m_+} \min_{x \in \mathbb{R}^n} L(x, y) = \max_{y \in \mathbb{R}^m_+} \min_{x \in \mathbb{R}^n} \left[ f(x) + \sum_{i=1}^m y_i g_i(x) \right].$$
Bibliography
[1] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton Series in Applied
Mathematics. Princeton University Press, Princeton, NJ, 2009.
[2] M. J. Best. Portfolio Optimization. Chapman & Hall/CRC Finance Series. CRC Press, Boca
Raton, FL, 2010.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge,
2004.
[4] V. Chvátal. Linear Programming. A Series of Books in the Mathematical Sciences. W. H.
Freeman and Company, New York, 1983.
[5] W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, and A. Schrijver. Combinatorial Optimiza-
tion. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons
Inc., New York, 1998.
[6] G. Cornuéjols and R. Tütüncü. Optimization Methods in Finance. Mathematics, Finance and
Risk. Cambridge University Press, Cambridge, 2007.
[7] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, N.J.,
1963.
[8] E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik,
1:269–271, 1959.
[9] M. L. Fredman and R. E. Tarjan. Fibonacci heaps and their uses in improved network optimization
algorithms. Journal of the ACM, 34(3):596–615, July 1987.
[10] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathemat-
ical Sciences. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.
[11] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica,
4(4):373–395, 1984.
[12] L. G. Khachiyan. A polynomial algorithm in linear programming. Dokl. Akad. Nauk SSSR,
244(5):1093–1096, 1979.
[13] J. Kleinberg and É. Tardos. Algorithm Design. Pearson Studium, 2006.
[14] A. L. Peressini, F. E. Sullivan, and J. J. Uhl. The Mathematics of Nonlinear Programming.
Undergraduate Texts in Mathematics. Springer-Verlag, New York, 1988.
[15] M. L. Pinedo. Scheduling. Springer, New York, third edition, 2008. Theory, algorithms, and
systems, With 1 CD-ROM (Windows, Macintosh and UNIX).
[16] A. Schrijver. Theory of Linear and Integer Programming. Wiley-Interscience Series in Discrete
Mathematics. John Wiley & Sons Ltd., Chichester, 1986.
[17] M. Sipser. Introduction to the Theory of Computation. Cengage Learning, 2nd edition, 2006.
[18] R. J. Vanderbei. Linear Programming. International Series in Operations Research & Manage-
ment Science, 37. Kluwer Academic Publishers, Boston, MA, second edition, 2001.
[19] V. V. Vazirani. Approximation Algorithms. Springer, 2001.
[20] D. P. Williamson and D. Shmoys. The Design of Approximation Algorithms. Cambridge Univer-
sity Press, 2011.
[21] W. L. Winston. Operations Research, Applications and Algorithms. Thomson Learning, 2004.
[22] L. A. Wolsey. Integer Programming. Wiley-Interscience Series in Discrete Mathematics and
Optimization. John Wiley & Sons Inc., New York, 1998.
[23] Y. Ye. Interior Point Algorithms. Wiley-Interscience Series in Discrete Mathematics and Opti-
mization. John Wiley & Sons Inc., New York, 1997.
Index
O(·), 73
Ω(·), 73
st-flows, 110
st-path, 109
active constraints, 168
affine function, 181
assignment problem, 14
auxiliary
linear program, 61, 62
variables, 61
basic feasible solution, 45
basic solution, 45
basis, 44
feasible, 45
bipartition, 142
branch & bound, 155
branch & bound nodes, 157
branch & bound tree, 157, 159
branch and cut, 134
branching on a variable, 157
canonical form, 47
capacity of the st-cut, 111
Carathéodory's Theorem, 178
certificate of infeasibility, 33
certificate of optimality, 36
certificate of unboundedness, 37
CG-cut, 151
Chvátal-Gomory cut, 151
clause, 78
complementary slackness conditions, 120
complementary slackness conditions, general
LP, 122
Complementary Slackness Theorem, 120
cone, 175
cone of tight constraints, 175
convex set, 166
cutting plane, 149
decision problem, 78
deficient set, 143
depth-first search, 160
differentiability, 194
dual, 88, 89
dual Simplex algorithm, 161
edge, 16
incident, 16
ellipsoid method, 77
epigraph of a function, 185
equivalent linear programs, 39
exponential-time algorithm, 75
extreme point, 167
Farkas' Lemma, 33, 178
feasible region, 164
Fermat's Last Theorem, 182
formula, 78
free variable, 70
Fundamental Theorem of LP, 30, 65
Fundamental Theorem of LP (SEF), 65
gradient, 194
graph, 16
bipartite, 16
halfspace, 165
Hall's Theorem, 143
Helly's Theorem, 178
hyperplane, 164
Hyperplane Separation Theorem, 178
input size, 73
input-output systems, 130
instance, 73
Karush-Kuhn-Tucker Theorem, 195, 197
Lagrangian, 197
level set, 187
lexicographic rule, 70
linear program
feasible, 29
infeasible, 29
unbounded, 29
linear programming, 7
literal, 78
Markowitz model, 28
matching, 16, 104, 141
maximum weight matching problem, 17
optimality conditions, 125
perfect, 141
matching problem, 14
Max-Flow Min-Cut Theorem, 111
maximum st-flow, 110
neighbors, 142
network flow, 107
nonlinear program (NLP), 22
optimal solution, 10
pivot rules, 70
polyhedron, polyhedra, 166
polynomial-time algorithm, 75
polynomially-reducible, 79
portfolio optimization, 24, 28
primal, 88
pruning, 159
reducible, 59
robust optimization, 28
shadow prices, 130
Simplex algorithm, 57
Slater point, 195
smallest subscript rule, 58, 65
solution
feasible, 29
optimal, 29
standard equality form (SEF), 38
Strong Duality Theorem, 116, 118, 178
subgradient, 188
supporting halfspace, 189
tight constraint, 168
value, objective function value, 29
variable
basic, 44
free, 39
indicator variable, 15
non-basic, 44
slack, 41
vertex, 16
Weak Duality Theorem, 103
Weak Duality Theorem, 90