
Computers & Operations Research 35 (2008) 3530-3561

www.elsevier.com/locate/cor
Dynamic modeling and control of supply chain systems: A review
Haralambos Sarimveis^a,*, Panagiotis Patrinos^a, Chris D. Tarantilis^b, Chris T. Kiranoudis^a
^a School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechniou str., Zografou Campus, 15780 Athens, Greece
^b Department of Management Science and Technology, Athens University of Economics and Business, 47A Evelpidon Street/33 Lefkados Street, Athens 113-62, Greece
Available online 7 February 2007
Abstract
Supply chains are complicated dynamical systems triggered by customer demands. Proper selection of equipment, machinery,
buildings and transportation fleets is a key component for the success of such systems. However, efficiency of supply chains mostly
depends on management decisions, which are often based on intuition and experience. Due to the increasing complexity of supply
chain systems (the result of changes in customer preferences, the globalization of the economy and the stringent competition
among companies), these decisions are often far from optimum. Another factor that causes difficulties in decision making is that
different stages in supply chains are often supervised by different groups of people with different managing philosophies. From the
early 1950s it became evident that a rigorous framework for analyzing the dynamics of supply chains and taking proper decisions could
substantially improve the performance of these systems. Due to the resemblance of supply chains to engineering dynamical systems,
control theory has provided a solid background for building such a framework. During the last half century many mathematical tools
emerging from the control literature have been applied to the supply chain management problem. These tools vary from classical
transfer function analysis to highly sophisticated control methodologies, such as model predictive control (MPC) and neuro-dynamic
programming. The aim of this paper is to provide a review of this effort. The reader will find representative references of many
alternative control philosophies and identify the advantages, weaknesses and complexities of each one. The bottom line of this
review is that joint co-operation between control experts and supply chain managers has the potential to introduce more realism
to the dynamical models and develop improved supply chain management policies.
© 2007 Elsevier Ltd. All rights reserved.
Keywords: Supply chain management; Control; Dynamic modeling; Review; Dynamic programming; Model predictive control
1. Introduction
A supply chain is a network of facilities and distribution entities (suppliers, manufacturers, distributors, retailers)
that performs the functions of procurement of raw materials, transformation of raw materials into intermediate and
finished products, and distribution of finished products to customers. A supply chain is typically characterized by a
forward flow of materials and a backward flow of information. Recently, enterprises have shown a growing interest in
efficient supply chain management. This is due to the rising cost of manufacturing and transportation, the globalization
of market economies and the customer demand for diverse products of short life cycles, which are all factors that

* Corresponding author. Tel.: +30 210 7723237; fax: +30 210 7723138.
E-mail address: hsarimv@chemeng.ntua.gr (H. Sarimveis).
0305-0548/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cor.2007.01.017
increase competition among companies. Efficient supply chain management can lead to lower production, inventory and
transportation costs and improved customer service throughout all the stages involved in the chain.
Various alternative methods have been proposed for modeling supply chains. According to Beamon [1], they can be
grouped into four categories: deterministic models, where all the parameters are known; stochastic models, where at least
one parameter is unknown but follows a probabilistic distribution; economic game-theoretic models; and models based
on simulation, which evaluate the performance of various supply chain strategies. The majority of these models are
steady-state models based on average performance or steady-state conditions. However, static models are insufficient
when dealing with the dynamic characteristics of the supply chain system, which are due to demand fluctuations,
lead-time delays, sales forecasting, etc. In particular, they are not able to describe, analyze and find remedies for a
major problem in supply chains, which recently became known as the bullwhip effect.
The bullwhip phenomenon is the amplification of demand variability as we move from a downstream level to an
upstream level in a supply chain. Lee et al. [2] identified four major causes of the bullwhip effect:
1. Demand forecasting, which is often performed independently by each element in the supply chain based on its
immediate customers.
2. Batching of orders to reduce processing and transportation costs.
3. Price fluctuations due to special promotions like price discounts and quantity discounts.
4. Supply shortages, which lead to artificial demands.
Two recent publications provide excellent reviews on the subject of the bullwhip effect. They both report additional
causes of this undesired behavior and pinpoint methods for eliminating the problem. In particular, according to
Miragliotta [3] there are conflicting approaches to describing and analyzing the bullwhip phenomenon between
academicians and managers. A new taxonomy was proposed, which shares the scientific rigor of the former with the
practical attitude of the latter. According to Geary et al. [4], human factors such as ignorance, arrogance and
indifference contribute to the bullwhip effect, but proper re-engineering of the supply chain (such as smooth production
strategies) can eliminate the causes of this undesired phenomenon.
From the above discussion, it is clear that consideration of the dynamic characteristics offers a competitive advantage
in modeling supply chain systems. It is not surprising that dynamic analysis and design of supply chain systems as a
whole has attracted a lot of attention, from both academia and industry. A recent review paper [5] focused on the
alternative approaches that have been proposed for modeling the dynamics of supply chains, which were categorized
as follows: continuous-time differential equation models, discrete-time difference models, discrete event models and
classical operational research methods.
Control theory provides sufficient mathematical tools to analyze, design and simulate supply chain management
systems based on dynamic models. In particular, control theory can be used to study and find solutions to the bullwhip
phenomenon. The aim of this paper is to review the research efforts of the last half century regarding the application of
control theory to the supply chain management problem. We believe that this work will help researchers and practitioners
who would like to get involved in this exciting scientific area to gain knowledge about the major developments that
have emerged throughout the years and get informed about the state-of-the-art methods of today. We should note
that excellent reviews on the application of control theory to the production and inventory control problem have been
published previously [6-8]. However, the first two review papers [6,7] are almost two decades old, so they do not
cover the major advances that have taken place since then. The paper of Ortega and Lin [8] is recent, but it focuses
on classical control methodologies. The present work can be considered a complement to the paper of Ortega and
Lin, since it presents an extensive review of the application of advanced control methodologies to the production and
inventory control problem. For the sake of completeness, a major section of this paper is devoted to classical control
applications. The classical control section is updated with the recent advances that have emerged over the last few years.
The rest of the review paper is organized as follows: the next section contains applications of classical control
to the supply chain modeling problem, where most of the analysis concerns linear systems and is performed in the
frequency domain. In particular, Laplace transfer functions and z transfer functions are used to model the dynamics of
continuous and discrete linear systems in the frequency domain. Standard analysis tools, such as Bode and Nyquist
plots, the Routh and Hurwitz stability criteria and transient responses, are used to analyze and evaluate the alternative
designs.
Section 3 is devoted to the application of advanced control theory, where the system dynamics are examined in the
time domain and are described by state space models. Advanced control methodologies are basically optimal control
methods aiming at the optimization of an objective function that describes the performance of the system. Dynamic
programming and Hamilton-Jacobi-Bellman (HJB) equations [9,10] are prevalent in optimal control theory. Another
major issue that is often taken into account in advanced control theory is the presence of uncertainties, which complicates
the process of making effective decisions regarding production, storage and distribution of products. Uncertainties are
involved in future demand prediction, lead time estimation, estimation of failure probabilities, etc. Many problems
in supply chain theory can be cast as stochastic optimal control problems. Therefore, much of the literature that
considers supply chain networks from a system theoretic point of view is largely based on optimal control and dynamic
programming.
Due to the curse of dimensionality, many models that are based on dynamic programming and optimal control cannot
be solved analytically. Eventually one must resort to some kind of approximation. Model predictive control (MPC)
[11] and the rolling horizon concept are a viable approach to cope with intractability in optimal closed-loop feedback
control design. Its main idea is to solve on-line a finite-horizon open-loop optimal control problem, considering the
current state as the initial state for the problem. The problem is formulated and solved at each discrete-time instance.
MPC techniques have recently been applied to supply chain problems and are reviewed in Section 4.
Another way of dealing with uncertainty is to model it as a deterministic uncertain-but-bounded quantity. In this
case, no probability information regarding the disturbances is required. For example, future demand can
be bounded between lower and upper limits, without needing to define the likelihood of occurrence of each possible
event within these limits. Systems in which disturbances are described as uncertain-but-bounded quantities are the
main concern of robust control. Specifically, robust optimal control [12] seeks a feedback controller that minimizes the
worst-case value of a cost criterion over all possible realizations of the uncertain parameters. Furthermore, constraints
regarding the operation of the system must be fulfilled for every possible value of the uncertain parameters. Articles
describing applications of robust control theory to supply chain management problems are reviewed in Section 5.
In Section 6 we review alternative methods that have been proposed to combat the curse of dimensionality in
dynamic programming and the lack of an accurate model for the stochastic system under investigation. These methods
are usually based on some form of approximation of the value function combined with simulations. They grew out of the
artificial intelligence community and are usually referred to as reinforcement learning techniques, neuro-dynamic
programming or approximate dynamic programming [13].
The paper ends with the concluding remarks and some suggestions for further research.
2. Classical control theory
The utilization of classical control techniques in the supply chain management problem can be traced back to the early
1950s, when Simon [14] applied servomechanism continuous-time theory to manipulate the production rate in a simple
system involving just a single product. The idea was extended to discrete-time models by Vassian [15], who proposed
an inventory control framework based on the z-transform methodology. A breakthrough, however, came
in the late 1950s with the so-called industrial dynamics methodology, introduced by the pioneering work
of Forrester [16,17]. The methodology, later referred to as system dynamics, used a feedback perspective to model,
analyze and improve dynamic systems, including the production-inventory system. The scope of the methodology was
later broadened to cover complex systems from various disciplines such as social systems, corporate planning and
policy design, public management and policy, micro- and macro-economic dynamics, educational problems, biological
and medical modeling, energy and the environment, theory development in the natural and social sciences, dynamic
decision-making research, strategic planning and more [18]. The book written recently by Sterman [19] is an excellent
source of information on the system dynamics philosophy and its various applications, and includes special chapters
on the supply chain management problem.
Forrester's work was appreciated for providing powerful tools to model and simulate complex dynamical phenomena,
including nonlinear control laws. However, the industrial dynamics methodology was criticized for not containing
sufficient analytical support [20] and for not providing guidelines to systems engineers on how to improve
performance [21]. Motivated by the need to develop a new framework that could be used as a base for seeking
novel control laws and/or new feedback paths in production/inventory systems, Towill [21] presented the inventory
and order based production control system (IOBPCS) in a block diagram form, extending the work of Coyle [22].
[Block diagram omitted: it combines the demand policy G_a(s/z), the inventory policy G_i(s/z), the pipeline policy G_w(s/z), the target stock setting (gain k and G_d(s/z)) and the lead time G_p(s/z) (a pure delay e^{-lT_m s} or z^{-l}), linking the signals CONS, AVCON, ORATE, COMRATE, AINV, AWIP, DINV, DWIP, EINV and EWIP.]
Fig. 1. The family of IOBPCS models.
It was considered that the system deals with aggregate product levels or, alternatively, that it reflects a single product. The
system was subject to many modifications and improvements in subsequent years, including extensions to discrete-time
systems, thus leading to the IOBPCS family presented in block diagram form in Fig. 1. Standard nomenclature
used in industrial dynamics is adopted to represent input, output and intermediate signals in the block diagram:
AINV actual inventory holding
AVCON average consumption
AWIP actual WIP holding
COMRATE completion rate
CONS consumption or market demand
DINV desired inventory level
DWIP desired work in progress
EINV error in inventory holding
EWIP error in work in progress
ORATE order rate
Using control terminology, the actual inventory level (AINV) is the controlled variable, the market demand (CONS)
is a disturbance and the order rate (ORATE) is the manipulated variable. The two integrators are used to accumulate the
inventory and work in process (WIP) deficits over time.
Each member of the IOBPCS family is constructed by defining some or all of the following five components [23]:
The lead time, which represents the time between placing an order and receiving the goods into inventory. In
manufacturing sites, lead time incorporates production delays. Alternatively, this component can be interpreted as a
production smoothing element, representing how slowly the production unit adapts to changes in ORATE [24].
The target stock setting, which can be either fixed or a multiple of current average sales rates.
The demand policy, which in essence is a forecasting mechanism that averages the current market demand. The
demand policy is a feed-forward loop within the replenishment policy.
The inventory policy, which is a feedback loop that controls the rate at which the inventory deficit (difference between
desired stock setting and AINV) is recovered.
The pipeline policy, which is a feedback loop that determines the rate at which the WIP deficit (difference between
desired WIP level and actual WIP level) is recovered.
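The combination of these policies can be made concrete with a small numerical sketch. The code below is our own hypothetical discrete-time reading of the IOBPCS-family ordering rule (the function and variable names are ours, not from the literature): ORATE is formed as the smoothed demand plus fractions 1/T_i and 1/T_w of the inventory and WIP deficits.

```python
def order_rate(avcon, einv, ewip, Ti, Tw):
    """One ordering decision in the IOBPCS family (discrete-time sketch).

    avcon : averaged market demand (output of the demand policy)
    einv  : inventory deficit DINV - AINV (inventory policy input)
    ewip  : WIP deficit DWIP - AWIP (pipeline policy input)
    Ti/Tw : times-to-adjust inventory / WIP (feedback gains 1/Ti, 1/Tw)
    """
    return avcon + einv / Ti + ewip / Tw

# Example: forecast of 100 units, 20 units short of target stock,
# 30 units short of target WIP, Ti = 4, Tw = 6:
# ORATE = 100 + 20/4 + 30/6 = 110
```

Setting Tw to infinity (G_w = 0) recovers the plain IOBPCS rule, which orders the forecast plus a fraction of the inventory deficit only.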
The lead time is a characteristic of the system to be controlled. Although the designer of the control system cannot
manipulate the lead time, it is important to model the delay in the best possible way. A generic lead time model with
two parameters was proposed by Wikner [25] in the continuous-time formulation, but it can easily be extended to the
discrete-time formulation:
$G_p(s) = \frac{1}{((T_p/n)s + 1)^n}$. (1)
There are three common choices for the parameter n:
n = 1: first order delay.
n = 3: third order delay.
n → ∞: pure (infinite order) delay.
For the first two choices T_p is the average lead time of the unit, while for the last choice T_p is the fixed lead time. For
the last choice of the parameter n, the transfer functions G_p(s/z) can be written as follows:
$G_p(s) = e^{-T_p s}, \quad G_p(z) = z^{-q}$, (2)
where T_p = q T_m and T_m is the sampling interval in the discrete-time case.
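The delay models behave quite differently in the time domain, which the following sketch illustrates (our own illustration; an Euler discretization with step T_m is assumed for the first order lag, and the function names are ours):

```python
def first_order_lag_step(Tp, Tm, n_steps):
    """Step response of G_p(s) = 1/(Tp*s + 1), Euler-discretized with step Tm."""
    com, out = 0.0, []
    for _ in range(n_steps):
        com += (Tm / Tp) * (1.0 - com)  # COMRATE' = (ORATE - COMRATE)/Tp
        out.append(com)
    return out

def pure_delay_step(q, n_steps):
    """Step response of G_p(z) = z**(-q): the step reappears q periods later."""
    return [1.0 if k >= q else 0.0 for k in range(n_steps)]
```

The first order lag responds immediately but approaches the new rate only gradually, while the pure delay produces nothing for q periods and then the full step, which is why the choice of delay model matters for stability analysis.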
The designer has to decide how the target stock will be set (fixed value or multiple of average sales) and select
the three policies (demand policy, inventory policy and pipeline policy), in order to optimize the system with respect
to the following performance objectives:
(a) Inventory level recovery.
(b) Attenuation of demand rate fluctuations on the ordering rate.
The second objective aims at the reduction of the bullwhip effect. The term bullwhip was only recently introduced,
as mentioned in the introduction, but the phenomenon whereby a small random variation in sales at the marketplace
is amplified at each level in the supply chain was already identified by the pioneering work of Forrester in industrial
dynamics [17]. This was later postulated by Burbidge under the Law of Industrial Dynamics [26]. The utilization of
control engineering principles in tackling the problem by providing supply chain dynamic modeling and re-engineering
methodologies was soon recognized, as reported by Towill [27].
The two performance objectives are conflicting. Thus, for each particular supply chain, the control system designer
seeks the best inventory level and ordering rate trade-off. A qualitative look at the two extreme scenarios (perfect
satisfaction of each one of the two objectives) clearly shows that a compromise is needed to arrive at a well designed
control system. If a fixed ordering rate is used, then large inventory deviations are observed, since inventory levels
follow any demand variation. This policy (known as Lean Production in manufacturing sites) obviously results in large
inventory costs. On the other hand, a fixed inventory level (known as Agile Production in manufacturing sites [28])
results in highly variable production schedules and hence large production costs.
Standard control metrics are used in the literature to quantify the performance of alternative control policies with
respect to the aforementioned objectives. Regarding the first objective, the dynamic behavior of the system when a step
input is introduced to the demand rate is studied. The inventory response is then evaluated with respect to performance
criteria such as the rise time, the settling time and the maximum overshoot, to name a few. Another useful metric
to quantify inventory recovery is the integral of time absolute error (ITAE) criterion. Frequency response tests are
typically used to evaluate the performance of the system with respect to the second objective. For a particular transfer
function G(s), frequency response (Bode) plots draw the magnitude and the phase angle of the complex number G(jω)
as a function of ω. The frequency response plots provide valuable information, since when a sinusoidal input is presented
to the system, the output is a sine wave of the same frequency ω. The ratio of the amplitude of the output signal over
the amplitude of the input signal (amplitude ratio, AR) is equal to the magnitude of G(jω), while the phase shift is
equal to the angle of G(jω). Based on the frequency responses, the noise bandwidth metric can be easily computed to
Table 1
Demand, inventory and pipeline policies for several models in the IOBPCS family

Model | Target stock setting | Demand policy | Inventory policy | Pipeline policy
IBPCS (inventory based production control system) | Constant | G_a(s) = 0; G_a(z) = 0 | G_i(s) = 1/T_i; G_i(z) = 1/T_i | G_w(s) = 0; G_w(z) = 0
IOBPCS (inventory and order based production control system) | Constant | G_a(s) = 1/(T_a s + 1); G_a(z) = a/(1 - (1 - a)z^{-1}) | G_i(s) = 1/T_i; G_i(z) = 1/T_i | G_w(s) = 0; G_w(z) = 0
VIOBPCS (variable inventory and order based production control system) | Multiple of average market demand | G_a(s) = 1/(T_a s + 1); G_a(z) = a/(1 - (1 - a)z^{-1}) | G_i(s) = 1/T_i; G_i(z) = 1/T_i | G_w(s) = 0; G_w(z) = 0
APIOBPCS (automatic pipeline, inventory and order based production control system) | Constant | G_a(s) = 1/(T_a s + 1); G_a(z) = a/(1 - (1 - a)z^{-1}) | G_i(s) = 1/T_i; G_i(z) = 1/T_i | G_w(s) = 1/T_w, G_d(s) = T_p; G_w(z) = 1/T_w, G_d(z) = T_p
APVIOBPCS (automatic pipeline, variable inventory and order based production control system) | Multiple of average market demand | G_a(s) = 1/(T_a s + 1); G_a(z) = a/(1 - (1 - a)z^{-1}) | G_i(s) = 1/T_i; G_i(z) = 1/T_i | G_w(s) = 1/T_w, G_d(s) = T_p; G_w(z) = 1/T_w, G_d(z) = T_p
quantify the noise amplification (bullwhip effect). This is defined as the area under the squared frequency response of
the system. Disney and Towill [29] showed that the noise bandwidth metric divided by π is equivalent to the variance
ratio measure (variance of the ordering rate over the variance of the demand rate), which was proposed by Chen et al. [30]
to quantify the bullwhip effect.
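Given order and demand histories, the variance ratio is straightforward to compute; a value above one indicates bullwhip. A minimal sketch (the function name is ours):

```python
from statistics import pvariance

def variance_ratio(orders, demand):
    """Bullwhip metric of Chen et al.: Var(ORATE) / Var(CONS)."""
    return pvariance(orders) / pvariance(demand)

# Orders that swing more widely than the demand they serve:
# variance_ratio([10, 14, 7, 13], [10, 12, 9, 11]) -> 6.0
```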
It can be shown mathematically and experimentally that, using the feed-forward component (demand policy), we
can achieve zero steady-state error between the actual and desired inventory level when a step change is introduced in the
consumption rate, even if no integral action is included in the inventory policy. However, if market demand is used in
the feed-forward component without any form of averaging, i.e. if we set G_a(s/z) = 1, excessive fluctuations are observed
in the production completion/order rates, thus failing to satisfy the second performance objective. This is alleviated by
utilizing an average measure of current market demand.
The inventory policy defines the rate at which inventory deficits are recovered by manipulating the ORATE. The
inventory policy should take into account the dynamics of the system and mainly the lead time. A decision made at a
given time instance on the ORATE will result in an actual modification of the inventory level only after a time period
equal to the lead time has passed. An inventory policy that aims at recovering all the inventory deficit in a
single time period will result in a significant excess of WIP on the shop floor and eventually in an oscillatory behavior,
as far as both the completion rate (COMRATE) and the inventory level are concerned. The consequences of such a
dynamic behavior are higher handling/production costs (since ORATE is not smooth), higher inventory costs (when
there is a surplus of inventory) and poor customer service (when actual inventory is below the target value). Moreover,
a higher capacity is required in both the production and storage facilities. Therefore, only a fraction of the inventory
discrepancy should be recovered by the inventory policy.
The pipeline policy is a correction mechanism which uses information that is not included in the AINV. In essence,
the WIP signal cancels out the inventory signal and increases the contribution of AVCON in reaching a steady state.
The WIP deficit is formulated by subtracting the actual WIP signal from the desired WIP level, which is produced
based on the average measure of current market demand. The pipeline policy aims at the reduction of this discrepancy
and is the third element (along with the demand policy and the inventory policy) that is used in the construction of
ORATE. Compared to the inventory policy, the pipeline policy identifies more quickly the need to increase or reduce
ORATE, especially when sudden changes are observed in the market demand. In general, the inclusion of the WIP
control loop reduces the rise time and increases the percentage overshoot of ORATE, but increases the time required
to reach the steady state.
A number of models belonging to the IOBPCS family are presented in Table 1, along with the respective definitions
of the four components that the designer can manipulate. When a first order lag is used as the demand smoothing policy,
the link between the tuning parameter T_a in the s-domain and the parameter a in the z-domain can be approximated as
follows:
$a = \frac{1}{1 + T_a/T_m}$, (3)
where T_m is the sampling interval as indicated above.
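For example, a continuous-time demand-averaging lag of T_a = 7 periods sampled at T_m = 1 gives a = 0.125; applying this constant in the standard exponential-smoothing recursion AVCON_k = AVCON_{k-1} + a(CONS_k - AVCON_{k-1}) realizes the discrete demand policy of Table 1. A small sketch (assuming this recursion and an initialization at the first observation; the names are ours):

```python
def smoothing_constant(Ta, Tm):
    """Eq. (3): a = 1 / (1 + Ta/Tm)."""
    return 1.0 / (1.0 + Ta / Tm)

def exp_smooth(demands, a):
    """Discrete demand policy G_a(z) = a / (1 - (1 - a)*z**-1)."""
    avcon, out = demands[0], []
    for d in demands:
        avcon += a * (d - avcon)  # AVCON_k = (1-a)*AVCON_{k-1} + a*CONS_k
        out.append(avcon)
    return out
```

A constant demand stream passes through the filter unchanged, while a step in demand is absorbed gradually at rate a per period.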
The IOBPCS model was the first to be studied extensively in its continuous-time format by Towill [21]. From Table 1,
we can observe that the WIP feedback loop is not considered in the IOBPCS model. Moreover, the target inventory level
is fixed, as it is not influenced by modifications/fluctuations in customer demands. The completed production (lead time)
is a delayed version of the ORATE and is modeled by a first order lag (time constant T_p), while the demand-averaging
process (demand policy) is also represented by a first order lag (time constant T_a). The ordered production (inventory
policy) is computed as the summation of the average consumption and a fraction (1/T_i) of the inventory deficit. The
transfer functions between the variables COMRATE, AINV and the disturbance CONS are given below:
$\frac{\mathrm{AINV}}{\mathrm{CONS}} = -\frac{T_i [T_a T_p s^2 + (T_a + T_p)s]}{(T_a s + 1)(T_i T_p s^2 + T_i s + 1)}$, (4)

$\frac{\mathrm{COMRATE}}{\mathrm{CONS}} = \frac{(T_a + T_i)s + 1}{(T_a s + 1)(T_i T_p s^2 + T_i s + 1)}$. (5)
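These transfer functions can be checked numerically. The sketch below Euler-integrates the IOBPCS loop in deviation variables (our own simulation; the parameter values are illustrative). After a unit step in CONS, the inventory deviation AINV should return to zero (the zero steady-state error delivered by the feed-forward demand policy) and COMRATE should settle at the new demand rate:

```python
def simulate_iobpcs(Ta, Ti, Tp, dt=0.01, horizon=200.0):
    """Euler simulation of the continuous-time IOBPCS (deviation variables).

    Demand policy: first order lag Ta; inventory policy: gain 1/Ti;
    lead time: first order lag Tp; CONS steps from 0 to 1 at t = 0.
    """
    avcon = comrate = ainv = 0.0
    cons = 1.0
    for _ in range(int(horizon / dt)):
        orate = avcon - ainv / Ti               # ORATE = AVCON + EINV/Ti, DINV = 0
        avcon += dt * (cons - avcon) / Ta       # demand averaging
        comrate += dt * (orate - comrate) / Tp  # production delay
        ainv += dt * (comrate - cons)           # inventory balance
    return ainv, comrate

# With Ti in [Tp, 2Tp], e.g. Ta = 2, Ti = 4, Tp = 2, the loop settles with
# AINV -> 0 and COMRATE -> 1.
```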
The characteristic equation is common to the two transfer functions. It is a third order polynomial, defined as the
product of a first order term and a quadratic term. Since all coefficients of both terms are positive, the transfer functions
are stable for all non-zero choices of the tuning parameters. It is important to note that the tuning parameter of the feed-
forward component, T_a, is not involved in the quadratic term and thus does not affect the generation of any oscillatory
behavior. Regarding the first performance objective, selection of the parameter T_i in the range [T_p, 2T_p] leads to well
designed second order systems, since this corresponds to damping ratios of 0.5-0.707. Based on the calculation of the
noise bandwidth of transfer function (5), using the assumption that the disturbance signal is white noise, it was found
that both the feed-forward and feedback signals contribute to the attenuation of consumption rate fluctuations as
the values of the tuning parameters T_a, T_i increase. However, higher values of T_i are more effective in disturbance
attenuation. In order to compromise between inventory recovery and random disturbance rejection, transformation of
transfer function (5) to a third order coefficient plane model proved extremely useful. Based on this analysis, it was
found that both objectives are met successfully by selecting the time-to-adjust inventory and the demand averaging time
of comparable magnitude to the production delay time.
The IOBPCS was also studied by Agrell and Wikner [31], using a multi-criteria decision-making (MCDM) approach.
They used the model in a case study, where the proposed generic MCDM design method for dynamical systems was
applied by considering the response speed and response smoothness as different objectives to be optimized. More
specifically, three criteria were utilized, namely the rise time of COMRATE, the overshoot of COMRATE and the
undershoot of the inventory level (AINV) following a step change in the market demand (CONS). Several sets of the
two tuning parameters T_a, T_i were obtained, according to the importance given to the three conflicting criteria.
The only difference of the variable inventory and order based production control system (VIOBPCS) compared to
the IOBPCS model is that, instead of using a fixed desired inventory level, DINV is set as a multiple k of current average
sales rates (AVCON). This way the target inventory stock is reduced in a falling market and, conversely, it is increased
in a rising market. The two models were compared by frequency analysis tools [32]. It was found that, for a reasonable
choice of the tuning parameters, the IOBPCS model shows lower ARs and larger phase shifts. These observations in
the frequency domain translate to lower production capacity requirements and lower risks of stock-outs, but at the
same time slower responses of the IOBPCS model compared to the VIOBPCS model.
The automatic pipeline inventory and order based production control system (APIOBPCS) model [33] uses a constant
inventory level set point and utilizes all three control policies (demand, inventory and pipeline) to determine
ORATE. Compared to the IOBPCS model, the addition of the WIP controller allows us to decouple the damping
ratio from the natural frequency. This in turn leads to more successful results, including a more effective filtering of
the high frequency noise that is present in customer demand signals, although at the expense of a slight increase in
COMRATE. In the APIOBPCS model, the designer has to choose values for three tuning parameters, namely T_a, T_i
and T_w. Table 1 indicates that the target WIP level is formulated by multiplying the average sales by T_p, which is the
time constant associated with the first order time lag representing the lead time. Pure time delays in the WIP feedback loop
are usually due to inaccuracies in the recording of WIP on the real world shop floor. Disney et al. [34] selected the tuning
parameters so that robustness to production lead times, pipeline level information fidelity and system selectivity
are achieved. More specifically, a single objective function was formulated describing the performance of the system
in terms of the aforementioned criteria for various lead times, orders of approximation of the production delays and
time delays in the WIP feedback loop. The optimization problem was solved using a standard genetic algorithm and
led to values for the tuning parameters similar to those derived by more conventional techniques [33]. The procedure
was repeated several times, giving more emphasis to inventory recovery as opposed to capacity costs and
vice versa. The results showed that when attenuation of demand fluctuations becomes more important, larger values of
the T_i and T_w tuning parameters are obtained, meaning that the inventory and, most importantly, the WIP information
become negligible. The opposite happens when more emphasis is given to the inventory recovery objective.
Disney and Towill [29] studied a special case of the discrete-time APIOBPCS model, where T_i is set equal to T_w.
It was named the DE-APIOBPCS model after Deziel and Eilon, who first studied this case [35]. This particular choice of
the tuning parameters simplifies the model considerably, since the lead time cancels out of the ORATE/CONS transfer
function. Moreover, the system is guaranteed to be stable and is robust to a number of nonlinear effects. It was found
that the bullwhip effect can be reduced by increasing the average age of the forecast and by reducing lead times.
Riddalls and Bennett presented a modified version of the APIOBPCS model in the continuous-time domain [36], where instead of the standard exponential smoothing forecasting mechanism, a moving average forecasting approach was employed. Infinite order (pure delay) was used to model the lead time component. The authors obtained stability criteria for the APIOBPCS model by recasting it into a Smith predictor and using the Bellman and Cooke theorem [37]. They showed that if the ratio T_i/T_w is greater than 0.5, the model is stable independently of the delay. In the opposite case, stability is achieved only if the lead time is below an upper bound, which is defined as a function of T_i and T_w. The stability boundary was later corrected by Warburton et al. [38], who verified their results using a second order Padé approximation.
Based on the APIOBPCS model, Zhou et al. presented a hybrid system containing both manufacturing and remanufacturing [39]. Remanufacturing of products has received a lot of academic and industrial interest in recent years, because of the considerable savings it offers to companies and the significant environmental benefits that help in meeting strict environmental legislation. Zhou et al. [39] studied the effect of including a remanufacturing process in the dynamics of the system. The manufacturing process was modeled by a typical continuous-time APIOBPCS model. In particular, the lead time was modeled by a first order delay and the standard exponential smoothing was used as the forecasting mechanism. In the remanufacturing loop, a kanban policy [40] was employed to represent a pull system. The kanban policy is designed specifically to replenish inventory in just-in-time manufacturing. It was shown that the kanban policy can be modeled as an inventory based production control system (IBPCS) (see Table 1). It was assumed that the products produced by the remanufacturing process are as good as new, so that a common finished goods inventory is used to store the production of both processes. The common inventory level is the trigger for the manufacturing and remanufacturing processes with a predefined routing probability, which for the remanufacturing process is equal to the return yield. The performance of the system was analyzed in terms of a step change in the customer demand with respect to all the tuning parameters, which now include those of the remanufacturing loop. It was found that good settings determined in previous studies for the single loop APIOBPCS provide satisfactory transient responses. The most interesting result, however, was that by including the remanufacturing process, faster responses to market demands and lower risk of stockout and over-ordering can be achieved. The hybrid system was proven robust to changes in the return yield and the lead times of the two processes. However, the performance of the system was investigated only with respect to the first objective stated in the beginning of this section. It was not tested with inputs other than step changes, such as random inputs, which are closer to real world applications.
Dejonckheere et al. studied the AR of the ORATE/CONS transfer function in single stage discrete-time systems [41]. They showed that using order-up-to replenishment policies, AR is high in all frequencies regardless of the demand forecasting mechanism and this leads to the generation of the bullwhip effect. More specifically, in any order-up-to policy, the ordering decision is as follows:

ORATE(t) = S(t) - (AINV(t) + AWIP(t)),  if AINV(t) + AWIP(t) < S(t),
ORATE(t) = 0,                           if AINV(t) + AWIP(t) >= S(t),    (6)
where S(t) is the time varying order-up-to level and the summation in the parenthesis (AINV and actual WIP) is the so-called inventory position. Various order-up-to policies differ on the way the order-up-to level is updated with time. When this level is estimated using an exponential smoothing demand forecast, the AR of the ORATE/CONS transfer function is greater than 1 in all frequencies. This means that any demand pattern will be amplified. The bullwhip effect increases as the parameter T_a decreases. When a moving average demand forecast is used, S(t) is calculated as follows:

S(t) = (1/L) * sum_{i=0}^{L-1} CONS(t - i).    (7)
For this type of forecasting mechanism, the AR plot has a sinusoidal shape. For some frequencies the AR is below 1, while for some others it is greater than 1. Obviously the bullwhip generated by this type of forecasting mechanism is much less compared to the one generated when exponential forecasts are used. Finally, a demand signal processing policy [42] was studied, where the order-up-to level is updated as follows:

S(t) = S(t - 1) + (CONS(t) - CONS(t - 1)).    (8)
The frequency response plot for this policy showed that the AR is greater than 1 in all frequencies and increases proportionally with frequency, meaning that high frequency noise signals are highly amplified. It was shown that the order-up-to policy is a special case of the automatic pipeline variable inventory and order based production control system (APVIOBPCS) model, where the tuning parameters are selected equal to 1. With proper tuning of the two parameters involved in the APVIOBPCS model, a desired frequency response plot can be obtained, where the AR is greater than 1 only at small frequencies. For the remainder of the frequency spectrum, which represents undesired noisy signals, the AR is smaller than one, thus decreasing the bullwhip effect.
The results of the above paper were extended by Dejonckheere et al. [43] to the case of a centralized supply chain, where customer demand data is shared throughout the chain. It was found that for order-up-to level replenishment rules, the bullwhip effect is reduced, but not completely eliminated. This finding agreed with the results reported by Chen et al. [30]. On the contrary, the APVIOBPCS model with a proper selection of the design parameters is able to reduce the variance of demand and have a smoothing or dampening impact.
Lalwani et al. [44] presented discrete-time state space representations (matrices A, B, C, D) for several models in the IOBPCS family. This allows the analysis of IOBPCS models and the development of control strategies using advanced control methods that are presented in subsequent sections. The state space models were derived by first transforming the discrete-time transfer functions into the control canonical form [45], which is a block diagram involving z only as the delay operator z^-1. Particular emphasis was given to the state space representation of the APVIOBPCS model, which was checked for stability based on the eigenvalues of the A matrix. The results matched those obtained by applying the Routh criterion, after the Tustin transformation was applied to map the z-domain into the w-domain [46]. The state space model also passed the controllability and observability tests. This is, however, expected, since the model was derived from the discrete transfer function, which contains only the controllable and observable part of the system.
Several other modifications of the different components constituting the IOBPCS family of models were proposed by various researchers. White [47] showed that a more sophisticated inventory control policy, such as the utilization of a proportional-integral-derivative (PID) inventory controller, can reduce stock levels by 80% and hence reduce cost. It is important to note that if the PID approach is adopted, the feedforward forecasting unit is no longer necessary to eliminate the discrepancy between DINV and AINV. This is now accomplished by the integral action offered by the I element of the controller. However, the PID approach has not received much attention in the literature, probably because it does not correspond to what is actually performed in real production-inventory systems, where forecasting is present explicitly. Moreover, the addition of the integral and derivative elements complicates the tuning effort, since two more tuning parameters are included in the model. A PID approach was also presented in Wikner et al. [24].
Both the APIOBPCS and APVIOBPCS yield successful results, based on the assumption that the lead time is estimated with accuracy. Otherwise, zero inventory offset cannot be achieved. However, this assumption is unrealistic in many situations. There are several sources of uncertainty involved in the lead time estimation, especially when the model describes the dynamics of a manufacturing site [48]. Examples are: lack of raw materials, inconsistencies in the human decision-making process, variations in shop-floor lead time due to the large number of products flowing through the shop floor, etc. In order to remove this assumption from the APIOBPCS model, an additional feedback loop was proposed in the continuous-time format of the model [48,49]. The additional lead-time loop is nonlinear and
time-varying and is used to provide updated estimates of the current lead time, which in turn update the desired level of WIP, DWIP. It was shown that significant advantage is gained by adapting the system to lead-time changes. Moreover, it was shown that by including an integral element in the inventory policy, we can avoid long-term stock drifts and improve the system performance during a lead-time increase.
As far as the demand policy is concerned, apart from the standard exponential smoothing presented in Table 1, several different approaches have been proposed. Dejonckheere et al. [50] investigated the utilization of a linear (Type II) or quadratic (Type III) instead of a constant (Type I) exponential smoothing forecasting mechanism in the continuous-time APIOBPCS model. The transfer functions between AVCON and CONS for the two alternative forecasting mechanisms are the following:

Type II forecasting mechanism:

G_a(s) = (2 T_a s + 1) / (T_a^2 s^2 + 2 T_a s + 1).    (9)

Type III forecasting mechanism:

G_a(s) = (3 T_a^2 s^2 + 3 T_a s + 1) / (T_a^3 s^3 + 3 T_a^2 s^2 + 3 T_a s + 1).    (10)
The two mechanisms are able to produce zero steady-state offsets in AVCON for a ramp and a parabolic change, respectively, in the input variable CONS. However, as we move to higher order models, we observe more oscillatory transient responses when the same value of the parameter T_a is used. The demand amplification problem (bullwhip effect) can be resolved by adjusting the value of the parameter T_a downwards as we move to higher order systems. Calculation of the COMRATE/CONS transfer functions corresponding to the three different forecasting techniques shows that no significant benefits are obtained by using more sophisticated forecasting methods. All three configurations track step and ramp input changes adequately and provide a constant error when a parabolic change is given in the input variable CONS. The only benefit of using a higher order forecasting mechanism is that by a careful selection of the parameter T_a the production adaptation and inventory costs are slightly reduced.
Grubbström and Wikner [51] studied traditional inventory replenishment systems in terms of control theory. In these systems, inventory is replenished in batches after a certain lead time, when the stock level reaches or falls below a trigger level, which is the reorder point. It was shown that inventory trigger control policies can be mathematically described by difference or differential equations involving Heaviside and Dirac impulse functions, which are able to reproduce the typical sawtooth inventory pattern.
Based on the funnel model and the theory of the logistic operating curve, Wiendahl and Breithaupt [52] developed a continuous-time model for a single working center, which contains four input and four output variables. However, since there are dependencies among the output variables, which are linked through the funnel formula, only two control loops are required to control the system. More specifically, the first controller adjusts the capacity of the work system to reduce the backlog to zero as fast as possible. In the second control loop the target WIP is the reference value, while the input rate of the system is the respective manipulated variable.
A quite different approach from the IOBPCS family of models has been developed by Grubbström's group for deciding on the production schedule in a single working center. In contrast to previously mentioned techniques, this approach explicitly takes into account costs and/or revenues. More specifically, the following problem was posed: determine the optimal sequence of production quantities over a finite horizon, with respect to the number of batches, the batch sizes and their timings, assuming that:

(a) Production takes place in batches of possibly different sizes.
(b) External demand is a stochastic process where stochastic events are separated by stochastic time intervals with a given probability function.
(c) The production lead times are deterministic.

The problem was solved for one-level systems [53] (i.e. assembly of products in the working center does not require other products from the same center), under the assumption that demand follows the Poisson process, which is the simplest possible stochastic process. The objective function to be maximized was the annuity stream, which is a variation
of the net present value (NPV). More precisely, it is the constant stream of payments corresponding to a given NPV determined from the cash flow within the finite horizon. The cash flow in turn is made up of the in-payments for sold units and the out-payments for set-up costs and variable production costs. Laplace transforms were found extremely useful in solving the problem, since they were used to model the dynamics of the system, capture the stochastic properties by serving as generating functions and assess the resulting cash flows when adopting the NPV principle. In a previous publication [54] the system was optimized with respect to a different objective function, consisting of the set-up cost, the inventory holding cost and the backlog cost.
This approach, together with input-output analysis, is suitable for describing multi-level, multi-stage (MLMS) production-inventory systems. An extra degree of complexity is introduced in those systems due to the fact that often there is a high degree of commonality of components and materials between products at different stages. Thus, every external order generates internal orders that have to be accounted for. Input-output analysis is a technique used to describe in a matrix form the multi-item production case with a linear or proportional dependence [55]. Preliminary results using simple ordering policies, such as fixed order quantities, are presented by Grubbström and Ovrin [56]. Recently, publications from the same group presented results for MLMS capacity constrained systems with zero lead times and stochastic demands [57] or non-zero lead times and deterministic demands [58]. Dynamic programming was adopted as the solution procedure in both cases. An extensive overview of publications focusing on MLMS systems using input-output analysis and Laplace transforms can be found in the paper of Grubbström and Tang [59].
Popplewell and Bonney [60] used discrete time linear control theory for the analysis and simulation of MLMS
systems. They considered each level and stage as a different element whose dynamics are represented by a z-transfer
function. Inputs and outputs to each element were considered as time-series signals, also represented by z-transforms.
Two different ordering policies were considered: the re-order cycle system, where the only input to each element is
demand arising from production schedules at the next level and a material requirement planning (MRP)-type system,
where in addition each element receives as input the external demands. The models were checked for stability and were
used to provide transient responses and responses to random noise.
Wikner [61] presented a methodology that introduces structure dependencies of MLMS systems in the IOBPCS
production control framework. The methodology uses matrix representation to account for multiple informational
channels. It was shown that for a single-level single-stage system, the model is reduced to the standard IOBPCS
format. The extended model has the capability to describe the dynamics of both pull-driven (base stock, kanban) and
push-driven (MRP) policies.
Burns and Sivazlian [62] considered a multi-echelon supply chain, where each echelon uses a typical discrete-time decision rule for placing orders, consisting of a replenishment term and an inventory adjustment term. The first term equals the orders received during the same time period, while the second term removes a fraction of the gap between the desired and the actual inventory. The inventory adjustment term involves a forecasting mechanism that exponentially smoothes the demand from the previous supply point. The order received by the first supply point in the chain constitutes the demand imposed upon the total system. Using z-transforms, the transfer functions between the discrete-time signals representing the orders placed by one echelon and the orders received by the same echelon were derived. The dynamic response of the system was tested by introducing unit step changes and uniformly distributed random numbers to the external demand. The results showed that even in a two-echelon system, minor variations in the consumer demand are amplified into major disturbances at the last supply point. The disturbances become more severe as the number of echelons that constitute the system increases. The amplification is due to the unavoidable inventory adjustment, but also to an unwanted false order effect. The latter arises because adjustment is based solely on the order received from the next lower level, which in turn contains the adjustments from all lower levels. Based on the discrete-time transfer function, a recovery operator was proposed, so that in each echelon the original input received by the first supply point is recovered. The application of this recovery operator finally led to the derivation of a new decision rule that suppresses the false order effect and experiences fewer stock-outs, lower average inventories and smoother amplifications throughout the supply chain.
Wikner et al. [63] considered a three-echelon simplified Forrester production system, which was transformed into a block diagram representation in order to test methods for improving total dynamic performance. Five alternative strategies for smoothing the supply chain dynamics were tested in terms of the response of the three echelons to a step input in the market demand: tuning the existing echelon rules, reducing time delays, removing an echelon from the supply chain, changing individual echelon rules by taking into account pipeline information and making available true market demand to all the echelons in the supply chain. All five strategies, which can be used in combinations of two or
more, improved the dynamic response of the system. The most effective, however, was the last strategy, thus illustrating the importance of information flow.
Disney and Towill [64] studied a simple vendor managed inventory (VMI) supply chain consisting of one production unit and one distributor. In VMI systems all supply points in the chain have access to stock positions for setting production and distribution targets. The discrete-time APIOBPCS model was used to describe the dynamics of the manufacturing unit. Pure delay was initially utilized to model the production delay. The only difference to the APIOBPCS structure presented previously is that instead of the demand signal CONS, the manufacturing facility receives a virtual consumption signal. This is produced by adding, in each time period, the demand signal received by the distributor to the difference between the current period and the previous period reorder points. The reorder point is time varying and is defined as a multiple of the average (exponentially smoothed) demand received by the distributor. The system was checked for stability. The stability criteria that were produced are also valid for the standard APIOBPCS model, since the distributor's policy described previously is a stable feed-forward element. It was found that when T_i = T_w the stability of the system is guaranteed. If the equality does not hold, stability criteria were obtained, which are expressed as inequalities involving the tuning parameters T_i, T_w. Another important result is that when T_i = T_w the system is robust to changes in the distribution of the production delay. However, setting T_i = T_w leads to a conservative design.
An integrated continuous-time approach for modeling the dynamics in supply chain management systems was presented by Perea et al. [65,66]. The framework they presented considers the entire supply chain, including manufacturing sites with single-unit multi-product processes, a multi-product multi-stage distribution network and the end customers. The dynamic behavior of the supply chain is captured by modeling separately the flow of materials and information. The delivery rate between nodes is modeled so that the amplification (bullwhip) effect can be reproduced. A heuristic production policy is chosen, according to which, each time a decision has to be made on the product to be manufactured, the product with the highest backlog is selected. The plant continues making this product until the batch is completed. The production policy in general is not optimal, but ensures that the system is stable. The orders placed between nodes are considered as manipulated variables and the ordering policies as control laws. Four different ordering policies were analyzed with respect to their influence on the dynamic behavior of supply chain systems. The first (base) policy is the standard policy where the set points for the inventory levels are fixed. According to the second policy, the ordering rate of each node to its upstream product node is proportional to the total amount of orders received by the node. With the third policy, the ordering rate is proportional to the difference between the inventory level and the total existing backlog at the node. Finally, the fourth policy increases the ordering rate of the third policy by a term which accounts for providing all nodes with information about the actual orders from the customers. The four policies were tested in case studies where step changes or periodic changes were introduced in the customer demands. The policies were compared in terms of quantitative indices describing the total operational (storage and production) cost and the customer satisfaction level. The bullwhip effects of the four policies were also examined. The results show that policies 1 and 4 offer a higher customer satisfaction level at an extra storage cost. The base policy proved more successful as far as dampening of amplification is concerned. Finally, there was agreement with previous studies on the improvement of the performance of the system when information about end customer demand is available throughout the entire supply chain.
Lin et al. [67] presented a discrete-time model of a supply chain system, using z-transforms to obtain the transfer functions for each unit. The supply chain is assumed to have no branches, so that each logistic echelon has only one upstream node and one downstream node. The dynamics of each particular node in the chain was modeled in detail in a block diagram form. The bullwhip effect was studied by obtaining the transfer function between the order signal placed to the upstream node and the signal representing the customer demand. By using a proportional controller on the difference between the desired set point and the actual inventory position, the authors found that the bullwhip effect can be avoided when the gain is less than 1. When the inventory position target is not fixed, but changes according to a forecasted demand based on an exponential filter, an even smaller gain is required to suppress the bullwhip effect. For this case (variable inventory position target) and assuming a stochastic demand from downstream orders, three ordering policies were examined: P and PI controllers acting on the difference between the set point and the actual inventory position, and a cascade PI controller where the input is the filtered trend of the same difference. The performance of the alternative ordering policies was evaluated with respect to the transient response, the bullwhip effect, the back-order level and the excess inventory caused when stochastic changes are introduced in the customer demand. It was found that the P controller was not able to drive the system to the desired set point. The two PI controllers improved all
the performance criteria. In particular, the cascade PI structure was found superior in meeting customer demand and
suppressing the bullwhip effect, but as far as the two remaining performance criteria are concerned it was slightly
inferior to the standard PI scheme.
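The qualitative difference between the P and PI policies can be demonstrated on a single-node inventory balance with a fixed set point. This is a hedged sketch, not the block diagram of [67]: orders are allowed to go negative (a purely linear model), and the lead time and gains are assumed values:

```python
def simulate(controller, T=1000, lead=2, demand=10.0, setpoint=50.0):
    """Inventory balance inv(t+1) = inv(t) + order(t - lead) - demand, with
    the ordering decision supplied by `controller(error, integral)`."""
    inv = setpoint
    pipeline = [demand] * lead       # replenishments already in transit
    integral = 0.0
    for _ in range(T):
        error = setpoint - inv
        integral += error
        pipeline.append(controller(error, integral))
        inv += pipeline.pop(0) - demand
    return inv

p_ctrl  = lambda e, ie: 0.25 * e                 # proportional only
pi_ctrl = lambda e, ie: 0.25 * e + 0.02 * ie     # proportional + integral

inv_p  = simulate(p_ctrl)    # settles demand/Kp = 40 units below the target
inv_pi = simulate(pi_ctrl)   # integral action drives the offset to zero
```

At steady state the proportional controller must keep ordering the full demand, which forces a permanent error of demand/Kp, whereas the integral term accumulates whatever constant order is needed at zero error, matching the finding above that only the PI policies reach the set point.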
3. Dynamic programming and optimal control
Due to their dynamic and uncertain nature, production/inventory problems can be naturally formulated as dynamic programs. Dynamic programming is the standard procedure to obtain an optimal state feedback control law for stochastic optimal control problems.

For the sake of completeness, we will briefly describe the dynamic programming philosophy in the discrete-time, finite-horizon setting [9]. In the continuous-time framework the philosophy is basically the same, although more technicalities are involved. Consider the discrete-time dynamic system:

x(t + 1) = f(x(t), u(t), d(t)),  t = 0, ..., T - 1.    (11)
The state x(t) is constrained to lie in a set X ⊆ R^n, while the control (also called the vector of manipulated variables) u(t) must belong to U ⊆ R^m. The exogenous disturbance d(t) is a random vector characterized by a probability distribution with support D ⊆ R^p that may depend explicitly on x(t) and u(t), but not on prior disturbances. Furthermore, consider a function φ : X × U × D → R representing the one-stage cost. An admissible state feedback control law (or, equivalently, control policy or decision rule) is a sequence π = {k_0, ..., k_{T-1}}, where the k_t are vector functions mapping states x(t) into controls u(t) = k_t(x(t)) and are such that k_t(x(t)) ∈ U. Finally, we denote by H(x) = {π : x(0) = x, k_t(x(t)) ∈ U, f(x(t), k_t(x(t)), d(t)) ∈ X, t = 0, ..., T - 1} the set of all admissible policies. The finite-horizon cost associated with an admissible control policy π starting from a given initial state x(0) is

V_π(x(0)) = E[ Σ_{t=0}^{T-1} φ(x(t), k_t(x(t)), d(t)) ].    (12)
Our goal is to find an optimal control policy π*, i.e. a policy that minimizes the finite-horizon cost (we make the simplifying assumption that such a policy exists, thus we can use min instead of inf in the following equations):

V_{π*}(x(0)) = min_{π ∈ H} V_π(x(0)).    (13)

Notice that the control policies we are interested in are closed-loop policies, in the sense that they determine values for the manipulated variables once the state of the system becomes known at each time period. This is the main difference of feedback policies resulting from closed-loop optimization, as opposed to open-loop policies, which determine values (and not functions of the state) for the manipulated variables over the time horizon. Closed-loop policies can take advantage of the extra information revealed in each time period and thus they lead to lower costs than open-loop policies.
Dynamic programming is based on the principle of optimality to solve the optimal control problem. The principle of optimality simply states that if a policy π* = {k*_0, ..., k*_{T-1}} is optimal for the optimal control problem over the interval t = 0, ..., T - 1, then it is necessarily optimal over the subinterval t = τ, ..., T - 1 for any τ ∈ {0, 1, ..., T - 1}. Dynamic programming uses this concept to formulate the problem as a recurrence relation. Thus, the dynamic programming algorithm decomposes the optimal control problem by solving the associated sub-problems starting from the last time period and proceeding backwards in time. Mathematically, the algorithm is described by the following equations:

V_T(x(T)) = 0,
V_t(x(t)) = min_{u(t) ∈ U_t(x(t))} E_{d(t)}[ φ(x(t), u(t), d(t)) + V_{t+1}(f(x(t), u(t), d(t))) ],  t = 0, 1, ..., T - 1.    (14)
Hence, applying the dynamic programming algorithm we get the optimal cost V_0(x(0)). Furthermore, if u*(t) = k*_t(x(t)) minimizes the right-hand side of (14) for each x(t) and t, the policy π* = {k*_0, ..., k*_{T-1}} is optimal. In this case, in each iteration the dynamic programming algorithm gives the optimal cost-to-go for every possible state, which we denote by V*_t.
The concept of dynamic programming has been prevalent in formulating, analyzing and solving the very first problem from inventory theory, which is the basic building block of supply chain models. We will now present the simplest application of dynamic programming in inventory management [68]. We consider a single-echelon, single-product system where the problem is to optimally select orders u(t) of the product in order to meet uncertain demand d(t), while minimizing the total expected purchasing, inventory and shortage cost. The dynamics of the system are described by the following state space equation:

x(t + 1) = x(t) + u(t) - d(t),    (15)

where x(t) is the inventory level at the beginning of the tth period. It is also the state variable of the dynamic system, since it summarizes the sufficient information we need to make a decision. The order u(t) is the control or manipulated variable and the demand d(t) is the exogenous disturbance. Furthermore, we assume that disturbances between stages are independent random variables and that excess demand is backlogged and filled as soon as additional inventory becomes available. The one-stage cost function is

φ(x, u, d) = cu + h(x + u - d)^+ + p(x + u - d)^-,    (16)

where c, h and p are purchasing, holding and shortage unit costs, respectively, z^+ = max(0, z) and z^- = max(0, -z). By making the transformation y(t) = x(t) + u(t) and applying the dynamic programming algorithm, through an inductive argument one can show that the cost-to-go functions are convex and that minimizing scalars y*(t) := S(t) exist for the unconstrained problem. Thus, an optimal policy is determined by the sequence of scalars {S(0), S(1), ..., S(T - 1)} and has the form:

k*_t(x(t)) = S(t) - x(t)  if x(t) < S(t),
k*_t(x(t)) = 0            if x(t) >= S(t).    (17)
This type of policy is often referred to as a base-stock or order-up-to policy (see also Eq. (6)).
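The backward recursion (14) applied to the dynamics (15) and cost (16) can be coded directly for a small instance. The horizon, the uniform integer demand and the cost coefficients below are assumed for illustration only; the computed policy exhibits exactly the base-stock form (17):

```python
from functools import lru_cache

T, DEMANDS = 3, [0, 1, 2, 3]        # horizon and equiprobable demand values
c, h, p = 1.0, 1.0, 4.0             # purchase, holding and shortage unit costs

@lru_cache(maxsize=None)
def cost_to_go(t, x):
    """V_t(x) of Eq. (14) for dynamics (15) and one-stage cost (16)."""
    if t == T:
        return 0.0                   # no terminal cost
    best = float("inf")
    for u in range(15):              # candidate order quantities
        y = x + u                    # inventory position after ordering
        exp_cost = sum(c * u + h * max(0, y - d) + p * max(0, d - y)
                       + cost_to_go(t + 1, y - d)
                       for d in DEMANDS) / len(DEMANDS)
        best = min(best, exp_cost)
    return best

def policy(t, x):
    """Minimizing order u*_t(x); ties resolved towards the smallest order."""
    return min(range(15),
               key=lambda u: sum(c * u + h * max(0, x + u - d)
                                 + p * max(0, d - x - u)
                                 + cost_to_go(t + 1, x + u - d)
                                 for d in DEMANDS) / len(DEMANDS))

# The order-up-to level at t = 0, read off from a deeply negative state:
S0 = -5 + policy(0, -5)
```

Scanning the policy over a range of states confirms that for every period there is a single level S(t) with u*_t(x) = max(0, S(t) - x), which is the base-stock structure proved by the convexity argument above.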
Clark and Scarf [69] were the first to show that the optimal feedback law for a multi-echelon system is a base-stock policy for each echelon when the demand volumes in different periods are independent and identically distributed. They formulated the problem as a discrete-time finite-horizon optimal control problem and showed that the optimal base-stock levels can be computed by solving a series of single location inventory problems with appropriately adjusted penalty functions. Scarf [70] derived the optimal ordering policy for a single facility facing uncertain demand and with setup costs associated with inventory orders. By formulating the respective dynamic programming equations for the finite-horizon problem and showing that the value function is K-convex, he came up with the so-called (s, S) policy with s < S, meaning order up to S whenever the inventory level falls to or below s. Iglehart [71] proved that a stationary (s, S) policy is optimal for the infinite-horizon problem. These results were of great theoretical and practical importance and served as the precursors of modern supply chain theory.
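The (s, S) logic can be sketched in a few lines; the demand distribution and the values of s and S below are illustrative, not optimized.

```python
import random

def simulate_sS(s, S, horizon, seed=1):
    """Single-location (s, S) policy: whenever the inventory position falls
    to or below s, order up to S (zero lead time assumed for simplicity)."""
    rng = random.Random(seed)
    x = S                          # start at the order-up-to level
    trace = []
    for _ in range(horizon):
        if x <= s:                 # reorder point reached
            x = S                  # order up to S
        x -= rng.randint(0, 8)     # illustrative stationary demand; excess
        trace.append(x)            # demand is backlogged (x may go negative)
    return trace

trace = simulate_sS(s=5, S=30, horizon=1000)
print(min(trace), max(trace))
```

Note how the policy is a pure state feedback law: each period's decision depends only on the current inventory position, as in the results of Scarf [70] and Iglehart [71].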
Following the pioneering work of Clark, Scarf and coworkers, many researchers extended this work by treating more
complicated problems which are closer to reality. Hausman and Peterson [72] considered a single echelon, capacity-
constrained multi-product system with terminal demand, where the forecasts for total sales follow a lognormal model.
They formulated the problem as a dynamic programming problem and provided heuristic solutions. Federgruen and
Zipkin [73] extended the multi-echelon model of Clark and Scarf [69] to the infinite-horizon case. They concluded that the infinite-horizon discounted cost case is simpler to solve than the long-term average cost case, a result that
is typical in dynamic programming literature. They also provided a closed form solution for a two-echelon system
when inventory holding and shortage cost are linear and customer demand follows a normal distribution. However, the
computations for systems with more than two echelons become prohibitively complex.
On the other hand, many researchers have recognized the dynamic nature of the demand process in many cases, and
thus attempted to incorporate it into production–inventory models, digressing from stationary demand models. Iglehart
and Karlin [74] examined a discrete-time, continuous state formulation of the single location inventory problem, where
demand is a function of a discrete-time, finite-state Markov chain. They proved that a state feedback base-stock policy is
optimal for the discounted infinite-horizon problem. Song and Zipkin [75] studied the continuous-time, discrete-state, discounted-cost model with fixed costs and, by formulating the problem as a dynamic program with two state variables (inventory position and exogenous information), they showed that a state feedback (s, S) policy is optimal. Sethi and Cheng [76] proved the optimality of a state feedback (s, S) policy for a single location discrete-time system with a more general cost structure than that of Song and Zipkin [75], in the case where demand follows a Markov process and fixed costs are present. Beyer and Sethi [77] proved the existence of an optimal state dependent (s, S) policy for the long-run average cost problem. Based on the theory of impulse control, Bensoussan et al. [78] proved that an (s, S) policy is optimal in a continuous-time stochastic inventory model with a fixed ordering cost when demand is a mixture of a diffusion process and a compound Poisson process with exponentially distributed jump sizes, or a mixture of a constant demand and a compound Poisson process. The Bellman equation of dynamic programming for such a problem reduces to a set of quasi-variational inequalities (QVI). Note that all the above papers discuss problems involving a
single location.
Dong and Lee [79] used approximations for the induced penalty term introduced by Clark and Scarf [69] to provide
an approximate, simple closed-form solution to the multi-echelon inventory problem. They extended their results in
the case of time-correlated demand processes using the Martingale model of forecast evolution (MMFE) of Heath and
Jackson [80] and Graves et al. [81] to model the forecast process.
In order to exploit advanced information about customer demand that some companies have the ability to obtain,
dynamic programming models were developed by various authors taking into account this extra flow of information.
These studies manage to further reduce costs by successfully incorporating forecast updates in the stochastic dynamic
models. Gallego and Ozer [82,83] studied optimal replenishment policies for single and multi-echelon uncapacitated
inventory systems with advance demand information. They modeled the forecast evolution as a super martingale
and proved optimality of a state-dependent (s, S)-type policy. Ozer and Wei [84] established optimal policies for a
capacitated inventory system with advance demand information. Sethi et al. [85] developed a model of forecast updates
that is analogous to peeling layers of an onion in the sense that for each demand period more information is revealed as
time passes. They also considered the possibility of ordering from a fast and a slow supply source, the former with larger
ordering cost. They established the existence of an optimal Markov policy by showing that the value function is convex
and by utilizing the measurable selection theorem appearing in Bensoussan et al. [86] under appropriate assumptions.
They further proved that the optimal policy has the structure of a forecast update dependent base-stock policy. Sethi
et al. [87] extended the work of Sethi et al. [85] to include fixed costs, ending up with a forecast update dependent
(s, S)-type optimal policy, while in Feng et al. [88] delivery modes were considered.
Simchi-Levi and Zhao [89] analyzed the value of information sharing of the retailer and the producer in a two-
echelon supply chain with finite production capacity over a finite horizon. They examined three scenarios, each with
a different level of information sharing and optimality. They used dynamic programming to derive qualitative results
for each scenario and concluded through computational experiments that information sharing can be very beneficial, especially when the manufacturer has excess capacity.
It is worth noting that in the above references, dynamic programming serves as a tool for proving the existence of optimal feedback control laws and characterizing their general form. However, it is not employed as a computational tool due to the curse of dimensionality, which is prevalent even for simplified, medium-scale supply networks. In order to solve these complex stochastic control problems, some kind of simplifying assumptions, decompositions or approximations need to be considered.
Another production problem that has been investigated thoroughly by the control research community is the production planning of manufacturing systems with unreliable machines. Olsder and Suri [90] were the first to formulate the problem of controlling manufacturing systems with unreliable machines as a stochastic control problem. The related optimal control problem falls under the general class of systems with jump Markov disturbances [91,92] or piecewise deterministic processes [93]. In their model, each machine is subject to random breakdown according to a homogeneous Markov process. However, they recognized that the problem was intractable due to the difficulty of solving the HJB equations characterizing the optimal control.
We will now briefly describe the generic model of a flexible manufacturing system (FMS) with m machines producing n parts. We assume that the machines are completely flexible, meaning that there are no setup costs and times involved. We also assume that machines are failure prone and repairable. We denote by {s_j(t), t ≥ 0} the stochastic process describing the operational mode of machine j, j = 1, . . . , m. We have s_j(t) = 1 when machine j is in the functional mode and s_j(t) = 0 when machine j is under repair, i.e. s_j(t) ∈ S_j := {0, 1}. Now, the operational mode of the
stochastic FMS can be described by the random vector s(t) = (s_1(t), . . . , s_m(t))′ describing the mode of the system and taking values in S = S_1 × · · · × S_m. Obviously the set S has 2^m states. We further assume that the stochastic process {s(t), t ≥ 0} is modeled by a continuous-time Markov chain with a known generator Q, where Q = {q_ab}, a, b ∈ S, is a 2^m × 2^m matrix such that q_ab ≥ 0 for a ≠ b and q_aa = −Σ_{b≠a} q_ab. The production flow dynamics are described by the following state equations:

ẋ(t) = u(t) − d, x(0) = x, (18)

where x is the initial vector of surplus levels, x(t) ∈ R^n is the surplus level (inventory/shortage) at time t, u(t) ∈ R^n_+ denotes the production rate at time t such that u_i(·) = Σ_{j=1}^m u_ij(·), i = 1, . . . , n, with u_ij describing the production rate of product i on machine j, and d ∈ R^n is a known constant positive vector denoting the demand rate.
The set of the feasible feedback control policies depends on the stochastic process {s(t), t ≥ 0} and is given by

U(a) = {u(t) ∈ R^n : 0 ≤ u_j(t) ≤ u_j^max(a), j = 1, . . . , n},

where u_j^max(a) = Σ_{i=1}^m u_ij^max is the maximal production rate of part type j at mode a. We denote by F_t the natural filtration generated by {s(t), t ≥ 0}, i.e. F_t = σ(s(τ): 0 ≤ τ ≤ t). A control u(·) = {u(t), t ≥ 0} is said to be admissible if u(·) is F_t-measurable and u(t) ∈ U(s(t)) for all t ≥ 0. An F_t-measurable function u(x, s) is an admissible feedback control if for any given initial x, the state equations have a unique solution and u(·) = u(x(·), s(·)) is an admissible control. It is easy to see that the system dynamics are described by a hybrid state containing both a discrete and a continuous component.
In the first class of problems considered in the literature, the objective is to find an admissible feedback control law u(·) ∈ U(·) so as to minimize the expected discounted cost of production and inventory/backlog over an infinite horizon, given by

J_d(a, x, u(·)) = E[ ∫_0^∞ e^{−ρt} (h(x(t)) + c(u(t))) dt | x(0) = x, s(0) = a ], (19)

where ρ is the discount factor and h(·), c(·) denote the instantaneous inventory/backlog and production cost functions, respectively.
The second class of problems deals with the long-run average cost of the form:

J_a(a, x, u(·)) = lim sup_{T→∞} (1/T) E[ ∫_0^T (h(x(t)) + c(u(t))) dt | x(0) = x, s(0) = a ]. (20)
We define the value function of the infinite-horizon discounted stochastic optimal control problem, given initial surplus vector x and initial mode a, as

V(x, a) = inf_{u(·)∈U(a)} J_d(a, x, u(·)). (21)
It can be shown that the optimal value function satisfies the following set of HJB equations [91]:

ρ V(x, a) = min_{u∈U(a)} {(u − d)′ V_x(x, a) + c(u)} + h(x) + Σ_b q_ab V(x, b), (22)

where f_x denotes the partial derivative of the function f with respect to x.
The HJB equation associated with the long-run average cost criterion is of the following form:

z = min_{u∈U(a)} {(u − d)′ W_x(x, a) + c(u)} + h(x) + Σ_b q_ab W(x, b), (23)

where z is a constant and W is the so-called potential function. A solution to the HJB equation is a pair (z, W).
To this end, let us stress the fundamental difference between these two problem classes. The discounted-cost criterion
considers short-term costs to be more important than long-term costs. On the other hand, the average cost criterion
ignores the short-term costs and considers only the distant future costs.
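To give a flavor of how (22) can be attacked numerically, the sketch below runs value iteration on a crude time- and state-discretization of the single-machine, single-product problem with linear costs. All parameter values are illustrative, the admissible rates are restricted to {0, d, u_max} (sufficient here because the optimal policy is of hedging type), and the scheme is only a rough stand-in for rigorous Markov chain approximation methods.

```python
import numpy as np

rho, d, umax = 0.1, 1.0, 2.0   # discount rate, demand rate, max production rate
f, r = 0.1, 0.5                # failure and repair rates of the mode chain
cp, cm = 1.0, 10.0             # unit holding (x > 0) and backlog (x < 0) costs

xs = np.linspace(-20.0, 20.0, 401)          # surplus grid
dt = 0.05                                   # discretization time step
h = np.where(xs > 0, cp * xs, -cm * xs)     # h(x) = cp*x^+ + cm*x^-
us = {0: [0.0], 1: [0.0, d, umax]}          # admissible rates in each mode
V = np.zeros((2, xs.size))

for _ in range(3000):                       # value iteration on the discretization
    Vn = np.empty_like(V)
    for a in (0, 1):
        q = f if a == 1 else r              # rate of leaving mode a
        vals = []
        for u in us[a]:
            xn = np.clip(xs + (u - d) * dt, xs[0], xs[-1])
            cont = (1 - q * dt) * np.interp(xn, xs, V[a]) \
                 + q * dt * np.interp(xn, xs, V[1 - a])
            vals.append(h * dt + np.exp(-rho * dt) * cont)
        Vn[a] = np.minimum.reduce(vals)
    if np.max(np.abs(Vn - V)) < 1e-3:       # stop when value updates stall
        V = Vn
        break
    V = Vn

def best_u(a, x):
    """Greedy production rate w.r.t. the computed value function at (x, a)."""
    q = f if a == 1 else r
    scored = []
    for u in us[a]:
        xn = min(max(x + (u - d) * dt, xs[0]), xs[-1])
        cont = (1 - q * dt) * np.interp(xn, xs, V[a]) \
             + q * dt * np.interp(xn, xs, V[1 - a])
        scored.append((cont, u))
    return min(scored)[1]

print(best_u(1, -15.0), best_u(1, 15.0))
```

Consistent with the hedging point structure discussed below, the recovered policy produces at full rate well below the hedging level and not at all well above it.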
Based on these concepts, Akella and Kumar [94] studied the infinite-horizon discounted-cost optimal control problem of a single, failure prone machine producing a single product (m = 1, n = 1), in order to fulfill a deterministic demand. The inventory and shortage costs are supposed to be linear, i.e. h(x) = c^+ x^+ + c^− x^−, and there is no production cost (c(u) = 0). The transition between the functional and the breakdown state is described by a
continuous-time Markov chain. Based on the HJB equation for the system under investigation, they derived a simple
formula of the optimal policy which involves the determination of the optimal inventory level, known as the hedging
point. In simple words the optimal policy is as follows: when the state of the machine is functional and the current
inventory level exceeds the hedging point, do not produce at all; if the current inventory level is below the hedging
point, produce at the maximum rate; if it is exactly equal, produce on demand. The idea behind this policy is that some
non-negative production surplus should be maintained at the times of excess capacity to hedge against future capacity
shortages. Furthermore they proved that the value function is continuously differentiable. Note that this solution is
valid only when the inventory and shortage costs are linear. In the case of general convex costs and more than two
machine states, the explicit solution cannot be obtained. Eventually one has to resort to a numerical approach to find
an approximate value function by solving the associated HJB equation. Bielecki and Kumar [95] extended the work of
Akella and Kumar [94] to the case of long-run average cost, obtaining an optimal hedging point policy as well. Kimemia
and Gershwin [96] studied the multi-machine multi-part type problem and showed that the optimal feedback control is
a hedging point policy. They recognized that for such complex problems, the derivation of the optimal policy in a closed
form is not possible, thus they proposed an approximation scheme that calculates an approximate value function based
on off-line discretization, followed by the on-line solution of a linear program whenever the manufacturing system
changes operational state.
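A minimal simulation of the hedging point rule for the single-machine, single-product case makes its behavior visible; the parameter values, the Euler discretization of the dynamics (18) and of the mode transitions are all illustrative choices.

```python
import random

# Hedging point rule of Akella and Kumar: produce at full rate below the
# hedging point z, on demand at z, and not at all above z.
def hedging_rate(x, s, z, d, umax):
    if s == 0:                    # machine down: cannot produce
        return 0.0
    if x < z:
        return umax               # build up surplus at the maximum rate
    if x > z:
        return 0.0                # burn off excess surplus
    return d                      # at the hedging point: produce on demand

def simulate(z, d=1.0, umax=2.0, f=0.1, r=0.5, dt=0.01, T=2000.0, seed=7):
    rng = random.Random(seed)
    x, s, t = 0.0, 1, 0.0
    xs = []
    while t < T:
        u = hedging_rate(x, s, z, d, umax)
        x += (u - d) * dt                       # surplus dynamics (18)
        if s == 1 and rng.random() < f * dt:    # Euler approximation of the
            s = 0                               # failure/repair Markov chain
        elif s == 0 and rng.random() < r * dt:
            s = 1
        xs.append(x)
        t += dt
    return xs

xs = simulate(z=5.0)
print(max(xs), min(xs))
```

The trace shows exactly the intent of the policy: surplus is driven up to the hedging point z during excess-capacity periods and never (up to the discretization step) exceeds it, hedging against downtime.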
Sethi et al. [97] investigated the general infinite-horizon discounted problem for multiple machines and multiple part types where the demand and the capacity level are finite-state continuous-time Markov chains and the inventory/shortage
and production costs are general convex functions, using the viscosity solution technique [98]. They proved that the
value function is convex and continuously differentiable. Finally, through rigorous arguments they showed that the
value function is the unique viscosity solution of the associated HJB equation and, based on this fact, they defined the turnpike sets as the attractors of the optimal trajectories for the system under investigation. It turned out that the hedging point policy of Akella and Kumar [94] is a special case of the result of Sethi et al. [97].
Presman et al. [99] investigated the problem of optimal feedback production planning in a machine flowshop. Since the number of parts in the internal buffers of the flowshop cannot be negative, state constraints must be imposed and
certain boundary conditions need to be taken into account. Thus, the authors formulated the HJB equations in terms
of directional derivatives at inner and boundary points. They also proved existence and uniqueness of an optimal
feedback control. The optimal feedback control for this kind of problems does not possess the structure of hedging
point policies. Presman et al. [100] extended these results to flowshops with limited storage capacities. Sethi and Zhou [101] obtained the explicit solution for the deterministic two-machine flowshop problem with the infinite-horizon discounted cost criterion.
Gharbi and Kenne [102] examined the production control problem for a multiple-part multiple-machine manufac-
turing system. Due to the inherent complexity of the HJB equations they resorted to the heuristic, simulation-based
determination of the parameters of a modied hedging point policy which gives the best approximation of the value
function.
Boukas and Haurie [103] addressed the optimal control problem of a single-product, multiple-machine manufacturing system where the failure probabilities and the repair times are age dependent, and they added the possibility
of performing preventive maintenance in order to increase the availability of the production system. However, the
complexity of the stochastic optimal control problem is increased due to the additional state variables representing
machine ages and the additional control variables representing transitions from the functional mode to preventive
maintenance. Therefore they proposed a numerical method based on Markov chain approximation [104] in order to
solve approximately the underlying HJB. This method is based on the discretization of the corresponding continuous-
time, continuous state HJB equation, formulating an approximate discrete-time, discrete-state Markov decision problem
and then solving it by policy iteration. However, the proposed numerical scheme still suffers from the curse of dimen-
sionality, thus its use is restricted to small scale problems. In Boukas and Yang [105] a somewhat different problem
is considered, in the sense that the maintenance procedure is executed while the machine is operating. They showed
that the optimal production policy is described by a critical surface. However, their results are valid only for the single
machine problem. Recognizing the increased complexity of optimal control problems addressing both production and
maintenance, Boukas et al. [106], Kenne et al. [107], Kenne and Gharbi [108,109], Gharbi and Kenne [102] developed
sub-optimal age-dependent hedging point policies. However, the determination of the threshold levels seems to be a
difficult task involving simulation and heuristic arguments.
All the above papers rely on the assumption that the machines are completely flexible, meaning that no setup costs or setup times are incurred while switching from one product to another. This assumption was relaxed by Sethi and Zhang [110],
where stochastic manufacturing systems with setups have been considered. Their study focused on the characterization
of the exact optimal policy via viscosity solutions of HJB equations. However, the optimal policies cannot be explicitly
computed and one has to resort to numerical schemes in order to obtain a sub-optimal feedback controller. Yan and
Zhang [111] developed a numerical method for the solution of the optimal production and setup scheduling problem for
a single machine producing multiple parts based on the Markov chain approximation scheme of Kushner and Dupuis
[104]. They proved that the resulting policy is asymptotically optimal as the length of the finite-difference interval
approaches zero. Boukas and Kenne [112] developed near-optimal control policies for production, maintenance and
set-up scheduling for age-dependent failure prone machines using the discretization methods of Kushner and Dupuis
[104]. Liberopoulos and Caramanis [113] investigated numerically several examples in order to characterize the value
function as well as the optimal production and set-up policy of the problem. Bai and Elhafsi [114] provided a suitable,
heuristic production and set-up policy structure, the so-called Hedging Corridor policy (HCP). Gharbi et al. [115]
proposed a modified HCP that guarantees lower cost than the HCP. In order to calculate the optimal set of values for the parameters of the proposed policy, they proposed a heuristic scheme based on stochastic optimal control theory, discrete event simulation and experimental design. Through two case studies they showed the superiority of their proposed policy over the HCP.
Demand uncertainty is one of the major factors affecting the decision making in production and control. Feng and
Yan [116] incorporated a Markovian demand in a discrete-state version of Akella and Kumar [94]. They considered an
optimal inventory/production control problem in a stochastic manufacturing system with a discounted cost criterion
in which the demand, the capacity of production, and the processing time per unit are random variables. Song and
Sun [117] considered the optimal control problem of a serial production line with failure-prone machines and random
demand. They showed that the optimal policy is a bang–bang control and that it can be determined by a set of switching
manifolds. Based on the structure of these switching manifolds, they proposed sub-optimal policies which are easy to
implement in real systems.
Boukas and Liu [118] investigated a continuous-time inventory–production planning problem, where the products
deteriorate and their value reduces with time. Examples of this kind of products are electronic devices (due to rapid
technological changes), clothing and of course perishable goods such as foodstuff and cigarettes. In this case the state
equation (1) becomes

ẋ(t) = −Λ x^+(t) + u(t) − d, x(0) = x, (24)

with Λ = diag(λ_1, . . . , λ_n) the matrix of deterioration rates for products i = 1, . . . , n and x^+(t) = max(x(t), 0). They
proved that the value function is convex and Lipschitz in x and that it is the unique viscosity solution of the underlying
HJB equation. For the special case of one machine (m=1) producing one product (n=1) they showed that the optimal
policy is a modified hedging policy of the type presented by Akella and Kumar [94]. However, they too emphasize the fact that the closed form solution for general complex problems is very difficult to obtain.
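A quick numerical check of the deterioration dynamics (24) for a single product: with a constant production rate u > d and positive stock, the surplus settles where deterioration balances the net inflow, i.e. at (u − d)/λ. The values below are illustrative.

```python
# Euler simulation of x'(t) = -lam * max(x, 0) + u - d for one product.
# With constant u > d and positive stock, the steady state is (u - d) / lam.
lam, u, d, dt = 0.2, 1.5, 1.0, 0.01
x = 0.0
for _ in range(10_000):          # simulate 100 time units in steps of dt
    x += (-lam * max(x, 0.0) + u - d) * dt
print(x)                         # -> approaches (u - d) / lam = 2.5
```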
Research progress on the problem with long-run average cost criterion was largely based on the explicit solution
of Bielecki and Kumar [95] for the single-machine single-product problem. Sharifnia [119] extended the Bielecki and
Kumar model to a machine with more than two machine states and showed how to calculate the optimal hedging point.
Using Sharifnia's method, Liberopoulos and Hu [120] dealt with an extension of the Bielecki–Kumar model in which
there are more than two machine states and the transition rates of the machine states depend on the production rate.
However, the preceding papers were mostly based on heuristic arguments and on the ability to explicitly compute the value function.
Perkins and Srikant [121] investigated the problem of a single failure prone machine producing multiple parts with
the objective of minimizing a long-run average cost. They restricted their investigation to the class of linear switching
curve (LSC) and prioritized hedging point (PHP) policies. They provided a characterization of the set of problem
parameters for which a Just-In-Time policy is optimal.
Sethi et al. [122] used the vanishing discount approach to prove optimality of the hedging point policy for convex
surplus cost and linear or convex production cost. Sethi et al. [123] examined the optimal production planning problem
Fig. 2. Hierarchical control methodology: the stochastic optimal control problem (stochastic HJB) is reduced, via singular perturbation, to a limiting deterministic optimal control problem (deterministic HJB), whose solution yields an asymptotically optimal control.
for multi-product FMS. They developed the appropriate dynamic programming equations and they proved an existence
theorem and a verification theorem for optimality, by starting from the discounted cost problem and using a vanishing
discount approach. Sethi and Zhang [124] provided the explicit form of the optimal control policy for the long-run
average cost problem of a single machine producing multiple-part types when inventory and shortage costs are equal.
The policy they came up with can be considered as a variant of the kanban policy.
One can see that sufficient progress has taken place concerning theoretical results for FMSs with unreliable machines. However, the explicit solution is available only for the simplest problems. Unfortunately, today's manufacturing systems are large scale and complex, characterized by several decision subsystems with different time scales. The purpose of hierarchical control methods is to approximately solve such kinds of problems by exploiting their structure. The main idea of hierarchical control is to reduce the problem's complexity by replacing fast evolving processes with their mean values. By fast evolving processes we mean those processes that reach their stationary distributions in a time period during which there are few, if any, fluctuations in the other processes. For example, in the above problem the stochastic process s(t) is fast evolving if the rate of change in the machine states is much larger than the rate at which the cost is discounted.
This way, a deterministic limiting problem is constructed which is computationally more tractable. Then, based on the
solution of the limiting problem we can construct controls for the original system which are asymptotically optimal
as the fluctuation rate of the fast evolving processes goes to infinity. However, it is not clear how to construct controls
for the original systems. Usually, a lot of experimentation and intuition is required [125,126]. The essence behind
this approach is that strategic level management can ignore the day-to-day fluctuations in machine capacities, or more generally the details of the shop floor events, in carrying out long-term planning decisions. The lower operational level
management can then derive approximate optimal policies for running the actual (stochastic) manufacturing systems.
A schematic representation of hierarchical control methods appears in Fig. 2. A detailed exposition of hierarchical
control methods is presented in Sethi and Zhang [127], while a recent review presenting up-to-date results is that of
Sethi et al. [128]. Extensive numerical results and comparisons with heuristic policies are presented in Samaratunga
et al. [129].
4. Model predictive control
MPC, or receding horizon control, has now become a standard control methodology for industrial and process systems. Its wide adoption by industry is largely based on the inherent ability of the method to handle efficiently the constraints and nonlinearities of multi-variable dynamical systems. MPC is based on the following simple idea: at each discrete-time instance the control action is obtained by solving on-line a finite-horizon open-loop optimal control problem, using the current state of the system as the initial state. A finite optimal control sequence is obtained, from which only the first element is kept and applied to the system (Fig. 3). The procedure is repeated after each state transition [130–132]. Its main difference from stochastic dynamic programming and optimal control is that the control input is not
computed a priori as an explicit function of the state vector. Thus, MPC is prevalent in the control of complex systems
where the off-line solution of the dynamic programming equations is computationally intractable due to the curse of
dimensionality. However, when the optimal control problem is stochastic in nature, one can only obtain suboptimal
solutions, due to the open-loop nature of the methodology. A formulation of the MPC on-line optimization problem
Fig. 3. Model predictive control philosophy: at time t, an open-loop input sequence u(t|t), . . . , u(t + M − 1|t) is computed over the prediction horizon so that the predicted states x(t + 1|t), . . . , x(t + P|t) approach the set point; only the first input is applied.
can be written as follows. At time t we solve the following finite-horizon optimal control problem:

min_{{u(t+i|t)}, i=0,...,M−1}  Σ_{i=0}^{P} ||y(t + i|t) − y_SP||²_{Q_x} + Σ_{i=0}^{P} ||u(t + i|t) − u_SP||²_{Q_u} + Σ_{i=0}^{P} ||Δu(t + i|t)||²_{Q_Δu}

s.t. x(t + l + 1|t) = f(x(t + l|t), u(t + l|t), d(t + l|t)), l = 0, 1, . . . , P − 1,
     y(t + l|t) = g(x(t + l|t), u(t + l|t), e(t + l|t)), l = 0, 1, . . . , P,
     u_min ≤ u(t + l|t) ≤ u_max, l = 0, 1, . . . , P − 1,
     y_min ≤ y(t + l|t) ≤ y_max, l = 0, 1, . . . , P,
     Δu_min ≤ Δu(t + l|t) ≤ Δu_max, l = 0, 1, . . . , P − 1, (25)
where u_SP, y_SP are the set-points (desired values) of the input and output vectors, d(·|t), e(·|t) are predictions of the disturbances, x(·|t), y(·|t) are predictions of the state and output vectors, f(·,·,·) and g(·,·,·) are the functions of the state space model describing the discrete-time dynamics of the system, Δ is the backward difference operator and P is the prediction horizon. ||·||_Q denotes the Q-weighted norm, i.e. ||x||_Q = (x′Qx)^{1/2}. Notice that the objective function penalizes deviations from the desired values of the input and output as well as excess movement of the control vector. The matrix Q_Δu is usually termed the move suppression penalty matrix. When the state space model is linear, the optimization problem reduces to a quadratic program, for which efficient solution algorithms exist. To this end, let us mention that this is the simplest form of an MPC configuration, with no terminal constraints or terminal costs, which are required so as to ensure stability of the closed-loop system. These types of features have not yet been used in applications of MPC for supply chain management problems.
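The receding-horizon mechanics of (25) can be sketched for a scalar inventory model x(t+1) = x(t) + u(t) − d(t). For brevity the sketch below takes M = P, solves an unconstrained least-squares problem (tracking plus move suppression) in place of the full constrained QP, and simply clips the applied order at zero; all names and values are illustrative.

```python
import numpy as np

P, r = 5, 0.1                 # prediction horizon (M = P), move suppression weight
x_sp, d_hat = 10.0, 4.0       # inventory set point, demand forecast

L = np.tril(np.ones((P, P)))      # maps future orders to predicted inventories
D = np.eye(P) - np.eye(P, k=-1)   # backward-difference operator (Delta u)

def mpc_move(x, u_prev):
    """Solve the horizon-P tracking problem by least squares; apply first move."""
    # predicted inventory: x(t+i|t) = x + sum_{j<i} u_j - i*d_hat, i = 1..P
    b_track = x_sp - x + d_hat * np.arange(1, P + 1)
    b_move = np.zeros(P)
    b_move[0] = u_prev            # first difference is u_0 - u_prev
    A = np.vstack([L, np.sqrt(r) * D])
    b = np.concatenate([b_track, np.sqrt(r) * b_move])
    u = np.linalg.lstsq(A, b, rcond=None)[0]
    return max(u[0], 0.0)         # only the first move is applied, clipped at 0

x, u = 0.0, 0.0
for _ in range(20):               # closed loop: re-solve at every time step
    u = mpc_move(x, u)
    x = x + u - d_hat             # plant; realized demand equals the forecast here
print(round(x, 2), round(u, 2))
```

With a perfect forecast the loop settles at the fixed point x = x_sp, u = d_hat; re-solving at every step is what gives MPC its (implicit) feedback character.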
The significance of the basic idea implicit in MPC was recognized long ago in the operations management literature as a tractable scheme for solving stochastic multi-period optimization problems, such as production planning and supply chain management, under the term rolling horizon [133–136]. For a review of rolling horizons in operations management problems and interesting trade-offs between horizon lengths and costs of forecasts, we refer the reader to Sethi and Sorger [137] and Chand et al. [138].
Kapsiotis and Tzafestas [139] were the first to apply MPC to an inventory management problem, for a single inventory
site. They included a penalty term for deviations from an inventory reference trajectory in order to compensate for
production lead times. Tzafestas et al. [140] considered a generalized production planning problem that includes
both production/inventory and marketing decisions. They employed a linear econometric model concerning sales as
a function of advertisement effort so as to approximate a nonlinear Vidale–Wolfe process. The dynamics of sales
are coupled with an inventory balance equation. The optimal control problem is formulated as an MPC, where the
control variables are the advertisement effort and the production levels. The objective function penalizes deviations
from desired sales and inventory levels.
Perea-Lopez et al. [141] employed MPC to manage a multi-product, multi-echelon production and distribution
network with lead times, allowing no backorders. They formulated the optimal control problem as a large scale mixed
integer linear programming (MILP) problem, due to discontinuous decisions allowed in their model. In their formulation
the demand is considered to be deterministic. They tested their formulation in a quite complex supply chain producing
three products and consisting of three factories, three warehouses, four distribution centers and 10 retailers servicing
20 customers. They compared their centralized approach against two decentralized approaches. The first decentralized
approach optimizes distribution only and uses heuristic rules for production/inventory planning. The second approach
optimizes manufacturing while allowing the distribution network to follow heuristic rules. Through simulations, they
inferred that the centralized approach exhibits superior performance.
Seferlis and Gianellos [142] developed a two-layered hierarchical control scheme, where a decentralized inventory
control policy is embedded within an MPC framework. Inventory levels at the storage nodes and backorders at the
order receiving nodes are the state variables for the linear state space model. The control variables are the product
quantities transferred through the network permissible routes and the amounts delivered to the customers. Backorders
are considered as output variables. Deterministic transportation delays are also included in the model. The cost function
of the MPC consists of four terms, the first two being inventory and transportation costs, the third being a quadratic
function that penalizes backorders at retailers and the last term being a quadratic move suppression term that penalizes
deviations of decision variables between consecutive time periods. In order to account for demand uncertainty, they
employed an autoregressive integrated moving average (ARIMA) forecasting model for the prediction of future product
demand variation. Based on historical demand they performed identification of the order and parameters of the ARIMA
model.
PID controllers were embedded for each inventory node and each product. These local controllers are responsible
for maintaining the inventory levels close to the pre-specified target levels. Hence, the incoming flows to the inventory nodes are selected as the manipulated variables for the PID controllers. This way a decoupling between inventory
level maintenance and satisfaction of primary control objectives (e.g. customer satisfaction) is achieved, permitting
the MPC conguration to react faster to disturbances in demand variability and transportation delays. However, tuning
of the localized PID controllers requires a time consuming trial-and-error procedure based on simulations. In their
experiments, assuming that demand is deterministic and performing a step change, they observed an amplification of
set point deviations for upstream nodes (bullwhip). For stochastic demand variation, they noted that the centralized
approach requires a much larger control horizon to achieve a comparable performance with their two-layered strategy.
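The local inventory loops of such a two-layered scheme can be illustrated with a minimal discrete-time PID controller for a single inventory node. The gains, target level, constant demand and zero lead time below are illustrative assumptions, not values from [142]:

```python
def simulate_pid_inventory(target=100.0, periods=60, kp=0.6, ki=0.1, kd=0.0):
    """Discrete-time PID controller choosing the incoming flow of one inventory node."""
    inventory, integral, prev_error = 50.0, 0.0, 0.0
    demand = 10.0  # constant demand rate, an illustrative assumption
    history = []
    for _ in range(periods):
        error = target - inventory
        integral += error
        derivative = error - prev_error
        # order = feedforward of demand + PID correction, never negative
        order = max(0.0, demand + kp * error + ki * integral + kd * derivative)
        prev_error = error
        inventory += order - demand  # zero lead time for simplicity
        history.append(inventory)
    return history

levels = simulate_pid_inventory()
```

With these gains the closed loop is stable and the inventory settles at the target; in practice, as the text notes, such gains are tuned by trial and error on simulations.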
Braun et al. [143] developed a linear MPC framework for large scale supply chain problems resulting from the
semiconductor industry. Through experiments, they showed that MPC can handle adequately uncertainty resulting
from model mismatch (lead times) and demand forecasting errors. Due to the complexity of large scale supply chains,
they proposed a decentralized scheme where a model predictive controller is created for each node, i.e. production
facility, warehouse or retailer. Inventory levels are treated as state variables for each node, the manipulated variables
are orders and production rates, and demands are treated as disturbances. The goal of the MPC controller is to keep
the inventory levels as close as possible to the target values while satisfying constraints with respect to production
and transportation capacities. Their simulations showed that using move suppression (i.e. the term in the objective
function that penalizes large deviations on control variables between two consecutive time instants), backorders can
be eliminated. It is well known in the MPC community that the move suppression term has the effect of making
the controller less sensitive to prediction inaccuracies, although usually at the price of degrading set point tracking
performance. Through simulations, Braun et al. [144] and Wang et al. [145] justified further the significance of move
suppression penalties as a means for increased robustness against model mismatch and hedging against inaccurate
demand forecasts.
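The role of the move suppression term can be made concrete with a minimal unconstrained linear MPC for a single inventory balance x(k+1) = x(k) + u(k) − d(k), where the quadratic objective (tracking plus move suppression) is solved as one stacked least-squares problem. This is a generic sketch, not the formulation of [143]:

```python
import numpy as np

def mpc_orders(x0, u_prev, demand_forecast, x_ref=0.0, lam=1.0):
    """Unconstrained linear MPC with a move suppression penalty.

    Minimizes sum_k (x_k - x_ref)^2 + lam * sum_k (u_k - u_{k-1})^2
    subject to x_{k+1} = x_k + u_k - d_k, by stacking both groups of
    residuals into a single linear least-squares problem in u_0..u_{H-1}.
    """
    H = len(demand_forecast)
    # Tracking residuals: x_{k+1} = x0 + sum_{j<=k} u_j - sum_{j<=k} d_j
    A_track = np.tril(np.ones((H, H)))
    b_track = np.full(H, x_ref) - x0 + np.cumsum(demand_forecast)
    # Move suppression residuals: sqrt(lam) * (u_k - u_{k-1})
    A_move = np.sqrt(lam) * (np.eye(H) - np.eye(H, k=-1))
    b_move = np.zeros(H)
    b_move[0] = np.sqrt(lam) * u_prev
    A = np.vstack([A_track, A_move])
    b = np.concatenate([b_track, b_move])
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u  # in a receding-horizon scheme only u[0] is applied
```

Setting lam = 0 recovers pure set point tracking; increasing lam smooths the order sequence, which is exactly the robustness/performance trade-off discussed above.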
Wang et al. [146] treated demand as a load disturbance and they considered it as a stochastic signal driven by
integrated white-noise (the discrete-time analog of Brownian motion). They applied a state estimation-based MPC in
order to increase the system performance and robustness with respect to demand variability and erroneous forecasts.
Assuming no information on disturbances, they employed a Kalman filter to estimate the state variables, where the
filter gain is a tuning parameter based on the signal-to-noise ratio. Through simulations they concluded that when there
is a large error between the average of actual demands and the forecast, a larger filter gain can make the controller
compensate for the error sufficiently fast.
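The state estimation idea can be sketched for a scalar random-walk (integrated white-noise) demand model, where a fixed Kalman gain reduces to exponential smoothing of the observations; the gain values below are illustrative, not those of [146]:

```python
def estimate_demand(observations, gain=0.4, d0=0.0):
    """One-state Kalman-style filter for a random-walk demand signal.

    Model: d_{t+1} = d_t + process noise, y_t = d_t + measurement noise.
    With a fixed (steady-state) gain the filter reduces to exponential
    smoothing: a larger gain tracks demand shifts faster but passes more
    measurement noise, mirroring the signal-to-noise trade-off above.
    """
    estimate = d0
    estimates = []
    for y in observations:
        estimate = estimate + gain * (y - estimate)  # correction step
        estimates.append(estimate)
    return estimates
```

A step change in demand is picked up quickly with a large gain and sluggishly with a small one, which is the behavior reported in the simulations of [146].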
Dunbar and Desa [147] applied a recently developed distributed/decentralized implementation of nonlinear MPC
[148] to the dynamic supply chain management problem, reminiscent of the classic MIT Beer Game
[19]. By this implementation, each subsystem is optimized locally for its own policy, and communicates the most
recent policy to those subsystems to which it is coupled. The supply network consists of three nodes, a retailer, a
manufacturer and a supplier. Information flows (i.e., flows moving upstream) are assumed to have no time delays
(lead times). On the other hand, material flows (i.e., flows moving downstream) are assumed to have transportation
delays. The proposed continuous-time dynamic model is characterized by three state variables, namely, inventory level,
unfulfilled orders and backlog for each node. The control inputs are the order rates for each node. Demand rates and
acquisition rates (i.e., number of items per day acquired from the upstream node) are considered as disturbances. The
control objective is to minimize the total cost, which includes avoiding backorders and keeping unfulfilled orders
and inventory levels low. Their model demonstrates bidirectional coupling between nodes, meaning that differential
equation models of each stage depend upon the state and input of other nodes. Hence, cycles of information dependence
are present in the chain. These cycles complicate decentralized/distributed MPC implementations since at each time
period coupled stages must estimate states and inputs of one another. To address this issue, the authors assumed that
coupled nodes receive the previously computed predictions from neighboring nodes prior to each update, and rely on
the remainder of these predictions as the assumed prediction at each update. To bound the discrepancy between actual
and assumed predictions, a move suppression term is included in the objective function. Thus, with the decentralized
scheme, an MPC controller is designed for each node, which updates its policy in parallel with the other nodes
based on estimates regarding information for interconnected variables. Through simulations, they concluded that the
decentralized MPC scheme performs better than a nominal feedback control derived in Sterman [19], especially when
accurate forecasts regarding customer demand exist. However, both approaches exhibit non-zero steady-state error with
respect to unfulfilled demands when a step increase is applied to the customer demand rate. Furthermore, the bullwhip
effect is observed in their simulations.
Based on the model of Lin et al. [67], Lin et al. [149] presented a minimum variance control (MVC) system, where
two separate set points are posed: one for the AINV and one for the WIP level. The system is in essence an MPC
configuration where the objective function to be minimized consists of the deviations of the predicted inventory and
WIP levels from the desired set points over two (different in general) prediction horizons and the order changes over a
control horizon. An ARIMA model is used as a mechanism to forecast customer demands. The system proved superior
to other approaches such as the order-up-to-level policy, PI control and the APVIOBPCS model in maintaining proper
inventory levels without causing the bullwhip effect, whether the customer demand trend is stationary or not.
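The bullwhip effect that such policies are designed to avoid is easy to reproduce: an order-up-to policy whose level is driven by a moving-average forecast amplifies order variance relative to demand variance (the classical mechanism quantified in, e.g., [30]). The lead time, forecast window and demand statistics below are illustrative assumptions:

```python
import numpy as np

def bullwhip_ratio(lead_time=2, window=4, periods=500, seed=0):
    """Order-variance amplification of an order-up-to policy driven
    by a moving-average demand forecast (a generic illustration)."""
    rng = np.random.default_rng(seed)
    demand = rng.normal(20.0, 2.0, periods)
    orders = []
    for t in range(window, periods):
        f_now = demand[t - window + 1 : t + 1].mean()   # forecast after seeing d_t
        f_prev = demand[t - window : t].mean()
        # order-up-to level S_t = (L + 1) * forecast; the order covers current
        # demand plus the change in the order-up-to level
        orders.append(demand[t] + (lead_time + 1) * (f_now - f_prev))
    return np.var(orders) / np.var(demand[window:])

ratio = bullwhip_ratio()
```

A ratio above one indicates bullwhip; for i.i.d. demand this simple policy gives a ratio of roughly 1 + 2c + 2c² with c = (L + 1)/window, so longer lead times and shorter forecast windows amplify orders more.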
Yildirim et al. [150] studied a dynamic planning and sourcing problem with service level constraints. Specifically, the
manufacturer must decide how much to produce, where to produce, when to produce, how much inventory to carry, etc.,
in order to fulfill random customer demands in each period. They formulated the problem as a multi-period stochastic
programming problem, where service level constraints appear in the form of chance constraints [151]. In order to obtain
the optimal feedback control one should be able to solve the resulting stochastic dynamic program. However, due to the
curse of dimensionality the problem is computationally intractable. Thus, in order to obtain a sub-optimal solution they
formulated the problem as a static deterministic optimization problem. They approximated the service level chance
constraints with deterministic equivalent constraints by specifying certain minimum cumulative production quantities
that depend on the service level requirements [152]. The rolling horizon procedure is applied on-line following the MPC
philosophy, i.e. by solving the resulting mathematical programming problem at each discrete-time instance, applying
only the first decision and moving to a new state, where the procedure is repeated. The authors compared their approach
to certain threshold subcontracting policies yielding similar results.
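The rolling horizon procedure described above can be sketched as a generic receding-horizon loop; `solve` and `observe` stand for the problem-specific optimizer and plant transition and are placeholders, not part of the formulation in [150]:

```python
def rolling_horizon(x0, horizon, steps, solve, observe):
    """Generic receding-horizon (MPC-style) loop.

    At each period a horizon-long plan is computed, only its first decision
    is applied, the new state is observed and the procedure is repeated.
    `solve(x, horizon)` returns a list of decisions over the horizon;
    `observe(x, u)` returns the next state (both are problem-specific).
    """
    x, applied = x0, []
    for _ in range(steps):
        plan = solve(x, horizon)   # optimize over the full horizon
        u = plan[0]                # apply only the first decision
        applied.append(u)
        x = observe(x, u)          # move to the new state, then repeat
    return x, applied
```

Plugging in any of the MPC solvers discussed in this section for `solve` yields the corresponding closed-loop policy.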
5. Robust control
Describing uncertainties in a stochastic framework is the standard practice used by the operations research community.
For example, in the majority of papers reviewed so far, uncertainties concerning customer demands, machine failures
and lead times were mostly described by probability distributions and stochastic processes. However, in many practical
situations one may not be able to identify the underlying probability distributions or such a stochastic description may
simply not exist. On the other hand, based on historical data or experience one can easily infer bounds on the magnitude
of the uncertain parameters.
Having realized this fact a long-time ago, the control engineering community has developed the necessary theoretical
and algorithmic machinery for this type of problem, the so-called robust control theory [153–155]. In this framework,
uncertainties are unknown-but-bounded quantities, and constraints dictated by performance specifications and physical
limitations are usually hard, meaning that they must be satisfied for all realizations of the uncertain quantities. In the
robust control framework, models are usually affected by two types of uncertainty: exogenous disturbances
(e.g. customer demands) and plant-model mismatch, that is, uncertainties due to modeling errors.
In a series of papers, Blanchini and coworkers [156–158] studied the general dynamic production/distribution problem
described by the following discrete-time state space model:
x(t + 1) = x(t) + Bu(t) + Ed(t), (26)
where the state x(t) is the vector of inventory levels in the distribution network nodes, the control u(t) is the vector
of resource flows between the nodes and the exogenous disturbance d(t) is the vector of demands. State and control
vectors must satisfy the following constraints:
x(t) ∈ X = {x ∈ R^n : x_min ≤ x ≤ x_max}, (27)
u(t) ∈ U = {u ∈ R^m : u_min ≤ u ≤ u_max}. (28)
The demands are unknown but belong to
d(t) ∈ D = {d ∈ R^ℓ : d_min ≤ d ≤ d_max}. (29)
Blanchini et al. [156] studied the existence of a feedback controller K: X × [0, ∞) → U and a set X_f ⊆ X such that for
all x(0) ∈ X_f and for all d(t) ∈ D we have u(t) = K(x(t), t) ∈ U and x(t) ∈ X for all times. In words, they studied
the problem of keeping inventory levels inside prescribed bounds for all possible demands by using flows that are
subject to hard bounds. They proved the existence of such a controller if two necessary and sufficient conditions hold,
the first one being ED ⊆ BU and the second one being x_min,i − o_min,i ≤ x_max,i − o_max,i for every node i, where
o_min,i = min_{d∈D} E_i d and o_max,i = max_{d∈D} E_i d. The first condition involves the controlled admissible flow, which must dominate the uncontrolled
flow. The second involves the inventory capacities, which must be large enough to be able to fulfill current customer
demands. It should be mentioned that the authors did not consider any cost function, that is, they did not seek
an optimal controller. They only examined necessary and sufficient conditions for stabilizability. Their procedure is
largely motivated by the work of Bertsekas and Rhodes [159] and Bertsekas [160] on target tube reachability.
For the same problem, Blanchini et al. [157] studied the problem of driving the system to the least worst-case
inventory level. Furthermore, they derived a way of calculating the least inventory levels. The least inventory level is
actually the steady state to which we want to steer the dynamic production/inventory system. They also provided an
upper bound for the time periods required to reach that level. It turns out that the feedback controller is a periodic-review,
order-up-to level policy. The order-up-to level is equal to the least inventory level. Finally, they provided a method
based on linear programming for computing on-line the optimal control strategy. Blanchini et al. [158] extended the
work of Blanchini et al. [157] by considering lead times in their model.
Blanchini et al. [161] investigated the same problem, this time in a continuous-time setting. They showed that
existence of an admissible feedback controller is guaranteed as long as the first of the two sufficient and necessary
conditions of the discrete-time counterpart holds. They also investigated the set point tracking problem, i.e. finding an
admissible feedback controller that steers the inventory levels to target values. They showed that such a controller exists
and is discontinuous in the sense that at any time, any controlled process is required to work either to its maximal or to its
minimal intensity. This bang–bang controller has some noteworthy properties, such as decentralization and robustness
against failures. Blanchini et al. [162] extended the work of Blanchini et al. [161] by taking into consideration setup
times.
Blanchini et al. [163] considered the problem of optimally controlling the system (26)–(29) with respect to some
finite-horizon integral cost criterion. The cost function they considered depends only on the inventory levels at the
nodes. This is a min–max problem where the control goal is to minimize the worst-case cost over all admissible values
of customer demands. They showed that the optimal cost of a suitable auxiliary problem with no uncertainties is always
an upper bound for the original problem. Finally, they provided a numerical method for the implementation of the
guaranteed cost control.
Boukas et al. [164] considered a FMS where machines are subject to failure and demand is unknown-but-bounded.
They cast the system in the framework of Eqs. (18) and (20), but this time demand is not a constant vector. They
considered the objective function that is the supremum of the expected discounted cost over all demand realizations,
a problem which is closely related to H∞ control. In terms of dynamic game theory, H∞ optimal control deals
with zero-sum games where the controller can be considered as the minimizing player and the disturbance as the
maximizing player. The controller resulting from this minimax approach is certainly more conservative in terms of
performance. Boukas et al. [164] derived the Hamilton–Jacobi–Isaacs equations for this problem. They proved that the
value function is convex, hence locally Lipschitz and almost everywhere differentiable. Furthermore, they provided a
verification theorem which gives sufficient conditions that an optimal feedback controller must satisfy. As an example,
they considered the problem of a single machine producing a single part and they derived the optimal controller in
closed form. However, for complex problems derivation of closed form solutions is almost impossible.
Boukas et al. [165] investigated a continuous-time production–inventory problem with deteriorating items, similar
to that of Boukas and Liu [118]. The deterioration rate of the products depends on the demand rate which in turn is
a function of a continuous-time Markov chain. They cast the model under the framework of systems with Markovian
jumps. They considered the problem of minimizing a finite-horizon quadratic production and inventory/shortage cost.
In order to solve the optimal control problem, they derived the HJB equation and solved for the optimal feedback
law as a function of the partial derivative (with respect to the continuous state) of the value function, assuming that
the value function is continuously differentiable. Then, they substituted back into the HJB and obtained a first order
partial differential equation for the value function. Under the standard procedure of guessing an expression for the
value function, they derived a set of coupled Riccati equations. They also dealt with the infinite-horizon problem. In
that case, in order to guarantee existence of the optimal solution more stringent conditions are required, i.e. stochastic
stabilizability and stochastic detectability. In their approach, apart from the stochasticity due to uncertain demands,
they addressed model uncertainty, that is, uncertainty corresponding to modeling errors. They showed how to design a
controller which guarantees stochastic quadratic stability for the closed-loop system and achieves a guaranteed adequate
level of performance at the same time. They presented a tracking problem with one failure prone machine producing
one deteriorating item. By solving the coupled Riccati equations, they derived the explicit piecewise linear feedback
controller. Through simulations they showed that the tracking error converges to a neighborhood of the origin. However,
in their example they did not consider uncertainty due to plant-model mismatch.
Boukas et al. [166] studied an inventory–production system with uncertain processing time and delay in control.
Demand rate is composed of a constant term plus an unknown time-varying component with finite energy: d(t) = d̄ + w(t),
where ∫_0^∞ w^2(t) dt < ∞. The time delay uncertainty was assumed to lie in a measurable domain (τ̄ and m known):
I = {τ(t): 0 ≤ τ(t) ≤ τ̄, τ̇(t) ≤ m < 1}.
The state equations, in the case of multiple machines producing multiple products, are
ẋ(t) = Ax(t) + B0 u(t − τ(t)) − D + B1 w(t), x(t) = 0, t ≤ 0, (30)
where x(t) is the vector of stock levels of the different products, u(t) is the vector of the production rates at time t, D
is the vector of the constant demand rates, w(t) is the vector of disturbances of the demand rate at time t and A, B0, B1
are constant matrices. They also impose the following constraint on the input: ‖u(t)‖^2 ≤ a‖x(t)‖^2, for all t.
Their main goal was to render the closed-loop system asymptotically stable and satisfy an H∞ performance criterion.
In order to achieve this goal, they designed a memoryless linear state feedback controller based on sufficient Riccati-like
conditions. They used Schur complements to derive sufficient linear matrix inequality (LMI) conditions [167] for the
satisfaction of input constraints for the feedback controller. LMIs are very popular in robust control, since numerous
stability conditions can be stated as LMIs and very efficient algorithms exist for their solution [168].
They also extended their results to the problem of robust H∞ control, where the uncertain system under consideration
is modeled as follows:
ẋ(t) = [A + ΔA(t)]x(t) + Bv(t − τ(t)) + B1 w(t), x(t) = 0, t ≤ 0,
z(t) = Cx(t) + Gv(t − τ(t)), (31)
where the matrix ΔA(t) is real and time-varying, representing norm-bounded parameter uncertainty. In both cases the
parameters of the controller are computed via LMI techniques.
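The simplest instance of such Lyapunov/LMI-type stability conditions can be checked numerically: solve the continuous-time Lyapunov equation AᵀP + PA = −I and test whether P is positive definite, since existence of such a P > 0 certifies stability of ẋ = Ax. This is a generic numerical check (here via Kronecker products in plain NumPy), not the controller synthesis of [166]:

```python
import numpy as np

def lyapunov_pd_check(A, tol=1e-9):
    """Solve A^T P + P A = -I via vectorization and test P for positive
    definiteness. Using vec(M X N) = (N^T kron M) vec(X), the equation
    becomes (I kron A^T + A^T kron I) vec(P) = vec(-I)."""
    n = A.shape[0]
    lhs = np.kron(np.eye(n), A.T) + np.kron(A.T, np.eye(n))
    vec_p = np.linalg.solve(lhs, (-np.eye(n)).flatten(order="F"))
    P = vec_p.reshape(n, n, order="F")
    eigs = np.linalg.eigvalsh((P + P.T) / 2)  # symmetrize for numerical safety
    return bool(np.all(eigs > tol)), P
```

Dedicated LMI solvers handle the synthesis problems of [166,169] with decision variables inside the matrices; this analysis-only check merely illustrates the underlying feasibility test.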
Boukas et al. [169] modeled the inventory problem with deteriorating items as a switched linear system [170]. The
switching variable is the inventory level. Specifically, the system dynamics are described by the following piecewise
affine model:
ẋ(t) = −μ_i x(t) + u(t) − d(t), i = 1, 2, (32)
where μ_1 = μ (the deteriorating rate) if x(t) > 0 and μ_2 = 0 if x(t) < 0 (backorders).
They considered the problem of rendering the system quadratically stable while keeping the inventory level close to
zero when there are fluctuations in the demand, using H∞ control theory. The demand rate is modeled as in Boukas
et al. [166]. Therefore, their goal is to minimize the L2-induced norm from w(t) to x(t). A piecewise affine state
feedback controller was designed based on the solution of a sufficient Lyapunov-like LMI condition.
Laumanns and Lefeber [171] modeled supply networks as directed connected graphs. They considered information
and material flows through the arcs of the graph, while the vertices represent the facilities of the supply chain
network. Transportation delays were modeled by adding auxiliary nodes to the graph. It was shown that the dynamics
of the entire system can be written as a linear state space model with the customer demands as external unknown-but-bounded
disturbances. Eventually, a constrained robust optimal control problem was formulated, where the worst-case
cost is minimized. This problem has recently been solved analytically through multi-parametric programming [172].
The optimal state feedback law was shown to be piecewise-affine. Laumanns and Lefeber [171] applied this formulation
to the classical beer distribution game and found that the optimal control law is the well known order-up-to policy.
6. Approximate dynamic programming
As we have already noted, dynamic programming is a very elegant framework for analyzing optimal control problems
relating to production/inventory/distribution systems. However, it is mostly used at a theoretical level to prove existence
of solutions and characterize the optimal policy. In order to utilize dynamic programming as a practical tool for dynamic
decision making in supply chain management a way of combating the curse of dimensionality needs to be found. This
is the main goal of approximate dynamic programming techniques. Starting from the field of artificial intelligence
[173], a cornucopia of algorithms based on dynamic programming, simulation and some form of approximation has
been suggested in the literature for the solution of large scale discrete-time stochastic optimal control problems.
These methods usually combine classical dynamic programming algorithms like value and policy iteration with an
approximation architecture for the value function (critic methods) or the optimal policy (actor methods). The data
needed for training the models are provided through simulations. Early encouraging results, the most noticeable being
Tesauro's backgammon player [174], have drawn the attention of researchers from operations research and control
theory. The book of Bertsekas and Tsitsiklis [175] describes thoroughly such algorithms and analyzes their performance
both theoretically and through simulations. However, a lot of experimentation is required with regard to the selection
of the approximation architecture and the tuning parameters of the method. In fact, no algorithm has been proven to
be of general use. Thus, the selection of the appropriate algorithm, approximation architecture and tuning parameters
is problem specic and requires a lot of experimentation.
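For a sense of the exact dynamic programming machinery that these methods approximate, the following is tabular value iteration for a toy single-item inventory problem with lost sales; all costs, capacities and the demand distribution are illustrative assumptions:

```python
import numpy as np

def value_iteration(capacity=10, order_max=5, holding=1.0, shortage=4.0,
                    order_cost=0.5, gamma=0.95, tol=1e-8):
    """Tabular value iteration for a single-item inventory MDP with lost
    sales and zero lead time; demand takes values 0..3 with the
    probabilities below (an illustrative toy problem)."""
    demand_probs = {0: 0.2, 1: 0.4, 2: 0.3, 3: 0.1}
    V = np.zeros(capacity + 1)
    while True:
        V_new = np.empty_like(V)
        policy = np.empty(capacity + 1, dtype=int)
        for x in range(capacity + 1):
            best = np.inf
            for u in range(min(order_max, capacity - x) + 1):
                cost = order_cost * u
                for d, p in demand_probs.items():
                    x_next = max(x + u - d, 0)  # unmet demand is lost
                    stage = holding * x_next + shortage * max(d - x - u, 0)
                    cost += p * (stage + gamma * V[x_next])
                if cost < best:
                    best, policy[x] = cost, u
            V_new[x] = best
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, policy
        V = V_new

V, policy = value_iteration()
```

With 11 states the exact solution is immediate; the curse of dimensionality appears once the state is a vector of inventories across many nodes and products, which is exactly the regime the approximation architectures below target.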
In this section we will present some selected applications of approximate dynamic programming to supply chain
management problems. Van Roy et al. [176] studied a multi-retailer/inventory system with random demands and
transport delays. Their goal was to minimize the discounted innite-horizon storage, shortage and transportation costs.
If a customer order cannot be fully satised by the retailer, he has the option of requesting a special delivery, that is, a
direct delivery from the warehouse to the customer. The problem was formulated as a dynamic programming problem
and three test cases were examined: a system with one warehouse and one retailer having three state variables, a system
with one warehouse and 10 stores having 33 state variables, and a problem with the same number of warehouses and
stores having 46 state variables. The increased number of state variables is due to longer transportation delays. In all the
case studies, the control vector consisted of two order-up-to levels: one for the warehouse and one for the stores. Thus
the policy resulting from their formulation was an order-up-to policy with a threshold value that depends on the current
state. The authors found that on-line temporal difference learning with feature-based linear approximation of the value
function and active exploration performs better than approximate policy iteration and temporal-difference learning
coupled with multi-layer perceptrons. The policy resulting from the proposed neuro-dynamic programming scheme
cuts costs by about 10% relative to an optimized stationary order-up-to policy. However, the selection of features,
that is, of states that capture the necessary information for the system dynamics, must be made quite judiciously and
requires a lot of experimentation.
Patrinos and Sarimveis [177] developed an optimistic variant of policy iteration coupled with least-squares policy
evaluation [178]. They used a radial basis function network as an approximation of the value function. The centers of
the Gaussian basis functions were selected so as to span the entire state space. Once this is accomplished, the network
can combine good approximation properties through the entire state space while retaining its simple linear structure
with respect to the weight parameters. Through simulations, it was shown that the resulting policy performs better than
an optimized order-up-to policy.
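The flavor of such value function approximation schemes can be illustrated with Gaussian RBF features and a least-squares fit of simulated discounted returns, i.e. Monte Carlo evaluation of a fixed order-up-to policy. This is a simplified sketch under illustrative cost and demand assumptions, not the optimistic policy iteration algorithm of [177]:

```python
import numpy as np

def rbf_features(x, centers, width=2.0):
    """Gaussian radial basis features; the value model stays linear in the weights."""
    return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

def evaluate_policy(order_up_to=6, gamma=0.95, episodes=200, horizon=60, seed=0):
    """Monte Carlo policy evaluation with RBF approximation: simulate a
    fixed order-up-to policy, then regress the observed discounted
    returns onto the RBF features of the start state by least squares."""
    rng = np.random.default_rng(seed)
    centers = np.arange(0, 11, 2.0)            # centers span the state space
    X, y = [], []
    for _ in range(episodes):
        x = rng.integers(0, 11)
        x0, ret, disc = x, 0.0, 1.0
        for _ in range(horizon):
            u = max(order_up_to - x, 0)        # fixed policy being evaluated
            d = rng.integers(0, 4)             # illustrative uniform demand
            x_next = max(x + u - d, 0)
            cost = 0.5 * u + 1.0 * x_next + 4.0 * max(d - x - u, 0)
            ret += disc * cost
            disc *= gamma
            x = x_next
        X.append(rbf_features(x0, centers))
        y.append(ret)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return w, centers

w, centers = evaluate_policy()
```

Temporal-difference variants update the same linear-in-weights model incrementally instead of fitting full returns in batch, but the approximation architecture is identical.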
Powell and coworkers [179181] developed an interesting approximate dynamic programming framework for large
scale dynamic finite-horizon resource allocation problems, such as supply chain management. Their methods combine
dynamic programming, mathematical programming, simulation and stochastic approximation [182]. They usually
employ linear or separable concave approximations of the value function [183]. Another main characteristic is the
reformulation of the dynamic programming equation around the post-decision state. Topaloglu and Powell [181]
presented a comparison of their approximate dynamic programming methods with the rolling horizon approach (or
MPC). Through simulations they showed that their methods combined with separable concave approximations of the
value function perform better than rolling horizon. For more information regarding this approach we refer the reader
to the forthcoming book of Powell [184].
Bauso and coworkers [185,186] considered a multi-retailer inventory system where each retailer shares a limited
amount of information with other retailers so as to coordinate their decisions and share set-up costs. Thus, their main
goal was to design a consensus protocol [187]. Each retailer chooses a threshold policy with the threshold being the
number of active retailers. That is, a retailer decides to order only if at least a certain number of retailers are willing
to do the same. In order to compute locally the threshold level for each retailer depending on the inventory level and
the expected demand, they developed a distributed neuro-dynamic programming algorithm. The proposed algorithm
is a variant of approximate policy iteration with linear function approximation, where the policy evaluation step is
performed using quasi-Monte Carlo simulations and temporal difference learning. They also used active exploration
at the initial iterations of the algorithm so as to explore the state space sufficiently and to avoid getting stuck in local
minima. The resulting feedback law is of the form:
μ(I_i^k) = S_i^k − x_i^k if a^k ≥ θ_i^k, and μ(I_i^k) = 0 if a^k < θ_i^k, (33)
where x_i^k is the inventory level of retailer i, a^k is the transmitted information through the consensus protocol regarding
active retailers, θ_i^k ∈ {1, . . . , n} is the threshold of active retailers and S_i^k is the order-up-to level for retailer i at time k.
They considered an example with three retailers facing a random demand that follows a Poisson distribution. Through
simulations, they show that the algorithm converges to a Nash equilibrium in six iterations.
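The feedback law (33) translates directly into code. The helper below and the synchronous coordination round (where a retailer is counted as active when it is below its order-up-to level) are illustrative assumptions about the protocol, not the distributed algorithm of [185,186]:

```python
def threshold_order(inventory, active, threshold, order_up_to):
    """Feedback law (33): order up to S_i^k only if the number of active
    retailers a^k reported by the consensus protocol meets the threshold
    (assumes inventory <= order_up_to for retailers that end up ordering)."""
    return order_up_to - inventory if active >= threshold else 0

def coordination_round(inventories, thresholds, order_up_to_levels):
    """One synchronous round: a^k is taken as the number of retailers
    willing to order (those below their order-up-to level), then (33)
    is applied at every retailer."""
    active = sum(1 for x, s in zip(inventories, order_up_to_levels) if x < s)
    return [threshold_order(x, active, th, s)
            for x, th, s in zip(inventories, thresholds, order_up_to_levels)]
```

The threshold makes replenishment conditional on enough peers ordering simultaneously, which is how the retailers share set-up costs in the scheme described above.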
7. Conclusions
The aim of this review paper was to present alternative control philosophies that have been applied to the dynamic
supply chain management problem. Representative references were provided that can guide the reader to explore
in depth the methodologies of his/her choice. The efforts started in the early 1950s by applying classical control
techniques where the analysis was performed in the frequency domain. More recently, highly sophisticated optimal
control methods have been proposed mainly based on the time domain. However, many recent reports state that the
majority of companies worldwide still suffer from poor supply chain management. Moreover, undesired phenomena,
such as the bullwhip effect have not yet been remedied. The applicability of control methodologies in real life supply
chain problems is thus, naturally questioned.
It is true that in many methodologies that have been presented in this paper, the assumptions on which they are based
are often not valid in reality. For example, lead times are not fixed and are not known with accuracy, as many models
assume. Inventory levels should be bounded below by zero and bounded above by warehouse capacities, but these bounds
are not always taken into account. The same happens with the production rates, which are limited by the machinery
capacities. Another limitation is that single stage systems are usually studied, assuming production of a single product or
aggregated production. In real life systems, various products are produced with different production rates and different
lead times, which, however, share common machinery and storage facilities. Horizontal integration is often represented
by considering the supply chain stages in a row, while interconnections between different-level and same-level stages
are ignored. Finally, raw material costs (which may be variable), labor costs and inventory costs are rarely taken explicitly
into account.
From the above discussion, it is evident that despite the considerable advances that have occurred throughout the years
in controlling supply chain systems, there is still plenty of room for further improvements. Elimination of the above
limitations will lead to new methodologies of more applicability. Therefore, dynamic control of supply chain systems
remains an open and active research area. Among the alternative methodologies that have been presented in this review
paper, we would like to draw the attention of the reader to the MPC framework which has become extremely popular
in the engineering community, as it proved successful in facing problems similar to the ones mentioned above. Among
other advantages, the MPC framework can easily incorporate bounds on the manipulated and controlled variables and
leads to the formulation of computationally tractable optimization problems.
References
[1] Beamon BM. Supply chain design and analysis: models and methods. International Journal of Production Economics 1998;55:28194.
[2] Lee HL, Padmanabhan V, Whang S. The bullwhip effect in supply chains. Sloan Management Review 1997;38(3):93–102.
[3] Miragliotta G. Layers and mechanisms: a new taxonomy for the bullwhip effect. International Journal of Production Economics 2006;104(2):365–81.
[4] Geary S, Disney SM, Towill DR. On bullwhip in supply chains – historical review, present practice and expected future impact. International Journal of Production Economics 2006;101:2–18.
[5] Riddalls CE, Bennett S, Tipi NS. Modelling the dynamics of supply chains. International Journal of Systems Science 2000;31(8):969–76.
[6] Axsäter S. Control theory concepts in production and inventory control. International Journal of Systems Science 1985;16(2):161–9.
[7] Edghill JS, Towill DR. The use of systems dynamics in manufacturing systems. Transactions of the Institute of Measurement and Control 1989;11(4):208–16.
[8] Ortega M, Lin L. Control theory applications to the production–inventory problem: a review. International Journal of Production Research 2004;42:2303–22.
[9] Bertsekas DP. Dynamic programming and optimal control. Belmont, MA: Athena Scientific; 2000.
[10] Bellman RE. Dynamic programming. Princeton, NJ: Princeton University Press; 1957.
[11] Camacho EF, Bordons C. Model predictive control. London: Springer; 1999.
[12] Zhou K, Doyle JC, Glover K. Robust and optimal control. Upper Saddle River, NJ: Prentice-Hall; 1995.
[13] Bertsekas DP, Tsitsiklis JN. Neuro-dynamic programming. Belmont, MA: Athena Scientific; 1996.
[14] Simon HA. On the application of servomechanism theory in the study of production control. Econometrica 1952;20:247–68.
[15] Vassian JH. Application of discrete variable servo theory to inventory control. Operations Research 1955;3:272–82.
[16] Forrester JW. Industrial dynamics: a major breakthrough for decision makers. Harvard Business Review 1958;36:37–66.
[17] Forrester JW. Industrial dynamics. Cambridge, MA: MIT Press; 1961.
[18] Barlas Y, Yasarcan H. Goal setting, evaluation, learning and revision: a dynamic modeling approach. Evaluation and Program Planning 2006;29:79–87.
[19] Sterman JD. Business dynamics: systems thinking and modeling for a complex world. New York, NY: McGraw-Hill; 2000.
[20] Ansoff HI, Slevin DP. An appreciation of industrial dynamics. Management Science 1968;14:91–106.
[21] Towill DR. Dynamic analysis of an inventory and order based production control system. International Journal of Production Research 1982;20:671–87.
[22] Coyle RG. Management system dynamics. London: Wiley; 1977.
[23] Lalwani CS, Disney SM, Towill DR. Controllable, observable and stable state space representations of a generalized order-up-to policy. International Journal of Production Economics 2006;101:172–84.
[24] Wikner J, Naim MM, Towill DR. The system simplification approach in understanding the dynamic behaviour of a manufacturing supply chain. Journal of Systems Engineering 1992;2:167–78.
[25] Wikner J. Continuous-time dynamic modeling of variable lead times. International Journal of Production Research 2003;41:2787–98.
[26] Burbidge JL. Automated production control with a simulation capability. In: Proceedings of IFIP conference WG 5-7, Copenhagen, 1984. p. 1–14.
[27] Towill DR. Industrial dynamics modeling of supply chains. Logistics Information Management 1996;9:43–56.
[28] Towill DR, McCullen P. The impact of an agile manufacturing programme on supply chain dynamics. International Journal of Logistics Management 1999;10(1):83–96.
[29] Disney SM, Towill DR. On the bullwhip and inventory variance produced by an ordering policy. Omega 2003;31:157–67.
[30] Chen F, Drezner Z, Ryan JK, Simchi-Levi D. Quantifying the bullwhip effect in a simple supply chain: the impact of forecasting, lead times, and information. Management Science 2000;46:436–43.
[31] Agrell PJ, Wikner J. An MCDM framework for dynamic systems. International Journal of Production Economics 1996;45:279–92.
[32] Edghill JE, Towill DR. Assessing manufacturing system performance: frequency response revisited. Engineering Costs and Production Economics 1990;19:319–26.
H. Sarimveis et al. / Computers & Operations Research 35 (2008) 35303561 3557
[33] John S, Naim MM, Towill DR. Dynamic analysis of a WIP compensated decision support system. International Journal of Manufacturing System Design 1994;1:283–97.
[34] Disney SM, Naim MM, Towill DR. Genetic algorithm optimization of a class of inventory control systems. International Journal of Production Economics 2000;68:259–78.
[35] Deziel DP, Eilon S. A linear production–inventory control rule. The Production Engineer 1967;43:93–104.
[36] Riddalls CE, Bennett S. The stability of supply chains. International Journal of Production Research 2002;40:459–75.
[37] Bellman R, Cooke KL. Differential-difference equations. New York, NY: Academic Press; 1963.
[38] Warburton RDH, Disney SM, Towill DR, Hodgson JPE. Further insights into the stability of supply chains. International Journal of Production Research 2004;42:639–48.
[39] Zhou L, Naim MM, Tang O, Towill DR. Dynamic performance of a hybrid inventory system with a Kanban policy in remanufacturing process. Omega 2006;34:585–98.
[40] Lai CL, Lee WB, Ip WH. A study of system dynamics in just-in-time logistics. Journal of Materials Processing Technology 2003;138:265–9.
[41] Dejonckheere J, Disney SM, Lambrecht MR, Towill DR. Measuring and avoiding the bullwhip effect: a control theoretic approach. European Journal of Operational Research 2003;147:567–90.
[42] Lee HL, Padmanabhan V, Whang S. Information distortion in a supply chain: the bullwhip effect. Management Science 1997;43:546–58.
[43] Dejonckheere J, Disney SM, Lambrecht MR, Towill DR. The impact of information enrichment on the bullwhip effect in supply chains: a control engineering perspective. European Journal of Operational Research 2004;153:727–50.
[44] Lalwani CS, Disney SM, Towill DR. Controllable, observable and stable state space representations of a generalized order-up-to policy. International Journal of Production Economics 2006;101:172–84.
[45] Franklin GF, Powell JD, Workman M. Digital control of dynamic systems. Menlo Park, CA: Addison-Wesley; 1998.
[46] Disney SM, Towill DR. Eliminating inventory drift in supply chains. International Journal of Production Economics 2005;93–94:331–44.
[47] White AS. Management of inventory using control theory. International Journal of Technology Management 1999;17:847–60.
[48] Towill DR, Evans GN, Cheema P. Analysis and design of an adaptive minimum reasonable inventory control system. Production Planning & Control 1997;8:545–57.
[49] Evans GN, Naim MM, Towill DR. Application of a simulation methodology to the redesign of a logistical control system. International Journal of Production Economics 1998;56–57:157–68.
[50] Dejonckheere J, Disney SM, Lambrecht MR, Towill DR. Transfer function analysis of forecasting induced bullwhip in supply chains. International Journal of Production Economics 2002;78:133–44.
[51] Grubbström RW, Wikner J. Inventory trigger policies developed in terms of control theory. International Journal of Production Economics 1996;45:397–406.
[52] Wiendahl HP, Breithaupt JW. Automatic production control applying control theory. International Journal of Production Economics 2000;63:33–46.
[53] Grubbström RW. A net present value approach to safety stocks in planned production. International Journal of Production Economics 1998;56–57:213–29.
[54] Grubbström RW, Molinder A. Safety production plans in MRP-systems using transform methodology. International Journal of Production Economics 1996;46–47:297–309.
[55] Lancaster K. Mathematical economics. New York, NY: Macmillan; 1968.
[56] Grubbström RW, Orvin P. Intertemporal generalization of the relationship between material requirements planning and input–output analysis. International Journal of Production Economics 1992;26:311–8.
[57] Grubbström RW, Wang Z. A stochastic model of multi-level/multi-stage capacity constrained production–inventory systems. International Journal of Production Economics 2003;81–82:483–94.
[58] Grubbström RW, Huynh TTT. Multi-level, multi-stage capacity constrained production–inventory systems in discrete-time with non-zero lead times using MRP theory. International Journal of Production Economics 2006;101:53–62.
[59] Grubbström RW, Tang O. An overview of input–output analysis applied to production–inventory systems. Economic Systems Research 2000;12:3–25.
[60] Popplewell K, Bonney MC. The application of discrete linear control theory to the analysis and simulation of multi-product, multi-level production control systems. International Journal of Production Research 1987;25:45–56.
[61] Wikner J. Dynamic analysis of a production–inventory model. Kybernetes 2003;34:803–23.
[62] Burns JF, Sivazlian BD. Dynamic analysis of multi-echelon supply systems. Computers and Industrial Engineering 1978;2:181–93.
[63] Wikner J, Towill DR, Naim M. Smoothing supply chain dynamics. International Journal of Production Economics 1991;22:231–48.
[64] Disney SM, Towill DR. A discrete transfer function model to determine the dynamic stability of a vendor managed inventory supply chain. International Journal of Production Research 2002;40:179–204.
[65] Perea E, Grossmann I, Ydstie E, Tahmassebi T. Dynamic modeling and classical control theory for supply chain management. Computers and Chemical Engineering 2000;24:1143–9.
[66] Perea-López E, Grossmann I, Ydstie E, Tahmassebi T. Dynamic modeling and decentralized control of supply chains. Industrial Engineering Chemistry Research 2001;40:3369–83.
[67] Lin PH, Wong DSH, Jang SS, Shieh SS, Chu JZ. Controller design and reduction of bullwhip for a model supply chain system using z-transform analysis. Journal of Process Control 2004;14:487–99.
[68] Arrow KJ, Karlin S, Scarf H. Studies in the mathematical theory of inventory and production. Stanford, CA: Stanford University Press; 1958.
[69] Clark A, Scarf H. Optimal policies for a multi-echelon inventory problem. Management Science 1960;6:475–90.
[70] Scarf H. The optimality of (s, S) policies for the dynamic inventory problem. In: Proceedings of the first Stanford symposium on mathematical methods in social sciences. Stanford, CA: Stanford University Press; 1960.
[71] Iglehart D. Optimality of (s, S) policies in the infinite horizon dynamic inventory problem. Management Science 1963;9:259–67.
[72] Hausman WH, Peterson R. Multiproduct production scheduling for style goods with limited capacity, forecast revisions and terminal delivery. Management Science 1972;18:370–83.
[73] Federgruen A, Zipkin P. Computational issues in an infinite-horizon, multi-echelon inventory model. Operations Research 1984;32:818–36.
[74] Iglehart D, Karlin S. Optimal policy for dynamic inventory process with nonstationary stochastic demands. In: Arrow K, Karlin S, Scarf H, editors. Studies in applied probability and management science. Stanford, CA: Stanford University Press; 1962 [Chapter 8].
[75] Song JS, Zipkin P. Inventory control in a fluctuating demand environment. Operations Research 1993;43:351–70.
[76] Sethi SP, Cheng F. Optimality of (s, S) policies in inventory models with Markovian demand. Operations Research 1997;45:931–9.
[77] Beyer D, Sethi S. Average cost optimality in inventory models with Markovian demands. Journal of Optimization Theory and Applications 1997;92:497–526.
[78] Bensoussan A, Liu RH, Sethi SP. Optimality of an (s, S) policy with compound Poisson and diffusion demands: a QVI approach. SIAM Journal on Control and Optimization 2006;44:1650–76.
[79] Dong L, Lee HL. Optimal policies and approximations for a serial multiechelon inventory system with time-correlated demand. Operations Research 2003;51:969–80.
[80] Heath DC, Jackson PL. Modeling the evolution of demand forecasts with application to safety stock analysis in production/distribution systems. IIE Transactions 1994;26:17–30.
[81] Graves SC, Kletter DB, Hetzel WB. A dynamic model for requirements planning with application to supply chain optimization. Operations Research 1998;46:S35–49.
[82] Gallego G, Ozer O. Integrating replenishment decisions with advance order information. Management Science 2001;47:1344–60.
[83] Gallego G, Ozer O. Optimal replenishment policies for multiechelon inventory problems under advance demand information. Manufacturing & Service Operations Management 2003;5:157–75.
[84] Ozer O, Wei W. Inventory control with limited capacity and advance demand information. Operations Research 2004;52:988–1000.
[85] Sethi SP, Yan H, Zhang H. Peeling layers of an onion: a periodic review inventory model with multiple delivery modes and forecast updates. Journal of Optimization Theory and Applications 2001;108:253–81.
[86] Bensoussan A, Crouhy M, Proth JM. Mathematical theory of production planning. New York, NY: North-Holland; 1983.
[87] Sethi SP, Yan H, Zhang H. Inventory models with fixed costs, multiple delivery modes, and forecast updates. Operations Research 2003;51:321–8.
[88] Feng Q, Gallego G, Sethi SP, Yan H, Zhang H. Periodic-review inventory model with three consecutive delivery modes and forecast updates. Journal of Optimization Theory and Applications 2005;124:137–55.
[89] Simchi-Levi D, Zhao Y. The value of information sharing in a two-stage supply chain with production capacity constraints. Naval Research Logistics 2003;50:888–916.
[90] Olsder GJ, Suri R. Time optimal part-routing in a manufacturing system with failure prone machines. In: Proceedings of the IEEE conference on decision and control, vol. 1, 1980. p. 722–7.
[91] Rishel R. Dynamic programming and minimum principles for systems with jump Markov disturbances. SIAM Journal on Control 1975;13:338–71.
[92] Rishel R. Control of systems with jump Markov disturbances. IEEE Transactions on Automatic Control 1975;20:241–4.
[93] Davis MHA. Markov models and optimization. London: Chapman & Hall; 1993.
[94] Akella R, Kumar PR. Optimal control of production rate in a failure-prone manufacturing system. IEEE Transactions on Automatic Control 1986;31:116–26.
[95] Bielecki T, Kumar PR. Optimality of zero-inventory policies for unreliable manufacturing systems. Operations Research 1988;36:532–41.
[96] Kimemia JG, Gershwin SB. An algorithm for the computer control of production in flexible manufacturing systems. IIE Transactions 1983;15:353–62.
[97] Sethi SP, Soner HM, Zhang Q, Jiang J. Turnpike sets and their analysis in stochastic production planning problems. Mathematics of Operations Research 1992;17:932–50.
[98] Fleming WH, Soner HM. Controlled Markov processes and viscosity solutions. New York: Springer; 1993.
[99] Presman E, Sethi SP, Zhang Q. Optimal feedback production planning in a stochastic N-machine flowshop. Automatica 1995;31:1325–32.
[100] Presman E, Sethi SP, Suo W. Optimal feedback production planning in a stochastic N-machine flowshop with limited buffers. Automatica 1997;33:1899–903.
[101] Sethi SP, Zhou XY. Optimal feedback controls in deterministic dynamic two-machine flowshops. Operations Research Letters 1996;19:225–35.
[102] Gharbi A, Kenne JP. Optimal production control problem in stochastic multiple-product multiple-machine manufacturing systems. IIE Transactions 2003;35:941–52.
[103] Boukas EK, Haurie A. Manufacturing flow control and preventive maintenance: a stochastic control approach. IEEE Transactions on Automatic Control 1990;35:1024–31.
[104] Kushner HJ, Dupuis PG. Numerical methods for stochastic control problems in continuous time. New York, NY: Springer; 1992.
[105] Boukas EK, Yang H. Optimal control of manufacturing flow control and preventive maintenance. IEEE Transactions on Automatic Control 1996;41:881–5.
[106] Boukas EK, Kenne JP, Zhu Q. Age dependent hedging point policies in manufacturing systems. Proceedings of the American Control Conference 1995;3:2178–9.
[107] Kenne JP, Gharbi A, Boukas EK. Control policy simulation based on machine age in a failure prone one-machine, one-product manufacturing system. International Journal of Production Research 1997;35:1431–45.
[108] Kenne JP, Gharbi A. Experimental design in production and maintenance control of a single machine, single product manufacturing system. International Journal of Production Research 1999;37:621–37.
[109] Kenne JP, Gharbi A. Production planning problem in manufacturing systems with general failure and repair time distributions. Production Planning and Control 2000;11:581–8.
[110] Sethi SP, Zhang Q. Hierarchical production and setup scheduling in stochastic manufacturing systems. IEEE Transactions on Automatic Control 1995;40:924–30.
[111] Yan H, Zhang Q. A numerical method in optimal production and set-up scheduling of stochastic manufacturing systems. IEEE Transactions on Automatic Control 1997;42:1452–5.
[112] Boukas EK, Kenne JP. Maintenance and production control of manufacturing systems with setups. Lectures in Applied Mathematics 1997;33:55–70.
[113] Liberopoulos G, Caramanis M. Numerical investigation of optimal policies for production flow control and set-up scheduling: lessons from two-part-type failure-prone FMSs. International Journal of Production Research 1997;35:2109–33.
[114] Bai SX, Elhafsi M. Scheduling an unreliable manufacturing system with non-resumable set-ups. Computers & Industrial Engineering 1997;32:909–25.
[115] Gharbi A, Kenne JP, Hajji A. Operational level-based policies in production rate control of unreliable manufacturing systems with set-ups. International Journal of Production Research 2006;44:545–67.
[116] Feng Y, Yan H. Optimal production control in a discrete manufacturing system with unreliable machines and random demands. IEEE Transactions on Automatic Control 2000;45:2280–96.
[117] Song DP, Sun YX. Optimal service control of a serial production line with unreliable workstations and random demand. Automatica 1998;34:1047–60.
[118] Boukas EK, Liu ZK. Manufacturing systems with random breakdowns and deteriorating items. Automatica 2001;37:401–8.
[119] Sharifnia A. Production control of manufacturing system with multiple machine states. IEEE Transactions on Automatic Control 1988;33:620–5.
[120] Liberopoulos G, Hu JQ. On the ordering of optimal hedging points in a class of manufacturing flow control models. IEEE Transactions on Automatic Control 1995;40:282–6.
[121] Perkins JR, Srikant R. Hedging policies for failure-prone manufacturing systems: optimality of JIT and bounds on buffer levels. IEEE Transactions on Automatic Control 1998;43:953–7.
[122] Sethi SP, Suo W, Taksar MI, Zhang Q. Optimal production planning in a stochastic manufacturing system with long-run average cost. Journal of Optimization Theory and Applications 1997;92:161–88.
[123] Sethi SP, Suo W, Taksar M, Yan H. Optimal production planning in a multi-product stochastic manufacturing system with long-run average cost. Journal of Discrete Event Dynamic Systems: Theory and Applications 1998;8:37–54.
[124] Sethi SP, Zhang H. Average-cost optimal policies for an unreliable flexible multiproduct machine. International Journal of Flexible Manufacturing Systems 1999;11:147–57.
[125] Lehoczky J, Sethi S, Soner HM, Taksar M. An asymptotic analysis of hierarchical control of manufacturing systems under uncertainty. Mathematics of Operations Research 1991;16:596–608.
[126] Kenne JP, Boukas EK. Hierarchical control of production and maintenance rates in manufacturing systems. Journal of Quality in Maintenance Engineering 2003;9:66–82.
[127] Sethi SP, Zhang Q. Hierarchical decision making in stochastic manufacturing systems. Boston, MA: Birkhäuser; 1994.
[128] Sethi SP, Yan H, Zhang H, Zhang Q. Optimal and hierarchical controls in dynamic stochastic manufacturing systems: a survey. Manufacturing & Service Operations Management 2002;4:133–70.
[129] Samaratunga C, Sethi SP, Zhou XY. Computational evaluation of hierarchical production control policies for stochastic manufacturing systems. Operations Research 1997;45:258–74.
[130] Keerthi SS, Gilbert EG. Optimal, infinite-horizon feedback laws for a general class of constrained discrete-time systems: stability and moving-horizon approximations. Journal of Optimization Theory and Applications 1988;57:265–93.
[131] Morari M, Lee JH. Model predictive control: past, present, and future. Computers and Chemical Engineering 1999;23:667–82.
[132] Mayne DQ, Rawlings JB, Rao CV, Scokaert POM. Constrained model predictive control: stability and optimality. Automatica 2000;36:789–814.
[133] Modigliani F, Hohn FE. Production planning over time and the nature of the expectation and planning horizon. Econometrica 1955;23:46–66.
[134] Charnes A, Cooper WW, Mellon B. A model for optimizing production by reference to cost surrogates. Econometrica 1955;23:307–23.
[135] Johnson SM. Sequential production planning over time at minimum cost. Management Science 1957;3:435–7.
[136] Wagner HM, Whitin TM. Dynamic version of the economic lot size model. Management Science 1958;5:89–96.
[137] Sethi SP, Sorger G. A theory of rolling horizon decision making. Annals of Operations Research 1991;29:387–416.
[138] Chand S, Hsu VN, Sethi S. Forecast, solution, and rolling horizons in operations management problems: a classified bibliography. Manufacturing and Service Operations Management 2002;4:25–43.
[139] Kapsiotis G, Tzafestas S. Decision making for inventory/production planning using model-based predictive control. In: Tzafestas S, Borne P, Grandinetti L, editors. Parallel and distributed computing in engineering systems. Amsterdam: Elsevier; 1992. p. 551–6.
[140] Tzafestas S, Kapsiotis G, Kyriannakis E. Model-based predictive control for generalized production planning problems. Computers in Industry 1997;34:201–10.
[141] Perea-López E, Ydstie BE, Grossmann I. A model predictive control strategy for supply chain management. Computers & Chemical Engineering 2003;27:1201–18.
[142] Seferlis P, Giannelos NF. A two-layered optimization-based control strategy for multi-echelon supply chain networks. Computers & Chemical Engineering 2004;28:799–809.
[143] Braun MW, Rivera DE, Flores ME, Carlyle WM, Kempf KG. A model predictive control framework for robust management of multi-product, multi-echelon demand networks. Annual Reviews in Control 2003;27:229–45.
[144] Braun MW, Rivera DE, Carlyle WM, Kempf KG. Application of model predictive control to robust management of multiechelon demand networks in semiconductor manufacturing. Simulation 2003;79:139–56.
[145] Wang W, Rivera DE, Kempf KG, Smith KD. A model predictive control strategy for supply chain management in semiconductor manufacturing under uncertainty. In: Proceedings of the American control conference, 2004. p. 4577–82.
[146] Wang W, Rivera DE, Kempf KG. A novel model predictive control algorithm for supply chain management in semiconductor manufacturing. In: Proceedings of the American control conference, vol. 1, 2005. p. 208–13.
[147] Dunbar WB, Desa S. Distributed model predictive control for dynamic supply chain management. In: Proceedings of the international workshop on assessment and future directions of NMPC, Freudenstadt-Lauterbad, Germany, August 2005.
[148] Dunbar WB, Murray RM. Distributed receding horizon control with application to multi-vehicle formation stabilization. Automatica 2006;42:549–58.
[149] Lin PH, Jang SS, Wong DSH. Predictive control of a decentralized supply chain unit. Industrial Engineering Chemistry Research 2005;44:9120–8.
[150] Yildirim I, Tan B, Karaesmen F. A multiperiod stochastic production planning and sourcing problem with service level constraints. OR Spektrum 2005;27:471–89.
[151] Birge JR, Louveaux F. Introduction to stochastic programming. Springer series in operations research. New York: Springer; 1997.
[152] Bitran GR, Haas EA, Matsuo H. Production planning of style goods with high setup costs and forecast revisions. Operations Research 1986;34:226–36.
[153] Zhou K, Doyle J, Glover K. Robust and optimal control. Upper Saddle River, NJ: Prentice-Hall; 1995.
[154] Basar T, Bernhard P. H-infinity optimal control and related minimax design problems: a dynamic game approach. Boston, MA: Birkhäuser; 1995.
[155] Dullerud GE, Paganini F. A course in robust control theory: a convex approach. New York, NY: Springer; 2000.
[156] Blanchini F, Rinaldi F, Ukovich W. A network design problem for a distribution system with uncertain demands. SIAM Journal on Optimization 1997;7:560–78.
[157] Blanchini F, Rinaldi F, Ukovich W. Least inventory control of multistorage systems with non-stochastic unknown inputs. IEEE Transactions on Robotics and Automation 1997;13:633–45.
[158] Blanchini F, Pesenti R, Rinaldi F, Ukovich W. Feedback control of production-distribution systems with unknown demand and delays. IEEE Transactions on Robotics and Automation 2000;16:313–7.
[159] Bertsekas DP, Rhodes IB. On the minimax reachability of target sets and target tubes. Automatica 1971;7:233–47.
[160] Bertsekas DP. Infinite-time reachability of state-space regions by using feedback control. IEEE Transactions on Automatic Control 1972;17:604–13.
[161] Blanchini F, Miani S, Ukovich W. Control of production-distribution systems with unknown inputs and system failures. IEEE Transactions on Automatic Control 2000;45:1072–81.
[162] Blanchini F, Miani S, Pesenti R, Rinaldi F. Stabilization of multi-inventory systems with uncertain demand and setups. IEEE Transactions on Robotics and Automation 2003;19:103–16.
[163] Blanchini F, Miani S, Rinaldi F. Guaranteed cost control for multi-inventory systems with uncertain demand. Automatica 2004;40:213–23.
[164] Boukas EK, Yang H, Zhang Q. Minimax production planning in failure-prone manufacturing systems. Journal of Optimization Theory and Applications 1995;82:269–86.
[165] Boukas KE, Shi P, Andijani A. Robust inventory-production control problem with stochastic demand. Optimal Control Applications and Methods 1999;20:1–20.
[166] Boukas EK, Shi P, Agarwal RK. An application of robust control technique to manufacturing systems with uncertain processing time. Optimal Control Applications and Methods 2000;21:257–68.
[167] Boyd S, Ghaoui LE, Feron E, Balakrishnan V. Linear matrix inequalities in system and control theory. Philadelphia, PA: SIAM; 1994.
[168] Nesterov Y, Nemirovski A. Interior point polynomial time algorithms. Philadelphia, PA: SIAM; 1994.
[169] Boukas EK, Rodrigues L. Inventory control of switched production systems: LMI approach. In: Boukas EK, Malhamé RP, editors. Analysis, control and optimization of complex dynamic systems. Dordrecht, London: Kluwer Academic Publishers; 2005.
[170] Liberzon D, Morse AS. Basic problems in stability and design of switched systems. Control Systems Magazine 1999;19(5):59–70.
[171] Laumanns M, Lefeber E. Robust optimal control of material flows in demand-driven supply networks. Physica A 2006;363:24–31.
[172] Bemporad A, Borrelli F, Morari M. Min-max control of constrained uncertain discrete-time linear systems. IEEE Transactions on Automatic Control 2003;48:1600–6.
[173] Sutton RS, Barto AG. Reinforcement learning. Cambridge, MA: MIT Press; 1998.
[174] Tesauro G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 1994;6:215–9.
[175] Bertsekas DP, Tsitsiklis JN. Neuro-dynamic programming. Belmont, MA: Athena Scientific; 1996.
[176] Van Roy B, Bertsekas DP, Lee Y, Tsitsiklis JN. A neuro-dynamic programming approach to retailer inventory management. Technical report, Laboratory for Information and Decision Systems. Cambridge, MA: Massachusetts Institute of Technology; 1998.
[177] Patrinos P, Sarimveis H. An RBF based neuro-dynamic approach for the control of stochastic dynamic systems. In: Proceedings of the 16th IFAC world congress, Prague, Czech Republic, 2005.
[178] Nedic A, Bertsekas DP. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications 2003;13:79–110.
[179] Powell WB, Van Roy B. Approximate dynamic programming for high dimensional resource allocation problems. In: Si J, Barto A, Powell WB, Wunsch D, editors. Learning and approximate dynamic programming: scaling up to the real world. New York: Wiley; 2004.
[180] Powell WB, George A, Bouzaiene-Ayari B, Simao H. Approximate dynamic programming for high dimensional resource allocation problems. In: Proceedings of the IJCNN, Montreal, August 2005.
[181] Topaloglu H, Powell WB. Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems. INFORMS Journal on Computing 2006;18:31–42.
[182] Kushner HJ, Yin GG. Stochastic approximation algorithms and applications. New York: Springer; 1997.
[183] Powell WB, Ruszczynski A, Topaloglu H. Learning algorithms for separable approximations of stochastic optimization problems. Mathematics of Operations Research 2004;29:814–36.
[184] Powell WB. Approximate dynamic programming for operations research. Available for download at http://www.castlelab.princeton.edu/Papers.html; 2006.
[185] Bauso D, Giarre L, Pesenti R. Neurodynamic programming for cooperative inventory control. In: Proceedings of the 2004 American control conference, June 30–July 2, Boston, MA, USA, 2004. p. 5527–32.
[186] Bauso D, Giarre L, Pesenti R. Cooperative inventory control. In: Menini L, Zaccarian L, Abdallah CT, editors. Current trends in nonlinear systems and control. Basel: Birkhäuser; 2005.
[187] Olfati Saber R, Murray RM. Consensus protocols for networks of dynamic agents. In: Proceedings of the American control conference, vol. 2, Denver, Colorado, 2003. p. 951–6.