This article presents two hybrid robust process optimization approaches integrating artificial neural networks (ANN) and stochastic optimization formalisms: genetic algorithms (GAs) and simultaneous perturbation stochastic approximation (SPSA). An ANN-based process model was developed solely from process input–output data, and its input space, comprising design and operating variables, was then optimized using either the GA or the SPSA methodology. These methods possess certain advantages over widely used deterministic gradient-based techniques; in particular, they require only objective-function values, not derivatives. The efficacy of the ANN-GA and ANN-SPSA formalisms in the presence of noise-free as well as noisy process data was demonstrated for a representative system involving a nonisothermal CSTR. The case study considered a nontrivial optimization objective which, in addition to the conventional parameter design, also addresses the issue of optimal tolerance design. Comparison of the results with those from a robust deterministic modeling/optimization strategy suggests that the hybrid methodologies can be gainfully employed for process optimization.
Introduction
Conventionally, chemical plant design consists of choosing and sizing appropriate process equipment, as well as fixing the nominal operating points. In this endeavor, deterministic gradient-based optimization techniques that mostly use steady-state process models are utilized. Here, the objective function to be optimized is a suitably chosen cost function (to be minimized) or a profit function (to be maximized). Traditionally, issues such as the choice and design of the process control system are addressed once the nominal operating point is known consequent to the process design activity.
Availability of a process model assumes considerable importance in the process design activity. For a given process, a "first principles (phenomenological)" model can be constructed from the knowledge of mass, momentum, and energy balances, as well as from other chemical engineering principles. Owing to the lack of a good understanding of the underlying physicochemical phenomena, development of phenomenological process models poses considerable difficulties. Moreover, since nonlinear behavior is a common feature of chemical processes, it leads to complex nonlinear models, which in most cases are not amenable to analytical solutions; thus, computationally intensive numerical methods must be utilized for obtaining solutions. The difficulties associated with the construction and solution of phenomenological models necessitate exploration of alternative modeling formalisms. Process identification via empirical models is one such alternative. Such models are mostly discrete-time dynamic models comprising, for instance, Hammerstein and Wiener models, Volterra models, and polynomial autoregressive moving-average models with exogenous inputs (ARMAX) (Henson, 1998). These linear models and their nonlinear counterparts are constructed exclusively from the input–output process data. A fundamental deficiency of the empirical modeling approach is that the model structure (form) must be specified a priori. Satisfying this requirement, especially for nonlinearly behaving processes, is a cumbersome task, since it involves heuristically selecting an appropriate model structure from numerous alternatives.
In recent years, artificial neural networks (ANNs) have been found to be an attractive tool for steady-state/dynamic process modeling and model-based control in situations where the development of the phenomenological or empirical models just described either becomes impractical or cumbersome

Correspondence concerning this article should be addressed to S. S. Tambe.
Present address of S. Ghosh: Chemical Engineering Department, Indian Institute of Technology (IIT), Kharagpur, West Bengal 721 302, India.
the decision variables becomes zero. Thus, gradient computation is an integral feature of such optimization paradigms. Additionally, most gradient-based techniques require the objective function to be smooth, continuous, and differentiable. In the case of an ANN, it is possible to express the nonlinear mapping that it executes in terms of a generic closed-form function. The nonlinear mapping ability of ANNs is due to the nonlinear activation function used for computing the node-specific outputs. To compute an output, the nonlinear activation function uses arguments comprising a number of network parameters (weights) and node-specific inputs. Consequently, the mapping executed by an ANN attains a complex nonlinear character that cannot be guaranteed to simultaneously fulfill the smoothness, continuity, and differentiability criteria for the objective function. This feature of ANN models poses difficulties in using conventional deterministic techniques for optimizing their input space. Hence, formalisms that do not impose stringent conditions on the form of the objective function need to be explored. Stochastic optimization formalisms such as genetic algorithms (GAs) and simultaneous perturbation stochastic approximation (SPSA) are not heavily constrained by the properties of the objective function, and thus they are potential candidates for optimizing an ANN model. An important characteristic of the GA and SPSA methodologies is that they need measurements of the objective function only, and not measurements (or direct calculation) of the gradient (or higher-order derivatives) of the objective function (Spall,

process parameters/variables, such as flow rate, temperature, and pressure.
• Model-inherent uncertainty. Accounts for variations in the phenomenological model parameters representing, for instance, kinetic constants, heat-/mass-transfer coefficients, and physical properties.
• External uncertainty. Considers variations in parameters that are external to the process but influence the process cost (feed stream availability, product demand, pollution/economic indices, etc.).
• Discrete uncertainty. Accounts for equipment availability and other random discrete events.
The conventional deterministic process optimization approach ignores uncertainties, thereby resulting in suboptimal solutions. Uncertainties are capable of influencing, for instance, the product quality and control cost and, therefore, they need to be considered during process design and optimization. Accounting for uncertainties leads to tolerance design, which aims at obtaining the optimal size of the window for each uncertainty-affected process variable/parameter. The best average process performance can be achieved through optimal tolerance design, so long as the process operates within the optimized operating zones.
In a recent article, Bernardo and Saraiva (1998) introduced a novel robust optimization (RO) framework that deals with the optimization objective alluded to earlier. An advantage of the optimal solution given by the RO framework is that it provides the best operating regions for designing a control system. One of the optimization prob-
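The claim that SPSA needs only objective-function measurements can be made concrete with a minimal sketch. The function name `spsa_minimize`, its default values, and the quadratic test objective are illustrative assumptions, not the implementation used in this study. The gain sequences a_k = a/(k + 1 + A)^α and c_k = c/(k + 1)^γ, with α = 0.602 and γ = 0.101, follow the form recommended in Spall's SPSA implementation guidelines. Each iteration perturbs all decision variables simultaneously and uses just two objective evaluations to estimate the whole gradient.

```python
import numpy as np

def spsa_minimize(loss, theta0, n_iter=2000, a=0.1, c=0.1, A=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize `loss` with SPSA, using loss evaluations only (no gradients)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha   # step-size gain sequence
        c_k = c / (k + 1) ** gamma       # perturbation-size gain sequence
        # Perturb all coordinates at once with random +/-1 (Bernoulli) signs.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        y_plus = loss(theta + c_k * delta)
        y_minus = loss(theta - c_k * delta)
        # Two measurements estimate every component of the gradient.
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)
        theta = theta - a_k * g_hat
    return theta
```

For example, `spsa_minimize(lambda x: float(np.sum((x - np.array([1.0, -2.0])) ** 2)), [0.0, 0.0], a=0.5)` moves the estimate toward [1, -2] without ever computing a derivative. In the setting of this article, `loss` would instead be the cost evaluated through the trained ANN model.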
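$$\Phi = [\phi_1, \phi_2, \ldots, \phi_N]^{T}; \qquad \text{(I)}\;\; \hat{\Phi} = \{\Phi : \Phi \in J(\Phi)\} \qquad (9)$$

$$\text{(II)}\;\; \hat{y} = \{y : y \in L(y)\} \qquad (10)$$

$$\cdots\; \frac{1}{N_{obs}} \sum_{j=1}^{N_{obs}} f(\Psi, \Phi_j, W) \;\cdots$$

subject to,

$$f(\Psi, \Phi, W) = y \quad \text{for all } \Phi \in \hat{\Phi};\; y \in \hat{y}. \qquad (13)$$

For computing the estimate of the operating cost, E(C_op), the following expression is used: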
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{l=1}^{N_{pat}} 2E_l}{N_{pat} \times N_O}}, \qquad (16)$$

where l refers to the input pattern index (l = 1, 2, ..., N_pat); N_O denotes the number of output-layer nodes, and E_l is a measure of the sum-of-squares error (SSE), defined as

$$E_l = \frac{1}{2} \sum_{i=1}^{N_O} \left( y_{li} - o_{li} \right)^2, \qquad (17)$$

where y_li denotes the desired output of the ith output node when the lth input pattern is presented to the network, and o_li refers to the corresponding actual output of the network. The task of RMSE minimization is accomplished by "training" the network, wherein a gradient-descent technique, such as the generalized delta rule (GDR) (Rumelhart et al., 1986), is utilized for updating the connection weights.
Network training is an iterative procedure that begins with initializing the weight matrix randomly. A training iteration

measures how well the network is generalizing. Upon training the network over a large number of iterations, the weight matrix resulting in the smallest E_tst magnitude for the test set data is taken to be an optimal weight set. It may, however, be noted that this weight set pertains to the specific number of hidden units (N_H) considered in the network architecture. For a given ANN-based modeling problem, the numbers of nodes in the network's input layer (N_I) and output layer (N_O) are dictated by the input–output dimensionality of the system being modeled. However, the number of hidden units (N_H) is an adjustable structural parameter. If the network architecture contains more hidden units than necessary, the network becomes oversized and, consequently, the model is overparameterized. Such a model, like an overtrained one, gives a poor representation of the trends in the example data. To exclude the possibility of an oversized network, it becomes essential to study the effect of the number of hidden units on the network's function approximation and generalization capabilities. Accordingly, multiple network training simulations are conducted by systematically varying the number of hidden units. These simulations essentially aim at obtaining an optimal network architecture
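Eqs. 16 and 17 translate directly into array code. The NumPy sketch below is illustrative (the function name `rmse` is an assumption); it expects the desired outputs y_li and the actual outputs o_li as arrays of shape (N_pat, N_O).

```python
import numpy as np

def rmse(targets, outputs):
    """RMSE of Eq. 16, built from the per-pattern SSE E_l of Eq. 17.

    targets, outputs: arrays of shape (N_pat, N_O) holding y_li and o_li.
    """
    y = np.asarray(targets, dtype=float)
    o = np.asarray(outputs, dtype=float)
    n_pat, n_o = y.shape
    e_l = 0.5 * np.sum((y - o) ** 2, axis=1)                  # Eq. 17, one value per pattern
    return float(np.sqrt(np.sum(2.0 * e_l) / (n_pat * n_o)))  # Eq. 16
```

Because E_l carries a factor of 1/2, the 2E_l in Eq. 16 cancels it, so this RMSE is simply the root of the mean squared error over all N_pat × N_O outputs. Applying it separately to the training and test patterns gives E_trn and E_tst.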
tion methods requiring only the scalar values of the objective

in the offspring population by randomly flipping bits of the
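The bit-flipping mutation referred to above is one operator of a binary-coded GA. The sketch below is a generic illustration, not the GA configuration used in this study: the binary encoding, binary tournament selection, single-point crossover, elitism, the function names, and every parameter value are assumptions. Like SPSA, it needs only objective-function values.

```python
import numpy as np

def decode(bits, lower, upper, n_bits):
    """Map each n_bits-long binary gene to a real value in [lower, upper]."""
    genes = bits.reshape(len(lower), n_bits)
    weights = 2 ** np.arange(n_bits - 1, -1, -1)
    return lower + (upper - lower) * (genes @ weights) / (2 ** n_bits - 1)

def binary_ga(cost, lower, upper, n_bits=16, pop_size=40, n_gen=200,
              p_cross=0.9, p_mut=0.01, seed=0):
    """Minimize `cost` over a box using a simple binary-coded GA."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_genes = len(lower) * n_bits
    pop = rng.integers(0, 2, size=(pop_size, n_genes))
    best_x, best_f = None, np.inf
    for _ in range(n_gen):
        xs = np.array([decode(ind, lower, upper, n_bits) for ind in pop])
        f = np.array([cost(x) for x in xs])
        i_best = int(np.argmin(f))
        if f[i_best] < best_f:
            best_f, best_x = float(f[i_best]), xs[i_best].copy()
        elite = pop[i_best].copy()
        # Binary tournament: the lower-cost of two random individuals is selected.
        a = rng.integers(0, pop_size, pop_size)
        b = rng.integers(0, pop_size, pop_size)
        parents = pop[np.where(f[a] < f[b], a, b)]
        # Single-point crossover on consecutive parent pairs.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n_genes)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Mutation: randomly flip bits of the offspring population.
        flips = rng.random(children.shape) < p_mut
        children = np.where(flips, 1 - children, children)
        children[0] = elite  # elitism: keep the best string of this generation
        pop = children
    return best_x, best_f
```

Because the outcome depends on the random number sequence, several runs with different `seed` values would normally be compared, as is done for the GA and SPSA results in this article.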
kJ/min, CA0 = [3,000–4,000] mol/m3, CB0 = [30–600] mol/m3, and T0 = [300–320] K. Using these ranges, 50 random combinations of the CSTR's design and operating variables were generated, and for each combination the corresponding steady-state value of the quality variable, y, was computed. The data set comprising design and operating variables forms the network's input space, and the corresponding y values represent the network's desired (target) output space. After normalizing and partitioning these data into the training set (40 patterns) and the test set (10 patterns), an optimal MLP network model was developed in accordance with the three-step network training procedure described earlier. In the MLP training simulations, the sigmoid transfer function was used for computing the outputs of the hidden- and output-layer nodes. The optimal MLP architecture obtained thereby has six input nodes, two hidden nodes, and one output node (N_I = 6, N_H = 2, N_O = 1); the corresponding values of the learning rate and momentum coefficient (α_m) were 0.7 and 0.01, respectively. An MLP network with good function approximation and generalization abilities yields small and comparable RMSE values for both the training set (E_trn) and the test set (E_tst). In the case of the MLP-based CSTR model, the E_trn and E_tst magnitudes were 0.0061 and 0.0063, respectively. Additionally, values of the coefficient of
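The 6-2-1 sigmoid MLP and its GDR-with-momentum training can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: pattern-by-pattern weight updates, explicit bias nodes, uniform(-0.5, 0.5) weight initialization, and the function names are all assumptions. Only the architecture (N_I = 6, N_H = 2, N_O = 1), the sigmoid transfer function, and the values 0.7 and 0.01 for the learning rate and momentum coefficient come from the text.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, Y, n_hidden=2, eta=0.7, alpha_m=0.01, n_epochs=2000, seed=0):
    """Pattern-wise GDR (backpropagation) with momentum for a sigmoid MLP.

    X: (N_pat, N_I) normalized inputs; Y: (N_pat, N_O) targets in (0, 1).
    Returns input-to-hidden (W1) and hidden-to-output (W2) weights; the
    last row of each matrix holds the bias weights.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.uniform(-0.5, 0.5, size=(n_in + 1, n_hidden))
    W2 = rng.uniform(-0.5, 0.5, size=(n_hidden + 1, n_out))
    dW1_prev, dW2_prev = np.zeros_like(W1), np.zeros_like(W2)
    for _ in range(n_epochs):
        for l in rng.permutation(len(X)):
            x = np.append(X[l], 1.0)             # inputs plus bias node
            h = np.append(sigmoid(x @ W1), 1.0)  # hidden outputs plus bias node
            o = sigmoid(h @ W2)                  # actual outputs o_li
            # Error signals (deltas); the sigmoid derivative is s * (1 - s).
            d_out = (Y[l] - o) * o * (1.0 - o)
            d_hid = (W2[:-1] @ d_out) * h[:-1] * (1.0 - h[:-1])
            # GDR step: eta * delta * input, plus momentum times previous step.
            dW2 = eta * np.outer(h, d_out) + alpha_m * dW2_prev
            dW1 = eta * np.outer(x, d_hid) + alpha_m * dW1_prev
            W2 += dW2
            W1 += dW1
            dW1_prev, dW2_prev = dW1, dW2
    return W1, W2

def predict(X, W1, W2):
    """Forward pass for a batch of patterns."""
    ones = np.ones((X.shape[0], 1))
    H = sigmoid(np.hstack([X, ones]) @ W1)
    return sigmoid(np.hstack([H, ones]) @ W2)
```

In this setting X would hold the 40 normalized training patterns of the six design and operating variables and Y the normalized y values; repeating the training for several values of `n_hidden` and comparing test-set RMSEs mirrors the architecture search described above.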
rigorous penalty function approach described earlier (Eq. 27) can be used for handling constraints (Wang and Spall, 1999). It was observed during implementation of the SPSA methodology that the proper choice of the SPSA parameters, namely A, r, and Z, is a prerequisite to successful convergence. For a judicious selection of these parameters, the reader may refer to the guidelines provided in Spall (1998a). The results of the SPSA-based CSTR optimization are presented in column 2 of Table 2, where it is seen that the SPSA-minimized annual total cost ($13,900.58/yr) is nearly equal to that given by GA-based optimization ($13,853.47/yr). However, the control and quality cost values corresponding to the GA- and SPSA-based solutions differ significantly. A high quality cost results when (1) the mean of the quality variable (ȳ) deviates significantly from its desired magnitude, and/or (2) the corresponding standard deviation (σ_y) is high (see Eq. 12). In the GA-based optimization results, the σ_y value (11.48) is higher than the corresponding SPSA-based value (8.54). As a result, the product quality will exhibit greater variability, eventually leading to a higher quality cost. In the case of the SPSA-based solution, the control cost has a higher magnitude ($2,597.86/yr) than in the GA-based solution ($866.02/yr). By definition, the control cost is inversely proportional to the tolerance values (refer to Appendix B, Eq. B4), since smaller tolerances necessitate stricter process control, thereby increasing the cost of control. This can be verified from the tolerance values corresponding to the SPSA-based solution: the optimized tolerances 0.049, 0.02, and 0.004 for the process variables Q, CA0, and T0 are smaller than those optimized by the GA (0.059, 0.031, and 0.005). Consequently, the control cost assumes a higher value ($2,597.86/yr compared to $866.02/yr).

CSTR optimization in the presence of noisy process data
Sensors monitoring process variables and parameters often generate noisy measurements. Consequently, the mean recorded value of the noise-corrupted steady-state measurements may show a positive or negative deviation from its true mean (nominal set point). The deviation magnitude, which is variable/parameter-specific, is likely to vary from one run to another. This situation differs from the process uncertainties, which are caused by random physical variations in the process variables/parameters.
For the present case study, we consider a scenario wherein the steady-state values of all the monitored process variables are corrupted with noise obeying the Gaussian PDF. Accordingly, the steady-state values of the CSTR's design, operating, and quality variables obtained earlier by solving the phenomenological model were corrupted using Gaussian noise. The extent of measurement noise in each variable was assumed to lie within a ±5% tolerance limit. Letting μ_l be the true steady-state value (nominal set point) of the lth process input/output variable, the corresponding standard deviation σ_l, required for generating the noisy measurements, was computed as

$$\sigma_l = \frac{0.05 \times \mu_l}{3.09}. \qquad (28)$$

All seven elements of the 50 patterns representing the CSTR's noise-free steady-state input–output data set were randomly corrupted using variable-specific Gaussian mean (μ_l) and standard deviation (σ_l) values. Specifically, a time-series sequence comprising one thousand noisy measurements was generated for each pattern element. The sequence obtained thereby was denoised using a nonlinear noise-reduction algorithm (Kantz and Schreiber, 1997), and the resulting sequence was averaged. The database obtained thereby consists of 50 patterns representing noise-filtered steady-state values of the CSTR's seven input–output variables. It is worth pointing out that even after noise filtration, the resultant steady-state values contain a small amount of residual noise. For creating the training and test sets, the noise-filtered steady-state database was normalized and partitioned in a 4:1
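Eq. 28 and the sampling procedure above can be sketched as follows. The constant 3.09 is the standard normal quantile for a cumulative probability of about 0.999, so with σ_l = 0.05μ_l/3.09 roughly 99.8% of the simulated readings fall within ±5% of μ_l. The sketch draws 1,000 Gaussian readings per pattern element and averages them; the function name and the example values are hypothetical, and it deliberately omits the nonlinear noise-reduction step (Kantz and Schreiber, 1997) that the study applies before averaging.

```python
import numpy as np

def simulate_noisy_pattern(mu, n_samples=1000, tol=0.05, z=3.09, seed=0):
    """Corrupt true steady-state values mu with Gaussian noise and average.

    Returns the averaged readings and the per-variable sigma of Eq. 28.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = tol * np.abs(mu) / z   # Eq. 28 (abs() keeps sigma non-negative)
    # One time series of n_samples readings per pattern element.
    readings = rng.normal(mu, sigma, size=(n_samples,) + mu.shape)
    # The study denoises each series before averaging; omitted in this sketch.
    return readings.mean(axis=0), sigma

averaged, sigma = simulate_noisy_pattern([600.0, 310.0])
```

Here 600.0 and 310.0 are placeholder values standing for a quality-variable reading (mol/min) and a feed temperature (K). Applying the function to every pattern yields a noisy counterpart of the 50-pattern data set.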
varying the magnitude of a design or an operating variable while maintaining the values of the remaining ten decision variables at their optimum. The six panels in Figure 5 depict the effect of variations in V, F, Q, CA0, CB0, and T0 on the values of C_yr and ȳ. All six C_yr profiles exhibit a single minimum, and the GA-searched solution always lies at the valley bottom. In view of the efforts made toward locating the deepest (local or global) minimum, it can thus be inferred that the GA was successful in fulfilling the objective.

Discussion
Upon examining the solutions given by the hybrid methodologies (listed in Table 2), it is observed that the mean values of the quality variable (599.12, 601.39, 599.85, and 600.18) are very close to their desired magnitude (600 mol/min). Compared with the RO solution (Table 2, column 5) obtained by Bernardo and Saraiva (1998), the annual plant costs of the GA-based ($13,853.47/yr, $13,853.81/yr) and SPSA-based ($13,900.58/yr, $14,319.21/yr) solutions are a few percent (about 2.7% to 5.9%) lower than the corresponding RO value ($14,716/yr). This reduction of C_yr was brought about either by a reduction in the control cost, C_c (see column 1 of Table 2), or by a reduction in the quality cost, C_q (see columns 2, 3, and 4 of Table 2). The standard deviations (σ_y) of the quality variable for the solutions given by the GA and SPSA methodologies, 11.48, 8.54, 11.26, and 7.0, are smaller than the corresponding RO value of 16.8, although the respective ȳ values deviate marginally from their desired magnitude of 600 mol/min. These results suggest that, for the CSTR, there exists a trade-off between the mean and the standard deviation of the quality variable. The nature of this trade-off can be understood from Figure 6, wherein the PDFs of the quality variable y are plotted. In the figure, the PDFs drawn with dashed lines correspond to the solutions given by the GA and SPSA methods, whereas the PDF drawn with a continuous line refers to the RO solution. The figure shows that implementing the GA/SPSA-based solutions will result in ȳ values that are marginally different from the desired value of 600 mol/min. On the other hand, implementing the RO solution will result in a ȳ value exactly equal to 600 mol/min. This, however, will be achieved at the cost of more widely spread steady-state values of the quality variable.

Figure 6. Comparison of PDFs pertaining to the quality variable, y. (a) GA-optimized solution using noise-free data (ȳ = 599.12, σ_y = 11.48); (b) SPSA-optimized solution using noise-free data (ȳ = 601.39, σ_y = 8.54); (c) GA-optimized solution corresponding to the noisy process data (ȳ = 599.85, σ_y = 11.26); (d) SPSA-optimized solution corresponding to the noisy process data (ȳ = 600.18, σ_y = 7.0); and (e) RO framework solution (Bernardo and Saraiva, 1998) (ȳ = 600.0, σ_y = 16.8).

A peculiar feature of the GA and SPSA techniques, shared by most stochastic methods, is that the obtained solution is influenced by the random number sequence used during their implementation. Multiple optimization runs, each using a different random number sequence (obtained by changing the random number generator seed), were therefore performed to obtain an overall optimal solution. The CPU times consumed by the GA/SPSA methodologies (last row of Table 2) show that the SPSA procedure consumes less time (47 and 54 s) than the GAs (80.3 and 85.5 s). These values also suggest that implementing the hybrid formalisms is not computationally burdensome, even if multiple runs need to be performed. In the case of GA-based optimization, it took 10–15 runs to arrive at the overall optimal solutions reported in Table 2, although it was noticed that the converged solution

Conclusions
To summarize, this article presents two process optimization strategies combining an ANN-based process model with stochastic optimization formalisms, namely, GA and SPSA. The principal advantage of using neural networks for process modeling is that the model can be developed exclusively from process input–output data without invoking process phenomenology. Once an ANN model is built, its input space comprising process input variables is optimized using the GA and SPSA techniques. These optimization paradigms possess positive characteristics, such as: (1) only objective function measurements (and not measurements of objective function derivatives) are needed in their optimization procedures, and (2) the paradigms can tolerate noisy objective functions. It is necessary to point out at this juncture that the magnitudes of various algorithmic parameters utilized in the development of the ANN models and the implementation of the GA/SPSA methodologies are problem-specific and, except for a few (for instance, some of the SPSA algorithm parameters), must be selected heuristically. Notwithstanding this fact, development of ANN-based process models is still an easier and more cost-effective task than the development of phenomenological models. The efficacy of the ANN-GA and ANN-SPSA formalisms has been demonstrated by considering a nontrivial optimization objective which, in addition to parameter design, also addresses the issue of tolerance design. Thus, the ANN-model-based mathematical framework required for fulfilling the stated optimization objective has been formulated. A case study involving a CSTR has been conducted to validate the optimization performance of the ANN-GA and ANN-SPSA strategies; the optimization objective considered was minimization of the CSTR's total annual cost. In the case study, two ANN models were developed using noise-free and noisy steady-state process data, and both models were observed to possess closely comparable data-fitting and generalization abilities. The input space of the ANN models, consisting of the CSTR's design and operating variables, was then optimized using the GA and SPSA methods; the tolerances associated with the operating variables were simultaneously optimized. The solutions obtained thereby compare very favorably with that given by a robust deterministic optimization formalism. The ANN-GA and ANN-SPSA approaches presented here are sufficiently general, and therefore can be employed for all kinds of process design and optimization problems. These strategies become considerably simpler to implement when the optimization objective involves only parameter design. In that case, tolerances defining operating windows need not be determined, thereby avoiding the use of a sampling technique and the associated numerical computations.

Acknowledgment
One of the authors (S.N.) thanks the Council of Scientific and Industrial Research (CSIR), Government of India, New Delhi, for a Junior Research Fellowship.
$$C_{eqp}\,(\$/\mathrm{yr}) = 4{,}199.55\,(V)^{0.6227} \qquad (\mathrm{B1})$$

2. The operating cost (C_op) includes the utility cost (C_util) and the pumping cost (C_pump). The C_util value is calculated using the heat recovery rate, Q (J/min), and Q_N (= 2.54 × 10^7 J/min), according to

$$C_{util}\,(\$/\mathrm{yr}) = 1.145\left[7{,}896 - 6{,}327\,(Q/Q_N) + 4.764\times10^{4}\,(Q/Q_N)^{2} - 1.022\times10^{4}\,(Q/Q_N)^{4}\right], \qquad (\mathrm{B2})$$

and C_pump is evaluated using the flow rate (F, m3/min), as given by

$$C_{pump}\,(\$/\mathrm{yr}) = 13.8831\,(264.2\,F)^{0.8050}. \qquad (\mathrm{B3})$$

(B4)

where μ_n and σ_n refer to the mean and standard deviation of the PDF pertaining to the nth operating variable.

4. The quality cost has been computed using the Taguchi loss function (Taguchi, 1986), given as

$$C_q\,(\$/\mathrm{yr}) = k_l\left[(\bar{y} - y^{*})^{2} + \sigma_y^{2}\right], \qquad (\mathrm{B5})$$

where ȳ and σ_y denote the mean and standard deviation, respectively, of N_obs quality variable values, {y}, obtained using the ANN-based CSTR model; y* refers to the desired value (600 mol/min) of the quality variable, y; and k_l (= 6.536) is the loss coefficient.

Manuscript received Jan. 5, 2000, and revision received June 7, 2000.
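Eq. B5 requires ȳ and σ_y estimated from N_obs model evaluations over the operating windows. The sketch below assumes that the operating variables are sampled from independent Gaussian PDFs (means μ_n, standard deviations σ_n) and that `model` is any callable standing in for the trained ANN; the function names, the stand-in model, and the use of the population standard deviation are assumptions, not details given in the text.

```python
import numpy as np

K_LOSS = 6.536   # loss coefficient k_l of Eq. B5
Y_STAR = 600.0   # desired quality value y*, mol/min

def quality_cost(y_samples, k_l=K_LOSS, y_star=Y_STAR):
    """Taguchi expected loss, Eq. B5: k_l * [(y_bar - y*)^2 + sigma_y^2]."""
    y = np.asarray(y_samples, dtype=float)
    y_bar = y.mean()
    sigma_y = y.std()  # population standard deviation (an assumption)
    return k_l * ((y_bar - y_star) ** 2 + sigma_y ** 2)

def sample_quality(model, mu, sigma, n_obs=1000, seed=0):
    """Evaluate model at n_obs operating points drawn from N(mu, sigma)."""
    rng = np.random.default_rng(seed)
    phis = rng.normal(mu, sigma, size=(n_obs, len(mu)))
    return np.array([model(phi) for phi in phis])

# Hypothetical stand-in for the ANN: y rises by 100 per unit of phi[0].
toy_model = lambda phi: 600.0 + 100.0 * (phi[0] - 1.0)
y_obs = sample_quality(toy_model, mu=[1.0], sigma=[0.1])
print(quality_cost(y_obs))
```

Narrowing an operating window (smaller σ_n) lowers σ_y and therefore C_q, while Eq. B4 makes the control cost rise as tolerances shrink; this is the trade-off visible in the GA- and SPSA-based solutions.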