
Approximation models in optimization functions

Alan Díaz Manríquez


Abstract Nowadays, most real-world problems require the optimization of one or more objectives, and different techniques have been used to solve such problems. Evolutionary algorithms have shown flexibility, adaptability and good performance in this class of problems. Their main disadvantage is that they require too many evaluations of the fitness function to achieve acceptable results. Therefore, when the fitness function is computationally expensive (e.g., simulations of engineering problems) the optimization process becomes prohibitive. To reduce this cost, surrogate models, also known as metamodels, are constructed and used in place of the real fitness function. There is a wide variety of work on this class of models; however, most existing work does not justify the choice of the metamodel. There are studies in the literature that compare different techniques for creating metamodels, but most compare only two techniques, or take only the accuracy of the metamodel as the point of comparison. A major problem of surrogate models is scalability: most of them perform well for a relatively low number of variables but poorly for high dimensionalities. In this work, we compare four metamodeling techniques (polynomial approximation models, Kriging, radial basis functions, and support vector regression), taking different aspects into account to measure their performance on six scalable optimization problems that represent different classes of problems. The objective of this study is to investigate the advantages and disadvantages of the metamodeling techniques on different test problems, measuring performance based on multiple aspects, including scalability.

1 Introduction

In recent years, Evolutionary Algorithms (EAs) have been applied with great success to complex optimization problems. The main advantage of EAs lies in their ability to locate solutions close to the global optimum. However, for many real-world problems, the number of calls to the objective function needed to locate a near-optimal solution may be too high. In many science and engineering problems, researchers make heavy use of computer simulation codes in order to replace expensive physical experiments and improve the quality and performance of engineered products and devices. For example, Computational Fluid Dynamics (CFD), Computational Electro-Magnetics (CEM) and Computational Structural Mechanics (CSM) solvers have been shown to be very accurate. Such simulations are often very expensive computationally: a simulation can take several minutes, hours, days or even weeks. Hence, in many real-world optimization problems, the number of objective function evaluations needed to obtain a good solution dominates the optimization cost, that is, the optimization process is taken up by runs of the computationally expensive analysis codes. In order to obtain efficient optimization algorithms, it is crucial to use the information gained during the optimization process. Conceptually, a natural approach to utilizing this information is to build a model of the fitness function to assist in the selection of candidate solutions for evaluation. A variety of techniques for constructing such models, often also referred to as surrogates, metamodels or approximation models, have been considered for computationally expensive optimization problems. There is a variety of work on such problems; however, most existing work does not justify the choice of the metamodel or does not compare different surrogate models. In this work, we focus on a study of surrogate models, evaluating different aspects, in order to choose the use of a metamodel correctly, depending on the interests of the algorithm. The remainder of this work is organized as follows. We begin with a brief overview of surrogate modeling techniques commonly used in the literature. Section 3 presents an overview of the state of the art of evolutionary algorithms with surrogate models. Section 4 presents the proposed methodology for the comparison of the metamodeling techniques. Experimental results obtained on synthetic test problems and the discussion of results are presented in Section 5. Finally, Section 6 summarizes our main conclusions.

2 Background

A surrogate model is a mathematical model that mimics the behavior of a computationally expensive simulation code over the complete parameter space as accurately as possible, using as few data points as possible. There is a variety of techniques for creating metamodels: rational functions, radial basis functions, artificial neural networks, Kriging models, support vector machines, splines, and polynomial approximation models. The following are the most common approaches to constructing approximate models based on learning and interpolation from the known fitness values of a small population, also known as metamodeling techniques.

2.1 Polynomial approximation models

The response surface methodology (RSM) approximation is one of the most well-established metamodeling techniques. This methodology employs the statistical techniques of regression analysis and analysis of variance in order to obtain minimum variances of the responses. For most response surfaces, the functions used for the approximation are polynomials because of their simplicity, although other types of functions are, of course, possible. In general, a polynomial in the coded inputs $x_1, x_2, \ldots, x_k$ is a function which is a linear aggregate (or combination) of powers and products of the $x$'s. A term in the polynomial is said to be of order $j$ (or degree $j$) if it contains the product of $j$ of the $x$'s. A polynomial is said to be of order $d$, or degree $d$, if the term(s) of highest order in it is (are) of order or degree $d$. The response surface for $d = 2$, $k = 2$, where $x_1$ and $x_2$ denote two coded inputs, is described as follows:

$$\hat{y}^{(p)} = \beta_0 + \left(\beta_1 x_1^{(p)} + \beta_2 x_2^{(p)}\right) + \left(\beta_{11} x_1^{(p)} x_1^{(p)} + \beta_{22} x_2^{(p)} x_2^{(p)} + \beta_{12} x_1^{(p)} x_2^{(p)}\right) \qquad (1)$$

In expression (1), $\hat{y}^{(p)}$ is the response and the coefficients $\beta$ are (empirical) parameters which, in practice, have to be estimated from the data. The polynomial model is written in matrix notation as:

$$\hat{y}^{(p)} = \beta^T x^{(p)} \qquad (2)$$

where $\beta$ is the vector of coefficients to be estimated, and $x^{(p)}$ is the vector corresponding to the form of the $x_1^{(p)}$ and $x_2^{(p)}$ terms in the polynomial model. As seen from Table 1, the number of parameters increases rapidly as the number $k$ of input variables and the degree $d$ of the polynomial are increased.

Table 1: Number of coefficients in polynomials of degree d involving k inputs

Number of inputs k | d = 1 (Planar) | d = 2 (Quadratic) | d = 3 (Cubic) | d = 4 (Quartic)
2                  | 3              | 6                 | 10            | 15
3                  | 4              | 10                | 20            | 35
4                  | 5              | 15                | 35            | 70
5                  | 6              | 21                | 56            | 126
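Note that these counts correspond to the binomial coefficient $\binom{k+d}{d}$: for example, for $k = 5$ inputs and degree $d = 4$, $\binom{9}{4} = 126$, matching the last entry of Table 1.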

To estimate the unknown coefficients of the polynomial model, both the least squares method (LSM) and the gradient method can be used, but either of them requires at least as many samples of the real objective function as there are coefficients in order to obtain good results. The PRS can be constructed by a full regression or by a stepwise regression; the basic procedure for stepwise regression involves (1) identifying an initial model, (2) iteratively stepping, that is, repeatedly altering the model of the previous step by adding or removing a predictor variable in accordance with the stepping criteria, and (3) terminating the search when stepping is no longer possible given the stepping criteria, or when a specified maximum number of steps has been reached. The principal advantages of PRS are that the fit of the approximated response surface can be evaluated using powerful statistical tools and that response surfaces with minimum variance can be obtained using design of experiments with a small number of experiments. In practice, we can often proceed by supposing that, over limited regions of the factor space, a polynomial of only first or second degree might adequately represent the true function. Higher-order polynomials can be used; however, instabilities may arise [1], or it may be too difficult to take sufficient sample data to estimate all of the coefficients in the polynomial equation, particularly in large dimensions. In this work, second-degree PRS models are considered, and the PRS code used is the one reported in [23].
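To make the construction concrete, the following is a minimal sketch of a full second-degree PRS fitted by ordinary least squares (written in Python with NumPy; the function names are ours for illustration and this is not the toolbox of [23]):

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_design_matrix(X):
    """Design matrix of a full second-degree polynomial:
    constant term, all linear terms, and all quadratic/interaction terms."""
    n, k = X.shape
    cols = [np.ones(n)]                          # beta_0
    cols += [X[:, i] for i in range(k)]          # linear terms
    for i, j in combinations_with_replacement(range(k), 2):
        cols.append(X[:, i] * X[:, j])           # quadratic and cross terms
    return np.column_stack(cols)

def fit_prs(X, y):
    """Estimate the coefficients by ordinary least squares (Eq. 2)."""
    A = quadratic_design_matrix(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_prs(beta, X):
    return quadratic_design_matrix(X) @ beta
```

A stepwise variant would start from such a model and add or remove columns of the design matrix according to the stepping criteria described above.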

2.2 Kriging

Kriging is a spatial prediction method based on minimizing the mean squared error. It belongs to the group of geostatistical methods, describes the spatial and temporal correlation between the values of an attribute, and is named in honor of D. G. Krige, a South African engineer who developed an empirical method to determine the distribution of gold deposits based on samples of them. The DACE model (Design and Analysis of Computer Experiments) is a parametric regression model developed by Sacks et al. [20] using the Kriging approach; since Kriging has often been used in geostatistical situations with only two or three dimensions, there is no obvious way to estimate the semivariogram for high-dimensional inputs. The DACE model can be expressed as a combination of a known function $a(x)$ (e.g., a polynomial function, trigonometric series, etc.) and a Gaussian random process $b(x)$:

$$y(x) = a(x) + b(x) \qquad (3)$$

The Gaussian random process $b(x)$ is assumed to have mean zero and covariance:

$$E\left(b(x^{(i)}), b(x^{(j)})\right) = \mathrm{Cov}\left(b(x^{(i)}), b(x^{(j)})\right) = \sigma^2 R(\theta, x^{(i)}, x^{(j)}) \qquad (4)$$

where $\sigma^2$ is the process variance of the response and $R(\theta, x^{(i)}, x^{(j)})$ is the correlation model with parameters $\theta$. Table 2 shows different types of correlation models.
Table 2: The correlation functions have the form $R(\theta, w, x) = \prod_{j=1}^{n} R_j(\theta, w_j - x_j)$, with $d_j = w_j - x_j$

Name (Figure 1 panel) | $R_j(\theta, d_j)$
Exponential (a)       | $\exp(-\theta_j |d_j|)$
Gaussian (b)          | $\exp(-\theta_j d_j^2)$
Linear (c)            | $\max\{0,\ 1 - \theta_j |d_j|\}$
Spherical (d)         | $1 - 1.5\xi_j + 0.5\xi_j^3$, with $\xi_j = \min\{1, \theta_j |d_j|\}$
Cubic (e)             | $1 - 3\xi_j^2 + 2\xi_j^3$, with $\xi_j = \min\{1, \theta_j |d_j|\}$
Spline (f)            | $1 - 15\xi_j^2 + 30\xi_j^3$ for $0 \le \xi_j \le 0.2$; $1.25(1 - \xi_j)^3$ for $0.2 < \xi_j < 1$; $0$ for $\xi_j \ge 1$, with $\xi_j = \theta_j |d_j|$

It is common to choose the correlation function as a decreasing function of the distance between two points; thus, two points close together will have a small distance and a high correlation. In Figure 1, note that in all cases the correlation decreases with $|d_j|$, and a larger $\theta_j$ leads to a faster decrease. For the set $S$ of design sites (training set) we have the vector of responses $F$:

$$F = [f(s_1), f(s_2), \ldots, f(s_m)]^T \qquad (5)$$

Further, define $R$ as the matrix of stochastic-process correlations between the $z$'s at the design sites,

$$R_{ij} = R(\theta, s_i, s_j), \quad i, j = 1, \ldots, m \qquad (6)$$

At an untried point $x$, let

$$r(x) = [R(\theta, s_1, x), \ldots, R(\theta, s_m, x)]^T \qquad (7)$$

be the vector of correlations between the $z$'s at the design sites and $x$. Now, for the sake of convenience, consider the linear predictor:

$$\hat{y}(x) = c^T Y \qquad (8)$$

The error is:

$$\hat{y}(x) - y(x) = c^T Y - y(x) = c^T (F\beta + Z) - (f(x)^T \beta + z) = c^T Z - z + (F^T c - f(x))^T \beta \qquad (9)$$

where $Z = [z_1, \ldots, z_m]$ are the errors at the design sites. To keep the predictor unbiased we demand that $F^T c - f(x) = 0$, or

$$F^T c = f(x) \qquad (10)$$

Under this condition, the mean squared error (MSE) of the predictor (8) is:

$$\mathrm{MSE} = E\left[(\hat{y}(x) - y(x))^2\right] \qquad (11)$$

Using Lagrange multipliers, the MSE can be minimized, and the DACE predictor can be written as:

$$\hat{y}(x) = f(x)^T \hat{\beta} + r^T(x) R^{-1} (Y - F\hat{\beta}), \quad \hat{\beta} = (F^T R^{-1} F)^{-1} F^T R^{-1} Y, \quad \hat{\sigma}^2 = \frac{1}{m} (Y - F\hat{\beta})^T R^{-1} (Y - F\hat{\beta}) \qquad (12)$$

where $r(x)$ is the vector of correlations between the $z$'s at the design sites and $x$, and $R$ is the matrix of stochastic-process correlations between the $z$'s at the design sites. The values of $\hat{\beta}$ and $\hat{\sigma}^2$ depend on the value of $\theta_j$. The parameter $\theta_j$ can be found with the maximum likelihood method, i.e., by maximizing the expression:

$$-\frac{1}{2}\left[m \ln \hat{\sigma}^2 + \ln |R|\right] \qquad (13)$$

The principal disadvantage of Kriging is that the model construction can be very time-consuming; moreover, estimating the $\theta$ parameters is an $n$-dimensional optimization problem ($n$ being the number of variables in the design space), which can be computationally expensive to solve. In this work the KRG code used is the one reported in [15].

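As an illustration of Equation 12, the sketch below implements a constant-trend (ordinary) Kriging predictor with the Gaussian correlation of Table 2 and a fixed θ (Python with NumPy; the names, the nugget value and the fixed θ are our own simplifying assumptions — in practice θ is obtained by maximizing Equation 13):

```python
import numpy as np

def gaussian_corr(A, B, theta):
    """R(theta, a, b) = prod_j exp(-theta_j * (a_j - b_j)^2)."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-(d2 * theta).sum(axis=2))

def fit_kriging(S, Y, theta):
    """Constant-trend (ordinary) Kriging: estimate beta and precompute R^{-1}(Y - F*beta)."""
    m = S.shape[0]
    R = gaussian_corr(S, S, theta) + 1e-10 * np.eye(m)   # small nugget for numerical stability
    F = np.ones((m, 1))                                   # constant regression (a(x) = beta)
    Rinv_Y = np.linalg.solve(R, Y)
    Rinv_F = np.linalg.solve(R, F)
    beta = (F.T @ Rinv_Y).item() / (F.T @ Rinv_F).item()  # Eq. (12), middle term
    return {"S": S, "theta": theta, "beta": beta,
            "Rinv_res": np.linalg.solve(R, Y - beta)}

def predict_kriging(model, X):
    """y_hat(x) = beta + r(x)^T R^{-1} (Y - F*beta)  -- Eq. (12), first term."""
    r = gaussian_corr(X, model["S"], model["theta"])
    return model["beta"] + r @ model["Rinv_res"]
```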
Figure 1: Correlation functions $R_j(\theta, d_j)$ plotted against $d_j$ for $\theta = 0.2$, $1$ and $5$: (a) Exponential, (b) Gaussian, (c) Linear, (d) Spherical, (e) Cubic, (f) Spline.

2.3 Radial Basis Function Network

A radial basis function network (RBFN) is an artificial neural network that uses radial basis functions (RBFs) as activation functions. Its output is a linear combination of radial basis functions. RBFNs are used in function approximation, time series prediction, and control. RBFs were first introduced by R. Hardy in 1971 [10]. An RBF is a real-valued function whose value depends only on the distance from the origin, so that $\phi(x) = \phi(\|x\|)$, or alternatively on the distance from some other point $c$, called a center, so that $\phi(x, c) = \phi(\|x - c\|)$. Any function that satisfies the property $\phi(x) = \phi(\|x\|)$ is a radial function. The norm is usually the Euclidean distance, although other distance functions are also possible:

$$\|x\| = \sqrt{\sum_{i=1}^{d} x_i^2} = \text{distance of } x \text{ to the origin} \qquad (14)$$

Typical choices for the RBF include linear splines, cubic splines, multiquadric splines, thin-plate splines and Gaussian functions, as shown in Table 3.

Table 3: Radial basis functions, $r = \|x - c_i\|$

Type of RBF          | Function
Linear splines       | $|r|$
Cubic splines        | $|r|^3$
Multiquadric splines | $\sqrt{1 + (\epsilon r)^2}$
Thin-plate splines   | $|r|^2 \ln |r|$
Gaussian             | $\exp(-(\epsilon r)^2)$

An RBFN typically has three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer (Figure 2). The output $\phi : \mathbb{R}^n \to \mathbb{R}$ of the network is thus:

$$\phi(x) = \sum_{i=1}^{N} w_i \, \rho(\|x - c_i\|) \qquad (15)$$

where $N$ is the number of neurons in the hidden layer, $c_i$ is the center vector for neuron $i$, and $w_i$ are the weights of the linear output neuron. In the basic form, all inputs are connected to each hidden neuron. The norm is typically taken to be the Euclidean distance and the basis function is taken to be Gaussian. RBF networks are universal approximators on a compact subset of $\mathbb{R}^n$: an RBF network with enough hidden neurons can approximate any continuous function with arbitrary precision. In an RBF network there are three types of parameters that need to be chosen to adapt the network to a particular task: the weights $w_i$, the center vectors $c_i$, and the RBF width parameters $\beta_i$. In sequential training, the weights are updated at each time step as data streams in. For some tasks it makes sense to define an objective function and select the parameter values that minimize its value. The most common objective function is the least squares function:

$$K(w) = \sum_{t} K_t(w) \qquad (16)$$

where

$$K_t(w) = \left[y(t) - \phi(x(t), w)\right]^2 \qquad (17)$$

Radial basis function networks have been shown to produce good fits to arbitrary contours of both deterministic and stochastic response functions. In this work, the code for RBF is our own implementation obtained from [].

Figure 2: Architecture of a radial basis function network. An input vector x is used as input to all radial basis functions, each with different parameters. The output of the network is a linear combination of the outputs of the radial basis functions.
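A minimal sketch of a Gaussian RBF model of the form of Equation 15 is given below (Python with NumPy; as simplifying assumptions of ours, the centers are the training points themselves and the width is a fixed user-supplied value rather than a trained parameter):

```python
import numpy as np

def rbf_fit(X, y, width=1.0):
    """Fit the weights of a Gaussian RBF model with centers at the training points."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)    # pairwise distances
    Phi = np.exp(-(width * d) ** 2)                              # Gaussian basis (Table 3)
    w = np.linalg.solve(Phi + 1e-10 * np.eye(len(X)), y)         # small ridge for stability
    return {"centers": X, "w": w, "width": width}

def rbf_predict(model, X_new):
    """phi(x) = sum_i w_i * exp(-(width * ||x - c_i||)^2)  -- Eq. (15)."""
    d = np.linalg.norm(X_new[:, None, :] - model["centers"][None, :, :], axis=2)
    return np.exp(-(model["width"] * d) ** 2) @ model["w"]
```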

2.4 Support Vector Regression

The Support Vector Machine (SVM) is mainly inspired from statistical learning theory [22]. SVMs are a set of related supervised learning methods that analyze data and recognize patterns. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. SVM schemes use a mapping into a larger space so that cross products may be computed easily in terms of the variables in the original space, keeping the computational load reasonable. The cross products in the larger space are defined in terms of a kernel function $K(x, y)$, which can be selected to suit the problem. Table 4 shows different types of kernel functions.

Table 4: Kernel functions for SVM

Type of kernel                    | Kernel function
Polynomial                        | $K(x, x') = \langle x, x' \rangle^{d}$
Gaussian radial basis function    | $K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$
Exponential radial basis function | $K(x, x') = \exp\left(-\frac{\|x - x'\|}{2\sigma^2}\right)$
Multilayer perceptron             | $K(x, x') = \tanh(\rho \langle x, x' \rangle + \varrho)$

SVMs can also be applied to regression problems¹ by the introduction of an alternative loss function. (¹ SVMs for regression problems are known as Support Vector Regression (SVR).) The loss function must be modified to include a distance measure. Consider the problem of approximating the set of data

$$D = \{(x^1, y^1), \ldots, (x^l, y^l)\}, \quad x \in \mathbb{R}^n, \ y \in \mathbb{R} \qquad (18)$$

with a linear function,

$$f(x) = \langle w, x \rangle + b \qquad (19)$$

The optimal regression function is given by the minimum of the functional

$$\Phi(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i^{+} + \xi_i^{-}) \qquad (20)$$

where $C$ is a pre-specified value and $\xi^{+}$, $\xi^{-}$ are slack variables representing upper and lower constraints on the outputs of the system. Different types of loss functions exist (quadratic, Laplace, Huber, $\epsilon$-insensitive); here only the problem with the $\epsilon$-insensitive loss is described. The $\epsilon$-insensitive loss function is

$$L_\epsilon(y) = \begin{cases} 0 & \text{for } |f(x) - y| < \epsilon \\ |f(x) - y| - \epsilon & \text{otherwise} \end{cases} \qquad (21)$$

and the solution is given by

$$\max_{\alpha, \alpha^*} W(\alpha, \alpha^*) = \max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle + \sum_{i=1}^{l} \left[ \alpha_i (y_i - \epsilon) - \alpha_i^* (y_i + \epsilon) \right] \qquad (22)$$

with constraints

$$0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, l, \qquad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \qquad (23)$$

Solving Equation 22 with the constraints of Equation 23 determines the Lagrange multipliers $\alpha$, $\alpha^*$, and the regression function is given by Equation 19, where

$$\bar{w} = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i, \qquad \bar{b} = -\frac{1}{2} \langle \bar{w}, (x_r + x_s) \rangle \qquad (24)$$

Note that the above solution is for a regression with a linear kernel function. In this work the SVR code used is the one reported in [5].
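For illustration, an ε-insensitive SVR with a Gaussian kernel can be used as follows; this sketch relies on scikit-learn rather than the LIBSVM interface of [5], and the data and parameter values are arbitrary examples, not the tuned settings of Section 5:

```python
import numpy as np
from sklearn.svm import SVR

# Toy data: noisy samples of a quadratic response in two variables.
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(100, 2))
y = (X ** 2).sum(axis=1) + rng.normal(scale=0.1, size=100)

# epsilon-insensitive SVR with the Gaussian (RBF) kernel of Table 4.
model = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)

X_test = rng.uniform(-5.0, 5.0, size=(10, 2))
y_pred = model.predict(X_test)
```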

3 State of the art

This section describes some approaches that have successfully used metamodeling techniques. Ratle [17] proposed a hybrid algorithm that combines a Genetic Algorithm (GA) with Kriging; the first generation is randomly initialized as in a basic GA and its fitness is evaluated using the true fitness function. The solutions found in the first generation are used to build a metamodel with Kriging. The metamodel is then exploited for several generations until a convergence criterion (proposed by the authors) is reached. The next generation is evaluated using the real fitness function and the metamodel is updated using the new data points. The proposed algorithm was applied to a test function with two variables and to a test function with 20 variables; the authors state that the algorithm seems appropriate for obtaining moderately good solutions rapidly. Ratle [18] combined a genetic algorithm with an approximate model created by the Kriging method. The metamodel is updated every k generations. Six different strategies to update the metamodel were proposed and evaluated on two test problems; finally, the best strategy was used for the design of a simple mechanical structure with a noise or vibration level reduction criterion. The authors do not justify the selection of the parameters of the Kriging method. El-Beltagy et al. [7] proposed an algorithm with a Gaussian regression model. The algorithm creates a metamodel of the original function using the Gaussian regression model; the metamodel is updated every time a generation delay criterion is satisfied, and the update is performed by taking into account the fitness and the minimum distance of the population with respect to the vectors used to build the metamodel. The individuals taken into account for the creation of the model must satisfy a minimum distance with respect to the vectors with which the metamodel was created, which helps the diversity of points in the metamodel and decreases the computation time. The metamodeling approach seems to work best with smooth objective functions; it stalls in situations where the global optimum has strong local features. Bull [3] proposed a GA with a neural network. The neural network is trained using example individuals with the explicit fitness, and the resulting model is then used by the GA to find a solution. The model is updated every R generations: the current fittest individual of the evolving population is evaluated with the real function, this individual replaces the one with the lowest fitness in the training set, and the neural network is re-trained. The approach was applied to 20 fitness landscapes created with the NK model [13]. Pierret [16] designed an algorithm for turbomachinery blade design. Improving the machine performance requires detailed knowledge that can be provided by Navier-Stokes solvers (which are time-consuming). The algorithm keeps a database containing the input and output of previous Navier-Stokes solutions. One starts by scanning the database to select the sample whose performance is closest to the required one. This sample is then adapted to the required performance by an optimization procedure. The optimization algorithm used is simulated annealing, and an approximate model is used for the performance evaluation; the approximate model is obtained from the database with a neural network. The new solution obtained from the optimization procedure is evaluated by the Navier-Stokes solver and added to the database. Finally, if the target performance has not been reached, a new iteration is started. The method requires only a few Navier-Stokes computations to define an optimized blade. El-Beltagy et al. [6] used a Gaussian Process (GP) with an Evolutionary Algorithm (EA) and presented the advantages of using GPs over other biologically inspired neural-net approaches. The metamodel is updated with an online model expansion: the model can expand to include new data points with minimal computational cost. In this algorithm the metamodel is not updated every generation, but with respect to the predicted standard deviation of the metamodel. Results are presented for a real-world engineering problem involving the structural optimization of a satellite boom. Jin et al. [12] mentioned evolution control for the first time. Evolution control helps to avoid false optima. They propose two methods of evolution control. The first is individual-based control, in which part of the individuals in the population are chosen and evaluated with the real function; if the individuals are chosen randomly they call it a random strategy, and if the chosen individuals are the best individuals they call it a best strategy. The second is generation-based control, in which the whole population is evaluated with the real function every k generations. They proposed a framework for managing approximate models in generation-based evolution control. They also proposed an algorithm that combines an evolution strategy (ES) with a neural network and uses the proposed framework, and they evaluated the new approach on two theoretical functions and on a real blade design optimization problem. Emmerich et al. [8] proposed the Metamodel Assisted Evolution Strategies (MAES), combining an ES with Kriging. The approach takes into account the error associated with each prediction: the estimated value and the predicted error are used to select the individuals to be evaluated with the real function. They evaluated the approach on artificial landscapes and on an airfoil shape optimization problem. The principal advantage of this method is the use of the error associated with the predictions. Regis and Shoemaker [19] proposed two algorithms: (1) an ES with local quadratic approximation and (2) an ES with local cubic radial basis functions. The main feature of these algorithms is that the objective function value (or fitness value) of an offspring solution is estimated by fitting a model using its k-nearest neighbors among the previously evaluated points. The algorithms were applied to a twelve-dimensional (12-D) groundwater bioremediation problem involving a complex nonlinear finite-element simulation model. Bueche et al. [2] proposed an algorithm denominated Gaussian Process Optimization Procedure (GPOP); they use a Gaussian Process as an inexpensive function that replaces the original function, and the Gaussian Process is created with individuals in the neighborhood of the current best solution and with the most recently evaluated individuals. GPOP was applied to a real-world problem: the optimization of stationary gas turbine compressor profiles. The authors mention that GPOP converged much faster than a range of alternative evolution strategies, and to significantly better results. Most of the previous work does not justify the choice of the metamodel; probably the authors chose the technique based on their own knowledge. There are some works in which the metamodeling technique is chosen because of its characteristics [8]. Some other works compare metamodeling techniques, but the comparison methodology is inefficient [19]. There are studies in the literature that compare different techniques for creating metamodels; however, most compare only two techniques [4, 21, 9, 24], or take only the accuracy of the metamodel as the point of comparison. Another work that takes into account several metamodeling techniques is reported in [11]: they compared four metamodeling techniques (polynomial regression, Kriging, radial basis functions, multivariate adaptive regression splines) taking multiple criteria into account to decide the best technique on different problems. However, one disadvantage of that work is that dimensionality is not taken as an important factor. Another disadvantage is that the fitness landscape of the metamodel is not taken into account; it is natural to think that a metamodel can be very accurate but more difficult to optimize.

4 Methodology

The principal challenge of approximation models is to be as accurate as possible over the complete domain of interest while minimizing the simulation cost (efficiency). Most approaches that use metamodeling techniques take into account only the accuracy of the technique [4, 21, 9, 24]. However, other approaches suggest the use of multiple criteria for assessing the quality of a metamodel [11], for example robustness, efficiency and simplicity. In this work we take these aspects into account, and others, such as scalability and ease of optimization, have been added. The following aspects were taken into account to measure the performance of the metamodeling techniques:

Accuracy The accuracy is the capability of producing predictions close to the real value of the system. For accuracy, two data sets are used: the first one is the training data set, used to train the metamodeling technique; the second is the validation data set, used to validate the accuracy of the technique. Accuracy is measured with the G-metric (a small computation sketch is given after this list):

$$G = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = 1 - \frac{\mathrm{MSE}}{\mathrm{Variance}} \qquad (25)$$

where $N$ is the size of the validation data set, $\hat{y}_i$ is the predicted value for input $i$, $y_i$ is the real value, and $\bar{y}$ is the mean of the real values. The MSE (mean squared error) measures the difference between the estimator and the real value; the variance describes how far values lie from the mean, that is, it captures how irregular the problem is. The larger the value of G, the more accurate the metamodel.

Robustness The robustness is the capability of the technique to achieve good accuracy on different test problems. Six problems were used, described in Section 4.1; three of them are unimodal and the other three are multimodal, the six problems have different features, and they are common benchmark problems for optimization algorithms.

Scalability The scalability is the capability of the technique to achieve good accuracy for different numbers of decision variables. The six test problems can be scaled in the number of decision variables. The numbers of variables used are divided into 9 levels, v = [2, 4, 6, 8, 10, 15, 20, 25, 50].

Efficiency The efficiency refers to the computational effort required by the technique to construct the metamodel and to predict the response for a new input. The efficiency of each metamodeling technique is measured by the time used for the metamodel construction and for new predictions.


Ease of optimization This refers to the ease of optimizing a metamodel created by a technique. It is natural to think that a more accurate metamodel may be more complicated to optimize, because its fitness landscape may be more rugged. To measure the ease of optimization, Differential Evolution (DE) is used to optimize a metamodel, the best value found by the DE is evaluated with the real function, and the distance from the optimum is measured.

Simplicity The simplicity refers to the ease of use of each technique: the number of parameters, the size of the parameters, the implementation, and the knowledge necessary to understand the technique.
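As referenced in the Accuracy item above, a minimal sketch of the G-metric of Equation 25 (Python with NumPy, illustrative names):

```python
import numpy as np

def g_metric(y_true, y_pred):
    """G = 1 - MSE/Variance over the validation set (Eq. 25).
    Values close to 1 indicate an accurate metamodel."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    variance = np.mean((y_true - y_true.mean()) ** 2)
    return 1.0 - mse / variance
```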

4.1 Test problems

To test the metamodeling techniques on different classes of problems, six test problems for unconstrained global optimization were selected. The six problems are scalable in the design space, and they were selected based on the shape of the search space and on the number of local minima. A summary of the features of the six problems is given in Table 5; the test problems are described in more detail in Appendix A.

Table 5: Features of the test problems

Problem name | Search space shape                        | # of local minima                      | # of variables | Global minimum
Step         | Unimodal                                  | No local minima except the global one  | n              | x* = (0, ..., 0), f(x*) = 0
Sphere       | Unimodal                                  | No local minima except the global one  | n              | x* = (0, ..., 0), f(x*) = 0
Rosenbrock   | Unimodal for n <= 3, otherwise multimodal | Several local minima for n > 3         | n              | x* = (1, ..., 1), f(x*) = 0
Ackley       | Multimodal                                | Several local minima                   | n              | x* = (0, ..., 0), f(x*) = 0
Rastrigin    | Multimodal                                | Large number of local minima           | n              | x* = (0, ..., 0), f(x*) = 0
Schwefel     | Multimodal                                | Several local minima                   | n              | x* = (420.9687, ..., 420.9687), f(x*) = 0

4.2 Scheme for metamodeling techniques comparison

The scheme proposed for the comparative study is the following:

1. Create a training data set of size 100 with Latin hypercube sampling [14].
2. Train each technique used (PRS, KRG, RBF, SVR) with the training set.
3. Create a validation data set of size 200 with Latin hypercube sampling.
4. Predict the validation data set with the metamodel.
5. Measure the accuracy with the G-metric.
6. Repeat from step 1 for 31 different data sets.
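A condensed sketch of this scheme is given below (Python with NumPy; `lhs_sample` is a basic Latin hypercube generator, while `fit`, `predict` and `g_metric` stand for the corresponding routines of any of the four techniques, so all names are illustrative):

```python
import numpy as np

def lhs_sample(n_points, bounds, rng):
    """Basic Latin hypercube sample within [low, high] bounds given per variable."""
    sample = np.empty((n_points, len(bounds)))
    for j, (lo, hi) in enumerate(bounds):
        perm = rng.permutation(n_points)
        cell = (perm + rng.random(n_points)) / n_points   # one point per stratum, per variable
        sample[:, j] = lo + cell * (hi - lo)
    return sample

def run_trial(fobj, bounds, fit, predict, rng):
    X_train = lhs_sample(100, bounds, rng)                            # step 1
    model = fit(X_train, np.apply_along_axis(fobj, 1, X_train))       # step 2
    X_val = lhs_sample(200, bounds, rng)                              # step 3
    y_val = np.apply_along_axis(fobj, 1, X_val)                       # step 4
    return g_metric(y_val, predict(model, X_val))                     # step 5

# Step 6: repeat run_trial for 31 independent data sets and aggregate the G-metric values.
```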


The procedure is applied to the six problems with the nine levels of variables (6 problems x 9 levels = 54 different problems). The objective of this experiment is to measure the accuracy, robustness, scalability and efficiency of the techniques. For each technique, the parameters were discretized and a full factorial design was created, in order to avoid penalizing a technique through poor parameter tuning. The parameters used for each technique are the following:

Polynomial Regression: degree of the polynomial = {2}; technique used to construct the polynomial = {Full, Stepwise}.
Kriging: correlation function = {Gaussian, Exponential, Cubic, Linear, Spherical, Spline}.
Radial basis function: number of neurons in the hidden layer = {3-100}.
Support Vector Regression: $C = \{2^{5} : 2^{15}\}$; $\epsilon = \{0.1 : 2\}$; $\gamma = \{2^{-10} : 2^{5}\}$.

5 Discussion of results

5.1 Accuracy, robustness and scalability

For each metamodeling technique, the set of parameters with the best accuracy over all the problems was chosen (one setting per metamodeling technique); this is named the Best overall settings. In the same way, the set of parameters with the best accuracy for each problem was chosen (54 settings per metamodeling technique); this is named the Best local settings. The Best overall settings found are the following:

Polynomial Regression: degree of the polynomial = {2}; technique used to construct the polynomial = {Stepwise}.
Kriging: correlation function = {Exponential}.
Radial basis function: number of neurons in the hidden layer = {6}.
Support Vector Regression: $C = \{2^{10.5}\}$; $\epsilon = \{0.2\}$; $\gamma = \{2^{2.5}\}$.

To illustrate the performance of the metamodeling techniques, boxplot graphics are used; the median is shown with a straight line inside the box and indicates that half of the problems are above it and the other half below it. The mean is shown with a circle and indicates the average accuracy of a technique, and the size of the box indicates the variability of the technique: the smaller the box, the more robust the technique.

Figure 3: Accuracy of the metamodeling techniques in all the problems.


Figure 3 shows the accuracy results for all the problems and the different problem sizes. It shows that, for KRG, RBF and SVR, the settings found over all the problems and problem sizes (Best overall settings) have a performance comparable to that of the Best local settings; moreover, for PRS the Best overall settings worsen the performance of the technique. Figure 3 also shows that the accuracies of RBF and KRG are among the best of the techniques; their values are very close to each other (the median is close to 0.9), with RBF slightly better than KRG, although the results are not conclusive with respect to SVR. The worst performance is for PRS. Figure 4 shows the average G-metric over all the problems; this figure confirms that RBF and KRG obtain similar results and a better performance than SVR and PRS. In terms of the robustness of the accuracy over all the problems, RBF is the best for both Best local settings and Best overall settings, although its results are only slightly better than those of KRG. Overall, RBF is the best with respect to average accuracy and robustness when handling different types of problems. The problems were divided into two types, unimodal and multimodal, and for each type the performance of each metamodeling technique is illustrated with a boxplot. Figure 5 shows that RBF is slightly better than KRG, while PRS and SVR have the worst performance; however, the results show that there are certain problems for which PRS and SVR perform well, because the top of their boxes is close to 1. Moreover, RBF is more robust than KRG because its box is smaller. Figure 6 shows that the best performance is for RBF, but its results are only slightly better than those of KRG. As in the unimodal problems, the worst performance is for PRS and SVR. In terms of robustness the best is RBF, but its results show that it is not very robust. Due to this lack of robustness, we need to know whether the problem lies in the increase in the number of variables or in the type of problem. Next we present an analysis of the techniques for each test problem:

Step Figure 7 shows the behavior of the metamodeling techniques on the Step function. With very few variables (2 or 4), SVR behaves well, but as the number of variables increases SVR worsens considerably. PRS shows bad behavior even with few variables. KRG performs better than RBF up to a maximum of 10 variables, while for a greater number of variables RBF achieves better results. Moreover, as a special mention, RBF maintains a steady performance despite the increase in the number of variables, so we can say that it is robust to the increase of variables.

Figure 4: Accuracy of the metamodeling techniques: average over all the problems.

Figure 5: Accuracy of the metamodeling techniques in unimodal problems and the nine levels of variables.

Figure 6: Accuracy of the metamodeling techniques in multimodal problems and the nine levels of variables.
Finally, we note that the Best local settings and the Best overall settings give similar results, except for PRS. Thus, one can say that the general parameters found (Best overall settings) can be used without prior adjustment (Best local settings).

Sphere Figure 8 shows the behavior of the metamodeling techniques on the Sphere function; the best-performing technique is RBF, and its behavior is very consistent with the increase in the number of variables. With the Best local settings, PRS achieves good performance up to a maximum of 10 variables, after which its performance decreases significantly. In addition, KRG and SVR achieve comparable results with both the Best local settings and the Best overall settings. As in the previous problem, it can be said that the Best overall settings and the Best local settings obtain similar results, at least for RBF, KRG and SVR.

Rosenbrock Figure 9 shows the behavior of the metamodeling techniques on the Rosenbrock function. PRS is the technique with the worst performance; only with fewer than five variables does it achieve moderately good results, and the increase in the number of variables makes its performance decrease significantly. KRG and SVR have comparable results: with a relatively low number of variables (v < 10) both are better than RBF, whereas for a greater number of variables (v >= 10) RBF achieves better results. A main feature of RBF is its constant behavior even with the increase in the number of variables.

Ackley Figure 10 shows the behavior of the metamodeling techniques on the Ackley function. KRG and RBF have similar results with the Best local settings and the Best overall settings. PRS is the worst-performing technique; even with few variables it cannot approximate the function. KRG is the technique with the best performance for a few variables (v < 20), while RBF is the technique with the best performance for a greater number of variables (v >= 20). Again, RBF remains constant with the increase of the variables.

Figure 7: The mean of the accuracy metric by number of variables, problem: Step. (a) Best local settings; (b) Best overall settings.

Figure 8: The mean of the accuracy metric by number of variables, problem: Sphere. (a) Best local settings; (b) Best overall settings.

Figure 9: The mean of the accuracy metric by number of variables, problem: Rosenbrock. (a) Best local settings; (b) Best overall settings.

Figure 10: The mean of the accuracy metric by number of variables, problem: Ackley. (a) Best local settings; (b) Best overall settings.

Rastrigin Figure 11 shows the behavior of the metamodeling techniques on the Rastrigin function; RBF is the best-performing technique. PRS has a good performance up to a maximum of 15 variables. Moreover, SVR and KRG behave similarly, and their performance decreases as the number of variables increases. Finally, as in the previous problems, the Best overall settings and the Best local settings are comparable for RBF, SVR and KRG.

Figure 11: The mean of the accuracy metric by number of variables, problem: Rastrigin. (a) Best local settings; (b) Best overall settings.

Schwefel Figure 12 shows the behavior of the metamodeling techniques on the Schwefel function. On this problem the four techniques are well-behaved; the techniques with the best performance are RBF and KRG for the Best local settings, and RBF for the Best overall settings. The performance of PRS decreases with the increase of the variables. For this problem, KRG and RBF remain constant with the increase in the number of variables.

Figure 13 shows the performance of the techniques on the six test functions: while KRG is the technique with the best performance for a few variables (v < 15), RBF is the technique with the best performance for high-dimensional functions. The technique with the worst overall performance is PRS. SVR achieves a performance similar to KRG, although the latter is slightly better. In general, RBF is very robust against the increase in the number of variables for all problems, since its results remain constant.

6 Conclusions

The study presented in this paper has provided interesting results about the performance of different metamodeling techniques (PRS, KRG, RBF, SVR). The metamodeling techniques were evaluated using multiple aspects on several test problems with different features. We define the size of a problem with respect to its dimensionality: high dimensionality for problems with v > 15 and low dimensionality otherwise. Table 6 shows that KRG is the best for low-dimensionality problems, while RBF is the best for high-dimensionality problems. Overall, the best technique with respect to accuracy and scalability is RBF.

Figure 12: The mean of the accuracy metric by number of variables, problem: Schwefel. (a) Best local settings; (b) Best overall settings.

Figure 13: The mean of the accuracy metric by number of variables in all the problems. (a) Best local settings; (b) Best overall settings.

Table 6: Best techniques with respect to accuracy and dimensionality: all the problems

           | Low dimensionality | High dimensionality | Overall
Unimodal   | KRG                | RBF                 | RBF
Multimodal | KRG                | RBF                 | RBF
Overall    | KRG                | RBF                 | RBF

Table 7 shows the best techniques with respect to accuracy and dimensionality for each problem. For low-dimensional problems the best technique is KRG, although other techniques achieve a performance comparable to it. For high-dimensionality problems the best is RBF.

Table 7: Best techniques with respect to accuracy and dimensionality: by problem

           | Low dimensionality | High dimensionality | Overall
Step       | KRG                | RBF                 | RBF
Sphere     | ALL                | RBF                 | RBF
Rosenbrock | KRG, SVR           | RBF                 | RBF
Ackley     | KRG                | RBF                 | RBF
Rastrigin  | ALL                | RBF                 | RBF
Schwefel   | KRG, RBF, SVR      | KRG, RBF            | RBF

Finally, to improve our study, we need to evaluate the remaining aspects (efficiency, ease of optimization, simplicity).


A Test problems

A.1 Step

Step is representative of the problem of flat surfaces. The Step function consists of many flat plateaus with uniform steep ridges. For algorithms that require gradient information to determine a search direction, this function poses considerable difficulty. Flat surfaces are obstacles for optimization algorithms because they do not give any information as to which direction is favorable; unless an algorithm has variable step sizes, it can get stuck on one of the flat plateaus. The function is defined as:

$$f_1(x) = \sum_{i=1}^{D} \lfloor x_i + 0.5 \rfloor^2 \qquad (26)$$

Its global minimum $f_1(x^*) = 0$ is obtained for $x_i = 0$, $i = 1, \ldots, D$. Figure 14 shows the Step function with $D = 2$.
Figure 14: An overview of the Step function with D = 2.

A.2 Sphere

The so-called first function of De Jong is one of the simplest test benchmarks. The function is continuous, convex and unimodal. It has the following general definition:

$$f_2(x) = \sum_{i=1}^{D} x_i^2 \qquad (27)$$

Its global minimum $f_2(x^*) = 0$ is obtained for $x_i = 0$, $i = 1, \ldots, D$. Figure 15 shows the Sphere function with $D = 2$.

A.3 Rosenbrock

Rosenbrock's valley is a classic optimization problem, also known as the banana function or the second function of De Jong. The global optimum lies inside a long, narrow, parabolic-shaped flat valley. Finding the valley is trivial; however, convergence to the global optimum is difficult, and hence this problem has been frequently used to test the performance of optimization algorithms. The function has the following definition:

$$f_3(x) = \sum_{i=1}^{D-1} \left[ 100\left(x_{i+1} - x_i^2\right)^2 + (x_i - 1)^2 \right] \qquad (28)$$

Figure 15: An overview of the Sphere function with D = 2.


Its global minimum $f_3(x^*) = 0$ is obtained for $x_i = 1$, $i = 1, \ldots, D$. Figure 16 shows the Rosenbrock function with $D = 2$.
Figure 16: An overview of the Rosenbrock function with D = 2.

A.4 Ackley

The Ackley problem is a minimization problem. Originally this problem was defined for two dimensions, but it has been generalized to D dimensions. Ackley's is a widely used multimodal test function. It has the following definition:

$$f_4(x) = -20 \exp\left(-0.2 \sqrt{\frac{1}{D} \sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D} \sum_{i=1}^{D} \cos(2\pi x_i)\right) + 20 + \exp(1) \qquad (29)$$

Its global minimum $f_4(x^*) = 0$ is obtained for $x_i = 0$, $i = 1, \ldots, D$. Figure 17 shows the Ackley function with $D = 2$.

Figure 17: An overview of the Ackley function with D = 2.

A.5 Rastrigin

The Rastrigin function is a typical example of a non-linear multimodal function. It is based on the first function of De Jong with the addition of cosine modulation in order to produce frequent local minima; thus, the test function is highly multimodal. This function is a fairly difficult problem due to its large search space and its large number of local minima. However, the locations of the minima are regularly distributed. The function has the following definition:

$$f_5(x) = \sum_{i=1}^{D} \left[ x_i^2 - 10 \cos(2\pi x_i) + 10 \right] \qquad (30)$$

Its global minimum $f_5(x^*) = 0$ is obtained for $x_i = 0$, $i = 1, \ldots, D$. Figure 18 shows the Rastrigin function with $D = 2$.
Figure 18: An overview of the Rastrigin function with D = 2.


A.6 Schwefel

Schwefel's function is deceptive in that the global minimum is geometrically distant, over the parameter space, from the next-best local minima. Therefore, search algorithms are potentially prone to convergence in the wrong direction. The function has the following definition:

$$f_6(x) = 418.9809\,D - \sum_{i=1}^{D} x_i \sin\left(\sqrt{|x_i|}\right) \qquad (31)$$

Its global minimum $f_6(x^*) = 0$ is obtained for $x_i = 420.9687$, $i = 1, \ldots, D$. Figure 19 shows the Schwefel function with $D = 2$.

Figure 19: An overview of the Schwefel function with D = 2.
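For reference, the six test problems can be implemented directly from Equations 26-31; the sketch below is our own Python rendering of those formulas:

```python
import numpy as np

def step(x):        # Eq. (26)
    return np.sum(np.floor(x + 0.5) ** 2)

def sphere(x):      # Eq. (27)
    return np.sum(x ** 2)

def rosenbrock(x):  # Eq. (28)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)

def ackley(x):      # Eq. (29)
    d = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20.0 + np.e)

def rastrigin(x):   # Eq. (30)
    return np.sum(x ** 2 - 10.0 * np.cos(2 * np.pi * x) + 10.0)

def schwefel(x):    # Eq. (31)
    return 418.9809 * x.size - np.sum(x * np.sin(np.sqrt(np.abs(x))))
```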

References
[1] Russell R. Barton. Metamodels for simulation input-output relations. In Proceedings of the 24th Conference on Winter Simulation, WSC '92, pages 289-299, New York, NY, USA, 1992. ACM.
[2] D. Bueche, N. N. Schraudolph, and P. Koumoutsakos. Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 2004. In press.
[3] L. Bull. On model-based evolutionary computation. Soft Computing, 3:76-82, 1999.
[4] W. Carpenter and J.-F. Barthelemy. A comparison of polynomial approximation and artificial neural nets as response surfaces. Technical Report 92-2247, AIAA, 1992.
[5] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[6] M. A. El-Beltagy and A. J. Keane. Evolutionary optimization for computationally expensive problems using Gaussian processes. In Proceedings of the International Conference on Artificial Intelligence, pages 708-714. CSREA, 2001.
[7] M. A. El-Beltagy, P. B. Nair, and A. J. Keane. Metamodeling techniques for evolutionary optimization of computationally expensive problems: promises and limitations. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 196-203, Orlando, 1999. Morgan Kaufmann.
[8] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, and K. Giannakoglou. Metamodel-assisted evolution strategies. In Parallel Problem Solving from Nature, number 2439 in Lecture Notes in Computer Science, pages 371-380. Springer, 2002.


[9] Anthony A. Giunta and Layne T. Watson. A comparison of approximation modeling techniques: polynomial versus interpolating models, 1998.
[10] R. L. Hardy. Multiquadric equations of topography and other irregular surfaces. Journal of Geophysical Research, 76:1905-1915, 1971.
[11] R. Jin, W. Chen, and T. W. Simpson. Comparative studies of metamodeling techniques under multiple modeling criteria. Technical Report 2000-4801, AIAA, 2000.
[12] Y. Jin, M. Olhofer, and B. Sendhoff. Managing approximate models in evolutionary aerodynamic design optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, volume 1, pages 592-599, May 2001.
[13] Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, USA, 1st edition, June 1993.
[14] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239-245, 1979.
[15] H. B. Nielsen, S. N. Lophaven, and J. Søndergaard. DACE - a Matlab Kriging toolbox, 2002.
[16] S. Pierret. Turbomachinery blade design using a Navier-Stokes solver and artificial neural network. ASME Journal of Turbomachinery, 121(3):326-332, 1999.
[17] A. Ratle. Accelerating the convergence of evolutionary algorithms by fitness landscape approximation. In A. Eiben, Th. Bäck, M. Schoenauer, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, volume V, pages 87-96, 1998.
[18] A. Ratle. Optimal sampling strategies for learning a fitness model. In Proceedings of the 1999 Congress on Evolutionary Computation, volume 3, pages 2078-2085, Washington, D.C., July 1999.
[19] R. G. Regis and C. A. Shoemaker. Local function approximation in evolutionary algorithms for the optimization of costly functions. IEEE Transactions on Evolutionary Computation, 8(5):490-505, 2004.
[20] Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. Design and analysis of computer experiments. Statistical Science, 4(4):409-423, November 1989.
[21] T. Simpson, T. Mauery, J. Korte, and F. Mistree. Comparison of response surface and Kriging models for multidisciplinary design optimization. Technical Report 98-4755, AIAA, 1998.
[22] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, September 1998.
[23] F. A. C. Viana. SURROGATES Toolbox User's Guide. Gainesville, FL, USA, version 2.1 edition, 2010.
[24] L. Willmes, T. Baeck, Y. Jin, and B. Sendhoff. Comparing neural networks and Kriging for fitness approximation in evolutionary optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 663-670, 2003.

