
Optimized Diagnostic Model Combination for Improving Diagnostic Accuracy

Surya Kunche, Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA, ksurya@calce.umd.edu
Chaochao Chen, Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA, chaochao@calce.umd.edu
Michael G. Pecht, Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA, pecht@calce.umd.edu

Abstract: Identifying the most suitable classifier for diagnostics is a challenging task. In addition to using domain expertise, a trial-and-error method has been widely used to identify the most suitable classifier. Classifier fusion can be used to overcome this challenge, and an ensemble of classifiers is widely known to perform better than a single classifier. Classifier fusion helps in overcoming the error due to the inductive bias of the various classifiers. The combination rule also plays a vital role in classifier fusion, and it has not been well studied which combination rules provide the best performance. A good combination rule will achieve good generalizability while taking advantage of the diversity of the classifiers. In this work, we develop an approach for ensemble learning based on an optimized combination rule. Generalizability, the ability of a classifier to learn the underlying model from the training data and to predict unseen observations, has been acknowledged to be a challenge when training a diverse set of classifiers; in this paper it is achieved through a combination rule that strikes an optimal balance between bias and variance errors. Cross-validation is employed during the performance evaluation of each classifier to obtain an unbiased performance estimate. An objective function is constructed and optimized based on this performance evaluation to achieve the optimal bias-variance balance; it is solved as a constrained nonlinear optimization problem using Sequential Quadratic Programming, which has good convergence properties. We demonstrate the applicability of the algorithm using support vector machines and neural networks as classifiers, but the methodology is broadly applicable for combining other classifier algorithms as well. The method is applied to the fault diagnosis of analog circuits, and the performance of the proposed combination rule is compared to other combination rules in the literature. The proposed combination rule is observed to perform better in reducing the number of false positives and false negatives.

1. INTRODUCTION
The field of prognostics and health management (PHM) involves the development of technologies and methodologies to increase the availability and reliability of engineering systems [2] [3] [4]. As part of the PHM regimen, diagnostic algorithms have been developed and employed to assist in fault diagnosis. As the number of available PHM algorithms has increased, the algorithm selection process has become more complex and difficult. Dilemmas for users when choosing a diagnostic algorithm in PHM include the following: whether the data utilized for training are a suitable representation of the global population; how well the algorithm will perform in a noisy environment; and how well the algorithm will perform on data that it has not encountered (i.e., generalizability). A method is needed to quickly identify appropriate algorithms that meet the performance requirements for specific applications. Ensemble learning is one technique that has been employed to improve generalizability [1], as well as to cope with situations where the training data are not a suitable representation of the global population. In ensemble learning, a collection of classifiers is trained simultaneously, and the results are combined in a suitable manner (also referred to as the combination rule) to improve performance. Generally, the approach of ensemble learning is to train a diverse set of classifiers and then devise a method to combine these trained classifiers. Diversity in the ensemble learning context refers to different classification outputs from each of the trained classifiers for a given input sample set. Studies have reported on the importance of, and methodologies for, generating diverse classifiers. Brown et al. [5] provided a mathematical account of the role of diversity in ensemble learning and how it helps to improve classification accuracy. Theoretically, the more diverse the classifiers are, the less correlated the classifier outputs are with each other. As a result, the predictions of the individual classifiers in the ensemble can be complementary: when one classifier makes a prediction error, other classifiers may be correct. The complementary nature of these classifiers helps in offsetting potential errors, thereby providing greater generalizability.

TABLE OF CONTENTS
1. INTRODUCTION
2. OPTIMIZED FUSION METHODOLOGY
3. RESULTS AND DISCUSSION
4. CONCLUSIONS
REFERENCES
BIOGRAPHIES


Studies have been conducted on the diversity-generating stage of ensemble learning. The most widely used methodology for diversity generation focuses on manipulating the training data, i.e., supplying each classifier with a different set of manipulated training data (for example, training a classifier with only part of the training data). When a classifier is trained on a manipulated training data set, it typically generates diverse predictions. Commonly used methods for manipulating training data are bootstrapping, bagging [17], and boosting [18]. Diversity can also be achieved by changing the adjustable parameters of the classifier being trained; for example, in neural networks, diversity can be achieved by changing the initial weights, the number of hidden neurons, the activation function, and the training algorithm [5]. Some researchers have proposed the use of evolutionary algorithms to achieve the optimal amount of diversity during the training phase [19, 20, 21].

Once diversity is achieved for the trained classifiers, a combination rule is used to combine the classification results. The most common means of classifier fusion include averaging [7, 8], majority voting [9, 10, 11], weighted majority voting [12], and localized fusion [13, 14], an approach that improves on the weighted majority voting algorithm by evaluating the performance of the classifiers in the neighborhood of the test points. Bonissone et al. [15] proposed a fusion methodology based on classification and regression trees that reduces the computation time of localized fusion.

The error of any machine learning algorithm can be divided into two parts: a bias component and a variance component. Consider training sets $T_b$, $b = 1, 2, \ldots, B$, each consisting of observations $\{(x_i, y_i),\ i = 1, 2, \ldots, N\}$, where $x_i$ is the feature vector and $y_i$ is the corresponding class label. Let $f_b$ be the classifier trained on training set $T_b$, and let $f_b(x_i)$ be the prediction of that classifier for input feature $x_i$. The bias component of the error is defined in Equation (1); it describes the correctness of the model [36]. The variance of the classifier, shown in Equation (2), describes the precision of the model and how the prediction varies with the training set [36]. The total error, the sum of these two components, is shown in Equation (3):

$$\mathrm{Bias}(x_i) = \left| \frac{1}{B} \sum_{b=1}^{B} f_b(x_i) - y_i \right| \tag{1}$$

$$\mathrm{Var}(x_i) = \frac{1}{B} \sum_{b=1}^{B} \left( f_b(x_i) - \frac{1}{B} \sum_{b'=1}^{B} f_{b'}(x_i) \right)^{2} \tag{2}$$

$$\mathrm{Err}(x_i) = \mathrm{Bias}(x_i) + \mathrm{Var}(x_i) \tag{3}$$

A machine learning algorithm achieves minimal prediction errors on unseen data when there is an optimal balance of bias and variance [35]. Figure 1 shows how the bias, variance, and total error change as the model complexity changes; the point of least total error marks this balance. As seen in Figure 1, good generalizability is achieved by an optimal balance of bias and variance.

Figure 1. Error change as a function of model complexity [33] [36].
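As a concrete illustration of Equations (1)-(3), the following minimal Python sketch computes the empirical bias, variance, and total error of an ensemble from a matrix of per-classifier predictions (the function and variable names are ours, not the paper's):

    import numpy as np

    def bias_variance(preds, y):
        # preds: (B, N) array with preds[b, i] = f_b(x_i); y: (N,) true labels.
        mean_pred = preds.mean(axis=0)                  # average prediction over the B classifiers
        bias = np.abs(mean_pred - y)                    # Eq. (1): correctness of the model
        var = ((preds - mean_pred) ** 2).mean(axis=0)   # Eq. (2): spread across training sets
        return bias, var, bias + var                    # Eq. (3): total error per observation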

The optimized fusion methodology discussed in this paper combines multiple classifiers based on their performance so as to achieve the least total error in fault detection, i.e., the least number of false and missed alarms. To achieve this optimized fusion, we compute bias and variance to evaluate the performance of each classifier. The performance of the classifiers is also validated with a cost function using cross-validation. We then develop a framework for this methodology that combines all the classifiers. A comparative analysis using experimental data has been conducted to demonstrate the accuracy of this algorithm over other methodologies. We discuss our optimized fusion methodology in section 2. In section 3 we present case studies wherein we used this methodology to perform diagnostics for analog circuits, including a Sallen-Key band-pass filter and a biquad low-pass filter; we also conducted a comparative analysis to evaluate the performance of this methodology. In section 4 we give concluding remarks.


2. OPTIMIZED FUSION METHODOLOGY

Figure 2. SVM classification.

Inductive bias [22] is defined as the set of assumptions that a classifier makes in order to classify a given set of features. For example, support vector machine classification (Figure 2) assumes that the different classes can be separated by a hyperplane. In the case of k-nearest-neighbor classification, the assumption is that the distance of a test point from its k nearest neighbors determines the class of the test point, i.e., the test point belongs to the class to which this distance is least. These assumptions may not necessarily be true in all instances; for example, as seen in Figure 2, the hyperplane of the support vector machine is not able to correctly determine the decision boundary in this two-class (blue and green) classification problem. Hence, the assumptions made by the classifiers lead to a classification error known as inductive bias.

In classifier fusion, the complementary features of the classifiers are employed to overcome their individual errors. To achieve this, the classifier algorithms are initially trained on the training data set, and then the classification outputs of the classifier algorithms are combined. The method for combining the results of all these classifiers is known as the fusion or combination rule. Figure 3 shows the proposed framework for the fusion of an ensemble of classifiers. The framework can be divided into three parts: algorithm training, fusion parameter computation, and classifier fusion. These three parts are discussed in the following subsections.

Figure 3. Optimized fusion framework.

Algorithm Training

When training the classifiers, the training data must be representative of the global data population. In most situations, the training data constitute only a small part of the whole data population and may contain a lot of noise, which can lead to misclassification of unseen data sets. Bootstrapping is a potential solution to this problem.

Bootstrapping: The bootstrapping method was originally proposed by Efron [24]. When bootstrapping is used, the training data are resampled such that sample observations are picked randomly with replacement from the original training data to form new training data sets of the same size as the original training data set [23]. Considering an original training data set with N observations, each bootstrapped training set also consists of N observations, but these observations are picked by randomly resampling from the original training data set. Individual classifiers are then trained on these data sets. Once the classifiers are trained, their classification performance can be evaluated by cross-validation, wherein they are evaluated on observations that they have not been trained on. This procedure gives an unbiased estimate of the classification error and is discussed in detail in the performance evaluation subsection.

The aim of bootstrapping is to increase the diversity of the ensemble. As suggested previously, increasing the diversity during training improves the generalizability when a suitable combination rule is applied. In this paper, diversity is achieved by using bootstrapped training data, by using different initial weights in the neural networks, and by using different types of classifiers (neural networks and support vector machines).
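A minimal Python sketch of this resampling step is shown below (numpy only; the helper name and the bookkeeping of out-of-bag indices for the later cross-validation step are our own framing):

    import numpy as np

    def bootstrap_sets(X, y, B, seed=0):
        # Draw B bootstrap training sets, each the same size as the original
        # data, and record the out-of-bag (never-picked) indices of each draw
        # so the corresponding classifier can later be validated on them.
        rng = np.random.default_rng(seed)
        n = len(y)
        sets = []
        for _ in range(B):
            idx = rng.integers(0, n, size=n)          # sample n indices with replacement
            oob = np.setdiff1d(np.arange(n), idx)     # observations not seen during training
            sets.append((X[idx], y[idx], oob))
        return sets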

Classifiers: Once the bootstrapped training data sets have been generated, the classifiers need to be trained on these samples. Different types of classifiers are employed to overcome the problem of inductive bias. In this paper, support vector machines and neural networks are employed. Each member of the ensemble is trained to be a classification expert on its individual bootstrapped sample.

Support vector machine (SVM) classification is based on Vapnik-Chervonenkis theory for structural risk minimization [30]. The objective of SVM is to find the optimal hyperplane $w \cdot x + b = 0$, where $w$ is the normal to the hyperplane and $|b|/\|w\|$ is the distance of the hyperplane from the origin of the coordinate system [31]. Given input feature vectors $x_1, x_2, \ldots, x_N$, where $x_i \in \mathbb{R}^n$, and the corresponding classes $y_1, y_2, \ldots, y_N$, where $y_i \in \{1, -1\}$, the aim of the support vector machine is to find an optimal hyperplane such that the objective function in Equation (4) is minimized subject to the constraints in Equation (5). Minimizing this objective function results in a hyperplane that optimally separates the two classes:

$$\min_{w,\, \xi} \;\; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \tag{4}$$

$$\text{s.t.} \quad y_i \left( w \cdot \phi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad y_i \in \{1, -1\} \tag{5}$$

where $\xi_i$ is the slack variable, or the distance margin introduced to allow misclassifications; $C$ is the penalty, or cost, for the misclassifications; and $\phi$ is a mapping of $x$ to a higher-dimensional space.

A neural network is also used for performing classification. Given an input feature vector $x_i \in \mathbb{R}^n$ and its corresponding class $y_i \in \{1, -1\}$, the neural network has $n$ neurons in the input layer (where $n$ is the dimension of the input feature vector), a hidden layer, and an output node [32]. A sigmoid activation function is used in each neuron of the hidden layer. The output of the neural network is the class label of the input features. We used a gradient-based approach to train the neural network.
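The sketch below trains such an ensemble on the bootstrapped sets, using scikit-learn's SVC and MLPClassifier as stand-ins for the paper's SVM and sigmoid-activation neural network; the alternation between the two classifier types and all names are our illustration, not the paper's exact configuration:

    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    def train_ensemble(boot_sets, hidden=10, C=1.0):
        # Alternate SVM and neural-network members so the ensemble mixes
        # inductive biases; each member sees only its own bootstrap sample.
        members = []
        for b, (Xb, yb, _) in enumerate(boot_sets):
            if b % 2 == 0:
                clf = SVC(C=C, kernel="rbf")                 # separating hyperplane in a mapped space
            else:
                clf = MLPClassifier(hidden_layer_sizes=(hidden,),
                                    activation="logistic",   # sigmoid hidden units
                                    random_state=b,          # varied initial weights add diversity
                                    max_iter=2000)
            members.append(clf.fit(Xb, yb))
        return members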

Fusion Parameter Computation

The diverse classifiers generated in the previous step need to be suitably combined. Here, a cost function is formulated and minimized to obtain the most suitable combination of these classifiers. The fusion parameter computation includes two steps: performance evaluation and fusion optimization.

Performance Evaluation: To evaluate the classification performance, the bias and variance errors of each classifier need to be computed on unseen observations by cross-validation. Bias and variance errors cannot be minimized simultaneously, because a reduction in either one of the two components can lead to an increase in the other, as shown in Figure 1. Therefore, the total error of a classifier, which combines both factors, is used to evaluate classification performance, as shown below [33]:

$$\mathrm{Err} = \mathrm{Bias} + \mathrm{Var} \tag{6}$$

To estimate the bias of the classifiers, conventional validation methods segment the training data into disjoint sets, e.g., training and validation data sets. For example, a data set can be segmented into two parts wherein 70% of the data are used to train the classifier and the remaining 30% are used for optimizing the classifier parameters [34]. Such a method, however, is biased by the choice of validation data; cross-validation is instead thought to provide an unbiased estimate. In cross-validation, each of the trained classifiers is evaluated for accuracy on unseen observations. If the training data are resampled into B bootstrapped sample sets, each classifier is trained on a different sample set. Once all the classifiers are trained, the performance of each classifier is evaluated on the observations in the training data that it has not seen. The errors computed are the false positive error ($e_{fp}$) and the false negative error ($e_{fn}$), as shown in Equation (7):

$$e_{fp}^{\,b} = \frac{1}{|M_b^{-}|} \sum_{i \in M_b^{-}} I\big(f_b(x_i) \neq y_i\big), \qquad e_{fn}^{\,b} = \frac{1}{|M_b^{+}|} \sum_{i \in M_b^{+}} I\big(f_b(x_i) \neq y_i\big), \qquad 1 \le b \le B \tag{7}$$
where $f_b(x_i)$ is the output of the classifier trained on the bootstrap sample set $T_b$, $1 \le b \le B$; $y_i$ is the actual class for the input feature $x_i$ (here healthy observations are labeled $y_i = -1$ and faulty observations $y_i = +1$); $I(\cdot)$ is the indicator function; $M_b^{-}$ and $M_b^{+}$ are the healthy and faulty subsets of the out-of-bag observations for classifier $f_b$, i.e., the training observations not included in bootstrap sample set $T_b$; and $|M_b^{-}|$ and $|M_b^{+}|$ are the numbers of such observations.

In ensemble learning, classifiers with high variance are susceptible to small changes in the input features. For example, neural networks that have too many hidden layers and nodes can have an over-fitting problem that results in high variance and low bias, and therefore poor generalizability [37]. The variance of each classifier is given by Equation (8):

$$\sigma_b^2 = \frac{1}{N} \sum_{i=1}^{N} \left( f_b(x_i) - \bar{f}(x_i) \right)^2, \qquad 1 \le b \le B \tag{8}$$

where $\bar{f}(x_i)$ is the expected fusion outcome of the classifiers. Since a weighted fusion methodology will be employed, the expected value is given by the following equation:

$$\bar{f}(x_i) = \sum_{b=1}^{B} w_b f_b(x_i) \tag{9}$$

where $w_b$ is the weight of the $b$th classifier.
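The following sketch computes the cross-validated error rates of Equation (7) from the out-of-bag indices recorded during bootstrapping; the label convention (y = -1 healthy, y = +1 faulty) and all names are our assumptions:

    import numpy as np

    def oob_rates(members, boot_sets, X, y):
        # Per-classifier false-positive and false-negative rates, Eq. (7),
        # measured only on each member's out-of-bag (unseen) observations.
        B = len(members)
        preds = np.array([m.predict(X) for m in members])   # (B, N): f_b(x_i)
        e_fp, e_fn = np.zeros(B), np.zeros(B)
        for b, (_, _, oob) in enumerate(boot_sets):
            healthy = oob[y[oob] == -1]
            faulty = oob[y[oob] == +1]
            if len(healthy):
                e_fp[b] = np.mean(preds[b, healthy] != y[healthy])  # healthy flagged as faulty
            if len(faulty):
                e_fn[b] = np.mean(preds[b, faulty] != y[faulty])    # faulty passed as healthy
        return e_fp, e_fn, preds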

Fusion Optimization: The primary objective of this fusion methodology is to equip users with a tool that is capable of optimally combining the results of various diagnostic algorithms. The fusion optimization helps achieve a good bias-variance balance, thereby providing good generalizability. The false positive and false negative errors are calculated via cross-validation, as discussed in the previous section. For a given classifier $f_b$, let the false positive error be $e_{fp}^{\,b}$ and the false negative error be $e_{fn}^{\,b}$, $1 \le b \le B$ (i.e., there are B classifiers trained on the bootstrapped data). Let us assume that the cost of a false positive is $c_1$ and the cost of a false negative is $c_2$. The cost factors $c_1$ and $c_2$ serve as prioritizing parameters; they are relative terms that are used to prioritize the significance of false positives and false negatives in the cost function. For users who depend on system diagnostics for scheduling maintenance, the cost of a false positive is the cost incurred when a healthy system is erroneously classified as faulty, and the cost of a false negative is incurred due to the erroneous classification of a faulty system as healthy. These costs may not necessarily be tangible; intangible types of cost, such as safety, customer satisfaction, and availability, could also be incorporated. Quantifying these costs is out of the scope of this paper, as they typically vary across organizations and applications. The total misclassification cost of classifier $f_b$ is shown in Equation (10). The cost factors $c_1$ and $c_2$ can be varied subject to the condition shown in Equation (11):

$$C_b = c_1 e_{fp}^{\,b} + c_2 e_{fn}^{\,b} \tag{10}$$

$$c_1 + c_2 = 1 \tag{11}$$

Now, we use a weighted sum of all the classifiers to obtain the final result. Let $w_1, w_2, \ldots, w_B$ be the weights assigned to the classifiers. The objective function associated with the fusion of all these classifiers is shown in Equation (12); it consists of two main components, a bias component (the weighted misclassification cost) and a variance component:

$$J(\mathbf{w}) = \sum_{b=1}^{B} w_b C_b + \sum_{b=1}^{B} w_b \sigma_b^2 \tag{12}$$

where

$$\sum_{b=1}^{B} w_b = 1 \tag{13}$$

$$0 < w_b < 1, \qquad 1 \le b \le B \tag{14}$$

To minimize the nonlinear objective in Equation (12) (note that each $\sigma_b^2$ depends on the weights through the fused output $\bar{f}$ in Equation (9)), nonlinear optimization techniques need to be used to find the optimal weights. This minimization is equivalent to a constrained nonlinear minimization [26]. The idea of using Sequential Quadratic Programming (SQP) is to find the optimal weights in the B-dimensional space [27]. SQP is an iterative, quadratic-programming-based approximation technique that is used to find the optimal values of the objective function in Equation (15) subject to the inequality and equality constraints in Equations (16) and (17), respectively, where $\mathbf{x}$ is the set of weights $\{w_1, w_2, \ldots, w_B\}$:

$$\min_{\mathbf{x}} \; F(\mathbf{x}) \tag{15}$$

$$\text{s.t.} \quad g(\mathbf{x}) \leq 0 \tag{16}$$

$$h(\mathbf{x}) = 0 \tag{17}$$

The set of all points satisfying the constraints is referred to as the feasible set of the optimization problem. The algorithm can converge to an optimal state if the initial seed point is close to the optimal parameter values; the seed point moves closer to the optimal solution in each iteration of the SQP computation.

A slack variable $\mathbf{z}$ is introduced in Equation (16) such that $g(\mathbf{x}) + \mathbf{z} = 0$ and $\mathbf{z} \geq 0$. The Lagrangian for the minimization problem is given by Equation (18), where $\lambda$ and $\mu$ are the sets of Lagrange multipliers associated with the inequality and equality constraints. This Lagrangian is sometimes referred to as the extended Lagrangian because it considers the slack variable $\mathbf{z}$. A feasible solution of the constrained optimization problem needs to satisfy the first-order necessary optimality conditions shown in Equations (19)-(23) [28]:

$$L(\mathbf{x}, \mathbf{z}, \lambda, \mu) = F(\mathbf{x}) + \lambda^{T} \left( g(\mathbf{x}) + \mathbf{z} \right) + \mu^{T} h(\mathbf{x}) \tag{18}$$

$$\nabla F(\mathbf{x}) + \nabla g(\mathbf{x})\, \lambda + \nabla h(\mathbf{x})\, \mu = 0 \tag{19}$$

$$g(\mathbf{x}) + \mathbf{z} = 0 \tag{20}$$

$$h(\mathbf{x}) = 0 \tag{21}$$

$$\mathbf{z} \geq 0 \tag{22}$$

$$\lambda^{T} \mathbf{z} = 0 \tag{23}$$

The operator $\nabla$ denotes the gradient of a function; hence $\nabla g$ and $\nabla h$ are the gradients of $g$ and $h$, respectively, at $\mathbf{x}$. The optimization problem is solved through a sequence of QP subproblems, as shown in Equations (24)-(27) [28]:

$$\min_{d_x} \; \nabla F(\mathbf{x}_k)^{T} d_x + \frac{1}{2}\, d_x^{T} H_k\, d_x \tag{24}$$

$$\text{s.t.} \quad g_i(\mathbf{x}_k) + \nabla g_i(\mathbf{x}_k)^{T} d_x + z_i = 0, \qquad 1 \le i \le m \tag{25}$$

$$h_j(\mathbf{x}_k) + \nabla h_j(\mathbf{x}_k)^{T} d_x = 0, \qquad 1 \le j \le p \tag{26}$$

$$\mathbf{z} \geq 0 \tag{27}$$

where $H_k$ is the Hessian of the extended Lagrangian $L$ at iteration $k$; $\mathbf{x}_k$ is the set of weights $\{w_1, w_2, \ldots, w_B\}$ computed in the $k$th iteration; and $d_x$ and $d_z$ are the change vectors, i.e., they define the change of the variables $\mathbf{x}$ and $\mathbf{z}$. This QP subproblem is quadratic with linearized constraints. It is solved to obtain $d_x^{k}$ and $d_z^{k}$ for the $k$th iteration, which are then used to compute $\mathbf{x}$ and $\mathbf{z}$ for the next iteration $k+1$:

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha\, d_x^{k} \tag{28}$$

$$\mathbf{z}_{k+1} = \mathbf{z}_k + \alpha\, d_z^{k} \tag{29}$$

where $\alpha$ is the step-length parameter used to ensure the convergence of the optimization. SQP is an iterative process in which a sequence of feasible points is computed that converges toward the optimal solution. The iteration can be terminated when the solution of the $k$th iteration meets the optimality conditions (19)-(23).

Fusion

In this step, all the classifier outputs are combined using the weights computed by the SQP optimization. The fusion output is computed as a weighted sum of the classifier outputs, as shown in Equation (30). The weighted sum gives more priority to the classifiers that have shown better performance, i.e., the lowest cost of false positives and false negatives:

$$\hat{y}(x) = \sum_{b=1}^{B} w_b f_b(x) \tag{30}$$
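A compact sketch of this optimization and fusion step is shown below, using SciPy's SLSQP routine (a sequential least-squares QP method) as a stand-in for a full SQP implementation; the objective follows Equations (12)-(14), and all names and the +/-1 label convention are our assumptions:

    import numpy as np
    from scipy.optimize import minimize

    def optimize_weights(e_fp, e_fn, preds, c1=0.5, c2=0.5):
        # Minimize Eq. (12): weighted misclassification cost (bias term)
        # plus weighted variance around the fused output, s.t. Eqs. (13)-(14).
        B = len(e_fp)
        cost = c1 * e_fp + c2 * e_fn                       # Eq. (10), with c1 + c2 = 1 (Eq. 11)

        def objective(w):
            f_bar = w @ preds                              # Eq. (9): fused output per sample
            var = np.mean((preds - f_bar) ** 2, axis=1)    # Eq. (8): per-classifier variance
            return w @ cost + w @ var                      # Eq. (12)

        cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)   # Eq. (13)
        bounds = [(0.0, 1.0)] * B                                    # Eq. (14), relaxed to closed bounds
        res = minimize(objective, np.full(B, 1.0 / B), method="SLSQP",
                       bounds=bounds, constraints=cons)
        return res.x

    def fuse(members, w, X):
        # Eq. (30): weighted sum of member outputs, thresholded at zero.
        preds = np.array([m.predict(X) for m in members])
        return np.sign(w @ preds)

In this sketch the open bounds of Equation (14) are relaxed to closed ones, which SLSQP requires; a production implementation would follow the extended-Lagrangian iteration of Equations (18)-(29) directly.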

3. RESULTS AND DISCUSSION


The proposed diagnostic method was tested on a biquad low-pass filter, as shown in Figure 4 [29]. The biquad low-pass filter circuit has critical components that affect the frequency response of the circuit: the capacitors C1 and C2 and the resistors R1, R2, R3, and R4 shown in Figure 4. The circuit is defined as healthy when the critical component values vary within 10% of their nominal values, and as faulty when a component's parameter value changes by more than 10% above or below its nominal value. Table 1 summarizes the 12 seeded faulty conditions.

Figure 4. Biquad low pass filter.

Table 1. Seeded Faulty Conditions

Fault Condition         | Nominal Value | Sample Faulty Values
C1 greater than nominal | 5 nF          | 6 nF, 7 nF, 8 nF, 9 nF, 10 nF
C1 lesser than nominal  | 5 nF          | 4 nF, 3.5 nF, 3 nF, 2.5 nF, 2 nF
C2 greater than nominal | 5 nF          | 6 nF, 7 nF, 8 nF, 9 nF, 10 nF
C2 lesser than nominal  | 5 nF          | 4 nF, 3.5 nF, 3 nF, 2.5 nF, 2 nF
R1 greater than nominal | 6.2 kΩ        | 7.4 kΩ, 8.6 kΩ, 10 kΩ, 11.2 kΩ, 12 kΩ
R1 lesser than nominal  | 6.2 kΩ        | 5 kΩ, 4.35 kΩ, 3.72 kΩ, 3 kΩ, 2.5 kΩ
R2 greater than nominal | 6.2 kΩ        | 7.4 kΩ, 8.6 kΩ, 10 kΩ, 11.2 kΩ, 12 kΩ
R2 lesser than nominal  | 6.2 kΩ        | 5 kΩ, 4.35 kΩ, 3.72 kΩ, 3 kΩ, 2.5 kΩ
R3 greater than nominal | 6.2 kΩ        | 7.4 kΩ, 8.6 kΩ, 10 kΩ, 11.2 kΩ, 12 kΩ
R3 lesser than nominal  | 6.2 kΩ        | 5 kΩ, 4.35 kΩ, 3.72 kΩ, 3 kΩ, 2.5 kΩ
R4 greater than nominal | 1.6 kΩ        | 7.4 kΩ, 8.6 kΩ, 10 kΩ, 11.2 kΩ, 12 kΩ
R4 lesser than nominal  | 1.6 kΩ        | 5 kΩ, 4.35 kΩ, 3.72 kΩ, 3 kΩ, 2.5 kΩ

The circuit was excited using an input sweep signal with a bandwidth of 1-100 kHz applied for 100 ms. The output response of the circuit was then captured using an NI USB-6212 data acquisition board at a sampling rate of 200k samples/s; the bandwidth of the sweep signal was larger than the operating frequency range of the circuit. The output signal was decomposed into 8 levels using the discrete wavelet transform. To obtain healthy features, the sweep signal was applied to the circuit with nominal values for the critical components 50 times. Faulty features were obtained by exciting the circuit at each of the faulty values shown in Table 1, 10 times each. Hence, we obtained 10 × 5 = 50 responses for each fault condition, and the total number of output responses, healthy and faulty, is 50 + 10 × 5 × 12 = 650. The feature set was divided equally into two parts, one for training and the other for testing; hence the training and test data sets are each 325 × 8 in dimension. Both the training and the test data sets consist of a healthy subset of dimension 25 × 8 and an unhealthy subset of dimension 300 × 8.
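A sketch of this feature-extraction step using the PyWavelets package is shown below; the choice of mother wavelet (db4) and the use of per-level energies as the 8 features are our assumptions, since the paper does not specify them:

    import numpy as np
    import pywt  # PyWavelets

    def wavelet_features(response, wavelet="db4", level=8):
        # 8-level discrete wavelet decomposition of one output response;
        # one energy value per detail level gives the 8-element feature
        # vector matching the N x 8 feature sets described above.
        coeffs = pywt.wavedec(response, wavelet, level=level)   # [cA8, cD8, ..., cD1]
        details = coeffs[1:]
        return np.array([np.sum(c ** 2) for c in details])      # energy per level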


Table 2 shows the comparative analysis of the least squares SVM (LS-SVM) and the fusion techniques. The results indicate very similar performance for the LS-SVM, the localized weighted fusion, and the optimized fusion techniques; the averaging technique, on the other hand, had a very high false positive percentage.

Table 2. Comparative Analysis

                             | Neural Network | LS-SVM | Averaging | Localized Weighted Fusion | Optimized Fusion
False Positives (percentage) | 0 (0%)         | 1 (4%) | 2 (8%)    | 0 (0%)                    | 0 (0%)
False Negatives (percentage) | 0 (0%)         | 3 (1%) | 1 (0.33%) | 0 (0%)                    | 0 (0%)

Table 3 shows the comparative analysis of the algorithms when additive white Gaussian noise with an SNR of 10 dB was introduced into the extracted features. The results clearly demonstrate that the optimized fusion technique performed better than the other classification techniques.

Table 3. Comparative Analysis with AWGN (SNR 10 dB)

                             | Neural Network | LS-SVM     | Averaging  | Localized Weighted Fusion | Optimized Fusion
False Positives (percentage) | 4 (16%)        | 4 (16%)    | 5 (20%)    | 3 (12%)                   | 2 (8%)
False Negatives (percentage) | 21 (7%)        | 23 (7.67%) | 23 (7.67%) | 19 (6.34%)                | 10 (3.34%)

The performance of the algorithms was also analyzed with a truncated training data set of size 195 × 8, containing a healthy subset of dimension 15 × 8 and an unhealthy subset of dimension 180 × 8. Good fault detection performance can still be observed for the optimized fusion technique, as shown in Table 4.

Table 4. Comparative Analysis with Truncated Training Data

                             | Neural Network | LS-SVM      | Averaging   | Localized Weighted Fusion | Optimized Fusion
False Positives (percentage) | 6 (24%)        | 9 (36%)     | 8 (32%)     | 5 (20%)                   | 5 (20%)
False Negatives (percentage) | 36 (12%)       | 43 (14.34%) | 40 (13.34%) | 33 (11%)                  | 24 (8%)

Figure 5. Number of false positives and false negatives as a function of the cost of a false positive (c1).

From Figure 5 we observe that the cost parameters c1 and c2 can be used to prioritize the reduction in false positives or false negatives.


4. CONCLUSIONS
The generalizability of an algorithm is affected by many factors: noise, training data that are not a good representation of the global population, the inductive bias of the classifiers, and so on. Classifier fusion has been widely used to improve generalizability. While bias compensation has been a widely accepted strategy in the literature for improving the classification performance of fused classifiers, variance also plays a significant role in the performance of the classifiers, and it has been shown in the literature that an optimal balance between bias and variance leads to good generalizability. In this paper we proposed a novel combination rule to improve the classification accuracy and generalizability of diagnostic algorithms. The formulated objective function achieves an optimal balance between bias and variance, and the optimal classifier combination is found using sequential quadratic programming. Bootstrapping was employed to reduce the ambiguity about how well the training data represent the global population. Cross-validation was used to compute an unbiased estimate of the bias and variance of each classifier. The experimental results demonstrate that the proposed fusion methodology provides a significant fault diagnosis performance improvement over other fusion methodologies and over the individual classifiers (neural network and support vector machine). We also compared the performance of the algorithm while varying the cost factors, and the results demonstrate that the cost factors used in the bias term help in prioritizing either false positives or false negatives.

REFERENCES
[1] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, Third Quarter 2006.
[2] N. M. Vichare and M. Pecht, "Prognostics and health management of electronics," IEEE Transactions on Components and Packaging Technologies, vol. 29, no. 1, pp. 222-229, 2006.
[3] C. Chen, B. Zhang, G. Vachtsevanos, and M. Orchard, "Machine condition prediction based on adaptive neuro-fuzzy and high-order particle filtering," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4353-4364, 2011.
[4] C. Chen, D. Brown, C. Sconyers, B. Zhang, G. Vachtsevanos, and M. Orchard, "An integrated architecture for fault diagnosis and failure prognosis of complex engineering systems," Expert Systems with Applications, vol. 39, no. 10, pp. 9031-9040, 2012.
[5] G. Brown, J. Wyatt, R. Harris, and X. Yao, "Diversity creation methods: A survey and categorisation," Journal of Information Fusion, vol. 6, no. 1, 2005.
[6] A. Sharkey, Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Springer-Verlag, 1999.
[7] S. Hashem and B. Schmeiser, "Improving model accuracy using optimal linear combinations of trained neural networks," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 792-794, 1995.
[8] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their application to handwriting recognition," IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 3, pp. 418-435, 1992.
[9] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. New York, NY: Wiley-Interscience, 2005.
[10] F. Kimura and M. Shridhar, "Handwritten numerical recognition based on multiple algorithms," Pattern Recognition, vol. 24, no. 10, pp. 969-983, 1991.
[11] J. Franke and E. Mandler, "A comparison of two approaches for combining the votes of cooperating classifiers," Proceedings of the 11th International Conference on Pattern Recognition, vol. 2, pp. 611-614, 1992.
[12] N. Littlestone and M. Warmuth, "The weighted majority algorithm," Information and Computation, vol. 108, no. 2, pp. 212-261, 1994.
[13] W. P. Kegelmeyer and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405-410, 1997.
[14] F. Xue, R. Subbu, and P. Bonissone, "Locally weighted fusion of multiple predictive models," International Joint Conference on Neural Networks, 2006.
[15] P. Bonissone, F. Xue, and R. Subbu, "Fast meta-models for local fusion of multiple predictive models," Applied Soft Computing, vol. 11, no. 2, pp. 1529-1539, March 2011.
[16] N. Poh and J. Kittler, "A unified framework for biometric expert fusion incorporating quality measures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 1, 2012.
[17] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.

[18] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[19] A. Chandra and X. Yao, "Evolving hybrid ensembles of learning machines for better generalization," Neurocomputing, vol. 69, pp. 686-700, 2006.
[20] Y. Liu, X. Yao, and T. Higuchi, "Evolutionary ensembles with negative correlation learning," IEEE Transactions on Evolutionary Computation, vol. 4, no. 4, pp. 380-387, 2000.
[21] H. A. Abbass, "Pareto neuro-evolution: Constructing ensemble of neural networks using multi-objective optimization," IEEE Conference on Evolutionary Computation, vol. 3, pp. 2074-2080, 2003.
[22] P. E. Utgoff, Machine Learning of Inductive Bias. Kluwer Academic Publishers, 1986.
[23] S. Tufféry, Data Mining and Statistics for Decision Making, 2nd ed. Wiley, 2011.
[24] B. Efron, "Bootstrap methods: Another look at the jackknife," Annals of Statistics, vol. 7, no. 1, pp. 1-26, 1979.
[25] B. LeBaron and A. S. Weigend, "A bootstrap evaluation of the effect of data splitting on financial time series," IEEE Transactions on Neural Networks, vol. 9, no. 1, January 1998.
[26] P. Gill and E. Wong, "Sequential quadratic programming methods," in Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications, pp. 147-224, Springer, 2012.
[27] W. Forst and D. Hoffmann, Optimization: Theory and Practice. Springer, 2010.
[28] R. H. Byrd, J. C. Gilbert, and J. Nocedal, "A trust region method based on interior point techniques for non-linear programming," Mathematical Programming, vol. 89, no. 1, pp. 149-185, 2000.
[29] A. Vasan, B. Long, and M. Pecht, "Experimental validation of LS-SVM based fault identification in analog circuits using frequency features," World Congress on Engineering Asset Management, 6th Annual Conference, Cincinnati, OH, October 2011.
[30] V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
[31] C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[32] K. Gurney, An Introduction to Neural Networks. CRC Press, 1997.

[33] S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Computation, vol. 4, no. 1, pp. 1-58, 1992.
[34] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, August 2009.
[35] E. Briscoe and J. Feldman, "Conceptual complexity and the bias/variance tradeoff," Cognition, vol. 118, pp. 2-16, 2011.
[36] O. Maimon and L. Rokach, Data Mining and Knowledge Discovery Handbook, 2nd ed. Springer, pp. 733-746, 2010.
[37] Y. Liu, J. A. Starzyk, and Z. Zhu, "Optimized approximation algorithm in neural networks without overfitting," IEEE Transactions on Neural Networks, vol. 19, no. 6, pp. 983-995, June 2008.

BIOGRAPHIES
Surya Kunche received his B.S. degree in electronics and communication engineering from the Vellore Institute of Technology, Vellore, India. He is currently working towards his M.S. degree in reliability engineering at the Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park. He was part of the team that won the 2012 IEEE PHM data challenge contest. His research interests focus on fusion techniques for prognostics and health management (PHM) and PHM software development.

Chaochao Chen is a member of the research staff at the Center for Advanced Life Cycle Engineering (CALCE). His research areas include fault diagnosis and failure prognosis, focusing on data-driven approaches such as machine learning and statistical methods; prediction uncertainty management; prognostics and health management (PHM) software implementation, verification, and validation; fault tolerant control; and their applications to robotics, electronics, batteries, and various mechanical systems. Prior to joining CALCE, Dr. Chen spent over three years at the University of Michigan and the Georgia Institute of Technology as a research fellow, working in PHM areas in collaboration with multiple organizations in industry and the military. He has published over 20 technical papers, including papers in IEEE Transactions on Industrial Electronics, IEEE Transactions on Instrumentation and Measurement, Mechanical Systems and Signal Processing, Expert Systems with Applications, and the ASME Journal of Dynamic Systems, Measurement, and Control. He has served as a session chair and has been invited to give talks at

several reputed international conferences. He received his Ph.D. in mechanical engineering from the Kochi University of Technology, Japan, in 2007.

Michael Pecht (M'83-SM'90-F'92) received the M.S. and Ph.D. degrees in engineering mechanics from the University of Wisconsin, Madison. He is the founder of the Center for Advanced Life Cycle Engineering, University of Maryland, College Park, which is funded by over 150 of the world's leading electronics companies at more than U.S. $6 million/year. He is also a Chair Professor in mechanical engineering and a Professor in applied mathematics at the University of Maryland. He is the Chief Editor of Microelectronics Reliability. He has written more than twenty books on electronic product development, use, and supply chain management, and over 500 technical articles. He consults for 22 major international electronics companies, providing expertise in strategic planning, design, test, prognostics, IP, and risk assessment of electronic products and systems. Dr. Pecht is a Professional Engineer, an ASME Fellow, an SAE Fellow, and an IMAPS Fellow. Prior to 2008, he was the recipient of the European Micro and Nano-Reliability Award for outstanding contributions to reliability research, the 3M Research Award for electronics packaging, and the IMAPS William D. Ashman Memorial Achievement Award for his contributions in electronics reliability analysis. In 2008, he received the highest reliability honor, the IEEE Reliability Society's Lifetime Achievement Award. In 2010, he received the IEEE Exceptional Technical Achievement Award. He served as Chief Editor of the IEEE Transactions on Reliability for 8 years and on the Advisory Board of IEEE Spectrum. He is an Associate Editor for the IEEE Transactions on Components and Packaging Technology.

