
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, VOLUME 2, ISSUE 5, MAY 2012


Support Vector Machines based Model for Predicting Software Maintainability of Object-Oriented Software Systems
S. O. Olatunji (1) and Hossain Arif (2)

(1) College of Computer Science and Engineering, King Fahd University of Petroleum & Minerals, Saudi Arabia (on leave from Adekunle Ajasin University, Akungba Akoko, Ondo State, Nigeria)
(2) School of Engineering and Computer Science, BRAC University, 66 Mohakhali, Dhaka 1212, Bangladesh.

Abstract: This paper presents a maintainability prediction model for object-oriented (OO) software systems based on support vector machines. As the number of object-oriented software systems increases, it becomes more important for organizations to maintain those systems effectively. However, only a small number of maintainability prediction models are currently available for object-oriented systems. In this work, we develop a support vector regression maintainability prediction model for object-oriented software systems. The model was constructed using earlier published object-oriented metric datasets, collected from different object-oriented systems. The prediction accuracy of the model was evaluated and compared with other commonly used regression-based models, and also with a Bayesian network based model earlier developed using the same datasets. Empirical results from the experiments carried out indicate that the proposed SVM model produced better and more promising results, in terms of the prediction accuracy measures recognized in the OO software maintainability literature, than the models implemented earlier on the same datasets.

Index Terms: Support Vector Machines, Object-Oriented software systems, Software Metrics, Software Maintainability prediction models.

1.0 Introduction

Software maintenance is the process of modifying a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a changed environment. Maintaining and enhancing the reliability of software during maintenance requires that software engineers understand how the various components of a design interact. People usually think of software maintenance as beginning when the product is delivered to the client. While this is formally true, in fact decisions that affect the maintainability of the product are made from the earliest stages of design. Software maintenance is classified into four types: corrective, adaptive, perfective and preventive. Corrective maintenance refers to fixing a program. Adaptive maintenance refers to modifications that adapt to changes in the data environment, such as new product codes or a new file organization, or to changes in the hardware or software environments. Perfective maintenance refers to enhancements: making the product better, faster, smaller, better documented, cleaner structured, or richer in functions or reports. Preventive maintenance is work done in order to try to prevent malfunctions or to improve maintainability. When a software system is not designed for maintenance, it exhibits a lack of stability under change: a modification in one part of the system has side effects that ripple throughout the system. Thus, the main challenges in software maintenance are to understand existing software and to make changes without introducing new bugs. Many object-oriented (OO) software systems are currently in use, and the growing popularity of OO programming languages such as Java, together with the increasing number of software development tools supporting the Unified Modelling Language (UML), encourages more OO systems to be developed now and in the future.
Hence it is important that those systems are maintained effectively and efficiently. A software maintainability prediction model enables organizations to predict the maintainability of a software system and assists them in managing maintenance resources. In addition, if an accurate maintainability prediction model is available for a software system, a defensive design can be adopted, which would minimize, or at least reduce, the future maintenance effort of the system. Maintainability of a software system can be measured in different ways: for example, as the number of changes made to the code during a maintenance period, or as the effort required to make those changes. The predictive model is called a maintenance effort prediction model when maintainability is measured as effort. Unfortunately, the number of software maintainability prediction models, including maintenance effort prediction models, currently available in the literature is very small. In this research work, we developed a new maintainability prediction model for object-oriented software systems based

2012 JICT www.jict.co.uk


on support vector machines (SVM). SVM has proved to be one of the most sought-after regression techniques in recent times due to its accurate and reliable performance, especially in the face of scarce datasets where few data points are available. The rest of this paper is organized as follows. Section 2 reviews related earlier work. Section 3 describes the support vector machine technique proposed in this work. Section 4 contains the empirical studies and discussions, including comparisons with other models, as well as a description of the datasets and metrics used in our study. Section 5 concludes the paper.

2.0 Review of Related Work


In [1], Aggarwal et al. studied the application of artificial neural networks (ANN) using object-oriented (OO) metrics for predicting software maintainability. Predicting software quality includes estimating the maintainability of software. In their empirical study, Aggarwal et al. used the principal components of eight OO metrics as independent variables and maintenance effort as the dependent variable. The eight independent variables were Lack of Cohesion (LCOM), Number of Children (NOC), Depth of Inheritance (DIT), Weighted Methods per Class (WMC), Response for a Class (RFC), Data Abstraction Coupling (DAC), Message Passing Coupling (MPC), and Number of Methods per Class (NOM). The dependent variable was maintenance effort per class, measured as the number of lines changed per class. Aggarwal et al. used data from the commercial software products UIMS (User Interface System) and QUES (Quality Evaluation System) in their empirical study. Mean Absolute Relative Error (MARE) was the primary measure used for evaluating model performance; many software measurement researchers use MARE as the preferred error measure for empirical studies. The formula to calculate MARE is as follows [2]:

    MARE = (1/n) Sum_{i=1}^{n} |actual_i - predicted_i| / actual_i

Aggarwal et al. found that the ANN model achieved a MARE of 0.265, and that maintenance effort was estimated within 30 percent of the actual maintenance effort in more than 72 percent of the classes in the validation set. The authors therefore claimed that the ANN models and the independent variables used are good predictors of maintenance effort. However, one limitation of the study is that the performance of an ANN model depends to a large degree on the data on which it is trained. Further empirical studies must be carried out with large data sets to get an accurate measure of performance outside the development population.

Zhou and Leung used a novel exploratory multivariate analysis technique called Multivariate Adaptive Regression Splines (MARS) for predicting the maintainability of object-oriented software [3]. Zhou and Leung claim that MARS attempts to adapt to the unknown functional form using a series of piecewise regression splines. As a result, MARS is much more suitable for modeling complex relationships than other modeling techniques. To build the MARS models, the authors made use of the Li and Henry data sets, UIMS and QUES, obtained from two different object-oriented systems. In their study, maintainability was measured as the number of changes made to the code during a maintenance period. To evaluate the benefits of MARS over other modeling techniques, the prediction performance of the MARS models was compared with that of multivariate linear regression models, artificial neural network models, regression tree models, and support vector models using leave-one-out cross-validation. The results show that the MARS models can accurately predict the maintainability of OO software systems. For the UIMS data set, the MARS model performed as accurately as the best prediction model among the other techniques. For the QUES data set, the MARS model achieved the best prediction accuracy compared with all four other prediction models. However, one limitation of the study was that the metric data were collected from two systems implemented in a single language. Further research is needed to replicate this study across multiple programming languages and platforms, which would help confirm the capability of MARS in software maintainability prediction. Another interesting line of future work is to develop more accurate prediction models by combining MARS with other prediction techniques.

In [4], Singh et al. built a Support Vector Machine (SVM) model and conducted an empirical study to investigate the relationship between the object-oriented metrics given by Chidamber and Kemerer and fault proneness. SVM is an effective technique for performing data classification, and has been successfully used in diverse applications such as text classification [5], pattern recognition [6], Chinese character classification [7], face identification, medical diagnosis, and identification of organisms [8]. In their study, Singh et al. investigated the answers to the two following questions: Are the predictions given by SVM fault proneness models feasible and adaptable? How accurately and precisely do the OO metrics predict faults in software?

The proposed model was empirically evaluated using the public domain KC1 NASA data set. The SVM model was used to predict the effect of OO metrics on fault proneness, which the authors claim was the first research of its kind. For modeling with SVM, the recommended kernel function is the Radial Basis Function (RBF); Singh et al. therefore used the RBF kernel in their SVM modeling to predict faulty classes. The performance of the model predictions was evaluated using sensitivity, specificity, precision, completeness, and Area Under the Curve (AUC), as explained in [9]. Receiver Operating Characteristic (ROC) analysis, in which the ROC curve is defined as a plot of sensitivity on the y-coordinate versus 1 - specificity on the x-coordinate, is an effective method of evaluating the quality or performance of such prediction models [10]. Based on their study, Singh et al. concluded that the SVM method yielded good AUC using ROC analysis, and confirmed that construction of the SVM


models is feasible, adaptable to OO systems, and useful in predicting fault-prone classes. Such models can be very helpful for planning and performing testing by focusing resources on the fault-prone parts of the design and code. One limitation of the study is that the analysis is based on only one data set; as future work, the study should be replicated on different data sets to generalize the findings.

Olatunji et al. developed an extreme learning machine (ELM) maintainability prediction model for object-oriented software systems [11]. They based their model on the extreme learning machine algorithm for single-hidden-layer feed-forward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of the SLFN. The object-oriented (OO) software datasets published by Li and Henry [12] were used in the study.

3.0 Proposed Support Vector Machines

Support Vector Machines (SVMs) are modern learning systems that deliver state-of-the-art performance in real-world pattern recognition and data mining applications such as text categorization, hand-written character recognition, image classification and material identification. SVM is a comparatively recent technique for both prediction and classification. It deals with kernel functions instead of sigmoid-like ones, which allows projection into higher-dimensional spaces and the solution of more complex nonlinear problems. It has featured in a wide range of journals, often with better results than competing techniques [13-17].

Generally, in prediction and classification problems, the purpose is to determine the relationship between the input and output variables of a given dataset D = {Y, X}, where X is the n-by-p matrix of the p input variables. Y is real-valued for forecasting problems; for classification problems the response takes a class label, y = k referring to class A_k for k = 1, 2, ..., c, where c >= 2. D = {y_i, x_{i1}, ..., x_{ip}}, for all i = 1, ..., n, is a training set of the input variables X_j = (x_{1j}, ..., x_{nj})^T, for j = 1, ..., p, and the output variable Y = (y_1, ..., y_n)^T. The lower-case letters x_{i1}, ..., x_{ip}, for all i = 1, ..., n, refer to the values of each observation of the input variables.

Here we briefly present the basic ideas behind SVM for pattern recognition, especially for the two-class classification problem, and refer readers to [18, 19] for a full description of the technique. According to [18, 19], the goal is to construct a binary classifier, i.e. to derive a decision function from the available samples that has a small probability of misclassifying a future sample. SVM implements the following idea: it maps the input vectors x into a high-dimensional feature space via a mapping phi(x) and constructs an Optimal Separating Hyperplane (OSH), which maximizes the margin, the distance between the hyperplane and the nearest data points of each class in that space. Different mappings construct different SVMs. The mapping phi(.) is performed implicitly by a kernel function K(x_i, x_j) which defines an inner product in the feature space. The decision function implemented by SVM can be written as [18, 19]:

    f(x) = sgn( Sum_{i=1}^{N} y_i alpha_i K(x, x_i) + b )                            (1)

where the coefficients alpha_i are obtained by solving the following convex Quadratic Programming (QP) problem:

    Maximize    Sum_{i=1}^{N} alpha_i - (1/2) Sum_{i=1}^{N} Sum_{j=1}^{N} alpha_i alpha_j y_i y_j K(x_i, x_j)

    Subject to  0 <= alpha_i <= C  and  Sum_{i=1}^{N} alpha_i y_i = 0,  i = 1, 2, ..., N.   (2)

In equation (2), C is a regularization parameter which controls the trade-off between the margin and the misclassification error. The training vectors x_i are called Support Vectors only if the corresponding alpha_i > 0.

Several typical kernel functions include:

    K(x_i, x_j) = (x_i . x_j + 1)^d                                                  (3)

    K(x_i, x_j) = exp( -gamma ||x_i - x_j||^2 )                                      (4)

Equation (3) is the polynomial kernel function of degree d, which reverts to the linear kernel when d = 1. Equation (4) is the Radial Basis Function (RBF) kernel with a single parameter gamma [13, 15, 19]. Other kernel functions are:

    Linear:   K(x_i, x_j) = x_i^T x_j

    Sigmoid:  K(x_i, x_j) = tanh( gamma x_i^T x_j + r )

Here gamma, r and d are kernel parameters.

4.0 Empirical Studies and Discussions

In this work, we made use of the OO software datasets published by Li and Henry (1993). In this section, we describe the datasets used for this study and the accompanying metrics, the quality measures used, and the empirical work and discussions with comparisons.

4.1 Dataset and the Metrics Studied

This study makes use of two OO software data sets published by Li and Henry (1993). The metric data were collected from a total of 110 classes in two OO software systems. The first data set, UIMS, contains the metric data of 39 classes collected from a user interface management system; the second, QUES, contains the metric data of 71 classes collected from a quality evaluation system. Both systems were implemented in Ada. The datasets consist of five C&K metrics (DIT, NOC, RFC, LCOM and WMC), four L&H metrics (MPC, DAC, NOM and SIZE2), and SIZE1, a traditional lines-of-code size metric. Maintainability was measured by the CHANGE metric, which counts the number of lines in the code that were changed during a three-year maintenance period. Neither the UIMS nor the QUES dataset contains actual maintenance effort data. The description of each metric is given in Table 1.

Table 1: Description of metrics

WMC (Weighted methods per class): The sum of McCabe's cyclomatic complexity of all local methods in a given class.
DIT (Depth of inheritance tree): The length of the longest path from a given class to the root of the inheritance hierarchy.
RFC (Response for a class): The number of methods that can potentially be executed in response to a message received by an object of a given class.
NOC (Number of children): The number of classes that directly inherit from a given class, i.e. the number of direct sub-classes the class has.
LCOM (Lack of cohesion in methods): The number of pairs of local methods in a given class using no attribute in common, i.e. the number of disjoint sets of local methods that do not interact with each other in the class.
MPC (Message-passing coupling): The number of send statements defined in a given class.
DAC (Data abstraction coupling): The number of abstract data types defined in a given class.
NOM (Number of methods): The number of methods implemented within a given class.
SIZE1 (Lines of code): The number of semicolons in a given class.
SIZE2 (Number of properties): The total number of attributes and the number of local methods in a given class.
CHANGE (Number of lines changed in the class): Insertions and deletions are each counted as 1; a change of contents is counted as 2.
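For concreteness, the metric set above can be viewed as a tabular dataset in which the ten design and size metrics are the predictors and CHANGE is the target. The sketch below is illustrative only: the sample values and the record layout are our assumptions, not the actual Li and Henry data.

```python
# The ten predictor metrics described in Table 1, plus the CHANGE target.
FEATURES = ["WMC", "DIT", "RFC", "NOC", "LCOM",
            "MPC", "DAC", "NOM", "SIZE1", "SIZE2"]
TARGET = "CHANGE"

def split_features_target(rows):
    """Separate predictor metric vectors from the maintainability target."""
    X = [[row[f] for f in FEATURES] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y

# Two illustrative classes (values are made up, not taken from UIMS or QUES).
sample = [
    {"WMC": 5, "DIT": 2, "RFC": 17, "NOC": 0, "LCOM": 6,
     "MPC": 3, "DAC": 1, "NOM": 7, "SIZE1": 74, "SIZE2": 9, "CHANGE": 18},
    {"WMC": 12, "DIT": 3, "RFC": 30, "NOC": 1, "LCOM": 8,
     "MPC": 6, "DAC": 3, "NOM": 13, "SIZE1": 131, "SIZE2": 16, "CHANGE": 39},
]
X, y = split_features_target(sample)
```

Each row then corresponds to one class of the system, matching the class-level granularity at which CHANGE was collected.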


DIT, NOC, RFC, LCOM, WMC, MPC, DAC, NOM, SIZE2, and SIZE1 are the features that are combined and used to predict the attribute CHANGE. The QUES data set has 71 sample cases, whereas UIMS has 39 sample cases.

4.2 Characteristics of the datasets

Table 2: Descriptive statistics of the UIMS data set

Metric   Maximum  75%   Median  25%   Minimum  Mean     Std. dev.  Skewness
WMC      69       12    5       1     0        11.38    15.90      2.03
DIT      4        3     2       2     0        2.15     0.90       0.54
RFC      101      30    17      11    2        23.21    20.19      2.00
NOC      8        1     0       0     0        0.95     2.01       2.24
LCOM     31       8     6       4     1        7.49     6.11       2.49
MPC      12       6     3       1     1        4.33     3.41       0.731
DAC      21       3     1       0     0        2.41     4.00       3.33
NOM      40       13    7       6     1        11.38    10.21      1.67
SIZE1    439      131   74      27    4        106.44   114.65     1.71
SIZE2    61       16    9       6     1        13.97    13.47      1.89
CHANGE   289      39    18      10    2        46.82    71.89      2.29

Table 3: Descriptive statistics of the QUES data set

Metric   Maximum  75%   Median  25%   Minimum  Mean     Std. dev.  Skewness
WMC      83       22    9       2     1        14.96    17.06      1.77
DIT      4        2     2       2     0        1.92     0.53       0.10
RFC      156      62    40      34    17       54.44    32.62      1.62
NOC      0        0     NA      0     0        0        0.00       NA
LCOM     33       14    5       4     3        9.18     7.34       1.35
MPC      42       21    17      12    2        17.75    8.33       0.88
DAC      25       4     2       1     0        3.44     3.91       2.99
NOM      57       21    6       5     4        13.41    12.00      1.39
SIZE1    1009     333   211     172   115      275.58   171.60     2.11
SIZE2    82       25    10      7     4        18.03    15.21      1.71
CHANGE   217      85    52      35    6        64.23    43.13      1.36
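Descriptive statistics of the kind reported in Tables 2 and 3 can be reproduced per metric column. The sketch below uses only the standard library; the input values are illustrative, not the actual UIMS or QUES data, and the skewness formula shown is the plain moment coefficient (published statistics may use a slightly different sample correction).

```python
import statistics

def describe(values):
    """Summarize one metric column: mean, median, std. dev. and skewness."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    # Fisher-Pearson moment coefficient of skewness (uncorrected form).
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": sd,
        "skewness": skew,
    }

stats = describe([0, 1, 1, 2, 3, 5, 12])  # illustrative metric values
```

The strongly positive skewness values in Tables 2 and 3 indicate heavy right tails, which is one reason the median absolute residual is later preferred as a measure of central tendency.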

4.3 Model Building and Evaluation

For each data set, the available data were divided into two parts. One part was used as a training set for constructing a maintainability prediction model; the other part was used for testing, to determine the predictive ability of the developed model. Although there are many different ways to split a given dataset, we chose a stratified sampling approach because it breaks the data randomly while producing a balanced division based on the supplied percentage. In this work, we selected 70% of the data for building the model (internal validation) and 30% of the data for testing (external validation, the cross-validation criterion). We repeated both the internal and external validation processes 1000 times to obtain fair partitions across the entire process. We also evaluated and compared our developed model, quantitatively, with the other OO software maintainability prediction models cited earlier, using the following prediction accuracy measures recommended in the literature: the absolute residual (Ab. Res.), the magnitude of relative error (MRE), and the proportion of predicted values with MRE less than or equal to a specified value (Pred measures). Details of these performance measures follow.

4.3.1 Prediction accuracy measures

In this paper, we compared the software maintainability prediction models using the following prediction accuracy measures: the absolute residual (Ab. Res.), the magnitude of relative error (MRE) and Pred measures. The Ab. Res. is the absolute value of the residual:

    Ab. Res. = |actual value - predicted value|

In this paper, the sum of the absolute residuals (Sum Ab. Res.), the median of the absolute residuals (Med. Ab. Res.) and the standard deviation of the absolute residuals (SD Ab. Res.) are used. The Sum Ab. Res. measures the total residuals over the dataset. The Med. Ab. Res. measures the central tendency of the residual distribution; the median is chosen because the residual distribution is usually skewed in software datasets. The SD Ab. Res. measures the dispersion of the residual distribution. MRE is a normalized measure of the discrepancy between actual and predicted values, given by:

    MRE = |actual value - predicted value| / actual value

The Max. MRE measures the maximum relative discrepancy, which is equivalent to the maximum error relative to the actual effort in the prediction. The mean of the MRE, the mean magnitude of relative error (MMRE), is:

    MMRE = (1/n) Sum_{i=1}^{n} MRE_i

According to Fenton and Pfleeger (1997), Pred is a measure of the proportion of predicted values that have MRE less than or equal to a specified value, given by:

    Pred(q) = k / n

where q is the specified value, k is the number of cases whose MRE is less than or equal to q, and n is the total number of cases in the dataset. According to Conte and Dunsmore (1986) and MacDonell (1997), for an effort prediction model to be considered accurate, MMRE < 0.25 and/or either Pred(0.25) > 0.75 or Pred(0.30) > 0.70. These are the criteria suggested in the literature as far as effort prediction is concerned.

4.4 Results Discussion and Comparative Studies

Below are tables and figures showing the results of the proposed SVM model in comparison with the models used earlier on the same datasets.

4.4.1 Results from the QUES dataset

Table 3 shows the values of the prediction accuracy measures achieved by each of the maintainability prediction models for the QUES dataset. According to Conte and Dunsmore (1986) and MacDonell (1997), for a software maintenance effort prediction model to be considered accurate, MMRE < 0.25 and/or either Pred(0.25) > 0.75 or Pred(0.30) > 0.70 needs to be achieved; the closer a model's prediction accuracy values are to these baselines, the better. From Table 3, it can easily be seen that the proposed SVM model achieved an MMRE value of 0.215, a Pred(0.25) value of 0.786 and a Pred(0.30) value of 0.857. It is clear from these values that the SVM model has met all the stipulated conditions, and it is the only model that surpassed the required values for all three essential prediction measures; hence it is the best among all the presented models. Thus, SVM outperforms all the other models in terms of all the predictive measures used.

Table 3: Prediction accuracy for the QUES dataset

Model                 Max. MRE  MMRE    Pred(0.25)  Pred(0.30)  Sum Ab.Res.  Med. Ab.Res.  SD Ab.Res.
Bayesian network      1.592     0.452   0.391       0.430       686.610      17.560        31.506
Regression Tree       2.104     0.493   0.352       0.383       615.543      19.809        25.400
Backward Elimination  1.418     0.403   0.396       0.461       507.984      17.396        19.696
Stepwise Selection    1.471     0.392   0.422       0.500       498.675      16.726        20.267
ELM                   1.803     0.3502  0.368       0.380       56.122       28.06         22.405
SVM                   0.872     0.215   0.786       0.857       238.968      10.617        20.516
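The accuracy measures defined in Section 4.3.1 are straightforward to compute from paired actual and predicted values. A minimal sketch, using only the standard library:

```python
import statistics

def accuracy_measures(actual, predicted, q_levels=(0.25, 0.30)):
    """Compute Ab. Res. statistics, MMRE, Max. MRE and Pred(q)."""
    residuals = [abs(a - p) for a, p in zip(actual, predicted)]
    # MRE assumes every actual value is non-zero.
    mre = [abs(a - p) / a for a, p in zip(actual, predicted)]
    n = len(mre)
    measures = {
        "Sum Ab.Res.": sum(residuals),
        "Med. Ab.Res.": statistics.median(residuals),
        "SD Ab.Res.": statistics.stdev(residuals),
        "MMRE": sum(mre) / n,
        "Max. MRE": max(mre),
    }
    for q in q_levels:
        # Pred(q) = k / n, with k the number of cases whose MRE <= q.
        measures[f"Pred({q})"] = sum(m <= q for m in mre) / n
    return measures
```

A model would then be judged accurate, under the criteria cited above, when MMRE < 0.25 and/or Pred(0.25) > 0.75 or Pred(0.30) > 0.70.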


4.4.2 Results from the UIMS dataset

Table 4 shows the values of the prediction accuracy measures achieved by each of the maintainability prediction models for the UIMS dataset. From the results presented, the proposed SVM model achieved an MMRE value of 0.52, a Pred(0.25) value of 0.429 and a Pred(0.30) value of 0.429, comparing favourably with the other models considered. Although the performance of SVM on the UIMS dataset is low compared to its performance on the QUES dataset, its performance relative to the other models on the same dataset is the best, and it is very promising. In comparison with the UIMS dataset, the performance of the models on the QUES dataset is generally far better. This indicates that performance may vary depending on the characteristics of the dataset and/or on the prediction accuracy measure used.

Table 4: Prediction accuracy for the UIMS dataset

Model                 Max. MRE  MMRE    Pred(0.25)  Pred(0.30)  Sum Ab.Res.  Med. Ab.Res.  SD Ab.Res.
Bayesian network      7.039     0.972   0.446       0.469       362.300      10.550        46.652
Regression Tree       9.056     1.538   0.200       0.208       532.191      10.988        63.472
Backward Elimination  11.890    2.586   0.215       0.223       538.702      20.867        53.298
Stepwise Selection    12.631    2.473   0.177       0.215       500.762      15.749        54.114
ELM                   4.918     0.968   0.392       0.450       39.625       18.768        16.066
SVM                   4.976     0.520   0.429       0.429       172.719      8.241         10.173
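The evaluation procedure described in Section 4.3 (repeated random 70/30 splits with an RBF-kernel support vector model) can be sketched as follows. This is a reconstruction, not the authors' code: the use of scikit-learn's SVR, the feature-scaling step, and the hyperparameter values are our assumptions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

def repeated_holdout_mmre(X, y, repeats=10, seed=0):
    """Average MMRE of an RBF-kernel SVR over repeated random 70/30 splits."""
    rng = np.random.RandomState(seed)
    mmres = []
    for _ in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.30, random_state=rng.randint(1 << 30))
        # Scaling matters for RBF kernels; C = 100 is illustrative, not tuned.
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        mmres.append(np.mean(np.abs(y_te - pred) / np.abs(y_te)))
    return float(np.mean(mmres))
```

In the paper the split was repeated 1000 times; `repeats` is kept small here purely for illustration, and in practice C, gamma and epsilon would be tuned on the training portion only.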

4.4.3 Further Discussion of Results

With the exception of SVM, whose values satisfy the stated criteria, particularly on the QUES dataset, none of the other prediction models presented comes close to satisfying any of the criteria for an accurate prediction model cited earlier. However, it has been reported that the prediction accuracy of software maintenance effort prediction models is often low, and that the criteria are therefore very difficult to satisfy (Lucia et al., 2005). This admittedly difficult task has been accomplished by the proposed SVM model, with results satisfying the established criteria in the case of the QUES dataset while coming close to the set criteria in the case of the UIMS dataset. In all the cases investigated, SVM performed better than all the earlier methods considered. Thus, we conclude that the SVM model presented in this paper can predict the maintainability of OO software systems reasonably well, to an acceptable degree. This conclusion rests on the fact that only the SVM model was able to consistently satisfy the criteria laid down in the literature, particularly on the QUES dataset; for UIMS, SVM performed better than all the other methods while coming close to satisfying the set criteria. For better visualization, the prediction accuracy measures for the QUES and UIMS datasets are depicted in Figures 1 and 2 respectively.

5.0 Conclusion

In this paper, an SVM-based object-oriented software maintainability prediction model has been constructed using the OO software metric data in the Li and Henry datasets (Li and Henry, 1993). The prediction accuracy of the model was evaluated and compared with the Bayesian network model, the regression tree model and the multiple

linear regression models, using popular prediction accuracy measures that include the absolute residuals, MRE and Pred measures. The results indicate that the SVM model is able to reliably model and predict the maintainability of OO software systems: it achieved significantly better prediction accuracy than the other models on the two datasets used. The SVM results satisfied the established criteria in the case of the QUES dataset while coming close to the set criteria in the case of the UIMS dataset, and in all the cases investigated SVM performed better than all the earlier methods considered. The results also indicate that the prediction accuracy of the SVM model may vary depending on the characteristics of the dataset and/or the prediction accuracy measure used, which suggests interesting directions for future studies.


Figure 1: Charts depicting the prediction accuracy for the QUES dataset
Figure 2: Charts depicting the prediction accuracy for the UIMS dataset


References

[1] K.K. Aggarwal, Y. Singh, A. Kaur, R. Malhotra, Application of Artificial Neural Network for Predicting Maintainability Using Object-Oriented Metrics, World Academy of Science, Engineering and Technology, 22 (2006).
[2] G. Finnie, G. Witting, AI Tools for Software Development Effort Estimation, in: International Conference on Software Engineering: Education and Practice, 1996.
[3] Y. Zhou, H. Leung, Predicting object-oriented software maintainability using multivariate adaptive regression splines, Journal of Systems and Software, 80 (2007) 1349-1361.
[4] Y. Singh, A. Kaur, R. Malhotra, Software Fault Proneness Prediction Using Support Vector Machines, in: Proceedings of the World Congress on Engineering, 2009.
[5] X. Wang, D. Bi, S. Wang, Fault recognition with labeled multi-category, in: Third Conference on Natural Computation, Haikou, China, 2007.
[6] C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2 (1998) 121-167.
[7] L. Zhao, N. Takagi, An application of support vector machines to Chinese character classification problem, in: IEEE International Conference on Systems, Man and Cybernetics, Montreal, 2007.
[8] C.W. Morris, A. Autret, L. Boddy, Support vector machines for identifying organisms: a comparison with strongly partitioned radial basis function networks, Ecological Modelling, 146 (2001) 57-67.
[9] K. El Emam, S. Benlarbi, N. Goel, S. Rai, A Validation of Object-Oriented Metrics, Technical Report ERB-1063, NRC, 1999.
[10] J. Hanley, B. McNeil, The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve, Radiology, 143 (1982) 29-36.
[11] S.O. Olatunji, Z. Rasheed, K.A. Sattar, A.M. AlMana, M. Alshayeb, E.A. El-Sebakhy, Extreme Learning Machine as Maintainability Prediction Model for Object-Oriented Software Systems, Journal of Computing, 2(8) (2010) 42-56.
[12] W. Li, S. Henry, Object-oriented metrics that predict maintainability, Journal of Systems and Software, 23 (1993) 111-122.
[13] S.A. Mahmoud, S.O. Olatunji, Automatic Recognition of Off-line Handwritten Arabic (Indian) Numerals Using Support Vector and Extreme Learning Machines, International Journal of Imaging, 2 (2009) 34-53.
[14] S.A. Mahmoud, S.O. Olatunji, Handwritten Arabic numerals recognition using multi-span features and Support Vector Machines, in: 10th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA), 2010, pp. 618-621.
[15] S.O. Olatunji, Comparison of Extreme Learning Machines and Support Vector Machines on Premium and Regular Gasoline Classification for Arson and Oil Spill Investigation, Asian Journal of Engineering, Sciences & Technology, 1 (2011) 1-7.
[16] K. Chen, M. Kurgan, L. Kurgan, Improved Prediction of Relative Solvent Accessibility Using Two-stage Support Vector Regression, in: The 1st International Conference on Bioinformatics and Biomedical Engineering (ICBBE 2007), 2007, pp. 37-40.
[17] C.-L. Huang, H.-C. Liao, M.-C. Chen, Prediction model building and feature selection with support vector machines in breast cancer diagnosis, Expert Systems with Applications, 34 (2008) 578-587.
[18] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning, 20 (1995) 273-297.
[19] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
S. O. Olatunji received the B.Sc. (Hons) Degree in Computer Science, Ondo State University (Now University of Ado Ekiti), Nigeria in 1999. He received M.Sc. Degree in Computer Science, University Of Ibadan, Nigeria in 2003. He worked as a Lecturer in Computer Science Department, Ondo State University, Akungba Akoko, Nigeria, from 2001 to 2006, from where he proceeded to obtain another M.Sc. Degree in Information and Computer Science, King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia in 2008. He is currently completing his PhD in Computer Science. He is a member of ACM and IEEE. He has published several research outcomes in reputable international journals and conferences. He participated in numerous research projects in KFUPM including those with Saudi ARAMCO oil and gas Company, making use of artificial intelligence based methods in solving real industrial problems. Hossain Arif completed BS in Computer Science from Rochester Institute of Technology (RIT) in June 2000. He graduated with Highest Honors and was awarded RIT Outstanding Undergraduate Scholarship. He is currently working as Lecturer in the School of Engineering and Computer Science at BRAC University where he has been teaching for over four years.

