Binghe Wang
Department of Chemistry and
Center for Biotechnology and Drug Design,
Georgia State University, Atlanta, GA 30302-4098, USA
E-mail: wang@gsu.edu
Reference to this paper should be made as follows: Jin, B., Zhang, Y-Q. and
Wang, B. (2007) ‘Granular Kernel Trees with parallel Genetic Algorithms for
drug activity comparisons’, Int. J. Data Mining and Bioinformatics, Vol. 1,
No. 3, pp.270–285.
1 Introduction
Kernel methods, specifically Support Vector Machines (SVMs) (Boser et al., 1992;
Cortes and Vapnik, 1995; Shawe-Taylor and Cristianini, 2004; Vapnik, 1998), have been
widely used in many fields such as bioinformatics (Schölkopf et al., 2004) and chemical
informatics (Burbidge et al., 2001; Weston et al., 2003) for data classification and pattern
recognition. With the help of a kernel's nonlinear mapping, input data are transformed into
a high-dimensional feature space where it is 'easy' for SVMs to find a hyperplane that
separates the data. SVM performance is mainly determined by the kernel function.
Traditional kernels, such as RBF kernels and polynomial kernels, do not take into
account the relationships and structure within each data item but simply treat each
data vector as a single unit. With the growing interest in biological and chemical data
prediction, such as structure-property based molecule comparison, protein structure
prediction and long DNA sequence comparison, more complicated kernels based on the
kernel decomposition concept have been designed to integrate data structures, such as
string kernels (Cristianini and Shawe-Taylor, 1999; Lodhi et al., 2001), tree kernels
(Collins and Duffy, 2002; Kashima and Koyanagi, 2002) and graph kernels
(Gärtner et al., 2003; Kashima and Inokuchi, 2002). For a detailed review, see
Gärtner (2003). One common characteristic of these kernels is that feature
transformations are implemented according to object structures without an explicit
input feature generation step. Many of them implement inner product operations
directly through some kind of iterative calculation. These transformations are very
efficient when objects contain a large amount of structured information. For many
challenging problems, however, objects are not structured, or some relationships within
objects are not easy to describe directly. Furthermore, substantial optimisation is needed
once kernel functions are defined. It should be mentioned that Haussler (1999) first
introduced decomposition-based kernel design in detail and proposed convolution kernels.
In this paper, we use granular computing concepts to redescribe decomposition-based
kernel design and propose an evolutionary hierarchical approach that integrates prior
knowledge, such as data structures and feature relationships, into kernel design.
Features within an input vector are grouped into feature granules according to the
composition and structure of each data item. Each feature granule captures a particular
aspect of the data items. For two input vectors, the similarity between a pair of feature
granules is measured by a kernel function called a granular kernel. Granular kernels
for different kinds of feature granules are fused together by hierarchical trees, called
Granular Kernel Trees (GKTs). Parallel Genetic Algorithms (GAs) are used to optimise
GKTs, yielding Evolutionary GKTs (EGKTs), and to select an effective SVM model.
In applications, SVMs with the new kernel trees are employed for comparisons of
drug activities, a problem in Quantitative Structure-Activity Relationship
(QSAR) analysis. QSAR is an important technique in drug design that describes
the relationship between compound structures and their activities. In QSAR analysis,
compounds with different activities are discriminated, and predictive rules are then
constructed. In this study, inhibitors of E. coli dihydrofolate reductase (DHFR) are
analysed. These inhibitors are potential therapeutic agents for the treatment of malaria,
bacterial infections, toxoplasmosis, and cancer. Experimental results show that SVMs with
both GKTs and EGKTs achieve much better prediction accuracy than SVMs with the
traditional RBF kernels.
The rest of the paper is organised as follows. Granular kernels, kernel tree design and
evolutionary optimisation are presented in Section 2. Section 3 describes the experiments
on drug activity comparisons. Finally, Section 4 gives conclusions and directions for
future work.
2.1 Definitions
Definition 1 (Cristianini and Shawe-Taylor, 1999): A kernel is a function K that for all
x, z ∈ X satisfies

K(x, z) = ⟨φ(x), φ(z)⟩, (1)

where φ is a mapping from the input space X to a feature space equipped with the inner
product ⟨·, ·⟩.
Property 1: Granular kernels inherit the closure properties of traditional kernels, such as
closure under sum, product, and multiplication by a positive constant, over the granular
feature spaces.

Let G be a feature granule space and g, g′ ∈ G. Let gK1 and gK2 be two granular kernels
operating over the same space G × G. The following gK(g, g′) are also granular kernels:

gK(g, g′) = c × gK1(g, g′), c ∈ R+ (5)
gK(g, g′) = gK1(g, g′) + c, c ∈ R+ (6)
gK(g, g′) = gK1(g, g′) + gK2(g, g′) (7)
gK(g, g′) = gK1(g, g′) × gK2(g, g′) (8)
gK(g, g′) = f(g)f(g′), f : G → R (9)
gK(g, g′) = gK1(g, g′) / √(gK1(g, g) × gK1(g′, g′)). (10)

These properties follow directly from the traditional kernel properties.
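As a numerical sanity check, the closure properties in equations (5)-(8) can be verified on sample Gram matrices: any combination of valid kernels under positive scaling, constant shift, sum, or elementwise product should again be positive semidefinite. The sketch below uses RBF and polynomial base kernels as stand-ins for granular kernels; the helper names (`gram`, `is_psd`) are illustrative, not from the paper.

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    # base granular kernel over one feature granule
    return np.exp(-gamma * np.sum((x - z) ** 2))

def poly(x, z):
    # a second, structurally different granular kernel
    return (x @ z + 1.0) ** 2

def gram(k, X):
    # Gram matrix of kernel k over the rows of X
    return np.array([[k(a, b) for b in X] for a in X])

def is_psd(K, tol=1e-8):
    # a matrix is a valid Gram matrix iff it is symmetric positive semidefinite
    return bool(np.all(np.linalg.eigvalsh((K + K.T) / 2) >= -tol))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))   # 20 sample granules from a 4-d granule space

K1, K2 = gram(rbf, X), gram(poly, X)

# closure under positive scaling (5), constant shift (6), sum (7), product (8)
assert is_psd(3.0 * K1)
assert is_psd(K1 + 0.5 * np.ones_like(K1))   # adding c > 0 to every entry
assert is_psd(K1 + K2)
assert is_psd(K1 * K2)                       # elementwise (Schur) product
```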
Property 2 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two
granular kernels defined over different granular feature spaces under the sum operation.

To prove it, let gK1(g1, g1′) and gK2(g2, g2′) be two granular kernels, where
g1, g1′ ∈ G1, g2, g2′ ∈ G2 and G1 ≠ G2. We may define new kernels as follows:

gK((g1, g2), (g1′, g2′)) = gK1(g1, g1′)
gK′((g1, g2), (g1′, g2′)) = gK2(g2, g2′)

gK and gK′ operate over the same feature space (G1 × G2) × (G1 × G2). We get

gK1(g1, g1′) + gK2(g2, g2′) = gK((g1, g2), (g1′, g2′)) + gK′((g1, g2), (g1′, g2′)).

According to the sum closure property of kernels (Cristianini and Shawe-Taylor, 1999),
gK1(g1, g1′) + gK2(g2, g2′) is a kernel over (G1 × G2) × (G1 × G2).
Property 3 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two
granular kernels defined over different granular feature spaces under the product operation.

To prove it, let gK1(g1, g1′) and gK2(g2, g2′) be two granular kernels, where
g1, g1′ ∈ G1, g2, g2′ ∈ G2 and G1 ≠ G2. We may define new kernels as follows:

gK((g1, g2), (g1′, g2′)) = gK1(g1, g1′)
gK′((g1, g2), (g1′, g2′)) = gK2(g2, g2′).

So gK and gK′ operate over the same feature space (G1 × G2) × (G1 × G2). We get

gK1(g1, g1′) × gK2(g2, g2′) = gK((g1, g2), (g1′, g2′)) × gK′((g1, g2), (g1′, g2′)).

According to the product closure property of kernels, gK1(g1, g1′) × gK2(g2, g2′) is a
kernel over (G1 × G2) × (G1 × G2).
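Properties 2 and 3 can also be checked numerically: lift two kernels defined on different granule spaces to the concatenated space and verify that their sum and product still produce positive semidefinite Gram matrices. The granule split below (first three features in G1, last two in G2) is an illustrative layout, not the paper's.

```python
import numpy as np

def split(x):
    # hypothetical granule split: x = (g1, g2) with g1 in G1, g2 in G2
    return x[:3], x[3:]

def gk1(g, gp):
    # granular kernel on G1 (RBF form)
    return float(np.exp(-0.5 * np.sum((g - gp) ** 2)))

def gk2(g, gp):
    # granular kernel on G2 (polynomial form)
    return float((g @ gp + 1.0) ** 2)

def sum_kernel(x, z):
    # Property 2: gK1 + gK2 lifted to (G1 x G2) x (G1 x G2)
    (g1, g2), (h1, h2) = split(x), split(z)
    return gk1(g1, h1) + gk2(g2, h2)

def product_kernel(x, z):
    # Property 3: gK1 * gK2 lifted to (G1 x G2) x (G1 x G2)
    (g1, g2), (h1, h2) = split(x), split(z)
    return gk1(g1, h1) * gk2(g2, h2)

X = np.random.default_rng(1).normal(size=(15, 5))
for k in (sum_kernel, product_kernel):
    K = np.array([[k(a, b) for b in X] for a in X])
    # valid kernel on the product space: symmetric PSD Gram matrix
    assert np.all(np.linalg.eigvalsh((K + K.T) / 2) >= -1e-8)
```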
Fitness: There are several methods to evaluate SVM performance. One is k-fold
cross-validation, a popular technique for performance evaluation. Others evaluate
theoretical bounds on the generalisation error, such as the Xi-Alpha bound
(Joachims, 2000), the VC bound (Vapnik, 1998), the radius-margin bound and the VC span
bound (Vapnik and Chapelle, 2000). A detailed review can be found in Duan et al. (2003).
In this paper we use k-fold cross-validation to evaluate SVM performance in the training
phase. In k-fold cross-validation, the training data set S is separated into k mutually
exclusive subsets Sv. For v = 1, …, k, the data set Λv = S \ Sv is used to train SVMs
with GKTs(cij) and Sv is used to evaluate the SVM model.
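The fold construction above can be sketched directly: partition S into k mutually exclusive subsets Sv and, for each v, train on Λv = S \ Sv and validate on Sv. The helper below is an illustrative sketch; only the names Λv and Sv come from the paper.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    # split indices {0, ..., n-1} into k mutually exclusive subsets S_v;
    # for each fold v, Lambda_v = S \ S_v trains the model and S_v evaluates it
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for v in range(k):
        S_v = folds[v]
        Lambda_v = np.concatenate([folds[u] for u in range(k) if u != v])
        yield Lambda_v, S_v

n, k = 23, 5
validated = []
for Lambda_v, S_v in kfold_indices(n, k):
    assert set(Lambda_v).isdisjoint(S_v)       # mutually exclusive
    assert len(Lambda_v) + len(S_v) == n       # together they cover S
    validated.extend(S_v.tolist())
assert sorted(validated) == list(range(n))     # each point is validated exactly once
```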
Selection: In the algorithm, the roulette wheel method described in Michalewicz (1996)
is used to select individuals for the new population.
Crossover: Two chromosomes are first selected randomly from the current generation as
parents, and then a crossover point is randomly selected to separate the chromosomes.
Parts of the chromosomes are exchanged between the parents to generate two children.
Mutation: Some chromosomes are randomly selected and some genes are randomly
chosen from each selected chromosome for mutation. The values of mutated genes are
replaced by random values.
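The three operators above can be sketched compactly. This is a minimal, generic GA sketch, not the paper's exact implementation; the chromosome layout and fitness values are placeholders.

```python
import random

random.seed(7)

def roulette_select(population, fitnesses):
    # roulette wheel: pick with probability proportional to (positive) fitness
    r = random.uniform(0.0, sum(fitnesses))
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return chrom
    return population[-1]

def crossover(p1, p2):
    # one-point crossover: cut both parents at the same point and swap tails
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chrom, rate=0.1, lo=0.0, hi=1.0):
    # replace each selected gene with a fresh random value in [lo, hi]
    return [random.uniform(lo, hi) if random.random() < rate else g
            for g in chrom]

pop = [[random.uniform(0.0, 1.0) for _ in range(6)] for _ in range(4)]
fits = [0.62, 0.91, 0.74, 0.55]                    # e.g. CV accuracies as fitness
parent = roulette_select(pop, fits)
assert parent in pop
c1, c2 = crossover(pop[0], pop[1])
assert sorted(c1 + c2) == sorted(pop[0] + pop[1])  # genes exchanged, none lost
assert len(mutate(pop[2])) == 6
```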
population. Second, the implementation is easy, clear, practical, and especially suitable
for SVMs model selection. Third, the system can be easily moved to the large distributed
computing environment, such as the grid-computing system.
QP decomposition based parallel computing can also speed up SVM model selection in
a distributed system, but if the training data set is large, the communication costs for
transferring sub-QP intermediate results will be very high. On the other hand, in SVM
model selection, each SVM model spends most of its time on QP calculation, which
generally runs orders of magnitude longer than the operations in GAs. In the
master-slave parallel GAs-SVMs system, only parameters and fitness values need
to be transferred between the master and the slaves, so the communication costs are low.
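The master-slave pattern described above can be sketched as follows: the master farms parameter sets out to workers and collects only scalar fitness values back. Here threads stand in for slave nodes, and the fitness function is a cheap hypothetical surrogate for training an SVM with GKT parameters; in the real system each evaluation would run a full k-fold CV training.

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(params):
    # stand-in for the slave-side work: the real system trains an SVM with
    # the GKT parameters in `params` and returns its k-fold CV accuracy;
    # this cheap surrogate (peaked at C=10, gamma=0.5) keeps the wiring runnable
    c, gamma = params
    return 1.0 / (1.0 + (c - 10.0) ** 2 + (gamma - 0.5) ** 2)

population = [(1.0, 0.1), (10.0, 0.5), (50.0, 2.0), (5.0, 0.3)]

# master: ship only parameters out and collect only fitness values back,
# so per-individual communication stays constant-size
with ThreadPoolExecutor(max_workers=2) as pool:
    fits = list(pool.map(fitness, population))

best = population[max(range(len(fits)), key=fits.__getitem__)]
assert best == (10.0, 0.5)
```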
Figure 3 shows an example of running time and speedup with parallel GAs on a cluster
system. The cluster is a shared-disk, distributed-memory platform. In the example, the
dataset size is 314, RBF is chosen as the kernel function, the population size is set to
300 and the number of generations to 50. The speedup reaches about 10 with 14 nodes,
where each node is one processor. The system architecture of SVMs with EGKTs is shown
in Figure 4. In practice, the regularisation parameter C of the SVMs is also optimised
by the parallel GAs.
Figure 3 An example of running time and speedup with parallel GAs: (a) running time
and (b) speedup
3 Experiments
Since RBF kernels (equation (13)) usually perform best among traditional kernels,
we compare GKTs and EGKTs with RBF kernels. To make a fair comparison with EGKTs,
traditional RBF kernels are also optimised using GAs. Here we use E-RBF to denote
the GA-optimised RBF kernels.

K(x, z) = exp(−γ ||x − z||²). (13)
The pyrimidines dataset is randomly shuffled and split into two parts in the proportion
4:1. One part is used as the training set, which contains pairs formed from 44 compounds.
The other part is the unseen testing set, which contains pairs among the remaining 11
compounds and pairs between those compounds and the training compounds. The size of the
training set is therefore 44 × 43 = 1892 and the size of the testing set is
44 × 11 × 2 + 11 × 10 = 1078. Because some pairs with identical activities are deleted,
the actual data sets are slightly smaller than these figures.
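The pair counts above can be checked with a short calculation: ordered pairs within the 44 training compounds, ordered pairs within the 11 held-out compounds, and cross pairs in both directions; together they account for every ordered pair of the 55 compounds.

```python
# pair-set sizes implied by the 4:1 pyrimidine split:
# 44 training compounds, 11 held-out compounds
n_train, n_held = 44, 11

train_pairs = n_train * (n_train - 1)    # ordered pairs within the training set
test_pairs = (n_train * n_held * 2       # held-out x training pairs, both orders
              + n_held * (n_held - 1))   # ordered pairs within the held-out set

assert train_pairs == 1892
assert test_pairs == 1078
# sanity check: every ordered pair of the 55 compounds is counted exactly once
assert train_pairs + test_pairs == 55 * 54
```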
The structure of triazines is described in Figure 5(b). In the triazines dataset, each
compound has six possible substitution positions: the positions R3 and R4; if the
substituent at R3 itself contains a ring, the R3 and R4 positions of that ring; and
similarly, if the substituent at R4 itself contains a ring, the R3 and R4 positions of
that ring. Ten features are used to characterise each position: a structure branching
feature and nine further features, the same as those used for each substituent of the
pyrimidines. If no substituent occupies a possible position, its features are set to
ten −1s. So each vector has 120 features. We randomly select 60 drugs from the
triazines dataset and then randomly shuffle and split them into two parts in the
proportion 4:1, based on the drugs in pairs.
We design two kinds of GKTs for each dataset, shown in Figure 7. GKTs-1 and
GKTs-2 are used for the pyrimidines; GKTs-3 and GKTs-4 are used for the triazines.
GKTs-1 and GKTs-3 are two-layer kernel trees in which each granular kernel's
importance is controlled by its outgoing connection weight. GKTs-2 and GKTs-4 are
three-layer kernel trees in which each drug of a pair is represented by a two-layer
subtree; the two subtrees are combined by a product operation at the top of the tree.
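A two-layer GKT of the GKTs-1/GKTs-3 kind can be sketched as a weighted combination of granular kernels, one per feature granule. The granule boundaries, weights and RBF widths below are illustrative placeholders standing in for the GA-tuned values, not the paper's configuration.

```python
import numpy as np

# two-layer GKT sketch: one RBF granular kernel per feature granule,
# fused by outgoing connection weights
granules = [slice(0, 9), slice(9, 18)]   # e.g. one granule per substituent position
weights = [0.7, 0.3]                     # connection weights, tuned by GAs in the paper
gammas = [0.5, 0.2]                      # per-granule RBF widths, also GA-tuned

def gkt(x, z):
    # weighted sum of granular RBF kernels; a sum of valid kernels
    # scaled by positive constants is itself a valid kernel (Property 1)
    return sum(w * np.exp(-g * np.sum((x[sl] - z[sl]) ** 2))
               for sl, w, g in zip(granules, weights, gammas))

x = np.ones(18)
assert abs(gkt(x, x) - sum(weights)) < 1e-12   # each granular kernel equals 1 at x == z
```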
Figure 7 Granular Kernel Trees: (a)–(b) GKTs for pyrimidines and (c)–(d) GKTs for triazines
(c) GKTs-3
(d) GKTs-4
The performance of the three GA-based kernels on the triazines dataset is shown in
Table 2. On testing accuracy, SVMs with EGKTs-3 (evolutionary GKTs-3) and EGKTs-4
(evolutionary GKTs-4) outperform SVMs with E-RBF by 3.7% and 4.9%,
respectively. We find that the training accuracies are much higher than both the testing
accuracies and the fitness values for all three kernels on both datasets, especially on
the triazines dataset. The reason could be that the data are complicated and SVMs easily
overfit the training dataset.
The comparisons between RBF kernels and GKTs are made using a large number of
kernel parameter samples. We randomly generate 2000 C values from [1, 256] for SVMs
and 2000 groups of kernel parameters for each kernel. SVMs are trained and tested with
these random parameters. For each dataset, the prediction accuracy curves of the three
kernels are drawn in one figure (Figures 8 and 9), each ordered by C value. From
Figures 8 and 9 it is easy to see that the performances of GKTs are better than those
of RBF kernels. Quartiles and the mean are also used to summarise each kernel's
testing accuracy; the results are listed in Tables 3 and 4. Based on the differences
in Q1 (25th percentile), Q2 (median), Q3 (75th percentile) and mean values, we conclude
that the two GKTs outperform RBF kernels by about 2.3-3.4% on pyrimidines and
3.6-4.5% on triazines.
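The quartile summary used above reduces each kernel's 2000 accuracy samples to a small set of robust statistics. A minimal sketch with numpy (the accuracy values below are illustrative, not those of Tables 3 and 4):

```python
import numpy as np

# testing accuracies over random parameter samples (illustrative numbers)
acc = np.array([0.61, 0.64, 0.66, 0.68, 0.70, 0.71, 0.73, 0.75])

# Q1 (25th percentile), Q2 (median) and Q3 (75th percentile) summarise the
# distribution of accuracies; comparing these between kernels is more robust
# than comparing single best runs
q1, q2, q3 = np.percentile(acc, [25, 50, 75])

assert q1 <= q2 <= q3
assert abs(q2 - np.median(acc)) < 1e-12
```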
Table 3 Testing accuracies on pyrimidines dataset with 2000 groups of random parameters
Table 4 Testing accuracies on triazines dataset with 2000 groups of random parameters
We can see that almost all testing accuracies of EGKTs in Tables 1 and 2 are better than
the maximum testing accuracies of the RBF kernels in Tables 3 and 4. We can also see
that the testing accuracies of the GA-based kernel methods stabilise at around the Q3 level.
4 Conclusions
This paper has proposed an approach to constructing GKTs based on the granular kernel
concept and its properties. The experimental results have shown that GKTs and EGKTs
perform better than traditional RBF kernels in drug activity comparisons.
It is promising to construct more powerful and suitable kernels with this kind of
evolutionary hierarchical kernel design. In the future, we will continue our research on
evolutionary granular kernel tree design for other problems. One open issue is how to
generate feature granules when the relationships among features are complex.
Acknowledgements
This work is supported in part by NIH under P20 GM065762. Bo Jin is supported by the
Molecular Basis for Disease (MBD) Doctoral Fellowship Program.
References
Adamidis, P. (1994) ‘Review of parallel genetic algorithms bibliography’, Internal T.R., Aristotle
University of Thessaloniki, Greece.
Berg, C., Christensen, J.P.R. and Ressel, P. (1984) Harmonic Analysis on Semigroups: Theory
of Positive Definite and Related Functions, Springer-Verlag, New York, USA.
Boser, B., Guyon, I. and Vapnik, V.N. (1992) ‘A training algorithm for optimal margin classifiers’,
Proc. Fifth Annual Workshop on Computational Learning Theory, ACM Press, USA,
pp.144–152.
Burbidge, R., Trotter, M., Buxton, B. and Holden, S. (2001) ‘Drug design by machine learning:
support vector machines for pharmaceutical data analysis’, Computers and Chemistry,
Vol. 26, No. 1, pp.4–15.
Cantú-Paz, E. (1998) ‘A survey of parallel genetic algorithms’, Calculateurs Paralleles, Hermes,
Paris, Vol. 10, No. 2, pp.141–171.
Chang, C-C. and Lin, C-J. (2001) LIBSVM: A Library for Support Vector Machines, Software
available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Collins, M. and Duffy, N. (2002) ‘Convolution kernels for natural language’, in Dietterich, T.G.,
Becker, S. and Ghahramani, Z. (Eds.): Advances in Neural Information Processing Systems,
MIT Press, Cambridge, MA, Vol. 14, pp.625–632.
Cortes, C. and Vapnik, V.N. (1995) 'Support-vector networks', Machine Learning, Vol. 20,
pp.273–297.
Cristianini, N. and Shawe-Taylor, J. (1999) An Introduction to Support Vector Machines:
And other Kernel-based Learning Methods, Cambridge University Press, NY.
Dong, J.X., Krzyzak, A. and Suen, C.Y. (2003) ‘A fast parallel optimization for training support
vector machine’, in Perner, P. and Rosenfeld, A. (Eds.): Proceedings of 3rd International
Conference on Machine Learning and Data Mining, Springer Lecture Notes in Artificial
Intelligence (LNAI 2734), Leipzig, Germany, pp.96–105.
Duan, K., Keerthi, S.S. and Poo, A.N. (2003) ‘Evaluation of simple performance measures for
tuning SVM hyperparameters’, Neurocomputing, Vol. 51, pp.41–59.
Gärtner, T. (2003) 'A survey of kernels for structured data', ACM SIGKDD Explorations
Newsletter, Vol. 5, pp.49–58.
Gärtner, T., Flach, P.A. and Wrobel, S. (2003) 'On graph kernels: hardness results and efficient
alternatives’, Proceedings of the 16th Annual Conference on Computational Learning Theory
and the 7th Kernel Workshop.
Graf, H-P., Cosatto, E., Bottou, L., Dourdanovic, I. and Vapnik, V.N. (2005) ‘Parallel support
vector machines: the cascade SVM’, in Saul, L., Weiss, Y. and Bottou, L. (Eds.): Advances in
Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 17,
pp.513–520.
Haussler, D. (1999) ‘Convolution kernels on discrete structures’, Technical report
UCSC-CRL-99-10, Department of Computer Science, University of California at Santa Cruz.
Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994a) ‘Quantitative structure-activity relationships
by neural networks and inductive logic programming. I. The inhibition of dihydrofolate
reductase by pyrimidines’, Journal of Computer-Aided Molecular Design, Vol. 8, No. 4,
pp.405–420.
Granular Kernel Trees with parallel Genetic Algorithms 285
Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994b) ‘Quantitative structure-activity relationships
by neural networks and inductive logic programming. II. The inhibition of dihydrofolate
reductase by triazines’, Journal of Computer-Aided Molecular Design, Vol. 8, No. 4,
pp.421–432.
Joachims, T. (2000) ‘Estimating the generalization performance of a SVM efficiently’, Proceedings
of the International Conference on Machine Learning, Morgan Kaufmann.
Kashima, H. and Inokuchi, A. (2002) ‘Kernels for graph classification’, Proc. 1st ICDM Workshop
on Active Mining (AM-2002), Maebashi, Japan.
Kashima, H. and Koyanagi, T. (2002) ‘Kernels for semi-structured data’, Proceedings of the
Nineteenth International Conference on Machine Learning, pp.291–298.
Lin, S-H., Goodman, E.D. and Punch III, W.F. (1997) ‘Investigating parallel genetic algorithms on
job shop scheduling problem’, Proceedings of the 6th International Conference on
Evolutionary Programming VI.
Lodhi, H., Shawe-Taylor, J., Cristianini, N. and Watkins, C. (2001) 'Text classification using
string kernels’, in Leen, T., Dietterich, T. and Tresp, V. (Eds.): Advances in Neural
Information Processing Systems, MIT Press, Cambridge, MA, Vol. 13, pp.563–569.
Michalewicz, Z. (1996) Genetic Algorithms + Data Structures = Evolution Programs,
Springer-Verlag, Berlin.
Newman, D.J., Hettich, S., Blake, C.L. and Merz, C.J. (1998) UCI Repository of Machine Learning
Databases, [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California,
Department of Information and Computer Science, Irvine, CA.
Runarsson, T.P. and Sigurdsson, S. (2004) ‘Asynchronous parallel evolutionary model selection for
support vector machines’, Neural Information Processing – Letters and Reviews, Vol. 3, No. 3
pp.59–67.
Schölkopf, B., Tsuda, K. and Vert, J-P. (2004) Kernel Methods in Computational Biology,
MIT Press, Cambridge, MA.
Serafini, T., Zanni, L. and Zanghirati, G. (2004) Parallel GPDT: A Parallel Gradient Projection-
based Decomposition Technique for Support Vector Machines, http://www.dm.unife.it/gpdt.
Shawe-Taylor, J. and Cristianini, N. (2004) Kernel Methods for Pattern Analysis, Cambridge
University Press, Cambridge, UK.
Vapnik, V.N. (1998) Statistical Learning Theory, John Wiley and Sons, New York.
Vapnik, V.N. and Chapelle, O. (2000) ‘Bounds on error expectation for support vector machine’,
in Smola, A., Bartlett, P., Schölkopf, B. and Schuurmans, D. (Eds.): Advances in Large
Margin Classifiers, MIT Press, Cambridge, MA, pp.261–280.
Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A. and Schölkopf, B. (2003)
‘Feature selection and transduction for prediction of molecular bioactivity for drug design’,
Bioinformatics, Vol. 19, No. 6, pp.764–771.
Zanghirati, G. and Zanni, L. (2003) ‘Parallel solver for large quadratic programs in training support
vector machines’, Parallel Computing, Vol. 29, pp.535–551.