
Granular Kernel Trees with parallel Genetic Algorithms for drug activity comparisons

Bo Jin* and Yan-Qing Zhang
Department of Computer Science,
Georgia State University, Atlanta, GA 30302, USA
E-mail: cscbxjx@cs.gsu.edu
E-mail: yzhang@cs.gsu.edu
*Corresponding author

Binghe Wang
Department of Chemistry and
Center for Biotechnology and Drug Design,
Georgia State University, Atlanta, GA 30302-4098, USA
E-mail: wang@gsu.edu

Abstract: With the growing interest in biological and chemical data prediction, more powerful and flexible kernels need to be designed so that prior knowledge and the relationships within data can be expressed effectively in kernel functions. In this paper, Granular Kernel Trees (GKTs) are proposed, and parallel Genetic Algorithms (GAs) are used to optimise the parameters of GKTs. In applications, SVMs with the new kernel trees are employed for drug activity comparisons. The experimental results show that GKTs and evolutionary GKTs achieve better performance than traditional RBF kernels in terms of prediction accuracy.

Keywords: kernel design; Support Vector Machines; SVMs; Granular Kernel Trees; GKTs; Genetic Algorithms; GAs; drug activity comparisons; data mining; bioinformatics.

Reference to this paper should be made as follows: Jin, B., Zhang, Y-Q. and
Wang, B. (2007) ‘Granular Kernel Trees with parallel Genetic Algorithms for
drug activity comparisons’, Int. J. Data Mining and Bioinformatics, Vol. 1,
No. 3, pp.270–285.

Biographical notes: Bo Jin is a PhD student in the Computer Science Department at Georgia State University. He received his BE Degree from the University of Electronic Science and Technology of China. His research interests are in the areas of machine learning, data mining, chemical informatics and biomedical informatics.

Yan-Qing Zhang is currently an Associate Professor in the Computer Science Department at Georgia State University. He received a PhD Degree in Computer Science and Engineering from the University of South Florida in 1997. His research interests include hybrid intelligent systems, computational intelligence, granular computing, kernel machines, bioinformatics, data mining and computational web intelligence. He has published 3 books, 12 book chapters, 49 journal papers and over 100 conference papers. He has served as a reviewer for 37 international journals and as a committee member in over 70 international conferences. He is a program co-chair of IEEE-GrC2006.


Binghe Wang is Professor of Chemistry at Georgia State University, Georgia Research Alliance Eminent Scholar in Drug Discovery, and Georgia Cancer Coalition Distinguished Cancer Scientist. He obtained his BS Degree from Beijing Medical College in 1982 and his PhD Degree in Medicinal Chemistry from the University of Kansas School of Pharmacy in 1991. He is Editor-in-Chief of Medicinal Research Reviews, published by John Wiley and Sons, and the Series Editor of 'A Wiley Series in Drug Discovery and Development'. His research expertise includes drug delivery, drug design and synthesis, bioorganic chemistry, fluorescent sensors, and combinatorial chemistry.

1 Introduction

Kernel methods, specifically Support Vector Machines (SVMs) (Boser et al., 1992; Cortes and Vapnik, 1995; Shawe-Taylor and Cristianini, 2004; Vapnik, 1998), have been widely used in many fields, such as bioinformatics (Schölkopf et al., 2004) and chemical informatics (Burbidge et al., 2001; Weston et al., 2003), for data classification and pattern recognition. With the help of a kernel's nonlinear mapping, input data are transformed into a high-dimensional feature space where it is 'easy' for SVMs to find a hyperplane to separate the data. SVM performance is mainly affected by the kernel function, yet traditional kernels, such as RBF kernels and polynomial kernels, do not take into consideration the relationships and structure within each data item but simply treat each data vector as one unit in operations. With the growing interest in biological and chemical data prediction, such as structure-property based molecule comparison, protein structure prediction and long DNA sequence comparison, more sophisticated kernels have been designed to integrate data structures, such as string kernels (Cristianini and Shawe-Taylor, 1999; Lodhi et al., 2001), tree kernels (Collins and Duffy, 2002; Kashima and Koyanagi, 2002) and graph kernels (Gärtner et al., 2003; Kashima and Inokuchi, 2002), all based on the kernel decomposition concept. For a detailed review, see Gärtner (2003). One common characteristic of these kernels is that feature transformations are implemented according to object structures, without a separate step of input feature generation. Many of them directly implement inner product operations with some kind of iterative calculation. These transformations are very efficient when objects include large amounts of structured information. However, for many challenging problems, objects are not structured, or some relationships within objects are not easily described directly. Furthermore, substantial optimisation is needed once kernel functions are defined. It should be mentioned that Haussler (1999) first introduced decomposition-based kernel design in detail and proposed convolution kernels.
In this paper, we use granular computing concepts to redescribe decomposition-based kernel design and propose an evolutionary hierarchical approach to integrate prior knowledge, such as data structures and feature relationships, into kernel design. Features within an input vector are grouped into feature granules according to the composition and structure of each data item. Each feature granule captures a particular aspect of the data items. For two input vectors, the similarity between a pair of feature granules is measured by a kernel function called a granular kernel. Granular kernels for different kinds of feature granules are fused together by hierarchical trees, called GKTs. Parallel GAs are used to optimise GKTs and select an effective SVM model.

In applications, SVMs with the new kernel trees are employed for the comparison of drug activities, a problem in Quantitative Structure-Activity Relationship (QSAR) analysis. QSAR is an important technique used in drug design, which describes the relationships between compound structures and their activities. In QSAR analysis, compounds with different activities are discriminated, and then predictive rules are constructed. In this study, inhibitors of E. coli dihydrofolate reductase (DHFR) are analysed. These inhibitors are potential therapeutic agents for the treatment of malaria, bacterial infection, toxoplasmosis, and cancer. Experimental results show that SVMs with both GKTs and EGKTs achieve much better performance than SVMs with traditional RBF kernels in terms of prediction accuracy.

The rest of the paper is organised as follows. Granular kernels, kernel tree design and evolutionary optimisation are proposed in Section 2. Section 3 describes the experiments on drug activity comparisons. Finally, Section 4 gives conclusions and outlines future work.

2 Granular Kernel and Kernel tree design

2.1 Definitions

Definition 1 (Cristianini and Shawe-Taylor, 1999): A kernel is a function $K$ that for all $\vec{x}, \vec{z} \in X$ satisfies

$$K(\vec{x}, \vec{z}) = \langle \phi(\vec{x}), \phi(\vec{z}) \rangle \qquad (1)$$

where $\phi$ is a mapping from the input space $X = R^n$ to an inner product feature space $F = R^N$:

$$\phi: \vec{x} \mapsto \phi(\vec{x}) \in F. \qquad (2)$$

Definition 2: A feature granule space $G$ of the input space $X = R^n$ is a subspace of $X$, where $G = R^m$ and $1 \le m \le n$.

From the input space we may generate many feature granule spaces, and some of them may overlap on some feature dimensions.
Definition 3: A feature granule $\vec{g} \in G$ is a vector defined in the feature granule space $G$.

Definition 4: A granular kernel $gK$ is a kernel that for all $\vec{g}, \vec{g}' \in G$ satisfies

$$gK(\vec{g}, \vec{g}') = \langle \varphi(\vec{g}), \varphi(\vec{g}') \rangle \qquad (3)$$

where $\varphi$ is a mapping from the feature granule space $G = R^m$ to an inner product feature space $R^E$:

$$\varphi: \vec{g} \mapsto \varphi(\vec{g}) \in R^E. \qquad (4)$$
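To make the definitions concrete, here is a minimal Python sketch (not from the paper) of a granular kernel realised by applying a standard RBF kernel to a pair of feature granules; the function name and default γ are illustrative assumptions:

```python
import numpy as np

def rbf_granular_kernel(g, g_prime, gamma=0.5):
    """A granular kernel gK over one feature granule space G = R^m,
    realised here with an RBF form; gamma is a free parameter."""
    g, g_prime = np.asarray(g, float), np.asarray(g_prime, float)
    diff = g - g_prime
    return np.exp(-gamma * diff.dot(diff))
```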

2.2 Granular Kernel properties

Property 1: Granular kernels inherit the properties of traditional kernels, such as closure under sum, product, and multiplication by a positive constant over the granular feature spaces.

Let $G$ be a feature granule space and $\vec{g}, \vec{g}' \in G$. Let $gK_1$ and $gK_2$ be two granular kernels operating over the same space $G \times G$. The following $gK(\vec{g}, \vec{g}')$ are also granular kernels:

$$gK(\vec{g}, \vec{g}') = c \times gK_1(\vec{g}, \vec{g}'), \quad c \in R^+ \qquad (5)$$

$$gK(\vec{g}, \vec{g}') = gK_1(\vec{g}, \vec{g}') + c, \quad c \in R^+ \qquad (6)$$

$$gK(\vec{g}, \vec{g}') = gK_1(\vec{g}, \vec{g}') + gK_2(\vec{g}, \vec{g}') \qquad (7)$$

$$gK(\vec{g}, \vec{g}') = gK_1(\vec{g}, \vec{g}') \times gK_2(\vec{g}, \vec{g}') \qquad (8)$$

$$gK(\vec{g}, \vec{g}') = f(\vec{g})\, f(\vec{g}'), \quad f: X \to R \qquad (9)$$

$$gK(\vec{g}, \vec{g}') = \frac{gK_1(\vec{g}, \vec{g}')}{\sqrt{gK_1(\vec{g}, \vec{g}) \times gK_1(\vec{g}', \vec{g}')}}. \qquad (10)$$

These properties can be derived directly from the traditional kernel properties.
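As an illustration of these closure properties, the following Python sketch builds new granular kernels from existing ones as higher-order combinators; the helper names are assumptions, and the square root in the normalisation follows equation (10) as reconstructed above:

```python
import math

def scale(k, c):
    """(5): multiplication by a positive constant c."""
    return lambda g, gp: c * k(g, gp)

def ksum(k1, k2):
    """(7): closure under sum."""
    return lambda g, gp: k1(g, gp) + k2(g, gp)

def kprod(k1, k2):
    """(8): closure under product."""
    return lambda g, gp: k1(g, gp) * k2(g, gp)

def knorm(k):
    """(10): normalisation of a granular kernel."""
    return lambda g, gp: k(g, gp) / math.sqrt(k(g, g) * k(gp, gp))
```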

Property 2 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the sum operation.

To prove it, let $gK_1(\vec{g}_1, \vec{g}'_1)$ and $gK_2(\vec{g}_2, \vec{g}'_2)$ be two granular kernels, where $\vec{g}_1, \vec{g}'_1 \in G_1$, $\vec{g}_2, \vec{g}'_2 \in G_2$ and $G_1 \neq G_2$. We may define new kernels as follows:

$$gK((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2)) = gK_1(\vec{g}_1, \vec{g}'_1)$$

$$gK'((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2)) = gK_2(\vec{g}_2, \vec{g}'_2).$$

$gK$ and $gK'$ can operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$$gK_1(\vec{g}_1, \vec{g}'_1) + gK_2(\vec{g}_2, \vec{g}'_2) = gK((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2)) + gK'((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2)).$$

According to the sum closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(\vec{g}_1, \vec{g}'_1) + gK_2(\vec{g}_2, \vec{g}'_2)$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.

Property 3 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the product operation.

To prove it, let $gK_1(\vec{g}_1, \vec{g}'_1)$ and $gK_2(\vec{g}_2, \vec{g}'_2)$ be two granular kernels, where $\vec{g}_1, \vec{g}'_1 \in G_1$, $\vec{g}_2, \vec{g}'_2 \in G_2$ and $G_1 \neq G_2$. We may define new kernels as above:

$$gK((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2)) = gK_1(\vec{g}_1, \vec{g}'_1)$$

$$gK'((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2)) = gK_2(\vec{g}_2, \vec{g}'_2).$$

So $gK$ and $gK'$ can operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$$gK_1(\vec{g}_1, \vec{g}'_1)\, gK_2(\vec{g}_2, \vec{g}'_2) = gK((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2))\, gK'((\vec{g}_1, \vec{g}_2), (\vec{g}'_1, \vec{g}'_2)).$$

According to the product closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(\vec{g}_1, \vec{g}'_1)\, gK_2(\vec{g}_2, \vec{g}'_2)$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.

2.3 GKTs and EGKTs

An easy and effective way to construct new kernel functions is to combine a group of granular kernels via simple operations such as sum and product. The new kernel functions can be naturally expressed as tree structures. The following are the main steps in GKT design.

Step 1: Features are bundled into feature granules according to prior knowledge, such as object structures and feature relationships, or with an automatic learning algorithm.

Step 2: A tree structure is constructed with a suitable number of layers, nodes and connections. As in the first step, we can construct trees according to prior knowledge or with an automatic learning algorithm. Figure 1 shows a kind of GKT with $m$ basic granular kernels $gK_t$ and $m$ pairs of feature granules $\vec{g}_t$ and $\vec{g}_t'$, where $1 \le t \le m$.

Step 3: Granular kernels are selected from the candidate kernel set. Popular traditional kernels, such as RBF kernels and polynomial kernels, can be chosen as granular kernels, since these kernels have proved successful in many real problems. Special kernels designed for particular problems can also be selected as granular kernels if they are good at measuring the similarities of the corresponding feature granules.

Step 4: Parameters of granular kernels and operations of connection nodes are selected. Each connection operation in a GKT can be a sum or a product. A positive connection weight may be associated with each edge in the tree, and a granular kernel may belong to one or more subtrees.
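As a concrete illustration of Steps 1-4, here is a minimal Python sketch of a two-layer GKT of the kind in Figure 1, combining per-granule RBF kernels by a positively weighted sum (the slicing scheme and parameter names are assumptions; the trees used in Section 3 also contain product nodes):

```python
import numpy as np

def gkt_kernel(x, z, granules, gammas, weights):
    """Two-layer GKT: a positively weighted sum of m RBF granular
    kernels, one per feature granule.

    granules: list of index lists, one per feature granule (an assumed
              slicing of the input vector);
    gammas, weights: per-granule RBF widths and positive edge weights."""
    total = 0.0
    for idx, gamma, w in zip(granules, gammas, weights):
        d = np.asarray(x, float)[idx] - np.asarray(z, float)[idx]
        total += w * np.exp(-gamma * d.dot(d))
    return total
```

By Property 1, a positively weighted sum of granular kernels is itself a kernel, so this construction is valid for SVM training.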
In this paper, GAs are used to find the optimum parameter settings of GKTs. We use EGKTs to denote such evolutionary GKTs. The following are the basic definitions and operations used in optimising EGKTs.

Chromosome: Let $P_i$ denote the population in generation $G_i$, where $i = 1, \ldots, m$ and $m$ is the total number of generations. Each population $P_i$ has $p$ chromosomes $c_{ij}$, $j = 1, \ldots, p$. Each chromosome $c_{ij}$ has $q$ genes $g_t(c_{ij})$, where $t = 1, \ldots, q$. Here each gene is a parameter of the GKT, and we use GKTs($c_{ij}$) to denote the GKT configured with genes $g_t(c_{ij})$, $t = 1, \ldots, q$.
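For illustration, a chromosome can be encoded as a flat list of real-valued genes; the following sketch assumes this gene layout (per-granule γ values, connection weights, and the regularisation parameter C) and uses the initial ranges reported later in Section 3.3:

```python
import random

def random_chromosome(num_granules):
    """One chromosome c_ij as a flat list of GKT genes (the layout is
    an assumption; the ranges follow Section 3.3)."""
    gammas  = [random.uniform(0.0001, 1.0) for _ in range(num_granules)]
    weights = [random.uniform(0.001, 1.0) for _ in range(num_granules)]
    c_param = random.uniform(1.0, 256.0)
    return gammas + weights + [c_param]
```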

Fitness: There are several methods to evaluate SVM performance. One is k-fold cross-validation, a popular technique for performance evaluation. Others are theoretical bounds on the generalisation error, such as the Xi-Alpha bound (Joachims, 2000), the VC bound (Vapnik, 1998), the radius-margin bound and the VC span bound (Vapnik and Chapelle, 2000). A detailed review can be found in Duan et al. (2003). In this paper we use k-fold cross-validation to evaluate SVM performance in the training phase.

Figure 1 An example of GKTs

In k-fold cross-validation, the training data set $S$ is separated into $k$ mutually exclusive subsets $S_v$. For $v = 1, \ldots, k$, the data set $\Lambda_v$ is used to train SVMs with GKTs($c_{ij}$) and $S_v$ is used to evaluate the SVM model:

$$\Lambda_v = S - S_v, \quad v = 1, \ldots, k. \qquad (11)$$

After $k$ rounds of training and testing on all the different subsets, we get $k$ prediction accuracies. The fitness $f_{ij}$ of chromosome $c_{ij}$ is calculated by

$$f_{ij} = \frac{1}{k} \sum_{v=1}^{k} Acc_v \qquad (12)$$

where $Acc_v$ is the prediction accuracy of GKTs($c_{ij}$) on $S_v$.
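A minimal sketch of this fitness computation (the training/evaluation callable is a placeholder standing in for the actual LibSVM calls):

```python
def fitness(chromosome, folds, train_and_eval):
    """Fitness f_ij of equation (12): mean accuracy over k folds.

    folds: list of (Lambda_v, S_v) train/validation splits of S;
    train_and_eval: placeholder callable that trains an SVM with
    GKTs(c_ij) on Lambda_v and returns its accuracy Acc_v on S_v."""
    accs = [train_and_eval(chromosome, lam_v, s_v) for lam_v, s_v in folds]
    return sum(accs) / len(accs)
```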



Selection: In the algorithm, the roulette wheel method described in Michalewicz (1996) is used to select individuals for the new population.

Crossover: Two chromosomes are first selected randomly from the current generation as parents, and then a crossover point is randomly selected to split the chromosomes. Parts of the chromosomes are exchanged between the parents to generate two children.

Mutation: Some chromosomes are randomly selected, and some genes are randomly chosen from each selected chromosome for mutation. The values of the mutated genes are replaced by random values.
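Minimal Python sketches of the three operators, under the assumption that a chromosome is a flat list of genes as above:

```python
import random

def roulette_select(population, fitnesses):
    """Roulette wheel selection: pick proportionally to fitness."""
    r, running = random.uniform(0, sum(fitnesses)), 0.0
    for chrom, f in zip(population, fitnesses):
        running += f
        if running >= r:
            return chrom
    return population[-1]

def crossover(parent1, parent2):
    """One-point crossover: swap the tails after a random cut point."""
    cut = random.randrange(1, len(parent1))
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

def mutate(chromosome, rate, resample_gene):
    """Replace randomly chosen genes with fresh random values;
    resample_gene(i) draws a new value for gene i (hypothetical helper)."""
    return [resample_gene(i) if random.random() < rate else gene
            for i, gene in enumerate(chromosome)]
```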

2.4 Parallel GAs

We use parallel GAs to speed up SVM model selection and parameter optimisation. In the literature, several parallel algorithms have been designed for SVMs. In Dong et al. (2003), a parallelisation approach is proposed in which the SVM kernel matrix is approximated by block diagonal matrices, so an original optimisation problem can be rewritten as hundreds of sub-problems. In Zanghirati and Zanni (2003) and Serafini et al. (2004), a Gradient Projection Method (GPM) is presented and implemented for parallel computation in SVMs. The decomposition technique is used to split the SVM Quadratic Programming (QP) problem into smaller QP sub-problems (each sub-problem is solved by GPM). The related SVM software can be used in both scalar and distributed-memory parallel environments. Graf et al. (2005) develop a kind of parallel SVM called the Cascade of SVMs in a distributed environment, where smaller optimisations are solved independently. The partial results are combined and filtered again in a Cascade of SVMs until the global optimum is reached. Convergence to the global optimum is guaranteed with multiple passes through the Cascade.
Besides the works mentioned above, Runarsson and Sigurdsson (2004) use a parallel method to speed up evolutionary model selection for SVMs. Their algorithm is implemented on a multi-processor computer in C++ using standard Posix threads.

In GKT optimisation, all parameters and operations to be optimised are independent in each generation, so the problem is well suited to a parallel GA based system. Parallel GAs (Cantú-Paz, 1998; Adamidis, 1994; Lin et al., 1997) have been well studied in recent years. There are three common types of parallel GA models:

•  single population master-slave models
•  single population fine-grained models
•  multiple population coarse-grained models.
In this paper, the parallel GA system is designed based on the first type of model. In the system, one processor is chosen as the master, which stores the population, performs selection, crossover and mutation, and then distributes individuals to slave processors on the cluster. Each single SVM model is trained and evaluated on one of the slave processors with the received individual (parameters). After fitness evaluation, each slave sends the fitness value back to the master. The architecture of the parallel GAs is shown in Figure 2. The parallel GAs-SVMs system has several characteristics. First, it is a global GAs-SVMs system, since all evaluations and operations are performed on the entire population. Second, the implementation is easy, clear, practical, and especially suitable for SVM model selection. Third, the system can easily be moved to a large distributed computing environment, such as a grid-computing system.

Figure 2 Parallel GAs model
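A minimal sketch of this master-slave pattern, with a Python process pool standing in for the cluster's slave nodes (the paper's system distributes work across actual cluster processors):

```python
from multiprocessing import Pool

def evaluate_population(population, fitness_fn, num_slaves):
    """Master-slave evaluation: the 'master' process distributes the
    chromosomes; each 'slave' (a pool worker standing in for a cluster
    node) trains and evaluates one SVM model and returns its fitness."""
    with Pool(processes=num_slaves) as pool:
        return pool.map(fitness_fn, population)
```

Only parameters go out and fitness values come back, which mirrors the low communication cost argued for below.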

QP decomposition based parallel computing can also speed up SVM model selection in a distributed system, but if the training data set is large, the communication costs for transferring sub-QP intermediate results will be very high. On the other hand, in SVM model selection, each SVM model spends most of its time on the QP calculation, whose running time is generally of a higher order of magnitude than that of the GA operations. In the master-slave based parallel GAs-SVMs system, only parameters and fitness values need to be transferred between the master and the slaves, so the communication costs are low. Figure 3 shows an example of running time and speedup with parallel GAs on a cluster system. The cluster is a shared-disk, distributed-memory platform. In the example, the size of the dataset is 314, RBF is chosen as the kernel function, the population size is set to 300 and the number of generations is set to 50. For this example, the speedup can reach 10 with 14 nodes, where each node is a processor. The system architecture of SVMs with EGKTs is shown in Figure 4. In practice, the regularisation parameter C of SVMs is also optimised by the parallel GAs.

Figure 3 An example of running time and speedup with parallel GAs: (a) running time and (b) speedup

Figure 4 System Architecture of SVMs with EGKTs

3 Experiments

Since RBF kernels (equation (13)) usually perform best among the traditional kernels, we compare GKTs and EGKTs with RBF kernels. To make a fair comparison with EGKTs, the traditional RBF kernels are also optimised using GAs. Here we use E-RBF to denote the GA-based RBF kernels:

$$\exp(-\gamma \|\vec{x} - \vec{z}\|^2). \qquad (13)$$

3.1 Drug sets

The drug datasets used in the experiments are pyrimidines and triazines, which are described in Hirst et al. (1994a, 1994b) and available at the UCI Repository of machine learning databases (Newman et al., 1998). The pyrimidines dataset contains 55 drugs, and each drug has three possible substitution positions (R3, R4 and R5; see Figure 5(a)). Each substituent is characterised by nine chemical property features: polarity, size, flexibility, hydrogen-bond donor, hydrogen-bond acceptor, π donor, π acceptor, polarisability, and σ effect. Drug activities are determined by the substituents. If no substituent occupies a possible position, its features are indicated by nine -1s. Each input vector includes the features of two drugs in a fixed feature order. In one vector, if the activity of the first drug is higher than that of the second, the vector is labelled positive; otherwise it is labelled negative. So the number of features in one vector is 54.
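A sketch of this pair construction (the helper is hypothetical; it assumes per-drug feature vectors with missing substituents already encoded as -1s):

```python
def make_pairs(drugs, activities):
    """Build labelled drug-pair vectors as described above.

    drugs: per-drug feature vectors (27 features for pyrimidines:
    3 positions x 9 properties). Pairs with equal activities are
    dropped, as noted in the paper."""
    vectors, labels = [], []
    for i, (d_i, a_i) in enumerate(zip(drugs, activities)):
        for j, (d_j, a_j) in enumerate(zip(drugs, activities)):
            if i == j or a_i == a_j:
                continue
            vectors.append(list(d_i) + list(d_j))   # 54 features
            labels.append(+1 if a_i > a_j else -1)
    return vectors, labels
```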

Figure 5 Drug structures: (a) pyrimidines and (b) triazines


The pyrimidines dataset is randomly shuffled and split into two parts in the proportion 4:1. One part is used as the training set, which contains the pairs among 44 compounds. The other part is the unseen testing set, which contains the pairs among the remaining 11 compounds and the pairs between those compounds and the training compounds. So the size of the training set should be 44 × 43 = 1892 and the size of the testing set should be 44 × 11 × 2 + 11 × 10 = 1078. Owing to the deletion of some pairs with the same activities, the actual data sets are slightly smaller than these figures.
The structure of triazines is shown in Figure 5(b). In the triazines dataset, each compound has six possible substitution positions: the positions R3 and R4; if the substituent at R3 contains a ring itself, then R3 and R4 of this second ring; and similarly, if the substituent at R4 contains a ring itself, then R3 and R4 of this third ring. Ten features are used to characterise each position: a structure-branching feature and the same nine features used for each substituent of the pyrimidines. If no substituent occupies a possible position, its features are indicated by ten -1s. So each vector has 120 features. We randomly select 60 drugs from the triazines dataset and then randomly shuffle and split them into two parts in the proportion 4:1 based on the drugs of the pairs.

3.2 Feature granules and GKTs design

In the experiments, the input vectors are decomposed according to the possible substituent locations. Each feature granule includes all the features of one substituent (see Figure 6). For pyrimidines, each drug pair has six feature granules and each feature granule has nine features. For triazines, each drug pair has twelve feature granules of size 10.
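A minimal sketch of this decomposition, assuming granules are contiguous slices of the pair vector:

```python
def granules_of(pair_vector, num_granules=6, width=9):
    """Slice a pyrimidine pair vector (54 features) into six granules
    of nine features each; for triazines use num_granules=12, width=10."""
    return [pair_vector[t * width:(t + 1) * width]
            for t in range(num_granules)]
```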

Figure 6 Feature granules: (a) pyrimidines and (b) triazines


We design two kinds of GKTs for each dataset, shown in Figure 7. GKTs-1 and GKTs-2 are used for pyrimidines; GKTs-3 and GKTs-4 are used for triazines. GKTs-1 and GKTs-3 are two-layer kernel trees in which each granular kernel's importance is controlled by its outgoing connection weight. GKTs-2 and GKTs-4 are three-layer kernel trees in which each drug of a pair is represented by a two-layer subtree; the two subtrees are combined by a product operation at the top of the tree.

3.3 Experimental setup

RBF kernel functions are also chosen as the granular kernel functions in each GKT, so each granular kernel gK_i has an RBF parameter γ_i. The initial ranges of all RBF γ and γ_i values are set to [0.0001, 1]. The initial range of the regularisation parameter C is [1, 256]. The crossover probability is 0.7 and the mutation ratio is 0.5. The range of the connection weights is [0.001, 1]. 5-fold cross-validation is used on the pyrimidines training dataset and 8-fold cross-validation on the triazines training dataset. In cross-validation, the training data are split in the same way as described in subsection 3.1. The population size is set to 500 and the number of generations to 30 for both datasets. The SVM software package used in the experiments is LibSVM (Chang and Lin, 2001).
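For reproduction purposes, a custom GKT kernel can be supplied to an SVM through a precomputed Gram matrix; the sketch below uses scikit-learn's SVC, which wraps LibSVM, as a stand-in for the paper's direct LibSVM usage:

```python
import numpy as np
from sklearn.svm import SVC

def gram_matrix(X, Z, kernel):
    """Gram matrix K[i, j] = kernel(X[i], Z[j]) for a custom GKT kernel."""
    return np.array([[kernel(x, z) for z in Z] for x in X])

# Usage sketch (this interface is an assumption, not the paper's code):
# K_train = gram_matrix(X_train, X_train, my_gkt_kernel)
# model = SVC(C=c_value, kernel='precomputed').fit(K_train, y_train)
# K_test = gram_matrix(X_test, X_train, my_gkt_kernel)
# accuracy = model.score(K_test, y_test)
```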

Figure 7 Granular Kernel Trees: (a)–(b) GKTs for pyrimidines and (c)–(d) GKTs for triazines

(a) GKTs-1 (b) GKTs-2

(c) GKTs-3

Figure 7 Granular Kernel Trees: (a)–(b) GKTs for pyrimidines and (c)–(d) GKTs for triazines
(continued)

(d) GKTs-4

3.4 Experimental results and comparisons

Table 1 shows the performances of the three GA-based kernels on the pyrimidines dataset. EGKTs-1 is the evolutionary GKTs-1 and EGKTs-2 the evolutionary GKTs-2. From Table 1, we can see that SVMs with the two kinds of EGKTs outperform SVMs with E-RBF by 3.3% and 3.0% respectively in terms of prediction accuracy on the unseen testing dataset. The fitness values and training accuracies of SVMs with EGKTs are also higher than those of SVMs with E-RBF kernels. The testing accuracy of SVMs with EGKTs-1 is slightly higher than that of SVMs with EGKTs-2 on pyrimidines.

Table 1 Prediction accuracies on the pyrimidines dataset

                     E-RBF (%)   EGKTs-1 (%)   EGKTs-2 (%)
Fitness                 84.5        86.6          88.5
Training accuracy       96.8        96.8          98.8
Testing accuracy        88.4        91.7          91.4

The performances of the three GA-based kernels on the triazines dataset are shown in Table 2. In testing accuracy, SVMs with EGKTs-3 (evolutionary GKTs-3) and EGKTs-4 (evolutionary GKTs-4) are better than SVMs with E-RBF by 3.7% and 4.9% respectively. We find that the training accuracies are much higher than both the testing accuracies and the fitness values for all three kernels on both datasets, especially on the triazines dataset. The reason could be that the data are complicated and SVMs easily overfit the training dataset.

Table 2 Prediction accuracies on the triazines dataset

                     E-RBF (%)   EGKTs-3 (%)   EGKTs-4 (%)
Fitness                 73.8        74.6          75.8
Training accuracy       93.4        97.2          98.7
Testing accuracy        79.6        83.3          84.5

The comparisons between RBF kernels and GKTs are made using a large number of kernel parameter samples. We randomly generate 2000 C values from [1, 256] for the SVMs and 2000 groups of kernel parameters for each kernel. SVMs are trained and tested with these random parameters. For each dataset, the prediction accuracy curves of the three kernels are plotted in one figure (Figures 8 and 9), each ordered by C value. From Figures 8 and 9, it is easy to see that the performances of GKTs are better than those of RBF kernels. Quartiles and the mean are also used to summarise each kernel's performance in terms of testing accuracy. The results are listed in Tables 3 and 4. Based on the differences of the Q1 (25th percentile), Q2 (median), Q3 (75th percentile) and mean values, we can conclude that the performances of the two GKTs are better than those of the RBF kernels by about 2.3-3.4% on pyrimidines and 3.6-4.5% on triazines.

Table 3 Testing accuracies on the pyrimidines dataset with 2000 groups of random parameters

                     RBF (%)   GKTs-1 (%)   GKTs-2 (%)
Maximum                91.0       93.2         93.0
75th percentile        88.4       91.7         91.0
Median                 88.0       91.3         90.6
25th percentile        87.5       90.9         90.1
Minimum                83.5       87.0         87.2
Mean                   88.2       91.2         90.5

Table 4 Testing accuracies on the triazines dataset with 2000 groups of random parameters

                     RBF (%)   GKTs-3 (%)   GKTs-4 (%)
Maximum                83.9       88.2         88.2
75th percentile        79.9       83.7         84.1
Median                 78.5       82.6         83.0
25th percentile        77.9       81.5         82.0
Minimum                72.2       77.8         76.2
Mean                   78.9       82.6         83.0

We can see that almost all the testing accuracies of EGKTs in Tables 1 and 2 are better than the maximum testing accuracies of the RBF kernels in Tables 3 and 4. We can also see that the testing accuracies of the GA-based kernel methods stabilise at around the Q3 level of the random-parameter results.

Figure 8 Testing accuracy comparisons on pyrimidines

Figure 9 Testing accuracy comparisons on triazines

4 Conclusions and future work

This paper has proposed an approach to constructing GKTs based on the granular kernel concept and its properties. The experimental results have shown that GKTs and EGKTs perform better than traditional RBF kernels in drug activity comparisons. It is promising to construct more powerful and suitable kernels using this kind of evolutionary hierarchical kernel design. In the future, we will continue our research on evolutionary granular kernel tree design for other problems. How to generate feature granules could be one issue in cases where the relationships among features are complex.

Acknowledgements

This work is supported in part by NIH under P20 GM065762. Bo Jin is supported by the Molecular Basis for Disease (MBD) Doctoral Fellowship Program.

References
Adamidis, P. (1994) ‘Review of parallel genetic algorithms bibliography’, Internal T.R., Aristotle
University of Thessaloniki, Greece.
Berg, C., Christensen, J.P.R. and Ressel, P. (1984) Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions, Springer-Verlag, New York, USA.
Boser, B., Guyon, I. and Vapnik, V.N. (1992) ‘A training algorithm for optimal margin classifiers’,
Proc. Fifth Annual Workshop on Computational Learning Theory, ACM Press, USA,
pp.144–152.
Burbidge, R., Trotter, M., Buxton, B. and Holden, S. (2001) ‘Drug design by machine learning:
support vector machines for pharmaceutical data analysis’, Computers and Chemistry,
Vol. 26, No. 1, pp.4–15.
Cantú-Paz, E. (1998) ‘A survey of parallel genetic algorithms’, Calculateurs Paralleles, Hermes,
Paris, Vol. 10, No. 2, pp.141–171.
Chang, C-C. and Lin, C-J. (2001) LIBSVM: A Library for Support Vector Machines, Software
available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Collins, M. and Duffy, N. (2002) ‘Convolution kernels for natural language’, in Dietterich, T.G.,
Becker, S. and Ghahramani, Z. (Eds.): Advances in Neural Information Processing Systems,
MIT Press, Cambridge, MA, Vol. 14, pp.625–632.
Cortes, C. and Vapnik, V.N. (1995) 'Support-vector networks', Machine Learning, Vol. 20, pp.273–297.
Cristianini, N. and Shawe-Taylor, J. (1999) An Introduction to Support Vector Machines:
And other Kernel-based Learning Methods, Cambridge University Press, NY.
Dong, J.X., Krzyzak, A. and Suen, C.Y. (2003) ‘A fast parallel optimization for training support
vector machine’, in Perner, P. and Rosenfeld, A. (Eds.): Proceedings of 3rd International
Conference on Machine Learning and Data Mining, Springer Lecture Notes in Artificial
Intelligence (LNAI 2734), Leipzig, Germany, pp.96–105.
Duan, K., Keerthi, S.S. and Poo, A.N. (2003) ‘Evaluation of simple performance measures for
tuning SVM hyperparameters’, Neurocomputing, Vol. 51, pp.41–59.
Gärtner, T. (2003) ‘A survey of Kernels for structured data’, ACM SIGKDD Explorations
Newsletter, Vol. 5, pp.49–58.
Gärtner, T., Flach, P.A. and Wrobel, S. (2003) 'On graph kernels: hardness results and efficient alternatives', Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop.
Graf, H-P., Cosatto, E., Bottou, L., Dourdanovic, I. and Vapnik, V.N. (2005) 'Parallel support vector machines: the cascade SVM', in Saul, L., Weiss, Y. and Bottou, L. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 17, pp.513–520.
Haussler, D. (1999) ‘Convolution kernels on discrete structures’, Technical report
UCSC-CRL-99-10, Department of Computer Science, University of California at Santa Cruz.
Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994a) ‘Quantitative structure-activity relationships
by neural networks and inductive logic programming. I. The inhibition of dihydrofolate
reductase by pyrimidines’, Journal of Computer-Aided Molecular Design, Vol. 8, No. 4,
pp.405–420.

Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994b) ‘Quantitative structure-activity relationships
by neural networks and inductive logic programming. II. The inhibition of dihydrofolate
reductase by triazines’, Journal of Computer-Aided Molecular Design, Vol. 8, No. 4,
pp.421–432.
Joachims, T. (2000) ‘Estimating the generalization performance of a SVM efficiently’, Proceedings
of the International Conference on Machine Learning, Morgan Kaufman.
Kashima, H. and Inokuchi, A. (2002) ‘Kernels for graph classification’, Proc. 1st ICDM Workshop
on Active Mining (AM-2002), Maebashi, Japan.
Kashima, H. and Koyanagi, T. (2002) ‘Kernels for semi-structured data’, Proceedings of the
Nineteenth International Conference on Machine Learning, pp.291–298.
Lin, S-H., Goodman, E.D. and Punch III, W.F. (1997) ‘Investigating parallel genetic algorithms on
job shop scheduling problem’, Proceedings of the 6th International Conference on
Evolutionary Programming VI.
Lodhi, H., Shawe-Taylor, J., Christianini, N. and Watkins, C. (2001) ‘Text classification using
string kernels’, in Leen, T., Dietterich, T. and Tresp, V. (Eds.): Advances in Neural
Information Processing Systems, MIT Press, Cambridge, MA, Vol. 13, pp.563–569.
Michalewicz, Z. (1996) Genetic Algorithms + Data Structures = Evolution Programs,
Springer-Verlag, Berlin.
Newman, D.J., Hettich, S., Blake, C.L. and Merz, C.J. (1998) UCI Repository of Machine Learning
Databases, [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California,
Department of Information and Computer Science, Irvine, CA.
Runarsson, T.P. and Sigurdsson, S. (2004) 'Asynchronous parallel evolutionary model selection for support vector machines', Neural Information Processing – Letters and Reviews, Vol. 3, No. 3, pp.59–67.
Schölkopf, B., Tsuda, K. and Vert, J-P. (2004) Kernel Methods in Computational Biology,
MIT Press, Cambridge, MA.
Serafini, T., Zanni, L. and Zanghirati, G. (2004) Parallel GPDT: A Parallel Gradient Projection-based Decomposition Technique for Support Vector Machines, http://www.dm.unife.it/gpdt.
Shawe-Taylor, J. and Cristianini, N. (2004) Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, UK.
Vapnik, V.N. (1998) Statistical Learning Theory, John Wiley and Sons, New York.
Vapnik, V.N. and Chapelle, O. (2000) ‘Bounds on error expectation for support vector machine’,
in Smola, A., Bartlett, P., Schölkopf, B. and Schuurmans, D. (Eds.): Advances in Large
Margin Classifiers, MIT Press, Cambridge, MA, pp.261–280.
Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A. and Schölkopf, B. (2003)
‘Feature selection and transduction for prediction of molecular bioactivity for drug design’,
Bioinformatics, Vol. 19, No. 6, pp.764–771.
Zanghirati, G. and Zanni, L. (2003) ‘Parallel solver for large quadratic programs in training support
vector machines’, Parallel Computing, Vol. 29, pp.535–551.
