
Manuscript by Nai Ding

Application of a chaotic neural network mimicking the olfactory system and SVM to classifying reconstituted milk and fresh milk

Abstract--This paper presents a new approach to pattern recognition in machine olfaction that combines a novel chaotic neural network, the KIII model, with the support vector machine (SVM). In this approach, feature vectors are first processed by the KIII model, which simulates information processing in the olfactory bulb, and are then classified by SVM. The approach is applied to distinguishing reconstituted milk from fresh milk; it achieves high accuracy on complex data and is more robust to unexpected noise than several traditional approaches.

1 Introduction
Many pattern recognition approaches have been applied to the electronic nose [1][2]. However, robustness and accuracy are still the main weaknesses of pattern analysis on electronic noses. Problems such as drift compensation, mixture separation and identification against complex odor backgrounds remain challenges. In contrast with the artificial electronic nose's limitations, the mammalian olfactory system can detect and interpret information from volatile molecules in the environment with a high degree of sensitivity, selectivity and stability. Therefore, many researchers have begun to pay more attention to biologically inspired odor processing models [3] to overcome the difficulties of machine olfaction.

Milk classification using an electronic nose is an especially difficult task due to the heterogeneous nature of dairy products [4]. Differences in heat treatment and in protein and fat concentration can all affect the aroma of milk [5]. Classification of milk therefore always deals with noisy data of complicated structure. This paper investigates the performance of a novel classification approach that cascades the bionic KIII model and SVM in classifying reconstituted milk and fresh milk.

2 Description of KIII Model and SVM


2.1 KIII Model
The KIII network describes the olfactory neural system, including populations of neurons, local synaptic connections, and long forward and distributed time-delayed feedback loops. With parameters optimized and additive noise introduced, the KIII model can simulate the EEG waveforms observed in electrophysiological experiments. In the topology of the KIII network (fig.1), R represents the olfactory receptor, which is sensitive to odor molecules, and M represents the mitral cell, whose response is used as the activity measure. A full description of KIII can be found in [6].

In the olfactory system, each odorant activates a subset of the receptor cells and initiates a spatial pattern of action potentials. This pattern initiates another spatial pattern of activity in the outer layer of the olfactory bulb. These spatial patterns have no specific topographic relation to the stimulus input pattern [7]. Discrimination among odors is therefore a problem of spatial pattern recognition.

Mimicking the olfactory system, the KIII model receives a stimulus at the receptor level and transforms the stimulus pattern into amplitude modulation at the olfactory bulb level. If the standard deviations of the mitral cells' responses are viewed as the output, a KIII model with N channels can be regarded as a nonlinear multiple-input multiple-output system mapping an input vector in its N-dimensional input space onto its N-dimensional output space. The KIII model aims at simulating the information processing phase of the olfactory system; it does not model the decision-making function realized by higher-level neural systems.
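To make this mapping concrete, the sketch below treats the KIII model as a black box: each of the N channels receives one feature, the network is simulated over time, and the per-channel standard deviation of the mitral responses is taken as the output. `simulate_kiii` is a hypothetical stand-in for a full KIII simulator, not a real API.

```python
# Minimal sketch: the KIII model viewed as an N-in, N-out nonlinear map.
# `simulate_kiii` is assumed to return an array of shape (t_steps, N)
# holding the mitral (M node) time series for a given stimulus vector.
import numpy as np

def kiii_transform(x, simulate_kiii, t_steps=2000):
    """Map an N-dim feature vector onto an N-dim output vector."""
    m_series = simulate_kiii(x, t_steps)   # (t_steps, N) mitral responses
    return m_series.std(axis=0)            # one standard deviation per channel
```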

Learning Rule
Associative learning in the olfactory bulb proceeds by enhancement of mutually excitatory synapses at the mitral level, creating a landscape of chaotic attractors. Each attractor, formed in a self-organized way, represents a class. Habituation is also an essential part of the discrimination of sensory stimuli. It takes place at the synapses of the excitatory neurons onto both inhibitory and other excitatory neurons. A modified Hebbian learning rule with habituation is employed to train the KIII model [8].
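The following is an illustrative sketch of a Hebbian update with habituation, not the authors' exact rule from [8]: weights between co-active mitral channels are strengthened, while the remaining weights slowly decay. The threshold rule and the constants are our assumptions.

```python
# Illustrative Hebbian-with-habituation step (constants are assumptions).
import numpy as np

def hebbian_habituation_step(W, activity, r_hebb=1.5, r_hab=0.995):
    act = activity > activity.mean()       # channels counted as "active"
    n = len(activity)
    for i in range(n):
        for j in range(n):
            if i != j and act[i] and act[j]:
                W[i, j] *= r_hebb          # strengthen co-active pairs
            elif i != j:
                W[i, j] *= r_hab           # habituate the remaining synapses
    return W
```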

2.2 SVM
Basically, the SVM [9][10] is a linear machine that applies a kernel method to map the data into a higher-dimensional space in which a hyperplane can perform the separation. It hinges on two mathematical operations [11]. One is to map the input vector into a higher-dimensional feature space in which patterns that are not linearly separable are likely to become linearly separable. The other is to construct an optimal hyperplane in the new feature space. The nonlinear map in the first operation is achieved by a kernel method: each kernel function satisfying Mercer's theorem corresponds to a space in which the function is defined as an inner product. In the new space, a hyperplane is constructed so that the margin of separation between the classes is maximized. Unlike the back-propagation algorithm, which trains a multilayer perceptron with a manually designed structure, SVM automatically determines the required number of hidden units (the number of support vectors, SVs). The decision function of SVM can be expressed by equation (1) [12]:
I(x) = \operatorname{sign}\left( \sum_{\text{SV in training set}} \alpha_i K(x_i \cdot x) + b_0 \right)    (1)

where K(x_i \cdot x) represents the kernel function.
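As an illustration of equation (1), the sketch below fits scikit-learn's SVC on toy data; the RBF kernel and the synthetic labels are our own choices for demonstration, not mandated by the paper.

```python
# Illustration of equation (1) with scikit-learn's SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 8))                 # 40 samples, 8 sensor features (toy)
y = np.where(X[:, 0] > 0.5, 1, -1)      # toy binary labels in {-1, +1}

clf = SVC(kernel="rbf").fit(X, y)
# decision_function(x) evaluates sum_i alpha_i K(x_i, x) + b0 over the
# support vectors; taking its sign reproduces I(x) in equation (1).
I_x = np.sign(clf.decision_function(X))
```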

2.3 Combination of KIII and SVM


As mentioned in Section 2.1, the KIII model can process and learn data, but it does not output a class label. Usually, a traditional classifier needs to be cascaded with it to accomplish the pattern recognition task. Previous works [13][14] have shown that the minimum Euclidean distance classifier can usually yield a satisfying result for data preprocessed by KIII. However, the minimum Euclidean distance classifier is optimal only for normally distributed classes sharing the same diagonal covariance matrix. When samples do not conform to this assumption, the minimum Euclidean distance classifier is not optimal and a more powerful classifier may perform better. In this paper, SVM is adopted to classify the feature vectors preprocessed by KIII.

The KIII model, trained unsupervisedly, simulates the nonlinear map in the olfactory system. SVM also involves mapping data into a higher-dimensional space using a kernel method. KIII's self-organized map may strengthen SVM's ability to transform the data and make it easier to find the intrinsic differences between classes in SVM's supervised learning phase.

Let the mapping relation of the KIII model be denoted y = \varphi(x, S), where x denotes the stimulus at the R nodes, y the standard deviations at the M nodes, and S the training set of the KIII network. The cascaded classifier, denoted KIII-SVM, can then be expressed as equation (2):

I(x) = \operatorname{sign}\left( \sum_{\text{SV in } \varphi(S_T, S_T)} \alpha_i K\left( \varphi(x_i, S_T) \cdot \varphi(x, S_T) \right) + b_0 \right)    (2)
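A sketch of this plain cascade follows: the same training set S_T both trains the KIII network and, after being transformed by it, trains the SVM. `train_kiii` and `kiii_transform` are hypothetical placeholders for a KIII simulator, not a real API.

```python
# Sketch of the KIII-SVM cascade of equation (2).
import numpy as np
from sklearn.svm import SVC

def fit_kiii_svm(S_T, y, train_kiii, kiii_transform):
    net = train_kiii(S_T)                                 # phi(., S_T)
    Z = np.array([kiii_transform(x, net) for x in S_T])   # phi(S_T, S_T)
    svm = SVC(kernel="rbf").fit(Z, y)
    return net, svm

def predict_kiii_svm(x, net, svm, kiii_transform):
    z = kiii_transform(x, net).reshape(1, -1)
    return svm.predict(z)[0]
```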
Former experiments [15] have shown that the KIII network performs well even when the training set is small, which inspired us to use training sets of different sizes for the cascaded KIII and SVM. The simplest way to realize this idea is to partition the training set S_T into two non-overlapping subsets, S_K and S_T − S_K. S_K is used as the training set for KIII, yielding the trained network \varphi(\cdot, S_K); the whole training set S_T, after being processed by KIII, is used as the training set for SVM. In other words, \varphi(S_T, S_K) rather than \varphi(S_T, S_T) is used to train the SVM in this model. This model, denoted KIII-SVM-modified, can be expressed as equation (3):

I(x) = \operatorname{sign}\left( \sum_{\text{SV in } \varphi(S_T, S_K)} \alpha_i K\left( \varphi(x_i, S_K) \cdot \varphi(x, S_K) \right) + b_0 \right)    (3)
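A sketch of the modified model follows, assuming S_T is a numpy array of feature vectors; the subset size k and the helper names are our own.

```python
# Sketch of KIII-SVM-modified (equation (3)): a small subset S_K trains
# KIII, while the whole set S_T, transformed by that network, trains SVM.
import numpy as np
from sklearn.svm import SVC

def fit_kiii_svm_modified(S_T, y, k, train_kiii, kiii_transform, rng):
    idx = rng.choice(len(S_T), size=k, replace=False)     # pick S_K
    net = train_kiii(S_T[idx])                            # phi(., S_K)
    Z = np.array([kiii_transform(x, net) for x in S_T])   # phi(S_T, S_K)
    svm = SVC(kernel="rbf").fit(Z, y)
    return net, svm
```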
A further extension of this model is inspired by the fact that one set can be partitioned into two subsets in many different ways. Using different subset pairs, we can train different KIII networks and therefore obtain differently preprocessed data; consequently, the structure of the cascaded SVM will differ too. This means we can construct different classifiers from one training set by manipulating the selection of the training subset, much like the bagging algorithm [0]. Once a series of classifiers is obtained, we can combine them by majority voting [0]. As mentioned in [0], an ensemble method can partly overcome the statistical, computational and representational problems a single classifier may suffer from. This bagging-like method, denoted KIII-SVM-ensemble, can be expressed as equation (4):
I_j(x) = \operatorname{sign}\left( \sum_{\text{SV in } \varphi(S_T, S_k)} \alpha_i K\left( \varphi(x_i, S_k) \cdot \varphi(x, S_k) \right) + b_0 \right)

I(x) = \operatorname{sign}\left( \sum_{j=0}^{p} I_j(x) \right)    (4)
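A sketch of the ensemble follows. `fit_one` and `predict_one` stand in for the KIII-SVM-modified training and prediction above; labels are assumed to be in {-1, +1} and the number of members odd, so majority voting cannot tie.

```python
# Sketch of KIII-SVM-ensemble (equation (4)): p base classifiers built
# from different random subsets S_k, combined by majority voting.
import numpy as np

def fit_ensemble(S_T, y, p, k, fit_one, rng):
    # each call draws a fresh subset S_k internally, bagging-style
    return [fit_one(S_T, y, k, rng) for _ in range(p)]

def predict_ensemble(x, members, predict_one):
    votes = np.array([predict_one(x, m) for m in members])
    return np.sign(votes.sum())            # majority vote, as in eq. (4)
```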

3 Experimental Results
3.1 Data Acquisition
Our experiments employ static headspace analysis. A tin oxide gas sensor array with 8 sensors (TGS880 (2×), TGS813 (2×), TGS822 (2×), TGS800, TGS823) from Figaro Engineering Inc. is mounted in a chamber. After air equilibrated above a milk sample is injected into the chamber, the sensors are heated and the step response of the gas sensor array is recorded. Diverse methods exist to extract features from the dynamic response of a gas sensor array [16], but in our experiment the maximum response is adopted as the single feature for each sensor. As mentioned in [17], when the ratio of samples to variables is low, many erroneous classifications may be made. Therefore, if too many features were extracted, a large number of measurements would have to be conducted to make the classification result convincing, which would complicate the experiment. In our pretest, the classification rate with maximum response as the single parameter was higher than with rise time, maximum slope or stable response.

Since the sensor array's response to milk is not strong, the impact of humidity and temperature is not negligible in our experiment. Equation (5) is used for drift compensation, with the sensor response in fresh air as the baseline. To compensate for differences in concentration between samples, each feature vector is normalized using equation (6):

R = \frac{R_{\text{observed}} - R_{\text{baseline}}}{R_{\text{baseline}}}    (5)

\hat{R} = \frac{R}{\sqrt{\sum_i R(i)^2}}    (6)
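A minimal sketch of this preprocessing follows; reading the normalization in (6) as division by the Euclidean norm is our assumption.

```python
# Sketch of the preprocessing in equations (5) and (6). The baseline is
# the sensor response in fresh air; the unit-norm scaling is our reading
# of equation (6).
import numpy as np

def preprocess(r_observed, r_baseline):
    r = (r_observed - r_baseline) / r_baseline   # eq. (5): drift compensation
    return r / np.linalg.norm(r)                 # eq. (6): concentration compensation
```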

3.2 Sample description


Six brands of commercial UHT fresh milk and six brands of commercial whole milk powder were collected from the market in Hangzhou, China. Reconstituted milk samples were prepared from whole milk powder and water. The amount of water added to the milk powder was chosen to minimize the Euclidean distance between the feature vectors of fresh milk and reconstituted milk. Eight consecutive measurements were conducted for each brand of dairy, so samples of one brand of milk can be assumed to be consistent. However, because of the heterogeneous nature of milk, samples of different brands of fresh or reconstituted milk may show observable differences [18]. Therefore, both fresh and reconstituted milk can be regarded as containing six subclasses.

3.3 Experiments
Four experiments are conducted with different purposes. Experiments I and II measure the classifiers' ability to distinguish fresh and reconstituted milk from different perspectives. Experiment III measures the classifiers' ability to detect reconstituted milk adulterated into fresh milk. Experiment IV evaluates the impact of the fresh milk's concentration on adulteration detection.

For the first two experiments, all twelve brands of dairy are considered. The six brands of fresh milk are regarded as one class and the six brands of reconstituted milk as another. The difference between the two experiments lies in the method of selecting the training set: Experiment I uses 'non-blind' selection and Experiment II uses 'blind' selection. 'Non-blind' selection uses the prior subclass information, selecting the same number of samples from each subclass to compose the training set. In contrast, 'blind' selection dismisses the subclass information and randomly selects training samples from the fresh milk set and the reconstituted milk set, so some subclasses may contribute more samples to the training set while others contribute fewer or none. A sketch of the two schemes is given below.
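The following sketch contrasts the two selection schemes; the label arrays and function names are our own illustrations.

```python
# 'Non-blind' draws equally from every subclass (brand); 'blind' draws
# at random within each class, ignoring brands.
import numpy as np

def select_non_blind(subclass_labels, n_per_subclass, rng):
    idx = []
    for s in np.unique(subclass_labels):
        pool = np.flatnonzero(subclass_labels == s)
        idx.extend(rng.choice(pool, n_per_subclass, replace=False))
    return np.array(idx)

def select_blind(class_labels, n_per_class, rng):
    idx = []
    for c in np.unique(class_labels):
        pool = np.flatnonzero(class_labels == c)
        idx.extend(rng.choice(pool, n_per_class, replace=False))
    return np.array(idx)
```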

In these two experiments, four classifying approaches are employed: the minimum Euclidean distance classifier, the KIII-minimum Euclidean distance classifier, SVM, and KIII-SVM. SVM adopts a radial-basis-function kernel with its parameter optimized. KIII-SVM-modified and KIII-SVM-ensemble are also applied in Experiment II; five base classifiers are combined for the KIII-SVM-ensemble approach in our experiments. For the minimum Euclidean distance classifier employed in Experiment I, we calculate a cluster center for each subclass. If the cluster center nearest to an unknown sample belongs to a brand of fresh milk, the sample is categorized as fresh milk; otherwise it is labeled reconstituted milk. This method can be viewed as a multi-class method because it calculates 12 cluster centers; however, it does not output which brand the sample belongs to, so misclassification among brands of fresh milk or among brands of reconstituted milk is tolerated. In Experiment II, with the subclass information ignored, only two cluster centers are calculated for the minimum distance classifier: one for all brands of fresh milk and the other for all brands of reconstituted milk. The classification problem thus becomes a pure two-class problem in which the samples of each class may have a complicated distribution.
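A sketch of the subclass-aware rule used in Experiment I follows; variable names are illustrative.

```python
# One centroid per brand (12 in all); the sample takes the class (fresh
# vs. reconstituted) of the nearest centroid.
import numpy as np

def predict_min_distance(x, centroids, centroid_class):
    # centroids: (12, d) brand centers; centroid_class: 12 class labels
    dists = np.linalg.norm(centroids - x, axis=1)
    return centroid_class[int(np.argmin(dists))]
```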

In Experiment III, one randomly selected brand of fresh milk and one randomly selected brand of reconstituted milk are mixed in different ratios. Samples whose proportion of reconstituted milk is below a given value are deemed one class, and the other samples are deemed another class. In Experiment IV, the two brands of dairy used in Experiment III are used again. With the reconstituted milk's concentration fixed, different amounts of water are added to the fresh milk. The classifiers trained in Experiment III are employed to classify the diluted fresh milk and the reconstituted milk.

In Experiments I and II, the eight sensors' response amplitudes form an 8-dimensional feature vector. In Experiments III and IV, however, only the first three components of the feature vector are used for classification, to keep the ratio of data points to variables greater than six, as suggested in [19].

3.4 Results
The four classifiers' performances in Experiment I with different training set sizes are shown in fig.2. The results of Experiment II are shown in fig.3 and fig.4. Noting that KIII-SVM outperforms SVM with 30-50 training samples, the size of KIII's training set for KIII-SVM-modified and KIII-SVM-ensemble is chosen to be 36 or 48. In fig.4, approaches with KIII trained on 36 samples are noted as KIII-SVM-modified(1) and KIII-SVM-ensemble(1), while approaches with KIII trained on 48 samples are noted as KIII-SVM-modified(2) and KIII-SVM-ensemble(2). From the results, the approaches adopting SVM significantly outperform those adopting the minimum Euclidean distance classifier, and KIII-SVM-ensemble achieves the highest accuracy.

Experiment III's results are given in Table 1. When 80% or more reconstituted milk is adulterated into fresh milk, the classifiers achieve a fairly satisfying classification rate. The classifiers trained on all samples under this condition are tested in Experiment IV. The different approaches' performance with different levels of water added to fresh milk is shown in fig.5. The normalized Euclidean distances between the cluster center of reconstituted milk and the cluster centers of milk-water mixtures with 100%, 90%, 80%, 70% and 60% milk are 0.09, 0.18, 0.44 and 1.00 respectively. The Euclidean distance grows as the proportion of water grows, but SVM's classification rate does not fall gradually.

3.5 Accuracy Analysis


In Experiment I, where the structure of the data is clear and each subclass can be approximately assumed to be normally distributed, the minimum Euclidean distance classifier achieves accuracy similar to SVM's. But in Experiment II, where each class has a more complex, unknown structure that departs further from a normal distribution, the minimum Euclidean distance classifiers perform poorly. This result concurs with the discussion in Section 2.3: the minimum Euclidean distance classifier is designed with a Gaussian prior, and its performance cannot be guaranteed when the data are complex. KIII's preprocessing improves its accuracy only when the data are complex.

SVM performs well for data with both simple and complex structure. KIII cannot enhance SVM's accuracy directly. However, the involvement of KIII makes it easy to construct ensemble classifiers: the size of KIII's training set does not affect the classification result significantly, and differences in KIII's training set result in differently transformed data. Keeping the training set for SVM large enough enables SVM to make predictions that are as accurate as possible for data transformed in different ways. KIII-SVM-ensemble is shown to be the most accurate approach for the complex data in Experiment II.

3.6 Robustness Analysis


Robustness is of great importance for pattern recognition. For reconstituted milk detection, the concentration of milk certainly cannot be deemed constant. However, in Experiment IV, when the impact of concentration is considered, SVM's performance becomes extremely unstable. This experiment also shows that KIII's preprocessing makes SVM generate more robust and meaningful classification results. This malfunction of SVM does not conflict with the generalizing ability it shows in most cases, since generalizing ability is usually measured when the training set comprehensively represents the statistical attributes of the testing set. In Experiment IV, however, the training set only contains information about fresh and reconstituted milk at a specific concentration, so the hyperplane learned by SVM is optimal only for classifying samples of fresh and reconstituted milk at that concentration. As the training set does not predict the change in the data when unexpected noise is added (in this experiment, the noise is the impact of the additional water), the hyperplane is not necessarily still the optimal one for classifying the noisy data. That is to say, finding the optimal hyperplane to separate the classes in the training set is one problem, but finding the optimal hyperplane reflecting the intrinsic differences between the classes is another when the training set does not carry comprehensive information.

4 Discussion
Pattern recognition involves a process that transforms a realistic scene into a feature vector that is easy to deal with. It is a process of reducing dimensionality, strengthening differences between classes, and making patterns more separable. This process may be accomplished explicitly in the signal processing phase or implicitly by the classifier, with different orientations. For example, discriminant function analysis (DFA) and the kernel method used in SVM are oriented toward mapping the data into a space where the patterns become more separable. Isometric feature mapping (ISOMAP), locally linear embedding (LLE) and other manifold learning methods aim at reducing dimensionality by learning the underlying global geometry of a data set. Principal component analysis (PCA) and independent component analysis (ICA) are designed to exploit the statistical structure of the data. Information processing in biological systems, however, is highly complicated; its mechanism is still unknown, but it is definitely not optimized according to one specific mathematical objective. As mentioned in [20], we need to characterize how representations are transformed such that they are useful. The KIII network can simulate the EEG waveforms observed in electrophysiological experiments and perhaps can partly simulate the signal processing mechanism of the mammalian olfactory system. It may transform data so as to strengthen the useful information in them rather than simply make them more separable. The information processing mechanism of KIII is still unknown too, and is worth studying.

5 Conclusion
This article demonstrates that KIII-SVM-ensemble is the best of the compared approaches for classifying reconstituted milk from fresh milk, considering the robustness and accuracy shown in Experiments II and IV. Preprocessing feature vectors with the KIII model is a sound choice when the data are complex and may suffer from unexpected noise.

References
(To be added)

fresh : reconstituted    SVM    KIII-SVM    KIII-SVM-ensemble
1:0                      91%    86%         81%
0.8:0.2                  87%    86%         78%
0.6:0.4                  71%    67%         65%
0.2:0.8                  64%    55%         61%

Table 1. Correction rates of different classifiers when classifying fresh milk and adulterated milk

Fig.1. Topological diagram for KIII network

[Figure plots omitted; all show correction rate (%) on the vertical axis.]

Fig.2. Performance measures for different classifying approaches with 'non-blind' training set selection (equal training samples from each brand of dairy), versus size of training set; compared: KIII-SVM, SVM, minimum distance, KIII-minimum distance

Fig.3. Performance measures for different classifying approaches with 'blind' (randomly chosen) training set selection, versus size of training set; compared: KIII-SVM, SVM, minimum distance, KIII-minimum distance

Fig.4. Performance measures for different approaches cascading KIII and SVM with 'blind' (randomly chosen) training set selection, versus size of training set; compared: KIII-SVM-ensemble(1), KIII-SVM-ensemble(2), SVM, KIII-SVM-modified(1), KIII-SVM-modified(2), KIII-SVM

Fig.5. Performance measures for different classifying approaches versus milk's proportion in the milk-water mixture; compared: SVM, KIII-SVM-ensemble, KIII-SVM
The training phase of KIII-SVM-ensemble is illustrated in fig.1, the testing phase of KIII-SVM-ensemble in fig.2, and the training and testing phases of KIII-SVM in fig.3.

Fig.1. The training phase for KIII-SVM-ensemble

Fig.2. The testing phase for KIII-SVM-ensemble

Fig.3. The training and testing phases for KIII-SVM
