Measurement Science and Technology

ACCEPTED MANUSCRIPT

Intelligent fault diagnosis of rolling bearing using improved deep recurrent neural network
To cite this article before publication: Hongkai Jiang et al 2018 Meas. Sci. Technol. in press https://doi.org/10.1088/1361-6501/aab945

Manuscript version: Accepted Manuscript


Accepted Manuscript is “the version of the article accepted for publication including all changes made as a result of the peer review process,
and which may also include the addition to the article by IOP Publishing of a header, an article ID, a cover sheet and/or an ‘Accepted
Manuscript’ watermark, but excluding any other editing, typesetting or other changes made by IOP Publishing and/or its licensors”

This Accepted Manuscript is © 2018 IOP Publishing Ltd.

During the embargo period (the 12 month period from the publication of the Version of Record of this article), the Accepted Manuscript is fully
protected by copyright and cannot be reused or reposted elsewhere.
As the Version of Record of this article is going to be / has been published on a subscription basis, this Accepted Manuscript is available for reuse
under a CC BY-NC-ND 3.0 licence after the 12 month embargo period.

After the embargo period, everyone is permitted to use, copy and redistribute this article for non-commercial purposes only, provided that they
adhere to all the terms of the licence https://creativecommons.org/licences/by-nc-nd/3.0

Although reasonable endeavours have been taken to obtain all necessary permissions from third parties to include their copyrighted content
within this article, their full citation and copyright line may not be present in this Accepted Manuscript version. Before using any content from this
article, please refer to the Version of Record on IOPscience once published for full citation and copyright details, as permissions will likely be
required. All third party content is fully copyright protected, unless specifically stated otherwise in the figure caption in the Version of Record.

View the article online for updates and enhancements.



Page 1 of 23 AUTHOR SUBMITTED MANUSCRIPT - MST-106583.R1

Intelligent fault diagnosis of rolling bearing using improved deep recurrent neural network

Hongkai Jiang, Xingqiu Li, Haidong Shao, Ke Zhao
School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
Abstract: Traditional intelligent fault diagnosis methods for rolling bearing depend heavily on manual feature extraction and feature selection. To address this problem, an intelligent deep learning method named the improved deep recurrent neural network (DRNN) is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, the DRNN is constructed by stacking recurrent hidden layers to automatically extract features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.

Keywords: Intelligent fault diagnosis; Rolling bearing; Deep learning; Deep recurrent neural network; Adaptive learning rate
1 Introduction

Rolling bearing is an important mechanical component, especially in rotating machinery. Due to its severe working environment, rolling bearing is prone to failure, which can result in economic losses and even safety accidents [1]. Therefore, rolling bearing fault diagnosis has attracted much attention in recent decades [2-4].

With the rapid development of computer science, artificial intelligence (AI) has sprung up and been widely applied in all walks of life, in which machine learning is one of the most popular research areas [5]. Traditional intelligent fault diagnosis methods based on machine learning have been widely researched and applied for fault diagnosis of rotating machinery [6], especially the artificial neural network (ANN) and support vector machine (SVM). Traditional intelligent methods for fault diagnosis using ANN and SVM generally contain three steps, i.e., manual feature extraction based on signal processing, manual feature selection, and fault classification [7]. Muruganatham et al. designed feature sets for fault detection by extracting the singular values and the energy of principal components, and then used ANN as the fault classifier [8]. Yang et al. processed the vibration signals with ensemble empirical mode decomposition (EEMD), and then extracted energy features and fed them into ANN for fault diagnosis [9]. Lei et al. calculated a variety of time-domain and frequency-domain features for monitoring locomotive rolling bearing conditions, and then applied ANN for fault classification [10]. Bin et al. proposed energy momentum as features, and employed the back propagation neural network (BPNN) for the early fault diagnosis of rotating machinery [11]. Pacheco et al. studied the performance of several typical ANNs for gearbox fault diagnosis with different feature extraction methods [12]. Li et al. applied modified multi-scale symbolic dynamic entropy for feature extraction, selected the most important features by minimum redundancy maximum relevance, and then used the least squares support vector machine (LSSVM) for fault classification [13]. Lu et al. used EEMD to decompose the vibration signals, and then extracted features in the time domain, frequency domain and time-frequency domain and inputted them into SVM for fault diagnosis [14]. Zhang et al. preprocessed the raw vibration signals using EEMD, then calculated permutation entropy and fed it into SVM for bearing fault detection [15]. Keskes et al. used the stationary wavelet packet transform for feature extraction, and applied SVM for rotor fault diagnosis [16]. According to the above review, ANN- or SVM-based intelligent fault diagnosis methods still have the following shortcomings: (1) In engineering practice, the fault characteristics are often weak, diverse and complicated because the measured signals of rolling bearing are usually non-stationary and mixed with noise [17, 18]. Thus, advanced signal processing techniques are required to extract valuable features. (2) On the one hand, manually extracting discriminative and sensitive features for a specific fault diagnosis problem depends greatly on the researchers' prior knowledge and costs a lot of time. On the other hand, the extracted sensitive features usually show poor generalization ability, that is, the selected features perform well only in a certain case [19]. (3) ANNs and SVM belong to the shallow machine learning models. Shallow architectures share a common problem, i.e., their nonlinear approximation ability is limited, which leads to poor performance when dealing with complicated classification problems [6]. Thus, there is an urgent need to study a new method that gets rid of the dependence on manual feature extraction and feature selection.
Deep learning was first proposed by Hinton in 2006 [20]. Unlike ANNs and SVM, deep learning models can automatically learn valuable features from the input data due to their deep architecture [21]. At present, there mainly exist four kinds of deep learning models, i.e., the deep belief network (DBN), stacked auto-encoder (SAE), convolutional neural network (CNN) and recurrent neural network (RNN), which have been applied successfully to various tasks [22]. Nevertheless, deep learning models were introduced to machinery fault diagnosis only about three years ago. Since then, our group has been dedicated to intelligent fault diagnosis based on deep learning methods. Shao et al. proposed various novel deep learning methods for rotating machinery fault diagnosis in recent years [23-28], including the optimization DBN in 2015 [23], the deep auto-encoder constructed with the denoising auto-encoder and contractive auto-encoder in 2016 [24], and the deep wavelet auto-encoder [27] and the convolutional DBN in 2017 [28]; Wang et al. developed an adaptive CNN for rolling bearing fault diagnosis in 2017 [29]. Beyond our group, Feng et al. developed a normalized sparse auto-encoder for mechanical fault diagnosis [30]. Yin et al. designed a DBN to monitor high-speed train conditions [31]. Tamilselvan et al. employed DBN for fault diagnosis of aircraft engines [32]. To the best of our knowledge, the deep recurrent neural network has not been employed for rolling bearing fault diagnosis.

Largely different from DBN, SAE and CNN, the recurrent neural network (RNN) is a deep learning model with recurrent connections among its hidden units [33]. RNN is designed to model sequential data and is able to capture long-term dependencies of sequential signals across different time steps [34]. Considering that the measured vibration signals of rolling bearing are in nature time sequences, it is suitable to apply RNN for analyzing the measured signals. However, there are two main challenges when directly applying the conventional RNN to rolling bearing fault diagnosis. (1) The raw vibration signals usually contain heavy background noise, so an RNN model built directly on top of the measured signals might not be robust [35]. Besides, plenty of data points of each vibration signal are required in order to ensure high diagnostic accuracy, which is likely to result in much more computing time. (2) As is known, the learning rate is usually fixed in the training process of the conventional RNN [36, 37]. An unsuitably set learning rate may lead to error oscillation and slow convergence, while tuning the learning rate is a difficult and time-consuming task. Thus, it is necessary to make the learning rate adapt to the training process.
In this paper, an improved deep recurrent neural network is proposed for fault diagnosis of rolling bearing. The frequency-domain sequential signals are inputted into the proposed method, and the results show that the proposed method has higher diagnosis accuracy and better robustness compared with traditional intelligent fault diagnosis methods. The contributions of the proposed method can be summarized as follows:

➢ In order to simplify the architecture of the network and ensure good robustness, the frequency spectrum sequences of the vibration signals are used for designing the input data.

➢ In order to get rid of the dependence on manual feature extraction and feature selection, a deep recurrent neural network is constructed to automatically and effectively learn useful features from the input data.

➢ In order to improve the training process of the constructed deep recurrent neural network and reduce the time of tuning the parameters, an adaptive learning rate is adopted.

The rest of this paper is organized as follows. In Section 2, the basic theory of RNN is introduced briefly. Section 3 is dedicated to a detailed description of the proposed method. In Section 4, four experiments are carried out to verify the effectiveness of the proposed method. In Section 5, the summary of this paper is drawn.
2 Basic theory of recurrent neural network

As shown in Fig. 1, RNN builds cycle connections among its hidden units, which is different from feed-forward neural networks (FNNs) such as ANN, DBN and CNN [37]. That is, the output of the hidden layer of RNN can be directly inputted into itself at the next moment. Therefore, the output of the hidden layer of RNN at time t is determined not only by the input at the current time but also by its own state at time t-1. The above process can be expressed by Eq. (1) and Eq. (2):

h_t = f_h(W_ih·x_t + W_hh·h_{t-1} + b_h)   (1)

y_t = f_o(W_ho·h_t + b_o)   (2)

where f_h and f_o are the activation functions of the hidden layer and the output layer respectively, W_ih is the weight matrix connecting the input layer with the hidden layer, W_hh is the weight matrix of the hidden layer's loop connection to itself, W_ho is the connection weight matrix between the hidden layer and the output layer, and b_h and b_o are the bias vectors of the hidden layer and the output layer respectively.
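As an illustration, Eq. (1) and Eq. (2) amount to the following forward step. The tanh and softmax activations and all dimensions here are assumptions for the sketch; the paper leaves f_h and f_o generic.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_ih, W_hh, W_ho, b_h, b_o):
    """One RNN forward step following Eq. (1)-(2); tanh and softmax
    are assumed activation choices, not prescribed by the text."""
    h_t = np.tanh(W_ih @ x_t + W_hh @ h_prev + b_h)          # Eq. (1)
    z = W_ho @ h_t + b_o
    y_t = np.exp(z - z.max()) / np.exp(z - z.max()).sum()    # Eq. (2), softmax output
    return h_t, y_t

# toy dimensions: 4 inputs, 3 hidden units, 2 outputs
rng = np.random.default_rng(0)
W_ih = rng.standard_normal((3, 4))
W_hh = rng.standard_normal((3, 3))
W_ho = rng.standard_normal((2, 3))
b_h, b_o = np.zeros(3), np.zeros(2)

h = np.zeros(3)                              # initial hidden state
for x in rng.standard_normal((5, 4)):        # a length-5 input sequence
    h, y = rnn_step(x, h, W_ih, W_hh, W_ho, b_h, b_o)
```

Note how the same hidden state h is carried across the loop, which is exactly the recurrent connection described above.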
Fig.1. (a) The structure of RNN, (b) the structure of RNN across a time step (input layer, hidden layer with recurrent connection, and output layer, unfolded from time t-1 to time t)
However, the conventional RNN shown in Fig. 1 has inherent flaws. According to Fig. 1(b), the architecture of RNN across time steps is equivalent to a feed-forward neural network with multiple hidden layers, and the number of time steps can be regarded as its total number of layers. When RNN is trained using back propagation through time (BPTT), the error back-propagates not only from the output layer to the hidden layer but also from time t back to time 1 simultaneously [38]. If t is too large, the learning process becomes especially challenging due to the gradient vanishing or exploding problem [36].
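The geometric effect behind this can be made concrete with a toy calculation: in BPTT the back-propagated error is repeatedly multiplied by the recurrent Jacobian, so its norm shrinks (or grows) roughly geometrically with the number of time steps. The contractive weight scale below is an assumption chosen to show the vanishing case.

```python
import numpy as np

rng = np.random.default_rng(1)
W_hh = 0.1 * rng.standard_normal((8, 8))   # illustrative recurrent weights (contractive)
delta = rng.standard_normal(8)             # error signal at the last time step

norms = []
for _ in range(50):                        # back-propagate through 50 time steps
    delta = W_hh.T @ delta                 # tanh' factor omitted; it would only shrink further
    norms.append(np.linalg.norm(delta))

print(f"norm after 1 step: {norms[0]:.3e}, after 50 steps: {norms[-1]:.3e}")
```

For this contractive choice of W_hh the gradient norm collapses toward zero; scaling the weights up instead makes it explode, which is the dilemma the LSTM architecture below is designed to relieve.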
Therefore, an improved RNN model named the long short-term memory recurrent neural network (LSTMRNN) was proposed to overcome the flaws of the conventional RNN [39]. From Fig. 2, we can see that LSTMRNN is acquired by replacing the hidden neurons of the conventional RNN with LSTM units. The most obvious characteristic of the LSTM unit is that it mainly consists of a memory cell and three gates, i.e., the input gate, the forget gate and the output gate. In addition, the dashed lines connecting the memory cell with the three gates are called peephole connections [40]. Such an architecture of the LSTM cell greatly relieves the problem of gradient vanishing or exploding. Therefore, this paper adopts the LSTMRNN model to get satisfactory results. The mathematical calculating procedure of the LSTM unit can be described as

g_t = g(W_ih·x_t + W_hh·h_{t-1} + b_h)   (3)

i_t = σ(W_iig·x_t + W_hig·h_{t-1} + p_ig ⊙ c_{t-1} + b_ig)   (4)

f_t = σ(W_ifg·x_t + W_hfg·h_{t-1} + p_fg ⊙ c_{t-1} + b_fg)   (5)

o_t = σ(W_iog·x_t + W_hog·h_{t-1} + p_og ⊙ c_t + b_og)   (6)

c_t = i_t ⊙ g_t + f_t ⊙ c_{t-1}   (7)

h_t = o_t ⊙ h(c_t)   (8)

where σ, g and h are the gate activation function, the input activation function and the output activation function of the LSTM units, respectively; W_ih, W_iig, W_ifg, W_iog are the weight matrices between the input layer and the LSTM layer at time t; W_hh, W_hig, W_hfg, W_hog are the self-connection weight matrices of the LSTM units between time t and t-1; b_h, b_ig, b_fg and b_og are the bias vectors of the input nodes, the input gate, the forget gate and the output gate, respectively; p_ig, p_fg, p_og are the weight matrices between the peephole connections and the three gate units, respectively; and ⊙ denotes element-wise multiplication.
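A minimal sketch of one forward step of Eqs. (3)-(8) follows. The sigmoid gates and tanh for g(·) and h(·) are assumed (common but not spelled out here), the parameter names mirror the symbols above, and the toy dimensions are ours. Note that the output gate in Eq. (6) peeks at the current cell state c_t, so it is evaluated after Eq. (7).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of an LSTM unit with peephole connections,
    Eqs. (3)-(8); tanh is an assumed choice for g(.) and h(.)."""
    g_t = np.tanh(p["W_ih"] @ x_t + p["W_hh"] @ h_prev + p["b_h"])                           # Eq. (3)
    i_t = sigmoid(p["W_iig"] @ x_t + p["W_hig"] @ h_prev + p["p_ig"] * c_prev + p["b_ig"])   # Eq. (4)
    f_t = sigmoid(p["W_ifg"] @ x_t + p["W_hfg"] @ h_prev + p["p_fg"] * c_prev + p["b_fg"])   # Eq. (5)
    c_t = i_t * g_t + f_t * c_prev                                                           # Eq. (7)
    o_t = sigmoid(p["W_iog"] @ x_t + p["W_hog"] @ h_prev + p["p_og"] * c_t + p["b_og"])      # Eq. (6)
    h_t = o_t * np.tanh(c_t)                                                                 # Eq. (8)
    return h_t, c_t

# toy demo: 4 inputs, 3 LSTM units
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
p = {k: rng.standard_normal((n_h, n_in)) for k in ("W_ih", "W_iig", "W_ifg", "W_iog")}
p.update({k: rng.standard_normal((n_h, n_h)) for k in ("W_hh", "W_hig", "W_hfg", "W_hog")})
p.update({k: rng.standard_normal(n_h) for k in ("p_ig", "p_fg", "p_og", "b_h", "b_ig", "b_fg", "b_og")})
h_t, c_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), p)
```

The additive cell update in Eq. (7) is the key design choice: the forget gate scales c_{t-1} directly instead of squashing it through a weight matrix, which is what keeps gradients from vanishing over long sequences.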
3. The proposed method

This paper proposes a novel intelligent method based on an improved deep recurrent neural network for fault diagnosis of rolling bearing. The proposed method mainly consists of four parts: firstly, the bearing data set design based on frequency spectrum sequences; secondly, the deep recurrent neural network construction; thirdly, the improved deep recurrent neural network with an adaptive learning rate strategy; and fourthly, the main diagnosis process of the proposed method.
Fig.2. The structure of LSTM (legend: σ - gate activation function; g - input activation function; h - output activation function; × - multiplication cell; dashed lines - peephole connections between the memory cell and the three gates)
3.1 The bearing data set design

Rolling bearings usually work in a poor environment, so it is inevitable that the raw sensor data are mixed with noise. For this reason, the RNN model may not be robust if the raw vibration signals are directly inputted. In addition, the length of each signal inputted into the classifier can affect the accuracy when the raw vibration signals are directly applied for machinery fault diagnosis. If the length of the signals is small, which may mean the signals contain little valuable information, the fault diagnosis accuracy may be poor. Thus, in order to contain enough information and get good results, a large signal length is required. However, too large an input size will lead to a complex network architecture and plenty of computing time. Considering the factors mentioned above, the frequency-domain signals are adopted to design the input data set for rolling bearing fault diagnosis in this paper. The reasons for using the frequency-domain signals are summarized as follows. Firstly, frequency-domain signals can be obtained easily from the measured signals using the FFT without manual feature extraction and feature selection. Secondly, the frequency spectrum can provide more abundant information about the health conditions of rolling bearings [6]. Thirdly, it is enough to select the first half of the spectrum for designing the input data set because the frequency spectrum is symmetrical. That is, the length of each vibration signal selected to acquire the frequency spectrum sequence can be larger, which simultaneously guarantees that the selected signals contain enough valuable information and that the input size is not too large.
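The sample-design idea above can be sketched as follows. The 600-point segment length and 12 kHz sampling rate follow the data description in Section 4.1; the helper name and the toy tone are ours.

```python
import numpy as np

def make_spectrum_sample(segment):
    """Turn one raw vibration segment into a spectrum-sequence sample:
    magnitude FFT, keeping only the first half (the spectrum of a real
    signal is conjugate-symmetric, so the second half is redundant)."""
    spectrum = np.abs(np.fft.fft(segment))
    return spectrum[: len(segment) // 2]   # 600 raw points -> 300 spectral points

# toy 100 Hz tone sampled at 12 kHz, 600 points (bin width 12000/600 = 20 Hz)
segment = np.sin(2 * np.pi * 100 * np.arange(600) / 12_000)
sample = make_spectrum_sample(segment)
print(sample.shape)   # (300,)
```

The 100 Hz tone lands exactly on spectral bin 5 (100 / 20 Hz), which is where the sample peaks; halving the representation this way is what lets a 600-point measurement enter the network as a 300-point input.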
3.2 Construction of deep recurrent neural network

It is the deep architecture that gives deep learning based methods the ability to automatically learn valuable features from the input data. In this paper, we construct a deep recurrent neural network by stacking recurrent hidden layers to get rid of the dependence on manual feature extraction and feature selection. At the same time, LSTM units are employed to replace all the hidden units to solve the problem of gradient vanishing or exploding.

Besides, the mean square error (MSE) function is adopted to evaluate the performance of the network, which is calculated as Eq. (9):

E(θ, X, R) = (1/(2S)) · Σ_{i=1..S} (y_i − ŷ_i)²   (9)

where S is the total number of training samples, θ is the parameter set, X = {x_i | i = 1, 2, …, S} is the training sample set, x_i is the ith training sample, R = {ŷ_i | i = 1, 2, …, S} is the corresponding label set, ŷ_i is the label of the ith training sample, and y_i is the actual output of the network for the ith training sample. BPTT is used to train the DRNN, and stochastic gradient descent (SGD) is applied for updating the weight matrices and bias vectors. The update procedure can be described as Eq. (10) and Eq. (11):

v_m = γ·v_{m−1} + (1 − γ)·η_m·∂E(θ, x, ŷ)/∂θ_{m−1}   (10)

θ_m = θ_{m−1} − v_m   (11)

where η_m is the learning rate and γ is the momentum, which are introduced to ensure convergence while updating the parameters, ∂E(θ, x, ŷ)/∂θ is the calculated gradient, and v_m is the update value.
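The update in Eqs. (10)-(11) can be sketched as a plain momentum-SGD step. The (1 − γ) scaling follows the reconstruction above (the extraction dropped the operators, so treat the exact form as an assumption), and the toy quadratic objective is only for illustration.

```python
import numpy as np

def sgd_momentum_step(theta, v, grad, lr, gamma=0.9):
    """One parameter update following Eqs. (10)-(11):
    v_m = gamma*v_{m-1} + (1-gamma)*lr*grad;  theta_m = theta_{m-1} - v_m."""
    v = gamma * v + (1.0 - gamma) * lr * grad
    return theta - v, v

# minimise f(theta) = 0.5*theta^2, whose gradient is simply theta
theta, v = np.array([4.0]), np.zeros(1)
for _ in range(200):
    theta, v = sgd_momentum_step(theta, v, grad=theta, lr=0.5)
```

The momentum term smooths successive gradients, which damps the error oscillation that Section 3.3 attributes to a badly chosen fixed learning rate.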
3.3 Improved deep recurrent neural network with adaptive learning rate

This paper constructs a deep recurrent neural network for rolling bearing fault diagnosis. As an important parameter in the training process, the learning rate has a great influence on the performance of the DRNN. The learning process will be extremely slow if the learning rate is too small, while the learning process may not converge and may even fail if the learning rate is too large. In addition, once the learning rate is fixed, it will not be changed any more in the whole training process. In this case, there are two main shortcomings. On the one hand, plenty of trials have to be carried out to find a good value of the learning rate, which is a challenging and time-consuming task. On the other hand, the learning rate set for one case may not fit another, i.e., once the input data or the structure of the network is changed, the learning rate may have to be reset through lots of trials. Thus it is necessary to make the learning rate adaptive to improve the performance of the DRNN.
Therefore, this paper employs an adaptive learning rate for training the constructed DRNN to accelerate the learning process and reduce the time of tuning the parameters, and we call the DRNN with the adaptive learning rate the improved deep recurrent neural network. The adaptive strategy of the learning rate mainly contains the following three cases. Firstly, if the error decreases, the learning rate is increased cautiously; it should be emphasized that the more the error decreases the less the learning rate increases, and the less the error decreases the more the learning rate increases. Secondly, if the error increases, the learning rate is reduced rapidly, and the decrement of the learning rate is in proportion to the error change rate. Thirdly, whether the error increases or decreases, the learning rate is only decreased when the error is smaller than the threshold. The above process can be described by Eq. (12), Eq. (13) and Eq. (14):

Δη_m = −α · Error(m) / (Error(m) − Error(m−1)),    if Error(m) < Error(m−1), α ∈ (0, 1)
Δη_m = −β · (Error(m) − Error(m−1)) / Error(m−1),  if Error(m) > Error(m−1), β ∈ (1, 100)     (12)
Δη_m = (Error(m) − Error(m−1)) / Error(m−1),       if Error(m) < ε, ε ∈ (0, 0.4)

if Δη_m < −1, then Δη_m = −ω, ω ∈ (0.9, 1)   (13)

η_m = (1 + Δη_m) · η_{m−1}   (14)

where Error(m) is the training error at the mth epoch, α and β represent the increase factor and the decrease factor respectively, Δη_m is the change rate of the learning rate, ε is the threshold of the error, and ω is the limit on Δη_m that keeps the learning rate above 0.
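The three-case rule in Eqs. (12)-(14) can be sketched as a single update function. The piecewise signs and the precedence of the threshold case are one consistent reading of the rule (the extraction dropped the operators), and the default constants follow the parameter values reported in Section 4.2; treat all of these as assumptions.

```python
def adapt_learning_rate(lr_prev, err, err_prev, alpha=1e-5, beta=10.0,
                        eps=0.125, omega=0.95):
    """One reading of the adaptive learning-rate rule, Eqs. (12)-(14).
    err and err_prev are the training errors at epochs m and m-1."""
    if err < eps:                           # error below threshold: only decrease
        d = (err - err_prev) / err_prev
    elif err < err_prev:                    # error decreased: cautious increase
        d = -alpha * err / (err - err_prev)
    else:                                   # error increased: rapid decrease
        d = -beta * (err - err_prev) / err_prev
    if d < -1.0:                            # Eq. (13): keep the rate positive
        d = -omega
    return (1.0 + d) * lr_prev              # Eq. (14)

# error jumps from 0.4 to 0.5 -> the raw decrement would exceed 100%,
# so Eq. (13) clips it and the rate shrinks to 5% of its old value
lr = adapt_learning_rate(0.9, err=0.5, err_prev=0.4)
```

With α tiny and β large, the rate creeps upward while training improves and collapses quickly after a bad epoch, which matches the "increase cautiously, decrease rapidly" description above.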
3.4 The general fault diagnosis procedure of the proposed method

The input data designed with the frequency spectrum can reduce the dimension of the input samples. The DRNN can automatically learn valuable features from the input signals. The adaptive learning rate can improve the training performance of the DRNN. Thus, this paper proposes an improved DRNN for rolling bearing fault diagnosis. Fig. 3 exhibits the flowchart of the proposed method, and the general diagnosis procedure is summarized as follows:

• Step 1: Collect the raw vibration signals of rolling bearings with the accelerometer sensor.
• Step 2: Obtain the frequency spectrum sequences under each health condition.
• Step 3: Organize these spectra into training samples and testing samples without manual feature extraction and feature selection.
• Step 4: Construct a deep recurrent neural network by stacking recurrent hidden layers to automatically capture features directly from the input data.
• Step 5: Apply the adaptive learning rate for training the constructed DRNN.
• Step 6: Employ the improved DRNN for rolling bearing fault diagnosis with testing samples.
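Steps 1-3 can be sketched as a small data-organization helper. The segment length, the 120/80 split and the spectrum halving follow Section 4.1 and Table 1; the function name and the toy signals are ours.

```python
import numpy as np

def build_dataset(signals_by_condition, seg_len=600, n_train=120, n_test=80):
    """Slice each condition's raw signal into segments, keep the first
    half of each magnitude spectrum (Step 2), and split the resulting
    samples into training and testing sets (Step 3)."""
    X_train, X_test, y_train, y_test = [], [], [], []
    for label, signal in signals_by_condition.items():
        segs = [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_train + n_test)]
        spectra = [np.abs(np.fft.fft(s))[: seg_len // 2] for s in segs]
        X_train += spectra[:n_train]; y_train += [label] * n_train
        X_test += spectra[n_train:];  y_test += [label] * n_test
    return map(np.asarray, (X_train, y_train, X_test, y_test))

# toy stand-in for the 12 measured conditions: 200 segments of 600 points each
rng = np.random.default_rng(0)
raw = {label: rng.standard_normal(600 * 200) for label in range(12)}
X_tr, y_tr, X_te, y_te = build_dataset(raw)
```

For 12 conditions this yields 1440 training and 960 testing samples of dimension 300, matching the data-set sizes described in Section 4.1.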
Fig.3. Flowchart of the proposed method (data acquisition and sample design using spectrum sequences; improved deep recurrent neural network construction, combining the deep architecture with the adaptive learning rate; diagnosis results)
4. Rolling bearing fault diagnosis based on the proposed method

4.1 Description of experimental bearing data

In this paper, experimental bearing data from Case Western Reserve University (CWRU) were used to evaluate the capability of the proposed method [41]. The test rig of rolling bearing is exhibited in Fig. 4; it mainly consists of a 2 hp motor (left), a torque transducer/encoder (center) and a dynamometer (right). The test drive-end bearing supports the motor shaft. The raw vibration signals of different health conditions were measured by accelerometers under 1750 rpm and 1797 rpm, respectively, and the sampling rate is 12 kHz.

Fig.4. The test rig of rolling bearing
Table 1
Description of bearing operation conditions under 1750 rpm or 1797 rpm

Health condition  | Fault diameter (in.) | Outer race fault orientation | Training / testing samples | Label
Normal            | 0                    | ---                          | 120 / 80                   | 1
Ball faults       | 0.007                | ---                          | 120 / 80                   | 2
                  | 0.014                | ---                          | 120 / 80                   | 3
                  | 0.021                | ---                          | 120 / 80                   | 4
                  | 0.028                | ---                          | 120 / 80                   | 5
Inner race faults | 0.007                | ---                          | 120 / 80                   | 6
                  | 0.014                | ---                          | 120 / 80                   | 7
                  | 0.021                | ---                          | 120 / 80                   | 8
                  | 0.028                | ---                          | 120 / 80                   | 9
Outer race faults | 0.007                | Center@6:00                  | 120 / 80                   | 10
                  | 0.014                | Center@6:00                  | 120 / 80                   | 11
                  | 0.021                | Center@6:00                  | 120 / 80                   | 12
Table 2
Description of the bearing data sets

Data set | Motor speed (rpm) | Sample description                                                                      | Dimension
C1 / C4  | 1750 / 1797       | Frequency spectrum sequences (without manual feature extraction and feature selection)  | 300
C2 / C5  | 1750 / 1797       | 14 features extracted from eight frequency-band signals                                 | 112 (14 * 8)
C3 / C6  | 1750 / 1797       | 5 most sensitive features selected from eight frequency-band signals                    | 40 (5 * 8)
In this case study, three types of single-point faults, i.e., the ball (B) faults, the inner race (IR) faults and the outer race (OR) faults, are created with different fault diameters as described in Table 1, in which the outer raceway faults are located at 6 o'clock (orthogonal to the load zone) of the test bearing. The fault diameters of the ball and the inner race are 0.007 in., 0.014 in., 0.021 in., and 0.028 in. (1 in. = 25.4 mm), respectively. The fault diameters of the outer race are 0.007 in., 0.014 in., and 0.021 in., respectively. Therefore, 12 operation conditions are studied in total, which comprise 1 normal condition and 11 fault conditions. Each condition contains 200 raw samples, and each sample contains 600 sampling data points of the collected vibration signal. The first 120 raw samples are used for training and the rest are used for testing.

As shown in Table 2, six data sets are designed for the experiments, in which data sets C1, C2, C3 are obtained under 1750 rpm and data sets C4, C5, C6 are obtained under 1797 rpm. Specifically, C1 and C4 are designed for the proposed method without manual feature extraction and feature selection. The construction procedure of C1 and C4 is the same and can be described as follows: Firstly, the raw samples of the 12 conditions are used to acquire the corresponding frequency spectrum sequences by FFT. Then, the first half of each spectrum sequence is selected to construct a new sample of C1 or C4. Since each raw sample comprises 600 data points, each sample of C1 or C4 comprises 300 data points, which greatly reduces the dimension of the samples and the computing time.
Data sets C2 and C5 are constructed by manual feature extraction, and the construction procedure of C2 and C5 is also the same. Firstly, the raw vibration signals of each condition are processed with the three-layer wavelet packet transform (db5). Then 14 feature parameters are extracted manually from the 8 decomposed frequency-band signals to construct the original feature set. These feature parameters are the variance, standard deviation, root mean square, mean, maximum, peak-to-peak value, median, mean absolute value, square root amplitude value, wavelet packet energy, kurtosis, skewness, shape factor and crest factor, respectively. Therefore, each sample of C2 or C5 contains 112 (8*14) parameters. Data sets C3 and C6 are acquired by manually selecting the five most sensitive features (standard deviation, root mean square, mean absolute value, square root amplitude value and wavelet packet energy) from the original feature sets C2 and C5, respectively, so each sample of C3 or C6 contains 40 (8*5) parameters.

Based on the sequential spectral data set C1 without manual feature extraction and feature selection, the original feature set C2 with manual extraction and the feature set C3 with further manual selection, this paper designs the following two experiments.
➢ Experiment 1: The sequential spectral set C1 is fed into the proposed method for the fault diagnosis of rolling bearings. For comparison, the data set C1, the original feature set C2 and the feature set C3 acquired by further feature selection are inputted into BPNN, SVM and LSSVM.

➢ Experiment 2: As in Experiment 1, C1 is still inputted into the proposed method. For comparison, C1 is also fed into the standard DRNN, which is trained with a fixed learning rate, as well as the standard DBN, CNN and SAE. (Considering that the input of CNN is 2-D, the input size of CNN is selected as 400 (20*20), i.e., each input sample contains 400 data points, and the numbers of training samples and testing samples of each condition are 90 and 60, respectively.)

It is necessary to point out that both Experiment 1 and Experiment 2 have their own focuses. Experiment 1 is mainly designed to verify the effectiveness of the proposed method and its superiority compared with BPNN, SVM and LSSVM when applied for rolling bearing fault diagnosis. Experiment 2 is mainly designed to verify the adopted adaptive learning rate strategy and analyze the performance of different deep learning methods.

In order to further verify the capability of the proposed method, two comparative experiments are carried out using the sequential spectral data set C4 without manual feature extraction and feature selection, the original feature set C5 with manual extraction and the feature set C6 with further manual selection, which can be described as follows:
➢ Experiment 3: The data set C4 is fed into the proposed method. The data set C4, the original feature set C5 and the feature set C6 acquired by further feature selection are inputted into BPNN, SVM and LSSVM for comparison.

➢ Experiment 4: As in Experiment 3, C4 is also fed into the proposed method for the fault diagnosis of rolling bearing. As a comparison, C4 is also inputted into the standard DRNN, which is trained with a fixed learning rate, as well as the standard DBN, CNN and SAE. (The input size of CNN and the numbers of training samples and testing samples of each condition are the same as in Experiment 2.)
25
26
27
28
29
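The sequential spectral data sets (C1 and C4) are built directly from frequency-domain signals rather than hand-crafted features. The exact preprocessing is defined earlier in the paper; purely as a hypothetical sketch of how one 300-point spectral input sample could be derived from a raw vibration segment (the segment length and the max-normalization are illustrative assumptions, not the authors' exact settings):

```python
import numpy as np

def make_spectral_sample(segment, n_freq=300):
    """Turn one raw vibration segment into a normalized spectral input vector.

    Hypothetical preprocessing sketch: one-sided FFT magnitude of the segment,
    truncated to the first n_freq bins and scaled into [0, 1].
    """
    spectrum = np.abs(np.fft.rfft(segment))[:n_freq]   # one-sided amplitude spectrum
    return spectrum / (spectrum.max() + 1e-12)         # normalize for network input

# Example: a synthetic segment long enough to yield at least 300 frequency bins.
rng = np.random.default_rng(0)
segment = np.sin(2 * np.pi * 0.05 * np.arange(1024)) + 0.1 * rng.standard_normal(1024)
sample = make_spectral_sample(segment)
print(sample.shape)  # (300,)
```

Any segment of 598 points or more yields the 300 bins assumed here; shorter segments would need zero-padding.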
4.2 Experimental results and analysis
In this section, 12 health conditions have to be classified, including not only fault categories but also fault severities. Experiment 1 and Experiment 2 are each repeated 20 times. The main parameters of the proposed method are discussed and given as follows. The number of input units corresponds to the dimension of each input sample, and the number of output units is equal to the number of health conditions. However, there is no mature theoretical support for selecting the number of hidden layers and the number of hidden units in deep learning models [24]. Thus, these parameters are determined by experiments in this paper. After a number of trials, the proposed method is constructed with a structure of 300-40-40-12. Besides, the other parameters, i.e. the initial learning rate, α, β, ε, ω, momentum, and iteration number, are 0.9, 0.00001, 10, 0.125, 0.95, 0.9, and 1100, respectively.
The related parameters of the other methods in Experiment 1 and Experiment 2 are given as follows. Experiment 1: (1) The main parameters of the proposed method are given above. (2) BPNN + C1: The structure is selected as 300-450-12, and the corresponding learning rate, momentum, and iteration number are 0.05, 0.5, and 1100, respectively. (3) BPNN + C2: The structure is selected as 112-240-12, and the learning rate, momentum, and iteration number are 0.9, 0.9, and 1100, respectively. (4) BPNN + C3: The structure is selected as 40-240-12, and the learning rate, momentum, and iteration number are 0.9, 0.9, and 1100, respectively. (5) SVM + C1: The RBF kernel is applied; the penalty factor is 32, and the kernel radius is 5.7. (6) SVM + C2: The RBF kernel is used; the penalty factor is 42, and the kernel radius is 0.0039. (7) SVM + C3: The RBF kernel is employed; the penalty factor is 32, and the kernel radius is 0.0078. (8) LSSVM + C1: The RBF kernel is applied; the penalty factor is 26.55, and the kernel radius is 0.35. (9) LSSVM + C2: The RBF kernel is used; the penalty factor and kernel radius are 48.42 and 124.24, respectively. (10) LSSVM + C3: The RBF kernel is applied; the penalty factor and kernel radius are 38.55 and 78.24, respectively. The two main parameters of SVM and LSSVM are acquired by 10-fold cross validation.
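The 10-fold cross validation used to pick the penalty factor and kernel radius can be sketched as a selection loop over a candidate grid. The grids and the `fit_score` callback below are placeholders; in practice `fit_score` would train an RBF-kernel SVM or LSSVM with the given settings and return its validation accuracy.

```python
import numpy as np

def ten_fold_cv_select(X, y, penalties, radii, fit_score):
    """Pick the (penalty, radius) pair maximizing mean accuracy over 10 folds.

    fit_score(X_tr, y_tr, X_va, y_va, C, gamma) -> validation accuracy; it
    stands in for training an RBF-kernel classifier with those two settings.
    """
    folds = np.array_split(np.random.permutation(len(X)), 10)
    best, best_acc = None, -1.0
    for C in penalties:
        for g in radii:
            accs = []
            for k in range(10):
                va = folds[k]                                            # held-out fold
                tr = np.concatenate([folds[j] for j in range(10) if j != k])
                accs.append(fit_score(X[tr], y[tr], X[va], y[va], C, g))
            if np.mean(accs) > best_acc:
                best, best_acc = (C, g), float(np.mean(accs))
    return best, best_acc

# Toy check with a dummy scorer that prefers C=32, gamma=0.0078.
dummy = lambda Xtr, ytr, Xva, yva, C, g: 1.0 if (C, g) == (32, 0.0078) else 0.5
X, y = np.zeros((100, 4)), np.zeros(100)
best, acc = ten_fold_cv_select(X, y, [8, 32, 128], [0.0039, 0.0078, 5.7], dummy)
print(best)  # (32, 0.0078)
```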
Fig. 5. General classification accuracy comparison of different methods in Experiment 1 (average training and testing classification accuracy, %, of the proposed method and of BP, SVM, and LSSVM under C1, C2, and C3).
Table 3
The diagnosis results comparison of different methods in Experiment 1

Methods               Average training accuracy   Average testing accuracy   Standard deviation of testing accuracy
The proposed method   97.32 % (1401/1440)         94.75 % (910/960)          1.58 %
BPNN + C1             72.14 % (1039/1440)         71.13 % (682/960)          5.62 %
BPNN + C2             95.25 % (1372/1440)         90.63 % (870/960)          3.06 %
BPNN + C3             94.13 % (1355/1440)         95.00 % (912/960)          2.64 %
SVM + C1              100 % (1440/1440)           83.75 % (804/960)          2.16 %
SVM + C2              99.44 % (1432/1440)         87.50 % (840/960)          2.39 %
SVM + C3              99.24 % (1429/1440)         94.17 % (904/960)          1.76 %
LSSVM + C1            100 % (1440/1440)           84.27 % (809/960)          2.31 %
LSSVM + C2            99.17 % (1428/1440)         88.33 % (848/960)          2.51 %
LSSVM + C3            94.13 % (1421/1440)         93.85 % (901/960)          1.87 %
Experiment 2: (1) The main parameters of the proposed method are the same as in Experiment 1. (2) Standard DRNN: The structure is selected as 300-40-40-12; the learning rate, momentum, and iteration number are 0.4, 0.5, and 1100, respectively. The parameters are selected by experience and experiments, similar to reference [24]. (3) Standard DBN: The structure is selected as 300-150-100-12; the learning rate, momentum, and iteration number of each restricted Boltzmann machine are 0.15, 0.75, and 300, respectively. (4) Standard CNN: The size of the input feature map is 400 (20*20); the first convolutional layer contains 6 kernels, the second convolutional layer contains 12 kernels, and the scales of the first and second pooling layers are both set to 2. The learning rate is 0.2 and the iteration number is 600. (5) Standard SAE: The architecture is 300-150-75-12; the learning rate, momentum, and iteration number of each auto-encoder are 0.45, 0.9, and 80, respectively.
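The CNN baseline reshapes each 400-point sample into a 20*20 feature map and alternates convolution and pooling. The convolution kernel size is not stated here; assuming the common 5*5 'valid' kernels of LeNet-style networks, the feature-map sizes can be traced as below (a shape-only sketch, not the trained network):

```python
def cnn_feature_map_sizes(side=20, kernel=5, pool=2, channels=(6, 12)):
    """Trace feature-map side lengths through conv('valid') + pooling stages."""
    sizes = [(1, side)]                # (number of maps, side length of each map)
    for ch in channels:
        side = side - kernel + 1       # 'valid' convolution shrinks the map
        side = side // pool            # non-overlapping pooling divides it
        sizes.append((ch, side))
    return sizes

print(cnn_feature_map_sizes())
# [(1, 20), (6, 8), (12, 2)] -> 12 maps of 2*2 = 48 flattened features
```

Under this kernel-size assumption, the classifier stage would operate on 48 flattened features per sample.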
In Experiment 1, the proposed method is applied for classifying the 12 health conditions using the sequential data set C1 (without manual feature extraction and feature selection). Fig. 5 and Table 3 show that the average training accuracy and testing accuracy of the proposed method are 97.32% and 94.75% respectively, while the average training accuracy and testing accuracy of BPNN under C1 are only 72.14% and 71.13%, respectively. Under data set C2 (with manual feature extraction), the average training accuracy and testing accuracy of BPNN are 95.25% and 90.63%, respectively. The average training and testing accuracy of BPNN under C3 (with manual feature selection) are 94.13% and 95.00%, respectively. Although the training accuracies of SVM and LSSVM are very high, their testing accuracies, which are more significant, are lower than that of the proposed method. These results suggest that the proposed method has very high diagnosis accuracy for rolling bearing fault diagnosis, and that the performance of the traditional intelligent methods depends on manual feature extraction and feature selection to a large extent.
Fig. 6. Testing accuracies of 20 trials in Experiment 1: (a) the proposed method, (b) BPNN, (c) SVM, (d) LSSVM (classification accuracy, %, versus trial number).
Fig. 6 shows the testing accuracies of the different methods over the 20 trials, and the corresponding standard deviations of the diagnosis results are listed in Table 3. The standard deviation of the proposed method is 1.58%, which is the smallest among all the methods.
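Each accuracy entry in Table 3 is an average over the 20 repeated trials, and the reported spread is the standard deviation of the per-trial testing accuracies. Aggregating a set of hypothetical per-trial accuracies illustrates the computation (the values below are invented for the example, not the paper's measured trials):

```python
import numpy as np

# Hypothetical testing accuracies (%) from 20 repeated trials of one method.
trial_acc = np.array([94.2, 95.1, 93.8, 96.0, 94.7, 95.3, 92.9, 94.9, 95.6, 93.5,
                      94.4, 96.2, 94.1, 95.0, 93.7, 95.8, 94.6, 94.3, 95.2, 94.0])

mean_acc = trial_acc.mean()
std_acc = trial_acc.std(ddof=1)   # sample standard deviation across the trials
print(f"{mean_acc:.2f} % +/- {std_acc:.2f} %")
```

A small standard deviation, as reported for the proposed method, indicates that the diagnosis result is stable across repeated random initializations.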
Fig. 7. Training accuracy comparison of each condition in Experiment 1 (classification accuracy, %, versus condition label 1-12).
Fig. 8. Testing accuracy of each condition in Experiment 1 (classification accuracy, %, versus condition label 1-12).
Fig. 7 and Fig. 8 give more detailed information on Experiment 1. From Fig. 7 we can find that the training accuracies of most conditions based on the proposed method are above 95%, except that the training accuracy of the third-class condition is 90.88%. As exhibited in Fig. 8, the testing accuracies of each condition based on the proposed method are mostly about 94%, except that the accuracy of the third-class condition is 82.44%. However, the applied traditional methods do not perform well under some conditions. For instance, although the average training and testing accuracy of BPNN under data set C3 (with manual feature selection) is close to that of the proposed method, BPNN accurately classifies only 35% of the training samples of the third-class condition and less than 75% of the testing samples of the third-class and fourth-class conditions. Under data set C3, SVM and LSSVM recognize less than 70% of the testing samples of the ninth-class condition. These results suggest that the proposed method can more effectively and more stably recognize not only the fault categories but also the fault severities.
Table 4
The diagnosis results comparison of the five deep learning methods in Experiment 2

Methods               Average training accuracy   Average testing accuracy   Standard deviation of testing accuracy
The proposed method   97.32 % (1401/1440)         94.75 % (910/960)          1.58 %
Standard DRNN + C1    91.64 % (1320/1440)         89.50 % (859/960)          2.74 %
Standard DBN + C1     88.36 % (1272/1440)         88.02 % (845/960)          3.39 %
Standard CNN + C1     91.85 % (992/1080)          90.69 % (653/720)          2.15 %
Standard SAE + C1     90.07 % (1297/1440)         90.00 % (864/960)          2.73 %

Fig. 9. The training error (MSE) curves of the two types of DRNN learning algorithms for the 11th trial in Experiment 2 (training error versus epoch; the early-stopping point of the standard DRNN is marked).

Fig. 10. The adaptive updating curve of the learning rate of the proposed method in Experiment 2 (learning rate versus epoch).
In Experiment 2, data set C1 is fed into the proposed method and into typical deep learning methods, including the standard DRNN, DBN, CNN, and SAE, respectively. From Table 4, we can find that the average training accuracy and testing accuracy of the proposed method are 97.32% and 94.75%, which are higher than those of the other deep learning methods: 91.64% and 89.50% for the standard DRNN, 88.36% and 88.02% for the standard DBN, 91.85% and 90.69% for the standard CNN, and 90.07% and 90.00% for the standard SAE, respectively. Table 4 also shows that the standard deviation of the testing accuracy of the proposed method is 1.58%, which is the smallest among these deep learning methods. These results suggest that applying DRNN to rolling bearing fault diagnosis is effective. It is interesting and meaningful to study various new deep learning methods and to utilize them for fault diagnosis.
Table 5
Average classification accuracy comparison of different methods in Experiment 3

Methods               Average training accuracy   Average testing accuracy   Standard deviation of testing accuracy
The proposed method   98.67 % (1421/1440)         96.53 % (772/960)          1.71 %
BPNN + C4             74.62 % (1075/1440)         73.12 % (702/960)          4.88 %
BPNN + C5             88.69 % (1277/1440)         86.12 % (827/960)          3.67 %
BPNN + C6             90.25 % (1300/1440)         90.38 % (868/960)          4.09 %
SVM + C4              100 % (1440/1440)           78.65 % (755/960)          2.13 %
SVM + C5              99.51 % (1433/1440)         86.24 % (828/960)          2.39 %
SVM + C6              98.05 % (1412/1440)         91.87 % (882/960)          1.86 %
LSSVM + C4            100 % (1440/1440)           78.33 % (752/960)          2.22 %
LSSVM + C5            98.47 % (1418/1440)         87.50 % (840/960)          2.22 %
LSSVM + C6            98.47 % (1418/1440)         92.50 % (888/960)          1.99 %

Fig. 11. The diagnosis results comparison of different methods in Experiment 3 (average training and testing classification accuracy, %).
Fig. 9 gives the training error curves of the proposed method and the standard DRNN for the 11th trial, and Fig. 10 shows the adaptive updating curve of the learning rate of the proposed method. Obviously, the training error (MSE) curve of the proposed method converges much faster with less oscillation than that of the standard DRNN, and the final error is much smaller as well. After a lot of trials, we also find that when the training error of the standard DRNN reduces to about 0.12, it jumps to a high value, as shown in Fig. 9, which will result in the failure of training. Therefore, the training of the standard DRNN has to be stopped early when the error reduces to 0.12 before the 1100th iteration. The results firmly confirm that the adopted adaptive learning rate strategy is effective in improving both the learning process and the diagnosis results of DRNN.
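The exact adaptive update rule is defined earlier in the paper; schematically it belongs to the familiar family of error-driven schedules, in which the learning rate is enlarged while the training error keeps falling and cut back when the error rebounds, which is what suppresses the oscillation and divergence seen in the fixed-rate curve. A generic sketch of such a schedule follows, with the increase/decrease factors and the error sequence as illustrative assumptions rather than the paper's rule:

```python
def adaptive_lr_schedule(errors, lr0=0.9, up=1.05, down=0.7, floor=1e-4):
    """Error-driven learning-rate schedule (generic sketch, not the paper's exact rule).

    Grow the rate slightly while the training error decreases; shrink it
    sharply when the error increases, never dropping below a small floor.
    """
    lr, rates = lr0, []
    for prev, cur in zip(errors, errors[1:]):
        lr = lr * up if cur < prev else max(lr * down, floor)
        rates.append(lr)
    return rates

# Error first falls, then briefly rebounds (as in the fixed-rate curve of Fig. 9).
errs = [0.9, 0.6, 0.4, 0.3, 0.35, 0.2, 0.15]
rates = adaptive_lr_schedule(errs)
print(rates)  # the rate grows on the drops and shrinks once at the rebound
```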
Fig. 12. Testing accuracies of 20 trials in Experiment 3: (a) the proposed method, (b) BPNN, (c) SVM, (d) LSSVM (classification accuracy, %, versus trial number).
Experiments 3 and 4 are conducted to further verify the superiority of the proposed method; each is repeated 20 times. It is noted that all the parameters of the applied methods in Experiments 3 and 4 are exactly the same as those in Experiments 1 and 2. All the results are analyzed as follows.
In Experiment 3, the proposed method is used for recognizing the 12 health conditions under the sequential data set C4 (without manual feature extraction and feature selection). Fig. 11 and Table 5 show that the average training accuracy of the proposed method is 98.67%, which is higher than those of the BPNN-based methods (74.62%, 88.69%, and 90.25%, respectively). The average testing accuracy of the proposed method is 96.53%, the highest compared with all the traditional intelligent methods, whose testing accuracies are 73.12%, 86.12%, 90.38%, 78.65%, 86.24%, 91.87%, 78.33%, 87.50%, and 92.50%, respectively. The standard deviation of the proposed method is 1.71%, which is smaller than those of the other methods in Experiment 3. Fig. 12 gives the testing accuracies of the different methods in 20 trials. These results further confirm that the proposed method has higher diagnosis accuracy for rolling bearing fault diagnosis, and that the performance of the traditional methods largely depends on manual feature extraction and feature selection.
Fig. 13. Training accuracy comparison of each condition in Experiment 3 (classification accuracy, %, versus condition label 1-12).

Fig. 14. Testing accuracy comparison of each condition in Experiment 3 (classification accuracy, %, versus condition label 1-12).
From Fig. 13 and Fig. 14, we can find that the training samples and testing samples of each condition are classified accurately over 96% and over 91%, respectively, by the proposed method. However, only 61.69% of the testing samples of the third-class condition and 55.94% of the testing samples of the fourth-class condition under data set C5 (with manual feature extraction) are classified accurately by BPNN, and only 53.69% of the testing samples of the third-class condition under data set C6 (with manual feature selection) are classified accurately by BPNN. Only about 70% of the testing samples of the twelfth-class condition under data set C6 are classified accurately by SVM and LSSVM. These results further confirm that the proposed method can more effectively and more stably recognize not only the fault categories but also the fault severities.
Table 6
The diagnosis results comparison of the five deep learning methods in Experiment 4

Methods               Average training accuracy   Average testing accuracy   Standard deviation of testing accuracy
The proposed method   98.67 % (1421/1440)         96.53 % (772/960)          1.71 %
Standard DRNN + C4    90.13 % (1298/1440)         87.88 % (842/960)          3.43 %
Standard DBN + C4     87.79 % (1264/1440)         86.57 % (831/960)          3.39 %
Standard CNN + C4     90.28 % (975/1080)          89.58 % (645/720)          2.15 %
Standard SAE + C4     89.93 % (1295/1440)         89.36 % (858/960)          2.73 %

Fig. 15. The training error curves of the two types of DRNN learning algorithms for the 13th trial in Experiment 4 (training error versus epoch; the early-stop point of the standard DRNN is marked).

Fig. 16. The adaptive updating curve of the learning rate of the proposed method in Experiment 4 (learning rate versus epoch).
As described in Table 6, the average training and testing accuracy of the proposed method are 98.67% and 96.53%, which are both the highest compared with the other deep learning methods. Meanwhile, the standard deviation of the proposed method is the smallest among these deep learning methods. This further suggests that the proposed method is a novel and effective deep learning based method for fault diagnosis.

Fig. 15 shows the training error curves of the proposed method and the standard DRNN for the 13th trial in Experiment 4, and Fig. 16 gives the corresponding adaptive updating curve of the learning rate. Obviously, the training error curve of the proposed method converges much faster with less oscillation than that of the standard DRNN, and the final error is much smaller as well. The results further confirm that the adopted adaptive learning rate strategy is effective in improving both the learning process and the diagnosis results of DRNN.
5. Conclusion

This paper proposes an improved deep recurrent neural network for rolling bearing fault diagnosis, in which the frequency signals are directly used to design the input data without manual feature extraction and feature selection, and an adaptive learning rate strategy is employed to improve the training process. Four experiments are carried out to verify the effectiveness of the proposed method using the experimental bearing data from Case Western Reserve University. The experimental results confirm that the adopted adaptive learning rate is helpful to improve the learning process and the diagnosis accuracy, and that the proposed method is more effective and robust in recognizing both the fault categories and fault severities of rolling bearings compared with traditional intelligent methods.

It must be pointed out that, to some extent, the proposed method still depends on signal processing technology to obtain high and robust results, even though it requires no manual feature extraction and feature selection. Besides, this paper does not study how the architecture of the proposed method influences its performance. These two points will be the focus of our future work.

6. Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 51475368).

References

[1] H.K. Jiang, C.L. Li, H.X. Li. An improved EEMD with multiwavelet packet for rotating machinery multi-fault diagnosis. Mechanical Systems and Signal Processing, 2013, 36: 225-239.
[2] Y.Y. He, J. Huang, B. Zhang. Approximate entropy as a nonlinear feature parameter for fault diagnosis in rotating machinery. Measurement Science and Technology, 2012, 23(4): 45603-45616.
[3] F. Jiang, Z.C. Zhu, W. Li, G.A. Chen, G.B. Zhou. Robust condition monitoring and fault diagnosis of rolling element bearings using improved EEMD and statistical features. Measurement Science and Technology, 2014, 25(2): 1-14.
[4] M.H. Zhao, B.P. Tang, Q. Tan. Fault diagnosis of rolling element bearing based on S transform and gray level co-occurrence matrix. Measurement Science and Technology, 2015, 26(8).
[5] W. Zine, Z. Makni, E. Monmasson, L. Idkhajine, B. Condamin. Interests and limits of machine learning-based neural networks for rotor position estimation in EV traction drives. IEEE Transactions on Industrial Informatics, 2017.
[6] F. Jia, Y.G. Lei, J. Lin, X. Zhou, N. Lu. Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mechanical Systems and Signal Processing, 2016, 72-73: 303-315.
[7] Z.Y. Wang, C. Lu, B. Zhou. Fault diagnosis for rotary machinery with selective ensemble neural networks. Mechanical Systems and Signal Processing, 2017.
[8] B. Muruganatham, M.A. Sanjith, B. Krishnakumar, S.A.V. Satya Murty. Roller element bearing fault diagnosis using singular spectrum analysis. Mechanical Systems and Signal Processing, 2013, 35(1-2): 150-166.
[9] Y. Yang, D.J. Yu, J.S. Cheng. A roller bearing fault diagnosis method based on EMD energy entropy and ANN. Journal of Sound and Vibration, 2006, 294(1-2): 269-277.
[10] Y.G. Lei, Z.Z. He, Y.Y. Zi. EEMD method and WNN for fault diagnosis of locomotive roller bearings. Expert Systems with Applications, 2011, 38(6): 7334-7341.
[11] G.F. Bin, J.J. Gao, X.J. Li, B.S. Dhillon. Early fault diagnosis of rotating machinery based on wavelet packets-empirical mode decomposition feature extraction and neural network. Mechanical Systems and Signal Processing, 2012, 27(1): 696-711.
[12] F. Pacheco, M. Cerrada, D. Cabrera, et al. A statistical comparison of neuroclassifiers and feature selection methods for gearbox fault diagnosis under realistic conditions. Neurocomputing, 2016, 194: 192-206.
[13] F. Jiang, Z.C. Zhu, W. Li, G.A. Chen. Robust condition monitoring and fault diagnosis of rolling element bearings using improved EEMD and statistical features. Measurement Science and Technology, 2014, 25(2): 1-14.
[14] Y.B. Li, Y.T. Yang, G.Y. Li, et al. A fault diagnosis scheme for planetary gearboxes using modified multi-scale symbolic dynamic entropy and mRMR feature selection. Mechanical Systems and Signal Processing, 2017, 91: 295-312.
[15] C.Q. Lu, S.P. Wang, V. Makis. Fault severity recognition of aviation piston pump based on feature extraction of EEMD paving and optimized support vector regression model. Aerospace Science and Technology, 2017, 67: 105-117.
[16] X.Y. Zhang, Y.T. Liang, J.Z. Zhou, et al. A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM. Measurement, 2015, 69: 164-179.
[17] H. Keskes, A. Braham, Z. Lachiri. Broken rotor bar diagnosis in induction machines through stationary wavelet packet transform and multiclass wavelet SVM. Electric Power Systems Research, 2013, 97: 151-157.
[18] J.L. Chen, Y.Y. Zi, Z.J. He, J. Yuan. Improved spectral kurtosis with adaptive redundant multiwavelet packet and its applications for rotating machinery fault detection. Measurement Science and Technology, 2012, 23(4): 45608-45622.
[19] Y.G. Lei, F. Jia, J. Lin, S.B. Xing, S.X. Ding. An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data. IEEE Transactions on Industrial Electronics, 2016, 63(5): 3137-3147.
[20] G.E. Hinton, S. Osindero, Y.W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554.
[21] H.Z. Chen, J.X. Wang, B.P. Tang, K. Xiao, J.Y. Li. An integrated approach to planetary gearbox fault diagnosis using deep belief networks. Measurement Science and Technology, 2017, 28(2): 025010.
[22] B. Chandra, R.K. Sharma. Fast learning in deep neural networks. Neurocomputing, 2015, 171: 1205-1215.
[23] H.D. Shao, H.K. Jiang, X. Zhang, M.G. Niu. Rolling bearing fault diagnosis using an optimization deep belief network. Measurement Science and Technology, 2015, 26(11).
[24] H.D. Shao, H.K. Jiang, F.A. Wang, H.W. Zhao. An enhancement deep feature fusion method for rotating machinery fault diagnosis. Knowledge-Based Systems, 2016.
[25] H.D. Shao, H.K. Jiang, H.W. Zhao, F.A. Wang. A novel deep autoencoder feature learning method for rotating machinery fault diagnosis. Mechanical Systems and Signal Processing, 2017, 95: 187-204.
[26] H.D. Shao, H.K. Jiang, F.A. Wang, Y.N. Wang. Rolling bearing fault diagnosis using adaptive deep belief network with dual-tree complex wavelet packet. ISA Transactions, 2017: 187-201.
[27] H.D. Shao, H.K. Jiang, X.Q. Li, S.P. Wu. Intelligent fault diagnosis of rolling bearing using deep wavelet auto-encoder with extreme learning machine. Knowledge-Based Systems, 2017: 1-14.
[28] H.D. Shao, H.K. Jiang, H.Z. Zhang, T.C. Liang. Electric locomotive bearing fault diagnosis using novel convolutional deep belief network. IEEE Transactions on Industrial Electronics, 2017.
[29] F.A. Wang, H.K. Jiang, H.D. Shao, W.J. Duan. An adaptive deep convolutional neural network for rolling bearing fault diagnosis. Measurement Science and Technology, 2017, 28(9).
[30] F. Jia, Y.G. Lei, L. Guo, J. Lin, S.B. Xing. A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines. Neurocomputing, 2017: 1-10.
[31] J.T. Yin, W.T. Zhao. Fault diagnosis network design for vehicle on-board equipments of high-speed railway: A deep learning approach. Engineering Applications of Artificial Intelligence, 2016, 56: 250-259.
[32] P. Tamilselvan, P.F. Wang. Failure diagnosis using deep belief learning based health state classification. Reliability Engineering and System Safety, 2013, 115: 124-135.
[33] W.D. Mulder, S. Bethard, M.F. Moens. A survey on the application of recurrent neural networks to statistical language modeling. Academic Press Ltd., 2015.
[34] Z.C. Lipton, J. Berkowitz, C. Elkan. A critical review of recurrent neural networks for sequence learning. Computer Science, 2015.
[35] R. Zhao, R.Q. Yan, J.J. Wang, K.Z. Mao. Learning to monitor machine health with convolutional bi-directional LSTM networks. Sensors, 2017, 17(2): 273.
[36] I. Sutskever. Training Recurrent Neural Networks. Doctoral dissertation, 2013.
[37] A. Graves. Supervised Sequence Labelling with Recurrent Neural Networks. Springer Berlin Heidelberg, 2012.
[38] P.J. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 1990, 78(10): 1550-1560.
[39] K. Greff, R.K. Srivastava, J. Koutník, B.R. Steunebrink, J. Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2017.
[40] D. Monner, J.A. Reggia. A generalized LSTM-like training algorithm for second-order recurrent neural networks. Neural Networks, 2012, 25: 70-83.
[41] X.L. Zhang, B.J. Wang, X.F. Chen. Intelligent fault diagnosis of roller bearings with multivariable ensemble-based incremental support vector machine. Knowledge-Based Systems, 2015, 89: 56-85.