
energies

Article
Failure Prognosis of High Voltage Circuit Breakers
with Temporal Latent Dirichlet Allocation †
Gaoyang Li, Xiaohua Wang *, Aijun Yang, Mingzhe Rong * and Kang Yang
State Key Laboratory of Electrical Insulation and Power Equipment, School of Electrical Engineering,
Xi’an Jiaotong University, Xi’an 710049, China; ligaoyang@stu.xjtu.edu.cn (G.L.);
yangaijun@mail.xjtu.edu.cn (A.Y.); yk3115160030@stu.xjtu.edu.cn (K.Y.)
* Correspondence: xhw@mail.xjtu.edu.cn (X.W.); mzrong@mail.xjtu.edu.cn (M.R.)
† This paper is an extended version of our paper published in Guo, C., Li, G., Zhang, H., Ju, X., Zhang, Y.,
Wang, X. Defect Distribution Prognosis of High Voltage Circuit Breakers with Enhanced Latent Dirichlet
Allocation. In Proceedings of the International Conference on Prognostics and Health Management
(PHM-Harbin 2017), Harbin, China, 9–12 July 2017.

Received: 15 October 2017; Accepted: 15 November 2017; Published: 20 November 2017

Abstract: The continual accumulation of power grid failure logs provides a valuable but rarely used
source for data mining. Sequential analysis, aiming at exploiting the temporal evolution and exploring
the future trend in power grid failures, is an increasingly promising alternative for predictive scheduling
and decision-making. In this paper, a temporal Latent Dirichlet Allocation (TLDA) framework is
proposed to proactively reduce the cardinality of the event categories and estimate the future failure
distributions by automatically uncovering the hidden patterns. The aim was to model the failure
sequence as a mixture of several failure patterns, each of which was characterized by an infinite mixture
of failures with certain probabilities. This state space dependency was captured by a hierarchical
Bayesian framework. The model was temporally extended by establishing the long-term dependency
with new co-occurrence patterns. Evaluation of the high voltage circuit breakers (HVCBs) demonstrated
that the TLDA model had higher fidelities of 51.13%, 73.86%, and 92.93% in the Top-1, Top-5, and Top-10
failure prediction tasks over the baselines, respectively. In addition to the quantitative results, we showed
that the TLDA can be successfully used for extracting the time-varying failure patterns and capturing the
failure associations with a cluster coalition method.

Keywords: failure prognosis; Latent Dirichlet Allocation; high voltage circuit breakers

1. Introduction
With the increasing and unprecedented scale and complexity of power grids, component failures
are becoming the norm instead of exceptions [1–3]. High voltage circuit breakers (HVCBs) are directly
linked to the reliability of the electricity supply, and a failure or a small problem with them may lead to
the collapse of a power network through chain reactions. Previous studies have shown that traditional
breakdown maintenance and periodic checks are not effective for handling emergency situations [4].
Therefore, condition-based maintenance (CBM) is proposed as a more efficient maintenance approach
for scheduling action and allocating resources [5–7].
CBM attempts to limit consequences by performing maintenance actions only when evidence of abnormal
behavior of a physical asset is present. Selection of the monitoring parameters is critical to
its success. Degradation of the HVCB is caused by several types of stress and aging, such as mechanical
maladjustment, switching arcs erosion, and insulation level decline. The existing literature covers a
wide range of specific countermeasures, including mechanism dynamic features [8–10], dynamic contact
resistance [11], partial discharge signal [12,13], decomposition gas [14], vibration [15], and spectroscopic
monitoring [16]. Furthermore, numerous studies applied neural networks [8], support vector machine

Energies 2017, 10, 1913; doi:10.3390/en10111913 www.mdpi.com/journal/energies



(SVM) [17], fuzzy logic [18], and other methods [19] to introduce more automation and intelligence into
the signal analysis. However, these efforts were often limited to one specific aspect in their diagnosis of the
failure conditions. In addition, the requirements for dedicated devices and expertise restrict their ability
to be implemented on a larger scale. Outside laboratory settings, field recordings, including execution
traces, failures, and warning messages, offer another easily accessible data source with broad failure
category coverage. The International Council on Large Electric Systems (CIGRE) recognizes the value of
event data and has conducted three world-wide surveys on the reliability data of circuit breakers since
the 1970s [20–22]. Survival analysis, aiming at reliability evaluation and end-of-life assessment, also relies
on the failure records [2,23].
Traditionally, the event log is not considered as an independent component in the CBM framework,
as the statistical methodologies were thought to be useful only for average behavior predictions or
comparative analysis. In contrast, Salfner [24] viewed failure tracking as being of equal importance to
symptom monitoring in online prediction. In other fields, such as transactional data [25], large distributed
computer systems [26], healthcare [27], and educational systems [28], the event that occurs first is identified
as an important predictor of the future dynamics of the system. The classic Apriori-based sequence mining
methods [29], and new developments in nonlinear machine learning [27,30] have had great success in
their respective fields. However, directly applying these predictive algorithms is not appropriate for
HVCB logs for three unique reasons: weak correlation, complexity, and sparsity.

(1) Weak correlation. The underlying hypothesis behind association-based sequence mining,
especially for the rule-based methods, is the strong correlation between events. In contrast,
the dependency of the failures on HVCBs is much weaker and probabilistic.
(2) Complexity. The primary objective of most existing applications is a binary decision: whether a failure
will happen or not. However, accurate life-cycle management requires information about which
failure might occur. The increasing complexity of encoding categories into sequential values can
impose serious challenges on the analysis method design, which is called the “curse of cardinality”.
(3) Sparsity. Despite the large overall cardinality, the number of failure types occurring on an
individual HVCB is relatively small. Some events in a single case may never have been observed before,
which makes establishing statistical significance challenging. The inevitable truncation also
aggravates the sparsity problem to a higher degree.

The attempts to construct semantic features of events, by transforming categorical events into
numerical vectors, provide a fresh perspective for understanding event data [31,32]. Among the latent
space methods, the Latent Dirichlet Allocation (LDA) method [33], which represents each document
as a mixture of topics, each of which emits words with certain probabilities, offers a scalable and effective
alternative to standard latent space methods. In our preliminary work, we introduced the LDA
into failure distribution prediction [34]. In this paper, we further extended the LDA model with a
temporal association by establishing new time-attenuation co-occurrence patterns, and developed a
temporal Latent Dirichlet Allocation (TLDA) framework. The techniques were validated against the
data collected in a large regional power grid with regular records over a period of 10 years. The Top-N
recalls and failure pattern visualization were used to assess the effectiveness. To the best of our
knowledge, we are the first to introduce the advanced sequential mining technique into the area of
HVCB log data analysis.
The rest of this paper is organized as follows. The necessary process to transfer raw text data into
chronological sequences is introduced in Section 2. Section 3 provides details of the proposed TLDA
model. The criteria presented in Section 4 are not only for performance evaluation but also show the
potential applications of the framework. Section 5 describes the experimental results in the real failure
histories of the HVCBs. Finally, Section 6 concludes the paper.

2. Data Preprocessing

2.1. Description of the HVCB Event Logs


To provide necessary context, the format of the collected data is described below. The HVCBs’ failure
dataset was derived from 149 different types of HVCBs from 219 transformer substations in a regional
power grid in South China. The voltage grades of the HVCBs were 110 kV, 220 kV, and 500 kV, and the
operation time ranged from 1 to 30 years. Most of the logs were retained for 10 years, aligned with the use
of the computer management system. Detailed attributes of each entry are listed in Table 1. In addition to
the device identity information, the failure description, failure reason, and processing measures fields
contain key information about the failure.

Table 1. Attributes of the failure logs.

Attribute Content
ID Numerical order of a failure entry
Voltage grade 110 kV, 220 kV, or 500 kV
Substation Location of the equipment failure, e.g., ShenZhen station
Product model Specified model number, e.g., JWG1-126
Equipment type A broad taxonomy, e.g., high-voltage isolator, gas insulated switchgear (GIS)
Failure description Detailed description of the phenomena observed
Failure reason Cause of the failure
Failure time Time when a failure was recorded
Processing measures Performed operation to repair the high voltage circuit breaker (HVCB)
Processing result Performance report after repair
Repair time Time when a failure was removed
Installing time Time when a HVCB was first put into production
Others Including the person responsible, mechanism type, a rough classification, manufacturers, etc.

2.2. Failure Classification


One primary task of pre-processing is to reduce the unavoidable subjectivity and compress the
redundant information. Unlike automatically generated logs, the failure descriptions of HVCBs are
created manually by administrators and contain an enormous amount of free text. Therefore, the skill
of the administrators significantly influences the results. Underreporting or misreporting the root
cause may reduce the credibility of the logs. Only by consolidating multiple information sources can
a convincing failure classification be generated. An illustrative example is presented in Table 2.
The useful information is hidden in the last part and can be classified as an electromotor failure.
Completing this task manually is time-consuming and highly prone to error. Automatic text
classification has traditionally been a challenging task. Straightforward procedures, such as keyword
searches or regular expressions, cannot meet the requirements.

Table 2. A typical manual log entry sample.

Failure Description: Circuit breakers connected with the high voltage side of the main transformer
cannot close or open. The power supply runs faultlessly during inspection.
Failure Reason: Bad manufacturing quality.
Processing Measures: Replace the electromotor.

Due to progress in deep learning technology, establishing an end-to-end text categorization model
has become easier. In this study, the Google seq2seq [35] neural network was adopted, with the decoder
part replaced by an SVM. The text processing procedure was as follows: (1) An expert administrator
manually annotated a small set of the examples with concatenated texts, including the failure
description, failure reason, and processing measures; (2) After tokenization and conjunction removal,
the labeled texts were used to train a neural network; (3) Another small set of failure texts were
predicted by the neural network. Wrong labels were corrected and added to the training set;
(4) Steps (2) and (3) were repeated until the classification accuracy reached 90%; (5) The trained
network was used to replace manual work. A minimal sketch of this iterative loop is given below.
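The sketch below approximates steps (2)-(4) under stated assumptions: it is an illustration, not the
authors' implementation. The paper's seq2seq encoder with an SVM decoder is replaced here by a TF-IDF
representation with a linear SVM, and `annotate` stands in for the expert correction step; all names
and the batch size are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

def iterative_labelling(seed_texts, seed_labels, pool, annotate, target_acc=0.90):
    """Grow the labeled set until the prediction accuracy reaches target_acc.

    `annotate` is a human-in-the-loop callback that corrects predicted labels.
    """
    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    texts, labels = list(seed_texts), list(seed_labels)
    while pool:
        clf.fit(texts, labels)                         # step (2): train on labeled texts
        batch = [pool.pop() for _ in range(min(50, len(pool)))]
        pred = clf.predict(batch)                      # step (3): predict a new batch
        corrected = annotate(batch, pred)              # expert fixes wrong labels
        texts += batch
        labels += list(corrected)
        if accuracy_score(corrected, pred) >= target_acc:
            break                                      # step (4): stop at 90% accuracy
    return clf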

The preferential classification taxonomy was the accurate component location that broke the operation.
The failure phenomenon was recorded when no failure location was available. Finally, 36 kinds of
failures were extracted from the man-machine interaction. The numbers of different failures were
ranked in descending order and plotted on a log-log axis, as shown in Figure 1. The failure numbers
satisfy a long-tail distribution [36], making it hard to recall the failures with a lower occurrence
frequency.

Figure 1. Long tail distribution of the failure numbers.

2.3. Sequence Aligning and Spatial Compression

The target outputs of the sequence pre-processing are event chains in chronological order.
As mentioned earlier, the accessibility to the failure data was limited to the last 10 years. Therefore,
the visible sequences were bilaterally truncated, creating new difficulties for comparing different
sequences. Instead of using the actual failure times, the times of origin of the HVCBs were changed to
their installation times to align different sequences. To mitigate the sparsity problem, spatial
compression was applied by clustering failure events from the same substation and the same machine
type, as they often occur in bursts. Finally, of the 43,738 raw logs, 7637 items were HVCB-related.
After sequence aligning and spatial compression, 844 independent failure sequences were extracted,
with an average length of nine. A sequence example can be found in Figure 2; different failures that
break the device operation continually occurred along the time axis. A minimal sketch of this grouping
is given after the figure.

Figure 2. A graphical illustration of a failure sequence.
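A minimal sketch of the alignment and spatial compression described above, assuming each raw entry
carries fields in the spirit of Table 1; the field names are illustrative, not the authors' schema.

from collections import defaultdict

def build_sequences(entries):
    """Group HVCB failures by (substation, machine type) and align each
    sequence to the installation time rather than the calendar date."""
    groups = defaultdict(list)
    for e in entries:
        key = (e["substation"], e["machine_type"])       # spatial compression
        age = e["failure_time"] - e["install_time"]      # align by installation
        groups[key].append((age, e["failure_code"]))
    return [sorted(seq) for seq in groups.values()]      # chronological chains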

3. Proposed Method

The key idea behind all failure tracking predictions is to obtain the probability estimations using
the occurrence of previous failures. The problem is unique because both the training sets and the test
sets are categorical failure data. A detailed expression of the sequential mining problem studied in
this paper can be summarized as follows: the HVCB failure prognosis problem is a topic of sequential
mining concerned with estimating the future failure distribution of a HVCB, based on its own failure
history and the failure sequences of all the other HVCBs, under the limitations of short sequences and
multiple categories.

This section presents how the TLDA provides a possible solution to this problem by embedding the
temporal association into the LDA model.

3.1. Latent Dirichlet Allocation Model

LDA is a three-level hierarchical Bayesian model originally used in natural language processing.
It posits that each document is modeled as a mixture of several topics, and each topic is characterized
by an infinite mixture of words with certain probabilities. An LDA example is shown in Figure 3.
A document consists not only of words but also of the topics assigned to the words, and the topic
distribution provides a sketch of the document subject. LDA introduces topics as a fuzzy skeleton to
combine the discrete words into a document. Meanwhile, the shared topics provide a convenient indicator
for comparing the similarity between different documents. LDA has had success in a variety of areas by
extending the concepts of document, topic, and word. For example, a document can be a gene [37],
an image [38], or a piece of code [39], with a word being a feature term, a patch, or a programming
word. Likewise, a failure sequence can be treated as a document, and a failure can be recognized as a
word. The topics in LDA are analogous to failure patterns that represent the kinds of failures that
cluster together and how they develop with equipment aging. Two foundations of LDA are the Dirichlet
distribution and the idea of a latent layer.

Figure 3. An illustrative example of Latent Dirichlet Allocation (LDA).

3.1.1. Dirichlet Distribution

Among the distribution families, the multinomial distribution is the most intuitive for modeling a
discrete probability estimation problem. The formulation of the multinomial distribution is described as:

f(x_1, ..., x_k; n; p_1, ..., p_k) = (Γ(∑_i x_i + 1) / ∏_i Γ(x_i + 1)) ∏_{i=1}^{k} p_i^{x_i}   (1)

which satisfies ∑_i x_i = n and ∑_i p_i = 1. The multinomial distribution represents the probability of
k different events happening x_i times in n experiments, with each category having a fixed probability
p_i. Γ is the gamma function. Furthermore, the Maximum Likelihood Estimation (MLE) of p_i is:

p̂_i = x_i / ∑_i x_i   (2)

which implies that the theoretical basis of the statistic method is the MLE of a multinomial
distribution. Effective failure prognosis methods must balance accuracy against adequately fine-grained
information. Suppose that the dataset has M sequences and N kinds of failures. Modeling a multinomial
distribution for each HVCB will result in a parameter matrix with the shape of M × N. These statistics
for individuals will cause most elements to be zero. Taking the failure sequence in Figure 2 as an
example, among the 36 kinds of failure, only 7 have been seen, making a reasonable probability
estimation for the other failures impossible. This is why much of the statistical analysis relies on a
special classifying standard to reduce the types of failure, or ignores the independence of the HVCBs.
Two solutions are feasible for alleviating the disparities: introduce a priori knowledge or mine
associations among different failures and different HVCBs.

One possible way to introduce a priori knowledge is based on Bayes' theorem. Bayesian inference is a
widely used method of statistical inference to estimate the probability of a hypothesis when
insufficient information is available. By introducing a prior probability on the parameters, Bayesian
inference acts as a smoothing filter. A conjugate prior is a special case where the prior and posterior
distributions have the same formulation. The conjugate prior of the multinomial distribution is the
Dirichlet distribution:

Dir(p | α) = f(p_1, ..., p_k; α_1, ..., α_k) = (1/Δ(α)) ∏_{i=1}^{k} p_i^{α_i − 1}   (3)

with the normalization coefficient being:

Δ(α) = ∏_{i=1}^{k} Γ(α_i) / Γ(∑_{i=1}^{k} α_i)   (4)

similar to the multinomial distribution. Due to the Bayesian rule, the posterior distribution of p with
new observations x can be proven to be:

p(p | α, x) = Dir(p | α + x)   (5)

with the mean being:

p̃_i = (x_i + α_i) / ∑_i (x_i + α_i)   (6)

From Equation (6), even the failures with no observations are assigned a prior probability associated
with α_i; a numerical sketch of this smoothing effect is given after Figure 4. The conjugate relation
can be described as a generative process shown in Figure 4a:

(1) Choose θ_i ∼ Dir(α), where i ∈ {1, 2, 3, ..., M};
(2) Choose a failure f_ij ∼ Multinomial(θ_i), where j ∈ {1, 2, 3, ..., N_i}.

Figure 4. Graphical representations comparison of the Dirichlet distribution and LDA: (a) graphical
representation of the Dirichlet distribution; (b) graphical representation of LDA.

3.1.2. Latent Layer

Matrix completion is another option for solving the sparsity problem that establishes global
correlation among units. The basic task of matrix completion is to fill the missing entries of a
partially observed matrix. In sequential prediction with limited observations, predicting the
probabilities of failures that have never appeared is such a problem. Using the recommender system as
an example, for a sparse user-item rating matrix R with m users and n items, each user has only rated
several items. To fill the unknown space, R is first decomposed into two low-dimensional matrices
P ∈ R^{m×f} and Q ∈ R^{n×f} satisfying:

R ≈ PQ^T = R̂   (7)

with the aim of making R̂ as close to R as possible. Then, the rating of user u for item i,
R̂(u, i) = r̂_ui, can be inferred as:

r̂_ui = ∑_f p_uf q_if   (8)

Many different realizations of Equation (7) can be created by adopting different criteria to determine
whether the given matrices are similar. The spectral norm or the Frobenius norm creates the classical
singular value decomposition (SVD) [40], and the root-mean-square error (RMSE) creates the latent
factor model (LFM) [41]. In addition, regularization terms are useful options to increase the
generalization of the model.

Analogously, a latent layer with L elements can be introduced between the HVCB sequences and the
failures. For M sequences with N kinds of failures, instead of the M N-parameter multinomial
distributions described above, M L-parameter multinomial models and L N-parameter multinomial models
are preferred, where L failure patterns are extracted. A schematic diagram of the comparison is shown
in Figure 5. No direct observations exist to fill the gap between s1 and f3; the connections s1-z1-f3,
s1-z2-f3, and s1-z3-f3 will provide a reasonable suggestion. A toy numerical sketch of this completion
is given after the figure.

Figure 5. Schematic diagram of the matrix completion: (a) the graphical representation of the failure
probability estimation task. The solid lines represent the existing observations, and the dotted line
represents the probability of the estimate. (b) The model makes an estimation by the solid lines after
matrix decomposition.
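A minimal sketch of Equations (7) and (8): a toy rating matrix is completed by low-rank factorization,
minimizing the error on the observed entries only (the LFM variant mentioned above). The sizes, data,
learning rate, and iteration count are all illustrative.

import numpy as np

m, n, f = 5, 4, 2                        # users, items, latent dimension
R = np.array([[5, 0, 3, 0],
              [4, 0, 0, 1],
              [0, 2, 0, 5],
              [1, 0, 4, 0],
              [0, 3, 0, 4]], float)
mask = R > 0                             # observed entries

rng = np.random.default_rng(0)
P, Q = rng.normal(size=(m, f)), rng.normal(size=(n, f))
for _ in range(2000):                    # simple gradient descent on observed cells
    E = mask * (R - P @ Q.T)             # residual of Equation (7)
    P, Q = P + 0.01 * E @ Q, Q + 0.01 * E.T @ P

R_hat = P @ Q.T                          # Equation (8): r̂_ui = Σ_f p_uf q_if
print(np.round(R_hat, 1))                # unknown entries are now filled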
3.1.3. Latent Dirichlet Allocation

The combination of Bayesian inference and matrix completion creates the LDA. Two Dirichlet priors are
assigned to the two-layer multinomial distributions. A similar idea is shared by the LFM, where the
regularization terms can be theoretically deduced from the assumption of Gaussian priors. The major
difficulty in realizing LDA lies in the model inference. In LDA, it is assumed that the jth failure in
sequence m, f_mj, comes from a failure pattern z_mj, making f_mj satisfy a multinomial distribution
parameterized with φ_{z_mj}. In addition, the failure pattern z_mj also originates from a multinomial
distribution whose parameters are θ_m. Finally, from the perspective of Bayesian statistics, both
φ_{z_mj} and θ_m are sampled from two Dirichlet priors with parameters α and β. The original
Dirichlet-multinomial process can evolve to a three-layer sampling process as follows:

(1) Choose θ_m ∼ Dir(α), where m ∈ {1, 2, 3, ..., M};
(2) Choose φ_k ∼ Dir(β), where k ∈ {1, 2, 3, ..., K};
For each failure f_mj,
(3) Choose a latent value z_mj ∼ Multinomial(θ_m);
(4) Choose a failure f_mj ∼ Multinomial(φ_{z_mj}),

where m ∈ {1, 2, 3, ..., M} and j ∈ {1, 2, 3, ..., N_m}. N_m is the failure number in sequence m, and M
is the total sequence number. A generative sketch of this process follows.
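The sketch below runs the three-layer sampling forward, with illustrative sizes taken from this paper's
dataset (M = 844 sequences, V = 36 failure types, K = 40 patterns) and the symmetric priors reported in
Section 5.1.1.

import numpy as np

rng = np.random.default_rng(0)
M, K, V = 844, 40, 36                    # sequences, patterns, failure types
alpha, beta = 1.0 / K, 0.01              # symmetric Dirichlet priors

theta = rng.dirichlet(np.full(K, alpha), size=M)   # step (1): θ_m ~ Dir(α)
phi = rng.dirichlet(np.full(V, beta), size=K)      # step (2): φ_k ~ Dir(β)

def generate_sequence(m, n_failures):
    z = rng.choice(K, size=n_failures, p=theta[m])           # step (3)
    return [rng.choice(V, p=phi[k]) for k in z]              # step (4)

print(generate_sequence(0, 9))           # one synthetic failure sequence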
The probabilistic graph of LDA is shown in Figure 4b, and the joint probability distribution of all the
failures under this model is given by:

p(f, z) = ∏_{m=1}^{M} ∏_{j=1}^{N_m} ∑_{k=1}^{K} p(f_mj | z_mj = k) p(z_mj = k)   (9)

The learning targets of LDA include θ_m and φ_k. They can both be inferred from the topic assignments z.
The posterior distribution of z cannot be directly solved; Gibbs sampling is one possible solution.
First, the joint probability distribution can be reformulated as:

p(f, z | α, β) = ∏_{k=1}^{K} (Δ(n_k + β) / Δ(β)) ∏_{m=1}^{M} (Δ(n_m + α) / Δ(α))   (10)

where n_k = {n_k^w}_{w=1:V} and n_m = {n_m^k}_{k=1:K} are the statistics of the failure counts under
topic k and the topic counts under failure sequence m, respectively. V is the number of failure types.
The conditional distribution of the Gibbs sampling can be obtained as:

p(z_mj = k | z_{−mj}, α, β) = ((n_k^{f_mj} − 1 + β_{f_mj}) / ∑_{i=1}^{V} (n_{k,−mj}^i − 1 + β_i)) · ((n_m^k − 1 + α_k) / ∑_{i=1}^{K} (n_{m,−mj}^i + α_i))   (11)

where n_{k,−mj}^i is the number of failures with the index i assigned to topic k, excluding the failure
f_mj, and n_{m,−mj}^i is the number of failures in sequence m with topic i, excluding the failure f_mj.
After certain iterations, the posterior estimations of θ_m and φ_k can be inferred with:

θ_mk = (n_m^k + α_k) / ∑_{i=1}^{K} (n_m^i + α_i)   (12)

φ_kw = (n_k^w + β_w) / ∑_{i=1}^{V} (n_k^i + β_i)   (13)

Finally, the posterior failure distribution of the mth HVCB can be predicted with:

p_m = ∑_{k=1}^{K} p(z_k) p(w | z_k) = ∑_{k=1}^{K} θ_mk φ_k   (14)
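A compact sketch of the collapsed Gibbs sampler implied by Equations (11)-(13), assuming `seqs` is a
list of failure-index lists. This illustrates standard LDA inference only, without the temporal
extension of Section 3.2; the defaults echo the settings of Section 5.1.1.

import numpy as np

def gibbs_lda(seqs, K=40, V=36, alpha=None, beta=0.01, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    alpha = alpha if alpha is not None else 1.0 / K
    M = len(seqs)
    n_mk = np.zeros((M, K))                            # topic counts per sequence
    n_kw = np.zeros((K, V))                            # failure counts per topic
    z = [rng.integers(K, size=len(s)) for s in seqs]   # random initial patterns
    for m, s in enumerate(seqs):
        for j, w in enumerate(s):
            n_mk[m, z[m][j]] += 1
            n_kw[z[m][j], w] += 1
    for _ in range(iters):
        for m, s in enumerate(seqs):
            for j, w in enumerate(s):
                k = z[m][j]
                n_mk[m, k] -= 1; n_kw[k, w] -= 1       # exclude f_mj from the counts
                p = ((n_kw[:, w] + beta) / (n_kw.sum(1) + V * beta)
                     * (n_mk[m] + alpha))              # Equation (11)
                k = rng.choice(K, p=p / p.sum())
                z[m][j] = k
                n_mk[m, k] += 1; n_kw[k, w] += 1
    theta = (n_mk + alpha) / (n_mk + alpha).sum(1, keepdims=True)   # Equation (12)
    phi = (n_kw + beta) / (n_kw + beta).sum(1, keepdims=True)       # Equation (13)
    return theta, phi            # future distribution: theta[m] @ phi, Equation (14)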

3.2. Introducing the Temporal Association into LDA


Even with the promising advantage of finding patterns in the categorical data, directly borrowing
LDA to solve the sequence prediction problem has some difficulties. LDA assumes that data samples
are fully exchangeable. The failures are assumed to be drawn independently from a mixture of
multinomial distributions, which is not true. In the real world, failure data are naturally
collected in time order, and different failure patterns evolve. So, it is important to exploit the temporal
characteristics of the failure sequences.
To introduce the time attributes in LDA, we first assumed that the future failure was most related
to the other failures within a time slice. Instead of using the failure sequences of the full life cycle,
the long sequences were divided into several sub-sequences by a sliding time window with width W.
The sub-sequences may overlap with each other. Under this assumption, a simple way to use LDA in a

time series is to directly exploit the pattern distributions θ_m in different time-slices. However, this
approach does not consider the dependence among different slices. In the LDA model, the dependence
among different sub-sequences can be represented by the dependency among the pattern distributions.
A modified probabilistic graph is shown in Figure 6, where u_ms represents the topic distribution of a
specified sub-sequence, and w are the prior parameters, with the joint distribution being:

p(f, z | w) = ∫∫ p(u_m0 | w) ∏_{s=1}^{J_m} p(u_ms | u_m0, u_m1, ..., u_{m,s−1}, w) ∏_{j=1}^{N_ms} p(z_msj | u_ms) du_ms du_m0   (15)

where J_m is the number of sub-sequences in sequence m, N_ms is the number of failures in the
sub-sequence s, and u_ms is the topic distribution of a specified sub-sequence.

Figure 6. Graphical representation for a general sequential extension of LDA.

Due to the lack of conjugacy between Dirichlet distributions, the posterior inference of Equation (15)
can be intractable. Simplifications, such as the Markov assumption and specified conditional
distributions, can make the posterior distribution tractable [42,43]. However, the formulation does not
need to be Markovian, and the time dependency can still be complicated. To overcome this problem,
an alternative method of creating a new co-occurrence mode is proposed to establish the long-term
dependency among different sub-sequences. Specifically, from Equations (12) and (13), the failures that
occur together are likely to have the same failure pattern. In other words, co-occurrence is still the
foundation for deeper pattern mining in LDA. Therefore, instead of specifying the dependency among the
topic distributions, as shown by the dotted line in Figure 6, a direct link was constructed between the
current and earlier failures by adding the past failures into the current sub-sequence with certain
probabilities. Additionally, the adding operation should embed the temporal information by assigning a
higher probability to the closer failures. Based on these requirements, a sampling rate conforming to
exponential decay is implemented as follows:

p(x) = exp(−x/∆), 0 ≤ x < T; p(x) = 0, otherwise   (16)

where the attenuation coefficient ∆ controls the decreasing speed of p(x) along the time interval x,
and T is the time at the left edge of the current time window. Figure 7 shows the schematic diagram of
the process for constructing new co-occurrence patterns. To predict the future failure distribution,
the failures ahead of the current time window are also included. Each iteration generates new data
combinations to augment the data. An outline of the Gibbs sampling procedure with the new data
generation method is shown in Algorithm 1.
Figure 7. The sampling probability within and prior to the time window.
Algorithm 1 Gibbs sampling with the new co-occurrence patterns

Input: Sequences, MaxIteration, K, W, ∆, α, β
Output: posterior inference of θ and φ
1: Initialization: randomly assign failure patterns and make sub-sequences by W;
2: Compute the statistics n_k^{f_mj}, n_m^k, n_{k,−mj}^i, n_{m,−mj}^i in Equation (11) for each sub-sequence;
3: for iter in 1 to MaxIteration do
4:   Foreach sequence in Sequences do
5:     Foreach sub-sequence in sequence do
6:       Add new failures to the current sub-sequence based on Equation (16);
7:       Foreach failure in the new sub-sequence do
8:         Draw new z_mj from Equation (11);
9:         Update the statistics in Equation (11);
10:      End for
11:    End for
12:  End for
13:  Compute the posterior mean of θ and φ based on Equations (12) and (13)
14: End for
15: Compute the mean of θ and φ of the last several iterations
Based on the above premise, the TLDA framework for extracting the semantic characteristics and
predicting the failure distribution is shown in Figure 8. After preprocessing and generating the
sub-sequences, an alternating renewal process was implemented between the new co-occurrence pattern
construction and the Gibbs sampling. The final average output reflects the time decrease presented in
Equation (16) due to the multi-sampling process. Finally, Equation (14) provides the future
distribution prognosis using the learned parameters of the last sub-sequence of each HVCB.

Figure 8. Log analysis framework by the temporal Latent Dirichlet Allocation (TLDA).
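A minimal sketch of the co-occurrence construction in step 6 of Algorithm 1, i.e., Equation (16),
assuming `history` holds (time, failure) pairs recorded before the current window; the names and the
day-based units are illustrative.

import math
import random

def augment_subsequence(sub_seq, history, window_left, delta):
    """Return sub_seq plus earlier failures sampled with p = exp(-x / delta),
    where x is the failure's distance (in days) before the window's left edge."""
    augmented = list(sub_seq)
    for t, failure in history:
        x = window_left - t                  # time interval before the window
        if x >= 0 and random.random() < math.exp(-x / delta):
            augmented.append(failure)        # past failure joins the co-occurrence
    return augmented

# e.g., delta = 10000 days keeps most of the recent history; distant failures
# are re-sampled only rarely.
new_sub = augment_subsequence([3, 7], [(100, 5), (2900, 11)], 3000, 10000.0)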

4. Evaluation Criteria
The output of the proposed system is the personalized failure distribution for each HVCB.
However, directly verifying the prediction result is impossible due to the sparsity of the failure
sequences. Therefore, several indirect quantitative and qualitative criteria are proposed as follows.

4.1. Quantitative Criteria


Instead of verifying the entire distribution, the prognosis ability of the model was tested by
predicting the single upcoming failure. Several evaluation criteria are developed as follows.

4.1.1. Top-N Recall

The Top-N prediction is originally used in recommender systems to check whether the recommended N
items satisfy the customers. Precision and recall are the most popular metrics for evaluating the Top-N
performance [44]. With only one target behavior, the recall becomes proportional to the precision,
which can be simplified as:

Recall = ∑_{h∈H} |R_N(h) ∩ T(h)| / |H|   (17)

where R_N(h) is the failure set with the Top-N highest prediction probabilities for HVCB h, T(h) is the
failure that subsequently occurred, and |H| is the number of HVCBs to be predicted. The recall indicates
whether the failure that subsequently occurred is included in the Top-N predictions. Considering the
diversity of the different failure categories, the Top-1, Top-5, and Top-10 recalls were used.
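A minimal sketch of Equation (17), assuming `predictions` maps each HVCB to its predicted failure
distribution and `truth` maps it to the failure that subsequently occurred; both names are illustrative.

import numpy as np

def top_n_recall(predictions, truth, n):
    """Fraction of HVCBs whose next failure is in the Top-N predicted set."""
    hits = 0
    for h, p in predictions.items():
        top_n = np.argsort(p)[::-1][:n]          # R_N(h): the N most probable failures
        hits += int(truth[h] in top_n)           # |R_N(h) ∩ T(h)| with one target
    return hits / len(predictions)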

4.1.2. Overlapping Probability


The overlapping probability P_o is proposed as an auxiliary index to the Top-1 recall, defined as the
probability that the model assigns to T(h). For instance, assume a model concludes that the next
failure probabilities for a, b, and c are 50%, 40%, and 10%, respectively, and after a while, failure b
actually occurs. Then, the overlapping probability is 40%. This index provides an outline of how much
the predicted distribution overlaps with the real one-hot distribution, which can also be understood
as the confidence. With a similar Top-1 recall, a higher mean overlapping probability represents a more
reliable result.
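The overlapping probability can be sketched with the same inputs as the recall function above:

import numpy as np

def overlapping_probability(predictions, truth):
    """Mean probability the model assigned to the failure that actually occurred."""
    return float(np.mean([p[truth[h]] for h, p in predictions.items()]))

# For the worked example above, p = [0.5, 0.4, 0.1] and failure b (index 1)
# occurs, giving an overlapping probability of 0.4.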
These two kinds of quantitative criteria are suitable for different maintenance strategies, considering
the limitation of the maintainers' rigor. The Top-N recall corresponds to the strategy of focusing on
the Top-N ranked failure types, whereas the overlapping probability corresponds to another possible
strategy of monitoring the failure types whose probabilities exceed a threshold.

4.2. Qualitative Criteria


The TLDA can provide explicit semantic characteristics. The results of our algorithm offer a new
perspective for understanding the failure modes and their variation trends. For example, different
failure patterns can be extracted by examining the failures with high proportions. By considering θ as
a function of time, it is easy to investigate the rise and fall of different failures and how they
interact, either from a global perspective or when focusing on one sample. In addition, by introducing
the angle cosine distance as a measurement, the similarity between failure p and failure q can be
calculated as:

I_pq = ∑_{i=1}^{K} φ_ip φ_iq / (√(∑_{i=1}^{K} φ_ip²) √(∑_{i=1}^{K} φ_iq²))   (18)

Figure 9 depicts the cosine distance computing method. Only the angle between the two vectors
affects this indicator. A higher cosine distance often indicates more similar failure reasons.
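A minimal sketch of Equation (18), assuming `phi` is the K × V pattern-failure matrix learned in
Section 3; the function name is illustrative.

import numpy as np

def failure_similarity(phi, p, q):
    """Cosine of the angle between failures p and q across the K patterns."""
    vp, vq = phi[:, p], phi[:, q]
    return float(vp @ vq / (np.linalg.norm(vp) * np.linalg.norm(vq)))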
Figure 9. Schematic diagram of the cosine distance with two dimensions.

5. Case Study

The experimental dataset was based on the real-world failure records described in Section 2.
After data processing, the failure history of each HVCB was listed as a failure sequence in
chronological order. A cross-validation test was used to assess the performance with the following
process. Firstly, the last failure of each sequence was separated as the test set. Then, the remaining
instances were used to train the TLDA model based on Algorithm 1. For each validation round, the tail
part of each failure sequence was randomly abandoned to obtain new test sets.

5.1. Quantitative Analysis

5.1.1. Parameter Analysis

Hyper-parameters of the proposed method include the number of failure patterns K, the width of the
time window W, and the attenuation coefficient ∆. For all runs of the algorithm, the Dirichlet
parameters α and β were assigned symmetric priors of 1/K and 0.01, respectively, which were slightly
different from the common setting [45]. Gibbs sampling of 300 iterations was sufficient for the
algorithm to converge. For each Gibbs sampling chain, the first 200 iterations were discarded, and the
average results of the last 100 iterations were taken as the final output. The first set of experiments
was conducted to analyze the model performance with respect to K among {25, 30, 35, 40, 45, 50}.
Figure 10 shows the results of the Top-1, Top-5, and Top-10 recalls, and the overlapping probability
under a fixed W and ∆ of six years and 10,000 days, respectively. These evaluation indexes do not
appear to be much affected by the number of failure patterns. The failure pattern number of 40
surpasses the others slightly for the Top-N recalls. The overlapping probability increased to
relatively stable numerical values after 40. The overfitting phenomenon, which perplexes many machine
learning methods, was not serious with high numbers of failure patterns.

Figure 10. Performance comparison versus the number of failure patterns: (a) the Top-1, Top-5, and
Top-10 recalls with respect to the number of failure patterns; (b) the overlapping probability with
respect to the number of failure patterns.
Energies 2017, 10, 1913 13 of 20
Energies 2017, 10, 1913 13 of 20

In theInnext
the next experiment,
experiment, thethe qualitativecriteria
qualitative criteria were
were examined
examinedasas
a function of the
a function timetime
of the window
window
and the attenuation coefficient ∆, with the number of the failure patterns fixed at 40. The
W and the attenuation coefficient ∆, with the number of the failure patterns K fixed at 40. The results
results are shown in Figure 11. The peak values of different criteria were achieved with different
are shown in Figure 11. The peak values of different criteria were achieved with different parameters.
parameters. The optimal parameters with respect to the performance metrics are summarized in
The optimal
Table 3. parameters with respect to the performance metrics are summarized in Table 3.

Figure 11. Performance comparison versus time window length and the attenuation coefficient: (a) the Top-1 recall versus the model parameters; (b) the Top-5 recall versus the model parameters; (c) the Top-10 recall versus the model parameters; and (d) the overlapping probability versus the model parameters.
Table 3. Optimal parameters for different prediction tasks.

Performance Criteria       W (Years)   ∆ (Days)
Top-1                      7           30,000
Top-5                      7           20,000
Top-10                     3           10,000
Overlapping Probability    7           30,000
From Table 3, a high Top-1 recall calls for a relatively large window size of seven years and a large decay parameter of 30,000 days, while the best Top-10 recall was obtained with smaller parameters of three years and 10,000 days. The Top-5 recall also requires a large W of seven years but a smaller ∆ of 20,000 days when compared to the Top-1 recall. The overlapping probability shares the same optimal parameters as the Top-1 recall. The differences in parameter selection among the evaluation metrics may be explained as follows. With a wider W and a larger ∆, the sub-sequence tends to include more failure data. A duality exists where more data may help the model discover the failure patterns more easily or limit its generalization ability. With more data, the model tends to converge on several certain failure patterns and provides more confidence in those failures. This explains why the Top-1 recall and the overlapping probability share the same optimal parameters. However, this kind of convergence may neglect the other related failures. For the Top-10 recall, the most important criterion is the fraction of coverage, rather than one accurate hit. Training and predicting with relatively less data focuses more on the mutual associations, which provides more insight into the hidden risks. Generally, the difference between the optimal parameters of the Top-1 and Top-10 recalls reflects a dilemma between higher confidence and wider coverage in machine learning methods.
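
To make the evaluation criteria concrete, the sketch below shows one common way to compute the Top-N recall from predicted failure distributions; the array names and shapes are illustrative assumptions, and the exact definitions follow the evaluation equations given earlier in the paper.

```python
# Sketch: Top-N recall, assuming a hit is scored when the failure that
# actually occurred next is among the N most probable predicted failures.
import numpy as np

def top_n_recall(pred_dists, true_failures, n):
    """pred_dists: (S, V) predicted distributions over V failure categories;
    true_failures: length-S array of the failure ids that actually occurred."""
    top_n = np.argsort(pred_dists, axis=1)[:, -n:]  # the N most probable ids
    hits = [t in row for t, row in zip(true_failures, top_n)]
    return float(np.mean(hits))

# Toy usage with 2 test sequences over a 5-failure vocabulary:
pred = np.array([[0.1, 0.4, 0.2, 0.2, 0.1],
                 [0.5, 0.1, 0.1, 0.2, 0.1]])
print(top_n_recall(pred, np.array([1, 3]), n=2))  # -> 1.0
```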

5.1.2. Comparison with Baselines
The best results were also compared with several baseline algorithms, including a statistical approach, a Bayesian method, and the more recently developed Long Short-Term Memory (LSTM) neural network. The statistical approach is the most common method for log analysis in power grids and accounts for a large proportion of the annual reports of power enterprises. A global average result, which mainly reflects the proportions of different failures, is used to guide production in the next year. The Bayesian method is one of the main approaches for distribution estimation. A sequential Dirichlet update initialized with the statistical average was conducted to provide personalized distribution estimates for each HVCB. In recent years, deep learning has surpassed traditional methods in many areas. As one branch of deep learning for handling sequential data, LSTM has been applied here to HVCB log processing. The key parameters of the LSTM include an embedding dimension of eight and a fully connected layer with 100 units. Additionally, sequences shorter than 10 are padded to ensure a constant input dimension.
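
A minimal sketch of such an LSTM baseline, wired with the stated key parameters, is shown below; the vocabulary size, number of LSTM units, and training settings are not given in the text and are illustrative assumptions.

```python
# Sketch: LSTM baseline with embedding dimension 8, a 100-unit fully
# connected layer, and inputs padded to length 10. V and the LSTM width
# are assumed values, not the authors' exact configuration.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

V = 120       # assumed size of the failure-category vocabulary
MAX_LEN = 10  # sequences shorter than 10 are padded

model = keras.Sequential([
    layers.Embedding(input_dim=V, output_dim=8, mask_zero=True),
    layers.LSTM(64),                        # assumed hidden size
    layers.Dense(100, activation="relu"),   # fully connected layer, 100 units
    layers.Dense(V, activation="softmax"),  # distribution over the next failure
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

seqs = [[3, 17, 5], [8, 8, 21, 4]]  # toy failure histories (0 is the pad id)
x = keras.preprocessing.sequence.pad_sequences(seqs, maxlen=MAX_LEN)
y = np.array([9, 2])                # the next observed failure for each history
model.fit(x, y, epochs=1, verbose=0)
```
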
Table 4 reports the experimental results, where the model with the best performance is marked in bold font. The TLDA had the best performance for the Top-1, Top-5, and Top-10 tasks with 51.13%, 73.86%, and 92.93%, respectively, whereas the best overlapping probability was obtained by the Bayesian method. Although the Bayesian method obtained a good overlapping probability and Top-1 recall, its Top-5 and Top-10 performances were the worst among the tested methods because the Bayesian method places too much weight on individual information and ignores the global correlations. On the contrary, the statistical approach obtained a slightly better result in the Top-5 and Top-10 recalls owing to the long-tail distribution. However, its Top-1 recall was the lowest. The unbalanced dataset makes it difficult for the LSTM to obtain a high Top-1 recall. However, the LSTM still demonstrated its learning ability, as reflected in its Top-5 and Top-10 recalls.
Table 4. Performance comparison with different methods.

Method                       Top-1 (%)   Top-5 (%)   Top-10 (%)   Po (%)
TLDA                         51.13       73.86       92.93        31.50
Statistical Approach         19.79       58.33       78.12        5.87
Bayesian Sequential Method   41.67       55.21       65.63        33.96
Neural Network (LSTM)        32.29       67.70       81.25        15.52
5.2. Qualitative Analysis
5.2.1. Failure Patterns Extraction
As mentioned before, the LDA method treats each failure sequence as a mixture of several failure patterns. Some interesting failure modes and failure associations can be mined by visualizing the failures. For simplicity, a new TLDA model was trained with 10 failure patterns. Table 5 lists the failures that account for more than 1% in each failure pattern. All the failure patterns were extracted automatically, and the titles were summarized afterward. Notably, erroneous records may exist, the most common of which is confusion between causes and phenomena. For example, various failure categories can be mistaken for operating mechanism failures because the mechanism is the last step of a complete HVCB operation. A summary of the extracted failure patterns is given in Table 5 and discussed below; a short sketch of how such a table can be derived from the trained model follows.
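
As an illustration, the snippet below lists, for each pattern, the failures whose probability under the pattern-failure distribution exceeds 1%; phi and the label list are illustrative stand-ins for the trained TLDA outputs.

```python
# Sketch: extract the failures above a 1% probability threshold from each
# row of an assumed (K, V) pattern-failure matrix phi.
import numpy as np

def top_failures(phi, labels, threshold=0.01):
    """phi: (K, V) pattern-failure probabilities; labels: V failure names."""
    table = {}
    for k, row in enumerate(phi):
        idx = np.argsort(row)[::-1]  # most probable failures first
        table[k] = [(labels[v], row[v]) for v in idx if row[v] > threshold]
    return table

phi = np.array([[0.60, 0.25, 0.10, 0.05],
                [0.05, 0.05, 0.45, 0.45]])  # toy 2-pattern model
labels = ["SF6 leakage", "Gas pressure meter",
          "Operating mechanism", "Travel switch"]
for k, fails in top_failures(phi, labels).items():
    print(k, [f"{name} ({p:.0%})" for name, p in fails])
```
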
Table 5. Top failures in each failure pattern.

1. Operation Error by Machinery Parts: Operating mechanism; Assistive component damage; High voltage indicating device; SF6 leakage; Electromotor stalling; Travel switch.
2. Operation Error by Driving System: Electromotor stalling; Travel switch; Relay; Reason unidentified on file; Electromotor; Safe-blocked circuit; Closing instruction; Operating mechanism.
3. Operation Error by Tripping Coils: High temperature; Tripping and closing coil; Secondary cubicle; Operating mechanism; Humidity ovenproof; Mechanism cubicle; Safe-blocked circuit; Travel switch.
4. Cubicles and Its Auxiliary System: Mechanism cubicle; Secondary cubicle; Insulator; Closing instructions; Auxiliary switch; Incomplete installation; High voltage indicating device; Operating mechanism.
5. SF6 Leakage: SF6 leakage; Mechanism cubicle; Gas pressure meter; SF6 constituents; Remote control signal.
6. Operation Error by Secondary System: Remote control signal; Poor contact; Misjudgment; Auxiliary switch; Rejecting action; Tripping and closing coil; Relay; Electromotor stalling.
7. Pneumatic Mechanism: Pneumatic mechanism leakage; Air compressor stalling; Air compressor leakage; Operating mechanism; SF6 leakage.
8. Measuring System: Closing instructions; High voltage indicating device; Operation counter; False wiring; Transmission bar; Gas pressure meter; SF6 constituents; Tripping and closing coil.
9. Secondary System: Contactor; Safe-blocked circuit; Air compressor stalling; Main circuit; High voltage indicating device; SF6 constituents; False wiring.
10. Hydraulic Mechanism: Hydraulic mechanism leakage; Closing instructions; SF6 leakage; Operating mechanism; Contactor.
Failure pattern 1 mainly contains the operating mechanism's own failures, while pattern 2 reveals the co-occurrence of the operating mechanism with the driving system. Analogously, pattern 3 and pattern 6 mainly focus on how the operation may be broken by the tripping coils and by secondary parts such as the remote control signal. Pattern 7 and pattern 10 cluster the failures of the pneumatic and hydraulic mechanisms together. The other patterns also show distinct features. Different failure patterns have their own emphases and overlaps. For example, though both contain secondary components, pattern 9 only considers their manufacturing quality, while pattern 6 emphasizes the interaction between the secondary components and the final operation.

5.2.2. Temporal Features of the Failure Patterns
The average value of θ in different time slices can be calculated as a function of time to show the average variation tendency of different failure patterns. As shown in Figure 12, the failure modes of the hydraulic mechanism, the pneumatic mechanism, and the cubicles increase with operating years, while the percentages of the measuring system and the tripping and closing coils decrease. The SF6 leakage and machinery failures always account for a large portion. The rise and fall of different failure patterns reflect the dynamic change of the device state, which is useful for targeted action scheduling.
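
A sketch of the averaging behind Figure 12 is given below; it assumes each failure record carries the pattern mixture θ inferred for it and the device age (in operation years) at logging time, and all names are illustrative.

```python
# Sketch: average the per-record pattern mixtures within yearly time slices
# to obtain curves like those plotted in Figure 12. Assumed array layout.
import numpy as np

def average_theta_by_year(thetas, ages, n_years):
    """thetas: (N, K) pattern mixtures; ages: (N,) operation years per record."""
    means = np.full((n_years, thetas.shape[1]), np.nan)
    for year in range(n_years):
        mask = ages == year
        if mask.any():
            means[year] = thetas[mask].mean(axis=0)  # mean mixture in the slice
    return means  # row y traces the average pattern weights in year y

# Toy usage: 4 records, 3 patterns, devices aged 0-1 operation years
thetas = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]])
print(average_theta_by_year(thetas, np.array([0, 0, 1, 1]), n_years=2))
```
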
Figure 12. Average time-varying dynamics of the extracted 10 failure patterns.

Additionally, the concentration can be placed on one sequence to determine how each event changes the mixture of the failure modes. Figure 13 shows the failure mode variation of the sample. At first, the SF6 leakage and the cubicle failures allocate a large portion to the corresponding modes. Then, the contactor failure increases the proportion of the secondary system failure pattern. Afterward, the operating mechanism creates a peak in the pattern of machinery parts. However, its share is quickly replaced by the failure mode of the tripping coils. This can be considered the model's self-correction to distinguish failures caused by the operating mechanism itself from those caused by its preorder system. At last, the remote control failure causes a portion shift from the failure mode of the secondary system to the operation error by the secondary system.
Figure 13. Time-varying dynamics of the failure patterns for an individual HVCB.
5.2.3. Similarities between Failures

The similarities between different failures, computed according to Equation (18), are shown in Figure 14. A wealth of associations can be extracted when this map is combined with knowledge of the equipment structure. In general, the failures with high similarities can be classified into four types.

Figure 14. Similarity map for all the failures in the real-world dataset.
The first type is the causal relationship, where the occurrence of one failure is caused by another. For example, according to the similarity map, the failure of a rejecting action may be caused by the remote control signal, safe-blocked circuit, auxiliary switch, SF6 constituents, and humidity ovenproof, all of which may cause blocking. The second type is wrong logging. Failures with wrong logging relationships often occur in a functional chain, which facilitates wrong error location. The similarity between electromotor stalling and relay or travel switch failures, and the similarity between the secondary cubicle and the tripping coil, may belong to this type. The third type is common cause failures, where the failures are caused by similar reasons, such as the similarities among the measurement instruments, including the closing instructions, the high voltage indicating device, the operation counters, and the gas pressure meter. The strong association between the secondary cubicle and the mechanism cubicle may be caused by deficient sealing, and a bad choice of motors assigns a high similarity between the electromotor and the oil pump. The fourth type is relation transmission, where similarities are built on indirect associations. For example, the transmission bar has a direct connection to the operation counter, and the counter shares a similar aging reason with the other measurement instruments, making the transmission bar similar to the high voltage indicating device and the gas pressure meter. Likewise, the safe-blocked circuit may act as the medium between the air compressor stalling and the SF6 constituents.

This similarity map may help establish a failure look-up table for fast failure reason analysis and location.
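
Equation (18) itself is given earlier in the paper; as one plausible reading, the sketch below scores two failures by the cosine similarity of their loading vectors across the K failure patterns, which yields a symmetric map like the one in Figure 14.

```python
# Sketch: pairwise failure similarity from an assumed (K, V) pattern-failure
# matrix phi, using cosine similarity between the V column vectors.
import numpy as np

def failure_similarity(phi):
    """phi: (K, V) pattern-failure matrix; returns a (V, V) similarity map."""
    cols = phi / np.linalg.norm(phi, axis=0, keepdims=True)  # unit columns
    return cols.T @ cols  # cosine similarity between every failure pair

phi = np.array([[0.6, 0.3, 0.1],
                [0.1, 0.2, 0.7]])  # toy: 2 patterns x 3 failures
print(np.round(failure_similarity(phi), 2))
```
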
6. Conclusions and Future Work
In this paper, the event logs in a power grid were considered a promising data source for predicting future critical events and extracting the latent failure patterns. A TLDA framework is presented as an extension of the topic model, introducing a failure pattern layer as the medium between the failure sequences and the failures. The conjugate relation between the multinomial distribution and the Dirichlet distribution is embedded into the framework for better generalization. Using a mixture of hidden variables for the failure representation not only enables pattern mining from sparse data but also enables the establishment of quantitative relationships among failures. Furthermore, a simple but effective temporal new co-occurrence pattern was established to introduce a strict chronological order of events into the originally exchangeable Bayesian framework. The effectiveness of the proposed method was verified on thousands of real-world failure records of HVCBs from both quantitative and qualitative perspectives. The Top-1, Top-5, and Top-10 results revealed that the proposed method outperformed the existing methods in predicting potential failures before they occurred. The parameter analysis showed different parameter preferences for higher confidence or wider coverage. By visualizing the temporal structures of the failure patterns, the TLDA showed its ability to extract meaningful semantic characteristics, providing insight into the time variation and interaction of failures.
As future work, experiments can be conducted in other application areas. Furthermore, as the TLDA is a branch of the state space models, using the trained TLDA embeddings as inputs to a Recurrent Neural Network may provide better results.
Acknowledgments: This work was supported by the National Natural Science Foundation of China (No. 51521065).
Author Contributions: Gaoyang Li and Xiaohua Wang conceived and designed the experiments; Mingzhe Rong
provided theoretical guidance and supported the study; Kang Yang contributed analysis tools; Gaoyang Li wrote
the paper; Aijun Yang revised the contents and reviewed the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Hale, P.S.; Arno, R.G. Survey of reliability and availability information for power distribution, power generation,
and HVAC components for commercial, industrial, and utility installations. IEEE Trans. Ind. Appl. 2001, 37,
191–196. [CrossRef]
2. Lindquist, T.M.; Bertling, L.; Eriksson, R. Circuit breaker failure data and reliability modelling. IET Gener.
Transm. Distrib. 2008, 2, 813–820. [CrossRef]
3. Janssen, A.; Makareinis, D.; Solver, C.E. International Surveys on Circuit-Breaker Reliability Data for
Substation and System Studies. IEEE Trans. Power Deliv. 2014, 29, 808–814. [CrossRef]
4. Pitz, V.; Weber, T. Forecasting of circuit-breaker behaviour in high-voltage electrical power systems: Necessity
for future maintenance management. J. Intell. Robot. Syst. 2001, 31, 223–228. [CrossRef]
5. Jardine, A.K.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing
condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510. [CrossRef]
6. Liu, H.; Wang, Y.; Yang, Y.; Liao, R.; Geng, Y.; Zhou, L. A Failure Probability Calculation Method for Power
Equipment Based on Multi-Characteristic Parameters. Energies 2017, 10, 704. [CrossRef]
7. Peng, Y.; Dong, M.; Zuo, M.J. Current status of machine prognostics in condition-based maintenance:
A review. Int. J. Adv. Manuf. Technol. 2010, 50, 297–313. [CrossRef]
8. Rong, M.; Wang, X.; Yang, W.; Jia, S. Mechanical condition recognition of medium-voltage vacuum circuit
breaker based on mechanism dynamic features simulation and ANN. IEEE Trans. Power Deliv. 2005, 20,
1904–1909. [CrossRef]
9. Rusek, B.; Balzer, G.; Holstein, M.; Claessens, M.S. Timings of high voltage circuit-breaker. Electr. Power
Syst. Res. 2008, 78, 2011–2016. [CrossRef]
10. Natti, S.; Kezunovic, M. Assessing circuit breaker performance using condition-based data and Bayesian
approach. Electr. Power Syst. Res. 2011, 81, 1796–1804. [CrossRef]
11. Cheng, T.; Gao, W.; Liu, W.; Li, R. Evaluation method of contact erosion for high voltage SF6 circuit breakers
using dynamic contact resistance measurement. Electr. Power Syst. Res. 2017. [CrossRef]
12. Tang, J.; Jin, M.; Zeng, F.; Zhou, S.; Zhang, X.; Yang, Y.; Ma, Y. Feature Selection for Partial Discharge Severity
Assessment in Gas-Insulated Switchgear Based on Minimum Redundancy and Maximum Relevance. Energies
2017, 10, 1516. [CrossRef]
13. Gao, W.; Zhao, D.; Ding, D.; Yao, S.; Zhao, Y.; Liu, W. Investigation of frequency characteristics of typical PD
and the propagation properties in GIS. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 1654–1662. [CrossRef]
14. Yang, D.; Tang, J.; Yang, X.; Li, K.; Zeng, F.; Yao, Q.; Miao, Y.; Chen, L. Correlation Characteristics Comparison
of SF6 Decomposition versus Gas Pressure under Negative DC Partial Discharge Initiated by Two Typical
Defects. Energies 2017, 10, 1085. [CrossRef]
15. Huang, N.; Fang, L.; Cai, G.; Xu, D.; Chen, H.; Nie, Y. Mechanical Fault Diagnosis of High Voltage Circuit
Breakers with Unknown Fault Type Using Hybrid Classifier Based on LMD and Time Segmentation Energy
Entropy. Entropy 2016, 18, 322. [CrossRef]
16. Wang, Z.; Jones, G.R.; Spencer, J.W.; Wang, X.; Rong, M. Spectroscopic On-Line Monitoring of Cu/W Contacts
Erosion in HVCBs Using Optical-Fibre Based Sensor and Chromatic Methodology. Sensors 2017, 17, 519.
[CrossRef] [PubMed]
17. Tang, J.; Zhuo, R.; Wang, D.; Wu, J.; Zhang, X. Application of SA-SVM incremental algorithm in GIS PD
pattern recognition. J. Electr. Eng. Technol. 2016, 11, 192–199. [CrossRef]
18. Liao, R.; Zheng, H.; Grzybowski, S.; Yang, L.; Zhang, Y.; Liao, Y. An integrated decision-making model for
condition assessment of power transformers using fuzzy approach and evidential reasoning. IEEE Trans.
Power Deliv. 2011, 26, 1111–1118. [CrossRef]
19. Jiang, T.; Li, J.; Zheng, Y.; Sun, C. Improved bagging algorithm for pattern recognition in UHF signals of
partial discharges. Energies 2011, 4, 1087–1101. [CrossRef]
20. Mazza, G.; Michaca, R. The first international enquiry on circuit-breaker failures and defects in service.
Electra 1981, 79, 21–91.
21. International Conference on Large High Voltage Electric Systems; Study Committee 13 (Switching
Equipment); Working Group 06 (Reliability of HV circuit breakers). Final Report of the Second International
Enquiry on High Voltage Circuit-Breaker Failures and Defects in Service; CIGRE: Paris, France, 1994.
22. Ejnar, S.C.; Antonio, C.; Manuel, C.; Hiroshi, F.; Wolfgang, G.; Antoni, H.; Dagmar, K.; Johan, K.; Mathias, K.;
Dirk, M. Final Report of the 2004–2007 International Enquiry on Reliability of High Voltage Equipment;
Part 2—Reliability of High Voltage SF6 Circuit Breakers. Electra 2012, 16, 49–53.
23. Boudreau, J.F.; Poirier, S. End-of-life assessment of electric power equipment allowing for non-constant
hazard rate—Application to circuit breakers. Int. J. Electr. Power Energy Syst. 2014, 62, 556–561. [CrossRef]
24. Salfner, F.; Lenk, M.; Malek, M. A survey of online failure prediction methods. ACM Comput. Surv. 2010, 42.
[CrossRef]
25. Fu, X.; Ren, R.; Zhan, J.; Zhou, W.; Jia, Z.; Lu, G. LogMaster: Mining Event Correlations in Logs of Large-Scale
Cluster Systems. In Proceedings of the 2012 IEEE 31st Symposium on Reliable Distributed Systems (SRDS),
Irvine, CA, USA, 8–11 October 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 71–80.
26. Gainaru, A.; Cappello, F.; Fullop, J.; Trausan-Matu, S.; Kramer, W. Adaptive event prediction strategy with
dynamic time window for large-scale hpc systems. In Proceedings of the Managing Large-Scale Systems
via the Analysis of System Logs and the Application of Machine Learning Techniques, Cascais, Portugal,
23–26 October 2011; ACM: New York, NY, USA, 2011; p. 4.
27. Wang, F.; Lee, N.; Hu, J.; Sun, J.; Ebadollahi, S.; Laine, A.F. A framework for mining signatures from event
sequences and its applications in healthcare data. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 272–285.
[CrossRef] [PubMed]
28. Macfadyen, L.P.; Dawson, S. Mining LMS data to develop an “early warning system” for educators: A proof
of concept. Comput. Educ. 2010, 54, 588–599. [CrossRef]
29. Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases.
In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington,
DC, USA, 25–28 May 1993; ACM: New York, NY, USA, 1993; pp. 207–216.
30. Li, Z.; Zhou, S.; Choubey, S.; Sievenpiper, C. Failure event prediction using the Cox proportional hazard
model driven by frequent failure signatures. IIE Trans. 2007, 39, 303–315. [CrossRef]
31. Fronza, I.; Sillitti, A.; Succi, G.; Terho, M.; Vlasenko, J. Failure prediction based on log files using random
indexing and support vector machines. J. Syst. Softw. 2013, 86, 2–11. [CrossRef]
32. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space.
Comput. Sci. 2013.
33. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
34. Guo, C.; Li, G.; Zhang, H.; Ju, X.; Zhang, Y.; Wang, X. Defect distribution prognosis of high voltage circuit
breakers with enhanced latent Dirichlet allocation. In Proceedings of the Prognostics and System Health
Management Conference (PHM-Harbin), Harbin, China, 9–12 July 2017; IEEE: Piscataway, NJ, USA, 2017;
pp. 1–7.
35. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. Adv. Neural Inf.
Process. Syst. 2014, 4, 3104–3112.
36. Anderson, C. The Long Tail: Why the Future of Business Is Selling Less of More; Hachette Books: New York, NY,
USA, 2006.
37. Pinoli, P.; Chicco, D.; Masseroli, M. Latent Dirichlet allocation based on Gibbs sampling for gene function
prediction. In Proceedings of the 2014 IEEE Conference on Computational Intelligence in Bioinformatics and
Computational Biology, Honolulu, HI, USA, 21–24 May 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–8.
38. Wang, X.; Grimson, E. Spatial Latent Dirichlet Allocation. In Proceedings of the Conference on Neural
Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 1577–1584.
39. Maskeri, G.; Sarkar, S.; Heafield, K. Mining business topics in source code using latent dirichlet allocation.
In Proceedings of the India Software Engineering Conference, Hyderabad, India, 9–12 February 2008;
pp. 113–120.
40. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA,
1983; pp. 392–396.
41. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42,
30–37. [CrossRef]
42. Blei, D.M.; Lafferty, J.D. Dynamic topic models. In Proceedings of the 23rd International Conference on
Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; ACM: New York, NY, USA, 2006; pp. 113–120.
43. Du, L.; Buntine, W.; Jin, H.; Chen, C. Sequential latent Dirichlet allocation. Knowl. Inf. Syst. 2012, 31, 475–503.
[CrossRef]
44. Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating collaborative filtering recommender
systems. ACM Trans. Inf. Syst. 2004, 22, 5–53. [CrossRef]
45. Wei, X.; Croft, W.B. LDA-based document models for ad-hoc retrieval. In Proceedings of the 29th Annual
International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA,
USA, 6–10 August 2006; ACM: New York, NY, USA, 2006; pp. 178–185.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).