

I.A. Karimi and Rajagopalan Srinivasan (Editors), Proceedings of the 11th International
Symposium on Process Systems Engineering, 15-19 July 2012, Singapore
© 2012 Elsevier B.V. All rights reserved.

Fault Diagnosis based on DPCA and CA


Celina Rea, Ruben Morales-Menendez ∗, Juan C. Tudón Martínez,
Ricardo A. Ramírez Mendoza, Luis E. Garza Castañon
Tecnológico de Monterrey, Campus Monterrey, Av. Eugenio Garza Sada 2501, Col. Tecnológico, 64849 Monterrey NL, México
∗ rmm@itesm.mx

Abstract
A comparison of two fault detection methods based on process history data is presented.
The selected methods are Dynamic Principal Component Analysis (DPCA) and Correspondence Analysis (CA). The study is validated with experimental databases taken from an industrial process. The performance of the methods is compared using the Receiver Operating Characteristics (ROC) graph with respect to several tuning parameters. The diagnosis step for both methods was implemented through contribution plots. The effects of each parameter are discussed and some guidelines for using these methods are proposed.

1. Motivation
Industrial processes have grown in integration and complexity; monitoring by humans alone is risky and sometimes impossible. Faults are always present, and early Fault Detection and Isolation (FDI) systems can help operators avoid the progression of abnormal events. DPCA and CA are two techniques based on statistical models learned from experimental data that can be used for fault diagnosis, Detroja et al. (2006b). These approaches are well known in some domains, but several questions remain regarding their use for fault diagnosis. A ROC graph is a technique for visualizing, organizing and selecting classifiers based on their performance, and it has been extended for use in diagnostic systems. In the published research, the number of data for learning the model, the sampling rate, the number of principal components/axes and the thresholds have not been studied on the same experimental databases.

2. Fundamentals
A brief comparative review of both the DPCA and CA approaches is presented, focusing on modelling, detection and diagnosis.
Modeling for both DPCA and CA. Both methods need a statistical model of the process under normal operating conditions. The data set needs to be scaled to zero mean and unit variance. CA requires $n_t$ observations of $p$ variables in the form $X(t) = [X_1(t) \ldots X_p(t)]_{(n_t \times p)}$, while DPCA additionally includes some past observations (i.e. a $w$-time delay), $X(t) = [X_1(t) \ldots X_1(t-w) \ldots X_p(t) \ldots X_p(t-w)]_{(n_t \times (p\,[w+1]))}$. Based on the $X(t)$ matrix, two subspaces are built. The Principal Subspace captures major faults in the process, and the Residual Subspace considers minor faults and correlation rupture. A SCREE test determines the number of principal and residual components (axes). By plotting the eigenvalues (singular values) of $X(t)$ for DPCA (CA), the principal components (axes) are the first $k$ components (axes) before the inflection point, while
the remainders are the residual components (axes). Even so, the DPCA and CA models cannot be directly compared. The representation given by CA appears better able to capture inter-relationships between variables and samples, Detroja et al. (2006a). Fig. 1 (left) summarizes the model's learning for both methods.

Figure 1. Model’s learning algorithm (left) and online detection (right) for DPCA & CA.

Detection for DPCA. The variables must be normalized, $X_S$. From the correlation matrix $R$, the eigenvalues $\lambda_i$ and eigenvectors $V_i$ are obtained, and the eigenvalues are organized in decreasing order. The eigenvectors form two matrices: $V_{[1,k]}$, which is the principal component transformation, and $V_{[k+1,\,p[w+1]]}$, known as the residual transformation matrix. The projections onto the subspaces are based on a linear transformation with no correlation between them. Mapping from the multivariate space to a scalar demands two statistics: the Hotelling $T^2$ and the $Q$ statistic. $T^2$ measures the deviation of the variables in a data set from their mean values. The Hotelling $T^2$ chart is based on the concept of statistical distance and monitors changes in the mean vector; the $T^2$ statistic is obtained as $T^2_{X_{S_i}}$, and a normal operating limit can be established as $T^2_\alpha = k(n_t - 1)F_\alpha(k, n_t - k)/(n_t - k)$, where $F_\alpha$ is the $F$-distribution with $k$ and $n_t - k$ degrees of freedom and $\alpha = 0.95$. If $T^2 > T^2_\alpha$ a fault is detected. The $Q$ statistic detects changes in the residual directions; it is obtained as $Q_{X_{S_i}} = X_{S_i} G X_{S_i}^T$. The threshold for normal operating conditions is given by $Q_\alpha = \theta_1 \left[ h_0 C_\alpha \sqrt{2\theta_2}/\theta_1 + 1 + \theta_2 h_0 (h_0 - 1)/\theta_1^2 \right]^{1/h_0}$. If $Q_{X_{S_i}} > Q_\alpha$ a fault is detected.
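
The DPCA detection computations above can be outlined as follows (a hedged sketch rather than the authors' implementation: the residual form of $Q$ used here is equivalent to $X_{S_i} G X_{S_i}^T$ with $G = I - V_{[1,k]} V_{[1,k]}^T$, and the thresholds are the standard $T^2$ and $Q$ limits cited in the text).

```python
# Sketch of DPCA detection: eigendecomposition of R, T^2 and Q statistics
# with their control limits. Names and synthetic data are illustrative.
import numpy as np
from scipy import stats

def dpca_model(Xs, k):
    R = np.corrcoef(Xs, rowvar=False)
    lam, V = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]            # eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    return lam, V[:, :k], V[:, k:]           # eigenvalues, principal, residual loadings

def t2_statistic(x, Vk, lam_k):
    t = Vk.T @ x                              # scores on the principal components
    return float(t @ np.diag(1.0 / lam_k) @ t)

def q_statistic(x, Vk):
    r = x - Vk @ (Vk.T @ x)                   # residual after projection
    return float(r @ r)

def t2_limit(k, nt, alpha=0.95):
    return k * (nt - 1) / (nt - k) * stats.f.ppf(alpha, k, nt - k)

def q_limit(lam_res, alpha=0.95):
    th1, th2, th3 = lam_res.sum(), (lam_res**2).sum(), (lam_res**3).sum()
    h0 = 1 - 2 * th1 * th3 / (3 * th2**2)
    c = stats.norm.ppf(alpha)
    return th1 * (c * h0 * np.sqrt(2 * th2) / th1 + 1
                  + th2 * h0 * (h0 - 1) / th1**2) ** (1 / h0)

# Example on synthetic data standing in for the lagged matrix X(t)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 12))
lam, Vk, Vr = dpca_model(X_train, k=5)
x = X_train[0]
fault = (t2_statistic(x, Vk, lam[:5]) > t2_limit(5, len(X_train)) or
         q_statistic(x, Vk) > q_limit(lam[5:]))
print(fault)
```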
Detection for CA. CA is posed as an optimization problem with $P = D_r^{-1/2}\,[(1/g)X - rc^T]\,D_c^{-1/2}$ and $A D_\mu B^T = SVD(P)$. For choosing the principal and residual axes, a SCREE plot can be made with the singular values obtained in $D_\mu$; Greenacre's criterion was used for the selection of the $k$ principal axes. The coordinates of the row and column profile points on the new principal axes can be computed by projecting on $A$ and $B$, with only the first $k$ columns retained, using $F = D_r^{-1/2} A D_\mu$ and $G = D_c^{-1/2} B D_\mu$. The matrix $F$ gives the new row coordinates for the row cloud. Using a new measurement vector $X = [X_1 \ldots X_p]^T$, the row sum is $r = \sum_{i=1}^{p} X_i$ and the new row score is $f = [r^{-1} x^T G D_\mu^{-1}]^T$. The $T^2$ statistic for CA is defined as $T_i^2 = f^T D_\mu^{-2} f$, where $D_\mu$ contains the first $k$ largest singular values. The threshold is computed through $T^2_\alpha$. For the residual axes, the $Q$ statistic

allows any significant deviation to be detected. Considering $Q_i = Res^T Res$, the control limit is $Q_\alpha = \mu_Q \pm C_\alpha\,\sigma_Q$, where $C_\alpha$ is a 95% confidence limit according to the normal distribution $N(\mu_Q, \sigma_Q)$. Fig. 1 (right) shows the scheme for online detection.
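
A corresponding sketch for the CA detection step is given below (again illustrative, not the authors' code; it assumes a non-negative data matrix, since the row and column masses behind $P$ require non-negative entries, and the helper names are invented for the example).

```python
# Sketch of the CA model and the CA-based T^2 for a new measurement.
import numpy as np

def ca_model(X, k):
    """Learn the CA representation from a non-negative normal-operation matrix X."""
    g = X.sum()
    r = X.sum(axis=1) / g                     # row masses
    c = X.sum(axis=0) / g                     # column masses
    Dr_inv = np.diag(1.0 / np.sqrt(r))
    Dc_inv = np.diag(1.0 / np.sqrt(c))
    P = Dr_inv @ (X / g - np.outer(r, c)) @ Dc_inv
    A, mu, Bt = np.linalg.svd(P, full_matrices=False)
    mu, B = mu[:k], Bt.T[:, :k]               # first k singular values / axes
    G = Dc_inv @ B @ np.diag(mu)              # column coordinates, G = Dc^-1/2 B Dmu
    return G, mu

def ca_t2(x, G, mu):
    """Row score f = [r^-1 x^T G Dmu^-1]^T and the CA statistic T^2 = f^T Dmu^-2 f."""
    f = (x @ G @ np.diag(1.0 / mu)) / x.sum()
    return float(f @ np.diag(1.0 / mu ** 2) @ f)

# Illustrative use on synthetic non-negative data (4 variables, 500 samples)
rng = np.random.default_rng(1)
X_train = rng.uniform(0.1, 1.0, size=(500, 4))
G, mu = ca_model(X_train, k=3)
print(ca_t2(X_train[0], G, mu))
```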
Diagnosis for both DPCA and CA. The $T^2$ statistic is a single value calculated from all variables; it shows the changes that occurred in all of them, but the $T^2$ chart does not indicate which variable or variables are faulty. Miller et al. (1998) introduced a method for determining the contribution of each of the $p$ variables to the $T^2$ computation. Scores of the principal components are used for monitoring in the $T^2$ chart; upon a faulty signal, the contribution of each variable to the normalized scores is computed first, and then the total contributions of each variable are determined and plotted. The plot that shows the contribution of each variable to the $T^2$ chart at time $k$ is called the variable contribution plot, Kosebalaban and Cinar (2001). When the $T^2$ value at time $k$ is above the upper control limit, the variable contributions are calculated and plotted to diagnose the variable(s) that caused the fault alarm in the multivariate $T^2$ chart. The variables with the highest contributions are isolated as the most probable fault: $Cont_i = Z_i^2 / \sum_{j=1}^{n_t} Z_j^2$, where $Z = R\,V_r$ is a generated residual for DPCA. The CA method cannot diagnose the fault by itself either; for this purpose, a contribution plot based on the residuals of the $Q$ statistic is used, $Z = Res$. The contribution plots can be drawn over time when a fault alarm is raised; the change in the contribution of the process variables gives more information about the root cause of the faulty condition during that time period.
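
The contribution-plot computation reduces to normalizing a residual vector, as in the following illustrative helper (the residual $Z$ would come from $Z = R\,V_r$ for DPCA or from the $Q$-statistic residuals for CA; the sensor names and the numbers below are made up).

```python
# Sketch of contribution-based isolation: Cont_i = Z_i^2 / sum_j Z_j^2.
import numpy as np

def contributions(Z):
    """Relative contribution of each variable to the residual vector Z."""
    Z2 = np.asarray(Z, dtype=float) ** 2
    return Z2 / Z2.sum()

def isolate(Z, names, top=2):
    """Return the `top` variables with the largest relative contribution."""
    cont = contributions(Z)
    order = np.argsort(cont)[::-1][:top]
    return [(names[i], float(cont[i])) for i in order]

# Illustrative residual: the second sensor dominates the contribution plot
Z = np.array([0.1, 2.3, 0.2, 0.4])
print(isolate(Z, ["FT1", "FT2", "TT1", "TT2"]))
```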

3. Experimental Setup
An industrial Heat Exchanger (HE) was the test bed for this research. The HE uses steam vapor for heating water. The operating point is 70% of steam vapor flow ($FT_2$), 38% of water flow ($FT_1$), and an input water temperature ($TT_1$) of 23 °C, which gives an outlet water temperature ($TT_2$) of 34 °C. More than 20 experimental tests were done. For each sensor, abrupt faults implemented as a bias of $\{\pm 5, \pm 6, \pm 8\}\,\sigma$ of the signal were applied, Prakash et al. (2002). Also, the number of data points was varied over 200, 500, ..., 5000, and the sampling rate over 1, 2, ..., 10 s.
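
A small sketch of how such abrupt sensor faults can be reproduced on recorded data is shown below (illustrative only; the bias is added to one sensor from a chosen sample onward, scaled by that signal's standard deviation).

```python
# Sketch of injecting an abrupt bias fault of +/- {5, 6, 8} sigma into one sensor.
import numpy as np

def inject_bias(X, sensor, start, magnitude_sigma=5.0, sign=+1):
    """Add an abrupt bias of magnitude_sigma * std(signal) to one sensor column."""
    Xf = X.copy()
    sigma = X[:, sensor].std(ddof=1)
    Xf[start:, sensor] += sign * magnitude_sigma * sigma
    return Xf

# Illustrative use: -5 sigma bias on the fourth sensor from sample 150 onward
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
X_faulty = inject_bias(X, sensor=3, start=150, magnitude_sigma=5, sign=-1)
```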

4. Analysis of the Tuning Parameters


Number of Principal Components (PC)/axes. Data compression is an important aspect of multivariate statistical tools. The SCREE plot shows the percentage of accumulated variation versus the number of components/axes; the point at which the slope of the plot stops changing defines the number of PC/axes. Figure 2 (A) shows the percentage of accumulated variation versus the number of components for DPCA, and Fig. 2 (B) shows it versus the number of axes for CA. Each plot includes databases with 100, 200, 500 and 1,000 data vectors. For DPCA, 5 PC describe 92-94 % of the information; 6 or more components do not contribute significantly, and the number of data vectors of the process variables does not affect the curve behaviour. For CA the number of data does affect the curve: with 1 or 2 axes, 100-200 data vectors give a lower percentage of accumulated variation than 500-1,000 data vectors; beyond 3 axes there is no difference. According to Greenacre's criterion, Greenacre and Blasius (1995), three axes are recommended.
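
The two selection rules can be sketched as follows (the SCREE-style rule keeps the first components reaching a target share of accumulated variation; the Greenacre-style rule is coded under its commonly stated reading, keeping axes whose inertia exceeds the average inertia, which is an assumption about how the criterion was applied here).

```python
# Sketch of component/axis selection: SCREE-style cumulative variation and a
# Greenacre-style "above average inertia" rule (assumed reading of the criterion).
import numpy as np

def scree_select(values, target=0.92):
    """values: eigenvalues (DPCA) or axis inertias (CA) in decreasing order."""
    share = np.cumsum(values) / np.sum(values)
    return int(min(np.searchsorted(share, target) + 1, len(values)))

def greenacre_select(inertias):
    """Keep the axes whose inertia exceeds the average inertia."""
    inertias = np.asarray(inertias)
    return int(np.sum(inertias > inertias.mean()))

# Illustrative inertias for five axes
inertias = np.array([0.46, 0.28, 0.14, 0.07, 0.05])
print(scree_select(inertias, target=0.9), greenacre_select(inertias))
```
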
Thresholds. A False Negative (FN) occurs when there is a fault but it is not detected, while a False Positive (FP) occurs when there is no fault but one is detected. A True Positive (TP) is when there is a fault and it is detected, while a True Negative (TN) is when there is no fault and no fault is detected.

Figure 2. Average accumulated variation versus number of components for DPCA (A), and versus number of axes for CA (B). ROC curves for the outlet water temperature ($TT_2$) transmitter using DPCA (C) and using CA (D). Frequency of errors for different sampling times (E) and for different numbers of training data (F).

The sum of TP and FN gives the total faulty cases (TFC), and the sum of TN and FP gives the total healthy cases (THC). The probability of detection is $P_d = TP/TFC$, while the probability of false alarms is $P_{fa} = FP/THC$. For every condition, when $[(P_d > 0.9)$ and $(P_{fa} < 0.1)]$, an optimum is achieved as a good compromise between correct detection and a minimum of false alarms. ROC curves show the relation between the detection probability $P_d$ and the false alarm probability $P_{fa}$. Table 1 summarizes the performance under the $[(P_d > 0.9)$ and $(P_{fa} < 0.1)]$ criterion for the faulty sensors. DPCA shows successful results, while CA exhibits some trouble, mainly with the $Q$ statistic. Fig. 2 (C and D) shows ROC curves where the thresholds were modified from -30% to +30%. Fig. 2 (C) shows that DPCA is successful for both statistics; however, Fig. 2 (D) shows that for CA only the $T^2$ statistic works well. The $Q$ statistic for CA is based only on the residual axes, which do not capture the variation of the residual per axis, Table 1.
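
For completeness, a small illustrative helper (not from the paper) shows how the $P_d$ and $P_{fa}$ values behind each ROC point can be computed from labelled detection outcomes, together with the acceptance criterion used here.

```python
# Sketch of the ROC point (Pd, Pfa) from detector output vs. ground truth labels.
import numpy as np

def roc_point(alarm, faulty):
    """alarm, faulty: boolean arrays (detector output, ground truth) per sample."""
    alarm, faulty = np.asarray(alarm), np.asarray(faulty)
    tp = np.sum(alarm & faulty)
    fn = np.sum(~alarm & faulty)
    fp = np.sum(alarm & ~faulty)
    tn = np.sum(~alarm & ~faulty)
    pd = tp / (tp + fn)             # TP / total faulty cases (TFC)
    pfa = fp / (fp + tn)            # FP / total healthy cases (THC)
    return pd, pfa

def acceptable(pd, pfa):
    """Criterion used in the paper: Pd > 0.9 and Pfa < 0.1."""
    return pd > 0.9 and pfa < 0.1

# Illustrative labels for six samples
alarm  = [False, True, True, True, False, True]
faulty = [False, False, True, True, True, True]
pd, pfa = roc_point(alarm, faulty)
print(round(pd, 2), round(pfa, 2), acceptable(pd, pfa))
```
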
Sampling rate. The process variables were sampled at 1, 2, 5, 8 or 10 s. Fig. 2 (E) shows the number of times the $[(P_d > 0.9)$ and $(P_{fa} < 0.1)]$ criterion was violated by each statistic at different sampling rates. Both methods perform well when the sampling rate is 1 or 2 s; however, the probability of missed detections or false alarms grows for sampling times greater than 5 s. The $Q$ statistic for CA was excluded because of its low performance.
Number of Training Data. Fig. 2 (F) shows the number of times the $[(P_d > 0.9)$ and $(P_{fa} < 0.1)]$ criterion was violated by each statistic as a function of the number of training data. The number of data is a key issue for learning a statistical model. After 1,500 training data (25 min at a 1 s sampling rate) both methods improve their performance.

Table 1. Average performance for different sensors based on ROC curves.

Sensor/Transmitter   Number of Tests   DPCA-T² (%)   DPCA-Q (%)   CA-T² (%)   CA-Q (%)
TT2                  11                100           100           100         9.09
FT1                  15                100           86.67         40          6.67
FT2                  12                91.67         100           91.67       91.67
TT1                  11                81.82         90.91         81.82       0

Diagnosis. DPCA performs well for the 4 faulty sensors, while CA shows 90.9% performance when a fault occurs in $TT_2$ and 46.6% if the fault is in $FT_1$. The contributions of the 4 variables were compared for the same statistic, and the variables with relatively large contributions were chosen as the possible causes of the faults. Instead of comparing the absolute contribution with the corresponding control limit, using the relative contribution is a more convenient way to identify faulty variables, Choi and Lee (2005).

5. Conclusions
Given experimental data, a multivariate statistical model can be learned. The SCREE plot and Greenacre's criterion guide the complexity of the model through the minimum number of components/axes. The statistical model for CA is more sensitive to the number of data than the model for DPCA, but once the minimum number of components/axes is defined, the number of data has no influence. ROC graphs are a useful tool for visualizing and evaluating fault detection algorithms. A detection probability $(P_d > 0.9)$ combined with a false alarm probability $(P_{fa} < 0.1)$ is a good criterion for choosing the sampling rate, the number of data and the thresholds. The sampling rate was the most important parameter. Based on the ROC graphs, DPCA outperforms CA in fault detection, because the $Q$ statistic of CA does not work well. Diagnosis could be implemented in both methods through contribution plots.

References
Choi, S., Lee, I., 2005. Multiblock PLS-based Localized Process Diagnosis. J. of Process Control 15 (3), 295–306.
Detroja, K., Gudi, R., Patwardhan, S., 2006a. Fault Diagnosis using Correspondence Analysis: Implementation Issues and Analysis. In: IEEE Int. Conf. on Ind. Tech. pp. 1374–1379.
Detroja, K., Gudi, R., Patwardhan, S., Roy, K., 2006b. Fault Detection and Isolation Using Correspondence Analysis. Ind. Eng. Chem. Res. 45, 223–235.
Greenacre, M., Blasius, J., 1995. Correspondence Analysis in the Social Sciences. Academic Press.
Kosebalaban, F., Cinar, A., 2001. Integration of Multivariate SPM and FDD by Parity Space Technique for a Food Pasteurization Process. Computers and Chemical Eng. 25, 473–491.
Miller, P., Swanson, R., Heckler, C., 1998. Contribution Plots: A Missing Link in Multivariate Quality Control. Appl. Math. and Comp. 8, 775–792.
Prakash, J., Patwardhan, S., Narasimhan, S., 2002. A Supervisory Approach to Fault-Tolerant Control of Linear Multivariable Systems. Ind. Eng. Chem. Res. 41, 2270–2281.
