2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics

Multiple Cycles of Time Series Anomaly Detection Algorithm Based on Wavelet Analysis

Danbo Chen
College of Computer and Information
Hohai University
Nanjing, China
e-mail: via_chan@126.com

Xiaofeng Zhou
College of Computer and Information
Hohai University
Nanjing, China
e-mail: zhouxf@hhu.edu.cn

978-1-4799-8646-0/15 $31.00 © 2015 IEEE
DOI 10.1109/IHMSC.2015.172

Abstract—Hydrological time series have distinctive features: they contain trends, jumps and cycles, and they combine deterministic and random behavior. To exploit these features, this paper uses wavelet analysis to identify the main cycle and the hidden cycles of a series, and then applies a prediction-based sliding window method within each cycle to detect anomalies. The method is verified on measured data. The experimental results show that the proposed multi-cycle time series anomaly detection algorithm based on wavelet analysis can effectively detect anomalies in hydrological time series data.

Keywords-hydrological time series; period; sliding window method; wavelet analysis; anomaly detection

I. RESEARCH STATUS

Hydrological phenomena vary over time, and this time-varying behavior is called the hydrological process. The hydrological process is affected both by deterministic factors and by many stochastic factors [1], so hydrological time series mainly consist of deterministic and random components. The purpose of anomaly detection in hydrological time series is to further improve the quality of hydrological data acquisition.

Abnormal data are data that deviate so strongly from most of the data that they are suspected to be generated by a different mechanism rather than by random deviation. According to the form of the anomaly and the research aim, outlier detection in time series can be divided into point anomaly detection, sub-sequence anomaly detection, anomaly intrusion detection and sequence anomaly detection.

The main existing algorithms for time series anomaly detection are window-based methods [2], clustering-based methods [3], distance-based methods [4], density-based methods [5], methods based on support vector machines [6] and wavelet-based methods [7]. When applied to time series data, these methods are largely designed for single-cycle detection. Hydrological time series, however, have distinctive characteristics and multiple cycles [8], arising from the earth's revolution around the sun and its rotation, together with the influence of geology, geography and human activities. Single-cycle anomaly detection therefore hardly exploits the multi-period character of hydrological data, and may miss anomalies that only become apparent when multiple cycles are considered.

This paper performs anomaly analysis of hydrological time series based on wavelets, which can better reveal the changeable periods hidden in the data. Accordingly, we first detect the cycles with wavelet analysis, and then apply the prediction-based sliding window method within each cycle for further detection. In this way, anomaly detection exploits the multi-cycle characteristics of hydrological time series data.

II. ANOMALY DETECTION BASED ON WAVELET ANALYSIS IN MULTI-PERIODIC TIME SERIES

A. The theory of wavelet analysis [9-10]

1) Wavelet function: The basic idea of wavelet analysis is to represent or approximate a signal or function by a family of wavelets, so the wavelet function is the key element of the analysis. A wavelet is an oscillating function that decays quickly to zero; that is, a wavelet function $\psi(t) \in L^2(\mathbb{R})$ satisfies

$$\int_{-\infty}^{+\infty} \psi(t)\,dt = 0 \tag{1}$$

Here $\psi(t)$ is the basic (mother) wavelet. Scaling it and shifting it along the time axis gives the family of wavelets

$$\psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right), \qquad a, b \in \mathbb{R},\ a \neq 0 \tag{2}$$

where $\psi_{a,b}(t)$ is the scaled and shifted wavelet, $a$ is the scale factor, which reflects the period length of the wavelet, and $b$ is the translation factor, which reflects the shift in time.

2) Wavelet transform: Let $\psi_{a,b}(t)$ be the wavelet given by (2). For a finite-energy signal $f(t) \in L^2(\mathbb{R})$, the continuous wavelet transform (CWT) is

$$W_f(a,b) = |a|^{-1/2} \int_{\mathbb{R}} f(t)\,\overline{\psi}\!\left(\frac{t-b}{a}\right) dt \tag{3}$$

where $W_f(a,b)$ are the wavelet coefficients, $f(t)$ is the signal (a square-integrable function), $a$ is the scale (dilation) parameter, $b$ is the translation parameter, and $\overline{\psi}((t-b)/a)$ is the complex conjugate of $\psi((t-b)/a)$. Most hydrological time series are observed at discrete times. For a sampled function $f(k\Delta t)$, $k = 1, 2, \ldots, N$, with sampling interval $\Delta t$, the discrete form of (3) is

$$W_f(a,b) = |a|^{-1/2}\,\Delta t \sum_{k=1}^{N} f(k\Delta t)\,\overline{\psi}\!\left(\frac{k\Delta t - b}{a}\right) \tag{4}$$

Formulas (3) and (4) capture the basic idea of wavelet analysis: increasing the scale $a$ extracts low-frequency information and decreasing it extracts high-frequency information, so the signal can be analyzed both in outline and in detail, across different time scales and with localization in time.

In practice, the key step is to compute the wavelet coefficients from the wavelet transform and then analyze the time-frequency structure of the series through the variation of these coefficients.

3) Wavelet variance: Integrating the squared modulus of the wavelet coefficients over the translation $b$ gives the wavelet variance

$$\mathrm{Var}(a) = \int_{-\infty}^{+\infty} |W_f(a,b)|^2\, db \tag{5}$$

The plot of the wavelet variance against the scale $a$ is called the wavelet variance diagram. As (5) shows, it reflects how the energy of the signal is distributed over scales. The wavelet variance diagram can therefore be used to determine the relative strength of the fluctuations at different time scales and the dominant scale, i.e., the main period.

B. Wavelet cycle analysis of hydrological time series data

For hydrological time series, wavelet analysis is used to identify the cycles in the data, and its results provide the basis for anomaly detection over multiple cycles. On top of single-cycle detection, the method can find anomalies that are missed when the cycle characteristics of hydrological time series are neglected.

1) Choice of wavelet function: To detect the periods of hydrological time series, the continuous complex Morlet wavelet transform is usually chosen.

2) Plotting the wavelet coefficients, the wavelet variance diagram and the main cycle trend: After selecting a suitable basic wavelet, the wavelet coefficients are obtained with the wavelet transform. The wavelet coefficients, the wavelet variance diagram and the variation of the main cycle are then plotted with MATLAB.

3) Pseudo code of wavelet analysis (MATLAB):

    % Input:  data, the formatted (gap-filled) time series
    % Output: period, the Fourier periods of the scales, with their significance
    % wavelet() and wave_signif() are assumed to be the Torrence-Compo
    % wavelet routines; their argument order is followed here.

    % Data preprocessing: standardize the series
    variance = std(data)^2;              % named to avoid shadowing MATLAB's var()
    data = (data - mean(data)) / sqrt(variance);
    len = length(data);

    % Parameters of the wavelet transform
    pad = 1;            % zero-pad to the next power of two
    dt = 1;             % sampling interval
    dj = 0.25;          % spacing between discrete scales
    s0 = 2*dt;          % smallest scale
    J1 = -1;            % -1: number of scales chosen by wavelet()
    mother = 'Morlet';

    % Continuous wavelet transform
    [wave, period, scale] = wavelet(data, dt, pad, dj, s0, J1, mother);
    power = abs(wave).^2;

    % Significance level of each scale against a red-noise background
    lag = 0.72;         % lag-1 autocorrelation of the red-noise model (value from the original code)
    signif = wave_signif(1.0, dt, scale, 0, lag, -1, -1, mother);
    sigx = signif' * ones(1, len);       % expand to (number of scales) x len
    sigx = power ./ sigx;                % ratio > 1: power is significant

    % Global (time-averaged) wavelet spectrum and its significance test
    global_ws = variance * (sum(power') / len);
    dof = len - scale;                   % fewer degrees of freedom at large scales (edge effects)
    global_signif = wave_signif(variance, dt, scale, 1, lag, -1, dof, mother);

C. The sliding window method based on prediction [11]

After wavelet analysis, the main period and the hidden periods of the hydrological time series are known. Taking multiple periods into account makes the detection results more comprehensive and accurate than single-cycle anomaly detection.

1) Principle of the prediction-based sliding window method: Define the neighbor window $\eta_i^{(k)}$ nearest to the point $d_i$ and build a one-step-ahead prediction model. The observations in $\eta_i^{(k)}$ are the model input, and the model output is the predicted value $v'_i$ of $d_i$. The confidence interval of this prediction is $v'_i \pm \tau$, where the threshold $\tau$ is calculated from the window width $k$ and the prediction confidence level $p$. When the actual measurement $v_i$ of $d_i$ arrives, it is compared with the predicted value $v'_i$: if $v_i$ lies outside the interval $v'_i \pm \tau$, $d_i$ is judged anomalous; otherwise it is normal. The sliding window then moves forward one step, $d_i$ replaces $d_{i-2k}$ in the window, $\eta_i^{(k)}$ is updated, and the next node is tested, until all nodes have been processed.

2) Prediction model and anomaly judgment of the prediction-based sliding window method: Denote the input of the method by $\eta_i^{(k)} = \{d_1, d_2, \ldots, d_i\}$; then

$$d_{i+1} = M\big(\eta_i^{(k)}\big) \tag{6}$$

where $M(\cdot)$ is the prediction model. It assumes that the value at time $t$ is a weighted linear combination of the values in the adjacent window:

$$v'_t = \frac{\sum_{i=1}^{2k} w_{t-i}\, v_{t-i}}{\sum_{i=1}^{2k} w_{t-i}} \tag{7}$$

Here $w_{t-2k}, w_{t-2k+1}, \ldots, w_{t-1}$ is the weight vector of the window nodes: the closer a node is to the test point, the larger its weight. To simplify the calculation, the weight vector is usually set to $\langle 1, 2, \ldots, 2k \rangle$.

With the adjacent window of the test point as input, the prediction model gives the predicted value of the test point and a confidence interval around it. This prediction confidence interval gives the range of values that the observation under test is expected to take. The confidence coefficient $p = 100(1-\alpha)$ is the expected percentage of actual measured values that fall inside the confidence interval. If the model residuals are assumed to follow a zero-mean Gaussian distribution, the threshold $\tau$ and the $p\%$ prediction confidence interval (PCI) are

$$\tau = t_{\alpha/2,\,2k-1}\; s \sqrt{1 + \frac{1}{2k}} \tag{8}$$

$$\mathrm{PCI} = v'_{i+1} \pm \tau \tag{9}$$

where $v'_{i+1}$ is the value predicted for the test point from its adjacent sliding window, $t_{\alpha/2,\,2k-1}$ is the upper $\alpha/2$ critical value of Student's t-distribution with $2k-1$ degrees of freedom, $s$ is the standard deviation of the model residuals, and $k$ is the size parameter of the sliding window (the window holds $2k$ points). If the observed value lies within the confidence interval of the prediction, it is normal; otherwise it is abnormal. The sliding window advances by one point at a time.

3) Anomaly detection algorithm based on multi-cycle sliding-window prediction: The proposed method first uses wavelet cycle analysis to decompose the hydrological time series and obtain its multiple (main and hidden) cycles, and then applies the prediction-based sliding window method within each of these cycles to detect outliers.

For each cycle found by the wavelet analysis, detection has two parts. First, the time series is split into equal-length sub-sequences whose length is the cycle length. Second, the prediction-based sliding window method is applied to the first cycle to detect outliers according to its pattern of change: when the anomaly factor exceeds the threshold, the point is judged abnormal and is corrected to the value predicted by the sliding window, so that the adjusted first cycle can serve as a reference later. For every cycle after the first, the same method yields predicted values for the corresponding points of the cycle. Finally, the abnormal objects are detected by comparison with the observed values of the first cycle.

In hydrology, with large volumes of data, distinctive cycle characteristics and many influencing factors, multi-cycle prediction based on a sliding window can be more comprehensive and accurate than classical anomaly detection algorithms.

III. EXPERIMENTAL ANALYSIS

A. Experiment 1: cycle analysis of hydrological time series data

Daily water level data from a hydrological station are selected for analysis, because water level data have the typical characteristics of hydrological time series. They are affected by the earth's revolution, tides, and geographical and geological factors, and therefore have distinctive cycle characteristics of their own. The method detects these cycles and applies them to anomaly detection, compensating for what single-cycle detection overlooks.

The procedure is as follows: first, inspect the periods of the data; second, detect outliers in each periodic time series according to the detected periods; finally, collect all abnormal points. Before anomaly detection, the periods are analyzed with wavelets: following the method above, the experimental data are decomposed with the wavelet variance method, using the Morlet wavelet as the basic wavelet. The experiment plots the contour map of the real part of the wavelet coefficients, the wavelet power spectrum, and the significance test of the power spectrum.

Because of problems in data collection and other force majeure, some data are inevitably missing. So that the experimental verification is not affected, the missing data are filled by Lagrange interpolation to ensure data integrity, which minimizes the impact of defective data on the experiment.

Figure 1 shows the results of the wavelet analysis. The left panel is the contour map of the real part of the wavelet coefficients; the abscissa is time (hours) and the ordinate is the period. The series has multiple time-scale characteristics: in short, there are periods of 9128 hours (about 380 days) and 4120 hours (about 171 days), and both cycles remain stable along the time axis. The right panel shows the wavelet power spectrum and its significance test. The dotted line is the significance level of the power spectrum: where the wavelet power spectrum curve lies above the dotted line, the corresponding period passes the significance test. Figure 1 clearly shows that the wavelet power at both peaks exceeds the significance level. The 380-day cycle is therefore the primary period and the 171-day cycle is the hidden period. The 380-day cycle roughly corresponds to a natural year, while the hidden period can be regarded as a distinctive hydrological period arising from various causes, which other methods easily overlook. Cycle detection thus reveals cycle factors relevant to anomaly detection that classical methods do not easily discover, and so the multi-cycle anomaly detection algorithm based on wavelet analysis helps ensure the comprehensiveness and accuracy of anomaly detection.

Figure 1. Wavelet periodic analysis diagram
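The period detection of Sections II.A-B, as applied in Experiment 1, can be sketched without the MATLAB toolbox. This is a minimal sketch under stated assumptions, not the authors' implementation: it evaluates the discrete transform (4) with the complex Morlet wavelet $\psi(t) = \pi^{-1/4} e^{i\omega_0 t} e^{-t^2/2}$ (with $\omega_0 = 6$ assumed), sums $|W_f(a,b)|^2$ over the sample grid as a discrete version of (5), and reports the local maxima of $\mathrm{Var}(a)$ as candidate periods, strongest first (main period, then hidden periods). The names `wavelet_variance` and `detect_periods`, the truncation of the wavelet at $|t| = 4a$, the scale grid and the scale-to-period factor $4\pi/(\omega_0 + \sqrt{2 + \omega_0^2})$ (the usual Morlet Fourier factor) are illustrative choices.

```python
import cmath
import math

OMEGA0 = 6.0  # Morlet central frequency; a common choice, assumed here
# Converts a Morlet scale a to the Fourier period it responds to most strongly
FOURIER_FACTOR = 4 * math.pi / (OMEGA0 + math.sqrt(2 + OMEGA0 ** 2))


def morlet(t):
    """Complex Morlet mother wavelet psi(t) = pi^(-1/4) * exp(i*w0*t) * exp(-t^2/2)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-t * t / 2)


def standardize(data):
    """Subtract the mean and divide by the standard deviation, as in the pseudo code."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return [(x - mean) / std for x in data]


def wavelet_variance(data, scales, dt=1.0):
    """Discrete (4) and (5): Var(a) = sum over b of |W_f(a, b)|^2 * dt, b on the sample grid."""
    x = standardize(data)
    n = len(x)
    variances = []
    for a in scales:
        half = int(math.ceil(4 * a / dt))  # Morlet envelope is below e^-8 beyond |t| = 4a
        # conj(psi(m * dt / a)) for offsets m = k - b in -half .. half
        kernel = [morlet(m * dt / a).conjugate() for m in range(-half, half + 1)]
        norm = dt / math.sqrt(a)  # the |a|^(-1/2) * dt factor of (4), with a > 0
        total = 0.0
        for b in range(n):
            w = 0j
            # Samples outside the record are skipped, i.e. treated as zero
            for k in range(max(0, b - half), min(n, b + half + 1)):
                w += x[k] * kernel[k - b + half]
            total += abs(norm * w) ** 2 * dt
        variances.append(total)
    return variances


def detect_periods(data, dt=1.0, smin=2.0, octaves=5, per_octave=10):
    """Return (period, Var(a)) for each local maximum of the wavelet variance, strongest first."""
    scales = [smin * 2 ** (j / per_octave) for j in range(octaves * per_octave + 1)]
    var = wavelet_variance(data, scales, dt)
    peaks = [i for i in range(1, len(var) - 1)
             if var[i] > var[i - 1] and var[i] >= var[i + 1]]
    peaks.sort(key=lambda i: var[i], reverse=True)
    return [(FOURIER_FACTOR * scales[i], var[i]) for i in peaks]
```

For a 300-sample series made of two sinusoids with periods of 40 and 12 samples, `detect_periods` reports its strongest peak near 40 and the next one near 12. The direct summation costs on the order of N times the summed scale widths, so for long hourly records such as the one in Experiment 1 (periods of thousands of hours) an FFT-based transform, as used by the MATLAB routines in Section II.B, is the practical choice.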
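The prediction-based sliding window test of Section II.C, equations (7) to (9), which Experiment 2 applies within each cycle, can be sketched as follows. This is a minimal sketch under assumptions, not the authors' code: the window is the $2k$ values before the test point with weights $\langle 1, 2, \ldots, 2k \rangle$; $s$ in (8) is taken as the sample standard deviation of the window; the critical value $t_{\alpha/2,\,2k-1}$ is passed in (2.201 is the 95% value for 11 degrees of freedom, i.e. $k = 6$, assuming the window size 12 of Experiment 2 means $2k = 12$); and flagged values are replaced by their prediction before the window moves on, one reading of the correction step in Section II.C.3. `multi_cycle_anomalies` is a hypothetical, simplified reading of the multi-cycle algorithm: it splits the series into cycle-length segments for each detected period, runs the detector in each segment and takes the union. It does not implement the comparison against the first cycle or the anomaly factor $\lambda$, which the paper does not define precisely enough to reproduce.

```python
import math
import statistics

# Upper 2.5% point of Student's t with 11 degrees of freedom (2k - 1 for k = 6),
# i.e. the t_{alpha/2, 2k-1} of (8) for a 95% interval
T_CRIT_95_DF11 = 2.201


def sliding_window_anomalies(series, k=6, t_crit=T_CRIT_95_DF11, replace_anomalies=True):
    """Indices whose value lies outside the prediction interval (9).

    Prediction: weighted mean (7) of the 2k preceding values, weights 1..2k
    from oldest to newest. Half-width: tau from (8), with s taken as the sample
    standard deviation of the window (an assumption). t_crit must match 2k - 1
    degrees of freedom.
    """
    width = 2 * k
    weights = list(range(1, width + 1))
    history = list(series)  # copy, so corrected values do not alter the input
    anomalies = []
    for t in range(width, len(history)):
        window = history[t - width:t]
        pred = sum(w * v for w, v in zip(weights, window)) / sum(weights)
        s = statistics.stdev(window)
        tau = t_crit * s * math.sqrt(1 + 1 / width)
        if abs(history[t] - pred) > tau:
            anomalies.append(t)
            if replace_anomalies:
                history[t] = pred  # later windows see the prediction, not the outlier
    return anomalies


def multi_cycle_anomalies(series, periods, k=6, t_crit=T_CRIT_95_DF11):
    """Hypothetical multi-cycle wrapper: detect within each cycle-length segment, take the union."""
    found = set()
    for period in periods:
        for start in range(0, len(series), period):
            segment = series[start:start + period]
            for i in sliding_window_anomalies(segment, k, t_crit):
                found.add(start + i)
    return sorted(found)
```

For instance, an alternating series $10 \pm 0.2$ with a spike of +5 added at index 30 yields `[30]`: the spike deviates by about 4.8 from its prediction, while $\tau$ is about 0.48. Replacing the flagged value keeps the outlier from inflating $s$ and shifting the predictions of the next $2k$ windows.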
B. Experiment 2: anomaly detection based on wavelet analysis in multi-periodic time series

Following the steps of the method above, anomalies are detected separately in the two cycles. First, the sliding window is used to partition the data within a cycle, and the confidence interval is obtained from the prediction-based method and the threshold. For the main cycle of 380 days the window size is set to 12, so each 380-day cycle yields m - a + 1 = 380 - 12 + 1 = 369 window sequences (m is the cycle length and a the window size), S = {s1, s2, ..., s369}. With the threshold of the anomaly factor λ set to 0.8, the result is shown in Figure 2. For the 171-day period the window size is also 12, so each cycle yields 171 - 12 + 1 = 160 window sequences, S = {s1, s2, ..., s160}; with the threshold λ set to 0.144, the result is shown in Figure 3. In both figures, the anomalies are marked in red.

Figure 2. The main cycle anomaly detection analysis diagram

Figure 3. Hidden cycle anomaly detection

IV. SUMMARY AND ANALYSIS

As Figures 2 and 3 show, the anomalies detected in the two cycles differ, and those found with the 171-day cycle are more apparent. Because the selected data cover only 7 years, they contain more 171-day cycles (about 15) than 380-day cycles (about 7), so the characteristics of the shorter cycle show more clearly and anomaly detection targeted at this period is more effective. The 380-day period is less evident in anomaly detection because the number of its cycles is not large enough. The anomalies detected in the two cycles partly overlap, but a large share of them can be detected only in one specific cycle. This demonstrates the advantage of the proposed method over classical time series anomaly detection methods: it takes into account that hydrological data are influenced by many factors and have multiple cycle characteristics that other methods easily ignore. By exploiting the hidden cycle characteristics of hydrological data, anomaly detection of hydrological time series can be made more comprehensive and more accurate.

For hydrological time series, this paper explores the characteristics that common single-cycle anomaly detection methods easily ignore. The main advantage of the multi-cycle time series anomaly detection algorithm based on wavelet analysis is that it makes up for the omissions of single-cycle anomaly detection algorithms.

ACKNOWLEDGMENT

This research is supported by the following programs:
• The National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2013BAB05B01.
• The National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2013BAB06B04.
• The Technology Program of China Huaneng Group Headquarters under Grant No. HNKJ13-H17.

REFERENCES

[1] Yanfang Sang, Zhonggeng Wang, and Changming Liu, "Progress in the analysis of hydrological time series," Progress In Geography, vol. 32(1), pp. 20-30, January 2013. (In Chinese)
[2] Chandola V, Banerjee A, and Kumar V, "Anomaly detection: a survey," ACM Computing Surveys (CSUR), vol. 41(3), pp. 15, 2009.
[3] Budalakoti S, Srivastava A, and Akkellar, "Anomaly detection in large sets of high-dimensional symbol sequences," NASA TM-2006-214553 [R]. Moffett Field: NASA Ames Research Center, 2006.
[4] Xiaoxu He, "Study on some key problems of time series data mining," University of Science & Technology of China, 2014. (In Chinese)
[5] Anrong Xue, Shiguang Wang, and Weihua He, "Study on local outlier mining algorithm," Chinese Journal of Computers, vol. 30(8), pp. 1455-1463, 2007. (In Chinese)
[6] Zhao Zhang, Runlian Zhang, Xiaoge Jiang, and Bing Zeng, "Method of anomaly detection and feature selection based on support vector machine," Computer Engineering, vol. 09, pp. 3046-3049+3162, 2013. (In Chinese)
[7] Weimin Tong, Yijun Li, and Yongzheng Shan, "Time series data mining based on wavelet analysis," Computer Engineering & Design, vol. 01, pp. 26-29, 2008. (In Chinese)
[8] Lihong Zhao, "Study on analysis method of period of hydrological time series," Hohai University, 2007. (In Chinese)
[9] Yanfang Sang, Zhonggeng Wang, and Changming Liu, "Analysis of current situation and prospect of application of wavelet method in hydrology research," Progress In Geography, vol. 09, pp. 1413-1422, 2013. (In Chinese)
[10] Xuepeng Zhuang, "Outlier detection in time series based on wavelet," Nanjing University, 20. (In Chinese)
[11] Yufeng Yu, Yuelong Zhu, Dingsheng Wan, and Xinzhong Guan, "Anomaly detection of hydrological time series prediction based on sliding window," Progress In Geography, vol. 08, pp. 2217-2220+2226, 2014. (In Chinese)
