Abstract-Hydrological time series data combine trends, jumps and cycles, and mix deterministic behavior with randomness. In view of these features, this paper uses wavelet analysis to identify the main cycle and the hidden cycle of the series, and then applies the sliding window method to predict data within each period for further testing. The method is verified with instance data. The experimental results show that the multi-cycle time series anomaly detection algorithm based on wavelet analysis can effectively complete the anomaly detection of hydrological time series data.

Keywords-hydrologic time series; period; sliding window method; wavelet analysis; anomaly detection

I. RESEARCH STATUS

Hydrological phenomena vary with time, and this variation is called the hydrological process. The hydrological process is affected by deterministic factors and by many stochastic factors [1], so hydrological time series data consist mainly of deterministic and random components. The purpose of anomaly detection in hydrological time series data is to further improve the quality of hydrological data acquisition.

Abnormal data are data that deviate from most of the data, to the point that they are suspected of being generated by a different mechanism rather than by random deviation. According to the form and the research aim of the abnormality, outlier detection in time series can be divided into point anomaly detection, sub-sequence anomaly detection, anomaly intrusion detection and sequence abnormalities.

At present, the main algorithms for time series anomaly detection are the window based method [2], the clustering based method [3], the distance based method [4], the density based method [5], the method based on support vector machines [6] and the method based on wavelets [7]. As applied to time series data, these methods are largely designed for single-cycle detection. Hydrological time series data, however, have unique characteristics and cycles because of the earth's revolution around the sun, the rotation of the earth, and the influence of geology, geography and human activities [8]. Single-cycle anomaly detection therefore makes little use of the multi-period characteristics of hydrological data, and may overlook abnormal data that arise from those multi-cycle characteristics.

This paper performs a wavelet-based anomaly analysis of hydrological time series data, which can better reveal the variable periods hidden in the data. For the reasons above, we first carry out cycle detection with wavelet analysis, and then use the prediction-based sliding window method for further detection within the multiple cycles. In this way the anomaly detection is targeted at hydrological time series data and takes advantage of their multi-cycle characteristics.

II. ANOMALY DETECTION BASED ON WAVELET ANALYSIS IN MULTI PERIODIC TIME SERIES

A. The theory of wavelet analysis [9-10]

1) Wavelet function: The basic idea of wavelet analysis is to use a family of wavelets to represent or approximate a signal or a function. The wavelet function is therefore the key to wavelet analysis. It is an oscillating function that decays quickly to zero, that is, a function $\psi(t) \in L^2(\mathbb{R})$ satisfying

$$\int_{-\infty}^{+\infty} \psi(t)\,dt = 0 \qquad (1)$$

In the formula, $\psi(t)$ is the basic wavelet function. A family of functions is obtained from it through scale dilation and shifting along the time axis:

$$\psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right), \qquad a, b \in \mathbb{R},\ a \neq 0 \qquad (2)$$

In the formula, $\psi_{a,b}(t)$ is the scaled and shifted wavelet; $a$ is the scale factor, which reflects the cycle length of the wavelet; $b$ is the translation factor, which reflects the shift in time.

2) Wavelet transform: If $\psi_{a,b}(t)$ is the wavelet given by (2), then for a given finite-energy signal $f(t) \in L^2(\mathbb{R})$ the continuous wavelet transform (Continuous Wavelet Transform, abbreviated as CWT) is:

$$W_f(a,b) = |a|^{-1/2} \int_{\mathbb{R}} f(t)\,\psi\!\left(\frac{t-b}{a}\right) dt \qquad (3)$$

In the formula, $W_f(a,b)$ is the wavelet coefficient; $f(t)$ is a signal or a square-integrable function; $a$ is the dilation scale; $b$ is the translation parameter; $\psi\!\left(\frac{x-b}{a}\right)$
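As an illustration of (2) and (3), the following Python sketch approximates $W_f(a,b)$ by a Riemann sum over a uniformly sampled signal. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the complex Morlet wavelet with centre frequency `OMEGA0 = 6`, the complex conjugate of the wavelet inside the integral (the usual convention for complex wavelets), the Morlet scale-to-period conversion, and the helper names `morlet`, `cwt_coefficient`, `mean_power`, `period_to_scale` and `dominant_period` are all introduced here for the example.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed Morlet centre frequency (not given in the text)


def morlet(t):
    """Complex Morlet wavelet; its mean is approximately zero for OMEGA0 = 6."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-0.5 * t * t)


def cwt_coefficient(signal, dt, a, b):
    """Riemann-sum approximation of W_f(a, b) from (3) for samples f(n * dt)."""
    total = 0j
    for n, value in enumerate(signal):
        # Conjugating the wavelet is an assumption (standard for complex wavelets).
        total += value * morlet((n * dt - b) / a).conjugate()
    return abs(a) ** -0.5 * total * dt


def mean_power(signal, dt, a):
    """Average of |W_f(a, b)|^2 over every shift b on the sampling grid."""
    powers = [abs(cwt_coefficient(signal, dt, a, n * dt)) ** 2
              for n in range(len(signal))]
    return sum(powers) / len(powers)


def period_to_scale(period):
    """Morlet scale whose equivalent Fourier period equals `period`."""
    return period * (OMEGA0 + math.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * math.pi)


def dominant_period(signal, dt, candidate_periods):
    """Candidate period whose scale carries the largest mean wavelet power."""
    return max(candidate_periods,
               key=lambda p: mean_power(signal, dt, period_to_scale(p)))
```

For a pure sinusoid with a period of 32 samples, `dominant_period(signal, 1.0, [8, 16, 32, 64])` selects 32, because the mean power over all shifts is largest at the scale that matches the oscillation.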
detected. The confidence coefficient $p = 100(1-\alpha)$ indicates the expected frequency with which actual measured values fall within the confidence interval. If the model residuals are assumed to follow a zero-mean Gaussian distribution, then the $p\,\%$ confidence threshold $\tau$ and the confidence interval can be calculated as follows:

$$\tau = t_{\alpha/2,\,2k-1} \cdot s \cdot \sqrt{1 + \frac{1}{2k}} \qquad (8)$$

$$PCI = v_{i+1} \pm \tau \qquad (9)$$

Among them, $v_{i+1}$ is the forecast value for the test point adjacent to the sliding window, $t_{\alpha/2,\,2k-1}$ is the percentile of Student's t-distribution with $2k-1$ degrees of freedom, $s$ is the standard deviation of the model residuals, and $k$ is the size of the sliding window. If the observed value lies within the confidence interval of the predicted value, it is normal data; otherwise it is abnormal data. The sliding window advances one point at a time.

3) Anomaly detection algorithm based on multi-cycle prediction based on sliding window: This paper uses the wavelet cycle analysis method to decompose the hydrological time series data and obtain the multiple cycles of the series, including the hidden ones. Within each of these cycles, the prediction-based sliding window method is then used to detect outliers in the time series data.

After the wavelet cycle analysis, the processing of each cycle has two parts. First, the cycle length obtained from the wavelet analysis is used to split the time series into sub-sequences of equal length. Second, in the first cycle, the prediction-based sliding window method is used to detect outliers according to the pattern of change: when the abnormal factor of a point is greater than the threshold value, the point is judged to be abnormal. The abnormal points of the first cycle are then corrected to the values predicted by the sliding window method, so that the adjusted cycle can be used as a reference later. For the cycles other than the first, the same method gives forecast values for the corresponding points of the cycle. Finally, by comparison with the observed values of the first cycle, the abnormal objects can be clearly detected.

In the field of hydrology, where the data are voluminous, have unique cycle characteristics and are affected by many other factors, multi-cycle prediction based on the sliding window can be more comprehensive and accurate than the classical anomaly detection algorithms.

III. EXPERIMENTAL ANALYSIS

First, the period of the data is detected; second, according to the detected periodicity, outliers are detected in each periodic time series; finally, all the abnormal points are collated. Before the anomaly detection, the period must be analyzed with wavelet analysis. Following the method above, this paper uses the wavelet variance method to decompose the experimental data, with the Morlet wavelet function as the basic wavelet. The experiment plots the contour map of the real part of the wavelet coefficients, the wavelet power spectrum, and the significance test of the power spectrum.

Because of collection problems and other force majeure, actual hydrological data inevitably contain some missing values. So that the experimental verification is not affected, this paper uses the Lagrange interpolation method to fill in the missing data and ensure data integrity. This minimizes the impact of defective data on the experiment.

Figure 1 shows the results of the wavelet analysis. The left panel is the contour map of the real part of the wavelet coefficients; the abscissa is time in hours and the ordinate is the cycle. The extracted cycle function has multiple time scale characteristics. In simple terms, there are periods of 9128 hours (about 380 days) and 4120 hours (about 171 days), and both cycles stay flat along the timeline. The right panel shows the wavelet power spectrum and its significance test. The dotted line is the significance level of the power spectrum: where the wavelet power spectrum curve lies above the dotted line, the corresponding cycle meets the significance test standard. Figure 1 clearly shows that the wavelet power spectrum meets the significance test standard at both extreme value points. It also shows that the cycle of 380 days is the primary period and the cycle of 171 days is the hidden period. The cycle of 380 days is roughly equivalent to a natural year, while the hidden period can be regarded as a period specific to hydrology that arises for various reasons and is easily ignored by other methods. Cycle detection can therefore discover cycle factors that classical anomaly detection methods do not easily find. Thus, the multi-cycle time series anomaly detection algorithm based on wavelet analysis can ensure the comprehensiveness and accuracy of anomaly detection.
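The prediction-interval test in (8) and (9) can be sketched in Python as follows. This is a minimal sketch under assumptions that the text does not confirm: the window is taken to hold the 2k points preceding the test point, the forecast is taken to be the window mean, s is taken to be the sample standard deviation of the window, and the critical value `t_crit` is supplied by the caller (for k = 12 and a 95% level, standard tables give about 2.069 for 23 degrees of freedom). The function names `prediction_interval` and `detect_anomalies` are hypothetical.

```python
import math
import statistics


def prediction_interval(window, t_crit):
    """Forecast and threshold tau from (8) for the point that follows `window`."""
    n = len(window)                       # n plays the role of 2k in (8)
    forecast = statistics.mean(window)    # assumed predictor: the window mean
    s = statistics.stdev(window)          # assumed residual standard deviation
    tau = t_crit * s * math.sqrt(1.0 + 1.0 / n)
    return forecast, tau


def detect_anomalies(series, k, t_crit):
    """Slide one point at a time; flag values outside forecast +/- tau, as in (9)."""
    width = 2 * k
    cleaned = list(series)                # working copy; flagged points get corrected
    flagged = []
    for i in range(width, len(cleaned)):
        forecast, tau = prediction_interval(cleaned[i - width:i], t_crit)
        if abs(cleaned[i] - forecast) > tau:
            flagged.append(i)
            cleaned[i] = forecast         # replace the outlier by its predicted value
    return flagged, cleaned
```

Replacing a flagged value by its forecast keeps a single outlier from distorting the intervals of the points that follow, which mirrors the correction of abnormal points described for the first cycle.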
B. Experiment 2: Anomaly detection based on wavelet analysis in multi-periodic time series

According to the steps of the method above, anomalies must be detected separately in each of the two cycles. First, the sliding window is used to separate the data within a cycle, and the confidence interval is obtained through the method based on prediction and threshold. Under the main cycle of 380 days, we set the window size to 12. Each 380-day cycle then yields m-a+1 = 380-12+1 = 369 window sequences, S = {s1, s2, ..., s369}. The threshold value of the abnormal factor λ is set to 0.8, and the result is shown in Figure 2. Under the period of 171 days, the window size is again 12, so each cycle yields 171-12+1 = 160 window sequences, S = {s1, s2, ..., s160}. The threshold value of the abnormal factor λ is set to 0.144, and the result is shown in Figure 3. The red marks in the figures show where the abnormal objects are located.

Figure 2. The main cycle anomaly detection analysis diagram

Figure 3. Hidden cycle of anomaly detection

IV. SUMMARY AND ANALYSIS

Figures 2 and 3 show that the anomalies detected in the two cycles differ, and that the anomalies from the cycle of 171 days are more apparent. Because this paper uses only 7 years of data, the series contains more cycles of 171 days, so the characteristics of that cycle are more pronounced and anomaly detection aimed at this period is more efficient. The period of 380 days is less pronounced in anomaly detection because the number of cycles is not large enough. The abnormal points of the two cycles overlap to some extent, but a large part of the anomalies can be detected only in one specific cycle. This shows the advantage of the method presented in this paper over classical time series anomaly detection methods. The method takes full account of the fact that hydrological data are influenced by many factors and have multiple cycle characteristics, which other methods easily ignore. Moreover, by taking advantage of the hidden cycle characteristics of hydrological data, the anomaly detection of hydrological time series data becomes more comprehensive and more accurate.

Using hydrological time series data, this paper explores the characteristics that the common single-cycle anomaly detection methods easily ignore. The most important advantage of the multi-cycle time series anomaly detection algorithm based on wavelet analysis is that it makes up for the omissions of anomaly detection algorithms designed for a single cycle.

ACKNOWLEDGMENT

This research is supported by the following programs:
• The National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2013BAB05B01.
• The National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2013BAB06B04.
• The Technology Program of China Huaneng Group Headquarters under Grant No. HNKJ13-H17.

REFERENCES

[1] Yanfang Sang, Zhonggeng Wang, and Changming Liu, “Progress in the analysis of hydrological time series,” Progress In Geography, vol. 32(1), pp. 20-30, January 2013. (In Chinese)
[2] Chandola V, Banerjee A, and Kumar V, “Anomaly detection: a survey,” ACM Computing Surveys (CSUR), vol. 41(3), pp. 15, 2009.
[3] Budalakoti S, Srivastava A, and Akella R, “Anomaly detection in large sets of high-dimensional symbol sequences,” NASA TM-2006-214553 [R]. Moffett Field: NASA Ames Research Center, 2006.
[4] Xiaoxu He, “Study on some key problems of time series data mining,” University of Science & Technology China, 2014. (In Chinese)
[5] Anrong Xue, Shiguang Wang, and Weihua He, “Study on local outlier mining algorithm,” Chinese Journal of Computers, vol. 30(8), pp. 1455-1463, 2007. (In Chinese)
[6] Zhao Zhang, Runlian Zhang, Xiaoge Jiang, and Bing Zeng, “Method of anomaly detection and feature selection based on support vector machine,” Computer Engineering, vol. 09, pp. 3046-3049+3162, 2013. (In Chinese)
[7] Weimin Tong, Yijun Li, and Yongzheng Shan, “Time series data mining based on wavelet analysis,” Computer Engineering & Design, vol. 01, pp. 26-29, 2008. (In Chinese)
[8] Lihong Zhao, “Study on analysis method of period of hydrological time series,” Hohai University, 2007. (In Chinese)
[9] Yanfang Sang, Zhonggeng Wang, and Changming Liu, “Analysis of current situation and prospect of application of wavelet method in hydrology research,” Progress In Geography, vol. 09, pp. 1413-1422, 2013. (In Chinese)
[10] Xuepeng Zhuang, “Outlier detection in time series based on wavelet,” Nanjing University, 20. (In Chinese)
[11] Yufeng Yu, Yuelong Zhu, Dingsheng Wan, and Xinzhong Guan, “Anomaly detection of hydrological time series prediction based on sliding window,” Progress In Geography, vol. 08, pp. 2217-2220+2226, 2014. (In Chinese)