
Anomaly Detection in Network Traffic Based on Statistical Inference and α-Stable Modeling

Federico Simmross-Wattenberg, Juan Ignacio Asensio-Pérez, Pablo Casaseca-de-la-Higuera, Marcos Martín-Fernández, Ioannis A. Dimitriadis, Senior Member, IEEE, and Carlos Alberola-López

Abstract—This paper proposes a novel method to detect anomalies in network traffic, based on a nonrestricted α-stable first-order model and statistical hypothesis testing. To this end, we give statistical evidence that the marginal distribution of real traffic is adequately modeled with α-stable functions and classify traffic patterns by means of a Generalized Likelihood Ratio Test (GLRT). The method automatically chooses the traffic windows used as a reference, against which the traffic window under test is compared, with no expert intervention needed to that end. We focus on detecting two anomaly types, namely floods and flash-crowds, which have been frequently studied in the literature. Performance of our detection method has been measured through Receiver Operating Characteristic (ROC) curves, and results indicate that our method outperforms the closely related state-of-the-art contribution described in [1]. All experiments use traffic data collected from two routers at our university (a 25,000-student institution), which provide two different levels of traffic aggregation for our tests (traffic at a particular school and at the whole university). In addition, the traffic model is tested with publicly available traffic traces. Due to the complexity of α-stable distributions, care has been taken in designing appropriate numerical algorithms to deal with the model.

Index Terms—Traffic analysis, anomaly detection, α-stable distributions, statistical models, hypothesis testing, ROC curves.

The authors are with the Universidad de Valladolid, ETSI Telecomunicación, Paseo de Belén 15, 47011 Valladolid, Spain. E-mail: {fedsim, juaase, jcasasec, marcma, yannis, caralb}@tel.uva.es.
Manuscript received 11 June 2010; accepted 14 Jan. 2011; published online 9 Feb. 2011. Recommended for acceptance by R. Sandhu. Digital Object Identifier no. 10.1109/TDSC.2011.14.

1 INTRODUCTION

ANOMALY detection aims at finding the presence of anomalous patterns in network traffic. Automatic detection of such patterns can provide network administrators with an additional source of information to diagnose network behavior or to find the root cause of network faults. However, as of today, a commonly accepted procedure to decide whether a given traffic trace includes anomalous patterns is not available. Indeed, several approaches to this problem have been reported in the literature (see [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], described in Section 2).

Research proposals in anomaly detection typically follow a four-stage approach, in which the first three stages define the detection method, while the last stage is dedicated to validating the approach. So, in the first stage, traffic data are collected from the network (data acquisition). Second, data are analyzed to extract their most relevant features (data analysis). Third, traffic is classified as normal¹ or abnormal (inference); and fourth, the whole approach is validated with various types of traffic anomalies (validation). In this regard, as the literature shows (see Section 2), flood and flash-crowd anomalies are of interest to several anomaly detection contributors.

1. In this paper, the word "normal" will be used in the sense of "natural status" and not as a synonym of "Gaussian."

Following the aforementioned four-stage approach, we can mention that data acquisition is typically carried out by polling one or more routers periodically, so that traffic data are collected and stored for posterior analysis in the second stage. Some authors sample data at the packet level, gathering information from headers, latencies, etc., while others prefer to use aggregated traffic as the source of information, often through the use of the Simple Network Management Protocol (SNMP). Sampling data at the packet level provides more information, but at the cost of a higher computational load, and dedicated hardware must be employed. Aggregated traffic, on the other hand, gives less information from which to decide on the presence or absence of anomalies, but is a simpler approach and does not need any special hardware. Apart from this dichotomy, however, there seems to be a consensus on how to proceed in this stage.

In the data analysis phase, several techniques can be applied to extract interesting features from current traffic. Some of them include information theory [4], [9], wavelets [6], statistics-based measurements [3], and statistical models [1]. Of these techniques, the use of statistical models as a means to extract significant features for data analysis has been found to be very promising, since they allow for a robust analysis even with small sample sizes (provided that the model is adequate for real data). Moreover, with a traffic model, its set of parameters can be used as the extracted traffic features, since any traffic sample is determined by the model parameters.

Existing traffic models range from the classical Poisson model, first introduced for packet networks by Kleinrock [13], to more recent models, which state the importance of high variability and long-range dependence [14] in modeling network traffic. Nevertheless, anomaly detection is still
often based (at least partially) on classical models, such as Gamma distributions [1]. The fact that these models do not account for high variability may have a negative impact on capturing traffic properties and, as a consequence, on detecting anomalies. High variability manifests itself in the marginal (first-order) traffic distribution and states that traffic is inherently bursty. This results in traffic distributions exhibiting heavy tails, which cannot be properly modeled with, e.g., Gaussian functions. Long-range dependence, on the other hand, states that traffic is highly dependent over a wide range of time scales, i.e., its autocorrelation function exhibits a slowly decaying tail.

Several statistical distributions are capable of modeling the high variability property. One such distribution is the α-stable family [15], which has been previously used to model network traffic (although with restrictions, as shown in Section 4), and which we briefly explored in [16] (where the detection problem is not addressed). To the best of our knowledge, these distributions have never been applied to anomaly detection. Moreover, in addition to properly modeling highly variable data, α-stable distributions are the limiting distribution of the generalized central limit theorem [17], a fact that sets them as good candidates for aggregated network traffic. Regarding the time evolution model and long-range dependence, we show in Section 6 that the first-order α-stable model is appropriate to detect flood and flash-crowd anomalies, so we do not use a time evolution model in this paper.

Several approaches have been used in the inference stage as well. Classification methods based on neural networks [10], [11], [18], statistical tests [2], information theory [4], and simple thresholding [19], to cite a few, can be found in the anomaly detection literature. There seems to be a common point in all of them, though. The inference stage bases its decisions on the existence of a reference traffic window, which allows the classification method to assess whether the current traffic window is normal (i.e., it is sufficiently similar to the reference window) or abnormal (i.e., significantly different from the reference window). How the reference window is chosen not only has an impact on the final normal versus abnormal classification rate, but it also determines the exact definition of a traffic anomaly. Some approaches [2], [12], [20] assume that an anomaly is an abrupt change in some of the features extracted from traffic, so the reference window is simply the previous-to-current traffic window. Other papers [1], [4] assume that the reference window has been previously chosen and approved by an expert, so they do not need to define anomalies as abrupt changes, but simply as traffic windows sufficiently different from the reference. Both of these approaches have disadvantages. The former can only detect traffic anomalies which include abrupt changes from one traffic window to the next, disregarding, for instance, slow trends in traffic data. The latter approach does allow for the detection of abrupt changes or slow trends, but needs the intervention of an expert. In addition, having just one reference window can be problematic due to the nonstationary nature of network traffic. It is widely accepted [21] that network traffic exhibits a cycle-stationary behavior in periods of days and weeks, so a reference traffic window which is appropriate for a given hour and weekday will probably not fit any other circumstances, or any other network for that matter.

In the validation stage, researchers give quality measures about the detection capability of their method according to a chosen criterion, which is typically the detection rate in terms of false positives and false negatives (i.e., the fraction of normal traffic patterns incorrectly classified as anomalous and the fraction of anomalous traffic patterns incorrectly classified as normal, respectively), although some researchers prefer other quality measures (see Section 2). Where possible, authors often compare the performance of their methods to other previously reported proposals as well.

In this paper, we propose an anomaly detection method based on α-stable distributions which does not require network administrators to choose reference traffic windows, and which is able to detect flood and flash-crowd anomalies regardless of the presence or absence of abrupt changes in network traffic. With this method, we expect to provide a data analysis stage giving more informative traffic features, as well as to address the problem of selecting appropriate reference windows. To this end, we provide statistical evidence about the suitability of α-stable distributions to be used as a first-order model for network traffic, and we build a classifier with a generalized likelihood ratio test, the performance of which is measured by means of Receiver Operating Characteristic (ROC) curves. We compare our classification results to those reported in [1], a closely related state-of-the-art contribution which, additionally, describes its experiments and results with sufficient detail to let other researchers build fair comparisons with their methods. Due to the restrictions described in Section 6, the traffic data used in testing the proposed method come from two routers in our university, which should be representative of lightly and heavily loaded networks (see Section 3). The traffic model is tested with these data, as well as with public traffic traces. We choose aggregated traffic data over packet-level sampling for added simplicity, so that no special hardware is needed to implement our method.

The paper is organized according to the aforementioned four-stage approach in order to enhance readability. Therefore, after describing the recent contributions in the field of anomaly detection in Section 2, we dedicate Section 3 to describing the framework used in our experiments, including data sampling and router specifications (stage one). Then, Section 4 justifies the α-stable first-order model (stage two). Section 5 deals with the inference stage and the methods we use to classify network traffic (stage three). Section 6 is divided into two subsections. The first one shows statistical evidence that the α-stable marginal model is valid for our data under proper circumstances, as well as that it behaves better than other models even when those circumstances are not met. The second one analyzes the detection performance of our method for two common types of anomalies (flood and flash-crowd), and compares it to the results reported in [1]. This section completes the fourth stage. The main conclusions and the foreseen steps of further research are exposed in Section 7. Finally, Appendices A and B, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TDSC.2011.14, provide supplementary information on the mathematical methods designed and used in the paper to deal with the calculations in which the α-stable model is involved.

TABLE 1
Anomaly Detection Methods

2 BACKGROUND

In the last decade, several research teams have contributed to anomaly detection in network traffic. However, the solutions given to each stage by different authors vary substantially across different papers. Table 1 provides a systematic comparative description of the main papers found in the literature. Thus, for each paper, the first three columns provide information related to the data acquisition stage, i.e., where the data come from and their type, as well as whether they are publicly available to the community. The following two columns describe the employed anomaly detection algorithm, i.e., they refer to the data analysis and inference stages. Also, one can see the types of anomalies treated in each paper, together with the figures of merit employed for the assessment of each method, as well as the papers against which they test the performance.

From what can be observed in Table 1, most authors clearly prefer to use real data in their experiments, rather than simulated traffic. Also, note that authors tend to collect data from their own networks, rather than using publicly available traces. Not using public data in experiments may be objectionable, but it is advantageous in that detection methods are not tied to the information already available. Thus, authors are free to inject any kind of anomaly into the network and test it with the proposed method. As for anomalous patterns, proposals often inject anomalies into the network on purpose or have some traffic traces prelabeled as anomalous, although some other approaches detect unusual patterns without prior knowledge about anomaly types (these patterns are marked as "U" in Table 1). Regarding the types of (real) sampled traffic, some authors prefer to use aggregated counters, while others do anomaly detection at the packet level. As previously stated, this paper focuses on anomaly detection in aggregated traffic.

In the second and third stages, Table 1 shows that a wide range of algorithms have been used to detect anomalies. However, there are no proposals using high-variability first-order models to analyze data; consequently, this paper aims at improving classification rates by making use of α-stable properties. As for the inference stage, most reported methods make use of simple thresholding, statistical tests, neural networks, or distance measurements. Among these techniques, approaches based on neural networks or statistical inference should yield better results than arbitrary thresholds or predefined distances, since they employ prior knowledge about the data. In addition, parametric approaches, such as the GLRT, are able to take advantage of the robustness of the traffic model (provided that it is an adequate model), while nonparametric methods like neural networks cannot. Thus, inference in this paper is based on hypothesis testing by means of a generalized likelihood ratio test.

It is also interesting to look at the anomaly types used in validating detection methods. While there is no consensus on what an anomaly is, or on what kind of anomalies should be detected, a trend toward three possible approaches can be observed, from most general to most specific anomalies: first, several papers detect general divergences in measured traffic features; some other authors detect a few anomaly types commonly found in computer networks; and the remaining papers test their methods with very specific attack procedures, such as known viruses or malware. Our approach is validated with common (flood and flash-crowd) anomalies. As such, it falls in the second approach, thus keeping a compromise between general deviations and specific attacks. Table 1 also shows a clear preference to evaluate detection methods via their false positive/negative rates, as we do in this paper (via ROC curves).

Another interesting conclusion drawn from the papers in Table 1 regards reference traffic windows. Although it is not directly seen in the table, authors tend to assume that immediate past traffic is normal, or to leave the choice of appropriate reference windows up to the network manager. In this paper, we address the problem of setting reference windows so as not to make any assumption about immediate past traffic and not to depend on an expert's skill.

Some papers, as Table 1 shows, compare their results to other similar contributions, when at all possible. In our case, within cited papers that use common anomalies to test detection performance, [1] is the natural candidate for comparison due to several reasons: first, it is fairly recent, so, to the best of our knowledge, it is a state-of-the-art paper. Second, the method is closely related to ours, so comparisons can draw great insight on both the modeling and the inference problem. And, finally, the authors include exact figures for validating their method, so results are directly (and fairly) comparable.

3 DATA ACQUISITION

As mentioned in Section 1, the data used in this work to test the proposed detection method were collected from two routers at the University of Valladolid. Our university comprises four campuses, each one in a different city, for a total of 25,000 students and 2,500 faculty members. Router 1 is the core router for the whole University and router 2 is the main router of the School of Telecommunications. Router 2 is directly connected to one of the ports in router 1. Both of them are able to operate at 1,000 Mbps. Data collection is done by periodically querying the routers via SNMP for the accumulated byte counters at each physical port. Data have been continuously sampled from June 2007 to July 2008 for router 1, and from February 2007 to October 2008 for router 2 (with some brief interruptions due to unpredictable contingencies).

Router 1 is a Cisco Catalyst 6509; it usually deals with average traffic amounts of several Megabits per second (40-70 Mbps typically). As mentioned, it is responsible for all network traffic coming from every campus in the University and comprises thousands of hosts, directly or indirectly. Router 2, a Cisco Catalyst 3550, usually has a much lower workload, its average traffic typically ranging between practically 0 and 10 Megabits per second, depending on the chosen port. Router 2 alone manages traffic coming from hundreds of computers, which are in turn a fraction of those connected to router 1. These routers should be representative of heavily and lightly loaded networks. Fig. 1 depicts typical traffic traces from both routers.

Fig. 1. A snapshot of instantaneous traffic passing through: (a) router 1 and (b) router 2 (10,000 samples each, taken in June 2007 and February 2007, respectively). Average traffic is 30.42 Mbps in (a) and 366.87 Kbps in (b).

In the data acquisition stage, traffic samples are taken at intervals of Δt seconds, so that data windows of W seconds are continuously filled and passed to the second stage. W should be large enough to have a minimum amount of data when trying to fit a statistical model to them, and short enough to have (at least) local stationarity.
This is necessary since we extract a single set of parameters from each time window, which we assume to be constant for W seconds. Traffic stationarity has been previously studied in [21], where the authors find one-hour periods to be reasonably stationary, so we make use of this assumption in this paper as well. However, in order to ensure the model adequately fits the data (see Section 6.1), we chose a time window length W = 30 minutes.

Δt, on the other hand, should be short enough to, again, provide as many traffic samples as possible to the second stage, but we must also keep in mind that the shorter the Δt, the more loaded a router will be. Network managers often find it unacceptable for a router to spend any significant amount of time on monitoring tasks, so we chose Δt with this restriction in mind. In our experiments, we found that for Δt = 5 seconds the monitoring overhead in both mentioned routers ranged between 1 and 3 percent, which was deemed acceptable by the respective network administrators.
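As an illustration of this acquisition stage, the following Python sketch polls a per-port byte counter every Δt = 5 seconds and assembles 30-minute windows of instantaneous rates. It is our own minimal rendition, not the authors' code; snmp_get_byte_counter() is a hypothetical helper standing in for an SNMP GET of the port's accumulated byte counter.

import time

DT = 5                          # seconds per sample
W = 30 * 60                     # window length: 30 minutes
SAMPLES_PER_WINDOW = W // DT    # 360 samples per window

def traffic_windows(router, port, counter_bits=64):
    wrap = 2 ** counter_bits    # SNMP counters wrap around at 2^32 or 2^64
    window = []
    prev = snmp_get_byte_counter(router, port)  # hypothetical SNMP helper
    while True:
        time.sleep(DT)
        cur = snmp_get_byte_counter(router, port)
        rate_mbps = ((cur - prev) % wrap) * 8 / DT / 1e6  # wrap-safe delta
        prev = cur
        window.append(rate_mbps)
        if len(window) == SAMPLES_PER_WINDOW:
            yield window        # hand the filled window to the analysis stage
            window = []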
4 DATA ANALYSIS

As previously stated in Section 1, the use of statistical models in the data analysis stage can be advantageous, since an adequate model allows for a robust analysis even with small sample sizes. With traffic windows of W/Δt = 360 samples each, our sample size is rather small, so the use of a model is desirable. This approach has been previously used in works such as [1]; however, the model used there does not account for important traffic properties, such as high variability. Section 4.1 identifies these properties and discusses why classical models are not adequate for network traffic. To deal with the problem, Section 4.2 proposes the use of α-stable distributions, unrestricted in their parameter space, as a model for traffic marginals.

4.1 Existing Network Traffic Models

Traditionally, network traffic has been modeled as a Poisson process for historical reasons. Indeed, the Poisson model has been successfully used in telephone networks for many years, and so it was inherited when telecommunication networks became digital and started to send information as data packets. Also, this model has a simple mathematical expression [32], and it has only one parameter, λ, which is in turn very intuitive (the mean traffic in packets per time unit).

In the last decade, however, several authors have studied network traffic behavior and proposed other models that overcome the limitations which are inherent to Poisson processes, the most notable ones probably being that the Poisson model has a fixed relationship between mean and variance values (both are equal to λ), and that it does not account for high variability or long-range dependence.

More recently proposed models are usually based on the assumption that network traffic is self-similar in nature, as originally stated in [33]. Intuitively, network traffic can be thought of as a self-similar process because it is usually "bursty" in nature and this burstiness tends to appear independently of the time scale. Thus, in [33], Fractional Brownian Motion (FBM) [34] is shown to properly fit accumulated network traffic data (note that FBM is an autoregressive process and so it can model accumulated traffic, but not instantaneous traffic), but the authors impose a strict condition: the analyzed traffic must be very aggregated² for the model to work. That is, the FBM model is only valid, the authors say, when many traffic traces are aggregated, in such a way that the number of aggregated traces is much larger than the length of a single trace (measured in number of traffic samples).

Let us consider why this restriction is necessary. First of all, we used our collected data to try and see if this constraint was needed in our particular network, and saw that it was indeed the case. A graph showing some of our data can be seen in Fig. 1. Note that there are some traffic peaks, or "bursts," scattered among the data, which otherwise tend to vary in a slower fashion. Recalling that instantaneous contributions to FBM are Gaussian random variables, we can calculate histograms of traffic data like those in Fig. 2, which show typical cases of the instantaneous traffic distribution in routers 1 and 2, along with Poisson, Gaussian, Gamma, and α-stable curves fitted to the real data. The Poisson, Gaussian, and Gamma curves were fitted using a Maximum Likelihood (ML) algorithm, while the α-stable curve was fitted with the algorithm described in the Appendix, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TDSC.2011.14. Clearly, one can see that the marginal distribution of sampled data differs considerably from the Poisson, Gaussian, and Gamma probability density functions (PDFs), especially in the case of Fig. 2b. This happens due to the extreme values present in the data, which alter mean and variance estimates considerably.

Fig. 2. A typical histogram of traffic passing through: (a) router 1 and (b) router 2 (10,000 samples each, taken in June 2007 and February 2007, respectively) along with Poisson (dotted), Gaussian (dashed), Gamma (dash-dot), and α-stable (solid) curves fitted to the data.

All of this means that a single traffic trace cannot be modeled as an FBM because traffic marginals are not Gaussian. However, once many traffic traces are aggregated, the resulting data do follow a Gaussian distribution, and so the FBM model is valid. This happens as a consequence of the Central Limit Theorem [32], which loosely states that the sum of many identically distributed random variables converges to a Gaussian distribution. Note, however, that FBM can model the self-similarity properties of traffic, i.e., it includes a time evolution model which accounts for the long-range dependence that data usually exhibit.

2. Here, aggregated means exactly "averaged." In other words, many traffic traces must be summed up, and then divided by the number of summed traces.
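This Gaussianization of heavily aggregated traffic can be reproduced numerically. The sketch below is our own illustrative construction, not the paper's experiment: it averages an increasing number of independent, truncated heavy-tailed traces and checks normality with a KS test. Real measurements are bounded by link capacity, which is what lets the classical Central Limit Theorem take over; all numbers are arbitrary.

import numpy as np
from scipy.stats import levy_stable, kstest

rng = np.random.default_rng(1)
N, CAPACITY = 360, 100.0   # samples per trace; Mbps cap acts as the truncation

def one_trace():
    # One bursty trace: right-skewed alpha-stable rates, clipped to [0, cap].
    t = levy_stable.rvs(1.5, 1.0, loc=5.0, scale=1.0, size=N, random_state=rng)
    return np.clip(t, 0.0, CAPACITY)

for m in (1, 10, 100):
    agg = np.mean([one_trace() for _ in range(m)], axis=0)
    z = (agg - agg.mean()) / agg.std()   # ignoring the estimated-parameter
    print(m, kstest(z, "norm").pvalue)   # correction discussed in Section 6.1;
                                         # the p-value should rise with m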

At this point, it should be clear that any model for instantaneous traffic marginals must be flexible enough to adapt to some properties observed in sampled traffic, namely:

1. Let C(t) be the amount of traffic accumulated at time t. Then, C(t) ≤ C(t + 1) and C(t + 1) − C(t) ≤ M, where M is the network maximum transmission rate.
2. The fact that at time t there is a certain amount of traffic C(t) does not imply in any way that at time t + 1 the amount of traffic lies anywhere near C(t), due to the inherent burstiness of network traffic. This is equivalent to saying that network traffic exhibits the high variability property.

The latter property is also known as the "Noah effect" or the infinite variance syndrome [14], and it is easily observed in a histogram like those in Fig. 2 as a heavy tail, usually on its right side. This data tail is not negligible as are, for example, the tails of Poisson, Gaussian, or Gamma distributions. On this aspect, note that the histogram in Fig. 2b shows only data under percentile 98 because the tail is so long. One effect heavy tails have when modeling traffic data is that they distort mean and variance estimates notably, which makes it difficult to fit Gamma, Gaussian, and Poisson curves, as seen in Fig. 2.

On the other hand, the first aforementioned property states the obvious fact that network traffic has compact support between 0 and M. Compact support makes symmetric distributions (Gaussian distributions are symmetric) inappropriate, because if the traffic histogram concentrates on very low transmission rates, the model would allow negative traffic increments to occur with a non-negligible probability, and this can never be the case. Accordingly, if traffic data concentrate near the maximum transmission rate, a symmetric model would allow traffic increments larger than physically possible, again with a non-negligible probability. This also affects the Gamma distribution, since its tail always lies on its right side. As an illustrative example, if we extrapolated the Gaussian (dashed) curve in Fig. 2b toward the left, we would see that the probability of getting a negative Mbps rate is not negligible. Regarding the Poisson distribution, recall that it converges to Gaussian when λ is sufficiently large.³ With our data typically ranging within tens of packets per second, it makes sense to assume that Gaussian convergence holds, so the previous discussion applies. Neither of these problems occurs with the α-stable (solid) curves in Fig. 2, a fact we briefly explored in our previous work [16], so we now focus our attention on these distributions in order to justify their use as a model for anomaly detection.

3. λ = 10 is often considered enough for this purpose.

4.2 α-Stable Models

α-stable distributions can be thought of as a superset of Gaussian functions, and they originate as the solution to the Central Limit Theorem when second-order moments do not exist [17], that is, when data can suddenly change by huge amounts as time passes by. This fits nicely the high variability property seen in network traffic (the Noah effect). Moreover, α-stable distributions have an asymmetry parameter which allows their PDF to vary from totally left-asymmetric to totally right-asymmetric (the latter is almost the case in Fig. 2b), while genuine Gaussian distributions are always symmetric. This parameter makes α-stable distributions naturally adaptable to the first traffic property (compact support) even when average traffic is virtually 0 or very near the maximum theoretical network throughput (see Fig. 2 again).

In addition, α-stable distributions give an explanation to the restriction imposed in [33] about the need to aggregate many traffic traces for them to converge to a Gaussian distribution. According to the Generalized Central Limit Theorem [17], which includes the infinite variance case, the sum of n α-stable distributions is another α-stable distribution, although not necessarily Gaussian. Since traffic data exhibit the Noah effect, we can assume infinite variance. Then, under the hypothesis that marginals are α-stable, the sum of a few traces will be α-stable but not necessarily Gaussian. In [33], however, after summing sufficiently many traces, the final histogram converges to a Gaussian curve. This occurs because any real measurement cannot be infinite, even if an infinite variance model proves to reflect reality best. Section 6.1 is dedicated to validating this hypothesis, but first, although describing α-stable distributions in detail is beyond the scope of this paper, as there are several good references in this field ([15], [34], [35], for example), we briefly mention some of their properties for readability purposes.

α-stable distributions are characterized by four parameters. The first two of them, α and β, provide the aforementioned properties of heavy tails (α) and asymmetry (β), while the remaining two, γ and δ, have analogous meanings to their counterparts in Gaussian functions (standard deviation and mean, respectively). Note that, while they have analogous senses (scatter and center), they are not equivalent, because α-stable distributions do not have, in general, a finite mean or variance, i.e., E{X} ≠ δ and STD{X} ≠ γ. The allowed values for α lie in the interval (0, 2], α = 2 being the Gaussian case, while β must lie inside [−1, 1] (−1 means totally left-asymmetric and 1 totally right-asymmetric). The scatter parameter γ must be a positive number, and δ can have any real value. If α = 2, the distribution does not have heavy tails and β loses its meaning, since Gaussian distributions are always symmetric. Conversely, the tails of the PDF become heavier as α tends to zero.
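A quick way to get a feel for these four parameters is to evaluate the PDF numerically. The sketch below uses SciPy's levy_stable distribution, which is our choice for illustration only (the paper uses its own numerical methods, described in the online Appendix); its arguments follow the order (α, β, loc = δ, scale = γ):

import numpy as np
from scipy.stats import levy_stable

x = np.linspace(-6, 6, 601)
pdf_gauss = levy_stable.pdf(x, 2.0, 0.0)   # alpha = 2: the Gaussian case
pdf_heavy = levy_stable.pdf(x, 1.5, 0.0)   # alpha < 2: heavier tails
pdf_skew = levy_stable.pdf(x, 1.5, 1.0)    # beta = 1: totally right-asymmetric
# (arrays above are ready for plotting against x)

# The tail mass P(X > 5) grows as alpha decreases:
for alpha in (2.0, 1.5, 1.2, 0.9):
    print(alpha, float(levy_stable.sf(5.0, alpha, 0.0)))

For α = 2 the density coincides with a Gaussian of standard deviation √2·γ in SciPy's default parameterization, which is consistent with β becoming irrelevant in that case.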
Despite their potential advantages, however, we also state some reasons why α-stable distributions are difficult to use. First, the absence of mean and variance in the general case makes the use of many traditional statistical tools impossible. Moreover, as mentioned before, these distributions do not have (to the best of our knowledge) a known closed analytical form for their PDF or their CDF, so powerful numerical methods are needed for tasks which are almost trivial with (for example) the Gaussian distribution, such as estimating their parameters for a given data set or even drawing a PDF. Also, the fact that they have four parameters, instead of just two, introduces two new dimensions to the problem, which can make processing times grow faster than in the Gaussian approach.
In our experiments, however, this is not an issue with recent hardware. The Appendix, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TDSC.2011.14, describes the mathematical methods we used in dealing with α-stable distributions.
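For readers who want a stand-in for that machinery, SciPy also offers generic maximum likelihood fitting of the four parameters. It is far slower than a purpose-built estimator and is shown here only to sketch the idea; the paper's own algorithm is the one in the online Appendix.

import numpy as np
from scipy.stats import levy_stable

# A synthetic 360-sample "traffic window" with known parameters.
rng = np.random.default_rng(0)
window = levy_stable.rvs(1.6, 0.8, loc=30.0, scale=3.0, size=360,
                         random_state=rng)

# Generic ML fit; returns (alpha, beta, loc, scale), i.e., (alpha, beta,
# delta, gamma) in the paper's notation. This call can take a while.
alpha, beta, delta, gamma = levy_stable.fit(window)
print(alpha, beta, gamma, delta)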
In spite of the inadequacy of FBM to model network traffic (because of its marginals), it is nowadays commonly accepted in the literature that a proper traffic model should describe 1) the marginal distribution of empirical data and 2) how a given traffic sample depends on past ones (i.e., the correlations between them). In accordance with this, other authors have proposed traffic models based on the findings in [33], which we review briefly here.

In [36], traffic is modeled as a combination of Linear Fractional Stable Noise (LFSN) and Log-Fractional Stable Noise (Log-FSN) [34]. These are stochastic processes whose marginals are α-stable, but the authors must use them with some restrictions for the model to fit real data. For example, the center parameter δ must be zero for an α-stable process to be considered as either LFSN or Log-FSN. With this constraint, the first-mentioned property seen in traffic data cannot hold true, so the model is altered to consider the absolute value of the LFSN or Log-FSN process instead of the original one. This should not pose any limitation per se but, for similar reasons, they must restrict themselves to α-stable distributions having β = 0 and α > 1 (i.e., symmetric PDFs whose tails cannot be very heavy), which does limit the α-stable parameter space substantially.

In a similar way, Karasaridis and Hatzinakos [37] propose a model based on totally right-skewed LFSN (β = 1). This kind of process, again, imposes some restrictions on the α-stable parameters. In addition to the fixed value of β, α must be greater than 1 for the process to have long-range dependency (and thus, self-similarity). These restrictions, however, allow the authors to estimate the α-stable distribution using its inherent properties for the case that α > 1 and β = 1.

More related work on this subject can be found in [38], where the authors try to answer, from a mathematical point of view, the question of whether traffic data are better modeled with Stable Lévy Motion (SLM) [34] or FBM (among other differences, SLM contributions are α-stable while FBM ones are Gaussian). To this end, they use connection rates as an input parameter to some commonly used packet-source models, such as the ON/OFF and the infinite source Poisson models. Note that both SLM and FBM are cumulative processes, so they do not model instantaneous traffic but accumulated traffic. Their conclusion is that for high connection rates FBM can be used, but for low connection rates SLM is more appropriate. This seems to be in concordance with our results, because data from router 1, which deals with higher connection rates than router 2, tend to be better modeled with Gaussian distributions than data from router 2 (see Section 6.1).

Finally, in [1], the proposed model for the marginals is the Gamma distribution, which is combined with an ARFIMA process for the correlations. Using the proposed model, the authors find that the marginals alone can be used to distinguish between normal traffic and flood anomalies, which are induced by means of various Distributed Denial-of-Service (DDoS) attacks. Flash-crowd anomalies are also considered and found to be a priori distinguishable from normal traffic by a multiresolution study of the correlation evolution via the ARFIMA model, although a quantitative analysis is only carried out for flood anomalies.

The goal of this paper is to detect anomalies in network traffic (particularly, flood and flash-crowd types), and not to establish a full, novel network traffic model. To this end, we show in Section 6 that the analysis of marginals suffices to detect both kinds of anomalies, and that α-stable distributions, when used in their full parameter range, outperform other previously used models both as a model for marginals and as a means to detect anomalies.

5 INFERENCE

In order to detect anomalies, we first need to define what an anomaly is or, in other words, what our method tries to detect. A common approach is to define anomalies as a sudden change in some quantity measured in the previous stage [2], [3], [12], or as a significant divergence between the current traffic window and a previously chosen reference [1], [4]. Note the difference between both strategies: the former compares current traffic to recent past traffic, while the latter does not assume recent traffic is necessarily normal. We feel this latter approach is superior, since some types of anomalies are detectable this way but would not be otherwise (for instance, slow trends). However, how superior this latter approach is depends directly on the ability of the reference window to represent all kinds of normal traffic in any given circumstance.

It is widely known that network traffic exhibits a cycle-stationary behavior with periods of days and weeks (see [21], for example) and, generally speaking, traffic patterns that are clearly anomalous in some network, at a given time, can be perfectly normal in some other network or time instant. Thus, the reference window should vary from one router port to another, and from any hour-weekday combination to any other, for the anomaly detection system to succeed in the real world. Possibly, holidays and other workday interruption periods should also be taken into account.

Still, having exactly one reference window for all possible combinations of port, hour, and weekday needs the intervention of an expert who can tell normal traffic apart from anomalous. Since one of our goals is to provide an automatic anomaly detection method, we propose, as a hypothesis, that p(normal traffic) ≫ p(anomalous traffic) in any correctly behaved network in any circumstance (it seems obvious that normal traffic should happen most of the time for it to be considered as normal). In other words, normal traffic should be that which has gone through the router most of the time in the past. Our data collection includes traffic samples from routers 1 and 2 for a period of at least one year, so we roughly have 2 × 24 × 365 = 17,520 30-minute windows for each port in the routers. That gives us 17,520/24/7 (roughly 100) traffic windows for each port-hour-weekday combination such that, by hypothesis, most of them are representative of normal traffic. For all these windows, we estimate the parameters of an α-stable PDF which fits the data and store those parameters in catalogs,
one catalog for each hour-weekday combination, where we assume typical stationarity cycles of days and weeks, and local stationarity within an hour. Coherently with this assumption, test windows are compared with the stored training windows within the corresponding catalog.

That leaves us with the problem of deciding when a particular traffic window is far enough from our set of (assumed to be) normal windows. Again, a common approach is to fix an arbitrary threshold which marks the boundary between normal and anomalous traffic, and to trigger an alarm when the threshold is exceeded. Unfortunately, this approach is error-prone, since a human network administrator will probably feel either that it is too sensitive or, the opposite way, that it does not detect interesting anomalous behaviors. Also, the simple combination of normal windows and a threshold assumes that anything not previously seen is necessarily anomalous. To overcome this situation, we propose the use of sets of synthetic anomalies (one set per anomaly type), analogous to the normal traffic windows but known to be anomalous, and setting the threshold so that a given traffic window is classified as normal or anomalous based on its similarity to the normal and anomalous sets. Moreover, we propose to inform network administrators via an "abnormality index" instead of using binary yes/no alarms (see below). This way, they have a source of information which should be more informative and thus less error-prone.

As for the classification algorithm itself, we choose a Generalized Likelihood Ratio Test (GLRT) [39] since, being a parametric test, it can take advantage of the α-stable model described in Section 4.2. Although there is no optimality associated with the GLRT, asymptotically it can be shown to be uniformly most powerful (UMP) among all tests that are invariant [39]. We intend to determine whether it is more likely that the current traffic window comes from the normal set or from one of the anomalous sets, so we test H0: current traffic is normal versus H1: current traffic is not normal. The GLRT will decide H1 if

L_G(\mathbf{x}) = \frac{\max_{\theta_{1j}} p(\mathbf{x}; \theta_{1j}, H_1)}{\max_{\theta_{0j}} p(\mathbf{x}; \theta_{0j}, H_0)} > \gamma ,    (1)

where \mathbf{x} is the vector of traffic samples in the current window, \gamma is the chosen threshold, and \theta_{0j}, \theta_{1j} are, respectively, the normal and anomalous sets of α-stable parameter vectors for the current port-hour-weekday combination. The test is repeated for each kind of anomaly to be detected.

This translates into evaluating all the previously stored and catalogued α-stable PDFs for the current stationarity period, at the points specified by the current window samples. Thus, we obtain as many values of likelihood as stored PDFs we have for the stationarity period being considered (i.e., for each catalog), from which we take the maximum. Of course, an appropriate threshold should be chosen, depending on the network administrator's criterion. One approach is to choose a desired false positive rate and set \gamma accordingly, but see below.

Our final implementation of the GLRT differs slightly from expression (1). Using this expression, α-stable likelihoods are evaluated as

p(\mathbf{x}; \theta_{ij}, H_i) = \prod_{k=1}^{n} f(x_k; \theta_{ij}, H_i) ,    (2)

where f(x_k; \theta_{ij}, H_i) is the PDF of the α-stable distribution whose likelihood is being calculated, evaluated at the point x_k, and n is the number of samples in each traffic window (30 × 60/5 = 360 in our scenario). However, α-stable PDFs can yield very high values, especially when γ → 0, to the point that a finite precision machine may incorrectly evaluate the product p as an infinite value. To overcome this situation, we alter the GLRT to use log-likelihoods. Using f(x_k; \theta_{ij}, H_i) > 0 and the fact that log is a strictly increasing function, so that \log(\max(z)) = \max(\log(z)), the test will decide H1 if

\log L_G(\mathbf{x}) = \max_{\theta_{1j}} \log p(\mathbf{x}; \theta_{1j}, H_1) - \max_{\theta_{0j}} \log p(\mathbf{x}; \theta_{0j}, H_0) > \gamma_1 ,    (3)

where \gamma_1 = \log(\gamma), and

\log p(\mathbf{x}; \theta_{ij}, H_i) = \sum_{k=1}^{n} \log f(x_k; \theta_{ij}, H_i) .    (4)

This solves the problem of finite machine precision, but \gamma_1 must still be chosen appropriately to fit the network manager's needs. Instead of choosing a fixed \gamma_1, however, the raw value of \log L_G(\mathbf{x}) may be scaled and shifted so that it represents a conveniently bounded abnormality index, say between 0 and 100. Then, the administrator can judge for themselves whether the extracted log-likelihood exceeds their particular vision of an alarm threshold in each case. Transforming an unbounded value like \log L_G(\mathbf{x}) into a bounded index I ∈ (0, 100) is easily done (for instance) as

I = \frac{100}{\pi} \left[ \arctan\left( \log L_G(\mathbf{x}) \right) + \frac{\pi}{2} \right] .    (5)

This index may be monitored by the network manager, and its evolution followed over time, in the same fashion as other common network status indicators (such as average traffic levels, connection rates, etc.). This way, more information is provided (in comparison to a simple yes/no alarm) in the case that the network manager has to make any decision. Nevertheless, binary alarms may be triggered as well by setting an appropriate \gamma_1 in (3), if desired. To do so, users should pick an anomaly intensity and set a desired false-alarm rate. Once this is done, \gamma_1 is found by using ROC curves (see Section 6.2).
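Expressed in code, the test in (3)-(5) reduces to a few lines. The sketch below is our own illustration (parameter estimation and catalog construction are assumed to be done elsewhere); each catalog is a list of (α, β, γ, δ) tuples:

import numpy as np
from scipy.stats import levy_stable

def max_log_likelihood(x, catalog):
    # Eq. (4), maximized over the stored parameter vectors of one catalog.
    return max(levy_stable.logpdf(x, a, b, loc=d, scale=g).sum()
               for (a, b, g, d) in catalog)

def abnormality_index(x, normal_catalog, anomalous_catalog):
    # Eq. (3): log-GLRT statistic; Eq. (5): bounded index in (0, 100).
    llr = (max_log_likelihood(x, anomalous_catalog)
           - max_log_likelihood(x, normal_catalog))
    return (100.0 / np.pi) * (np.arctan(llr) + np.pi / 2.0)

A binary alarm is then just the comparison of abnormality_index() against whatever value of the index corresponds to the chosen γ₁.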
In this paper, we focus on detecting flood and flash-crowd anomalies in order to present the results in Section 6.2. Flood anomalies include attacks, or any other circumstances, which result in a net growth of instantaneous traffic. One can think of flood anomalies as having one or more relatively constant traffic sources added to otherwise normal traffic. DDoS attacks typically give rise to anomalies of this kind. Flash-crowd anomalies encompass traffic patterns which are caused by a net growth of (usually human) users trying to access a network resource. Typical flash-crowd anomalies are related to overwhelming web server usage patterns.

As stated before, we generate synthetic patterns for both kinds of anomalies. To this end, we assume that the traffic resulting from aggregating two traffic sources is the sum of these particular traces. This implicitly assumes that the summed traces do not exceed the network capacity, so care should be taken in choosing the intensities of the generated anomalous patterns.

Fig. 3. Anomalous patterns for flood and flash-crowd anomalies.

Fig. 4. Distribution of injected versus synthetic anomalous patterns over the α-stable parameter space. Note that the distribution of injected abnormal traffic does not seem to differ from that of synthetic patterns.

Synthetic anomalies are generated as follows: we start by using public domain programs to generate flood and flash-crowd anomalies in a virtually empty network (traffic peaks of less than 3 Kbps). We used Iperf [40] for flood anomalies, and JMeter [41] for flash-crowd anomalies. Iperf is a command-line tool, very simple to use, that allows a constant traffic amount to be injected into a network. JMeter, on the other hand, is slightly more complex. It is designed to test the behavior of the Apache web server under configurable load conditions, and it allows a user-defined set of HTTP queries to be sent to a web server at random intervals. To ensure the simulation mimics human usage patterns, we implemented a lognormal timer with parameters μ = 3 seconds and σ = 1.1 for the interclick periods, as described in [42]. Fig. 3 shows some flood and flash-crowd anomalous patterns.
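The following sketch shows, in our own simplified form, how such anomalous windows can be superimposed on a real 360-sample window of rates in Mbps. The request size is an illustrative assumption, and μ is treated here as the median interclick time:

import numpy as np

DT = 5.0  # seconds per sample

def add_flood(window_mbps, flood_rate_mbps, capacity_mbps=1000.0):
    # Flood: one roughly constant extra source, as produced with Iperf.
    return np.minimum(np.asarray(window_mbps) + flood_rate_mbps, capacity_mbps)

def add_flash_crowd(window_mbps, n_users, kbytes_per_click=50.0,
                    mu=3.0, sigma=1.1, rng=None):
    # Flash crowd: n_users issuing requests with lognormal interclick
    # times, as with the JMeter timer described above.
    rng = rng or np.random.default_rng()
    extra = np.zeros(len(window_mbps))
    horizon = len(window_mbps) * DT
    for _ in range(n_users):
        t = rng.lognormal(np.log(mu), sigma)
        while t < horizon:
            extra[int(t / DT)] += kbytes_per_click * 8.0 / 1000.0 / DT  # Mbps
            t += rng.lognormal(np.log(mu), sigma)
    return np.asarray(window_mbps) + extra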
In the flood case, we generated 104 30-minute anomalies ranging from 16 Kbps to 128 Mbps; in the flash-crowd case, we generated 100 anomalies ranging from 10 to 1,000 threads each. Once we have these two pools of "pure anomalies," we randomly choose as many of them as there are existing normal windows in each port-hour-weekday combination and add them together. This results in three sets of traffic windows for each port, hour, and weekday: a normal set, a flood-anomalous set, and a flash-crowd-anomalous set. As stated above, normal windows consist of strictly real traffic, while anomalous ones are synthetic (although they have also been built from real traffic). The same way we did with the normal traffic windows, we fit an α-stable PDF to the data in each anomalous window, and store the estimated parameters for their use in the GLRT classifier.

Before presenting performance results in the next section, it should be noted that synthetic anomalies are not real abnormal patterns, since the latter tend to alter network behavior in various ways not directly observed in aggregated traces (network entities tend to drop packets, trigger retries and congestion control mechanisms, alter backoff timers, etc.), especially when network use is near its maximum capacity. Injecting anomalies into a real network to generate enough training patterns, however, is not an option, since doing so would interfere with normal network use. Nevertheless, in our tests, the α-stable parameters obtained from the estimation of synthetic anomalous patterns do not seem to differ significantly from those obtained by directly injecting anomalous traffic into a real network. In this regard, Fig. 4 shows the distribution of some flood and flash-crowd patterns injected into a real network, along with the distribution of synthetically obtained abnormal patterns. It can be seen that the α-stable parameters are distributed similarly for both types of patterns.

6 RESULTS

This section shows the results for the Data Analysis and Inference stages. For data analysis, we show statistical evidence that α-stable distributions are adequate as a model of network traffic marginals for the purpose of detecting anomalies, and we compare our goodness-of-fit figures to those of other marginal models which have been used elsewhere. Regarding the Inference stage, we make extensive use of ROC curves to show our classification rates and, specifically, we compare them with the results reported in [1].

The α-stable traffic model is tested with real data collected from routers 1 and 2, as described in Section 3, as well as with well-known public traffic traces available at [22], [23] (respectively, the ITA and WITS data sets). As previously indicated, however, the proposed detection method needs a set of traffic data which is sufficiently dense and which provides enough windows in each of the catalogues, so that they are representative of their respective stationarity periods. Unfortunately, we could not find any public traces that satisfy both criteria at the same time, so the ROC curves presented below use only data from our University routers.

6.1 Goodness-of-Fit of the α-Stable Model

A very common way of testing goodness of fit is the use of nonparametric tests, such as the Kolmogorov-Smirnov (KS) test [43]. Unfortunately, this and other similar tests assume that samples are independent and identically distributed (iid). Since we are assuming local stationarity within an hour from the beginning, all samples in one traffic window may be considered as being identically distributed, but not necessarily independent. Actually, several studies ([14], [33], for example) have detected a strong presence of positive dependence in sampled traffic, which in turn results in long-range dependency and the need to use sophisticated models for traffic correlations (such as ARFIMA processes). However, the effect of positive dependence in stationary stochastic processes has been studied in various works.

Fig. 5. KS tests for goodness-of-fit of various distributions to traffic marginals, shown as ðH0 Þ acceptance rates (in percent). Window lengths range
from 8.3 minutes (100 samples) to 1 hour, 23 minutes (1,000 samples). The inference stage uses 360-sample windows (30 minutes).

Weiss [44], for example, proposes a modification of the tests with output traffic from the upstream port of routers 1
KS test to second-order Autoregressive Moving Average and 2 (all ports cannot be shown here for space reasons).
(ARMA) processes. There are also tests which are appro- Taking SNMP byte counters as inputs, data windows of 100
priate for -stable distributed strongly correlated processes to 1,000 consecutive samples are randomly chosen for each
[45]. However, we decided against using these because of the ports. Then, for each window length, we make 1,000
1) recent literature shows that network traffic is usually too experiments in which:
correlated to be closely represented by ARMA processes, and 2) a test which is specific to α-stable distributions prevents us from comparing them with any other distribution. Instead, we make use of the results reported in [46], which state that under the presence of positive dependence, the KS test tends to reject the null hypothesis (H0). This way, if the test for iid variables accepts the hypothesis that traffic follows a specific distribution, then we can be sure that this distribution is adequate even for very positively correlated data. Of course, the drawback of this approach is that, when H0 is rejected, there can still be doubt that the affected distribution could have been an adequate model. Nevertheless, it is not our intention in this paper to measure exact values for the adequacy of a particular model, but just to validate α-stable distributions and corroborate that they can fit traffic marginals better than other previously used models. Thus, the distribution that yields the largest H0 acceptance rates should be the best as a model for traffic marginals.

The KS test, however, has another drawback: it can only be applied when the theoretical distribution is completely specified, i.e., its parameters cannot be estimated from data. Since the real distribution of traffic data is naturally unknown, the use of the KS test may be, again, objectionable. Nevertheless, the KS statistic can be corrected by simulation to avoid this problem. Appendix B, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TDSC.2011.14, is devoted to finding proper correction coefficients for the KS statistic, which are used here to present goodness-of-fit results.

We have already referred to Fig. 2 as a pictorial indication that typical traffic histograms can be closely approximated using α-stable distributions. To give statistical evidence that this is indeed the case, we made several goodness-of-fit experiments; in each of them:

1. The four parameters of an α-stable distribution are estimated from the data using the algorithm described in Appendix A, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TDSC.2011.14.
2. A KS goodness-of-fit test is made with the null hypothesis (H0) being "data follow the estimated α-stable distribution."

This process is repeated for Gamma, Gaussian, and Poisson distributions, using their corresponding ML estimators.

Since we want (H0) to be accepted, the test should yield p-values greater than the significance level so that (H0) is not rejected. Fig. 5 shows the results of test sets for routers 1 and 2, as well as for traces LBL-CONN-7 from ITA [22], and AUCKLAND-4 from WITS [23]. For each experiment set, (H0) acceptance rates are shown, with a significance level of α = 0.05 for all tests. For clarity, results for the Poisson distribution are not shown: since the Poisson distribution has only one free parameter, and given the aforementioned Gaussian convergence, Poisson fits always lie below Gaussian ones.

About these results, it should be noted that (H0) acceptance rates tend to be smaller as the number of samples grows. This may be a consequence of strong positive correlation in sampled data, loss of local stationarity (since samples more than an hour apart are forced in this experiment to be identically distributed), or the simple fact that the KS test expects more convergence as the number of samples grows. Nevertheless, figures show that for 30-minute, five seconds/sample windows (i.e., 360 samples), α-stable distributions fit traffic marginals best, and their (H0) acceptance rates fall well above other previously used distributions, so they should constitute an adequate model for traffic marginals.
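The following is a minimal MATLAB sketch of this validation loop, not the authors' code: the sample vector x (assumed to be a 360x1 window of aggregated traffic at 5 s/sample), the candidate list, and nSim are assumptions of the sketch; fitdist's built-in 'Stable' fitter (which requires a recent Statistics and Machine Learning Toolbox) stands in for the estimator of Appendix A, and the Monte Carlo loop only plays the role of the simulation-based correction of Appendix B.

    % Fit each candidate distribution to one traffic window and run a KS test
    % (the Poisson case is analogous and omitted here for brevity).
    candidates = {'Stable', 'Gamma', 'Normal'};
    accepted = false(size(candidates));
    for k = 1:numel(candidates)
        pd = fitdist(x, candidates{k});         % ML fit of the k-th candidate
        h = kstest(x, 'CDF', pd);               % KS test, H0: "x follows pd"
        accepted(k) = (h == 0);                 % H0 not rejected at alpha = 0.05
    end

    % Because pd's parameters were estimated from x itself, nominal KS p-values
    % are biased; a Lilliefors-style Monte Carlo correction re-estimates the
    % null distribution of the KS statistic under the fitted model:
    pd = fitdist(x, 'Stable');
    [~, ~, d] = kstest(x, 'CDF', pd);           % observed KS statistic
    nSim = 1000; d0 = zeros(nSim, 1);
    for s = 1:nSim
        xs = random(pd, numel(x), 1);           % resample under the fitted model
        [~, ~, d0(s)] = kstest(xs, 'CDF', fitdist(xs, 'Stable'));
    end
    pCorrected = mean(d0 >= d);                 % simulation-corrected p-value

The largest acceptance rate across candidates then designates the best marginal model for that window, which is the criterion used in Fig. 5.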
6.2 Classifier Performance

Once the model for traffic marginals has been validated, we show performance results for our classification method. To this end, it is common practice to use graphs that show the detection probability (PD) as a function of the false-alarm probability (PFA). These graphs are called ROC curves; they are strictly increasing, and always lie within the range from (0,0) to (1,1). The larger the area under these curves (AUC), the better performance the considered classifier has. The ideal ROC curve would be that which included the point (0,1), indicating perfect detection capabilities and zero false-alarm probability, whereas the worst case is a straight line from (0,0) to (1,1), indicating no gain compared to a purely random classifier. ROC curves for our method are obtained by varying the detection threshold of (3) over a logarithmically spaced set of values from 1 to +∞. A good reference on ROC curves is [39].

In order to avoid overlap between training and test data sets, we use a procedure based on the leave-one-out strategy [47], i.e., we cycle through all available patterns in such a way that, in each round, all patterns except one are used as the training data set and the remaining pattern is used as the test set. In our method, however, anomalous patterns are generated from normal ones, so some correlation exists between these two. To prevent this correlation from affecting the tests, we remove the test pattern and its corresponding pair (instead of just the test pattern) from the training data set, in what may be called a "leave-two-out" strategy.

In [1], ROC curves are used to measure the performance of the anomaly detection method there presented, although only flood anomalies are tested. Since anomalous data used in that paper are not publicly available, we implemented that method—with some modifications to adapt it to our scenario—and compared its results to ours. In order to carry out a fair comparison, we briefly describe here our exact implementation of [1]:

1. For all collected traffic windows belonging to a particular combination of port, hour, and weekday, repeat steps 2 to 8.
2. Prepare three consecutive traffic windows. The first one is the reference window; the second and third ones act as normal and anomalous traffic windows.
3. Inject a synthetic flood anomaly of a given intensity into the third window.
4. For time resolutions 5, 10, 20, 40, and 80 seconds per sample, repeat step 5.
5. Estimate the shape parameter of three Gamma distributions, one fitted to each traffic window, and compute the following quadratic distances: a) reference to normal windows and b) reference to anomalous windows.
6. Calculate mean quadratic distances over all time resolutions.
7. For a sufficiently dense, logarithmically spaced set of thresholds from 0 to +∞, repeat step 8.
8. Accumulate the number of false positives/negatives for each threshold.
9. Calculate false positive/negative ratios.
10. Plot the ROC curve and calculate its AUC.

There are three differences between the described algorithm and the original found in [1]: First, the original algorithm uses 10 logarithmically spaced time resolutions from 1 to 1,024 ms. Our data, however, are sampled at 5-second intervals, so we have no other option than making the multiresolution calculations at a larger scale. Second, the authors do not elaborate on how to choose an appropriate reference window, but just indicate that the window preceding the injection of network attacks was used; therefore, we choose the window preceding normal and anomalous ones as reference traffic. Third, [1] does not consider different combinations of port, hour, and weekday, but we do so here so as to fairly compare classification performance between both approaches. Step 5 deserves further explanation as well. In the original paper, the authors state that Mean Quadratic Distances (MQDs) are to be independently calculated for both parameters of the Gamma distribution. However, they then observe that the scale parameter does not alter in the presence of flood anomalies and discard it, so we do not calculate it while reproducing their results (a minimal sketch of this multiresolution computation is given below). For the case of flash-crowd anomalies, the authors do not see any particular change in traffic marginals and prefer to use the correlation model to detect them.

For space reasons, we cannot present all generated ROC curves. Instead, we present some of them in Figs. 6 and 7. Each plot shows three ROC curves: one for our method (A), the second for the method in [1] (B), and the third (Ref) obtained by applying a logistic regression classifier [48] to the α-stable parameter space (implementation from the MATLAB [49] statistics toolbox, used as a reference). These reference AUCs give information for the assessment of the GLRT classification capabilities when compared to a simple classification method. In this regard, note that in some cases the ROC for the logistic regression classifier cannot be correctly estimated from the sample since its results are close to random (see below for an explanation). In these cases, no reference curve is shown.

ROC curves for each method are shown as 95 percent confidence intervals (estimated from the sample, since there is no evidence that the obtained ROC points follow any known distribution). We also summarize the median AUC results for both methods in Table 2. To assess whether there is any significant performance difference between both methods, we generate anomalies at 10, 25, 50, and 100 percent intensity, relative to the mean amount of traffic for each window, so that classification performance can be measured for increasingly easy-to-detect anomalies (note that anomalous traffic is generated at the training stage by adding pure anomalies of random intensities, instead of fixed ones). Then, we pick pairs of AUCs—one for our method, the other for [1]—for every weekday, at 00:00, 06:00, 12:00, and 18:00, for a total of 7 × 4 = 28 AUC pairs per anomaly type and intensity. Then, we use the Mann-Whitney U test [50] to search for statistical significance at a level α = 0.05. Note that 28 samples are very few to assume Gaussian convergence, and also that AUCs are defined in the interval [0.5, 1], so a Student's t test should be discarded in favor of a nonparametric test [50].
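The following minimal MATLAB sketch renders steps 4 to 6 under our reading of [1]; the window vectors ref and tst, and the aggregation by summation to coarser resolutions, are assumptions of this sketch rather than details taken from [1].

    % 'ref' and 'tst' are column vectors holding a reference and a test window
    % at the base resolution of 5 seconds per sample.
    factors = [1 2 4 8 16];              % i.e., 5, 10, 20, 40, and 80 s/sample
    qd = zeros(size(factors));
    for k = 1:numel(factors)
        m = factors(k);
        n = m * floor(numel(ref) / m);   % trim so the length divides evenly
        r = sum(reshape(ref(1:n), m, []), 1)';   % aggregate to coarser scale
        n = m * floor(numel(tst) / m);
        t = sum(reshape(tst(1:n), m, []), 1)';
        pr = fitdist(r, 'Gamma');        % ML Gamma fits (shape pr.a, scale pr.b)
        pt = fitdist(t, 'Gamma');
        qd(k) = (pr.a - pt.a)^2;         % quadratic distance on the shape only
    end
    mqd = mean(qd);                      % mean quadratic distance (step 6)

Steps 7 to 10 then sweep a threshold over the mqd scores of normal and anomalous windows, accumulating false positives/negatives to trace the ROC curve.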
Fig. 6. ROC curves for flood anomalies, of our method (solid lines, marked as "A") and the method described in [1] (dashed lines, marked as "B"). Dotted lines (marked as "Ref") show results from a simple logistic regression classifier applied to the α-stable parameter space (only shown when results differ from random classification). Curves are shown as 95 percent confidence intervals (estimated from the sample). Larger areas under the curves indicate better classification performance. For space reasons, only inputs from routers 1 and 2 at Monday 12:00 are shown, with anomaly intensities of 10, 25, 50, and 100 percent.
Also recall that, contrary to the previous section, we are now interested in rejecting the null hypothesis, so we look for p-values below the significance level of the test.

It should be noted again that the authors of [1] rule out the use of their method to detect flash-crowd anomalies based solely on traffic marginals (they investigate the viability of detecting them by studying correlations, although no quantitative analysis is performed). Nevertheless, it is interesting to compare the performance of both methods for this case as well. Thus, we include it here for the sake of completeness and as an orientation of a minimum acceptable classification rate for our method. Note also that, despite the authors' findings, there are some figures for their method that are clearly better than blind classification.

As Table 2 shows, our method presents a net gain over [1] in all but three cases, in which both methods yield statistically indistinguishable figures. These cases, which give fairly poor classification rates, correspond to relatively low-intensity flood anomalies in port 1 of router 1. Since this port is the one connecting the whole university to the external world, it is by far the most loaded traffic port in our data sets. α-stable marginal fits for this port show central values around 40-70 Mbps and a fairly large sample deviation around them, which makes it hard for any classifier to detect subtle anomalous patterns, such as 10-25 percent flood anomalies.
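To make the per-cell comparisons behind Table 2 concrete, each significance test reduces to a few lines of MATLAB; aucA and aucB are hypothetical vectors holding the 28 AUCs (7 weekdays × 4 hours) obtained by our method and by the method in [1] for a given anomaly type and intensity.

    p = ranksum(aucA, aucB);             % Mann-Whitney U (Wilcoxon rank sum) test
    significant = p < 0.05;              % reject "equal medians" at alpha = 0.05
    medianAUCs = [median(aucA), median(aucB)];   % the values reported in Table 2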
Fig. 7. ROC curves for flash-crowd anomalies, of our method (solid lines, marked as "A") and the method described in [1] (dashed lines, marked as "B"). Dotted lines (marked as "Ref") show results from a simple logistic regression classifier applied to the α-stable parameter space (only shown when results differ from random classification). Curves are shown as 95 percent confidence intervals (estimated from the sample). Larger areas under the curves indicate better classification performance. For space reasons, only inputs from routers 1 and 2 at Monday 12:00 are shown, with anomaly intensities of 10, 25, 50, and 100 percent.
TABLE 2
Median Areas under ROC Curves for (A) Our Method, (B) That Described in [1], and (Ref) a Simple Logistic Regression Classifier Applied to the α-Stable Parameters
For methods A and B: underlined values are significantly larger, with the corresponding p-value, as yielded by the Mann-Whitney U test. Both values are underlined if no significant difference is found. For methods A and Reference: * means no significant difference between GLRT and logistic regression classifiers; † means the reference AUC is significantly larger than the corresponding GLRT AUC; "—" means results are indistinguishable from random classifying. Unmarked reference AUC values are significantly smaller than the corresponding GLRT AUC. Each test is based on 28 AUCs corresponding to every weekday at 00:00, 06:00, 12:00, and 18:00. R = Router; P = Port; D = Direction.
A deeper inspection of Table 2 allows us to draw some conclusions, for which we give an explanation in the following paragraphs:

1. Figures for both methods tend to be better for router 2 than for router 1.
2. Our method seems to yield better classification results for flash-crowd anomalies than for the flood type.
3. Some p-values indicate a clear difference between both methods even though median AUCs are very similar.
4. p-values tend to decrease as anomaly intensities increase.
5. Reference AUCs are highly variable, sometimes giving classification results comparable to (or even slightly better than) the GLRT classifier, and in some other cases giving virtually random results.

The first conclusion is explained the same way as the previously mentioned cases with nonsignificant p-values. Router 2 carries around one order of magnitude less traffic than router 1. This affects centrality and scatter parameters (both for Gamma and α-stable distributions) in such a way that differences to any reference window tend to be more easily detected.

The explanation for the second conclusion can be found in the α-stable parameter space: flood anomalies are detected essentially as an abnormal centrality value and a slight variation in the scatter parameter (as Fig. 3a may anticipate), while the other two parameters (shape and skewness) remain mostly unaltered. But flash-crowd patterns, being much more noisy and skewed (see Fig. 3b), do alter the shape and skewness of the marginals. Therefore, α-stable distributions make use of their full potential when classifying flash-crowd anomalies, which is not the case for the flood type.

Regarding the third and fourth conclusions, related to the behavior of p-values, both are explained by looking at box-and-whiskers plots for AUCs, an example of which can be seen in Fig. 8. Generally speaking, AUCs for our method tend to concentrate around their median faster than for the method in [1] as anomaly intensities increase. This can result in AUCs that look similar in median, but differ significantly when considering their complete distribution.

About the fifth conclusion, the logistic regression classifier works well when decision regions are disjoint in the α-stable parameter space, i.e., when there is a clear boundary between normal and anomalous patterns, such as in the case depicted in Fig. 9a. This happens especially for flood anomalies which, as stated before, essentially tend to alter the centrality parameter. In other cases, such as in Fig. 9b, decision boundaries are not easily found, causing classification results to be indistinguishable from random.

As a final conclusion, recall that our results are based exclusively on traffic marginals, and no other measurements or heuristics were needed to achieve them.

Fig. 8. An example of box-and-whiskers plots of AUC distribution for (A) our method and (B) that reported in [1]. On the left, a 10 percent flood anomaly at router 2. On the right, a 100 percent flood anomaly at router 1. Both plots are based on 28 AUCs corresponding to every weekday at 00:00, 06:00, 12:00, and 18:00.

Fig. 9. Examples of decision boundaries obtained with a logistic regression classifier (a 2D projection of the 4D α-stable parameter space is shown). On the left, a clear boundary may be drawn between normal and flood traffic. On the right, the decision boundary causes almost random classification outcome between normal and flash-crowd traffic.
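As an illustration of the reference classifier discussed in the fifth conclusion (and depicted in Fig. 9), a minimal MATLAB sketch follows; params and labels are hypothetical variable names, and fitglm stands in for whatever logistic regression routine of the statistics toolbox was actually used in the experiments.

    % 'params' is an N-by-4 matrix of fitted alpha-stable parameters (one row
    % per traffic window); 'labels' is N-by-1, 0 for normal and 1 for anomalous.
    mdl = fitglm(params, labels, 'Distribution', 'binomial');  % logistic regression
    scores = predict(mdl, params);       % posterior probability of the anomaly class
    [pfa, pd, ~, auc] = perfcurve(labels, scores, 1);  % ROC points and its AUC

When the normal and anomalous clouds overlap in parameter space (as in Fig. 9b), the fitted linear boundary separates almost nothing and the resulting AUC stays near 0.5, which is why some reference curves are omitted from the plots.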
Working exclusively on traffic marginals results in added simplicity to the goal of detecting anomalies, both from a theoretical perspective and from a network administrator's point of view.

7 CONCLUSIONS AND FURTHER WORK

In this paper, an anomaly detection method based on statistical inference and an α-stable first-order model has been studied. We follow a four-stage approach to describe each of the pieces from which to build a functional detection system (data acquisition, data analysis, inference, and validation), yielding the final classification results shown in Section 6.2. Our approach uses aggregated traffic as opposed to packet-level sampling, so dedicated hardware is not needed.

In the data analysis stage, we propose to use α-stable distributions as a model for traffic marginals. Since these distributions are able to adapt to highly variable data, and due to the fact that they are the limiting distribution of the generalized central limit theorem, they are a natural candidate to use in modeling aggregated network traffic. In this regard, we give statistical evidence that unrestricted α-stable distributions pose an adequate marginal model for real traffic in our scenario—as long as a local-stationarity hypothesis holds—and compare our α-stable fits to other marginal models which have been used elsewhere.

In the inference stage, we propose a novel strategy for choosing reference traffic windows, so that an expert intervention is not needed. We do not make any assumption that immediate past traffic is necessarily normal, either. To this end, we make use of a whole set of reference windows for all combinations of port, hour, and weekday to better reflect reality, and use them to distinguish normal from anomalous traffic. Synthetic anomalies are used to generate anomalous traffic windows. As for the classification itself, instead of applying a "sufficiently far from normal" threshold to raise an alarm, we propose the use of a GLRT to assess the similarity of a particular traffic trace to reference normal and anomalous traffic windows. Then, a conveniently bounded abnormality index is calculated, in order to let network administrators decide on the importance of a particular anomaly, should they wish to, instead of raising a binary yes/no alarm.

A topic not covered in this paper relates to self-adaptation of reference traffic windows (normal and abnormal) to newly seen real traffic. Since network traffic tends to change over time, it may be desirable that training traffic is periodically updated to fit new circumstances as time passes by. Apart from the extra computation time, nothing prevents the proposed method from being fed with new traffic windows and from periodically recalculating reference windows. This issue has been addressed in other works, such as [51], [52]. However, allowing the detection system to adapt itself to new traffic has other implications, such as rendering it vulnerable to low-rate attacks [53], in which attackers inject abnormal traffic into the network slowly and increasingly, a fact that would eventually lead the system to incorrectly recognize abnormal traffic as normal.

Statistical tests of our model show that α-stable distributions outperform other first-order models used in anomaly detection regardless of the window length. In testing the performance of our method, we show classification rates, in the form of areas under ROC curves, for two anomaly types commonly found in anomaly detection literature, namely flood and flash-crowds. Then, we compare our figures to those obtained with the state-of-the-art method reported in [1] (with some modifications to adapt it to our scenario). Table 2 shows a net gain for our method in all but a few cases, in which results for both approaches are statistically indistinguishable. Traffic data used in our tests come from the core router of our university and from one of its schools, which should be representative of heavily and lightly loaded networks.

Even though validation data for the inference stage come from networks in our University, we do not make any assumption on the nature of the data, apart from the fact that it is locally stationary in periods of 30 minutes. This way, our results should extrapolate to other networks as long as the local stationarity restriction holds. Otherwise, network managers may have to adjust the traffic window length (W) and the sampling period (t) to fit their particular needs. W and t may need further adjustment so that sampled routers are not overloaded and sufficiently large traffic windows are fed to the analysis stage. Also, our approach has been validated with flood and flash-crowd anomalies, but other anomalies should be detectable with our method as well, provided that they influence the first-order distribution of aggregated traffic.

Despite the mentioned contributions, our approach still has some drawbacks. α-stable distributions outperform other statistical distributions used elsewhere in anomaly detection, but at the cost of a higher computational load. Although this extra calculation time should not be an issue on current hardware, classical models will always be easier and faster to use. On the other hand, a set of reference traffic windows combined with synthetic anomalies reduces the need for a network manager's intervention, but a sufficiently large traffic data collection, able to represent normal traffic at any moment, must be available prior to deploying our detection method. For all experiments in Section 6, a recent, average PC was used as the detection machine, and a two-year-old, 16-way server was used to calculate α-stable parameters of all collected traffic windows in the training stage. The runtime to get both flood and flash-crowd abnormality scores for the current traffic window in the detection machine was roughly five seconds. The offline training stage runtime to preprocess all traffic windows needed to obtain the results shown in Table 2 (about one year of 30-minute windows, two anomaly types, two routers, and two directions, for an approximate total of 150,000 windows) was roughly 12 hours in the server. All our algorithms have been implemented in MATLAB [49], so optimized implementations in a compiled language may dramatically lower this figure.

Further work in this subject falls in three main areas. First, the proposed model for network traffic is not complete, since we model traffic marginals only. As recent literature shows, traffic correlations should be taken into account if new insights on traffic nature are to be found, so clearly this is open ground for a deeper study in the data analysis stage. Although the inclusion of a time evolution model adds
complexity and computational load to traffic analysis, the long-range dependence property may provide additional information which could prove useful for the inference stage. Also, the GLRT has been chosen in the inference stage since, being a parametric classifier, it is able to take advantage of the traffic model robustness, as well as for its asymptotic UMP characteristics. Nevertheless, other classifiers may prove able to yield better results or reduced calculation times. Second, our method and that reported in [1] differ in quite a few areas, so there is the open question of exactly how much every difference contributes to the final results. Some of these differences are difficult to measure (e.g., the robustness of a set of normal and anomalous reference windows for all port, hour, and weekday combinations versus the use of a single, normal traffic reference window) but, still, studying the contribution of every single variable seems necessary to further enhance performance figures. And third, the methods proposed in this paper have been tested in laboratory conditions, so more testing in production environments shall be carried out, so as to further understand network managers' needs.

ACKNOWLEDGMENTS

The authors want to thank José Andrés González-Fermoselle and Carlos Alonso-Gómez, network managers of the University of Valladolid campus and the School of Telecommunications Engineering, respectively, for their patience and invaluable support in accessing and sampling traffic data at routers 1 and 2, respectively. The authors would also like to thank Dr. Antonio Tristán-Vega for helpful discussions about α-stable distributions. This work has been partially funded by the Spanish Ministry of Science and Innovation (TIN2008-03023), the Spanish Ministry of Education and Culture—European Regional Development Fund (TEC2007-67073/TCM), the Autonomous Government of Castilla y León, Spain (VA106A08, SAN126/VA33/09 and VA0339A10-2), and the Regional Health Ministry of Castilla y León, Spain (GRS 292/A/08 and GRS 388/A/09).

REFERENCES

[1] A. Scherrer, N. Larrieu, P. Owezarski, P. Borgnat, and P. Abry, "Non-Gaussian and Long Memory Statistical Characterizations for Internet Traffic with Anomalies," IEEE Trans. Dependable and Secure Computing, vol. 4, no. 1, pp. 56-70, Jan. 2007.
[2] M. Thottan and C. Ji, "Anomaly Detection in IP Networks," IEEE Trans. Signal Processing, vol. 51, no. 8, pp. 2191-2204, Aug. 2003.
[3] C. Manikopoulos and S. Papavassiliou, "Network Intrusion and Fault Detection: A Statistical Anomaly Approach," IEEE Comm. Magazine, vol. 40, no. 10, pp. 76-82, Oct. 2002.
[4] Y. Gu, A. McCallum, and D. Towsley, "Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation," Proc. Internet Measurement Conf., Oct. 2005.
[5] A. Lakhina, M. Crovella, and C. Diot, "Diagnosing Network-Wide Traffic Anomalies," Proc. ACM SIGCOMM '04, pp. 219-230, Aug. 2004.
[6] P. Barford, J. Kline, D. Plonka, and A. Ron, "A Signal Analysis of Network Traffic Anomalies," Proc. Second ACM SIGCOMM Workshop Internet Measurement, pp. 71-82, Nov. 2002.
[7] A. Ray, "Symbolic Dynamic Analysis of Complex Systems for Anomaly Detection," Signal Processing, vol. 84, no. 7, pp. 1115-1130, 2004.
[8] S.C. Chin, A. Ray, and V. Rajagopalan, "Symbolic Time Series Analysis for Anomaly Detection: A Comparative Evaluation," Signal Processing, vol. 85, no. 9, pp. 1859-1868, 2005.
[9] A. Wagner and B. Plattner, "Entropy Based Worm and Anomaly Detection in Fast IP Networks," Proc. 14th IEEE Int'l Workshops Enabling Technologies: Infrastructures for Collaborative Enterprises, pp. 172-177, June 2005.
[10] M. Ramadas, S. Ostermann, and B. Tjaden, "Detecting Anomalous Network Traffic with Self-Organizing Maps," Proc. Sixth Int'l Symp. Recent Advances in Intrusion Detection, pp. 36-54, 2003.
[11] S.T. Sarasamma, Q.A. Zhu, and J. Huff, "Hierarchical Kohonen Net for Anomaly Detection in Network Security," IEEE Trans. Systems, Man and Cybernetics, Part B: Cybernetics, vol. 35, no. 2, pp. 302-312, Apr. 2005.
[12] V. Alarcon-Aquino and J.A. Barria, "Anomaly Detection in Communication Networks Using Wavelets," IEE Proc.—Comm., vol. 148, no. 6, pp. 355-362, Dec. 2001.
[13] L. Kleinrock, Queueing Systems, Volume 2: Computer Applications. John Wiley and Sons, 1976.
[14] W. Willinger, M.S. Taqqu, R. Sherman, and D.V. Wilson, "Self-Similarity through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level," IEEE/ACM Trans. Networking, vol. 5, no. 1, pp. 71-86, Feb. 1997.
[15] G. Samorodnitsky and M.S. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, 1994.
[16] F. Simmross-Wattenberg, A. Tristán-Vega, P. Casaseca-de-la Higuera, J.I. Asensio-Pérez, M. Martín-Fernández, Y.A. Dimitriadis, and C. Alberola-López, "Modelling Network Traffic as α-Stable Stochastic Processes: An Approach Towards Anomaly Detection," Proc. VII Jornadas de Ingeniería Telemática (JITEL), pp. 25-32, Sept. 2008.
[17] G.R. Arce, Nonlinear Signal Processing: A Statistical Approach. John Wiley and Sons, 2005.
[18] J. Jiang and S. Papavassiliou, "Detecting Network Attacks in the Internet via Statistical Network Traffic Normality Prediction," J. Network and Systems Management, vol. 12, no. 1, pp. 51-72, Mar. 2004.
[19] W. Yan, E. Hou, and N. Ansari, "Anomaly Detection and Traffic Shaping under Self-Similar Aggregated Traffic in Optical Switched Networks," Proc. Int'l Conf. Comm. Technology (ICCT '03), vol. 1, pp. 378-381, Apr. 2003.
[20] J. Brutlag, "Aberrant Behavior Detection in Time Series for Network Monitoring," Proc. USENIX 14th System Administration Conf. (LISA), pp. 139-146, Dec. 2000.
[21] V. Paxson and S. Floyd, "Wide Area Traffic: The Failure of Poisson Modelling," IEEE/ACM Trans. Networking, vol. 3, no. 3, pp. 226-244, June 1995.
[22] Internet Traffic Archive, http://ita.ee.lbl.gov/, 2011.
[23] Waikato Internet Traffic Storage, http://wand.cs.waikato.ac.nz/wits/, 2011.
[24] Cooperative Assoc. for Internet Data Analysis, http://www.caida.org/, 2011.
[25] DiRT Group's Home Page, Univ. of North Carolina, http://www-dirt.cs.unc.edu/ts/, 2010.
[26] "Metrology for Security and Quality of Service," http://www.laas.fr/METROSEC/, 2011.
[27] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, "Sketch-Based Change Detection: Methods, Evaluation, and Applications," Proc. Internet Measurement Conf. (IMC), pp. 234-247, Oct. 2003.
[28] DDoSVax, http://www.tik.ee.ethz.ch/ddosvax/, 2010.
[29] S. Stolfo et al., "The Third International Knowledge Discovery and Data Mining Tools Competition," http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 2011.
[30] G. Cormode and S. Muthukrishnan, "What's New: Finding Significant Differences in Network Data Streams," IEEE/ACM Trans. Networking, vol. 13, no. 6, pp. 1219-1232, Dec. 2005.
[31] Cisco Systems, "Cisco IOS NetFlow," http://www.cisco.com/web/go/netflow, 2011.
[32] A. Papoulis, Probability, Random Variables, and Stochastic Processes, third ed., McGraw-Hill, 1991.
[33] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, "On the Self-Similar Nature of Ethernet Traffic (Extended Version)," IEEE/ACM Trans. Networking, vol. 2, no. 1, pp. 1-15, Feb. 1994.
[34] P. Embrechts and M. Maejima, Selfsimilar Processes. Princeton Univ. Press, 2002.
[35] Lévy Processes: Theory and Applications, O.E. Barndorff-Nielsen, T. Mikosch, and S.I. Resnick, eds., Birkhäuser, 2001.
[36] J.R. Gallardo, D. Makrakis, and L. Orozco-Barbosa, "Use of α-Stable Self-Similar Stochastic Processes for Modelling Traffic in Broadband Networks," Performance Evaluation, vol. 40, pp. 71-98, 2000.
[37] A. Karasaridis and D. Hatzinakos, "Network Heavy Traffic Modeling Using α-Stable Self-Similar Processes," IEEE Trans. Comm., vol. 49, no. 7, pp. 1203-1214, July 2001.
[38] T. Mikosch, S. Resnick, H. Rootzén, and A. Stegeman, "Is Network Traffic Approximated by Stable Lévy Motion or Fractional Brownian Motion?" The Annals of Applied Probability, vol. 12, no. 1, pp. 23-68, 2002.
[39] S.M. Kay, Fundamentals of Statistical Signal Processing, Volume 2: Detection Theory. Prentice Hall, 1998.
[40] Iperf, http://iperf.sourceforge.net/, 2011.
[41] "Apache JMeter," The Apache Jakarta Project, Apache Software Foundation, http://jakarta.apache.org/jmeter/, 2011.
[42] Z. Liu, N. Niclausse, and C. Jalpa-Villanueva, "Traffic Model and Performance Evaluation of Web Servers," Performance Evaluation, vol. 46, nos. 2-3, pp. 77-100, 2001.
[43] M.A. Stephens, "EDF Statistics for Goodness of Fit and Some Comparisons," J. Am. Statistical Assoc., vol. 69, no. 347, pp. 730-737, 1974.
[44] M.S. Weiss, "Modification of the Kolmogorov-Smirnov Statistic for Use with Correlated Data," J. Am. Statistical Assoc., vol. 73, no. 364, pp. 872-875, 1978.
[45] R.S. Deo, "On Estimation and Testing Goodness of Fit for m-Dependent Stable Sequences," J. Econometrics, vol. 99, pp. 349-372, 2000.
[46] L.J. Glesser and D.S. Moore, "The Effect of Dependence on Chi-Squared and Empiric Distribution Tests of Fit," The Annals of Statistics, vol. 11, no. 4, pp. 1100-1108, 1983.
[47] A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical Pattern Recognition: A Review," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000.
[48] S.J. Press and S. Wilson, "Choosing between Logistic Regression and Discriminant Analysis," J. Am. Statistical Assoc., vol. 73, no. 364, pp. 699-705, 1978.
[49] "MATLAB—The Language of Technical Computing," MathWorks, Inc., http://www.mathworks.com/products/matlab/, 2011.
[50] B. Rosner, Fundamentals of Biostatistics. Duxbury Thomson Learning, 2000.
[51] A. Stavrou, G.F. Cretu-Ciocarlie, M.E. Locasto, and S.J. Stolfo, "Keep Your Friends Close: The Necessity for Updating an Anomaly Sensor with Legitimate Environment Changes," Proc. ACM/CSS Workshop Security and Artificial Intelligence (AISec), 2009.
[52] G.F. Cretu-Ciocarlie, A. Stavrou, M.E. Locasto, and S.J. Stolfo, "Adaptive Anomaly Detection via Self-Calibration and Dynamic Updating," Proc. 12th Int'l Symp. Recent Advances in Intrusion Detection (RAID), Sept. 2009.
[53] G. Maciá-Fernández, J. Díaz-Verdejo, and P. García-Teodoro, "Evaluation of a Low-Rate DoS Attack against Application Servers," Computers and Security, vol. 27, pp. 335-354, 2008.

Federico Simmross-Wattenberg received the BS and MS degrees in computer science and the PhD degree from the University of Valladolid in 1999, 2001, and 2009, respectively. He is currently a lecturer of Telematics Engineering at the University of Valladolid. In 2001, he joined the Laboratory of Image Processing (LPI) as a researcher, and later the Intelligent & Cooperative Systems Research Group (GSIC), where he has since contributed to various research projects. He has also worked for several years as a network administrator at the University of Valladolid. His current research interests include network traffic analysis, anomaly detection, and statistical models for signal processing.

Juan Ignacio Asensio-Pérez received the MSc and PhD degrees in telecommunications engineering from the University of Valladolid, Spain, in 1995 and 2000, respectively. He is currently an associate professor in the Department of Signal Theory, Communications and Telematics Engineering, University of Valladolid. His research interests include teletraffic engineering and technology-enhanced learning.

Pablo Casaseca-de-la-Higuera received the Ingeniero de Telecomunicación and PhD degrees from the University of Valladolid, Spain, in 2000 and 2008, respectively. He is currently an assistant professor at the ETSI Telecomunicación of the University of Valladolid, where he performs his research within the Laboratory of Image Processing (LPI). From December 2000 to November 2003, he worked as a design engineer for Alcatel Espacio S.A., where he contributed to several space programs including the European satellite navigation project, Galileo. His activities there were all related to digital signal processing and Radio Frequency (RF) design for the Telemetry, Tracking and Command (TTC) subsystem. After this period, he joined the LPI with a research fellowship which finished in October 2005, when his academic activities began. His research interests are statistical modeling and nonlinear methods for biomedical signal and image processing, and network traffic analysis.

Marcos Martín-Fernández received the Ingeniero de Telecomunicación and PhD degrees from the University of Valladolid, Spain, in 1995 and 2002, respectively. He is an associate professor at the ETSI Telecomunicación, University of Valladolid. From March 2004 to March 2005, he was a visiting assistant professor of Radiology at the Laboratory of Mathematics in Imaging (Surgical Planning Laboratory, Harvard Medical School, Boston, Massachusetts). His research interests are statistical and mathematical methods for image and signal processing. He is with the Laboratory of Image Processing (LPI) at the University of Valladolid, where he is currently performing his research. He was granted a Fulbright fellowship during his visit at Harvard. He is a reviewer of several international journals and a member of several international scientific committees. He has contributed more than 100 scientific publications.

Ioannis A. Dimitriadis received the BS degree in telecommunications engineering from the National Technical University of Athens, Greece, in 1981, the MS degree in telecommunications engineering from the University of Virginia, Charlottesville, in 1983, and the PhD degree in telecommunications engineering from the University of Valladolid, Spain, in 1992. He is currently a full professor of telematics engineering at the University of Valladolid. His research interests include technological support to learning and work processes, computer networks, as well as machine learning. He is a senior member of the IEEE, and a member of the IEEE Computer Society and the Association for Computing Machinery.

Carlos Alberola-López received the Ingeniero de Telecomunicación and PhD degrees from the Polytechnic University of Madrid, Spain, in 1992 and 1996, respectively. He is a professor at the ETSI Telecomunicación of the University of Valladolid, Spain. In 1997, he was a visiting scientist at the Thayer School of Engineering, Dartmouth College, New Hampshire. His research interests are statistical and fuzzy methods for signal and image processing applications. He is head of the Laboratory of Image Processing (LPI) at the University of Valladolid. He is a reviewer of several scientific journals and a consultant of the Spanish Government for the evaluation of research proposals.