
Detecting DGA Malware Traffic Through

Behavioral Models
María José Erquiaga∗, Carlos Catania∗ and Sebastián García†
∗ ITIC, FCEN, UNCuyo
Mendoza, Argentina
merquiaga@uncu.edu.ar, ccatania@itu.uncu.edu.ar
† CTU University, ATG Group
Prague, Czech Republic
sebastian.garcia@agents.fel.cvut.cz

Abstract—Some botnets use special algorithms to generate the domain names they need to connect to their command and control servers. These are referred to as Domain Generation Algorithms (DGA). A DGA generates domain names and tries to resolve their IP addresses. If a domain resolves to an IP address, that address is used to connect to the command and control server; otherwise, the DGA generates a new domain and keeps trying. In both cases it is possible to capture and analyze the distinctive behavior shown by those DNS packets in the network. The behavior of Domain Generation Algorithms is difficult to detect automatically because each domain is usually randomly generated and therefore unpredictable. Hence, it is challenging to separate the DNS traffic generated by malware from the DNS traffic generated by normal computers. In this work we analyze the use of behavioral detection approaches based on Markov Models to differentiate Domain Generation Algorithm traffic from normal DNS traffic. The evaluation methodology of our detection models follows a real-time approach based on the use of time windows for reporting alerts. All the detection models have shown a clear differentiation between normal and malicious DNS traffic, and most have also shown a good detection rate. We believe this work is a further step in using behavioral models for network detection, and we hope it facilitates the development of more general and better behavioral detection methods for malware traffic.
Index Terms—DGA, Malware Detection, Botnet Detection, Machine Learning, Network Traffic, Network Flows.

I. INTRODUCTION

Malware is known to use DGA (Domain Generation Algorithms) to maintain its communication channels [1]. A DGA is a resilient method to generate domains that allows the attacker to maintain control of the malware by preventing the defenders from knowing which domain will be used next. Detecting DGA traffic is very important to identify the currently infected computers in the network. Detecting the domain names generated by a DGA is difficult because they are usually randomly generated from letters or common dictionary words. However, we hypothesize that the behavior in the network generated by a DGA is quite different from that of normal DNS traffic. This work proposes to analyze the flows in the network with machine learning methods to differentiate DGA traffic from normal DNS traffic. To accomplish this goal we use the network-based behavioral models of the Stratosphere Project to model the DNS traffic and to run our experiments [2].

There has been a large amount of research trying to detect DGA [3], [4], [5], [6]. The most visible consequence of DGA traffic in the network is that most of the resolved domains do not exist and therefore generate an NXDomain response. This fact was used by the Pleiades system [3] to detect the NXDomain responses from the DNS servers. The system has two modules: DGA Discovery, and DGA Classification and C&C detection. The former uses the X-Means algorithm to cluster traffic according to the NXDomain answers, while the latter uses supervised learning to build models of the new DGA.

Apart from the NXDomain responses, some research has used the timing differences in DGA traffic for detection [4]. This work, called Exposure, uses a decision tree algorithm to separate DGA and normal traffic using the following features: time-based features, DNS-answer-based features, TTL-value-based features and domain-name-based features. Although they used timing properties, they did not use the inter-timing between DNS queries.
Another successful approach has been to analyze the behavior of the DNS traffic in order to find the relationships between the resolved domains. This tactic was used with good results before [5] in a deep analysis of a large capture of DNS traffic. The authors assume that if there is a connection to a malicious domain, it is probable that the host will connect to other malicious domains. They used X-Means clustering with the Bayesian Information Criterion. More recently there has been an advance in detecting DGA traffic using flows [6]. This approach uses machine learning to analyze the relationship between the number of domains resolved and the number of connections made to different IP addresses, obtaining good results. Finally, there are some very innovative detection approaches that can lead to future developments in agent technology by applying Artificial Immune System theories to attack detection [7].

This paper is organized as follows: Section II discusses the detection method used by the Stratosphere Project. Section III explains the training of the detection models. Section IV describes the datasets. Section V describes the experiments. The results are shown in Section VI and discussed in Section VII. Finally, Section VIII presents the conclusions.

II. DETECTION METHOD

The detection method used in this work is based on the technique used by the Stratosphere Project [8]. The Stratosphere Project, to which we are related, is a large effort that involves a collaboration between the universities CTU (Czech Technical University) and UNCuyo (Universidad Nacional de Cuyo) and several NGOs that prefer to remain anonymous. Stratosphere's goal is to create behavioral models of malicious connections in the network by studying the long-term characteristics of the traffic. In this work we apply this idea of behavioral models to the specific problem of DNS DGA traffic detection. This is the first time that the Stratosphere models are used for the DGA problem.

The first step in the Stratosphere Project is to model the behavior of each connection by aggregating the flows according to a 4-tuple composed of: the source IP address, the destination IP address, the destination port and the protocol. All the flows that match a tuple key are aggregated together into a connection. The behavioral model of each connection is then created by applying the following steps to each of its flows:

1) Extract three features of each flow: size, duration and periodicity.
2) Assign to each flow a state letter according to the features extracted and the assignment strategy shown in Table I.
3) Represent all the states (letters) of the connection as a string and store it as part of the behavioral model.

After the assignment, each connection has its own string of letters that represents its behavior in the network. For example, the connection named 10.0.2.103-8.8.8.8-53-udp was assigned the behavioral model 24.R*R.R.R*a*b*a*a*b*b*a*R.R*R.R*a*a*b*a*a*a*a*.

A. Markov Chains Detection Algorithm

We define a Markov Chain as a stochastic process that undergoes transitions from one state to another in a state space, and where the process has the Markov property. This property, usually referred to as memorylessness, means that the probability distribution for going from one state to the next depends only on the current state [9].

The use of Markov Chains to model the behaviors of the connections in the network can be explained as follows. The usage of computer systems by humans and programs can be seen as a sequence of actions when they relate to one specific service. For example, humans and programs access web pages in a sequence of links (Google's PageRank algorithm uses Markov Chains [10]), and humans also type on the keyboard in a sequence of keystrokes, one after the other. This suggests that the packets generated in the network as a result of those actions may also be seen as a sequence. Although the packets inside a flow are related by the network protocol specification, the relationship between the flows that those packets generate is given only by the user performing the actions. This means that the relationship between all the flows in the connection to the same service is highly related to the behavior of the user or program in that service. As explained previously, Stratosphere groups those flows together, and we assume that each flow depends only on the previous one. This is a reasonable approximation based on the analysis of connections in the network. For example, an SSH (Secure Shell) connection generates a new packet each time a key is pressed on the keyboard, making a direct relationship with the written text and showing how each flow may depend on the previous one. The HTTP protocol was also shown to follow these dependencies. Hence, we can study each flow as being generated as the result of a probability distribution for a transition from the previous flow. For this assumption it is very important that all the flows of the same connection are grouped together. Finally, as previously described, three features are computed for each flow to obtain a letter, which is used as the state of the flow. Based on the idea that flows are generated by a stochastic process and that the probability of observing a flow depends only on the previous flow, we can analyze this string of letters together with its probability distributions as a Markov Chain [11].

Using the definition of Markov Chains and the behavioral string of letters previously created, the Stratosphere Project uses both to make detections [12]. Detection is done in three phases. First, a Markov Chain model is created (trained) for each known connection. Creating each Markov Chain model results in a transition matrix and an initialization vector per connection, referred to here as the detection model of the connection. Second, the new unknown incoming traffic to be evaluated is separated into connections, and for each connection we generate the corresponding string of letters. Finally, the method evaluates the probability that each new string of letters was generated by each one of the trained detection models. If the probability that
the new behavioral model was generated by the trained detection model is greater than a certain decision threshold, then an alert is generated; otherwise, no alert is generated. This threshold is important to control the generalization capabilities of the detection method. If the threshold is too small, the detection model will only detect very similar behaviors. If the threshold is too large, a high number of different models will be matched.

TABLE I
ASSIGNMENT MODEL OF BEHAVIORAL LETTERS OF THE STRATOSPHERE PROJECT

                          Size small         Size medium        Size large
                          Short  Med.  Long  Short  Med.  Long  Short  Med.  Long  (duration)
  Strong periodicity      a      b     c     d      e     f     g      h     i
  Weak periodicity        A      B     C     D      E     F     G      H     I
  Weak non-periodicity    r      s     t     u      v     w     x      y     z
  Strong non-periodicity  R      S     T     U      V     W     X      Y     Z
  No data                 1      2     3     4      5     6     7      8     9

  Symbols for the time difference between flows:
  .  between 0 and 5 seconds
  ,  between 5 and 60 seconds
  +  between 60 seconds and 5 minutes
  *  between 5 minutes and 1 hour
  0  more than 1 hour

III. TRAINING THE DECISION THRESHOLD

As mentioned in the previous section, before a model can be used to detect malicious DNS traffic it is necessary to find the proper value for the decision threshold. This process is denoted as training the threshold, and it is based on considering the distances from a given model to the other model classes.

The threshold selection, and the consequent performance of a model, depend on the quality and quantity of the models used for training it. Therefore, it is important to guarantee a good representativeness of all the classes during the training process. The complete training methodology for setting the threshold is described in Algorithm 1.

Algorithm 1 Pseudo code of the algorithm for training the decision threshold
1: for all m in TCV do
2:   for k = 1 to k = 10 do
3:     (TCVt, TCVv) = randomStratifiedSplit(TCV − {m})
4:     t = Train(m, TCVt)
5:     Validate(m, t, TCVv)
6:     tavg = tavg + t
7:   end for
8:   tavg = tavg / 10
9:   if tavg < 1 then
10:    if m is normal then
11:      tfinal = 2.0
12:    end if
13:    if m is DGA then
14:      tfinal = 1.1
15:    end if
16:  else
17:    tfinal = tavg
18:  end if
19: end for

To reach the representativeness goal, the training process follows a methodology supported by the well-known stratified cross-validation approach. In general, given a set of models (denoted as TCV), the threshold t for some specific model m ∈ TCV is generated using two randomly generated sets TCVt ⊂ TCV and TCVv ⊂ TCV (see line 3). The first one is used for calculating (training) t, whereas the second is used for validating the performance of m on some unseen models.

Both sets TCVt and TCVv are complementary (i.e. TCVt ∪ TCVv = TCV − {m}) and, as mentioned in Section IV, they are built ensuring the representativeness of all the classes observed in TCV. The same process is repeated k times using different subsets on each iteration. At the end, in lines 6 and 8, a final threshold tf is calculated as the average of all the previous values of t.

In line 9, the algorithm considers those situations where, for a given model, it is not possible to find a match against any of the compared models present in all the possible instances of TCVt. The default policy for those special cases is to assign a threshold of 2.0 when the model describes normal behavior and 1.1 when the model describes DGA (see lines 10 and 13, respectively).

IV. DATASETS DESCRIPTION

To provide more data representativeness, our detection method is evaluated against several datasets coming from different sources. Such sources include traffic from individual computers, small networks and large university campus networks. All datasets are publicly available as part of the Malware Capture Facility Project (MCFP) [2]. A careful analysis of the datasets revealed five different groups of DNS behaviors that should be differentiated: Group A, which consists of normal traffic; Group B, which consists of DNS traffic generated by a botnet but that is not DGA (these are typically domain resolutions that the botnet uses when trying to send SPAM); Group C, which consists of a first type of DGA; Group D, which consists of a second type of DGA traffic; and finally Group E, which consists of the DNS traffic generated as part of a Fast-Flux botnet. It is very important to note that the DNS traffic is not only separated into Malicious and Normal, but that the different types of malicious traffic are considered given their differences.

Table II provides information about each of these five groups. The first two columns show the label used for referencing the group and a brief description of the behavior included in the group. Columns three and four show the number of pcap datasets included in the group as well as the number of DNS connections in all the pcaps.

For the purpose of evaluation, these groups were also separated into a dataset for training, one for cross-validation and one for testing. Since the datasets have to be representative of all the behaviors, each of the three datasets includes random DNS connections from all the five groups. The testing dataset is composed of one DNS connection from each group, which means that it is composed of five different pcap datasets.
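The threshold-training loop of Algorithm 1 (Section III) can be sketched in runnable form as follows. This is not the Stratosphere implementation: the plain random half split (standing in for the stratified one), the `train`/`validate` callbacks and the label names are our assumptions.

```python
import random

def train_threshold(tcv, train, validate, k=10):
    """Algorithm 1 sketch: compute a decision threshold for every model.

    `tcv` maps a model name to its class label ('normal' or 'dga').
    `train(m, tcv_t)` returns a candidate threshold for model m, and
    `validate(m, t, tcv_v)` checks that threshold on unseen models.
    """
    thresholds = {}
    for name, label in tcv.items():
        rest = [n for n in tcv if n != name]      # TCV - {m}
        t_avg = 0.0
        for _ in range(k):
            random.shuffle(rest)                  # stand-in for the
            half = len(rest) // 2                 # stratified split
            tcv_t, tcv_v = rest[:half], rest[half:]
            t = train(name, tcv_t)
            validate(name, t, tcv_v)
            t_avg += t
        t_avg /= k
        if t_avg < 1:                             # no reliable match found:
            thresholds[name] = 2.0 if label == 'normal' else 1.1
        else:
            thresholds[name] = t_avg
    return thresholds
```

With a `train` callback that measures how far model m lies from the models in `tcv_t`, this reproduces the default thresholds of lines 10 to 14 whenever the averaged value falls below 1.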
TABLE II
GENERAL INFORMATION ABOUT THE DATASETS

  Group  Desc.           Pcap Datasets  Connections
  A      Normal Traffic  4              10
  B      DNS Botnet      3              10
  C      DGA1            1              9
  D      DGA2            5              9
  E      Fast Flux       1              4

TABLE III
TP, FP, TN AND FN ASSIGNMENT

  Training label  Model Response  Testing: Normal  Testing: Botnet
  Botnet          Match           FP               TP
  Normal          Match           TN               FP
  Botnet          Doesn't match   TN               FN
  Normal          Doesn't match   TN               FN
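The assignment of Table III can be expressed as a small helper function (a sketch; the function and label names are ours, with botnet traffic as the positive class):

```python
def error_type(training_label, matched, testing_label):
    """Map one detection outcome to TP/FP/TN/FN following Table III.

    A 'match' means the evaluated connection's string of letters was
    accepted by the trained model; botnet traffic is the positive class.
    """
    if not matched:
        # No match: correct on normal traffic, a miss on botnet traffic.
        return 'TN' if testing_label == 'Normal' else 'FN'
    if training_label == 'Botnet':
        return 'TP' if testing_label == 'Botnet' else 'FP'
    # A Normal model matching botnet traffic counts as a wrong alert.
    return 'TN' if testing_label == 'Normal' else 'FP'
```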

V. EXPERIMENTS SETUP

The performance evaluation of our method was done by training the models on the training dataset, verifying them on the cross-validation dataset and validating them on the testing dataset. Remember that each dataset is composed of several connections from all the five groups of DNS behavior.

The goal of the testing experiments was to analyze the detection potential of the trained models on each of the five groups of DNS behaviors. Our testing methodology uses time windows for the computation of errors. The idea is that alerts should be reported at most after a fixed amount of time, since the analyst cannot wait until the capture is finished. Therefore, we separate the testing datasets into time windows of five minutes, and on each time window we apply our detection method to obtain the amount of errors: TN, TP, FP, FN. It is worth noticing that, since the models can match or not match each of the five behavioral groups, we need a very clear specification of when we consider each error. Table III shows how we interpret the errors FP, TP, FN and TN.

There was one testing experiment per trained model. Each experiment consisted in running the detection algorithm with that trained model on all the five testing datasets. When the method was run on a single testing dataset, we computed the errors on each time window and then summed all the errors for that testing dataset. Then, for each testing dataset we computed the performance metric ratios TNR, TPR, FPR and FNR. Therefore, each trained model has five groups of results, one per testing dataset, that is, one for each of the five groups of DNS behaviors. We did this separation of datasets and experiments because we need to know how each trained behavioral group (represented by each trained model) could detect each testing behavioral group (represented by each testing model and dataset).

VI. EXPERIMENTS RESULTS

After executing the experiments using each training model against all the five datasets (behavioral groups) we can compute and show the results. Consider that there were five testing experiments; each experiment represents how each DNS behavioral group (trained) could detect each DNS behavioral group (testing). On each experiment the performance metrics True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR) and False Negative Rate (FNR) are shown for each testing DNS behavioral group. The experiments allowed us to identify which DNS behavioral models were more likely to detect their own groups or to generalize to other DNS behaviors.

Table IV shows the results of using Group A (Normal) for training and testing on all five groups. The table shows that the Normal group did not mis-detect any botnet DNS traffic, which means that the models of Normal traffic are robust. When the Normal group tried to detect the botnet DNS groups, the results were all FN. This implies the models did not match them, which is a good result.

If we consider Table III, the expected value returned by the system when evaluated against Normal traffic should be TN. Also, if the training label Normal doesn't match the testing label Botnet, an FN value is returned. Therefore, given the values of FNR and TNR, it is possible to affirm that the models in this group are capable of detecting themselves, without incorrectly generalizing to other malicious behaviors.

TABLE IV
PERFORMANCE OF GROUP A (NORMAL) DETECTION MODELS ON ALL THE TESTING DATASETS

  Normal      TPR: -       TNR: 1      FPR: 0      FNR: -
  DNS Bot.    TPR: 0       TNR: -      FPR: -      FNR: 1
  DGA 1       TPR: 0       TNR: -      FPR: -      FNR: 1
  DGA 2       TPR: 0       TNR: -      FPR: -      FNR: 1
  FastFlux    TPR: 0       TNR: -      FPR: -      FNR: 1

Table V shows the results of the trained detection models of Group B (DNS Botnet). In this case, it is possible to observe an FNR value of 1 (100%), which means that this trained model has been incapable of detecting itself. Unexpectedly, this group has been able to detect the behavior of groups D (DGA 2) and E (Fast Flux).

TABLE V
PERFORMANCE OF GROUP B (DNS BOTNET) DETECTION MODELS ON ALL THE TESTING DATASETS

  Normal      TPR: -       TNR: 0.996  FPR: 0.003  FNR: -
  DNS Botnet  TPR: 0       TNR: -      FPR: -      FNR: 1
  DGA 1       TPR: 0       TNR: -      FPR: -      FNR: 1
  DGA 2       TPR: 0.0002  TNR: -      FPR: -      FNR: 0.9
  FastFlux    TPR: 0.0015  TNR: -      FPR: -      FNR: 0.9

Table VI shows the results for the experiment using the detection models from the training dataset C (DGA1). This group is able to detect itself with a TPR of 1 (100%). It has also shown some level of generalization (with a TPR value of 0.041) for detecting the DNS Botnet behavior of Group B.
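The time-window error aggregation described in Section V can be sketched as follows (the function names and the per-window dictionaries are ours; undefined ratios are reported as None, shown as "-" in the result tables):

```python
def summed_errors(windows):
    """Sum the TP/FP/TN/FN counts obtained on each 5-minute time window."""
    totals = {'TP': 0, 'FP': 0, 'TN': 0, 'FN': 0}
    for w in windows:
        for key in totals:
            totals[key] += w.get(key, 0)
    return totals

def rates(totals):
    """Compute TPR, TNR, FPR and FNR from the summed error counts."""
    tp, fp, tn, fn = (totals[k] for k in ('TP', 'FP', 'TN', 'FN'))
    div = lambda a, b: a / b if b else None   # None when undefined
    return {'TPR': div(tp, tp + fn), 'TNR': div(tn, tn + fp),
            'FPR': div(fp, fp + tn), 'FNR': div(fn, fn + tp)}
```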
TABLE VI
PERFORMANCE OF GROUP C (DGA1) DETECTION MODELS ON ALL THE TESTING DATASETS

  Normal      TPR: -       TNR: 0.94   FPR: 0.058  FNR: -
  DNS Botnet  TPR: 0.041   TNR: -      FPR: -      FNR: 0.95
  DGA 1       TPR: 1       TNR: -      FPR: -      FNR: 0
  DGA 2       TPR: 0       TNR: -      FPR: -      FNR: 1
  FastFlux    TPR: 0       TNR: -      FPR: -      FNR: 1

TABLE VII
PERFORMANCE OF GROUP D (DGA2) DETECTION MODELS ON ALL THE TESTING DATASETS

  Normal      TPR: -       TNR: 0.998  FPR: 0.0012  FNR: -
  DNS Botnet  TPR: 0       TNR: -      FPR: -       FNR: 1
  DGA 1       TPR: 0       TNR: -      FPR: -       FNR: 1
  DGA 2       TPR: 0.05    TNR: -      FPR: -       FNR: 0.94
  FastFlux    TPR: 0.003   TNR: -      FPR: -       FNR: 0.99

Table VII shows the results of the detection model from the training dataset D (DGA2). In this experiment, this group has been able to partially detect itself (with a TPR value of 0.05) and also to detect the FastFlux behavior of group E (with a TPR value of 0.003).

Table VIII shows the results for the trained detection model from group E (Fast Flux). The table shows that this group can detect itself and also has some level of generalization, detecting group D (DGA2).

TABLE VIII
PERFORMANCE OF GROUP E (FAST-FLUX) DETECTION MODELS ON ALL THE TESTING DATASETS

  Normal      TPR: -       TNR: 0.93   FPR: 0.07  FNR: -
  DNS Botnet  TPR: 0       TNR: -      FPR: -     FNR: 1
  DGA 1       TPR: 0       TNR: -      FPR: -     FNR: 1
  DGA 2       TPR: 0.0003  TNR: -      FPR: -     FNR: 0.99
  FastFlux    TPR: 0.046   TNR: -      FPR: -     FNR: 0.956

Fig. 1. Detections between groups

VII. EXPERIMENTS DISCUSSION

To summarize, we compared the trained detection models from the five behavioral groups against each one of the models from the testing datasets, which also represent the same five DNS behaviors. With this information it is possible to find out how each DNS behavioral group detects the others. Fig. 1 summarizes this by showing which groups detect themselves and which show some generalization to detect others. Groups E (Fast Flux) and D (DGA2) can detect themselves and generalize to another group. Group B (DNS Botnet) generalizes to two groups of behaviors, but it cannot detect itself. The detection model of Group C (DGA1) was capable of detecting itself and also of generalizing to the Group B (DNS Botnet) behavior. Lastly, Group A (Normal) can detect itself and does not mis-detect other behaviors.

Regarding the discrimination between Normal and Malicious DNS traffic, we can observe two facts. First, when the normal detection models were evaluated against the testing groups, they showed an FNR value of 1 for all the malicious traffic testing models (DNS Botnet, DGA 1, DGA 2 and Fast Flux). According to Table III, such results imply that the trained Normal models have not matched any botnet label. The second fact is that all the malicious detection models tested against the normal testing dataset showed high TN rates. If we consider Table III, this corresponds to the case where the training label Botnet has not matched the testing label Normal. Although there are some FPR values, these are very low. Therefore, it is possible to assume that there is a considerable separation between normal and malware DNS traffic behavior. We believe that the latter is a significant finding, because it means that normal traffic behavior is different from malicious traffic behavior and the system can effectively differentiate them.

Regarding the generalization capabilities of the detection models, with the exception of group B (DNS Botnet), all the detection models from the remaining groups were able to detect themselves and showed some generalization to other groups. Even though such TPR values could seem low at first, these TP translate into alerts triggered by a fully operational detection system. Consequently, in a real-life situation a security analyst would be forced to inspect the malicious traffic and could eventually find the computers infected with malware.

VIII. CONCLUSION

Based on a network behavior modeling approach, our work focuses on the viability of DGA traffic detection using the detection method proposed in the Stratosphere Project. Five groups of DNS traffic (Normal, DNS Botnet, DGA1, DGA2 and Fast Flux) were grouped and tested to verify their detection rate considering a time-window approach.

Certainly, the evaluation of the detection models on different time windows may make the correct detection of DGA behaviors more difficult. However, the main goal of the Stratosphere Project is the development of a real-time application to detect botnet behavior. Consequently, the
methodology proposed in this work to evaluate detection models aims at being as close as possible to a real implementation.

Our detection method, based on the use of Markov Models, has shown that the detection models from the DGA groups were able to generalize to other groups of traffic behavior. In particular, the detection models from group DGA 1 were able to detect themselves and generalize to DNS Botnet. In the case of the detection models from group DGA 2, they were able to detect themselves and generalize to the FastFlux behavior. In addition, non-DGA detection models (DNS Botnet and FastFlux) have recognized behaviors from the DGA 2 group.

Despite these results, the most important finding observed during our experiments was the notable difference between the behavior of the normal and botnet DNS models. During the testing phase, all the detection models from botnet traffic showed a high TNR against the Normal testing model. Also, the detection models from the Normal group returned an FNR value of 100% against the malicious traffic. Such results are an important confirmation of the viability of the use of behavioral models in the detection of DGA traffic.

Finally, one of the limitations we have identified in our work is that, at the moment, we don't have datasets with both normal and botnet traffic for our testing. With such datasets our experiments would be more realistic. Further research is planned in this regard to better test the behavioral detection of DGA traffic.

IX. ACKNOWLEDGMENTS

The authors would like to thank the financial support received from CVUT and UNCuyo during this work; the funding provided by the Faculty Mobility program of the International Relations Department of UNCuyo; the financial support from SeCTyP-UNCuyo through project No. M004; and the financial support of the NLnet Foundation.

APPENDIX A
DATASET OF MALICIOUS AND NORMAL DNS TRAFFIC

The datasets used in this work are very large; therefore their descriptions and labels were published online.

The Normal datasets used were:
• CTU-Normal-4. Normal with two DNS connections. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Normal-4-only-DNS
• CTU-Normal-5. Normal with five DNS connections. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Normal-5/
• CTU-Normal-6. Normal with two DNS connections. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Normal-6-filtered/

The Malware datasets used were:
• CTU-Malware-Capture-Botnet-31-1. Two DNS connections. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-31-1/
• CTU-Malware-Capture-Botnet-91. This dataset is composed of 11 infected hosts and 11 normal hosts in a network. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-91/
• CTU-Malware-Capture-Botnet-69. DGA and fast-flux malware. http://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-69/
• CTU-Malware-Capture-Botnet-25-1. DGA malware. http://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-25-1/
• CTU-Malware-Capture-Botnet-25-2. DGA malware. http://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-25-2/
• CTU-Malware-Capture-Botnet-25-3. DGA malware. http://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-25-3/
• CTU-Malware-Capture-Botnet-25-4. DGA malware. http://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-25-4/
• CTU-Malware-Capture-Botnet-25-5. DGA malware. http://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-25-5/
• CTU-Malware-Capture-Botnet-25-6. DGA malware. http://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-25-6/

REFERENCES

[1] J. Kwon, J. Lee, H. Lee, and A. Perrig, "PsyBoG: A scalable botnet detection method for large-scale DNS traffic," Computer Networks, vol. 97, pp. 48–73, 2016.
[2] S. Garcia, "Stratosphere Project," 2015. [Online]. Available: https://stratosphereips.org
[3] M. Antonakakis and R. Perdisci, "From throw-away traffic to bots: detecting the rise of DGA-based malware," in Proceedings of the 21st USENIX Security Symposium, 2012.
[4] L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, "EXPOSURE: Finding malicious domains using passive DNS analysis," in NDSS, 2011, pp. 1–17.
[5] H. Gao, V. Yegneswaran, Y. Chen, P. Porras, S. Ghosh, J. Jiang, and H. Duan, "An empirical reexamination of global DNS behavior," in Proceedings of the ACM SIGCOMM 2013 Conference, 2013, pp. 267–278.
[6] M. Grill, I. Nikolaev, V. Valeros, and M. Rehak, "Detecting DGA malware using NetFlow," 2015.
[7] C. Wallenta, J. Kim, P. J. Bentley, and S. Hailes, "Detecting interest cache poisoning in sensor networks using an artificial immune algorithm," Applied Intelligence, vol. 32, no. 1, pp. 1–26, 2010.
[8] S. Garcia, "Modelling the network behaviour of malware to block malicious patterns. The Stratosphere Project: a behavioural IPS," in Virus Bulletin, 2015, pp. 1–8.
[9] D. L. Isaacson and R. W. Madsen, Markov Chains, Theory and Applications. New York: Wiley, 1976, vol. 4.
[10] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: bringing order to the web," 1999.
[11] J. G. Kemeny and J. L. Snell, Finite Markov Chains. Princeton, NJ: Van Nostrand, 1960.
[12] S. Garcia, "Identifying, Modeling and Detecting Botnet Behaviors in the Network," Ph.D. dissertation, UNICEN University, 2014.