{marco,chris,vigna}@cs.ucsb.edu
ABSTRACT
user's browser or in the browser's plugins. If successful, the exploit downloads malware onto the victim machine, which, as a consequence, often becomes a member of a botnet.
Several factors have contributed to making drive-by-download
attacks very effective. First, vulnerabilities in web clients are widespread (in 2008, such vulnerabilities constituted almost 15% of the
reports in the CVE repository [18]), and vulnerable web clients
are commonly used (about 45% of Internet users use an outdated
browser [8]). Second, attack techniques to reliably exploit web
client vulnerabilities are well-documented [4, 33-35]. Third, sophisticated tools for automating the process of fingerprinting the user's browser, obfuscating the exploit code, and delivering it to the victim are easily obtainable (e.g., NeoSploit and LuckySploit [15]).
The mix of widespread, vulnerable targets and effective attack
mechanisms has made drive-by downloads the technique of choice
to compromise large numbers of end-user machines. In 2007, Provos et al. [28] found more than three million URLs that launched
drive-by-download attacks. Even more troubling, malicious URLs
are found both on rogue web sites, which are set up explicitly for the purpose of attacking unsuspecting users, and on legitimate web sites, which have been compromised or modified to serve the malicious content (high-profile examples include the Department of Homeland Security and the BusinessWeek news outlet [10, 11]).
A number of approaches have been proposed to detect malicious web pages. Traditional anti-virus tools use static signatures
to match patterns that are commonly found in malicious scripts [2].
Unfortunately, the effectiveness of syntactic signatures is thwarted
by the use of sophisticated obfuscation techniques that often hide
the exploit code contained in malicious pages. Another approach
is based on low-interaction honeyclients, which simulate a regular browser and rely on specifications to match the behavior, rather
than the syntactic features, of malicious scripts (for example, invoking a method of an ActiveX control vulnerable to buffer overflows
with a parameter longer than a certain length) [14, 23]. A problem with low-interaction honeyclients is that they are limited by the
coverage of their specification database; that is, attacks for which a
specification is not available cannot be detected. Finally, the state-of-the-art in malicious JavaScript detection is represented by high-interaction honeyclients. These tools consist of full-featured web
browsers typically running in a virtual machine. They work by
monitoring all modifications to the system environment, such as
files created or deleted, and processes launched [21, 28, 37, 39]. If
any unexpected modification occurs, it is considered the manifestation of an attack, and the corresponding page is flagged as malicious. Unfortunately, high-interaction honeyclients also have limitations. In particular, an attack can be detected only if the vulnerable component (e.g., an ActiveX control or a browser plugin) targeted by the exploit is installed and correctly activated on the detection system.
General Terms
Security
Keywords
Drive-by-download attacks, web client exploits, anomaly detection
1. INTRODUCTION

2. BACKGROUND
3. DETECTION APPROACH
We have seen that sophisticated JavaScript-based malware is difficult to detect and analyze using existing approaches. Thus, there is a need for a novel approach that overcomes these challenges. Such an approach has to be robust to obfuscation techniques, must accurately handle the dynamic features of JavaScript, and should not require reconfiguration when new vulnerabilities are exploited. To this end, our approach relies on comprehensive dynamic analysis and anomaly detection.
3.1 Features
Anomaly detection is based on the hypothesis that malicious activity manifests itself through anomalous system events [5]. Anomaly detection systems monitor events occurring in the system under
analysis. For each event, a number of features are extracted. During a learning phase, normal feature values are learned, using
one or more models. After this initial phase, the system is switched
to detection mode. In this mode, the feature values of occurring
events are assessed with respect to the trained models. Events that
are too distant from the established models of normality are flagged
as malicious.
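The learning/detection cycle just described can be sketched as follows. This is a minimal illustration with a stand-in range model; the names, threshold logic, and scoring are ours for exposition, not the models actually used by our system (those are described in Section 3.2):

```python
# Minimal sketch of the anomaly-detection cycle: learn "normal"
# feature values in a training phase, then score new values against
# the trained model. The RangeModel here is an illustrative stand-in.

class RangeModel:
    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def train(self, value):
        # Learning phase: record the range of normal values.
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def score(self, value):
        # Detection phase: probability-like score in [0, 1];
        # values inside the learned range are fully normal.
        if self.lo <= value <= self.hi:
            return 1.0
        width = max(self.hi - self.lo, 1.0)
        dist = (self.lo - value) if value < self.lo else (value - self.hi)
        return max(0.0, 1.0 - dist / width)

model = RangeModel()
for v in [1, 2, 3, 2, 1]:       # events observed during training
    model.train(v)

assert model.score(2) == 1.0     # within the learned model: normal
assert model.score(50) < 0.5     # far outside: flagged as anomalous
```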
In our system, the features characterize the events (e.g., the instantiation of an ActiveX control, the invocation of a plugin's method, or the evaluation of a string using the eval function) occurring during the interpretation of the JavaScript and HTML code of a page. In the following, we introduce the features used in our system by following the steps typically taken in carrying out an attack, namely redirection and cloaking, deobfuscation, environment preparation, and exploitation.
3.1.1 Redirection and Cloaking
Typically, before a victim is served the exploit code, several activities take place. First, the victim is often sent through a long
chain of redirection operations. These redirections make it more
difficult to track down an attack, notify all the involved parties (e.g.,
registrars and providers), and, ultimately, take down the offending
sites.
In addition, during some of these intermediate steps, the user's browser is fingerprinted. Depending on the obtained values, e.g.,
brand, version, and installed plugins, extremely different scripts
may be served to the visitor. These scripts may be targeting different vulnerabilities, or may redirect the user to a benign page, in
case no vulnerability is found.
Finally, it is common for exploit toolkits to store the IP addresses
of victims for a certain interval of time, during which successive
visits do not result in an attack, but, for example, in a redirection to
a legitimate web site.
We monitor two features that characterize this kind of activity:
Feature 1: Number and target of redirections. We record the
number of times the browser is redirected to a different URI, for
example, by responses with HTTP Status 302 or by the setting of
specific JavaScript properties, e.g., document.location. We
also keep track of the targets of each redirection, to identify redirect
chains that involve an unusually large number of domains.
Feature 2: Browser personality and history-based differences.
We visit each resource twice, each time configuring our browser
with a different personality, i.e., type and version. For example, on
3.1.3 Environment Preparation
3.1.4 Exploitation

The last step of the attack is the actual exploit. Since the vast majority of exploits target vulnerabilities in ActiveX controls or other browser plugins, we extract three features related to these components. Of course, different types of attacks might affect certain features more than others; we found that, in practice, these three features are effective at characterizing a wide range of exploits.
Feature 8: Number of instantiated components. We track the
number and type of browser components (i.e., plugins and ActiveX
controls) that are instantiated in a page. To maximize their success rate, exploit scripts often target a number of vulnerabilities in
different components. This results in pages that load a variety of
unrelated plugins or that load the same plugin multiple times (to
attempt an exploit multiple times).
Feature 9: Values of attributes and parameters in method calls.
For each instantiated component, we keep track of the values passed
as parameters to its methods and the values assigned to its properties. The values used in exploits are often very long strings, which
are used to overflow a buffer or other memory structures, or large
integers, which represent the expected address of the shellcode.
Feature 10: Sequences of method calls. We also monitor the
sequences of method invocations on instantiated plugins and ActiveX controls. Certain exploits, in fact, perform method calls that
are perfectly normal when considered in isolation, but are anomalous (and malicious) when combined. For example, certain plugins allow a script to download a file to the local machine and to run an executable from the local file system. An attack would combine the two calls to download malware and execute it.
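One simple way to realize this feature, sketched here under our own simplifying assumptions (the method names are hypothetical, and the real system's sequence model may differ), is to learn which consecutive pairs of method calls occur in benign pages and flag traces containing a pair never seen during training:

```python
# Sketch of the sequence-of-calls idea (Feature 10): learn which
# consecutive pairs of method invocations occur in benign pages,
# and flag a trace containing a pair never seen during training.
# Method names are hypothetical, not taken from a real plugin.

def _pairs(calls):
    return list(zip(calls, calls[1:]))

class CallSequenceModel:
    def __init__(self):
        self.known_pairs = set()

    def train(self, calls):
        # Benign trace: remember its consecutive call pairs.
        self.known_pairs.update(_pairs(calls))

    def is_anomalous(self, calls):
        # Anomalous if any consecutive pair was never observed.
        return any(p not in self.known_pairs for p in _pairs(calls))

model = CallSequenceModel()
model.train(["DownloadFile"])                 # harmless in isolation
model.train(["GetVersion", "PlayMedia"])

# "Download, then execute" never occurred in benign traces:
assert model.is_anomalous(["DownloadFile", "RunExecutable"])
assert not model.is_anomalous(["GetVersion", "PlayMedia"])
```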
3.2 Models
In the context of anomaly detection, a model is a set of procedures used to evaluate a certain feature. More precisely, the task
of a model is to assign a probability score to a feature value. This
probability reflects the likelihood that a given feature value occurs,
given an established model of normality. The assumption is that feature values with a sufficiently low probability are an indication of a potential attack.
Table 1: Required and optional features for each attack class. An X in a column means that the corresponding feature characterizes a required step in an attack. Features are numbered from 1 to 10 as in Section 3. (Attack classes are exemplified by vulnerabilities CVE-2008-1027, CVE-2006-0003, and CVE-2008-4844; features F1-F10 are grouped into Useful and Necessary.)

A model can operate in training or detection mode. In training mode, a model learns the characteristics of normal events and determines the threshold to distinguish between normal and anomalous
feature values. In detection mode, the established models are used
to determine an anomaly score for each observed feature value. For
our system, we use several models provided by libAnomaly, a library to develop anomaly detection systems [16,17]. Here, we only
briefly describe these models and refer the interested reader to the
original references for further information.
Token Finder. The Token Finder model determines if the values
of a certain feature are elements of an enumeration, i.e., are drawn
from a limited set of alternatives. In legitimate scripts, certain features often take only a few possible values. For example, for an ActiveX method that expects a Boolean argument, the argument values should always be 0 or 1. If a script invokes that method with a value of 0x0c0c0c0c, the call should be flagged as anomalous.
We apply this model to each method parameter and property
value exposed by plugins.
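A Token Finder-style check can be sketched as follows; the enumeration heuristic and its threshold are illustrative simplifications, not those of libAnomaly:

```python
# Sketch of a Token Finder-style model: during training, decide
# whether a feature behaves like an enumeration (few distinct values
# relative to the number of observations); if so, any previously
# unseen value is flagged as anomalous at detection time.

class TokenFinder:
    def __init__(self):
        self.values = []

    def train(self, value):
        self.values.append(value)

    def finalize(self):
        distinct = set(self.values)
        # Heuristic: an enumeration reuses its values heavily.
        self.is_enum = len(distinct) <= max(2, len(self.values) // 4)
        self.tokens = distinct

    def is_anomalous(self, value):
        return self.is_enum and value not in self.tokens

tf = TokenFinder()
for v in [0, 1, 0, 0, 1, 1, 0, 1]:   # a Boolean-like argument
    tf.train(v)
tf.finalize()

assert not tf.is_anomalous(1)
assert tf.is_anomalous(0x0C0C0C0C)    # heap-spray address: anomalous
```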
String Length and Occurrence Counting. The goal of this model
is to characterize the normal length of a string feature. We also
use it to model the expected range of a feature that counts the occurrence of a certain event. The rationale behind this model is that,
in benign scripts, strings are often short and events occur only a
limited number of times. During an attack, instead, longer strings
are used, e.g., to cause overflows, and certain events are repeated a
large number of times, e.g., memory allocations used to set up the
process heap.
We use this model to characterize the length of string parameters passed to methods and properties of plugins, and the length of
dynamically evaluated code. In addition, we use it to characterize
all the features that count how many times a certain event repeats,
i.e., the number of observed redirections, the ratio of string definitions and uses, the number of code executions, the number of bytes
allocated through string operations, the number of likely shellcode
strings, and the number of instantiated plugins.
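The length model can be approximated with a distribution-free bound; the use of mean, variance, and the Chebyshev inequality below is one common way to build such a model, given here as a sketch rather than libAnomaly's exact formulation:

```python
# Sketch of a string-length / occurrence-count model: learn mean and
# variance of the values seen in training, then bound the probability
# of a new value n with the Chebyshev inequality
#   p <= var / (n - mean)^2,
# a weak, distribution-free bound that still separates benign lengths
# from overflow-sized strings.

class LengthModel:
    def __init__(self):
        self.samples = []

    def train(self, n):
        self.samples.append(n)

    def score(self, n):
        mean = sum(self.samples) / len(self.samples)
        var = sum((x - mean) ** 2 for x in self.samples) / len(self.samples)
        if abs(n - mean) < 1e-9:
            return 1.0
        return min(1.0, var / (n - mean) ** 2)

m = LengthModel()
for n in [8, 12, 10, 9, 11]:        # typical benign string lengths
    m.train(n)

assert m.score(10) == 1.0
assert m.score(100_000) < 1e-6      # overflow-sized string: anomalous
```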
Character Distribution. The Character Distribution model characterizes the expected frequency distribution of the characters in a
string. The use of this model is motivated by the observation that, in
most cases, strings used in JavaScript code are human-readable and
are taken from a subset of some well-defined character set. In contrast, attacks often employ strings containing unusual characters, e.g., non-printable characters, to encode binary code or to represent memory addresses.
We use this model to characterize the values passed as arguments
to methods and to properties of plugins.
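A heavily simplified version of this check is sketched below: the real model compares full character-frequency distributions, whereas this illustration only measures the fraction of printable characters (the example strings are made up):

```python
# Simplified character-distribution check: benign JavaScript strings
# are mostly printable, while decoded shellcode is not. The model in
# the paper compares full frequency distributions; here we measure
# only the printable fraction as an illustration.

import string

PRINTABLE = set(string.printable)

def printable_fraction(s):
    return sum(c in PRINTABLE for c in s) / max(len(s), 1)

benign = "http://example.com/player.swf"
shellcode = "\u0c0c\u0c0c\u9090\u9090\ucccc"  # decoded %u-payload

assert printable_fraction(benign) == 1.0
assert printable_fraction(shellcode) == 0.0
```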
Type Learner. The Type Learner model was not present in the
original libAnomaly library, and we added it for this work. This
model determines if the values of a certain feature are always of
the same type. For example, during normal usage, the parameter of
a plugin's method may always contain a URL. However, during an
attack, the parameter may be used to overwrite a function pointer
with a specific memory address (i.e., a large integer). In this case,
the Type Learner would flag this value as anomalous.
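The idea can be sketched as follows; the type lattice here (small/large integers, URLs, generic strings) is a reduced, illustrative version, and the classification rules are our assumptions rather than the exact ones used by the model:

```python
# Sketch of Type Learner-style classification: map a concrete value
# to a coarse type, learn the set of types seen in training, and
# flag values whose type was never observed.

def value_type(v):
    if isinstance(v, int):
        # The 1024 cutoff for "small" integers follows the text above.
        return "small-int" if v <= 1024 else "large-int"
    if isinstance(v, str) and v.startswith(("http://", "https://")):
        return "url"
    return "string"

class TypeLearner:
    def __init__(self):
        self.seen = set()

    def train(self, v):
        self.seen.add(value_type(v))

    def is_anomalous(self, v):
        return value_type(v) not in self.seen

tl = TypeLearner()
tl.train("http://example.com/movie.flv")   # parameter is always a URL

assert not tl.is_anomalous("https://cdn.example.com/a.flv")
assert tl.is_anomalous(0x0C0C0C0C)          # large integer: anomalous
```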
In training mode, the Type Learner classifies feature values as
one of several possible types. The types currently recognized are
small integers (integer values smaller than or equal to 1024), large inte-
3.3 Emulation
To deal with exploits that heavily rely on dynamic JavaScript features and sophisticated browser functionality, we visit web pages
with a customized browser, which loads the page, executes its dynamic content, and records the events used by the anomaly detection system. In particular, in our system, a full browser environment is emulated by HtmlUnit, a Java-based framework for testing web-based applications [9]. HtmlUnit models HTML documents and provides an API to interact with these documents. It
supports JavaScript by integrating Mozilla's Rhino interpreter [22].
HtmlUnit implements the standard functionality provided by regular browsers, except visual page rendering. We have instrumented
HtmlUnit and Rhino to extract the features used to detect and analyze malicious code.
We have decided to use HtmlUnit rather than instrumenting a
traditional browser, such as Firefox or Internet Explorer, for several reasons. First, HtmlUnit makes it easy to simulate multiple
browser personalities, which is used in one of our detection features. For example, depending on the personality we want to assume, we can easily configure the value of HTTP headers that are
transmitted with each request (e.g., the User-Agent header), the
settings of JavaScript attributes (such as the navigator object
and its properties), and the handling of certain HTML and JavaScript features and capabilities that are implemented differently
in different browsers (e.g., event handlers are registered using the
addEventListener() function in Firefox and the attachEvent() function in Internet Explorer). While some of these differences could be handled by existing browser extensions (e.g.,
the User Agent plugin for Firefox [25]), others would require more
substantial changes to the browser itself.
Second, in HtmlUnit, it is possible to simulate an arbitrary system environment and configuration. In fact, we have modified
HtmlUnit so that, regardless of the actual system configuration,
requests for loading any ActiveX control or plugin are successful
and cause the instantiation of a custom logging object, which keeps
track of all methods and attributes invoked or set on the control.
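The "always instantiate, always log" idea can be sketched with a proxy object that records every method call for later feature extraction; this is an illustration in Python of the mechanism described above (the real system implements it inside HtmlUnit, in Java, and the CLSID and method names below are hypothetical):

```python
# Sketch of a logging plugin proxy: instead of a real ActiveX control
# or plugin, the page is handed an object on which every method call
# "succeeds" and is recorded for feature extraction.

class LoggingPlugin:
    def __init__(self, clsid):
        self.clsid = clsid   # hypothetical identifier, for the log only
        self.log = []

    def __getattr__(self, name):
        # Any method the script calls "exists" and gets recorded.
        def method(*args):
            self.log.append((name, args))
        return method

ctrl = LoggingPlugin("{hypothetical-clsid}")
ctrl.DownloadFile("http://evil.example/mal.exe", "C:\\a.exe")
ctrl.RunExecutable("C:\\a.exe")

assert [name for name, _ in ctrl.log] == ["DownloadFile", "RunExecutable"]
```

Because the proxy accepts any method, exploits against components that are not installed, or not even publicly known, still leave a complete trace.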
This allows us to detect, without any further configuration effort,
exploits that target any control or plugin, even those for which no
vulnerability has been publicly disclosed. This is different from
traditional high-interaction honeyclients (and real browsers), where
4. ANALYSIS

5. SYSTEM EVALUATION

5.1 Detection Results
Dataset         Samples (#)   JSAND FN    ClamAV FN     PhoneyC FN    Capture-HPC FN
Spam Trap       257           1 (0.3%)    243 (94.5%)   225 (87.5%)   0 (0.0%)
SQL Injection   23            0 (0.0%)    19 (82.6%)    17 (73.9%)    -
Malware Forum   202           1 (0.4%)    152 (75.2%)   85 (42.1%)    -
Wepawet-bad     341           0 (0.0%)    250 (73.3%)   248 (72.7%)   31 (9.1%)
Total           823           2 (0.2%)    664 (80.6%)   575 (69.9%)   31 (5.2%)
exploit targeting that application. In this case, we say that the application "covers" those pages. Uncovered pages (those that do not target any of the installed applications) will not be detected as malicious. To solve this problem, we used a greedy algorithm (the underlying set-cover problem is known to be NP-complete) in which, at each step, we add to the set of installed applications the program that covers the largest number of still-uncovered pages. Figure 1 shows the results. It is interesting to observe that even though a relatively high detection rate can be achieved with a small number of applications (about 90% with the top 5 applications), the detection curve is characterized by a long tail (one would have to install 22 applications to achieve 98% detection). Clearly, a false negative rate between 10% and 2% is significant, especially when analyzing large datasets.
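The greedy heuristic just described can be sketched as follows; the applications and page sets are made up for illustration:

```python
# Greedy set-cover heuristic: repeatedly "install" the application
# that covers the largest number of still-uncovered malicious pages.

def greedy_cover(pages_by_app, universe):
    installed, covered = [], set()
    while covered != universe:
        app = max(pages_by_app, key=lambda a: len(pages_by_app[a] - covered))
        gain = pages_by_app[app] - covered
        if not gain:
            break                      # remaining pages are uncoverable
        installed.append(app)
        covered |= gain
    return installed, covered

pages_by_app = {                       # hypothetical coverage sets
    "FlashPlayer": {1, 2, 3, 4, 5, 6},
    "PdfReader":   {5, 6, 7, 8},
    "MediaPlugin": {9},
}
installed, covered = greedy_cover(pages_by_app, set(range(1, 10)))

assert installed[0] == "FlashPlayer"   # largest gain is chosen first
assert covered == set(range(1, 10))    # all pages eventually covered
```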
Large-scale comparison with high-interaction honeyclients. We
performed an additional, more comprehensive experiment to compare the detection capability of our tool with high-interaction honeyclients. More precisely, we ran JSAND and Capture-HPC side-by-side on the 16,894 URLs of the Wepawet-uncat dataset. Each
URL was analyzed as soon as it was submitted to the Wepawet
online service. Capture-HPC and JSAND were run from distinct
subnets to avoid spurious results due to IP cloaking.
Overall, Capture-HPC raised an alert on 285 URLs, which were
confirmed to be malicious by manual analysis. Of these, JSAND
missed 25. We identified the following reasons for JSAND's false negatives. In four cases, JSAND was redirected to a benign page (google.cn) or an empty page, instead of being presented with the malicious code. This may be the result of a successful detection of our tool or of its IP addresses. An internal bug caused the analysis to
fail in three additional cases. Finally, the remaining 18 missed detections were the consequence of subtle differences in the handling
of certain JavaScript features between Internet Explorer and our
custom browser (e.g., indirect calls to eval referencing the local
scope of the current function) or of unimplemented features (e.g.,
the document.lastModified property).
Conversely, JSAND flagged 8,714 URLs as anomalous (for 762
of these URLs, it was also able to identify one or more exploits).
Of these, Capture-HPC missed 8,454. We randomly sampled 100 URLs from this set and manually analyzed them to identify the reasons for the different results between JSAND and Capture-HPC. We identified three common cases. First, an attack is launched
but it is not successful. For example, we found many pages (3,006
in the full Wepawet-uncat dataset) that were infected with JavaScript code used in a specific drive-by campaign. In the last step
of the attack, the code redirected to a page on a malicious web
site, but this page failed to load because of a timeout. Nonetheless, JSAND flagged the infected pages as anomalous because of
the obfuscation and redirection features, while Capture-HPC did
not observe the full attack and, thus, considered them benign. We
believe JSAND's behavior to be correct in this case, as the infected pages are indeed malicious (the failure that prevents the successful attack may be only temporary). Second, we noticed that Capture-HPC stalled during the analysis (there were 1,093 such cases in
total). We discovered that, in some cases, this may be the consequence of an unreliable exploit, e.g., one that uses too much memory, causing the analyzer to fail. Finally, a missed detection may
be caused by evasion attempts. Some malicious scripts, for example, launch the attack only after a number of seconds have passed
(via the window.setTimeout method) or only if the user minimally interacts with the page (e.g., by releasing a mouse button,
as is done in the code used by the Mebroot malware). In these
cases, JSAND was able to expose the complete behavior of the page
thanks to the forced execution technique described in Section 3.
5.2 Operational Experience
[Figure: curves of "Useful Features" and "Necessary Features" plotted against the number of samples (100-900).]
6. POSSIBLE EVASION

7. RELATED WORK
8. CONCLUSIONS
Acknowledgment
This work has been supported by the National Science Foundation,
under grants CCR-0238492, CCR-0524853, CCR-0716095, CCR-0831408, CNS-0845559, and CNS-0905537, and by the ONR under
grant N000140911042. We would like to thank Jose Nazario and
the malwaredomainlist.com community for providing data.
9. REFERENCES