To Appear in the Proceedings of the 11th USENIX Security Symposium (Security ’02)
1 Introduction
[Plots for Figures 1 and 2: distinct remote hosts attacking LBNL vs. time; Figure 1's x-axis spans days since July 18, 2001, and Figure 2's spans days since Sept. 20, 2001, with date ticks from Oct 1 through Jan 15.]
Figure 1: Onset of Code Red I v2, Code Red II, and Nimda: Number of remote hosts launching confirmed attacks corresponding to different worms, as seen at the Lawrence Berkeley National Laboratory. Hosts are detected by the distinct URLs they attempt to retrieve, corresponding to the IIS exploits and attack strings. Since Nimda spreads by multiple vectors, the counts shown for it may be an underestimate.

Figure 2: The endemic nature of Internet worms: Number of remote hosts launching confirmed attacks corresponding to different worms, as seen at the Lawrence Berkeley National Laboratory, over several months since their onset. Since July, 139,000 different remote Code Red I hosts have been confirmed attacking LBNL; 125,000 different Code Red II hosts; and 63,000 Nimda hosts. Of these, 20,000 were observed to be infected with two different worms, and 1,000 with all three worms. (Again, Nimda is potentially an underestimate because we are only counting those launching Web attacks.)

… [Sp89, ER89], but the severity of their threat has rapidly grown with (i) the increasing degree to which the Internet has become part of a nation's critical infrastructure, and (ii) the recent, widely publicized introduction of very large, very rapidly spreading Internet worms, such that this technique is likely to be particularly current in the minds of attackers.

We present an analysis of the magnitude of the threat. We begin with a mathematical model derived from empirical data of the spread of Code Red I v2 in July and August, 2001 (Section 2). We then discuss techniques employed for achieving greater effectiveness and virulence by the subsequent Code Red II and Nimda worms (Section 3). Figures 1 and 2 show the onset and progress of the Code Red and Nimda worms as seen "in the wild."

In this context, we develop the threat of three new techniques for highly virulent worms: hit-list scanning, permutation scanning, and Internet scale hit-lists (Section 4). Hit-list scanning is a technique for accelerating the initial spread of a worm. Permutation scanning is a mechanism for distributed coordination of a worm. Combining these two techniques creates the possibility of a Warhol worm,2 seemingly capable of infecting most or all vulnerable targets in a few minutes to perhaps an hour. An extension of the hit-list technique creates a flash worm, which appears capable of infecting the vulnerable population in 10s of seconds: so fast that no human-mediated counter-response is possible.

We then turn in Section 5 to the threat of a new class of surreptitious worms. These spread more slowly, but in a much harder to detect "contagion" fashion, masquerading as normal traffic. We demonstrate that such a worm today could arguably subvert upwards of 10,000,000 Internet hosts.

Then in Section 6, we discuss some possibilities by which an attacker could control the worm using cryptographically-secured updates, enabling it to remain a threat for a considerable period of time. Even when most traces of the worm have been removed from the network, such an "updatable" worm still remains a significant threat.

Having demonstrated the very serious nature of the threat, we then in Section 7 discuss an ambitious but we believe highly necessary strategy for addressing it: the establishment at a national or international level of a "Center for Disease Control" analog for virus- and worm-based threats to cybersecurity. We discuss the roles we envision such a Center serving, and offer thoughts on the sort of resources and structure the Center would require in order to do so. Our aim is not to comprehensively examine each role, but to spur further discussion of the issues within the community.

2 So named for the quotation "In the future, everyone will have 15 minutes of fame."
2 An Analysis of Code Red I

The first version of the Code Red worm was initially seen in the wild on July 13th, 2001, according to Ryan Permeh and Marc Maiffret of Eeye Digital Security [EDS01a, EDS01b], who disassembled the worm code and analyzed its behavior. The worm spread by compromising Microsoft IIS web servers using the .ida vulnerability, which was also discovered by Eeye, published June 18th [EDS01c], and assigned CVE number CVE-2001-0500 [CV01].

Once it infected a host, Code Red spread by launching 99 threads which generated random IP addresses, and then tried to compromise those IP addresses using the same vulnerability. A hundredth thread defaced the web server in some cases.

However, the first version of the worm analyzed by Eeye, which came to be known as CRv1, had an apparent bug. The random number generator was initialized with a fixed seed, so that all copies of the worm in a particular thread, on all hosts, generated and attempted to compromise exactly the same sequence of IP addresses. (The thread identifier is part of the seeding, so the worm had a hundred different sequences that it explored through the space of IP addresses, but it only explored those hundred.) Thus CRv1 had a linear spread and never compromised many machines.

On July 19th, 2001, a second version of the worm began to spread. This was suspected informally via mailing list discussion, then confirmed by the mathematical analysis we present below, and finally definitively confirmed by disassembly of the new worm. This version came to be known as CRv2, or Code Red I.

… a monthly resurgence, as seen in Figure 2. Why it continues to gain strength with each monthly appearance remains unknown.3

We call this model the Random Constant Spread (RCS) model. The model assumes that the worm had a good random number generator that is properly seeded. We define N as the total number of vulnerable servers which can be potentially compromised from the Internet. (We make the approximation that N is fixed—ignoring both patching of systems during the worm spread and normal deploying and removing of systems or turning on and off of systems at night. We also ignore any spread of the worm behind firewalls on private Intranets.)

K is the initial compromise rate: that is, the number of vulnerable hosts which an infected host can find and compromise per hour at the start of the incident, when few other hosts are compromised. We assume that K is a global constant, and does not depend on the processor speed, network connection, or location of the infected machine. (Clearly, constant K is only an approximation.) We assume that a compromised machine picks other machines to attack completely at random, and that once a machine is compromised, it cannot be compromised again, or that if it is, that does not increase the rate at which it can find and attack new systems. We assume that once it is compromised, it stays that way.

T is a time which fixes when the incident happens.

We then have the following variables:

• a is the proportion of vulnerable machines which have been compromised.

• t is the time (in hours).
Figure 3: Hourly probe rate data for inbound port 80 at the Chemical Abstracts Service during the initial outbreak of Code Red I on July 19th, 2001. The x-axis is the hour of the day (CDT time zone), while the y-axis shows the probe rate, the number of different IP addresses seen, and a fit to the data discussed in the text.

… In a given time increment dt, the number of machines newly compromised is:

    N da = (N a) · K(1 − a) · dt    (1)

i.e., the number already compromised (N a), times the rate at which each can compromise per unit time (K(1 − a)), times the increment of time (dt). (Note that machines can compromise K others per unit time to begin with, but only K · (1 − a) once a proportion of other machines are compromised already.)

This gives us the differential equation:

    da/dt = K a (1 − a)    (2)

with solution:

    a = e^{K(t−T)} / (1 + e^{K(t−T)}),    (3)

where T is a constant of integration that fixes the time position of the incident. This equation has been well known for many years as the logistic equation, and governs the rate of growth of epidemics in finite systems when all entities are equally likely to infect any other entity (which is true for randomized spreading among Internet-connected servers, in the absence of firewall filtering rules that differentially affect infectability from or to different addresses).

This is an interesting equation. For early t (significantly before T), a grows exponentially. For large t (significantly after T), a goes to 1 (all vulnerable machines are compromised). The rate at which this happens depends only on K (the rate at which one machine can compromise others), and not at all on the number of machines.

Figure 3 shows hourly probe rate data from Ken Eichmann of the Chemical Abstracts Service for the hourly probe rate inbound on port 80 at that site. Also shown is a fit to the data with K = 1.8, T = 11.9, and with the top of the fit scaled to a maximum probe rate of 510,000 scans/hour. (We fit it to fall slightly below the data curve, since it seems there is a fixed background rate of web probes that was going on before the rapid rise due to the worm spread.) This very simple theory can be seen to give a reasonable first-approximation explanation of the worm behavior. See also Section 4.3 for validation of the theory via simulation.

Note that we fit the scan rate, rather than the number of distinct IPs seen at this site. The incoming scan rate seen at a site is directly proportional to the total number of infected IPs on the Internet, since there is a fixed probability for any worm copy to scan this particular site in the current time interval. However, the number of distinct IPs seen at a site is distorted relative to the overall infection curve. This is because a given worm copy, once it is infected, will take some amount of time before it gets around to scanning any particular site. For a small address space, this delay can be sizeable and causes the distinct IP graph at the given site to lag behind the overall Internet infection rate graph.

Two implications of this graph are interesting. One is that the worm came close to saturating before it turned itself off at midnight UTC (1900 CDT), as the number of copies ceased increasing a few hours before the worm's automatic turnoff. Thus it had found the bulk of the servers it was going to find at this time. Secondly, the infection rate was about 1.8 per hour—in the early stages of the infection, each infected server was able to find about 1.8 other servers per hour.

Although Code Red I turned itself off at midnight UTC on July 19th, hosts with inaccurate clocks kept it alive and allowed it to spread again when the worm code allowed it to re-awaken on August 1st. Figure 4 shows similar data and fit for that incident. The K here is about 0.7. Since the worm code-base was the same, this lower spread rate indicates that the number of vulnerable systems was a little less than 40% as many as the first time around. That is, the data appears consistent with slightly more than half the systems having been fixed in the 11 days intervening.
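The logistic fit can be sanity-checked numerically. The following Python sketch (ours, not the paper's code) compares the closed-form solution a(t) = e^{K(t−T)}/(1 + e^{K(t−T)}) against a direct Euler integration of da/dt = K a(1 − a), using the fitted values K = 1.8 and T = 11.9:

```python
import math

def a_closed(t, K, T):
    """Closed-form logistic solution a(t) = e^{K(t-T)} / (1 + e^{K(t-T)})."""
    return math.exp(K * (t - T)) / (1.0 + math.exp(K * (t - T)))

def a_euler(t_end, K, T, dt=0.001):
    """Numerically integrate da/dt = K a (1 - a) with forward Euler steps."""
    t0 = T - 15.0                       # start well before the incident
    a = a_closed(t0, K, T)              # consistent initial condition
    t = t0
    while t < t_end:
        a += K * a * (1.0 - a) * dt
        t += dt
    return a

K, T = 1.8, 11.9
# At t = T, exactly half the vulnerable population is compromised.
print(a_closed(T, K, T))
# Near saturation, the Euler integration tracks the closed form closely.
print(a_euler(16.0, K, T), a_closed(16.0, K, T))
```

At t = T the closed form gives exactly 1/2, and by hour 16 both computations agree that the infection has essentially saturated, matching the "turns off a few hours after saturation" behavior described above.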
Figure 4: Hourly probe rate data for inbound port 80 at the Chemical Abstracts Service, for Code Red I's reemergence on August 1st. The x-axis shows the time of day on August 1st (Central US Time). The y-axis shows the monitored probe rate and a fit for the data discussed in the text.

3 "Better" worms—practice

In this section, we explore the strategies adopted by the two major worms released subsequent to Code Red I: "Code Red II" and "Nimda."

3.1 Localized scanning—Code Red II

The Code Red II worm was released on Saturday August 4th, 2001 and spread rapidly [CE01, SA01]. The worm code contained a comment stating that it was "Code Red II," but it was an unrelated code base. It did use the same vulnerability, however—a buffer overflow in Microsoft's IIS Web server with CVE number CVE-2001-0500. When successful, the payload installed a root backdoor allowing unrestricted remote access to the infected host. The worm exploit only worked correctly when IIS was running on Microsoft Windows 2000; on Windows NT it caused a system crash rather than an infection.

The worm was also a single-stage scanning worm that chose random IP addresses and attempted to infect them. However, it used a localized scanning strategy, where it was differentially likely to attempt to infect addresses close to it. Specifically, with probability 3/8 it chose a random IP address from within the class B address space (/16 network) of the infected machine. With probability 1/2 it chose randomly from its own class A (/8 network). Finally, with probability 1/8 it would choose a random address from the whole Internet.

This strategy appears quite successful. The localized spreading allows the worm to quickly infect parts of the Internet that contain many vulnerable hosts, and also means that the infection often proceeds quicker since hosts with similar IP addresses are often close together in the network topology also. This strategy also allows a worm to spread very rapidly within an internal network once it manages to pass through the external firewall.

Unfortunately, developing an analytic model for the spread of a worm employing this type of localized scanning strategy is significantly more difficult than the modeling effort in Section 2, because it requires incorporating potentially highly non-homogeneous patterns of population locality. The empirical data is also harder to interpret, because Code Red I was quite active when Code Red II was released. Indeed, it appears that Code Red II took a while to overcome Code Red I (see Figure 1), but fully determining the interplay between the two appears to be a significant undertaking.

3.2 Multi-vector worms—Nimda

As well illustrated by the Nimda worm/virus (and, indeed, the original Internet Worm [Sp89, ER89]), malevolent code is not restricted to a single technique. Nimda began on September 18th, 2001, spread very rapidly, and maintained itself on the Internet for months after it started. Nimda spread extensively behind firewalls, and illustrates the ferocity and wide reach that a multi-mode worm can exhibit. The worm is thought to have used at least five different methods to spread itself:

• By infecting Web servers from infected client machines via active probing for a Microsoft IIS vulnerability (CVE-2000-0884).

• By bulk emailing of itself as an attachment based on email addresses determined from the infected machine.

• By copying itself across open network shares.

• By adding exploit code to Web pages on compromised servers in order to infect clients which browse the page.

• By scanning for the backdoors left behind by Code Red II and also the "sadmind" worm [CE03].
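Returning to Code Red II's localized scanning from Section 3.1: the 3/8, 1/2, 1/8 bias is simple to state precisely. A minimal Python sketch (the function name and structure are ours; only the probability split comes from the disassembly described above):

```python
import random

def pick_target(infected_ip, rng=random):
    """Choose a scan target with Code Red II's localized bias:
    3/8 from the local /16, 1/2 from the local /8, 1/8 anywhere."""
    u = rng.random()
    if u < 3/8:                          # same /16: keep the top 16 bits
        return (infected_ip & 0xFFFF0000) | rng.getrandbits(16)
    elif u < 3/8 + 1/2:                  # same /8: keep the top 8 bits
        return (infected_ip & 0xFF000000) | rng.getrandbits(24)
    else:                                # anywhere in the 32-bit space
        return rng.getrandbits(32)

rng = random.Random(1)
src = 0x80020304                         # 128.2.3.4, an arbitrary example host
targets = [pick_target(src, rng) for _ in range(100_000)]
share8 = sum((t >> 24) == (src >> 24) for t in targets) / len(targets)
print(share8)   # roughly 7/8 of targets fall inside the local /8
```

The concentration is visible immediately: about 87.5% of probes stay inside the infected machine's /8, which is why the worm tears through an internal network once it is past the firewall.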
…ing, if it receives the right trigger, or a prearranged time rolls around. We return to this point in Section 7.

Figure 5: HTTP connections per second seen at the Lawrence Berkeley National Laboratory, rising due to the onset of Nimda, September 18.

Figure 5 illustrates how rapidly the worm tried to infect one site, the Lawrence Berkeley National Laboratory. The x-axis plots hours past midnight, PDT, while the y-axis plots HTTP connection attempts per second. Only connections from hosts confirmed to have harbored Nimda are counted, to avoid possible confusion with concurrent Code Red connection attempts. After the onset of the infection, the total rate of probing was about 3 times that from the hosts subsequently confirmed to harbor Nimda.

Clearly, onset was quite rapid, rising in just half an hour from essentially no probing to a sustained rate of nearly 100 probes/sec.

4 "Better" worms—theory

A worm's scanner can obviously be made significantly faster than the ones seen today, by careful use of threading and an understanding of the protocols. By having many requests outstanding, a worm should be capable of scanning targets at a rate proportional to its access bandwidth. Since it only takes 40 bytes for a TCP SYN packet to determine if a service is accessible, and often only a few hundred bytes to attempt an exploit, the potential scans per second can easily exceed 100 for even poor Internet connections. This increases K by allowing a worm to search for a greater number of targets in a given period of time.

Similarly, the more widespread the vulnerable software is, the faster a worm using that vulnerability can spread, because each random scan of the network is more likely to pick up a target, also increasing K. We should therefore expect that worm authors will devote considerable scrutiny to highly homogeneous, highly deployed services, both for the faster spreading and for the greater number of machines that could be compromised in a single attack.
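The claim that scan rates can exceed 100/second even on poor connections is just packet arithmetic, checkable in a couple of lines (Python; the 40-byte SYN figure is from the text, and the link rates are illustrative):

```python
SYN_BYTES = 40                      # minimal TCP/IP headers, no payload

def syn_scans_per_second(link_bps):
    """Upper bound on SYN probes/sec if the uplink is saturated with 40-byte SYNs."""
    return link_bps / (SYN_BYTES * 8)

# Even a 56 kbps modem can emit on the order of 175 SYNs per second,
# comfortably above the 100 scans/second cited in the text.
print(syn_scans_per_second(56_000))
# A 622 Mbps OC-12, ignoring link-layer framing, is on the order of 10^6/sec.
print(syn_scans_per_second(622_000_000))
```

Framing overhead and return traffic reduce these ceilings somewhat (the flash worm discussion in Section 4.5 quotes 750,000/sec for an OC-12 once framing is accounted for), but the orders of magnitude stand.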
…comprehensive scan while maintaining the benefits of random probing. Each worm looks like it is conducting a random scan, but it attempts to minimize duplication of effort. Any time an instance of the worm, W, encounters …

A timer could then induce the worms to wake up, change the permutation key to the next one in a prespecified sequence, and begin scanning through the new permutation, starting at its own index and halting when another instance is discovered. This process insures that every address would be efficiently rescanned at regular intervals, detecting any machines which came onto the net or were reinstalled but not patched, greatly increasing a worm's staying power. Otherwise, the worms are silent and difficult to detect, until they receive attack orders (see Section 6).

It may seem that the permutation scanning algorithm is spoofable, but only to a very limited degree. If an uninfected machine responds to the scan in the same way as a worm, by falsely claiming to be infected, it will temporarily protect those machines which exist later in the current permutation from being scanned by the worm. However, since the permutation itself changes on every rescan, the set of machines protected is constantly changing. The result is that unless a very large number of uninfected machines respond to probes like an actual worm, the protection is almost nonexistent.
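The keyed pseudo-random permutation that permutation scanning relies on can be illustrated with a toy Feistel network over the 32-bit address space (our own stand-in for a real block cipher, purely for illustration; it is not the worm design or any published worm code). Changing the key yields a fresh ordering of the same addresses, which is what the rescan step above exploits:

```python
def feistel32(x, key, rounds=6):
    """Toy 6-round Feistel permutation of the 32-bit address space (illustrative only)."""
    L, R = x >> 16, x & 0xFFFF
    for i in range(rounds):
        # Any deterministic round function of (R, key, i) keeps the network invertible.
        F = (R * 0x9E37 + key + i) & 0xFFFF
        L, R = R, L ^ F
    return (L << 16) | R

# A Feistel network is a bijection by construction: distinct inputs map to
# distinct outputs, so scanning x = 0, 1, 2, ... visits every address once.
sample = [feistel32(x, key=12345) for x in range(50_000)]
print(len(set(sample)) == len(sample))
```

Because the mapping is a bijection for any key, cooperating instances never need to exchange lists of scanned addresses: each simply walks the shared key-dependent order from its own position, and a new key defines a completely new rescan order.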
Permutation scanning interacts particularly well with a worm which attacks multiple security holes: after deciding that the initial exploit is exhausted, the worm resets the permutation to its current address, changes the permutation key, and exploits the second security hole. Thus, even relatively rare secondary holes can be efficiently and quickly scanned once the worm has established itself.

In order to evaluate the effects of hit-list and permutation scanning, we wrote a small, abstract simulator of a Warhol worm's spread. The simulator assumes complete connectivity within a 2^32 entry address space4 using a pseudo-random permutation to map addresses to a subset of vulnerable machines. We used a 32-bit, 6-round variant of RC5 to generate all permutations and random numbers.

4 In general, the Internet address space isn't completely connected. If a machine is not reachable from an arbitrary point on the external network, it is usually not reachable directly by a worm except through local scanning.

We can parameterize the simulation in terms of: the number of vulnerable machines in the address space; scans per second; the time to infect a machine; number infected during the hit-list phase; and the type of secondary scan (permutation, partitioned permutation, or random). The simulator assumes multithreaded scanning.

To ensure that the simulator produces reasonable results, Figure 6 shows a comparison between the simulator's output and the model developed in Section 2, for a worm capable of 10 scans/second in a population of 300,000 vulnerable machines. The simulation results fit the model for K = 2.6 and T = 5.52. This represents a worm which is slightly faster (less than 50%) than Code Red I.

Figure 7 then shows how both faster scanning and the Warhol strategies affect the propagation time. The faster scanning worm (capable of 100 scans/second) reduces the infection time down to under an hour, while the combination of hit-list scanning, permutation scanning, and fast scanning further reduces infection time to roughly 15 minutes.

[Plot for Figure 7: number of infected machines vs. time (hours); curves: Conventional, Fast Scanning, Warhol.]

Figure 7: The spread of three simulated worms in a population of 300,000 vulnerable machines: (i) a Code Red-like worm capable of 10 scans/second, (ii) a faster scanning worm capable of 100 scans/second, and (iii) a Warhol worm, capable of 100 scans/second, using a 10,000 entry hit-list and permutation scanning which gives up when 2 infected machines are discovered without finding a new target. All graphs stop at 99.99% infection of the simulated address space.

[Plot for Figure 8: number of worm instances vs. time (minutes); curves: Infected Machines, Active Worms, Dormant Worms.]

Figure 8: A closeup of the behavior of the Warhol worm seen in Figure 7. The infection initially progresses rapidly—effectively all worms are actively scanning the net—but as infection rates near 100%, many worms have gone dormant, correctly concluding that there are few vulnerable machines remaining and that they should therefore cease scanning.

Figure 8 shows in more detail the behavior of the Warhol strategies. It gets a huge boost from the hit-list during the first few seconds of propagation, quickly establishing itself on the network and then spreading exponentially. As the infection exceeds the 50% point, some of the worms begin recognizing that saturation is occurring and stop scanning. By the time the graph ends (at 99.99% of the simulated population), most of the worms have gone silent, leaving a few remaining worms to finish scanning the last of the address space.

4.4 Topological Scanning

An alternative to hit-list scanning is topologically aware scanning, which uses information contained on the victim machine in order to select new targets. Email worms have used this tactic since their inception, as they harvest addresses from their victim in order to find new potential targets, as did the Morris worm (necessary because of the very sparse address space when it was released) [Sp89, ER89].

Many future active worms could easily apply these techniques during the initial spread, before switching to a permutation scan once the known neighbors are exhausted. An active worm that attacked a flaw in a peer-to-peer application could easily get a list of peers from
a victim and use those peers as the basis of its attack, which makes such applications highly attractive targets for worm authors. Although we have yet to see such a worm in the wild, these applications must be scrutinized for security. These applications are also vulnerable to contagion worms, as discussed in Section 5.

Similarly, a worm attacking web servers could look for URLs on disk and use these URLs as seed targets as well as simply scanning for random targets. Since these are known to be valid web servers, this would tend to greatly increase the initial spread by preferentially probing for likely targets.

4.5 Flash Worms

We further observe that there is a variant of the hit-list strategy that could plausibly result in most of the vulnerable servers on the Internet being infected in tens of seconds. We term this a flash worm.

The nub of our observation is that an attacker could plausibly obtain a hit-list of most servers with the relevant service open to the Internet in advance of the release of the worm.5

5 Servers behind load balancers create complications here, as do machines that connect to the Internet with variable IP addresses but nonetheless have vulnerable services open.

In addition to the methods already discussed for constructing a hit-list in Section 4.1, a complete scan of the Internet through an OC-12 connection would complete quickly. Given a rate of 750,000 TCP SYN packets per second (the OC-12 provides 622 Mbps, the TCP segment takes 40 bytes, and we allow for link-layer framing), and that the return traffic is smaller in volume than the outbound (it is comprised of either same-sized SYN ACKs or RSTs, smaller ICMPs, or, most often, no response at all), it would take roughly 2 hours to scan the entire address space. Faster links could of course scan even faster. Such a brute-force scan would be easily within the resources of a nation-state bent on cyberwarfare.

Given that an attacker has the determination and foresight to assemble a list of all or most Internet connected addresses with the relevant service(s) open, a worm can spread most efficiently by simply attacking addresses on that list. For example, there are about 12.6 million Web servers on the Internet (according to Netcraft [Ne02]), so the size of that particular address list would be 48 MB, uncompressed. The initial copy of the worm can be programmed to divide the list into n blocks, and then to find and infect the first address in each block (or an especially chosen high-bandwidth address in that block), and then hand the child worm the list of addresses for that block. That copy of the worm can then re-iterate the process to infect everything in its block. A threaded worm could begin infecting hosts before it had received the full host list from its parent to work on, to maximize the parallelization process, and it could start work on looking for multiple children in parallel.

This design is somewhat fragile if an early copy of the worm is neutralized very quickly, or infects a site from which it cannot scan out. To mitigate this, the worm copies could overlap in their scanning so that all addresses were scanned a small number of times, with every target address being scanned by different paths through the infection tree. This has the additional side-effect of removing the need for further parent-to-child communication after initial infection occurs.

A related design would call for most of the address list to be located in pre-assigned chunks on one or a number of high-bandwidth servers that were well-known to the worm. Each copy of the worm would receive an assignment from its parent, and then fetch the address list from there. The server would only have to send out portions of the list, not the entire list; in principle, it should only have to transmit each address in the list once. In addition, after the worm has propagated sufficiently that a large number of copies are attempting to fetch their (now quite small) lists, at that point the worm collective could switch to sending around the address list with each new infection, rather than having the infectees each contact the server.

This process will result in relatively little wasted effort. For example, if the worm had a list of Web servers, and a zero-day IIS vulnerability, about 26% of the list would be vulnerable. No server would be probed twice. If n = 10, then the infection tree for the 3 million vulnerable servers would be just 7 layers deep.

The spread rate of such a worm would likely be constrained by one of two things. The worm itself is likely to be small (Code Red I was about 4 KB, and a highly malicious worm could easily be less than 100 KB, even allowing for a complex payload). Thus, at the start, the address list is much larger than the worm itself, and the propagation of the worm could be limited by the time required to transmit the host list out of the initial infection site or servers where it was stored. Since all the children of the infection will have much smaller lists to transmit, these later lists are less likely to limit the worm spread
(unless a first generation child has less than 1/n of the 5 Stealth worms—contagion
initial copy’s bandwidth available to it). The exact time
required to transmit the list will depend on the available
bandwidth of the storage sites. As an example, however, The great speed with which the worms described in the
we point out that a 48 MB address list could be pushed previous sections can propagate presents a grave threat
down an OC-12 link in less than a second.6 to the Internet’s security, because there is so little time
available to react to their onset. Still, there might be
Thus, starting the worm on a high-bandwidth link is a possibility of devising mechanisms that automatically
desirable for the attacker, and bandwidth is probably a detect the spread of such worms and shut them down
concern at the next layer or two. Compression of the in some fashion [MSVS02]. Such mechanisms would
list could make the list delivery much faster. Indeed, likely be triggered by the singular communication pat-
we took a sorted list of the 9 million server addresses terns the worms evince—hosts generating much more
discussed in Section 5 and found that gzip compression diverse and rapid Internet traffic than they usually do.
shrinks the list from 36 MB to 13 MB, and differencing
the addresses prior to compression reduced it to 7.5 MB. We now turn to a different paradigm of worm prop-
agation, contagion, which, while likely spreading sig-
Another possible limitation is simply the latency re- nificantly slower than the rapidly-propagating worms,
quired to infect each new layer in the tree. Given that probes can be issued in parallel, and substantially more threads can be spawned than n (the number of children), we do not have to add up the time required for a given copy to cycle through its list, but simply take the maximum infection latency. A single second is a reasonable latency, but with n = 10 and a large hit-list to transfer, it might take a little longer to get 10 copies of the worm through a given site's link. However, not much longer: if a 5 KB worm can get 50% utilization through a 256 Kbps DSL uplink, it can transmit ten copies of itself in three seconds. That leads to a sub-thirty-second limit on the total infection time, given an infection tree seven layers deep and a design where the new worm children go to a server for their addresses. (An additional concern here is the possibility of elements of the worm interfering with one another, either directly, by inducing congestion, or indirectly, for example by overflowing ARP tables, as happened during the Code Red I outbreak [SA01]. These possibilities are difficult to analyze.)

In conclusion, we argue that a compact worm that begins with a list including all likely vulnerable addresses, and that has initial knowledge of some vulnerable sites with high-bandwidth links, appears able to infect almost all vulnerable servers on the Internet in less than thirty seconds.

evinces almost no peculiar communication patterns. As such, these worms could prove much more difficult to detect and counter, allowing a patient attacker to slowly but surreptitiously compromise a vast number of systems.

The core idea of the contagion model can be expressed with the following example. Suppose an attacker has attained a pair of exploits: Es, which subverts a popular type of Web server; and Ec, which subverts a popular type of Web client (browser). The attacker begins the worm on a convenient server or client (it doesn't matter which, and they could start with many, if available by some other means), and then they simply wait. If the starting point is a server, then they wait for clients to visit (perhaps baiting them by putting up porn content and taking care that the large search engines index it). As each client visits, the subverted server detects whether the client is vulnerable to Ec. If so, the server infects it, sending along both Ec and Es. As the client's user now surfs other sites, the infected client inspects whether the servers on those sites are vulnerable to Es, and, if so, again infects them, sending along Ec and Es.

In this fashion, the infection spreads from clients to servers and along to other clients, much as a contagious disease spreads based on the incidental traffic patterns of its hosts.
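The client/server ping-pong just described can be captured in a toy simulation. Everything below (population sizes, vulnerability fractions, the random-visit model) is an illustrative assumption, not data from the paper; the point is only that infection spreads in both directions while riding entirely on ordinary browsing traffic.

```python
import random

# Toy sketch of the contagion model: Es subverts vulnerable servers,
# Ec subverts vulnerable clients, and infection travels only along
# ordinary "visits." All sizes and the visit model are illustrative.
rng = random.Random(42)
N_SERVERS, N_CLIENTS = 200, 2000
vuln_servers = set(rng.sample(range(N_SERVERS), 50))   # vulnerable to Es
vuln_clients = set(rng.sample(range(N_CLIENTS), 500))  # vulnerable to Ec
infected_servers = {next(iter(vuln_servers))}          # the starting point
infected_clients = set()

for _ in range(20_000):   # each step: a random client visits a random server
    c, s = rng.randrange(N_CLIENTS), rng.randrange(N_SERVERS)
    if s in infected_servers and c in vuln_clients:
        infected_clients.add(c)   # server hands the client Ec (and Es)
    if c in infected_clients and s in vuln_servers:
        infected_servers.add(s)   # client hands the server Es (and Ec)

print(len(infected_servers), "servers,", len(infected_clients), "clients infected")
```

Note that no probe ever targets a host the client or server would not have contacted anyway, which is exactly why such a spread generates no anomalous scanning traffic.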
tion protocol. Given HTTP's support for variable-sized headers, it would not be surprising to find that a buffer overflow exploit of KaZaA exists. Given such an exploit, it is apparent that if an attacker started out having infected all of the university's KaZaA hosts, then after a month they would have control of about 9 million hosts, assuming that the KaZaA clients are sufficiently homogeneous that a single exploit could infect them all.9

How plausible is it that the attacker could gain control over all of the university's KaZaA hosts? Quite: while the goal of the contagion worm is to evade detection, the attacker can likely risk a more blatant attack on a single university. If they can find a university lacking in diligent security monitoring (surely there must be a few of these!), they can then compromise a single host at the university, engage in "noisy" brute-force scanning of the internal hosts to find all of the KaZaA clients, and infect them. They then switch into contagion spreading.10

Figure 9: Complementary distribution (P[X >= x]) of the number of distinct local university hosts to which different remote KaZaA hosts connected, plotted against the degree of the remote KaZaA host. Both axes are log-scaled; the linear fit shown in the plot corresponds to a Pareto distribution with shape parameter α = 2.1.
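As a sanity check on the Figure 9 fit: a Pareto distribution's complementary CDF is P[X >= x] = (x_m/x)^α for x ≥ x_m, which is exactly a straight line of slope -α on log-log axes. A short sketch confirms this with the caption's α = 2.1 (x_m = 1 is an arbitrary illustrative choice):

```python
import math

# CCDF of a Pareto(alpha, x_m) variable: P[X >= x] = (x_m / x) ** alpha
# for x >= x_m. alpha = 2.1 is the Figure 9 fit; x_m = 1 is illustrative.
alpha, x_m = 2.1, 1.0

def ccdf(x):
    return (x_m / x) ** alpha

# On log-log axes the CCDF falls linearly with slope -alpha:
x1, x2 = 5.0, 50.0
slope = (math.log10(ccdf(x2)) - math.log10(ccdf(x1))) / (math.log10(x2) - math.log10(x1))
print(round(slope, 6))  # -2.1
```

This is the signature of a "scale-free" degree distribution: a few remote hosts touch very many local hosts, while most touch only one or two.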
While the above argues that the attacker could gain the 9 million hosts within a month, the actual spread is likely much faster, because once a remote host is infected, it too contributes to spreading the contagion. Not only does this accelerate the epidemic, but it also likely turns it into a pandemic, because the remote hosts can connect with other remote hosts that wouldn't happen to visit the university. Furthermore, depending on the protocol, a single infected node could pretend to have information it doesn't have, in order to appear highly attractive and increase the number of connections received, although that would somewhat disrupt the normal patterns of communication.

We would therefore like to better understand the rate at which a KaZaA contagion worm could spread, and to what breadth. To estimate this from just the university trace is difficult, because we don't know the total size of the KaZaA population. Doubtless it is larger than 9,000,000; but is it as high as 30,000,000, as indicated in [Ka01]? How many of those copies were redundant (same user fetching the software multiple times), or are no longer in use? On the other hand, could the population be higher, due to users getting copies of the clients from other sources than [Ka01]?

Another problem is that we do not know the degree to which the university's hosts are "typical." We also lack any traces of their internal peer-to-peer traffic, which, if frequent, would have major implications for the rate at which the worm could infect an entire remote site.

We are pursuing further work in this area. First, we are attempting with colleagues to develop graph-based models with which we can then extrapolate properties of the spread of the contagion based on different sets of assumptions about the hosts in our trace. Second, we have obtained traces of KaZaA traffic from another university (in another country), and will be analyzing these to determine the degree of overlap and cross-talk between the two universities, with which to then better estimate the total KaZaA population and its communication patterns. Finally, we are building a simulator for both active and contagion worms within various peer-to-peer topologies.

As a last comment, we have evidence that the KaZaA network may behave like a "scale-free" topology in terms of its interconnection. Figure 9 shows the distribution of the degree of the remote hosts in the trace, i.e., the number of distinct local hosts to which each remote host connected during November, 2001. The plot is shown as a log-log complementary distribution function: the x-axis shows log10 of the remote host's degree, and the y-axis shows log10 of the probability of observing a remote host with that outdegree or higher. (Due to the immense size of the dataset, we plot a subset rather than the entire dataset, randomly sampled with p = 0.01.)

9 It is actually worse than this. It turns out [Bd02, We02] that KaZaA already has a remote access backdoor installed! But for the purposes of our discussion here, we put aside this fact.
10 We note that some P2P networks are also amenable to constructing the hit-lists used for fast-spreading worms, but are suitable for contagion worms.
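The Figure 9 construction (subsample the hosts with p = 0.01, then compute the empirical P[X >= x] for each observed degree) can be sketched as follows. The synthetic heavy-tailed degrees below merely stand in for the real trace data:

```python
import random

def ccdf_points(degrees):
    """Empirical complementary CDF: for each distinct degree x, the
    fraction of hosts whose degree is >= x (Figure 9's y-axis)."""
    n = len(degrees)
    return [(x, sum(1 for d in degrees if d >= x) / n)
            for x in sorted(set(degrees))]

rng = random.Random(0)
# Synthetic stand-in for the trace: heavy-tailed degrees with alpha = 2.1.
population = [int(rng.paretovariate(2.1)) for _ in range(100_000)]
# Subsample with p = 0.01, as done for the Figure 9 plot.
sample = [d for d in population if rng.random() < 0.01]
points = ccdf_points(sample)
print(points[0][1])  # 1.0: every sampled host has at least the minimum degree
```

Plotting these (x, P[X >= x]) pairs on log-log axes and fitting a line then recovers the Pareto shape parameter as the negated slope.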
The last facet of worm design we examine concerns mechanisms by which the attacker can control and modify a worm after its dissemination. The ease and resiliency with which an attacker can do so has serious consequences for both how the threat of a deployed worm can evolve, and the potential difficulty in detecting the worm's presence and operation after the initial infection.

Some previous worms, such as the Goner mail worm [CE02], contained primitive remote control code, similar to many common "zombies", allowing the authors and others to issue commands to a distributed DOS module through an IRC [OR93] channel. (Indeed, the root backdoor installed by Code Red II also offered a form of unlimited remote control.) Other worms have attempted to download updates and payloads from web pages, such as W32/sonic [Sy00]. Both of these mechanisms, when employed, were quickly countered by removing the pages and tracking the channels. Similarly, previously seen DDOS tools such as Stacheldraht [Di99] have included both encrypted communication and update mechanisms for directly controlling the zombies.

Here we briefly explore a more sophisticated method, direct worm-to-worm communication and programmable updates, which, while not yet observed in the wild, is a natural evolution based on the previous updatable worms and DDOS tools.

If each worm node only knows about other nodes it has probed, infected, or been probed by, the average connectivity is still very high. With 1M hosts, using permutation scanning (with no halting), our simulator shows that the average degree of nodes in the worm network is 4 when 95% infection is achieved, and 5.5 when 99% infection is achieved. Additionally, each permutation-based rescan will add 2 to the degree of every worm instance, representing the copy discovered by each instance, and the copy which discovers each instance. Thus, after a couple of rescans, the connectivity becomes very high without requiring additional communication between the worm instances.

Such a network could be used to quickly pass updates to all running copies, without having a single point of communication like that seen in previous worms, increasing the staying power by preventing the communication channel from being disrupted or co-opted by others, while still allowing the author to control their creation in a difficult-to-track manner.

6.2 Programmatic Updates

The commands to a worm can of course be arbitrary code. Many operating systems already support convenient dynamic code loading, which could be readily employed by a worm's author. Another possibility has the bulk of the worm written in a flexible language combined with a small interpreter. By making the worm's commands be general modules, a huge increase in flexibility would be achieved.

Of particular interest are new attack modules and seeds for new worms. If the author discovers a new security hole and creates a new attack module, this could be released into the worm network. Even if only a few thousand copies of the worm remain, this is enough of an installed base for a hit-list-like effect to occur upon introduction of a new attack module, quickly spreading the worm back through the network.

It is an interesting question whether it is possible for a worm author to release such a worm with the cryptographic modules correctly implemented. From experience, if the worm author attempts to build their own cryptographic implementation, this could well suffer from a significant weakness that could be exploited for countering the worm. Yet there are a number of strong cryptographic applications and libraries that could be used by a worm author to provide the cryptographic framework, a good example being OpenSSL [Op01], which includes an encrypted session layer, symmetric ciphers, hash functions, and public key ciphers and signatures to provide for code signing.

7 Envisioning a Cyber "Center for Disease Control"

• Identifying outbreaks.

• Rapidly analyzing pathogens.

• Fighting infections.

• Anticipating new vectors.

• Proactively devising detectors for new vectors.

• Resisting future threats.

In the remainder of this section, we discuss each of these in turn, with our aim being not to comprehensively examine each role, but to spur further discussion within the community.

7.1 Identifying outbreaks

As discussed earlier in this paper, to date Internet-scale worms have been identified primarily via informal email discussion on a few key mailing lists. This process takes hours at a minimum, too slow for even the "slower" of the rapidly-propagating worms, much less the very fast worms developed in Section 4. The use of mailing lists for identification also raises the possibility of an attacker targeting the mailing lists for denial-of-service in conjunction with their main attack, which could greatly delay identification and a coordinated response. Present institutions for analyzing malicious code events are not able to produce a meaningful response before a fast active worm reaches saturation.
CDC Task: analyze the threat potential of new applications. How widely spread might their use become? How homogeneous are the clients and servers? What are likely exploit strategies for subverting the implementations? What are the application's native communication patterns?

7.5 Proactively devising detectors

Once a new potential disease vector has been identified, we would then want to deploy analyzers that understand how the protocol functions, to have some hope of detecting contagion worms as they propagate.

For example, to our knowledge there is no KaZaA module (one specific to how KaZaA functions) available for network intrusion detection systems in use today. Without such a module, it would be exceedingly difficult to detect when KaZaA is being exploited to propagate a contagion worm.

CDC Task: foster the development of application analysis modules suitable for integration with the intrusion detection systems in use by the CDC's outbreak-identification elements.

7.6 Resisting future threats

Devising the means to live with an Internet periodically ravaged by flash or contagion worms is at best an uneasy equilibrium. The longer-term requirement is to shift the makeup of Internet applications such that they become much less amenable to abuse. For example, this may entail broader notions of sandboxing, type safety, and inherent limitations on the rate of creating connections and the volume of traffic transmitted over them.

A final basic issue regarding the CDC is to what degree it should operate in an open fashion. For example, during an outbreak the CDC could maintain a web site for use by the research community. Such an approach would allow many different people to contribute to the analysis of the outbreak and of the pathogen, perhaps adding invaluable insight and empirical data. This sort of coordination happens informally today, in part; but it is also the case that currently a variety of anti-viral and security companies analyze outbreaks independently, essentially competing to come out with a complete analysis first. This makes for potentially very inefficient use of a scarce resource, namely the highly specialized skill of analyzing pathogens.

A key question then is the cost of operating in an open fashion. First, doing so brings with it its own set of security issues, regarding authenticating purported information uploaded into the analysis database, and preventing an attacker from crippling the analysis effort by launching a side-attack targeting the system. Second, the attacker could monitor the progress made in understanding the worm, and perhaps gain insight into how it has spread beyond what they could directly gather for themselves, allowing them to better hone their attack. Third, some sources of potentially highly valuable empirical data might refuse to make their data available if doing so is to release it to the public at large.

Given these concerns, it seems likely that the CDC would pursue a "partially open" approach, in which subsets of information are made publicly available, and publicly-attained information is integrated into the CDC's internal analysis, but the information flow is scrutinized in both directions. Unfortunately, such scrutiny would surely involve manual assessment, and could greatly slow the collection of vital information.

A related question is how international in scope such a facility should be. A national facility is likely to have a simpler mission and clearer management and accountability. However, there are real benefits to an international approach to this problem; one's allies are awake and working while one sleeps. A worm released in the middle of the night in the US would be far more likely to receive intense early research and attention in Europe or Asia than in the US itself. Thus, at a minimum, national-level CDCs are likely to need to maintain strong linkages with one another.

References

[CSA00] Neal Cardwell, Stefan Savage, and Thomas Anderson, "Modeling TCP Latency," Proc. INFOCOM, 2000.

[CDC02] Centers for Disease Control and Prevention, Jan. 2002. http://www.cdc.gov

[CE01] CERT, "Code Red II: Another Worm Exploiting Buffer Overflow In IIS Indexing Service DLL," Incident Note IN-2001-09, Aug. 6, 2001. http://www.cert.org/incident_notes/IN-2001-09.html