Sei sulla pagina 1di 6

The Sybil Attack

John R. Douceur
Microsoft Research
johndo@microsoft.com

“One can have, some claim, as many electronic personas as one has time and energy to create.”
– Judith S. Donath [12]
Abstract – Large-scale peer-to-peer systems face If the local entity has no direct physical
security threats from faulty or hostile remote knowledge of remote entities, it perceives them
computing elements. To resist these threats, many only as informational abstractions that we call
such systems employ redundancy. However, if a identities. The system must ensure that distinct
single faulty entity can present multiple identities, identities refer to distinct entities; otherwise, when
it can control a substantial fraction of the system, the local entity selects a subset of identities to
thereby undermining this redundancy. One redundantly perform a remote operation, it can be
approach to preventing these “Sybil attacks” is to duped into selecting a single remote entity
have a trusted agency certify identities. This multiple times, thereby defeating the redundancy.
paper shows that, without a logically centralized We term the forging of multiple identities a Sybil
authority, Sybil attacks are always possible except attack [30] on the system.
under extreme and unrealistic assumptions of It is tempting to envision a system in which
resource parity and coordination among entities. established identities vouch for other identities, so
that an entity can accept new identities by trusting
1. Introduction the collective assurance of multiple (presumably
independent) signatories, analogous to the PGP
We* argue that it is practically impossible, in web of trust [37] for human entities. However,
a distributed computing environment, for initially our results show that, in the absence of a trusted
unknown remote computing elements to present identification authority (or unrealistic assumptions
convincingly distinct identities. With no logically about the resources available to an attacker), a
central, trusted authority to vouch for a one-to-one Sybil attack can severely compromise the initial
correspondence between entity and identity, it is generation of identities, thereby undermining the
always possible for an unfamiliar entity to present chain of vouchers.
more than one identity, except under conditions Identification authorities can take various
that are not practically realizable for large-scale forms, not merely that of an explicit certification
distributed systems. agency such as VeriSign [33]. For example, the
Peer-to-peer systems commonly rely on the CFS cooperative storage system [8] identifies
existence of multiple, independent remote entities each node (in part) by a hash of its IP address.
to mitigate the threat of hostile peers. Many The SFS network file system [23] names remote
systems [3, 4, 8, 10, 17, 18, 29, 34, 36] replicate paths by appending a host identifier to a DNS
computational or storage tasks among several name. The EMBASSY [22] platform binds
remote sites to protect against integrity violations machines to cryptographic keys embedded in
(data loss). Others [5, 6, 7, 16, 28] fragment tasks device hardware. These approaches may thwart
among several remote sites to protect against Sybil attacks, but they implicitly rely on the
privacy violations (data leakage). In either case, authority of a trusted agency (such as ICANN [19]
exploiting the redundancy in the system requires or Wave Systems [35]) to establish identity.
the ability to determine whether two ostensibly In the following section, we define a model of
different remote entities are actually different. a distributed computing environment that lacks a
central authority. Building on this model, Section
* 3 proves a series of lemmas that severely limit the
Use of the plural pronoun is customary even in solely
authored research papers; however, given the subject of ability of an entity to determine identity. Section
the present paper, its use herein is particularly ironic. 4 surveys related work, and Section 5 concludes.

1
This model has two noteworthy qualities:
2. Formal model First, it is quite general. By leaving the internals
As a backdrop for our results, we construct a of the cloud unspecified, this model includes
formal model of a generic distributed computing virtually any interconnection topology of shared
environment. Our model definition implicitly segments, dedicated links, routers, switches, or
limits the obstructive power of corrupt entities, other components. Second, the environment in
thereby strengthening our negative results. The this model is very friendly. In particular, in the
universe, shown schematically in Fig. 1, includes: absence of resource constraints, denial-of-service
• A set E of infrastructural entities e attacks are not possible. A message from a
• A broadcast communication cloud correctly functioning entity is guaranteed to reach
• A pipe connecting each entity to the cloud all other correctly functioning entities.
Set E is partitioned into two disjoint subsets, We place a minimal restriction on the relative
C and F. Each entity c in subset C is correct, computational resources available to each entity,
abiding by the rules of any protocol we define. namely that there exists some security parameter n
Each entity f in subset F is faulty, capable of for which all entities can perform operations
performing any arbitrary behavior except as whose computational complexity is (low-order)
limited by explicit resource constraints. (The polynomial in n but for which no entity can
terms “correct” and “faulty” are standard in the perform operations that are superpolynomial in n.
domain of Byzantine fault tolerance [21], even This restriction allows entities to use public-key
though terms such as “honest” and “deceptive” cryptography [24] to establish virtual point-to-
might be more appropriate.) point communication paths that are private and
Entities communicate by means of messages. authenticated. Although these virtual paths are as
A message is an uninterrupted, finite-length bit secure as point-to-point physical links, they come
string whose meaning is determined either by an to exist only when created by pairs of entities that
explicit protocol or by an implicit agreement have acknowledged each other. Our model
among a set of entities. An entity can send a excludes direct links between entities because a
message through its pipe, thereby broadcasting it physical link provides a form of centrally supplied
to all other entities. The message will be received identification of a distinct remote entity. Also, in
by all entities within a bounded interval of time. the real world, packets can be sniffed and spoofed,
Message delivery is guaranteed, but there is no so the base assumption of a broadcast medium
assurance that all entities will hear messages in (augmented by cryptography) is not unrealistic.
the same order. An identity is an abstract representation that
persists across multiple communication events.
Each entity e attempts to present an identity i to
entities
other entities in the system. (Without loss of
generality, we state our results with respect to a
specific local entity l that is assumed to be
correct.) If e successfully presents identity i to l,
pipes we say that l accepts identity i.
A straightforward form for an identity is a
communication secure hash of a public key. Under standard
cloud cryptographic assumptions, such an identifier is
unforgeable. Furthermore, since it can generate a
symmetric key for a communication session, it is
also persistent in a useful way.
Each correct entity c will attempt to present
one legitimate identity. Each faulty entity f may
attempt to present a legitimate identity and one or
more counterfeit identities. Ideally, the system
local entity should accept all legitimate identities but no
Fig. 1: Formal model of distributed environment counterfeit entities.

2
Lemma 1: If ρ is the ratio of the resources of a
3. Results faulty entity f to the resources of a minimally
This section presents four simple lemmas, capable entity, then f can present g = ρ distinct
with nearly trivial proofs, that collectively show identities to local entity l.
the impracticality of establishing distinct identities Proof: Define rM as the resources available to a
in a large-scale distributed system. minimally capable entity. By hypothesis, g
An entity has three potential sources of entities can present g identities to l; therefore, g rM
information about other entities: a trusted agency, resources are sufficient to present g identities.
itself, or other (untrusted) entities. In the absence Since ρ ≥ g, f has at least g rM resources available,
of a trusted authority, either an entity accepts only so it can present g identities to l.
identities that it has directly validated (by some
means) or it also accepts identities vouched for by Lemma 1 states a lower bound on the damage
other identities it has already accepted. achievable by a faulty entity. To show how this
For direct validation, we show: can be enforced as an upper bound, we present
• Even when severely resource constrained, three mechanisms that can (at least theoretically)
a faulty entity can counterfeit a constant exploit limitations in three different resources:
number of multiple identities. communication, storage, and computation.
• Each correct entity must simultaneously If communication resources are restricted,
validate all the identities it is presented; local entity l can broadcast a request for identities
otherwise, a faulty entity can counterfeit and then only accept replies that occur within a
an unbounded number of identities. given time interval.
Large-scale distributed systems are inevitably If storage resources are restricted, entity l can
heterogeneous, leading to resource disparities that challenge each identity to store a large amount of
exacerbate the former result. The latter result unique, uncompressible data. By keeping small
presents a direct impediment to scalability. excerpts of this data, entity l can verify, with
For indirect validation, in which an entity arbitrarily high probability, that all identities
accepts identities that are vouched for by already simultaneously store the data they were sent.
accepted identities, we show: If computation resources are restricted, entity
• A sufficiently large set of faulty entities l can challenge each identity to solve a unique
can counterfeit an unbounded number of computational puzzle. For example, the local
identities. entity can generate a large random value y and
challenge the identity to find, within a limited
• All entities in the system must perform
time, a pair of values x, z such that the
their identity validations concurrently;
concatenation x | y | z, when run through a secure
otherwise, a faulty entity can counterfeit a
hash function, yields a value whose least
constant number of multiple identities.
significant n bits are all zero:
Since the number of faulty entities in the system is
likely to grow as the system size increases, the given y, find x, z s.t. LSBn(hash(x | y | z)) = 0
former result places another limit on system scale.
The time to solve* such a puzzle is proportional to
The latter restriction becomes harder to satisfy as
2n–1. The time to verify the result is constant.
system size increases.
(The reason for allowing the challenged entity to
find a prefix x and a suffix z, rather than merely
3.1. Direct identity validation one or the other, will become clear in Section 3.2.)
The only direct means by which two entities
can convince a third entity that they are distinct is *
by performing some task that a single entity could measured in count of hash function evaluations. For a
random oracle [2] hash function, the only way to find a
not. If we assume that the resources of any two
solution is to iterate through candidate values of x
entities differ by at most a constant factor, a local and/or z; compute the hash for each x | y | z triple; and
entity can demand proof of a remote entity’s test the result. Actual implementation requires a hash
resources before accepting its identity. However, function that is both preimage-resistant and resistant to
this leaves us with the following limitation: non-brute-force attacks such as chaining attacks [24].

3
To be effective, these resource challenges Lemma 3: If local entity l accepts any identity
must be issued to all identities simultaneously: vouched for by q accepted identities, then a set F
of faulty entities can present an arbitrarily large
Lemma 2: If local entity l accepts entities that are
number of distinct identities to l if either |F| ≥ q or
not validated simultaneously, then a single faulty
the collective resources available to F at least
entity f can present an arbitrarily large number of
equal those of q + |F| minimally capable entities.
distinct identities to entity l.
Proof: Define rF as the total resources available
Proof: Faulty entity f presents an arbitrarily long
to set F, rk as the resources available to each
succession of distinct identities to l. The
faulty entity fk, and rM as the resources available to
resources required for each presentation are used
a minimally capable entity. Then:
and then freed for the subsequent presentation.
rF rk  rk 
Lemma 2 is insurmountable for intrinsically q+ F ≤
rM
= ∑r
f k ∈F M
< ∑ r
f k ∈F  M 
+ F
temporal resources, such as computation speed
and communication bandwidth. However, since By Lemma 1, entity fk can present rk / rM ≥ 1
storage is not inherently time-based, entity l can identities to l, so F can present q identities to l.
indefinitely extend the challenge duration by Thereafter, all of F’s identities vouch for an
periodically demanding to see newly specified arbitrarily large number of counterfeit identities,
excerpts of the stored data. If an accepted identity all of which will be accepted by l.
ever fails to meet a new challenge, the local entity
can discard it from its acceptance list, thereby As in the case of direct identity validation,
eventually catching a Sybil attack that it might indirect identity validation also has a concurrency
have initially missed. A major practical problem requirement. In particular, all entities must
with this extension is that (by assumption) the perform their resource challenges concurrently:
challenge consumes the majority of an entity’s Lemma 4: If the correct entities in set C do not
storage resources, so extending the challenge coordinate time intervals during which they accept
duration greatly impedes the ability of the entity identities, and if local entity l accepts any identity
to perform other work. (However, the challenge vouched for by q accepted identities, then even a
data itself could be valuable data, compressed and minimally capable faulty entity f can present
encrypted by the local entity before sending it to g =  |C| / q  distinct identities to l.
the remote entities, using a different key for each
remote entity to maintain challenge uniqueness.) Proof: Define rM as the resources required to
present one identity. By assumption, entity f has
3.2. Indirect identity validation rM resources available. Partition set C into g
disjoint subsets Ck of minimum cardinality q.
As described in the introduction, the reason Faulty entity f presents identity ik to each entity in
for establishing the distinctness of identities is to Ck, using rM resources during time interval Tk.
allow the local entity to employ redundancy in Since Tk need not overlap with Tk′ for k ≠ k′, rM
operations it delegates to remote entities. One resources are available during interval Tk′ to
such operation it could conceivably delegate is the present identity ik′ ≠ ik to entities in set Ck′. At
validation of other identities. Thus, in addition to least q entities in each set Ck will vouch for
accepting identities that it has directly validated distinct identity ik, so l will accept all g identities.
using one of the challenge mechanisms described
above, an entity might also accept identities that Lemma 4 shows the need for multiple entities
have been validated by a sufficient count of other to issue challenges concurrently. Whether it is
identities that it has already accepted. possible for a correct entity to satisfy multiple
If an entity that has presented identity i1 concurrent challenges depends upon the resource:
claims to have accepted another entity’s identity In our formal model, all communication is
i2, we say that i1 vouches for i2. An obvious broadcast, so an entity can simultaneously reply to
danger of accepting indirectly validated identities communication challenges from arbitrarily many
is that a group of faulty entities can vouch for entities. (However, this is rather less practical in
counterfeit identities: actual networks than in our abstract model.)

4
It may not be possible to satisfy multiple
concurrent storage challenges, and there are 5. Summary and conclusions
information-theoretic reasons for believing that it Peer-to-peer systems often rely on redundancy
is impossible, since every bit of data stored for to diminish their dependence on potentially hostile
one challenger consumes one bit of storage space peers. If distinct identities for remote entities are
that is thus unavailable to serve another challenger not established either by an explicit certification
(and the data from all challengers is, of necessity, authority (as in Farsite [3]) or by an implicit one
incompressible). This may prevent storage (as in CFS [8]), these systems are susceptible to
challenges from being used for indirect validation. Sybil attacks, in which a small number of entities
For computation challenges, it is possible for counterfeit multiple identities so as to compromise
an entity to solve multiple puzzles simultaneously a disproportionate share of the system.
by combining them. If an entity receives m Systems that rely upon implicit certification
puzzles y1, y2, … ym, it can find a w such that: should be acutely mindful of this reliance, since
LSBn(hash(0 | y1 | y2 | … ym | w)) = 0 apparently unrelated changes to the relied-upon
mechanism can undermine the security of the
Then, the solution to each puzzle yk is: system. For example, the proposed IPv6 privacy
xk = 0 | y1 | y2 | … yk–1 and zk = yk+1 | … ym | w extensions [26] obviate much of the central
allocation of IP addresses assumed by CFS.
An obvious danger here is that if a validating In the absence of an identification authority, a
entity issues challenges to multiple identities that local entity’s ability to discriminate among
have been counterfeited by a single faulty entity, distinct remote entities depends on the assumption
the faulty entity could combine the challenges and that an attacker’s resources are limited. Entities
solve them together. However, the challenger can can thus issue resource-demanding challenges to
identify this attempted Sybil attack by checking validate identities, and entities can collectively
whether x1 | y1 | z1 = x2 | y2 | z2 for any two pool the identities they have separately validated.
solutions from putatively different identities. This approach entails the following conditions:
Like Lemma 1, the result of Lemma 4 is that a • All entities operate under nearly identical
faulty entity can amplify its influence. A system resource constraints.
that can tolerate a fraction φ of all identities being • All presented identities are validated
faulty can tolerate only φ/g of all entities being simultaneously by all entities, coordinated
faulty. In some systems, this may be acceptable. across the system.
• When accepting identities that are not
4. Related work directly validated, the required number of
vouchers exceeds the number of system-
Most prior research on electronic identities wide failures.
has focused on persistence and unforgeability [14, We claim that in a large-scale distributed
15, 27, 31], rather than on distinctness. system, these conditions are neither justifiable as
Computational puzzles are an old technique assumptions nor practically realizable as system
[25] that has become popular recently for resisting requirements.
denial-of-service attacks [1, 9, 20] by forcing the
attacker to perform more work than the victim. Acknowledgements
Dingledine et al. [11] suggest using puzzles to
provide a degree of accountability in peer-to-peer The author thanks Miguel Castro for issuing
systems, but this still allows a resourceful attacker the challenge that led to this paper, Jon Howell
to launch a substantial attack, especially if the and Sandro Forin for reviewing drafts, Dan Simon
potential for damage is disproportionate to the for sanity checking the combinable computational
fraction of the system that is compromised. puzzle described in Section 3.2, the anonymous
The issue of establishing on-line identities for IPTPS reviewers for their helpful suggestions for
humans has been studied for some time [12, 32], improving this presentation, and Brian Zill for
with solutions that generally depend on some suggesting the term “Sybil attack.”
direct interaction in the physical world [13, 37].

5
[18] J. H. Hartman, I. Murdock, T. Spalink, “The
References Swarm Scalable Storage System”, 19th ICDCS,
[1] T. Aura, P. Nikander, J. Leiwo, “DoS-Resistant 1999, pp. 74-81.
[19] ICANN, Internet Corporation for Assigned Names
Authentication with Client Puzzles”, Cambridge
and Numbers, 4676 Admiralty Way, Suite 330,
Security Protocols Workshop, Springer, 2000.
Marina del Rey, CA 90292-6601, www.icann.org.
[2] M. Bellare and P. Rogaway, “Random Oracles are
[20] A. Juels, J. Brainard, “Client Puzzles: A
Practical: A Paradigm for Designing Efficient
Protocols”, 1st Conference on Computer and Cryptographic Defense against Connection
Communications Security, ACM, 1993, pp. 62-73. Depletion Attacks”, NDSS ’99, ISOC, 1999, pp.
151-165.
[3] W. J. Bolosky, J. R. Douceur, D. Ely, M. Theimer,
[21] L. Lamport, R. Shostak, M. Pease, “The Byzantine
“Feasibility of a Serverless Distributed File
Generals Problem”, TPLS 4(3), 1982.
System Deployed on an Existing Set of Desktop
[22] K. R. Lefebvre, “The Added Value of EMBASSY
PCs”, SIGMETRICS 2000, 2000, pp. 34-43.
[4] M. Castro, B. Liskov, “Practical Byzantine Fault in the Digital World”, Wave Systems Corp. white
Tolerance”, 3rd OSDI, 1999. paper, www.wave.com, 2000.
[23] D. Mazières, M. Kaminsky, M. F. Kaashoek, E.
[5] D. Chaum, “Untraceable Electronic Mail, Return
Witchel, “Separating Key Management from File
Addresses, and Digital Pseudonyms”, CACM 4
System Security”, 17th SOSP, 1999, pp. 124-139.
(2), 1982.
[24] A. J. Menezes, P. C. van Oorschot, S. A.
[6] B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan,
“Private Information Retrieval”, 36th FOCS, 1995. Vanstone. Handbook of Applied Cryptography.
[7] I. Clarke, O. Sandberg, B. Wiley, T. Hong, CRC Press, 1997.
[25] R. C. Merkle, “Secure Communications over
“Freenet: A Distributed Anonymous Information
Insecure Channels”, CACM 21, 1978, pp. 294-299.
Storage and Retrieval System”, Design Issues in
[26] T. Narten, R. Draves, “Privacy Extensions for
Anonymity and Unobervability, ICSI, 2000.
[8] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, Stateless Address Autoconfiguration in IPv6”,
I. Stoica, “Wide-Area Cooperative Storage with RFC 3041, 2001.
[27] K. Ohta, T. Okamoto, “A Modification to the Fiat-
CFS”, 18th SOSP, 2001, pp. 202-215.
Shamir Scheme”, Crypto ’88, 1990, pp. 232-243.
[9] D. Dean, A. Stubblefield, “Using Client Puzzles to
[28] M. K. Reiter, A. D. Rubin, “Crowds: Anonymous
Protect TLS”, 10th USENIX Security Symp., 2001.
Web Transactions”, Transactions on Information
[10] R. Dingledine, M. Freedman, D. Molnar “The Free
Haven Project: Distributed Anonymous Storage System Security 1 (1), ACM, 1998.
Service”, Design Issues in Anonymity and [29] A. Rowstron, P. Druschel, “Storage Management
and Caching in PAST, a Large-Scale, Persistent
Unobservability, 2000.
Peer-to-Peer Storage Utility”, 18th SOSP, 2001,
[11] R. Dingledine, M. J. Freedman, D. Molnar
pp. 188-201.
“Accountability”, Peer-to-Peer: Harnessing the
[30] F. R. Schreiber, Sybil, Warner Books, 1973.
Power of Disruptive Technologies, O’Reilly, 2001.
[12] J. S. Donath, “Identity and Deception in the [31] A. Shamir, “An Efficient Identification Scheme
Virtual Community”, Communities in Cyberspace, Based on Permuted Kernels”, Crypto ’89, 1990,
pp. 606-609.
Routledge, 1998.
[32] S. Turkle, Life on the Screen: Identity in the Age of
[13] C. Ellison, “Establishing Identity Without
the Internet, Simon & Schuster, 1995.
Certification Authorities”, 6th USENIX Security
[33] VeriSign, Inc. 487 East Middlefield Road,
Symposium, 1996, pp. 67-76.
[14] U. Feige, A. Fiat, A. Shamir, “Zero-Knowledge Mountain View, CA 94043, www.verisign.com.
[34] M. Waldman, A. D. Rubin, L. F. Cranor, “Publius:
Proofs of Identity”, Journal of Cryptology 1 (2),
A Robust, Tamper-Evident Censorship-Resistant
1988, pp. 77-94.
Web Publishing System”, 9th USENIX Security
[15] A. Fiat, A. Shamir, “How to Prove Yourself:
Symposium, 2000, pp. 59-72.
Practical Solutions of Identification and Signature
Problems”, Crypto ’86, 1987, pp. 186-194. [35] Wave Systems Corp. 480 Pleasant Street, Lee, MA
[16] Y. Gertner, S. Goldwasser, T. Malkin, “A Random 01238, www.wave.com
[36] J. J. Wylie, M. W. Bigrigg, J. D. Strunk, G. R.
Server Model for Private Information Retrieval”,
Ganger, H. Kilite, P. K. Khosla, “Survivable
RANDOM ’98, 1998.
Information Storage Systems”, IEEE Computer 33
[17] A. Goldberg, P. Yianilos, “Towards an Archival
(8), IEEE, 2000, pp. 61-68.
Intermemory”, International Forum on Research
and Technology Advances in Digital Libraries, [37] P. Zimmerman, PGP User’s Guide, MIT, 1994.
IEEE, 1998, pp. 147-156.

Potrebbero piacerti anche