Daniel K. Blandford
Guy E. Blelloch
dkbl@cs.cmu.edu
blelloch@cs.cmu.edu
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract
2 Preliminaries
Processor model. In our data structures we assume that the processor word length is w bits, for some w > log |C|, where |C| is the total number of bits consumed by our data structure. That is, we assume that we can use a w-bit word to point to any memory we allocate.
We assume that the processor supports two special operations, bitSelect and bitRank, defined as follows. Given a bit string s of length w bits, bitSelect(s, i) returns the least position j such that there are i ones in the range s[0] ... s[j]. bitRank(s, j) returns the number of ones in the range s[0] ... s[j]. These operations mimic the function of the rank and select data structures of Jacobson [19].
If the processor does not support these operations, we can implement them using table lookup in O(1/ε) time using O(2^{εw} εw log(εw)) bits. By simulating a word size of O(log |C|) this can be reduced to less than |C|, and thus made a low-order term, while running in constant time. Note that it is always possible to simulate smaller words with larger words with constant overhead (see [1]).

(By a d-simplicial mesh we mean a pure simplicial complex of dimension d, which is a manifold, possibly with boundary [13].)
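For concreteness, the semantics of bitSelect and bitRank can be sketched as follows. This is a plain-Python sketch of what the operations compute, not the constant-time table-lookup implementation; we take position 0 to be the least-significant bit, which is an assumption about bit ordering:

```python
def bit_rank(s: int, j: int) -> int:
    """Number of ones in s[0] .. s[j], with position 0 the least-significant bit."""
    return bin(s & ((1 << (j + 1)) - 1)).count("1")

def bit_select(s: int, i: int) -> int:
    """Least position j such that s[0] .. s[j] contains i ones (-1 if no such j)."""
    count, j = 0, 0
    while s:
        if s & 1:
            count += 1
            if count == i:
                return j
        s >>= 1
        j += 1
    return -1
```

For example, bit_select(0b10100, 1) returns 2 (the position of the lowest one), and bit_rank(0b10100, 2) returns 1.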
Quotienting. For sets of fixed-length elements a space bound is already known [24]: to represent n elements, each of size |s| bits, requires O(n(|s| - log n)) bits. A method used to achieve this bound is quotienting: every element s ∈ U is uniquely hashed into two bit strings s', s'' such that s' is a (log n)-bit index into a hash bucket and s'' contains |s| - log n bits. Together, s' and s'' contain enough bits to describe s; however, to add s to the data structure, it is only necessary to store s'' in the bucket specified by s'. The idea of quotienting was first described by Knuth [21, Section 6.4, exercise 13] and has been used in several contexts [11, 8, 28, 15].
Gamma codes. The gamma code [14] is a variable-length prefix code that represents a positive integer v with ⌊log v⌋ zeroes, followed by the (⌊log v⌋ + 1)-bit binary representation of v, for a total of 2⌊log v⌋ + 1 bits.
Given a string s containing a gamma code (of length ≤ w) followed possibly by other information, it is possible to decode the gamma code in constant time. First, an algorithm uses bitSelect(s, 1) to find the location j of the first one in s. The length of the gamma code is 2j + 1, so the algorithm uses shifts to extract the first 2j + 1 bits of s. A gamma code for d is equivalent to a binary code for d with some leading zeroes; thus decoding d is equivalent to reinterpreting it as an integer.
If the integer d to be encoded might be zero or negative, this can be handled by packing a sign bit with the gamma code for d. If the sign bit is a zero, then the gamma code is a code for d; otherwise, the gamma code is actually a code for 1 - d.
Gamma codes are only one of a wide class of variable-length codes. This paper makes use of gamma codes because they require very few operations to decode.
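The encoding, decoding, and sign-bit packing described above can be sketched as follows. Bit strings are modeled here as Python '0'/'1' strings for readability (an assumption of this sketch, not the packed-word representation the data structures use):

```python
def gamma_encode(v: int) -> str:
    """Gamma code of v >= 1: floor(log v) zeroes, then the binary representation of v."""
    b = bin(v)[2:]                      # the (floor(log v) + 1)-bit binary representation
    return "0" * (len(b) - 1) + b

def gamma_decode(s: str) -> tuple[int, str]:
    """Decode one gamma code from the front of s; return (value, remaining bits)."""
    j = s.index("1")                    # position of the first one (mimics bitSelect(s, 1))
    return int(s[j:2 * j + 1], 2), s[2 * j + 1:]

def signed_encode(d: int) -> str:
    """Sign bit packed with a gamma code: '0' prefix encodes d >= 1, '1' encodes 1 - d."""
    return "0" + gamma_encode(d) if d >= 1 else "1" + gamma_encode(1 - d)

def signed_decode(s: str) -> tuple[int, str]:
    sign, rest = s[0], s[1:]
    v, rest = gamma_decode(rest)
    return (v if sign == "0" else 1 - v), rest
```

For example, gamma_encode(5) is "00101", and a gamma code followed by other information decodes cleanly: gamma_decode("00101110") returns (5, "110").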
3 Arrays
4 Dictionaries
Using our variable-bit-length array structure we can implement space-efficient variable-bit-length dictionaries. In this section we describe dictionary structures that can store a set of bit strings s_1 ... s_n, with associated data t_1 ... t_n, for 1 ≤ |s_i| ≤ w + log n. (We can handle strings of length greater than w + log n by allocating memory separately and storing a w-bit pointer in our structure.) Our structures use space O(m) where m = Σ_i(max(|s_i| - log n, 1) + |t_i|).
We will first discuss a straightforward implementation based on chained hashing that permits O(1) expected query time and O(1) expected amortized update time. We will then present an implementation based on the dynamic version [12] of the FKS perfect hashing scheme [16] that improves the query time to O(1) worst-case time. Our structure uses quotienting, as described in Section 2.
For our quotienting scheme to work, we will need the number of hash buckets to be a power of two. We will let q be the number of bits quotiented, and assume there are 2^q hash buckets in the structure. As the number of entries grows or shrinks, we will resize the structure using a standard doubling or halving scheme so that 2^q ≈ n.
Hashing. For purposes of hashing it will be convenient to treat the bit strings s_i as integers. Accordingly we reinterpret, when necessary, each bit string as the binary representation of a number. To distinguish strings with different lengths we prepend a 1 to each s_i before interpreting it as a number. We denote this padded numerical representation of s_i by x_i.
We say a family H of hash functions onto 2^q elements is k-universal if for random h ∈ H, Pr(h(x_1) = h(x_2)) ≤ k/2^q [9], and is k-pairwise independent if for random h ∈ H, Pr(h(x_1) = y_1 ∧ h(x_2) = y_2) ≤ k/2^{2q} for any x_1 ≠ x_2 in the domain, and y_1, y_2 in the range.
We wish to construct hash functions h', h''. The function h' must be a hash function h' : {0,1}^{w+q+1} → {0,1}^q. The binary representation of h''(x_i) must contain q fewer bits than the binary representation of x_i. Finally, it must be possible to reconstruct x_i given h'(x_i) and h''(x_i).
Note that others, such as [21, 24, 27], have described
quotienting functions in the past. Previous authors,
however, were not concerned with variable length keys,
so their h" functions do not have the length properties
we need.
For clarity we break x_i into two words: a_i, containing the high-order bits of x_i, and b_i, containing the low-order q bits. The hash functions we use are:

a_i = x_i div 2^q
b_i = x_i mod 2^q
h'(x_i) = h0(a_i) ⊕ b_i
h''(x_i) = a_i

where h0 is any 2-pairwise independent hash function with range 2^q. Given h'(x_i) and h''(x_i) we can reconstruct x_i, since b_i = h'(x_i) ⊕ h0(h''(x_i)).
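The quotienting scheme above can be sketched as follows. The particular h0 below is an illustrative multiplicative-style hash (a stand-in assumption, not the paper's specific choice), and pad forms x_i by prepending a 1 bit to s_i as described earlier:

```python
Q = 10                       # number of quotiented bits; the table has 2**Q buckets
MASK = (1 << Q) - 1

def h0(a: int) -> int:
    """Stand-in Q-bit hash of the high-order word (illustrative only)."""
    return ((a * 0x9E3779B97F4A7C15 + 12345) >> 17) & MASK

def pad(s: str) -> int:
    """x_i: the bit string s_i reinterpreted as a number, with a leading 1 prepended."""
    return int("1" + s, 2)

def quotient(x: int) -> tuple[int, int]:
    a, b = x >> Q, x & MASK              # high-order bits and low-order Q bits of x
    return h0(a) ^ b, a                  # (h'(x): bucket index, h''(x): stored bits)

def reconstruct(h1: int, h2: int) -> int:
    return (h2 << Q) | (h1 ^ h0(h2))     # low bits recovered as h'(x) xor h0(h''(x))
```

Storing only h''(x) in the bucket indexed by h'(x) saves Q bits per key, and reconstruct inverts the pair exactly.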
Dictionaries. Our dictionary data structure is a hash table consisting of a variable-bit-length array A and a hash function pair h', h''. To insert (s_i, t_i) into the structure, we compute s'_i and s''_i and insert s''_i and t_i into bucket s'_i.

It is necessary to handle the possibility that multiple strings hash to the same bucket. To handle this we prepend to each string s''_i or t_i a gamma code indicating its length. (This increases the length of the strings by at most a constant factor.) We concatenate together all the strings in a bucket and store the result in the appropriate array slot.

If the concatenation of all the strings in a bucket is of size greater than w, we allocate that memory separately and store a w-bit pointer in the array slot instead.

It takes O(1) time to decode any element in the bucket (since the gamma code for the length of an element can be read in constant time with a bitSelect function and shifts). Each bucket has expected size O(1) elements (since our hash function is universal), so lookups for any element can be accomplished in expected O(1) time, and insertions and deletions can be accomplished in expected amortized O(1) time.

The bit string stored for each s_i has size O(max(|s_i| - q, 1)); the bit string for t_i has size O(|t_i|). Our variable-bit-length array increases the space by at most a constant factor, so the total space used by our dictionary structure is O(m) for m = Σ_i(max(|s_i| - log n, 1) + |t_i|).

Perfect Hashing. We can also use our variable-bit-length arrays to implement a dynamized version of the FKS perfect hashing scheme. We use the same hash functions h', h'' as above, except that h' maps to {0,1}^{lg n + 1} rather than {0,1}^{lg n}. We maintain a variable-bit-length array of 2n buckets, and as before we store each pair (s''_i, t_i) in the bucket indicated by s'_i.

If multiple strings collide within a bucket, and their total length is w bits or less, then we store the concatenation of the strings in the bucket, as we did with chained hashing above. However, if the length is greater than w bits, we allocate a separate variable-bit-length array to store the elements. If the bucket contained k bits then the new array has about k^2 slots; we maintain the size and hash function of that array as described by Dietzfelbinger et al. [12].

In the primary array we store a w-bit pointer to the secondary array for that bucket. We charge the cost of this pointer, and the O(w)-bit overhead for the array and hash function, to the cost of the w bits that were stored in that bucket. The space bounds for our structure follow from the bounds proved in [12]: the structure allocates only O(n) array slots, and our structure requires only O(1) bits per unused slot. Thus the space requirement of our structure is dominated by the O(m) bits required to store the elements of the set.

Access to elements stored in secondary arrays takes worst-case constant time. Access to elements stored in the primary array is more problematic, as the potentially w bits stored in a bucket might contain O(w) strings, and to meet a worst-case bound it is necessary to find the correct string in constant time.

We can solve this problem using table lookup. The table needed would range over {0,1}^{εw} × {0,1}^{εw}, and would allow searching in a string a of gamma codes for a target string b. Each entry would contain the index in a of b, or the index of the last gamma code in a if b was not present. The total space used would be 2^{2εw} log(εw); the time needed for a query would be O(1/ε).

By selecting ε and w appropriately we can make the table require o(|C|) space. This gives us the following theorem:

Theorem 4.1 Our variable-bit-string dictionary representation can store bit strings of any size using O(m) bits, where m = Σ_i(max(|s_i| - log n, 1) + |t_i|), while allowing updates in O(1) amortized expected time and accesses in O(1) worst-case time.

5 Graphs

Using our variable-bit-length dictionary structure we can implement space-efficient representations of unlabeled graphs. We will begin by describing a general data structure for representing integer-labeled n-vertex graphs. We will then describe how this structure can be efficiently compressed by assigning labels appropriately.

Operations. We wish to support the following operations:

ADJACENT(u, v): true iff u and v are adjacent in G

FIRSTEDGE(v): return the first neighbor of v in G

NEXTEDGE(u, v): given a vertex u and neighbor v in G, return the next neighbor of u

ADDEDGE(u, v): add the edge (u, v) to G

DELETEEDGE(u, v): delete the edge (u, v) from G.

The query operations will take O(1) worst-case time, while the update operations (addEdge and deleteEdge) will take O(1) amortized expected time.

The adjacent operation allows us to support adjacency queries, while the combination of firstNeighbor and nextNeighbor allow us to support neighbor listing in O(1) time per neighbor. The interface can be
to say, rather than store an entry (u, v_i; v_{i-1}, v_{i+1}) in the dictionary, we instead store (u, v_i - u; v_{i-1} - u, v_{i+1} - u).
We use our variable-bit-length dictionary to store the entries. The encoding of u in each entry requires log |V| bits; the dictionary absorbs this cost using quotienting. The space used, then, is proportional to the cost of encoding v_i - u, v_{i-1} - u, and v_{i+1} - u, for each edge (u, v_i) in the dictionary. We compress these differences by representing them with gamma codes (with sign bits). The cost to encode each edge e with a logarithmic code is O(log |e|). Each edge appears O(1) times in the structure, so the total cost to encode all the edges is Σ_{e∈E} log |e|.

For a k-compact labeling, Σ_{e∈E} log |e| is O(kn).
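A runnable sketch of the graph operations listed above, with an ordinary Python dict standing in for the compact dictionary: each entry (u, v) stores (previous neighbor, next neighbor) of v in u's circular adjacency list, and (u, u) serves as a sentinel. The sentinel convention and the explicit update of the old first neighbor's previous pointer are assumptions made to keep the sketch self-contained:

```python
D = {}  # (u, v) -> (prev neighbor of v in u's list, next neighbor of v in u's list)

def add_vertex(u):
    D[(u, u)] = (u, u)                 # sentinel: empty circular neighbor list

def adjacent(u, v):
    return (u, v) in D

def first_edge(u):
    vn = D[(u, u)][1]
    return None if vn == u else vn

def next_edge(u, v):
    vn = D[(u, v)][1]
    return None if vn == u else vn

def _link_front(u, v):
    vp, vn = D[(u, u)]
    D[(u, u)] = (vp, v)                # sentinel's next is now v
    D[(u, v)] = (u, vn)                # v sits between the sentinel and the old first
    pp, nn = D[(u, vn)]
    D[(u, vn)] = (v, nn)               # old first neighbor's prev becomes v

def add_edge(u, v):
    _link_front(u, v)
    _link_front(v, u)

def _unlink(u, v):
    vp, vn = D.pop((u, v))
    D[(u, vp)] = (D[(u, vp)][0], vn)   # splice v out of the circular list
    D[(u, vn)] = (vp, D[(u, vn)][1])

def delete_edge(u, v):
    _unlink(u, v)
    _unlink(v, u)
```

Neighbor listing walks first_edge and then next_edge until it returns to the sentinel, giving O(1) time per neighbor as described.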
ADJACENT(u, v)
    return (LOOKUP((u, v)) ≠ null)

FIRSTEDGE(u)
    (v_p, v_n) ← LOOKUP((u, u))
    return v_n

NEXTEDGE(u, v)
    (v_p, v_n) ← LOOKUP((u, v))
    return v_n

ADDEDGE(u, v)
    (v_p, v_n) ← LOOKUP((u, u))
    INSERT((u, u), (v_p, v))
    INSERT((u, v), (u, v_n))

DELETEEDGE(u, v)
    (v_p, v_n) ← LOOKUP((u, v))
    (v_pp, v) ← LOOKUP((u, v_p))
    (v, v_nn) ← LOOKUP((u, v_n))
    INSERT((u, v_p), (v_pp, v_n))
    INSERT((u, v_n), (v_p, v_nn))
    DELETE((u, v))

Figure 1: Pseudocode to support graph operations.

6 Ordered Sets
Σ_{L_i ∈ L} |L_i| log(m/|L_i|) ≤ Σ_{i=0}^{∞} (|S|/a^i) log(a^i m/|S|) ∈ O(|S| log(m/|S|))
This represents the total log-difference when summed
across all "next" pointers. The same analysis bounds
similarly defined "previous" pointers. Together we call
these cross pointers.
We now account for each pointer in the red-black
tree against one of the cross pointers. First partition
the red-black tree into levels based on number of black
nodes in the path from the root to the node. This
gives a proper level covering with a = 2. Now for each
node i, the distance to each of its two children is at
most the distance to the previous or next element in its
level. Therefore we can account for the cost of the left
child against the previous pointer and the right child
against the next pointer. The sum of the log-differences of
7 Cardinal Trees

Lemma 7.2 gives:
E_c(T) ≤ Σ_{(s,t)} O(|t|)
       ≤ O(n) + 2 Σ_{v∈V} log(max(d(v), d(p(v))))
       ≤ O(n) + 2 Σ_{v∈V} (log(d(v)) + log(d(p(v))))
       ≤ O(n) + 4n + 2 Σ_{v∈V} log(d(p(v)))
labeling.
Lemma 7.2 For all tree-separator labelings of trees T = (V, E) of size n, Σ_{(u,v)∈E} log(|u - v|) ≤ O(n) + 2 Σ_{(u,v)∈E} log(max(d(u), d(v))).
≤ O(n) + 2 Σ_{v∈V} log(1 + c(p(v)))

8 Simplicial Meshes

Theorem 8.1 Our simplicial mesh representation using our variable-bit dictionary uses
FIND((a, b, c))
    return LOOKUP((a, b, c), T)

INSERT(S)
    foreach rotation (a, b, c, d) of S
        (e, 0) ← LOOKUP((a, b, c), T)
        INSERT((a, b, c), (d, e))

DELETE(S)
    foreach rotation (a, b, c, d) of S
        (d, e) ← LOOKUP((a, b, c), T)
        if e = 0 then DELETE((a, b, c), T)
        else INSERT((a, b, c), (e, 0))

Figure 2: Pseudocode to support simplicial mesh operations.
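A small executable sketch of the scheme in Figure 2, with a Python dict as the dictionary T. Each sorted triangular face maps to the pair of opposite vertices of the (at most two) tetrahedra sharing it; 0 marks a missing second tetrahedron, so vertex labels are assumed positive. Interpreting a "rotation" as a (face, opposite vertex) pair is our assumption for this sketch:

```python
T = {}  # sorted triangle (a, b, c) -> (d, e): opposite vertices; 0 = no second tet

def rotations(tet):
    """Yield (face, opposite_vertex) for each of the four faces of tetrahedron tet."""
    for i in range(4):
        yield tuple(sorted(tet[:i] + tet[i + 1:])), tet[i]

def find(face):
    return T.get(tuple(sorted(face)))

def insert_simplex(tet):
    for face, d in rotations(tet):
        prev = T.get(face)
        e = prev[0] if prev else 0       # remember the previously stored neighbor
        T[face] = (d, e)

def delete_simplex(tet):
    for face, d in rotations(tet):
        a, b = T[face]
        other = b if a == d else a       # keep the tetrahedron not being removed
        if other == 0:
            del T[face]
        else:
            T[face] = (other, 0)
```

Inserting two tetrahedra that share a face leaves that face mapped to both opposite vertices, so FIND answers face-adjacency queries directly.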
[7] G. E. Blelloch, B. Maggs, and M. Woo. Space-efficient finger search on degree-balanced search trees.
In SODA, pages 374-383, 2003.
[8] A. Brodnik and J. I. Munro. Membership in constant
time and almost-minimum space. SIAM Journal on
Computing, 28(5):1627-1640, 1999.
[9] L. Carter and M. Wegman. Universal classes of hash
functions. Journal of Computer and System Sciences,
pages 143-154, 1979.
[10] R. C.-N. Chuang, A. Garg, X. He, M.-Y. Kao, and H.-I.
Lu. Compact encodings of planar graphs via canonical
orderings and multiple parentheses. Lecture Notes in
Computer Science, 1443:118-129, 1998.
[11] J. G. Cleary. Compact hash tables using bidirectional
linear probing. IEEE Trans. Comput., C-33(9):828-834, 1984.
[12] M. Dietzfelbinger, A. R. Karlin, K. Mehlhorn, F. Meyer
auf der Heide, H. Rohnert, and R. E. Tarjan. Dynamic
perfect hashing: Upper and lower bounds. SIAM J.
Comput., 23(4):738-761, 1994.
[13] H. Edelsbrunner.
Geometry and Topology of Mesh
Generation. Cambridge Univ. Press, England, 2001.
[14] P. Elias. Universal codeword sets and representations
of the integers. IEEE Transactions on Information
Theory, IT-21(2):194-203, March 1975.
[15] D. Fotakis, R. Pagh, P. Sanders, and P. G. Spirakis.
Space efficient hash tables with worst case constant
access time. In STACS, 2003.
[16] M. L. Fredman, J. Komlós, and E. Szemerédi. Storing a
sparse table with O(1) worst case access time. JACM,
31(3):538-544, 1984.
[17] R. Grossi and J. S. Vitter. Compressed suffix arrays
and suffix trees with applications to text indexing and
string matching. In FOCS, pages 397-406, 2000.
[18] X. He, M.-Y. Kao, and H.-I. Lu. A fast general
methodology for information-theoretically optimal encodings of graphs. SIAM J. Computing, 30(3):838-846,
2000.
[19] G. Jacobson. Space-efficient static trees and graphs.
In 30th FOCS, pages 549-554, 1989.
[20] K. Keeler and J. Westbrook. Short encodings of planar
graphs and maps.
Discrete Applied Mathematics,
58:239-252, 1995.
[21] D. E. Knuth. The Art of Computer Programming,
Volume 3: Sorting and Searching. Addison-Wesley, 1973.
[22] G. L. Miller, S.-H. Teng, W. P. Thurston, and S. A.
Vavasis. Separators for sphere-packings and nearest
neighbor graphs. Journal of the ACM, 44:1-29, 1997.
[23] J. I. Munro and V. Raman. Succinct representation of
balanced parentheses, static trees and planar graphs.
In 38th FOCS, pages 118-126, 1997.
[24] R. Pagh. Low redundancy in static dictionaries with
O(1) worst case lookup. In ICALP, pages 595-604,
1999.
[25] R. Pagh. Low redundancy in static dictionaries with
constant query time. SIAM Journal on Computing,
31(2):353-363, 2001.
[26] R. Pagh and F. F. Rodler. Cuckoo hashing. In ESA,
2001.
[27] R. Raman, V. Raman, and S. S. Rao. Succinct
indexable dictionaries with applications to encoding
k-ary trees and multisets. In SODA, 2002.
[28] R. Raman and S. S. Rao. Succinct dynamic dictionaries and trees. In ICALP, pages 357-368, 2003.
[29] G. Turán. Succinct representations of graphs. Discrete
Applied Mathematics, 8:289-294, 1984.