
Dictionaries Using Variable-Length Keys and Data, with Applications*


Daniel K. Blandford
Guy E. Blelloch
dkbl@cs.cmu.edu
blelloch@cs.cmu.edu
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract

We consider the problem of maintaining a dynamic dictionary in which both the keys and the associated data are variable-length bit-strings. We present a dictionary structure based on hashing that supports constant time lookup and expected amortized constant time insertion and deletion. To store the key-data pairs $(s_1, t_1), \ldots, (s_n, t_n)$, our dictionary structure uses $O(m)$ bits where $m = \sum_i (\max(|s_i| - \log n, 1) + |t_i|)$ and $|s_i|$ is the length of bit string $s_i$. We assume a word length $w > \log m$.

We present several applications, including representations for semi-dynamic graphs, ordered sets for integers in a bounded range, cardinal trees with varying cardinality, and simplicial meshes of $k$ dimensions. These results either generalize or simplify previous results.

1 Introduction

There has been significant recent interest in data structures that use near optimal space while supporting fast access [19, 23, 10, 8, 24, 17, 27, 15, 4, 28]. In addition to theoretical interest, such structures have significant practical implications. In recent experimental work [5], for example, it was shown that a compact representation of graphs not only requires much less space than standard representations (e.g., adjacency lists), but in many cases is also faster, because less data has to be loaded into the cache.

The dictionary problem is to maintain an $n$-element set of keys $s_i$ with associated data $t_i$. A dictionary is dynamic if it supports insertion and deletion as well as the lookup operation. In this paper we are interested in dynamic dictionaries in which both the keys and data are variable-length bit-strings. Our main motivation is to use such dictionaries as building blocks for various other applications. We describe applications of our dictionary structure to graphs, cardinal trees with nodes of varying cardinality, ordered sets, and simplicial meshes. These applications either generalize or simplify previously known results. We assume the machine has a word length $w > \log |C|$, where $|C|$ is the number of bits used to represent the collection. We assume the size of each string satisfies $|s_i| \ge 1$ and $|t_i| \ge 1$ for all bit-strings $s_i$ and $t_i$.

For fixed-length keys the dictionary problem has been well studied. The information-theoretic lower bound for representing $n$ elements from a universe $U$ is $B = \lceil \log \binom{|U|}{n} \rceil = n(\log |U| - \log n) + O(n)$. Cleary [11] showed how to achieve $(1 + \epsilon)B + O(n)$ bits with $O(1/\epsilon^2)$ expected time for lookup and insertion while allowing satellite data. His structure used the technique of quotienting [21], which involves storing only part of each key in a hash bucket; the part not stored can be reconstructed using the index of the bucket containing the key. Brodnik and Munro [8] describe a static structure using $B + o(B)$ bits and requiring $O(1)$ time for lookup; the structure can be dynamized, increasing the space cost to $O(B)$ bits. That structure does not support satellite data. Pagh [25] showed a static dictionary using $B + o(B)$ bits and $O(1)$ query time that supported satellite data, using ideas similar to Cleary's, but that structure could not be easily dynamized.

Recently Raman and Rao [28] described a dynamic dictionary structure using $B + o(B)$ bits that supports lookup in $O(1)$ time and insertion and deletion in $O(1)$ expected amortized time. The structure allows attaching fixed-length ($|t|$-bit) satellite data to elements: in that case the space bound is $B + n|t| + o(B + n|t|)$ bits. None of this work considers variable-bit keys or data.

*This work was supported in part by the National Science Foundation as part of the Aladdin Center (www.aladdin.cmu.edu) under grants ACI-0086093, CCR-0085982, and CCR-0122581.

Our variable-bit dictionary structure can store pairs $(s_i, t_i)$ using $O(m)$ space where $m = \sum_i (\max(1, |s_i| - \log n) + |t_i|)$. Note that if every key has the same length $|s_i| = \log |U|$ and $|t_i|$ is zero, then $O(m)$ simplifies to $O(B)$. Our dictionaries support lookup in $O(1)$ time and insertion and deletion in $O(1)$ expected amortized time.
Our dictionary makes use of a simpler structure: an "array" structure that supports an array of $n$ locations $(1, \ldots, n)$ with lookup and update operations. We denote the $i$th element of an array $A$ as $a_i$. In our case each location will store a bit-string. We present a data structure that uses $O(m + w)$ space where $m = \sum_{i=1}^{n} |a_i|$ and $w$ is the machine word length. The structure supports lookups in $O(1)$ worst-case time and updates in $O(1)$ expected amortized time. Note that if all bit-strings were the same length then this would be trivial.
Applications. Using our dictionaries we present succinct dynamic representations for several other data structures. For graphs we support adjacency queries, listing the neighbors of a vertex, and deleting and inserting edges. Insertion and deletion run in $O(1)$ expected amortized time, adjacency queries require $O(1)$ worst case time, and listing neighbors requires $O(1)$ time per neighbor. Given an integer labeling of the vertices, the space required is $O(m + n)$ where $m = \sum_{(u,v) \in E} \log |u - v|$ and $n = |V|$. Any graph from a class satisfying an $n^{1-\epsilon}$ edge-separator theorem ($\epsilon > 0$) can be labeled so that $m < kn$ for some constant $k$, and hence can be coded in $O(n)$ bits. It is well known, for example, that the class of bounded-degree planar graphs satisfies an $n^{1/2}$ edge-separator theorem. For graphs with bounded degree this extends previous results [29, 20, 18, 4] by permitting insertion and deletion of edges. We say that the graph is partially dynamic since, although it allows dynamic insertions and deletions, the space bound relies on $m$ remaining small. As far as we know this is the first compact dynamic graph representation of any kind.
For ordered sets $S \subseteq \{0, \ldots, |U| - 1\}$ we support the same operations in the same bounds as recently reported [3], except that the updates are expected amortized time here and were worst case time bounds there. The structure we describe here, however, is simpler and quite different from the previous structure. It also allows for attaching a satellite bit string to each key.
For cardinal trees (aka tries) we support a tree in which each node can have a different cardinality. Queries can request the $k$th child, or the parent of any vertex. Again we can attach satellite bit-strings to each vertex. Updates can add or delete the $k$th child. For an integer labeled tree the space bound is $O(m)$ where $m = \sum_{v \in V} (\log c(p(v)) + \log |v - p(v)|)$, and $p(v)$ and $c(v)$ are the parent and cardinality of $v$, respectively. Using an appropriate labeling of the vertices, $m$ reduces to $\sum_{v \in V} \log c(p(v))$, which is asymptotically optimal. This generalizes previous results on cardinal trees [2, 27] to varying cardinality. We do not match the optimal constant in the first order term.
For $d$-simplicial meshes¹ we support insertion and deletion of simplices of dimension $d$, and returning the neighbors across all faces of dimension $d - 1$. For example, in a 3d tetrahedral mesh we can add and delete tetrahedrons, and ask for the neighboring tetrahedron across any of the four faces, if there is one. Given an integer labeling of the vertices, the space required is $O(m + n)$ where $m = \sum_{(a,b,c) \in F} (\log |a - b| + \log |a - c|)$, $n = |V|$, and $F$ is the set of faces in the 2-skeleton of the mesh. In the bounded degree case this reduces to $m = \sum_{(u,v) \in E} \log |u - v|$ where $E$ is the set of edges in the 1-skeleton of the mesh. As usual, updates take $O(1)$ amortized expected time and queries take $O(1)$ worst case time. We note that we have actually used a similar data structure to the one described here to implement triangulated and tetrahedral meshes [6]. The experiments show that the tetrahedral mesh allows fast access and updates and uses only about 7 bytes per tetrahedron (compared to 32 bytes per tetrahedron needed for the most compact traditional representation). In that paper we do not give any theoretical bounds on space.
2 Preliminaries

Processor model. In our data structures we assume that the processor word length is $w$ bits, for some $w > \log |C|$, where $|C|$ is the total number of bits consumed by our data structure. That is, we assume that we can use a $w$-bit word to point to any memory we allocate.
We assume that the processor supports two special operations, bitSelect and bitRank, defined as follows. Given a bit string $s$ of length $w$ bits, bitSelect(s, i) returns the least position $j$ such that there are $i$ ones in the range $s[0] \ldots s[j]$. bitRank(s, j) returns the number of ones in the range $s[0] \ldots s[j]$. These operations mimic the function of the rank and select data structures of Jacobson [19].
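As a reference point, the two operations can be written out in software. The Python sketch below assumes a bit string $s$ is held in a nonnegative integer with $s[0]$ as the least significant bit, and has bit_select return $-1$ when $s$ has fewer than $i$ ones; both conventions are our assumptions. The loop in bit_select takes $O(i)$ steps, not constant time, so the sketch only pins down the semantics that the processor (or the table lookup described next) would provide.

```python
def bit_rank(s: int, j: int) -> int:
    """Number of ones in s[0..j], inclusive (bit 0 = least significant bit)."""
    mask = (1 << (j + 1)) - 1
    return bin(s & mask).count("1")


def bit_select(s: int, i: int) -> int:
    """Least position j such that s[0..j] contains i ones (i >= 1); -1 if none."""
    if i < 1:
        raise ValueError("i must be at least 1")
    for _ in range(i - 1):
        s &= s - 1  # clear the lowest set bit
    if s == 0:
        return -1
    return (s & -s).bit_length() - 1  # index of the lowest remaining set bit


s = 0b10110  # ones at positions 1, 2 and 4
print(bit_rank(s, 3), bit_select(s, 2))  # prints "2 2"
```

Combining the two, bit_select(s, bit_rank(s, j)) yields the position of the last one at or before position $j$, which is the idiom used for the nearest-one searches in Section 3.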
If the processor does not support these operations, we can implement them using table lookup in $O(1/\epsilon)$ time using $O(2^{\epsilon w} \epsilon w \log(\epsilon w))$ bits. By simulating a word size of $O(\log |C|)$ this can be reduced to less than $|C|$, and thus made a low order term, while running in constant time. Note that it is always possible to simulate smaller words with larger words with constant overhead by packing multiple small words into a larger one.

¹By a $d$-simplicial mesh we mean a pure simplicial complex of dimension $d$, which is a manifold, possibly with boundary [13].


Memory allocation. Many of our structures do not explicitly support storage of bit strings longer than $w$ bits. To handle these strings we use a separate memory allocation system. This memory system must be capable of allocating or freeing $|s|$ bits of memory in $O(|s|/w)$ time, and may use $O(|s|)$ space to keep track of each allocation. It is well known how to do this (e.g., [1]).
Quotienting. For sets of fixed length elements a space bound is already known [24]: to represent $n$ elements, each of size $|s|$ bits, requires $O(n(|s| - \log n))$ bits. A method used to achieve this bound is quotienting: every element $s \in U$ is uniquely hashed into two bit strings $s', s''$ such that $s'$ is a $\log n$-bit index into a hash bucket and $s''$ contains $|s| - \log n$ bits. Together, $s'$ and $s''$ contain enough bits to describe $s$; however, to add $s$ to the data structure, it is only necessary to store $s''$ in the bucket specified by $s'$. The idea of quotienting was first described by Knuth [21, Section 6.4, exercise 13] and has been used in several contexts [11, 8, 28, 15].
Gamma codes. The gamma code [14] is a variable-length prefix code that represents a positive integer $v$ with $\lfloor \log v \rfloor$ zeroes, followed by the $(\lfloor \log v \rfloor + 1)$-bit binary representation of $v$, for a total of $2\lfloor \log v \rfloor + 1$ bits.
Given a string $s$ containing a gamma code (of length $\le w$) followed possibly by other information, it is possible to decode the gamma code in constant time. First, an algorithm uses bitSelect(s, 1) to find the location $j$ (counting from 0) of the first one in $s$. The length of the gamma code is $2j + 1$, so the algorithm uses shifts to extract the first $2j + 1$ bits of $s$. A gamma code for $d$ is equivalent to a binary code for $d$ with some leading zeroes; thus decoding $d$ is equivalent to reinterpreting it as an integer.
If the integer $d$ to be encoded might be zero or negative, this can be handled by packing a sign bit with the gamma code for $d$. If the sign bit is a zero, then the gamma code is a code for $d$; otherwise, the gamma code is actually a code for $1 - d$.
Gamma codes are only one of a wide class of variable-length codes. This paper makes use of gamma codes because they require very few operations to decode.
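The encoding and the constant-time decoding step can be sketched in Python. Here bit strings are Python str values of '0'/'1' characters for readability, s.index("1") stands in for bitSelect(s, 1), and the signed variant places the sign bit immediately after the gamma code. The text only says that the sign bit is "packed" with the code, so that placement (and all function names) are our assumptions.

```python
def gamma_encode(v: int) -> str:
    """floor(log v) zeroes followed by the binary representation of v (v >= 1)."""
    if v < 1:
        raise ValueError("gamma codes represent positive integers")
    binary = bin(v)[2:]  # floor(log v) + 1 bits, leading bit is 1
    return "0" * (len(binary) - 1) + binary


def gamma_decode(s: str):
    """Decode the gamma code at the front of s; return (value, code length)."""
    j = s.index("1")  # plays the role of bitSelect(s, 1)
    length = 2 * j + 1
    return int(s[:length], 2), length  # leading zeroes do not change the value


def signed_gamma_encode(d: int) -> str:
    """Sign bit 0: gamma code of d (d >= 1); sign bit 1: gamma code of 1 - d (d <= 0)."""
    if d >= 1:
        return gamma_encode(d) + "0"
    return gamma_encode(1 - d) + "1"


def signed_gamma_decode(s: str):
    """Decode a signed gamma code at the front of s; return (value, code length)."""
    v, n = gamma_decode(s)
    d = v if s[n] == "0" else 1 - v
    return d, n + 1


print(gamma_encode(5), signed_gamma_encode(-2))  # prints "00101 0111"
```

Because the decoders report the code length, a caller can walk through a concatenation of codes by slicing off each decoded prefix in turn.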

3 Arrays

The variable-bit-string array problem is to maintain bit strings $a_1, \ldots, a_n$, supporting update and lookup operations. Our array representation supports strings of size $1 \le |a_i| \le w$. Strings of size more than $w$ must be allocated separately, and $w$-bit pointers to them can be stored in our structure.


Our structure consists of two parts: a set of blocks $B$ and an index $I$. The bit-strings in the array are stored in the blocks. The index allows us to quickly locate the block containing a given array element.
Blocks. A block $B_i$ is an encoding of a series of bit strings (in increasing order) $a_i, a_{i+1}, \ldots, a_{i+k}$. The block stores the concatenation of the strings $b_i = a_i a_{i+1} \cdots a_{i+k}$, together with information from which the start location of each string can be found. It suffices to store a second bit string $b'_i$ such that $b'_i$ contains a 1 at position $j$ if and only if some bit string $a_k$ ends at position $j$ in $b_i$.
A block $B_i$ consists of the pair $(b_i, b'_i)$. We denote the size of a block by $|b_i| = \sum_{j=0}^{k} |a_{i+j}|$. We maintain the strings of our array in blocks of size at most $w$. We maintain the invariant that, if two blocks in our structure are adjacent (meaning, for some $i$, one block contains $a_i$ and the other contains $a_{i+1}$), then the sum of their sizes is greater than $w$.
Index structure. The index $I$ for our array structure consists of a bit array $A[1 \ldots n]$ and a hash table $H$. The array $A$ is maintained such that $A[i] = 1$ if and only if the string $a_i$ is the first string in some block $B_i$ in our structure. In that case, the hash table $H$ maps $i$ to $B_i$.
The hash table $H$ must use $O(w)$ bits (that is, $O(1)$ words) per block maintained in the hash table. It must support insertion and deletion in expected amortized $O(1)$ time, and lookup in worst-case $O(1)$ time. Cuckoo hashing [26] or the dynamic version of the FKS perfect hashing scheme [12] have these properties. If expected rather than worst-case lookup bounds are acceptable, then a standard implementation of chained hashing will work as well.
Operations. We begin by observing that no block can contain more than $w$ bit strings (since blocks have maximum size $w$ and each bit string has size at least one bit). Thus, from any position $A[k]$, the distance to the nearest one in either direction is at most $w$. To find the nearest one on the left, we let $s = A[k - w] \ldots A[k - 1]$ and compute bitSelect(s, bitRank(s, w - 1)). To find the nearest one on the right, we let $s = A[k + 1] \ldots A[k + w]$ and compute bitSelect(s, 1). These operations take constant time.
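To access a string $a_k$, our structure first searches $I$ for the block $B_i$ containing $a_k$. This is simply a search on $A$ for the nearest one at or to the left of position $k$ (if $A[k] = 1$, then $a_k$ begins its own block and $i = k$). The structure performs a hash table lookup to access the target block $B_i$. Once the block is located, the structure scans the index string $b'_i$ to find the location of $a_k$: bitSelect($b'_i$, $k - i + 1$) gives the position where $a_k$ ends, and, for $k > i$, bitSelect($b'_i$, $k - i$) + 1 gives the position where it starts.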
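A read-only sketch of this lookup path follows. It builds the blocks greedily, which satisfies the adjacency invariant, keeps $A$ as an integer bitmap and $H$ as a Python dict, and repeats the software bit_rank/bit_select so that it stands alone. The class name, the greedy construction, the 0-based positions and the default $w = 16$ are assumptions made for illustration; updates, splitting and merging are not shown. The window for the left search includes position $k$ itself, since $a_k$ may begin its own block.

```python
def bit_rank(s, j):
    """Number of ones in s[0..j] (bit 0 = least significant bit)."""
    return bin(s & ((1 << (j + 1)) - 1)).count("1")


def bit_select(s, i):
    """Least position j with i ones in s[0..j]; -1 if s has fewer than i ones."""
    for _ in range(i - 1):
        s &= s - 1
    return (s & -s).bit_length() - 1 if s else -1


class VarBitArraySketch:
    """Hypothetical read-only variable-bit-string array (0-based positions)."""

    def __init__(self, strings, w=16):
        assert all(1 <= len(a) <= w for a in strings)
        self.w = w
        self.A = 0   # bit k set iff a_k is the first string of its block
        self.H = {}  # first index i -> block (b_i, b'_i)
        i, n = 0, len(strings)
        while i < n:
            j, size = i, 0
            while j < n and size + len(strings[j]) <= w:  # greedy: fill up to w bits
                size += len(strings[j])
                j += 1
            b, ends, pos = "", 0, 0
            for a in strings[i:j]:
                b += a
                pos += len(a)
                ends |= 1 << (pos - 1)  # b'_i has a 1 where each string ends
            self.A |= 1 << i
            self.H[i] = (b, ends)
            i = j

    def block_start(self, k):
        """Nearest one in A at or left of k, found inside a window of at most w bits."""
        lo = max(0, k - self.w + 1)
        window = (self.A >> lo) & ((1 << (k - lo + 1)) - 1)
        return lo + bit_select(window, bit_rank(window, k - lo))

    def lookup(self, k):
        i = self.block_start(k)
        b, ends = self.H[i]   # one hash-table access
        r = k - i + 1         # a_k is the r-th string of block B_i
        end = bit_select(ends, r)
        start = bit_select(ends, r - 1) + 1 if r > 1 else 0
        return b[start:end + 1]


arr = VarBitArraySketch(["1", "01", "0000", "110", "1" * 10, "0" * 16, "101"])
print(sorted(arr.H), arr.lookup(2))  # prints "[0, 4, 5, 6] 0000"
```

The window of at most $w$ bits suffices because a block holds at most $w$ strings, so the start of the block containing $a_k$ is never more than $w - 1$ positions to the left of $k$.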
If $a_k$ is updated, its block $B_i$ is rewritten. If $B_i$ becomes smaller as a result of an update, it may need to be merged with its left neighbor or its right neighbor (or both). In either case this takes constant time.
If $B_i$ becomes too large as a result of an update to $a_k$, it is split into at most three blocks. The structure may create a new block at position $k$, at position $k + 1$, or (if the new $|a_k|$ is large) both. To maintain the size invariant, it may then be necessary to join $B_i$ with the block on its left, or to join the rightmost new block with the block on its right.
All of the operations on blocks and on $A$ take $O(1)$ time: shifting and copying can be done $w$ bits at a time. Access operations on $H$ take $O(1)$ worst-case time; updates take $O(1)$ expected amortized time.
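We define the total length of the bit-strings in the structure to be $m = \sum_{i=1}^{n} |a_i|$. The structure contains $n$ bits in $A$ plus $O(w)$ bits per block. By the invariant, any two adjacent blocks together hold more than $w$ bits, so there are $O(m/w + 1)$ blocks, and the total space usage is $O(m + w)$ (note that $n \le m$, since every string has at least one bit). This gives us the following theorem: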

Theorem 3.1 Our variable-bit-string array representation can store bit strings of length $1 \le |a_i| \le w$ in $O(w + \sum_{i=1}^{n} |a_i|)$ bits while allowing accesses in $O(1)$ worst-case time and updates in $O(1)$ amortized expected time.
The proof follows from the discussion above.
4 Dictionaries

Using our variable-bit-length array structure we can implement space-efficient variable-bit-length dictionaries. In this section we describe dictionary structures that can store a set of bit strings $s_1, \ldots, s_n$, with associated data strings $t_1, \ldots, t_n$, for $1 \le |s_i| \le w + \log n$. (We can handle strings of length greater than $w + \log n$ by allocating memory separately and storing a $w$-bit pointer in our structure.) Our structures use space $O(m)$ where $m = \sum_i (\max(|s_i| - \log n, 1) + |t_i|)$.

We will first discuss a straightforward implementation based on chained hashing that permits $O(1)$ expected query time and $O(1)$ expected amortized update time. We will then present an implementation based on the dynamic version [12] of the FKS perfect hashing scheme [16] that improves the query time to $O(1)$ worst-case time. Our structure uses quotienting, as described in Section 2.
For our quotienting scheme to work, we will need the number of hash buckets to be a power of two. We will let $q$ be the number of bits quotiented, and assume there are $2^q$ hash buckets in the structure. As the number of entries grows or shrinks, we will resize the structure using a standard doubling or halving scheme so that $2^q \approx n$.

Hashing. For purposes of hashing it will be convenient to treat the bit strings $s_i$ as integers. Accordingly we reinterpret, when necessary, each bit string as the binary representation of a number. To distinguish strings with different lengths we prepend a 1 to each $s_i$ before interpreting it as a number. We denote this padded numerical representation of $s_i$ by $x_i$.
We say a family $H$ of hash functions onto $2^q$ elements is $k$-universal if for random $h \in H$, $\Pr(h(x_1) = h(x_2)) \le k/2^q$ [9], and is $k$-pairwise independent if for random $h \in H$, $\Pr(h(x_1) = y_1 \wedge h(x_2) = y_2) \le k/2^{2q}$ for any $x_1 \ne x_2$ in the domain, and $y_1, y_2$ in the range.
We wish to construct hash functions $h', h''$. The function $h'$ must be a hash function $h' : \{0, 1\}^{w+q+1} \to \{0, 1\}^q$. The binary representation of $h''(x_i)$ must contain $q$ fewer bits than the binary representation of $x_i$. Finally, it must be possible to reconstruct $x_i$ given $h'(x_i)$ and $h''(x_i)$.
Note that others, such as [21, 24, 27], have described
quotienting functions in the past. Previous authors,
however, were not concerned with variable length keys,
so their h" functions do not have the length properties
we need.
For clarity we break $x_i$ into two words, one containing the low-order $q$ bits of $x_i$, the other containing the remaining high-order bits. The hash functions we use are:

$$\bar{x}_i = x_i \operatorname{div} 2^q, \qquad \underline{x}_i = x_i \bmod 2^q,$$
$$h''(x_i) = \bar{x}_i, \qquad h'(x_i) = h_0(\bar{x}_i) \oplus \underline{x}_i,$$

where $h_0$ is any 2-pairwise independent hash function with range $2^q$. For example, we can use:

$$h_0(x) = ((a x + b) \bmod p) \bmod 2^q,$$

where $p > 2^q$ is prime and $a, b$ are randomly chosen from $1, \ldots, p$. Given $h'$ and $h''$, these functions can be inverted in a straightforward manner:

$$\bar{x}_i = h''(x_i), \qquad \underline{x}_i = h_0(h''(x_i)) \oplus h'(x_i).$$

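We can show that the family from which $h'$ is drawn is 2-universal as follows. Given $x_1 \ne x_2$, we have

$$\Pr(h'(x_1) = h'(x_2)) = \Pr(h_0(\bar{x}_1) \oplus \underline{x}_1 = h_0(\bar{x}_2) \oplus \underline{x}_2) = \Pr(h_0(\bar{x}_1) \oplus h_0(\bar{x}_2) = \underline{x}_1 \oplus \underline{x}_2).$$

The probability is zero if $\bar{x}_1 = \bar{x}_2$, since then $\underline{x}_1 \ne \underline{x}_2$ and the left side is 0 while the right side is not. Otherwise, writing $c = \underline{x}_1 \oplus \underline{x}_2$, the event is the disjoint union over the $2^q$ values $y$ of the events $h_0(\bar{x}_1) = y \wedge h_0(\bar{x}_2) = y \oplus c$, each of probability at most $2/2^{2q}$ by the 2-pairwise independence of $h_0$. Thus $\Pr(h'(x_1) = h'(x_2)) \le 2^q \cdot 2/2^{2q} = 2/2^q$.

Note also that selecting a function from $H$ requires $O(\log n)$ random bits.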
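The quotienting hash functions and their inverse fit in a few lines of Python. The sketch takes $p = 2^{61} - 1$ (a Mersenne prime) and therefore assumes that the high-order part $\bar{x}_i$ of every key is below $p$, so that distinct high parts stay distinct modulo $p$; it also draws $a \in [1, p-1]$ and $b \in [0, p-1]$ as in standard Carter-Wegman hashing rather than both from $1, \ldots, p$. The names pad, unpad and QuotientHash are hypothetical.

```python
import random


def pad(bits: str) -> int:
    """x_i: prepend a 1 to the bit string s_i and read the result in binary."""
    return int("1" + bits, 2)


def unpad(x: int) -> str:
    """Inverse of pad: drop the '0b' prefix and the leading 1."""
    return bin(x)[3:]


class QuotientHash:
    """h'(x) = h0(x div 2^q) XOR (x mod 2^q),  h''(x) = x div 2^q."""

    def __init__(self, q: int, rng: random.Random, p: int = (1 << 61) - 1):
        self.q, self.p = q, p
        self.a = rng.randrange(1, p)  # assumption: a in [1, p-1]
        self.b = rng.randrange(0, p)  # assumption: b in [0, p-1]

    def h0(self, y: int) -> int:
        return ((self.a * y + self.b) % self.p) % (1 << self.q)

    def split(self, x: int):
        """Return (h'(x), h''(x)): a q-bit bucket index and the quotient."""
        high = x >> self.q             # x div 2^q
        low = x & ((1 << self.q) - 1)  # x mod 2^q
        return self.h0(high) ^ low, high

    def join(self, bucket: int, quotient: int) -> int:
        """Reconstruct x from h'(x) and h''(x)."""
        low = self.h0(quotient) ^ bucket
        return (quotient << self.q) | low


H = QuotientHash(q=4, rng=random.Random(7))
x = pad("110010111")
bucket, quotient = H.split(x)
print(unpad(H.join(bucket, quotient)))  # prints "110010111"
```

Only the quotient (here 6 of the 10 bits of $x$) would be stored, in the bucket named by the 4-bit value $h'(x)$; the XOR with $h_0(\bar{x})$ is what lets the bucket index and the quotient together determine $x$.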

Dictionaries. Our dictionary data structure is a hash table consisting of a variable-bit-length array $A$ and a hash function $h', h''$. To insert $(s_i, t_i)$ into the structure, we compute $s'_i$ and $s''_i$ and insert $s''_i$ and $t_i$ into bucket $s'_i$.

It is necessary to handle the possibility that multiple strings hash to the same bucket. To handle this we prepend to each string $s''_i$ or $t_i$ a gamma code indicating its length. (This increases the length of the strings by at most a constant factor.) We concatenate together all the strings in a bucket and store the result in the appropriate array slot. If the concatenation of all the strings in a bucket is of size greater than $w$, we allocate that memory separately and store a $w$-bit pointer in the array slot instead.

It takes $O(1)$ time to decode any element in the bucket (since the gamma code for the length of an element can be read in constant time with a bitSelect function and shifts). Each bucket has expected size $O(1)$ elements (since our hash function is universal), so lookups for any element can be accomplished in expected $O(1)$ time, and insertions and deletions can be accomplished in expected amortized $O(1)$ time.

The bit string stored for each $s_i$ has size $O(\max(|s_i| - q, 1))$; the bit string for $t_i$ has size $O(|t_i|)$. Our variable-bit-length array increases the space by at most a constant factor, so the total space used by our variable dictionary structure is $O(m)$ for $m = \sum_i (\max(|s_i| - \log n, 1) + |t_i|)$.

Perfect Hashing. We can also use our variable-bit-length arrays to implement a dynamized version of the FKS perfect hashing scheme. We use the same hash functions $h', h''$ as above, except that $h'$ maps to $\{0, 1\}^{\lg n + 1}$ rather than $\{0, 1\}^{\lg n}$. We maintain a variable-bit-length array of $2n$ buckets, and as before we store each pair $(s''_i, t_i)$ in the bucket indicated by $s'_i$.

If multiple strings collide within a bucket, and their total length is $w$ bits or less, then we store the concatenation of the strings in the bucket, as we did with chained hashing above. However, if the length is greater than $w$ bits, we allocate a separate variable-bit-length array to store the elements. If the bucket contained $k$ bits then the new array has about $k^2$ slots; we maintain the size and hash function of that array as described by Dietzfelbinger et al. [12].

In the primary array we store a $w$-bit pointer to the secondary array for that bucket. We charge the cost of this pointer, and the $O(w)$-bit overhead for the array and hash function, to the cost of the $w$ bits that were stored in that bucket. The space bounds for our structure follow from the bounds proved in [12]: the structure allocates only $O(n)$ array slots, and our structure requires only $O(1)$ bits per unused slot. Thus the space requirement of our structure is dominated by the $O(m)$ bits required to store the elements of the set.

Access to elements stored in secondary arrays takes worst-case constant time. Access to elements stored in the primary array is more problematic, as the potentially $w$ bits stored in a bucket might contain $O(w)$ strings, and to meet a worst-case bound it is necessary to find the correct string in constant time.

We can solve this problem using table lookup. The table needed would range over $\{0, 1\}^{\epsilon w} \times \{0, 1\}^{\epsilon w}$ and would allow searching in a string $a$ of gamma codes for a target string $b$. Each entry would contain the index in $a$ of $b$, or the index of the last gamma code in $a$ if $b$ was not present. The total space used would be $2^{2\epsilon w} \log(\epsilon w)$; the time needed for a query would be $O(1/\epsilon)$. By selecting $\epsilon$ and $w$ appropriately we can make the table require $o(|C|)$ space.

This gives us the following theorem:

Theorem 4.1 Our variable-bit-string dictionary representation can store bit strings of any size using $O(m)$ bits, where $m = \sum_i (\max(|s_i| - \log n, 1) + |t_i|)$, while allowing updates in $O(1)$ amortized expected time and accesses in $O(1)$ worst-case time.

5 Graphs

Using our variable-bit-length dictionary structure we can implement space-efficient representations of unlabeled graphs. We will begin by describing a general data structure for representing integer labeled $n$-vertex graphs. We will then describe how this structure can be efficiently compressed by assigning labels appropriately.

Operations. We wish to support the following operations:

ADJACENT(u, v): true iff u and v are adjacent in G
FIRSTEDGE(v): return the first neighbor of v in G
NEXTEDGE(u, v): given a vertex u and neighbor v in G, return the next neighbor of u
ADDEDGE(u, v): add the edge (u, v) to G
DELETEEDGE(u, v): delete the edge (u, v) from G.

The query operations will take $O(1)$ worst-case time, while the update operations (addEdge and deleteEdge) will take $O(1)$ amortized expected time. The adjacent operation allows us to support adjacency queries, while the combination of firstEdge and nextEdge allows us to support neighbor listing in $O(1)$ time per neighbor. The interface can be supported by using doubly linked adjacency lists along with a hash table using $O((|E| + |V|) \log |V|)$ bits. The hash table can be used for the adjacency and deleteEdge operations. We would like to improve on the space bounds.
Our structure can represent any graph, but it will give good compression results only on a certain class of graphs: those with $k$-compact labelings. Given an integer labeling for the vertices of a graph, we define the length $|e|$ of an edge $e = (u, v)$ to be the distance $|u - v|$ between its vertices. We say that a $k$-compact labeling for a graph is one for which $\sum_{e \in E} \log |e| < k|V|$. We define a graph to be $k$-compact if it has a $k$-compact labeling.
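The quantity in this definition is easy to evaluate for a given labeling. The Python sketch below (function names hypothetical) computes $\sum_{e \in E} \log |e|$ with base-2 logarithms for two labelings of an 8-vertex cycle. Labeling the vertices in cycle order gives seven edges of length 1, which contribute nothing, plus one wrap-around edge of length 7; the scattered labeling makes every edge long.

```python
import math


def log_edge_length_sum(edges):
    """Sum over edges (u, v) of log2|u - v|; assumes u != v for every edge."""
    return sum(math.log2(abs(u - v)) for u, v in edges)


def is_k_compact_labeling(edges, num_vertices, k):
    """True if the labeling satisfies sum log|e| < k|V|."""
    return log_edge_length_sum(edges) < k * num_vertices


def cycle_edges(order):
    """Edges of the cycle that visits the labels in the given order."""
    return [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]


in_order = cycle_edges([0, 1, 2, 3, 4, 5, 6, 7])
scattered = cycle_edges([0, 4, 1, 5, 2, 6, 3, 7])
print(round(log_edge_length_sum(in_order), 3))   # prints 2.807
print(round(log_edge_length_sum(scattered), 3))  # prints 15.562
```

With $|V| = 8$, the first labeling is 1-compact while the second is not, even though both describe the same graph.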
Blandford et al. showed that for any class of graphs satisfying an $O(n^{1-\epsilon})$-edge separator theorem, $\epsilon > 0$, all members are $O(1)$-compact [4]. This includes bounded-degree planar graphs, which satisfy an $O(n^{1/2})$-edge separator theorem, and certain well-shaped meshes [22] of fixed dimension. The labeling can be found using separator trees. Additionally, many graphs in practice have been found to be $k$-compact for much smaller $k$ than would be expected for random graphs [5] (e.g., web link graphs, VLSI circuits, and internet connectivity graphs).
Theorem 5.1 All $n$-vertex graphs with a $k$-compact labeling can be stored in $O(k|V|)$ bits while allowing updates in $O(1)$ amortized expected time and queries in $O(1)$ worst-case time.

Proof. We begin by describing our graph structure in an uncompressed form, and then describe how it is compressed.

Our structure represents a graph as a dictionary of edges. The edges incident on each vertex are cross linked into a doubly linked list. Consider a vertex $u$ and some ordering on its neighboring vertices $v_1, \ldots, v_d$. We represent each edge $(u, v_i)$, $1 \le i \le d$, using the dictionary entry $(u, v_i; v_{i-1}, v_{i+1})$. (That is, $(u, v_i)$ is the key, and $(v_{i-1}, v_{i+1})$ is the associated data.) We define $v_0 = v_{d+1} = u$, and for each vertex we include an entry $(u, u; v_d, v_1)$.

Given this representation we can support all of the above operations using functions of the dictionary. Pseudocode for these operations is shown in Figure 1. In its uncompressed form this dictionary consumes $d + 1$ entries for each vertex of degree $d$. The total number of entries is therefore $|V| + 2|E|$, since each edge appears in the lists of both of its endpoints. The space used is $O((|E| + |V|) \log |V|)$.
Compression. To compress this structure we make use of difference coding: we simply store each dictionary entry using differences with respect to $u$. That is to say, rather than store an entry $(u, v_i; v_{i-1}, v_{i+1})$ in the dictionary, we instead store $(u, v_i - u; v_{i-1} - u, v_{i+1} - u)$.

We use our variable-bit-length dictionary to store the entries. The encoding of $u$ in each entry requires $\log |V|$ bits; the dictionary absorbs this cost using quotienting. The space used, then, is proportional to the cost of encoding $v_i - u$, $v_{i-1} - u$, and $v_{i+1} - u$ for each edge $(u, v_i)$ in the dictionary. We compress these differences by representing them with gamma codes (with sign bits). The cost to encode each edge $e$ with a logarithmic code is $O(\log |e|)$. Each edge appears $O(1)$ times in the structure, so the total cost to encode all the edges is $O(\sum_{e \in E} \log |e|)$.

For a $k$-compact labeling, $\sum_{e \in E} \log |e|$ is $O(kn)$.
6 Ordered Sets

We would like to represent ordered sets $S$ of integers in the range $\{0, \ldots, m - 1\}$. In addition to lookup operations, an ordered set needs to efficiently support queries that depend on the order. Here we consider findNext and finger searching. findNext on a key $k_1$ finds $\min\{k_2 \in S \mid k_2 > k_1\}$; finger searching on a finger key $k_1 \in S$ and a key $k_2$ finds $\min\{k_3 \in S \mid k_3 > k_2\}$, and returns a finger to $k_3$. Finger searching takes $O(\log l)$ time, where $l = |\{k \in S \mid k_1 < k < k_2\}|$.
To represent the set we use a red-black tree on the elements. We will refer to vertices of the tree by the value of the element stored at the vertex, use $n$ to refer to the size of the set, and without loss of generality we assume $n < m/2$. For each element $v$ we denote the parent, left child, right child, and red-black flag as $p(v)$, $l(v)$, $r(v)$, and $q(v)$ respectively.
We represent the tree as a dictionary containing entries of the form $(v; l(v) - v, r(v) - v, q(v))$. (We could also add parent pointers $p(v) - v$ without violating the space bound, but in this case they are unnecessary.) It is straightforward to traverse the tree from top to bottom in the standard way. It is also straightforward to implement a rotation by inserting and deleting a constant number of dictionary elements. Assuming dictionary queries take $O(1)$ time, findNext can be implemented in $O(\log n)$ time. Using a hand data structure [7], finger searching can be implemented in $O(\log l)$ time with an additional $O(\log^2 n)$ space. Membership takes $O(1)$ time. Insertion and deletion take $O(\log n)$ expected amortized time. We call this data structure a dictionary red-black tree.
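To make the entry format concrete, the Python sketch below stores each node as an entry v -> (l(v) - v, r(v) - v, q(v)), using None for a missing child and a boolean for the red flag (both our assumptions), and performs a left rotation by rewriting only the entries of $x$, $r(x)$, and the parent of $x$. Because the entries hold no parent pointers, the caller passes the parent found while descending from the root. Recoloring, which red-black insertion performs separately, is not shown, and all names are hypothetical.

```python
def put_node(D, v, left, right, red):
    """Store node v as (l(v) - v, r(v) - v, q(v)); None marks a missing child."""
    D[v] = (None if left is None else left - v,
            None if right is None else right - v,
            red)


def get_node(D, v):
    """Return (l(v), r(v), q(v)) with the children as absolute values."""
    dl, dr, red = D[v]
    return (None if dl is None else v + dl,
            None if dr is None else v + dr,
            red)


def rotate_left(D, x, parent=None):
    """Left rotation at x; returns the new subtree root y = r(x)."""
    lx, y, red_x = get_node(D, x)
    ly, ry, red_y = get_node(D, y)
    put_node(D, x, lx, ly, red_x)  # x adopts y's left subtree
    put_node(D, y, x, ry, red_y)   # y takes x as its left child
    if parent is not None:         # redirect the parent's child pointer
        lp, rp, red_p = get_node(D, parent)
        if lp == x:
            put_node(D, parent, y, rp, red_p)
        else:
            put_node(D, parent, lp, y, red_p)
    return y


def inorder(D, v):
    if v is None:
        return []
    left, right, _ = get_node(D, v)
    return inorder(D, left) + [v] + inorder(D, right)


D = {}
put_node(D, 10, 5, 20, False)
put_node(D, 5, None, None, False)
put_node(D, 20, 15, 30, True)
put_node(D, 15, None, None, False)
put_node(D, 30, None, None, False)
root = rotate_left(D, 10)
print(root, inorder(D, root), D[10])  # prints "20 [5, 10, 15, 20, 30] (-5, 5, False)"
```

Each stored child pointer is a difference between nearby keys, which is the quantity that the space bound below charges for.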
It remains to show the space bound for the structure.

Lemma 6.1 If a set of integers $S \subseteq \{0, \ldots, m - 1\}$ of size $n$ is arranged in-order in a red-black tree $T$ then

$$\sum_{v \in T} \log |p(v) - v| \in O(n \log(m/n)).$$

ADJACENT(u, v)
    return (LOOKUP((u, v)) ≠ null)

FIRSTEDGE(u)
    (v_p, v_n) ← LOOKUP((u, u))
    return v_n

NEXTEDGE(u, v)
    (v_p, v_n) ← LOOKUP((u, v))
    return v_n

ADDEDGE(u, v)
    (v_p, v_n) ← LOOKUP((u, u))
    INSERT((u, u), (v_p, v))
    INSERT((u, v), (u, v_n))
    (x, v_nn) ← LOOKUP((u, v_n))
    INSERT((u, v_n), (v, v_nn))

DELETEEDGE(u, v)
    (v_p, v_n) ← LOOKUP((u, v))
    (v_pp, x) ← LOOKUP((u, v_p))
    INSERT((u, v_p), (v_pp, v_n))
    (y, v_nn) ← LOOKUP((u, v_n))
    INSERT((u, v_n), (v_p, v_nn))
    DELETE((u, v))

Figure 1: Pseudocode to support our graph operations.
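The graph operations can be exercised with a Python dict standing in for the compressed dictionary (no difference coding or quotienting). The sketch assumes an undirected graph, so add_edge and delete_edge update the lists of both endpoints, and it assumes add_edge is never called for an edge that is already present. Besides the entries named in the operation, it also updates the neighbor that previously followed (or preceded) the affected entry, so that every list stays doubly linked, including when the list is empty or has one element. adjacent is restricted to $u \ne v$ so that the sentinel entry $(u, u)$ is not reported as an edge. The class and method names are hypothetical.

```python
class LinkedEdgeDictionary:
    """Entry (u, v) -> (prev, next) in u's circular neighbor list; (u, u) is the sentinel."""

    def __init__(self, vertices):
        self.d = {(u, u): (u, u) for u in vertices}  # empty list: u points to itself

    def adjacent(self, u, v):
        return u != v and (u, v) in self.d

    def first_edge(self, u):
        return self.d[(u, u)][1]  # returns u itself when u has no neighbors

    def next_edge(self, u, v):
        return self.d[(u, v)][1]  # returns u after the last neighbor

    def _insert_half(self, u, v):
        vp, vn = self.d[(u, u)]
        self.d[(u, u)] = (vp, v)  # v becomes the first neighbor
        self.d[(u, v)] = (u, vn)
        _, vnn = self.d[(u, vn)]  # vn may be the sentinel u itself
        self.d[(u, vn)] = (v, vnn)

    def _delete_half(self, u, v):
        vp, vn = self.d[(u, v)]
        vpp, _ = self.d[(u, vp)]
        self.d[(u, vp)] = (vpp, vn)
        _, vnn = self.d[(u, vn)]  # re-read: vp and vn may be the same entry
        self.d[(u, vn)] = (vp, vnn)
        del self.d[(u, v)]

    def add_edge(self, u, v):
        self._insert_half(u, v)
        self._insert_half(v, u)

    def delete_edge(self, u, v):
        self._delete_half(u, v)
        self._delete_half(v, u)

    def neighbors(self, u):
        v = self.first_edge(u)
        while v != u:
            yield v
            v = self.next_edge(u, v)


g = LinkedEdgeDictionary(range(5))
for v in (1, 2, 3):
    g.add_edge(0, v)
g.delete_edge(0, 2)
print(list(g.neighbors(0)), g.adjacent(3, 0))  # prints "[3, 1] True"
```

Every operation touches a constant number of entries, so with the dictionary of Section 4 the updates inherit its $O(1)$ expected amortized bound and the queries its $O(1)$ worst-case bound.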

Proof. Consider the elements of a set $S \subseteq \{0, \ldots, m - 1\}$ organized in a set of levels $L(S) = \{L_1, \ldots, L_l\}$, $L_i \subseteq S$. If $\alpha |L_i| \le |L_{i+1}|$ for $1 \le i < l$ and some $\alpha > 1$, we say such an organization is a proper level covering of the set.

We first consider the sum of the log-differences of cross pointers within each level, and then count the pointers in the red-black tree against these pointers. For any set $S \subseteq \{0, \ldots, m - 1\}$ we define $\mathrm{next}(e, S) = \min\{e' \in S \cup \{m\} \mid e' > e\}$ and $M(S) = \sum_{j \in S} \log(\mathrm{next}(j, S) - j)$. Since logarithms are concave, the sum is maximized when the elements are evenly spaced, and hence $M(S) \le |S| \log(m/|S|)$. For any proper level covering $L$ of a set $S$ this gives:

$$\sum_{L_i \in L(S)} M(L_i) \le \sum_{L_i \in L(S)} |L_i| \log\frac{m}{|L_i|} \le \sum_{i=0}^{\infty} \frac{|S|}{\alpha^i} \log\frac{\alpha^i m}{|S|} = \frac{\alpha}{\alpha - 1} |S| \log\frac{m}{|S|} + \frac{\alpha \log \alpha}{(\alpha - 1)^2} |S| \in O(|S| \log(m/|S|)).$$

This represents the total log-difference when summed across all "next" pointers. The same analysis bounds similarly defined "previous" pointers. Together we call these cross pointers.

We now account for each pointer in the red-black tree against one of the cross pointers. First partition the red-black tree into levels based on the number of black nodes in the path from the root to the node. This gives a proper level covering with $\alpha = 2$. Now for each node $i$, the distance to each of its two children is at most the distance to the previous or next element in its level. Therefore we can account for the cost of the left child against the previous pointer and the right child against the next pointer. The sum of the log-differences of the child pointers is therefore at most the sum of the log-differences of the next and previous cross pointers. This gives the desired bound.

Theorem 6.1 A set of integers $S \subseteq \{0, \ldots, m - 1\}$ of size $n$ represented as a dictionary red-black tree and using a compressed dictionary uses $O(n \log((n + m)/n))$ bits and supports find-next queries in $O(\log n)$ time, finger-search queries in $O(\log l)$ time, and insertion and deletion in $O(\log n)$ expected amortized time.

Proof. (outline) Recall that the space for a compressed dictionary $D$ is bounded by $O(m')$ where $m' = \sum_{(s,t) \in D} (\max(1, |s| - \log |D|) + |t|)$. The keys use $\log |D|$ bits each, and the size of the data stored in the dictionary is bounded by Lemma 6.1. This gives the desired bounds.

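The concavity step in the proof of Lemma 6.1 can be checked numerically. The sketch below uses two hypothetical helpers, not taken from the paper: `gap_log_sum` computes $M(S) = \sum_{j \in S} \log(\mathrm{next}(j, S) - j)$ with $m$ as the sentinel, and `concavity_bound` computes $|S|\log(m/|S|)$. Base-2 logarithms are assumed. Because the gaps sum to at most $m$, Jensen's inequality gives the bound, with equality for evenly spaced sets.

```python
import math
import random


def gap_log_sum(S, m):
    """M(S): sum over j in S of log2(next(j, S) - j), with m acting as next() of the maximum."""
    xs = sorted(S)
    successors = xs[1:] + [m]
    return sum(math.log2(b - a) for a, b in zip(xs, successors))


def concavity_bound(S, m):
    """|S| * log2(m / |S|), the value reached when the elements are evenly spaced."""
    return len(S) * math.log2(m / len(S))


# Evenly spaced elements meet the bound exactly: 64 gaps of 16 give 64 * 4 bits.
even = range(0, 1024, 16)
print(gap_log_sum(even, 1024), concavity_bound(even, 1024))    # 256.0 256.0

# A random set stays below it.
random.seed(1)
S = random.sample(range(1 << 20), 1000)
print(gap_log_sum(S, 1 << 20) <= concavity_bound(S, 1 << 20))  # True
```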
7 Cardinal Trees

A cardinal tree (aka trie) is a rooted tree in which every node has c slots for children, any of which can be filled. We generalize the standard definition of cardinal trees to allow each node v to have a different c, denoted as c(v). For a node v we want to support returning the parent p(v) and the i-th child v[i], if any. We also want to support deleting or inserting a leaf node. As with graphs, we consider these semi-dynamic operations since the updates might require relabeling of the vertices to maintain the space bounds.
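As a rough illustration of how these operations map onto dictionaries, here is a Python sketch of the layout used in the proof of Lemma 7.1 below. Child entries are keyed on (p(v), i) and store p(v) - v, and a second dictionary maps v to p(v). Plain dicts stand in for the compressed dictionary. The `card` map holding c(v), and the convention that callers name a leaf by its parent and slot when deleting it, are assumptions of this sketch, not details from the paper.

```python
class CardinalTree:
    """Cardinal tree whose node v has c(v) child slots, stored in two dictionaries.

    child[(p, i)] = p - v   for the child v in slot i of p (keyed on parent and slot)
    parent[v]     = p(v)
    `card` (c(v) per node) is a hypothetical helper map; plain dicts stand in for
    the compressed variable-length dictionary.
    """

    def __init__(self, root, root_card):
        self.card = {root: root_card}
        self.child = {}
        self.parent = {}

    def get_child(self, p, i):
        """Return v[i] for node p, or None if slot i is empty."""
        diff = self.child.get((p, i))
        return None if diff is None else p - diff

    def get_parent(self, v):
        return self.parent.get(v)

    def insert_leaf(self, p, i, v, card):
        """Attach a new leaf v with c(v) = card in slot i of node p."""
        if not 0 <= i < self.card[p]:
            raise IndexError(f"node {p} has only {self.card[p]} slots")
        if (p, i) in self.child:
            raise KeyError(f"slot {i} of node {p} is already filled")
        self.child[(p, i)] = p - v
        self.parent[v] = p
        self.card[v] = card

    def delete_leaf(self, p, i):
        """Remove the leaf in slot i of p (the leaf check scans the c(v) slots of v)."""
        v = self.get_child(p, i)
        if v is None:
            raise KeyError(f"slot {i} of node {p} is empty")
        if any((v, j) in self.child for j in range(self.card[v])):
            raise ValueError(f"node {v} is not a leaf")
        del self.child[(p, i)]
        del self.parent[v]
        del self.card[v]
        return v


t = CardinalTree(root=0, root_card=3)
t.insert_leaf(0, 2, 5, card=2)
t.insert_leaf(5, 0, 7, card=0)
print(t.get_child(0, 2), t.get_parent(7))   # 5 5
```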
Lemma 7.1 Integer-labeled cardinal trees with $m = \sum_{v \in V}(\log c(p(v)) + \log|v - p(v)|)$ can be stored in $O(m)$ bits and support parent and child queries in $O(1)$ time and insertion and deletion of leaves in $O(1)$ expected amortized time.

Proof. (outline) For child queries we can just store a dictionary entry for each vertex v that is keyed on (p(v), i) and stores p(v) - v as the data. In the cost for dictionaries given by $m = \sum_{(s,t) \in D}(|t| + \max(1, |s| - \log|D|))$, the p(v) can be counted against the $\log|D|$, the i against the $\log c(p(v))$, and the p(v) - v against the $\log|v - p(v)|$. Parent queries can be supported by a dictionary from v to p(v).

Any tree T can be separated into a set of trees of size at most n/2 by removing a single node. Recursively applying such a separator on the cardinal tree defines a separator tree $T_s$ over the nodes. An integer labeling can then be given to the nodes of T based on the inorder traversal of $T_s$. We call such a labeling a tree-separator labeling.

Lemma 7.2 For all tree-separator labelings of trees $T = (V, E)$ of size $n$,

$$\sum_{(u,v) \in E} \log|u - v| \;\le\; O(n) + 2\sum_{(u,v) \in E} \log\max(d(u), d(v)).$$

Proof. Consider the separator tree $T_s = (V, E_s)$ on which the labeling is based. For each node v we denote the degree of v by d(v). We let $T_s(v)$ denote the subtree of $T_s$ that is rooted at v. Thus $|T_s(v)|$ is the size of the piece of T for which v was chosen as a separator.
There is a one-to-one correspondence between the edges $E$ and the edges $E_s$. In particular, consider an edge $(v, v') \in E_s$ between a vertex v and a child v'. This corresponds to an edge $(v, v'') \in E$ such that $v'' \in T_s(v')$. We need to account for the log-difference $\log|v - v''|$. We have $|v - v''| < |T_s(v)|$ since all labels in any subtree are given sequentially. We partition the edges into two classes and calculate the cost for edges in each class.
First, if $d(v) > \sqrt{|T_s(v)|}$, we have for each edge $(v, v'')$ that $\log|v - v''| < \log|T_s(v)| < 2\log d(v) \le 2\log\max(d(v), d(v''))$.
Second, if $d(v) \le \sqrt{|T_s(v)|}$, we charge each edge $(v, v'')$ to the node v. The most that can be charged to a node is $\sqrt{|T_s(v)|}\,\log|T_s(v)|$ (one pointer to each child), which is $O(|T_s(v)|^{\epsilon})$ for any $\epsilon$ with $1/2 < \epsilon < 1$. Note that for any tree in which for every node v (A) $|T_s(v)| \le \frac{1}{2}|T_s(p(v))|$, where p(v) here denotes the parent of v in $T_s$, and (B) $\mathrm{cost}(v) \in O(|T_s(v)|^{\epsilon})$ for some $\epsilon < 1$, we have $\sum_{v \in V} \mathrm{cost}(v) \in O(n)$. Therefore the total charge is $O(n)$.
Summing the two classes of edges gives $O(|T|) + 2\sum_{(u,v) \in E} \log\max(d(u), d(v))$.

Theorem 7.1 Cardinal trees with a tree-separator labeling, with $m = \sum_{v \in V}(1 + \log(1 + c(p(v))))$, can be stored in $O(m)$ bits.

Proof. We are interested in the edge cost $E_c(T) = \sum_{v} \log|v - p(v)|$. Substituting p(v) for u in Lemma 7.2 gives:

$$\begin{aligned}
E_c(T) &\le O(n) + 2\sum_{v \in V} \log\max(d(v), d(p(v))) \\
&\le O(n) + 2\sum_{v \in V} (\log d(v) + \log d(p(v))) \\
&\le O(n) + 4n + 2\sum_{v \in V} \log d(p(v)) \\
&\le O(n) + 2\sum_{v \in V} \log(1 + c(p(v))).
\end{aligned}$$

The third step uses $\log d(v) \le d(v)$ and $\sum_{v} d(v) < 2n$; the last uses $d(p(v)) \le 1 + c(p(v))$, since a node has at most $c$ children and one parent. With Lemma 7.1 this gives the required bounds.

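The tree-separator labeling can be sketched in Python as follows. The sketch repeatedly finds a centroid (a node whose removal leaves pieces of at most half the size). It then labels the first piece, then the centroid, then the remaining pieces, recursing on each piece. This is one reading of "inorder" for a separator tree whose nodes may have many children. Any order that keeps each piece's labels contiguous gives the property $|v - v''| < |T_s(v)|$ used in the proof of Lemma 7.2. The helper names `centroid`, `separator_labeling`, and `edge_log_cost` are hypothetical.

```python
import math


def _piece_order(nodes, adj, start):
    """BFS order and parent map of the component of `start` inside `nodes`."""
    parent, order, k = {start: None}, [start], 0
    while k < len(order):
        x = order[k]
        k += 1
        for y in adj[x]:
            if y in nodes and y not in parent:
                parent[y] = x
                order.append(y)
    return order, parent


def centroid(nodes, adj):
    """A node whose removal splits the tree on `nodes` into pieces of size <= len(nodes) // 2."""
    order, parent = _piece_order(nodes, adj, next(iter(nodes)))
    size = {x: 1 for x in order}
    for x in reversed(order[1:]):
        size[parent[x]] += size[x]
    half, c = len(order) // 2, order[0]
    while True:   # walk toward a child subtree that is still larger than half
        heavy = [y for y in adj[c] if y in nodes and parent[y] == c and size[y] > half]
        if not heavy:
            return c
        c = heavy[0]


def separator_labeling(adj):
    """Map each node of the tree `adj` (dict: node -> neighbors) to a label in 0..n-1."""
    label = {}

    def assign(nodes):
        c = centroid(nodes, adj)
        rest, pieces = nodes - {c}, []
        while rest:   # split the remaining nodes into connected pieces
            order, _ = _piece_order(rest, adj, next(iter(rest)))
            pieces.append(set(order))
            rest -= pieces[-1]
        if pieces:
            assign(pieces[0])
        label[c] = len(label)   # labels are handed out in traversal order
        for piece in pieces[1:]:
            assign(piece)

    assign(set(adj))
    return label


def edge_log_cost(adj, label):
    """Sum of log2|label(u) - label(v)| over tree edges, each edge counted once."""
    return sum(math.log2(abs(label[u] - label[v])) for u in adj for v in adj[u] if u < v)


path = {v: [w for w in (v - 1, v + 1) if 0 <= w < 1024] for v in range(1024)}
star = {0: list(range(1, 1024)), **{v: [0] for v in range(1, 1024)}}
for tree in (path, star):
    lab = separator_labeling(tree)
    print(sorted(lab.values()) == list(range(1024)), round(edge_log_cost(tree, lab)))
```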
8 Simplicial Meshes

Using our variable-bit-length dictionary structure we can implement space-efficient representations of d-dimensional simplicial meshes. By a d-simplicial mesh we mean a pure simplicial complex of dimension d which is a manifold, possibly with boundary [13].
We will describe the structure for d = 3 but note that this can be generalized to d dimensions. Our structure supports the following operations on a mesh M:

find(a, b, c): finds all vertices d such that (a, b, c, d) form a tetrahedron in M (at most two, since the mesh is a manifold).
insert(a, b, c, d): adds the tetrahedron (a, b, c, d) to M.
delete(a, b, c, d): deletes the tetrahedron (a, b, c, d) from M.
We represent a simplicial mesh as a dictionary of simplices. Each face (a, b, c) in the mesh may belong to two tetrahedra, (a, b, c, d) and (a, b, c, e). For each face in the mesh we store the entry (a, b, c; d, e). If a face is not part of two tetrahedra, then we store the special character 0 in the empty slot. The operations can then be implemented as shown in Figure 2.
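The face dictionary and the operations of Figure 2 can be sketched in Python as below. This is an illustration under stated assumptions rather than the paper's implementation. Faces are keyed by their sorted vertex triple (an assumption; Figure 2 iterates over rotations of S instead). Vertex labels are assumed to be positive so that 0 is free to mark an empty slot, and a plain dict stands in for the variable-bit-length dictionary. Deletion looks for the removed vertex in either slot, since after two insertions it need not be the first.

```python
class TetMesh:
    """Face dictionary for a 3D simplicial mesh.

    faces[(a, b, c)] = (d, e): the vertices opposite triangle {a, b, c} in the (at most
    two) tetrahedra containing it, with 0 marking an empty slot. Keys are sorted vertex
    triples, and vertex labels are assumed to be positive integers.
    """

    def __init__(self):
        self.faces = {}

    @staticmethod
    def _faces_of(tet):
        """Yield each triangle of the tetrahedron together with its opposite vertex."""
        for k in range(4):
            yield tuple(sorted(tet[:k] + tet[k + 1:])), tet[k]

    def find(self, a, b, c):
        """All d such that (a, b, c, d) is a tetrahedron of the mesh (at most two)."""
        d, e = self.faces.get(tuple(sorted((a, b, c))), (0, 0))
        return [x for x in (d, e) if x != 0]

    def insert(self, tet):
        tet = tuple(tet)
        if any(self.faces.get(f, (0, 0))[1] != 0 for f, _ in self._faces_of(tet)):
            raise ValueError("a face of this tetrahedron already bounds two tetrahedra")
        for f, d in self._faces_of(tet):
            e, _ = self.faces.get(f, (0, 0))
            self.faces[f] = (d, e)

    def delete(self, tet):
        """Remove a tetrahedron that is currently in the mesh."""
        tet = tuple(tet)
        for f, d in self._faces_of(tet):
            x, y = self.faces[f]
            e = y if x == d else x   # the opposite vertex that is not d
            if e == 0:
                del self.faces[f]
            else:
                self.faces[f] = (e, 0)


mesh = TetMesh()
mesh.insert((1, 2, 3, 4))
mesh.insert((2, 3, 4, 5))
print(sorted(mesh.find(4, 3, 2)))   # [1, 5]
mesh.delete((1, 2, 3, 4))
print(mesh.find(2, 3, 4))           # [5]
```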
As we did with our graph implementation, we can compress this structure by encoding b, c, d, and e relative to a. That is, in our variable-bit-length dictionary we store tuples of the form $(a, b - a, c - a; d - a, e - a)$. To account for the space stored, we charge the cost of storing $b - a$ and $c - a$ to the face (a, b, c); we charge the cost of $d - a$ to the face (a, b, d) and of $e - a$ to the face (a, b, e). The dictionary absorbs the $\log|V|$-bit cost of representing a using quotienting. Each face is charged at most $O(1)$ times, and each time the charge is $O(\log|b - a| + \log|c - a|)$. This gives:
Theorem 8.1 Our simplicial mesh representation using our variable-bit dictionary uses $O(\sum_{(a,b,c) \in F}(\log|a - b| + \log|a - c|))$ bits, where $F$ is the 2-skeleton of the mesh. It supports find in worst-case constant time and insert and delete in expected amortized constant time.

FIND((a, b, c))
  return LOOKUP((a, b, c), T)

INSERT(S)
  for each rotation (a, b, c, d) of S
    (e, 0) ← LOOKUP((a, b, c), T)
    INSERT((a, b, c), (d, e), T)

DELETE(S)
  for each rotation (a, b, c, d) of S
    (x, y) ← LOOKUP((a, b, c), T)
    e ← y if x = d, else x
    if e = 0 then DELETE((a, b, c), T)
    else INSERT((a, b, c), (e, 0), T)

Figure 2: Pseudocode to support simplicial mesh operations.
If the 2-skeleton of the mesh (that is, the graph induced by the faces) has a k-compact labeling, then the representation of the mesh will use $O(n)$ bits. We note that well-shaped meshes with bounded degree have small separators [22] and are therefore k-compact for fixed dimension.
If the mesh has bounded degree, then the bound $O(\sum_{(a,b,c) \in F}(\log|a - b| + \log|a - c|))$ simplifies to $O(\sum_{(u,v) \in E} \log|u - v|)$, where $E$ is the 1-skeleton of the mesh.
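To make the space bound of Theorem 8.1 concrete, the sketch below evaluates $\sum_{(a,b,c) \in F}(\log|a - b| + \log|a - c|)$ (base 2, with the vertices of each face sorted so that a is the smallest, which is an assumption). The input is a chain of tetrahedra $(i, i+1, i+2, i+3)$, evaluated once with consecutive labels and once after the scattering relabeling $v \mapsto 389v \bmod 1009$. The mesh and both labelings are hypothetical inputs chosen for illustration. They only show how strongly the bound depends on the vertices of each face receiving nearby labels.

```python
import math
from itertools import combinations


def chain_mesh(k):
    """Tetrahedra (i, i+1, i+2, i+3) for i = 0..k-1; consecutive ones share a face."""
    return [tuple(range(i, i + 4)) for i in range(k)]


def face_cost(tets, label):
    """Sum over distinct faces {a, b, c} (a < b < c after relabeling) of log2(b - a) + log2(c - a)."""
    faces = {tuple(sorted(label[v] for v in tri)) for t in tets for tri in combinations(t, 3)}
    return sum(math.log2(b - a) + math.log2(c - a) for a, b, c in faces), len(faces)


k = 1006                                   # n = k + 3 = 1009 vertices
tets = chain_mesh(k)
identity = {v: v for v in range(k + 3)}
scattered = {v: (389 * v) % (k + 3) for v in range(k + 3)}   # a permutation, since 1009 is prime
for name, lab in (("consecutive", identity), ("scattered", scattered)):
    cost, nfaces = face_cost(tets, lab)
    print(f"{name}: {nfaces} faces, {cost / nfaces:.2f} bits per face (log terms only)")
```

With consecutive labels every face costs at most $1 + \log 3$ bits in this sum, while the scattered labeling pays at least $2\log 158$ bits per face.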
References

[1] H. G. Baker. List processing in real time on a serial computer. Communications of the ACM, 21(4):280-294, 1978.
[2] D. Benoit, E. D. Demaine, J. I. Munro, and V. Raman. Representing trees of higher degree. In WADS, pages 169-180, 1999.
[3] D. Blandford and G. Blelloch. Compact representations of ordered sets. In SODA, pages 11-19, 2004.
[4] D. Blandford, G. Blelloch, and I. Kash. Compact representations of separable graphs. In SODA, pages 342-351, 2003.
[5] D. Blandford, G. Blelloch, and I. Kash. An experimental analysis of a compact graph representation. In ALENEX04, 2004.
[6] D. K. Blandford, G. E. Blelloch, D. E. Cardoze, and C. Kadow. Compact representations of simplicial meshes in two and three dimensions. In Proc. International Meshing Roundtable (IMR), Sept. 2003.
[7] G. E. Blelloch, B. Maggs, and M. Woo. Space-efficient finger search on degree-balanced search trees. In SODA, pages 374-383, 2003.
[8] A. Brodnik and J. I. Munro. Membership in constant time and almost-minimum space. SIAM Journal on Computing, 28(5):1627-1640, 1999.
[9] L. Carter and M. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, pages 143-154, 1979.
[10] R. C.-N. Chuang, A. Garg, X. He, M.-Y. Kao, and H.-I. Lu. Compact encodings of planar graphs via canonical orderings and multiple parentheses. Lecture Notes in Computer Science, 1443:118-129, 1998.
[11] J. G. Cleary. Compact hash tables using bidirectional linear probing. IEEE Trans. Comput., 9:828-834, 1984.
[12] M. Dietzfelbinger, A. R. Karlin, K. Mehlhorn, F. M. auf der Heide, H. Rohnert, and R. E. Tarjan. Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738-761, 1994.
[13] H. Edelsbrunner. Geometry and Topology of Mesh Generation. Cambridge Univ. Press, England, 2001.
[14] P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, IT-21(2):194-203, March 1975.
[15] D. Fotakis, R. Pagh, P. Sanders, and P. G. Spirakis. Space efficient hash tables with worst case constant access time. In STACS, 2003.
[16] M. L. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. JACM, 31(3):538-544, 1984.
[17] R. Grossi and J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In FOCS, pages 397-406, 2000.
[18] X. He, M.-Y. Kao, and H.-I. Lu. A fast general methodology for information-theoretically optimal encodings of graphs. SIAM J. Computing, 30(3):838-846, 2000.
[19] G. Jacobson. Space-efficient static trees and graphs. In 30th FOCS, pages 549-554, 1989.
[20] K. Keeler and J. Westbrook. Short encodings of planar graphs and maps. Discrete Applied Mathematics, 58:239-252, 1995.
[21] D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, 1973.
[22] G. L. Miller, S.-H. Teng, W. P. Thurston, and S. A. Vavasis. Separators for sphere-packings and nearest neighbor graphs. Journal of the ACM, 44:1-29, 1997.
[23] J. I. Munro and V. Raman. Succinct representation of balanced parentheses, static trees and planar graphs. In 38th FOCS, pages 118-126, 1997.
[24] R. Pagh. Low redundancy in static dictionaries with O(1) worst case lookup. In ICALP, pages 595-604, 1999.
[25] R. Pagh. Low redundancy in static dictionaries with constant query time. SIAM Journal on Computing, 31(2):353-363, 2001.
[26] R. Pagh and F. F. Rodler. Cuckoo hashing. In ESA, 2001.
[27] R. Raman, V. Raman, and S. S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In SODA, 2002.
[28] R. Raman and S. S. Rao. Succinct dynamic dictionaries and trees. In ICALP, pages 357-36, 2003.
[29] G. Turán. Succinct representations of graphs. Discrete Applied Mathematics, 8:289-294, 1984.
