Abstract. In this paper, we consider the representation and management of an element set on
which a lattice partial order relation is defined. In particular, let n be the element set size. We
present an O(n√n)-space implicit data structure for performing the following set of basic operations:
1. Test the presence of an order relation between two given elements, in constant time.
2. Find a path between two elements whenever one exists, in O(l) steps, where l is the path
length.
3. Compute the successors and/or predecessors set of a given element, in O(h) steps, where
h is the size of the returned set.
4. Given two elements, find all elements between them, in time O(k log d), where k is the size
of the returned set and d is the maximum in-degree or out-degree in the transitive reduction of the
order relation.
5. Given two elements, find the least common ancestor and/or the greatest common successor
in O(√n) time.
6. Given k elements, find the least common ancestor and/or the greatest common successor
in O(√n + k log n) time. (Unless stated otherwise, all logarithms are to the base 2.)
The preprocessing time is O(n²). Focusing on the first operation, which is the building block
for all the others, we derive an overall O(n√n) space × time bound, beating the order-n² bottleneck
representing the present complexity for this problem. Moreover, we will show that the complexity
bounds for the first three operations are optimal with respect to the worst case. Additionally,
a stronger result can be derived: it is possible to represent a lattice in space O(n√t), where t is
the minimum number of disjoint chains which partition the element set.
Key words. data structure, lattices, reachability, least common ancestors, graph decomposition
PII. S0097539794274404
1998; published electronically May 13, 1999. This work was supported by the Italian Authority for
Public Administration and by the ESPRIT Basic Research Action on Algorithms and Data Structures
20244 (ALCOM-IT).
http://www.siam.org/journals/sicomp/28-5/27440.html
† Italian Authority for Public Administration and Dipartimento di Informatica e Sistemistica,
Università di Roma “La Sapienza,” Via Salaria 113, I-00198 Rome, Italy (talamo@dis.uniroma1.it).
‡ Dipartimento di Matematica, Università di Roma “Tor Vergata,” Via della Ricerca Scientifica,
1. Introduction. Given a directed acyclic graph (dag) G = (V, E), we present an implicit data structure for
efficiently performing the following operations:
1. reachability(x, y). Given x, y ∈ V , tests the presence of a directed path
from x to y; returns true if such a path exists, false otherwise.
2. path(x, y). Given x, y ∈ V , returns a path from x to y, if at least one such
path exists.
3. succ(x). Given x ∈ V , returns the set of all successors of x.
4. pred(x). Given x ∈ V , returns the set of all predecessors of x.
5. range(x, y). Given x, y ∈ V , returns all vertices in all the directed paths
connecting x to y.
6. lub(x, y). Given x, y ∈ V , returns the least upper bound between x and y.
7. glb(x, y). Given x, y ∈ V , returns the greatest lower bound between x and
y.
8. lub(x1 , . . . , xk ). Given x1 , . . . , xk ∈ V , returns the least upper bound of
x1 , . . . , xk .
9. glb(x1 , . . . , xk ). Given x1 , . . . , xk ∈ V , returns the greatest lower bound of
x1 , . . . , xk .
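As a concrete fixture for the semantics of these nine operations, the following sketch implements them naively on a small example lattice. The names, the example Hasse diagram, and the brute-force algorithms are ours for illustration only; they have none of the complexity guarantees developed in this paper.

```python
# Naive reference semantics for the operations on a toy lattice.
# EDGES is the transitive reduction (Hasse diagram): bottom 0, top 5.
from functools import reduce

EDGES = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

def succ(x):
    """succ(x): all vertices reachable from x (excluding x)."""
    seen, stack = set(), [x]
    while stack:
        for w in EDGES[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def pred(x):
    """pred(x): all vertices from which x is reachable (excluding x)."""
    return {v for v in EDGES if x in succ(v)}

def reachability(x, y):
    """Is there a directed path from x to y?"""
    return y in succ(x)

def path(x, y):
    """One directed path from x to y, or None."""
    parent, stack = {x: None}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            out = []
            while v is not None:
                out.append(v)
                v = parent[v]
            return out[::-1]
        for w in EDGES[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)
    return None

def range_(x, y):
    """All vertices lying on some directed path from x to y."""
    return ({x, y} | (succ(x) & pred(y))) if reachability(x, y) else set()

def lub(*xs):
    """Least upper bound: the common upper bound below all the others."""
    common = reduce(set.__and__, [succ(x) | {x} for x in xs])
    return next(u for u in common if common <= succ(u) | {u})

def glb(*xs):
    """Greatest lower bound: the common lower bound above all the others."""
    common = reduce(set.__and__, [pred(x) | {x} for x in xs])
    return next(u for u in common if common <= pred(u) | {u})
```

On this example, lub(1, 2) = 3, glb(3, 4) = 2, and range_(0, 5) is the whole vertex set.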
The idea is to minimize the storage complexity while efficiently performing the
above operations. In particular, the aim is to be able to test reachability in O(1) time.
reachability captures connectivity information and hence is a basic operation for
digraph management.
Without loss of generality, we prove all theorems under the assumption that
G = (V, E) is connected; hence, if |V | = n and |E^T| = m, where E^T is the edge set
of the transitive reduction graph, then m ≥ n − 1. In the case of nonconnected dags,
all our results can be applied to each connected component without any change in
complexity bounds.
Although a lot of work has been done in this field, it is still an open problem
how to represent partial order relations with a worst-case time × space complexity
of o(n²). The problem is difficult because the “naive” representations, such as
explicitly storing the whole transitive closure G∗ at one extreme, or testing
reachability on G by means of a DFS traversal at the other, both require
Ω(n²) time × space in the worst case.
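The two extremes can be made concrete as follows (an illustrative Python sketch, not taken from the paper): precomputing the transitive closure answers queries in O(1) time but stores Θ(n²) entries in the worst case, while a fresh DFS per query stores only the graph but pays O(n + m) time per query.

```python
# Extreme 1: store the transitive closure G* -- O(1) queries, O(n^2) space.
def transitive_closure(adj):
    """closure[v] = set of vertices reachable from v (memoized recursion; adj is a dag)."""
    closure = {}
    def reach(v):
        if v not in closure:
            closure[v] = set()
            for w in adj[v]:
                closure[v] |= {w} | reach(w)
        return closure[v]
    for v in adj:
        reach(v)
    return closure

# Extreme 2: a fresh DFS per query -- O(n + m) space, O(n + m) query time.
def dfs_reachable(adj, x, y):
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

adj = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
closure = transitive_closure(adj)
```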
There is an additional reason for the importance of this open problem: under-
standing the complexity of directed graph versus undirected graph traversal. In fact,
there was considerable empirical evidence (in terms of the efficiency of algorithms that
have been discovered) that reachability in directed graphs is “harder” than reacha-
bility in undirected graphs. This has been proven to be substantially true in [4]. For
instance, although directed graphs can be traversed nondeterministically in polyno-
mial time and logarithmic space simultaneously, there is evidence that they cannot be
traversed deterministically in polynomial time and small space simultaneously [31]. In
contrast, undirected s-t connectivity can be decided deterministically in poly-
nomial time and sublinear space [6], and undirected graphs can generally be traversed
in polynomial time and logarithmic space probabilistically by using a random walk
[5, 10]; this implies similar resource bounds on (nonuniform) deterministic algorithms
[5]. Moreover, using simple techniques, efficient data structures for undirected graphs
have been developed for the dynamic maintenance of several graph properties [14].
AN EFFICIENT DATA STRUCTURE FOR LATTICE OPERATIONS 1785
The gap in efficiency between directed and undirected graphs in reach-
ability problem management is mainly due to the nonsymmetry of the reachability
relation. Moreover, the harder case is when the graph is acyclic, in the sense that the
problem for general graphs can be reduced in linear time to the acyclic case. In fact,
we can partition the vertex set of the graph into strongly connected components (e.g.,
using depth-first search) and then shrink the components to form an acyclic graph
(see [3]).
Therefore, efficient solutions to the reachability problem have been found for
special classes of dags [17, 26, 30] exploiting the order dimension property of the
associated posets [13]. In this context, the partial orders considered are those having
a constant order dimension [21]. Informally speaking, a partial order having order
dimension k, with k ≥ 1 a constant value, can be represented using Θ(kn) space,
where n is the element set size, while testing partial order relationship in O(k) time.
Unfortunately, since posets can have nonconstant order dimension, this technique
cannot be extended to general digraphs [24]. In particular, this is true for partial
lattices. Another approach that has been studied, which yields an efficient repre-
sentation for interval orders and distributive lattices, is based on representing
a poset by subsets of an n-element set [23, 8].
Hence, although there are strategies for dealing with restricted classes of dags
optimal for time × space complexity, there is no uniform approach for general directed
graphs. This is due mainly to intrinsic deficiencies of the general schema adopted by
most of the widely accepted strategies for reachability problem resolution. In fact, the
key idea is to decompose a dag G into a collection F of sparse dags Gi representing a
covering of the graph G∗ , the transitive closure of G.
In particular, for any dag G = (V, E), a collection F = {G1 , . . . , Gk } must satisfy
the following property:
(i) ⟨x, y⟩ ∈ G∗ ⇔ ∃Gi ∈ F such that ⟨x, y⟩ ∈ G∗i .
This property is nice for the design of efficient algorithms as it allows us to search
locally only and not in the whole graph. On the other hand, using this approach
presents two main problems.
The first problem is how to bound the overall space complexity. In fact, a trivial
solution to the above approach is to represent a dag G by means of an n-element
collection F, with each element representing the subgraph induced by all vertices
connected to a given vertex. However, it can be easily seen that this approach requires
O(n²) space.
The second problem is how to derive connectivity information in an efficient way
with respect to the time complexity. In fact, a “naive” application of the decom-
position technique, optimal from the space complexity point of view, is based on a
one-element set F, containing the transitive reduction of G. Unfortunately, in this
case, the time complexity is O(m).
In this paper, we present a two-level decomposition strategy, previously intro-
duced in [28, 29] but now extended to deal with a more general set of operations,
which balances time and space complexities.
In particular, we will prove the following.
Theorem 1.1. Let G = (V, E) be a dag satisfying the lattice property, with n = |V |.
An O(n√n)-space implicit data structure exists allowing us to perform the following:
1. reachability(x, y) in time O(1);
2. path(x, y) in time O(l), where l is the path length;
3. succ(x) in time O(h), where h is the size of the returned set;
1786 MAURIZIO TALAMO AND PAOLA VOCCA
4. pred(x) in time O(h), where h is the size of the returned set;
5. range(x, y) in time O(k log d), where k is the size of the returned set and
d is the maximum in-degree or out-degree in the transitive reduction of the order
relation;
6. lub(x, y) in time O(√n);
7. glb(x, y) in time O(√n);
8. lub(x1 , . . . , xk ) in time O(√n + k log n);
9. glb(x1 , . . . , xk ) in time O(√n + k log n).
Moreover, the preprocessing time is O(n²) and all bounds are worst case.
The results of Theorem 1.1 are somewhat surprising. In fact, in some cases, e.g., when
|E| = Ω(n√n), our decomposition technique actually compresses G = (V, E). It
is worth noting that the complexity bounds obtained do not break any information-
theoretic lower bound and are optimal with respect to the worst case, as shown in the
following proposition.
Proposition 1.1 (see [22]). Let L(n) be the number of labeled lattices with n
elements. Then
L(n) < α^(n√n + o(n√n)),
where α is a constant.
Each double tree collection associated with a cluster represents the basic element
of the proposed decomposition strategy.
For every two vertices x, y, with x ∈ Clus(c), the corresponding collection
[Figure: a cluster Clus(c) with center c and double trees DT (v, c) and DT (w, c).]
Fig. 2. Double tree decomposition.
We solve the first problem by computing for each internal vertex u a spanning
tree of the graph induced by the set P red(u)∩Clus(c) or by the set Succ(u)∩Clus(c),
depending on whether Clus(c) is a Clus+ (c) or a Clus− (c). Hence, to each internal
vertex u ∈ Clus(c) there is associated a tree rooted at u, the internal tree induced by
u, denoted IntT ree(u, c).
For the second problem, from Lemma 3.1, given an external vertex v, the pair
(v, Clus(c)) univocally identifies a vertex u ∈ Clus(c) representing either the
lub(Clus+ (c)∩Clus+ (v)) or the glb(Clus− (c)∩Clus− (v)). This implies that, given
a cluster Clus(c) for each external vertex v connected to at least one vertex in Clus(c),
an internal vertex u, the internal representative of the external vertex v exists, uni-
vocally identifying the internal tree IntT ree(u, c) made up of all internal vertices
connected to v.
Let Ext(u) be the set of external vertices having u as an internal representative
vertex. Obviously, we have
Ext(Clus(c)) = ⋃_{u∈Clus(c)} Ext(u).
For each set Ext(u), we compute a spanning tree, rooted at u, of the subgraph
induced by Ext(u). We refer to this tree as the external tree induced by u, denoted
ExtT ree(u, c) (see Figure 2). The external trees in {ExtT ree(u, c)|u ∈ Clus(c)} have
the nice property of being pairwise disjoint, as shown in the following lemma.
Lemma 3.3. Let v, w ∈ Clus(c), with v ≠ w. Then ExtT ree(v, c) ∩ ExtT ree(w, c)
= ∅.
Proof. Assume for contradiction that y ∈ ExtT ree(v, c) ∩ ExtT ree(w, c).
By the external tree definition, y is associated with both v and w. This contradicts
the uniqueness of the representative vertex stated in Lemma 3.1.
Definition 3.2. Given a cluster Clus(c) and the two collections {IntT ree(u, c)}
and {ExtT ree(u, c)} of internal and external trees, for each u ∈ Clus(c) a double tree
is defined as DT (u, c) = IntT ree(u, c) ∪ ExtT ree(u, c).
A double tree represents the basic decomposition subgraph and, informally speak-
ing, is the union of an internal tree and the external tree rooted at the same internal
vertex, whenever the internal and external trees exist.
Each double tree DT (u, c) is associated with a partial order having order dimen-
sion 2, because its st-completion is a planar poset with one greatest element and
one least element [32]. The first consequence of the above property is that DT (u, c)
is representable with two linear extensions L1 and L2 ; that is, given two vertices
x, y ∈ DT (u, c), y is reachable from x if and only if x < y in both linear exten-
sions. In particular, two labels (coordinates) (x1 , x2 ) are associated with any vertex
x ∈ DT (u, c), each one representing x's position within the first and second linear
extensions, respectively.
The proof of the following proposition can be found in [20].
Proposition 3.1. Given x, y ∈ DT (u, c), reachability(x, y) = true in DT (u, c)
if and only if (x1 , x2 ) < (y1 , y2 ).
From the above proposition Corollary 3.4 easily follows.
Corollary 3.4. Given a double tree DT (u, c), such that |V ert(DT (u, c))| = K,
an O(K)-space implicit data structure exists for performing a reachability test in
O(1)-time.
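The coordinate scheme behind Corollary 3.4 can be illustrated on a single rooted tree (one half of a double tree). Pairing the pre-order rank with the reversed post-order rank is one standard way to realize two linear extensions of such a poset; it is used here only as an assumed stand-in for the paper's exact construction.

```python
# Two-coordinate labelling of a rooted tree: y is reachable from x along
# tree edges iff label(x) <= label(y) componentwise.
def label(tree, root):
    coords, clock = {}, [0, 0]
    def visit(v):
        clock[0] += 1
        pre_v = clock[0]                 # position in the first linear extension
        for w in tree.get(v, []):
            visit(w)
        clock[1] += 1
        coords[v] = (pre_v, -clock[1])   # reversed post-order: second extension
    visit(root)
    return coords

def reachable(coords, x, y):
    """O(1) reachability test from the two coordinates alone."""
    (a, b), (c, d) = coords[x], coords[y]
    return a <= c and b <= d

tree = {'u': ['a', 'b'], 'a': ['c']}    # u -> a -> c, and u -> b
C = label(tree, 'u')
```

The labels use O(1) space per vertex, i.e., O(K) space overall, matching the corollary.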
3.3. Basic decomposition strategy. The decomposition strategy we propose
first generates a collection of clusters and then, for each cluster, builds the corre-
sponding decomposition elements (double tree collection).
procedure BasicBuildClusters;
1. begin;
2. C := ∅; {Cluster Collection}
3. while V ≠ ∅ do
4. begin;
5. Choose a cluster Clus(c); {either Clus+ (c) or Clus− (c)}
6. C := C ∪ Clus(c);
7. V := V − Clus(c);
8. end;
9. return C;
10. end.
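A direct Python transcription of BasicBuildClusters follows (the pseudocode's line numbers are kept as comments since line 7 is referenced later). The cluster-choice rule at line 5 is left abstract in the procedure, so the stand-in used here, a vertex together with its not-yet-removed successors, is only an assumption for illustration.

```python
def basic_build_clusters(vertices, succ_of):
    """BasicBuildClusters: vertices is the vertex set V; succ_of(v) returns
    the successor set of v in G."""
    V = set(vertices)
    C = []                                # line 2: cluster collection
    while V:                              # line 3
        c = min(V)                        # arbitrary deterministic choice
        clus = ({c} | succ_of(c)) & V     # line 5: stand-in for Clus+(c)
        C.append(clus)                    # line 6
        V -= clus                         # line 7
    return C                              # line 9

# Toy dag: 0 -> 1 -> 2, plus an isolated vertex 3.
succs = {0: {1, 2}, 1: {2}, 2: set(), 3: set()}
clusters = basic_build_clusters({0, 1, 2, 3}, lambda v: succs[v])
```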
The cluster collection returned is the input to the following procedure which
builds, for each cluster, the associated double trees collection representing the decom-
position elements on which we base our data structure.
[Figure: the data structures A, B, and C built on clusters Clus(ci ) and Clus(cj ).]
Suppose, for the moment, that every cluster has the right size; that is,
∀ Clus(ci ) ∈ C, √n/4 ≤ |Clus(ci )| ≤ √n/2.
In this case, it is trivial to show that the overall space occupancy is O(n√n). Unfortunately,
this is a special case and, generally, this condition does not hold. Nevertheless,
as shown in the following, it is possible to find a suitable cluster decomposition allow-
ing us to keep the space occupancy within the required bound.
4. Decomposition strategy. As shown in section 3, a proper choice of cluster
size permits us to bound the space complexity. We claim that a cluster size between
√n/4 and √n/2 is, in fact, the answer to our problem because this size allows us to
balance what is eliminated in one main iteration and what remains to be considered
(see line 7 of the BasicBuildClusters procedure). When it is not possible to find
clusters with the right size, we group them to form a cluster forest.
4.1. Elements classification. We need some more definitions.
Definition 4.1. A vertex c ∈ V is
1. good if
√n/4 ≤ |Clus+ (c)| ≤ √n/2 or √n/4 ≤ |Clus− (c)| ≤ √n/2;
2. fat if it is not good and if one of the following two conditions holds:
(i) |Clus+ (c)| > √n/2 and, for all x ∈ Clus+ (c), |Clus+ (x)| < √n/4; or
(ii) |Clus− (c)| > √n/2 and, for all x ∈ Clus− (c), |Clus− (x)| < √n/4;
3. thin if it is neither good nor fat and if
|Clus+ (c)| < √n/4 or |Clus− (c)| < √n/4.
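The three cases of Definition 4.1 can be rendered executably as follows. The helper signatures, which take the sizes |Clus+(c)|, |Clus−(c)| and the member cluster sizes directly, are ours for illustration.

```python
from math import sqrt

def is_good(n, up, down):
    """up = |Clus+(c)|, down = |Clus-(c)|."""
    lo, hi = sqrt(n) / 4, sqrt(n) / 2
    return lo <= up <= hi or lo <= down <= hi

def is_fat(n, up, down, up_sizes, down_sizes):
    """up_sizes = sizes |Clus+(x)| for x in Clus+(c); dually for down_sizes."""
    if is_good(n, up, down):
        return False
    half, quarter = sqrt(n) / 2, sqrt(n) / 4
    return ((up > half and all(s < quarter for s in up_sizes)) or
            (down > half and all(s < quarter for s in down_sizes)))

def is_thin(n, up, down, up_sizes, down_sizes):
    if is_good(n, up, down) or is_fat(n, up, down, up_sizes, down_sizes):
        return False
    return up < sqrt(n) / 4 or down < sqrt(n) / 4
```

For n = 64 the thresholds are √n/4 = 2 and √n/2 = 4, so a vertex with |Clus+(c)| = 3 is good, while one with |Clus+(c)| = 10 whose members all induce clusters of size 1 is fat.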
We establish the correctness of our approach by first showing that it is always
possible to find one of the above defined elements.
Lemma 4.1. Given a dag G = (V, E) satisfying the lattice property, at least one
good, fat, or thin vertex exists, which can be retrieved in time O(n²).
Proof. Let us consider the following strategy. We first visit the graph, searching
for good elements and meanwhile assigning a weight to each node to represent the
size of the clusters (both Clus+ and Clus− ) it induces. Whenever this search fails,
we look for fat vertices.
If there is one node having weight greater than √n/2, then at least one fat vertex
exists. In fact, without loss of generality, let c be a vertex such that |Clus+ (c)| > √n/2.
We search the predecessors set of c for an element x inducing the smallest cluster
satisfying the condition |Clus+ (x)| > √n/2. If such a vertex x exists, then x is fat because
there are no good vertices and, by transitivity, none of its predecessors can
induce a cluster of cardinality greater than √n/2. Otherwise, c is a fat vertex. If there
are no fat vertices, then the vertex with the maximum weight is a thin vertex. Hence,
this strategy always returns at least one element.
For the time complexity, let us consider a data structure for representing the graph
which maintains for each vertex the number of its predecessors and an ordered list of
the number of predecessors of its immediate predecessors. With this data structure,
which can be derived in time O(n²) by recursively visiting G = (V, E), the above
strategy can be implemented in time that is linear in the number of vertices.
4.2. Cluster collection. Our aim is to show that, given a dag G = (V, E), it
is possible to build a collection of clusters induced by either good vertices or
thin vertices.
First we need to describe how to manage fat vertices.
Lemma 4.2. Given a fat vertex c, it is always possible to generate a sequence of
clusters induced by good vertices which covers Clus(c). √
Proof. Without loss of generality, let |Clus+ (c)| > √n/2. By definition, each one
of c's immediate predecessors, ⟨c1 , . . . , ct ⟩, satisfies the following condition:
|Clus+ (ci )| < √n/4 for all 1 ≤ i ≤ t.
We can then group clusters induced by these vertices until the cardinality of each
group is between √n/4 and √n/2.
Let Clus(ci1 , . . . , cik ), where ⟨ci1 , . . . , cik ⟩ ⊂ ⟨c1 , . . . , ct ⟩, be one group of clusters
so obtained.
To prevent Clus(ci1 , . . . , cik ) from violating Lemma 3.1 (an external vertex y
could be related to all of ⟨ci1 , . . . , cik ⟩ through the fat vertex c), we add a dummy good
vertex di and the following directed edges (see Figure 4):
(i) ⟨di , c⟩;
(ii) ⟨cij , di ⟩ for 1 ≤ j ≤ k.
By construction, the dag obtained by adding dummy vertices still satisfies the
lattice property. Hence, to each fat vertex c there corresponds a collection {Clus(di )}
of clusters induced by dummy good vertices (see Figure 4) which covers
Clus+ (c).
[Figure 4: the fat vertex c covered by good clusters induced by dummy vertices d1 , . . . , dl , with √n/4 ≤ |V ert(Clus(di ))| ≤ √n/2.]
The following procedure shows how our strategy chooses the required cluster
collection. First, clusters induced by good vertices and by dummy good vertices
associated with fat vertices are chosen, then clusters induced by thin vertices are
considered. More precisely, we have the following:
procedure BuildClusters;
1. begin;
2. C := ∅; {Cluster Collection}
3. while V ≠ ∅ do
4. begin;
5. if ∃c ∈ V s.t. c is good then C := C ∪ Clus(c);
6. else if ∃c ∈ V s.t. c is fat then
7. begin
8. Build the associated good cluster collection {Clus(di )};
9. goto 5;
10. end
11. else
12. begin
13. Choose a thin vertex c s.t. Clus(c) has maximum cardinality;
14. C := C ∪ Clus(c);
15. end;
16. V := V − Clus(c);
17. end;
18. return C;
19. end.
Note that, according to our definitions, F ∩ Ext(F ) ≠ ∅ in general. Nevertheless,
Lemma 3.3 still holds for each cluster.
The following version of the decomposition procedure generates for each cluster
in C the corresponding double tree collection. Moreover, it groups clusters induced
by thin vertices into cluster forests.
Each forest F = ⟨Clus(c1 ), . . . , Clus(ck )⟩ is generated by choosing clusters from
the cluster collection in nonincreasing order of size until one of the following conditions
holds:
1. |F | ≥ √n/4, or
2. mk ≥ n, where m = |Ext(F )|.
As we will see in section 5, condition 2 allows us to obtain the required space
complexity.
2. begin
3. for each u ∈ Clus(c) Build {DT (u, c)};
4. return Clus(c) and {DT (u, c)};
5. end;
6. for each Clus(c) ∈ ⟨Clus(cg+1 ), . . . , Clus(cg+t )⟩;
{the sequence is in nonincreasing order of size}
7. begin
8. F = ∅;
9. k = 0;
10. while |F | < √n/4 and mk < n do
11. begin
12. F = F ∪ Clus(c);
13. k = k + 1; {Number of clusters in a forest}
14. m = |Ext(F )|; {Number of external vertices}
15. next Clus(c);
16. end;
17. return F ; {Returns the current forest}
18. end;
[Figure: the data structures A, B, C, and D for a cluster forest Fi and the double
tree representation.]
Data structure A stores, for each vertex x ∈ V , the identifiers of both the cluster
and forest to which it belongs, whenever they are different, and the cluster sign. Data
structure B is the same as in section 3.3.
For the reachability test, it is easy to extend the basic strategy described in
section 3.3 for the new data structure.
We now analyze the space complexity. With reference to the clusters and cluster
forests sequence ⟨F1 , . . . , Fg , . . . , Fg+f ⟩, by construction, the subsequence ⟨F1 , . . . , Fg ⟩,
together with the corresponding double tree decomposition, requires O(n√n) space.
Let us now analyze the subsequence ⟨Fg+1 , . . . , Fg+f ⟩ of cluster forests induced
by thin vertices.
Recall that forests of clusters are generated by choosing clusters in nonincreasing order
of size until one of the following conditions holds:
1. |F | ≥ √n/4, or
2. mk ≥ n, where m = |Ext(F )|.
If we denote mi = |Ext(Fi )|, then the overall space complexity of the D data
structure is O(Σ_{i=g+1}^{g+f} mi ki ), since only vertices related to a cluster forest have the
corresponding D look-up table, each table of size O(ki ). Hence, the second condition
is used to bound each term of the summation.
We now show that the decomposition strategy returns a number of cluster forests
less than √n.
Obviously, if it is always possible to generate a forest satisfying both conditions,
then the space complexity of the overall data structure is O(n√n). Unfortunately,
the second condition could prevent us from generating an O(√n) collection of cluster
forests; that is, each cluster forest could be of size less than √n/4. The following
technical lemmas show how to manage this case.
First, it is important to underscore one property, shown in Lemma 5.1, of cluster
forests useful for the following proofs.
Lemma 5.1. If Clus(cij ) ∈ Fi and |Clus(cij )| = tij , then the number of external
vertices related to Clus(cij ) is at most tij².
Proof. The proof easily follows by observing that, by construction, forests are
generated only when there are neither good nor fat vertices. Moreover, each cluster is
added to a forest in nonincreasing order of size.
Without loss of generality, we denote the size of a cluster Clus(cij ) ∈ Fi as follows:

(1) |Clus(cij )| = n^(1/2 − Σ_{p=1}^{j} δip);

hence

(2) |Fi | = Σ_{j=1}^{ki} n^(1/2 − Σ_{p=1}^{j} δip).
Proof. From Lemma 5.1, if |Clus(cij )| = tij , then the number of external vertices
related to Clus(cij ) is at most tij². As a consequence, we have

(3) mi ki ≤ ki Σ_{j=1}^{ki} (tij )² = ki Σ_{j=1}^{ki} n^(2(1/2 − Σ_{p=1}^{j} δip))

(4) = ki Σ_{j=1}^{ki} n^(1 − 2 Σ_{p=1}^{j} δip) ≤ ki Σ_{j=1}^{ki} n^(1 − 2δi1) = ki² n^(1 − 2δi1).

Since Fi does not satisfy condition 1, we have

(5) |Fi | = Σ_{j=1}^{ki} n^(1/2 − Σ_{p=1}^{j} δip) < √n/4;

hence, since the first term of the sum alone is n^(1/2 − δi1), n^(−δi1) < 1/4.
Moreover,

(6) √n/4 > Σ_{j=1}^{ki} n^(1/2 − Σ_{p=1}^{j} δip) ≥ Σ_{j=1}^{ki} n^(1/2 − Σ_{p=1}^{ki} δip) = ki n^(1/2 − Σ_{p=1}^{ki} δip).

Hence,

n^(1/2 − Σ_{p=2}^{ki} δip) < √n/4.

Dividing both terms by n^(δi1), we have

(8) n^(1/2 − Σ_{p=1}^{ki} δip) < n^(1/2 − δi1)/4.
The left-hand side of inequality (8) is, by definition, the size of Clus(ciki ).
With reference to the cluster forests sequence ⟨Fg+1 , . . . , Fg+f ⟩, let g + 1 ≤ i <
g + f . We have the following.
Lemma 5.3. |Clus(ci+1,1 )| < n^(1/2 − δi1)/4.
Proof. The proof follows trivially from Lemma 5.2, observing that clusters are
taken in nonincreasing order of size.
From the above technical lemmas, Lemma 5.4 easily follows.
Lemma 5.4. The decomposition strategy returns an O(log n) collection ⟨Fg+1 , . . . ,
Fg+f ⟩ of cluster forests.
From Lemma 5.4, we have the following.
Theorem 5.5. The data structure for the representation of dags satisfying the
lattice property has an O(n√n)-space complexity and allows us to perform the reach-
ability operation in constant time.
6. Queries implementation. In this section, we describe algorithms for the
operations introduced in Theorem 1.1. A sketch of the algorithm for the reachability
operation has already been given in section 3.4, and the time complexity has been
proven in Theorem 5.5.
Let us now describe the path operation. For this operation we need to augment
our data structure. First note that each double tree is, by construction, the union
of two rooted trees. Hence, for each vertex x and for each double tree to which it
belongs, it is possible to associate one and only one vertex u (the parent in the tree)
which is on the path from x to the vertex inducing the double tree. We then store
in the data structures B and D u’s coordinates with respect to this double tree. The
path operation is now straightforward. While testing the presence of a directed path
between x and y using the reachability(x, y) test, we find the double tree to which
they both belong, and then we look at the coordinates of the parents of x and y.
These vertices are the immediate successors of x and y on the path from x to y. Then
we recursively repeat the search starting from these vertices until we reach the root
of the double tree.
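The climbing procedure just described can be sketched as follows; the parent tables below are a toy stand-in for the parent coordinates kept in data structures B and D, and the names are ours.

```python
def climb(parent, x):
    """Chain x, parent[x], ... up to the root (whose parent is None)."""
    chain = [x]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain

def path(parent_x_side, parent_y_side, x, y):
    """Path x -> ... -> u -> ... -> y through the common root u; each step
    follows one stored parent pointer, so the cost is O(path length)."""
    up = climb(parent_x_side, x)          # x's chain, ending at the root u
    down = climb(parent_y_side, y)        # y's chain, ending at the same root u
    assert up[-1] == down[-1], "x and y must share the double tree root"
    return up + down[-2::-1]              # glue, dropping the repeated root

# Toy double tree rooted at 'u': x -> a -> u on one side, y -> b -> u on the other.
px = {'x': 'a', 'a': 'u', 'u': None}
py = {'y': 'b', 'b': 'u', 'u': None}
```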
In this paper, we will not furnish the full algorithm which, although simple, is
quite lengthy. The net result is the following.
For the succ(x) and pred(x) operations, we add data structures similar
to those described in section 3.4. In particular, for each double tree DT (u, c)
of the double tree decomposition T , we maintain the corresponding internal tree
IntT ree(u, c). Moreover, for each vertex x ∈ V , let Clus(c) be the cluster to which
x belongs; if Clus(c) is a Clus+ (c), then we maintain the set of x’s successors in
Clus+ (c); otherwise, we maintain the set of x’s predecessors. Let us denote the first
set Succ(x, c) and the second P red(x, c). Finally, for each vertex x ∈ V we store the
set of double tree identifiers to which each vertex is connected as an external vertex.
This information eliminates the need to visit all the D tables. By construction, the
space complexity of the new data structure is still O(n√n).
Let us now introduce the pred(x) operation; for the succ(x) operation, the algo-
rithm is similar.
In a similar way, it is possible to prove that, if Clus(c0 ) has been generated before
Clus(c), then either range(x, y) ⊆ V ert(DT (u, c0 )), where u is the internal repre-
sentative vertex of x with respect to Clus(c0 ), or range(x, y) ⊆ V ert(DT (y, c0 )),
Let {u1 , . . . , up } be the corresponding set of least upper bounds. We have the
following.
Lemma 6.4. The partial order associated with {u1 , . . . , up } is a total order.
Proof. To prove the lemma we show that there exists a permutation ⟨ui1 , . . . , uip ⟩
such that uij ≺ uim for all 1 ≤ j < m ≤ p.
Two cases are possible.
1. Either Clus(ch ) is a Clus+ (ch ) or Clus(ck ) is a Clus+ (ck ).
Let the clusters in ⟨Clus(c1 ), . . . , Clus(cp )⟩ be ordered according to the generation
order.
If Clus(ch ) is a Clus+ (ch ), then p ≤ h. In fact, by cluster definition, once Clus+ (ch )
has been taken, then all x’s predecessors have been considered. Analogously, if
Clus(ck ) is a Clus+ (ck ), then p ≤ k.
Moreover, all Clus(ci ) in the sequence are Clus+ (ci ). In fact, suppose for contra-
diction that Clus(cj ), for 1 ≤ j ≤ p, is a Clus− (cj ). If Clus(ch ) is a Clus+ (ch ),
then x ∈ Clus− (cj ), or, if Clus(ck ) is a Clus+ (ck ), then y ∈ Clus− (cj ), but this is a
contradiction.
We claim that the sequence of least upper bounds ⟨u1 , . . . , up ⟩, ordered according to
the order in which the corresponding clusters are generated, is the required permuta-
tion. Let us assume that two values 1 ≤ i < j ≤ p exist such that either ui ∼ uj or
uj ≺ ui . In the first case the following relations hold simultaneously:
(i) ui ≺ x and ui ≺ y;
(ii) uj ≺ x and uj ≺ y;
(iii) ui ∼ uj .
defined as follows.
The subsequence ⟨Clus+ (ci1 ), . . . , Clus+ (cim )⟩ is composed of all clusters in {Clus(c1 ),
. . . , Clus(cp )} having sign “+” and ordered according to the generation order. The
subsequence ⟨Clus− (cim+1 ), . . . , Clus− (cip )⟩ is made up of all clusters in {Clus(c1 ),
. . . , Clus(cp )} having sign “−”, ordered according to the inverse generation order.
We claim that the corresponding sequence ⟨ui1 , . . . , uim , uim+1 , . . . , uip ⟩ is the desired
permutation.
By an argument similar to the one used in the previous case, it is possible to prove that
the two subsequences ⟨ui1 , . . . , uim ⟩ and ⟨uim+1 , . . . , uip ⟩ are totally ordered. We have
to prove that for all uij ∈ ⟨ui1 , . . . , uim ⟩ and for all uil ∈ ⟨uim+1 , . . . , uip ⟩, uij ≺ uil .
By the lattice property either uij ≺ uil or uil ≺ uij . In the first case the claim is
proved. Suppose for contradiction that uil ≺ uij ; then, by cluster definition, uil ∈
Clus+ (cij ). This once again contradicts the hypothesis.
This completes the proof.
Using the above lemma, it is possible to state the following proposition.
Proposition 6.4. The glb(x, y) and lub(x, y) operations can be implemented
in O(√n) time.
Proof. To implement these operations we augment the data structure as follows.
For each internal tree of a cluster, we maintain the set of least upper bounds (greatest
lower bounds) of the sets derived from the intersection with all the other internal
trees in the same cluster. Analogously, we maintain for each vertex x the set of
least upper bounds (greatest lower bounds) of the sets derived from the intersection
between P red(x, c) (Succ(x, c)) and P red(y, c) (Succ(y, c)) for all y in Clus(c).
By construction, the overall space occupancy is still O(n√n).
Given x and y, using data structures D and B, in O(√n) time it is possible to
find the set ⟨u1 , . . . , up ⟩. The maximal element of this sequence is the greatest lower
bound of x and y.
It is worth noting that, by means of lub(x, y), glb(x, y) and succ(x), pred(x)
operations it is possible to easily implement the following.
1. CommAnc(x, y). Given x, y ∈ V , returns the set of all common ancestors of
x and y.
2. CommSucc(x, y). Given x, y ∈ V , returns the set of all common successors
of x and y.
In particular, Proposition 6.5 follows from Propositions 6.2 and 6.4.
Proposition 6.5. The CommAnc(x, y) and CommSucc(x, y) operations can be implemented in O(√n + k) time, where k is the size of the returned set.
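Proposition 6.5 rests on a simple lattice fact: every common ancestor of x and y precedes glb(x, y), so CommAnc(x, y) is glb(x, y) together with all its predecessors. The sketch below is our own illustration on an explicit diamond lattice; `toy_glb` and `down` are stand-ins for the paper's O(√n) glb and O(k) predecessor enumeration, not its actual routines.

```python
# Diamond lattice: 0 ≺ 1 ≺ 3 and 0 ≺ 2 ≺ 3, as an explicit relation.
order = {(0, 1), (0, 2), (1, 3), (2, 3), (0, 3)}

def down(v):
    """v together with all its predecessors."""
    return {v} | {u for (u, w) in order if w == v}

def toy_glb(x, y):
    """Greatest lower bound: the common lower bound all others precede."""
    common = down(x) & down(y)
    for g in common:
        if all(c == g or (c, g) in order for c in common):
            return g
    return None

def common_ancestors(x, y):
    """CommAnc(x, y) = glb(x, y) plus every predecessor of glb(x, y)."""
    g = toy_glb(x, y)
    return set() if g is None else down(g)

print(common_ancestors(3, 1))  # prints {0, 1}
```

In the real structure the glb costs O(√n) and listing its predecessors costs O(k), which yields the O(√n + k) bound of the proposition.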
To prove Theorem 1.1, it remains to describe lub(x1, . . . , xk) and glb(x1, . . . , xk). Note that a straightforward application of Proposition 6.4 leads to an O(k√n) worst-case time bound. In order to obtain the desired complexity, we have to underscore some additional properties of the proposed decomposition.
AN EFFICIENT DATA STRUCTURE FOR LATTICE OPERATIONS 1803
Let us analyze only the glb(x1 , . . . , xk ) operation, as lub(x1 , . . . , xk ) can be
dually derived.
Lemma 6.5. If all vertices (x1 , . . . , xk ) belong to the same cluster, then operation
glb(x1 , . . . , xk ) can be performed in O(k) steps.
Proof. The lemma easily follows by exploiting the information added for the glb(x, y) operation. Starting from the internal trees of the k vertices, we first pair them up and compute the ⌈k/2⌉ least upper bounds associated with the pairs of internal trees. Iterating the process on these ⌈k/2⌉ least upper bounds, we find glb(x1, . . . , xk), whenever it exists, in k/2 + k/4 + · · · = O(k) steps overall.
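The halving argument in this proof can be sketched as follows. This is our own illustration, not the paper's code: `pair_meet` abstracts the O(1) pairwise lookup that the augmented structure provides, and in the toy run `min` on a chain of integers plays that role.

```python
def meet_of_many(xs, pair_meet):
    """Meet of k elements via repeated pairwise halving:
    k/2 + k/4 + ... = O(k) pairwise lookups in total."""
    xs = list(xs)
    while len(xs) > 1:
        nxt = [pair_meet(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:          # an odd element carries over to the next round
            nxt.append(xs[-1])
        xs = nxt
    return xs[0]

# On a chain of integers, the meet of a pair is simply the minimum.
print(meet_of_many([5, 3, 8, 1], min))  # prints 1
```

The geometric series of round sizes is what keeps the whole reduction linear in k, exactly as the lemma claims.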
On the other hand, let the vertices (x1, . . . , xk) belong to different clusters. Consider any two of them, say, x1 and x2.
Let ⟨u1, . . . , up⟩ be the sequence of least upper bounds of the sets of common predecessors with respect to the clusters ⟨Clus(c1), . . . , Clus(cp)⟩ (see Lemma 6.4). By construction, any other vertex in the sequence (x1, . . . , xk) satisfies the following property.
Lemma 6.6. Let xi ∈ (x1, . . . , xk). If uj is the greatest vertex in the sequence ⟨u1, . . . , up⟩ such that uj ≺ xi, then either glb(x1, x2, xi) = uj or glb(x1, x2, xi) ∈ IntTree(uj+1, cj+1).
Proposition 6.6. The lub(x1, . . . , xk) and glb(x1, . . . , xk) operations can be implemented in O(√n + k log n)-time.
Proof. First observe that, by Proposition 6.4, we can derive the sequence ⟨u1, . . . , up⟩ in O(√n)-time. Hence, by means of a binary search, in O(k log n)-time it is possible to derive the maximum element uj such that uj ≺ xi for all i, 1 ≤ i ≤ k. Then, by Lemma 6.6, either glb(x1, . . . , xk) = uj or glb(x1, . . . , xk) ∈ IntTree(uj+1, cj+1). In the former case, the proposition is proved. In the latter case, let (v1, . . . , vk) be the internal representative vertices of (x1, . . . , xk) with respect to cluster Clus(cj+1). Now the problem reduces to finding glb(v1, . . . , vk), where (v1, . . . , vk) all belong to the same cluster. The proof follows by Lemma 6.5.
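The binary-search step of this proof can be sketched as follows; this is an illustrative toy under stated assumptions, where `us` is the totally ordered sequence ⟨u1, . . . , up⟩ and `precedes` replaces the paper's constant-time order test. Because the uj form a chain, the index set {j : uj ≺ x} is a prefix of the sequence, so one O(log p) search per query element suffices.

```python
def greatest_common_lower(us, xs, precedes):
    """Greatest element of the chain us preceding every x in xs, or None.
    One binary search per x: O(k log p) calls to the order oracle."""
    best = len(us) - 1
    for x in xs:
        lo, hi = -1, len(us) - 1       # lo: largest index found with us[lo] ≺ x
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if precedes(us[mid], x):   # prefix property: search moves right
                lo = mid
            else:
                hi = mid - 1
        best = min(best, lo)
        if best < 0:                   # some x has no lower bound in the chain
            return None
    return us[best]

# Toy chain 0 ≺ 1 ≺ 2 ≺ 3 under the integer order.
print(greatest_common_lower([0, 1, 2, 3], [5, 2], lambda a, b: a < b))  # prints 1
```

Taking the minimum of the per-element search results yields the greatest uj below all query elements, which is the candidate the proposition then tests against Lemma 6.6.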
By a straightforward application of Lemma 5.5 and Propositions 6.1–6.4 and 6.6,
the proof of the main theorem (Theorem 1.1) is completed.
7. Conclusions and open problems. In this paper, a general technique for the representation of a dag satisfying the lattice property has been presented. This technique, based on a two-level graph decomposition strategy, is very efficient for the resolution of the reachability problem, from both a space and a time complexity point of view. In fact, when m = Ω(n√n), it performs a compression of the given dag. Note that the complexity bound we derive is optimal, as it matches the theoretical lower bound for this problem (see Proposition 1.1).
The data structure proposed can be efficiently used not only for testing the pres-
ence of a path between two given vertices but also for performing a set of basic
operations for this class of graphs: namely, find a path between two vertices whenever
one exists; given two vertices, find all vertices on the directed paths connecting them
in the transitive reduction graph; compute all the successors and/or predecessors of
a given vertex; given two vertices, find the least common ancestor and/or the great-
est common successor; given a set of vertices, find all common ancestors and/or all
common successors.
Furthermore, a stronger result can be derived. In particular, it is possible to represent a partial lattice in space O(n√t), with the same time bounds for all the operations, where t is the minimum number of disjoint chains which partition the element set. It is worth noting that this represents an interesting result, since it provides a tight characterization of the complexity of partial lattices. In fact, based
1804 MAURIZIO TALAMO AND PAOLA VOCCA