
SIAM J. COMPUT.
© 1999 Society for Industrial and Applied Mathematics
Vol. 28, No. 5, pp. 1783–1805

AN EFFICIENT DATA STRUCTURE FOR LATTICE OPERATIONS∗


Downloaded 07/22/12 to 134.68.190.47. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

MAURIZIO TALAMO† AND PAOLA VOCCA‡

Abstract. In this paper, we consider the representation and management of an element set on
which a lattice partial order relation is defined.
In particular, let n be the element set size. We present an O(n√n)-space implicit data structure
for performing the following set of basic operations:
1. Test the presence of an order relation between two given elements, in constant time.
2. Find a path between two elements whenever one exists, in O(l) steps, where l is the path
length.
3. Compute the successors and/or predecessors set of a given element, in O(h) steps, where
h is the size of the returned set.
4. Given two elements, find all elements between them, in time O(k log d), where k is the size
of the returned set and d is the maximum in-degree or out-degree in the transitive reduction of the
order relation.
5. Given two elements, find the least common ancestor and/or the greatest common successor
in O(√n) time.
6. Given k elements, find the least common ancestor and/or the greatest common successor
in O(√n + k log n) time. (Unless stated otherwise, all logarithms are to the base 2.)
The preprocessing time is O(n²). Focusing on the first operation, which is the building block
for all the others, we derive an overall O(n√n) space × time bound, which beats the order-n²
bottleneck representing the present complexity for this problem. Moreover, we will show that the
complexity bounds for the first three operations are optimal with respect to the worst case. Additionally,
a stronger result can be derived. In particular, it is possible to represent a lattice in space
O(n√t), where t is the minimum number of disjoint chains which partition the element set.

Key words. data structure, lattices, reachability, least common ancestors, graph decomposition

AMS subject classifications. 06B99, 68P05, 68P25, 68R10

PII. S0097539794274404

1. Introduction. The study of the efficient representation of partial orders
(posets), from either the space or the query time point of view, has been extensively
tackled in the last few years, as it is a basic problem in many fields of application. For
instance, poset representation is needed when the elements are points in d-dimensional
space over which we wish to perform a dominance test or range query (computational
geometry [16]); when the elements are sets and we want to perform a containment or
intersection query (knowledge bases [1, 2]; object-oriented and semantic data models
[19]); and when the elements are vertices of a directed acyclic graph over which binary
relations are defined (taxonomy, graph traversal, distributed computing [25, 26], etc.)
and we wish to perform the reachability test.
In general, posets are involved in all applications dealing with traversing sets of
items over which an order relation is defined [28, 18, 33, 9].
When the order relation is complex, for example when the order dimension is
proportional to the size of the element set, then these problems cannot be efficiently
∗ Received by the editors September 19, 1994; accepted for publication (in revised form) May 22,

1998; published electronically May 13, 1999. This work was supported by the Italian Authority for
Public Administration and the ESPRIT Basic Research Action "Algorithms and Data Structures"
20244 (ALCOM-IT).
http://www.siam.org/journals/sicomp/28-5/27440.html
† Italian Authority for Public Administration and Dipartimento di Informatica e Sistemistica,

Università di Roma “La Sapienza,” Via Salaria 113, I-00198 Rome, Italy (talamo@dis.uniroma1.it).
‡ Dipartimento di Matematica, Università di Roma “Tor Vergata,” Via della Ricerca Scientifica,

I-00133 Rome, Italy (vocca@axp.mat.uniroma2.it).



solved using the well-known techniques.


In this paper we show that for partial lattices it is possible to efficiently compute
a set of basic operations. In particular, adopting graph theoretic notation, given
a directed acyclic graph (dag) G = (V, E), we present an implicit data structure for
efficiently performing the following operations:
1. reachability(x, y). Given x, y ∈ V , tests the presence of a directed path
from x to y; returns true if such a path exists, false otherwise.
2. path(x, y). Given x, y ∈ V , returns a path from x to y, if at least one such
path exists.
3. succ(x). Given x ∈ V , returns the set of all successors of x.
4. pred(x). Given x ∈ V , returns the set of all predecessors of x.
5. range(x, y). Given x, y ∈ V , returns all vertices in all the directed paths
connecting x to y.
6. lub(x, y). Given x, y ∈ V , returns the least upper bound between x and y.
7. glb(x, y). Given x, y ∈ V , returns the greatest lower bound between x and
y.
8. lub(x1 , . . . , xk ). Given x1 , . . . , xk ∈ V , returns the least upper bound of
x1 , . . . , xk .
9. glb(x1 , . . . , xk ). Given x1 , . . . , xk ∈ V , returns the greatest lower bound of
x1 , . . . , xk .
The idea is to minimize the storage complexity while efficiently performing the
above operations. In particular, the aim is to be able to test reachability in O(1) time.
reachability captures connectivity information and hence is a basic operation for
digraph management.
Without loss of generality, we prove all theorems under the assumption that
G = (V, E) is connected; hence if |V | = n and |E^T | = m, where E^T is the edge set
of the transitive reduction graph, then m ≥ n − 1. In the case of nonconnected dags,
all our results can be applied to each connected component without any change in
the complexity bounds.
Although a lot of work has been done in this field, it is still an open problem
how to represent partial order relations with a worst-case time × space complexity
of o(n²). The difficulty is illustrated by the "naive" representations: explicitly
representing the whole transitive closure G∗, at one extreme, or testing
reachability on G by means of a DFS traversal, for example, at the other. Both require
Ω(n²) time × space in the worst case.
There is an additional reason for the importance of this open problem: under-
standing the complexity of directed graph versus undirected graph traversal. In fact,
there was considerable empirical evidence (in terms of the efficiency of algorithms that
have been discovered) that reachability in directed graphs is "harder" than reachability
in undirected graphs. This has been proven to be substantially true in [4]. For
instance, although directed graphs can be traversed nondeterministically in polyno-
mial time and logarithmic space simultaneously, there is evidence that they cannot be
traversed deterministically in polynomial time and small space simultaneously [31]. In
contrast, undirected s-t connectivity can be performed deterministically in polynomial
time and sublinear space [6], and undirected graphs can be generally traversed
in polynomial time and logarithmic space probabilistically by using a random walk
[5, 10]; this implies similar resource bounds on (nonuniform) deterministic algorithms
[5]. Moreover, using simple techniques, efficient data structures for undirected graphs
have been developed for the dynamic maintenance of several graph properties [14].

The gap in efficiency between directed and undirected graphs for reachability
problem management is mainly due to the nonsymmetry of the reachability
relation. Moreover, the harder case is when the graph is acyclic, in the sense that the
problem for general graphs can be reduced in linear time to the acyclic case. In fact,
we can partition the vertex set of the graph into strongly connected components (e.g.,
using depth-first search) and then shrink the components to form an acyclic graph
(see [3]).
Therefore, efficient solutions to the reachability problem have been found for
special classes of dags [17, 26, 30] exploiting the order dimension property of the
associated posets [13]. In this context, the partial orders considered are those having
a constant order dimension [21]. Informally speaking, a partial order having order
dimension k, with k ≥ 1 a constant value, can be represented using Θ(kn) space,
where n is the element set size, while testing the partial order relation in O(k) time.
Unfortunately, since posets can have nonconstant order dimension, this technique
cannot be extended to general digraphs [24]. In particular, this is true for partial
lattices. Another approach that has been studied, which yields an efficient representation
for interval orders and distributive lattices, is based on representing a poset
by subsets of an n-element set [23, 8].
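The dimension-based representation just mentioned can be sketched as follows. This is an illustration of the folklore realizer encoding, not code from the paper; the element names and the `make_coords`/`leq` helpers are ours:

```python
# Sketch: reachability in a poset of constant order dimension k via a
# realizer, i.e., k linear extensions whose intersection is the order.
# Storage is Theta(kn) positions; a query compares k coordinates: O(k) time.

def make_coords(realizer):
    """realizer: list of k linear extensions (each a list of all elements).
    Returns {element: tuple of its k positions}."""
    pos = {}
    for ext in realizer:
        for i, x in enumerate(ext):
            pos.setdefault(x, []).append(i)
    return {x: tuple(p) for x, p in pos.items()}

def leq(coords, x, y):
    """True iff x precedes y (weakly) in every linear extension."""
    return all(a <= b for a, b in zip(coords[x], coords[y]))

# Example: the dimension-2 "diamond" a < b, a < c, b < d, c < d, b ~ c,
# realized by the two extensions below.
coords = make_coords([["a", "b", "c", "d"], ["a", "c", "b", "d"]])
```

The two incomparable elements b and c are separated because each extension orders them differently, which is exactly what a realizer guarantees.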
Hence, although there are strategies for dealing with restricted classes of dags
optimal for time × space complexity, there is no uniform approach for general directed
graphs. This is due mainly to intrinsic deficiencies of the general schema adopted by
most of the widely accepted strategies for reachability problem resolution. In fact, the
key idea is to decompose a dag G into a collection F of sparse dags Gi representing a
covering of the graph G∗ , the transitive closure of G.
In particular, for any dag G = (V, E), a collection F = {G1 , . . . , Gk } must satisfy
the following property:
(i) ⟨x, y⟩ ∈ G∗ ⇔ ∃Gi ∈ F such that ⟨x, y⟩ ∈ G∗i .
This property is nice for the design of efficient algorithms as it allows us to search
locally only and not in the whole graph. On the other hand, using this approach
presents two main problems.
The first problem is how to bound the overall space complexity. In fact, a trivial
solution to the above approach is to represent a dag G by means of an n-element
collection F, with each element representing the subgraph induced by all vertices
connected to a given vertex. However, it can be easily seen that this approach requires
O(n²) space.
The second problem is how to derive connectivity information in an efficient way
with respect to the time complexity. In fact, a “naive” application of the decom-
position technique, optimal from the space complexity point of view, is based on a
one-element set F, containing the transitive reduction of G. Unfortunately, in this
case, the time complexity is O(m).
In this paper, we present a two-level decomposition strategy, previously intro-
duced in [28, 29] but now extended to deal with a more general set of operations,
which balances time and space complexities.
In particular, we will prove the following.
Theorem 1.1. Let G = (V, E) be a dag satisfying the lattice property, with n = |V |.
An O(n√n)-space implicit data structure exists allowing us to perform the following:
1. reachability(x, y) in time O(1);
2. path(x, y) in time O(l), where l is the path length;
3. succ(x) in time O(h), where h is the size of the returned set;

4. pred(x) in time O(h), where h is the size of the returned set;


5. range(x, y) in time O(k log d), where k is the size of the returned set and d
is the maximum in-degree or out-degree in the transitive reduction graph;
6. lub(x, y) in time O(√n);
7. glb(x, y) in time O(√n);
8. lub(x1 , . . . , xk ) in time O(√n + k log n);
9. glb(x1 , . . . , xk ) in time O(√n + k log n).
Moreover, the preprocessing time is O(n²) and all bounds are worst case.
Results of Theorem 1.1 are a bit surprising. In fact, in some cases, e.g., when
|E| = Ω(n√n), our decomposition technique achieves a compression of G = (V, E). It
is worth noting that the complexity bounds obtained do not break any information-
theoretic lower bound and are optimal with respect to the worst case, as shown in the
following proposition.
Proposition 1.1 (see [22]). Let L(n) be the number of labeled lattices with n
elements. Then

L(n) < α^(n√n + o(n√n)),

where α is a constant (about 6.11343).
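To see why the counting bound is respected, note that any encoding scheme distinguishing all labeled lattices on n elements needs at least log₂ L(n) bits in the worst case; the following back-of-the-envelope computation (ours, not the paper's) makes this explicit:

```latex
% Any data structure distinguishing all labeled lattices on n elements
% must use at least \log_2 L(n) bits in the worst case. By Proposition 1.1,
\log_2 L(n) \;<\; \bigl(n\sqrt{n} + o(n\sqrt{n})\bigr)\log_2\alpha
            \;\approx\; 2.61\, n\sqrt{n} \ \text{bits},
% since \log_2 6.11343 \approx 2.612. An O(n\sqrt{n})-word structure is
% therefore consistent with this counting bound up to the word length.
```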


Furthermore, starting from Theorem 1.1, a stronger result can be derived. In
particular, it is possible to represent a partial lattice in space O(n√t), with the same
time bounds for all the operations of Theorem 1.1, where t is the minimum number of
disjoint chains which partition the element set. This is an important result, since it
provides a tight characterization of the complexity of partial lattices. In fact, by
Dilworth's theorem [12], t is equal to the width of the lattice.
As will be evident in the following, the proposed data structure cannot be straight-
forwardly applied to general digraphs. Therefore, in this case, it could represent a
good heuristic method for testing reachability which takes into account the sparseness
of the graph.
The paper is organized as follows. In section 2, some basic definitions are given;
in section 3, both the basic decomposition strategy and the data structure are pre-
sented; in section 4, using a classification of the vertex set, the final version of the
decomposition strategy is given; in section 5, an extended version of the data struc-
ture is given and the overall space complexity is analyzed; in section 6, time bounds
for all operations of the main theorem are stated; finally, in section 7, future research
is described.
2. Preliminaries. In this section, we briefly describe the notation used and give
some basic definitions useful for the following. More definitions on graphs and posets
can be found in textbooks such as [7, 11, 15].
Directed graphs are denoted G = (V, E), where V is the set of vertices or elements
and E is the set of edges or arcs. Whenever the vertex set is not explicitly defined, it
is denoted Vert(G).
In a directed graph, the in-degree of a vertex x is the number of edges directed
towards x, denoted Indeg(x). Analogously, the out-degree of x (Outdeg(x)) is the
number of edges directed away from x.
A partially ordered set (poset) P = (≺, N ) is an irreflexive, asymmetric, and
transitive relation ≺ on the element set N . We denote by ⪯ the reflexive relation
associated with P. Two elements x, y are incomparable (denoted x ∼ y) if neither
x ⪯ y nor y ⪯ x.
[Figure content omitted: four vertices u, v (bottom) and w, z (top) illustrating the lattice property.]

Fig. 1. Lattice property.

An element z ∈ N is an upper bound of x, y ∈ N if x ⪯ z and y ⪯ z. The element
z is called the least upper bound of x and y, denoted lub(x, y), if z ⪯ w for all upper
bounds w of x and y. The greatest lower bound, denoted glb(x, y), is defined dually.
A lattice is a partial order L = (≺, N ) such that every two x, y ∈ N have both a
least upper bound and a greatest lower bound.
For a given partial order P = (≺, N ), a linear extension of P is a total order
which is consistent with P. Such a linear extension always exists, and the intersection
of all linear extensions of P = (≺, N ) is P = (≺, N ) itself. A minimal set R of linear
extensions of P = (≺, N ) whose intersection is P = (≺, N ) is called a realizer of
P = (≺, N ), and the order dimension of P = (≺, N ) is the minimum cardinality of
any realizer of P = (≺, N ).
Given a poset P = (≺, N ), its st-completion is the poset obtained by adding to
N two elements labeled s and t and extending ≺ with the following order relations:
s ≺ x and x ≺ t ∀x ∈ N .
Posets whose st-completion is a lattice are partial lattices. Other authors refer to
this class of poset as truncated lattices [27].
Given a directed acyclic graph (dag) G = (V, E), the associated partial order is
the poset PG = (≺, V ) such that, for all u, v ∈ V , u ≺ v if and only if ⟨u, v⟩ ∈ E∗,
where E∗ is the edge set of the transitive closure graph G∗ = (V, E∗).
Definition 2.1. A dag G = (V, E) satisfies the lattice property if and only if
the associated partial order is a partial lattice.
From the definition, the following property trivially follows and will be useful in
what follows (see Figure 1).
Proposition 2.1. A dag G = (V, E) satisfies the lattice property if and only if,
for every four vertices u, v, z, w ∈ V , if four directed paths exist, pairwise disjoint
except for at most the extremal vertices, having as endpoints the pairs of vertices ⟨u, w⟩,
⟨u, z⟩, ⟨v, w⟩, and ⟨v, z⟩, then there exist a vertex x and four paths having as endpoints
the pairs of vertices ⟨u, x⟩, ⟨v, x⟩, ⟨x, w⟩, and ⟨x, z⟩.
Obviously, partial lattices always satisfy the above property.
3. Decomposition strategy: Basic version. As discussed in the introduction,
a general problem of any decomposition technique is the choice of the
decomposition criteria. Our key idea is to use a two-stage decomposition, with each
stage having its own decomposition criterion.
In particular, given a dag G = (V, E) satisfying the lattice property, at the first
stage we partition V into a sequence of sets, or clusters, denoted Clus(c), where c ∈ V
is a representative vertex.
At the second stage, for each cluster Clus(c), we choose a suitable collection
of digraphs (double tree or DT (u, c)), representing all the connectivity information
between vertices in Clus(c) and vertices in V − Clus(c).

Each double tree collection associated with a cluster represents the basic element
of the proposed decomposition strategy.
For every two vertices x, y, with x ∈ Clus(c), the corresponding collection
{DT (u, c) | u ∈ Clus(c)} satisfies the following property: x is connected to y in G
if and only if at least one DT (ui , c) ∈ {DT (u, c)} exists such that x is connected to y
in DT (ui , c).
As we will show in section 4, by operating on the cluster size it is possible to bound
the overall space complexity while realizing, with the second-level decomposition, a
constant-time reachability test.
For the sake of simplicity, we first illustrate the decomposition strategy and the
corresponding data structure with no constraint on cluster size.
3.1. Clusters. We now give a formal definition of a cluster and state its main
properties.
Definition 3.1. Given a vertex c ∈ V , let Clus+ (c) = Pred(c) ∪ {c} and
Clus− (c) = Succ(c) ∪ {c}, where Pred(c) and Succ(c) are the sets of predecessors
and successors of c, respectively. A cluster is either Clus+ (c) or Clus− (c) for some c ∈ V .
In the following, when no confusion is possible, we will use Clus(c) to denote
either Clus+ (c) or Clus− (c).
Let v ∈ V − Clus(c). The following lemma shows an important relationship
between two different clusters, Clus(c) and Clus(v).
Lemma 3.1. If Clus+ (c) ∩ Clus+ (v) = I, then lub(I) ∈ I. Dually, if Clus− (c) ∩
Clus− (v) = I, then glb(I) ∈ I.
Proof. By the cluster definition, for any two elements x, y ∈ Clus+ (c), lub(x, y) ∈
Clus+ (c). In fact, assume for contradiction that lub(x, y) ∉ Clus+ (c). Then the
following three conditions simultaneously hold:
(i) x ≺ c,
(ii) y ≺ c,
(iii) lub(x, y) ∼ c.
This contradicts Proposition 2.1.
In order to prove the lemma we have to show that Clus+ (c) ∩ Clus+ (v) does not
have two distinct maximal elements. Assume that two maximal elements
x, y ∈ Clus+ (c) ∩ Clus+ (v) exist, with x ≠ y. Then we have
(i) lub(x, y) ∈ Clus+ (c),
(ii) lub(x, y) ∉ Clus+ (v) ⇒ lub(x, y) ∼ v.
However, this once again contradicts Proposition 2.1. A similar reasoning holds
for Clus− (c) ∩ Clus− (v). The lemma is thus proved by contradiction.
Lemma 3.2. Let G = (V, E) be a dag satisfying the lattice property and Clus(c)
a cluster. Then G0 = (V − Clus(c), E 0 ) is a dag satisfying the lattice property, where
E 0 is the set of edges of the subgraph induced by V − Clus(c).
Proof. The proof is an obvious consequence of the cluster definition.
3.2. Double trees. Given a cluster Clus(c), let us refer to vertices in Clus(c)
and in V − Clus(c) as internal and external vertices, respectively. In particular,
we denote by Ext(Clus(c)) the set of all external vertices connected to at least one
internal vertex.
The problem of maintaining all the connectivity information related to a given
cluster can be split into the following two subproblems: (i) the representation of the
connectivity relationships between internal vertices and (ii) the representation of the
connectivity relationships between internal and external vertices.

[Figure content omitted: schematic of a cluster Clus(c) with representative vertex c, an internal vertex z, an internal vertex w with its double tree DT (w, c), and an internal vertex v with its double tree DT (v, c).]

Fig. 2. Double tree decomposition.

We solve the first problem by computing for each internal vertex u a spanning
tree of the graph induced by the set Pred(u) ∩ Clus(c) or by the set Succ(u) ∩ Clus(c),
depending on whether Clus(c) is a Clus+ (c) or a Clus− (c). Hence, to each internal
vertex u ∈ Clus(c) there is associated a tree rooted at u, the internal tree induced by
u, denoted IntTree(u, c).
For the second problem, by Lemma 3.1, given an external vertex v, the pair
(v, Clus(c)) univocally identifies a vertex u ∈ Clus(c) representing either
lub(Clus+ (c) ∩ Clus+ (v)) or glb(Clus− (c) ∩ Clus− (v)). This implies that, given
a cluster Clus(c), for each external vertex v connected to at least one vertex in Clus(c)
there exists an internal vertex u, the internal representative of the external vertex v,
which univocally identifies the internal tree IntTree(u, c) made up of all internal
vertices connected to v.
Let Ext(u) be the set of external vertices having u as internal representative
vertex. Obviously, we have

Ext(Clus(c)) = ⋃u∈Clus(c) Ext(u).

For each set Ext(u), we compute a spanning tree, rooted at u, of the subgraph
induced by Ext(u). We refer to this tree as the external tree induced by u, denoted
ExtTree(u, c) (see Figure 2). The external trees in {ExtTree(u, c) | u ∈ Clus(c)} have
the nice property of being pairwise disjoint, as shown in the following lemma.
Lemma 3.3. Let v, w ∈ Clus(c), with v ≠ w. Then ExtTree(v, c) ∩ ExtTree(w, c)
= ∅.
Proof. Assume for contradiction that y ∈ ExtTree(v, c) ∩ ExtTree(w, c).
By the external tree definition, y is associated with both v and w, contradicting
the uniqueness of the representative vertex stated in Lemma 3.1.
Definition 3.2. Given a cluster Clus(c) and the two collections {IntTree(u, c)}
and {ExtTree(u, c)} of internal and external trees, for each u ∈ Clus(c) a double tree
is defined as

DT (u, c) = IntTree(u, c) ∪ ExtTree(u, c).



A double tree represents the basic decomposition subgraph and, informally speak-
ing, is the union of an internal tree and the external tree rooted at the same internal
vertex, whenever the internal and external trees exist.
Each double tree DT (u, c) is associated with a partial order having order dimen-
sion 2, because its st-completion is a planar poset with one greatest element and
one least element [32]. The first consequence of the above property is that DT (u, c)
is representable with two linear extensions L1 and L2 ; that is, given two vertices
x, y ∈ DT (u, c), y is reachable from x if and only if x < y in both linear exten-
sions. In particular, two labels (coordinates) (x1 , x2 ) are associated with any vertex
x ∈ DT (u, c), representing the position of x within the first and the second linear
extension, respectively.
The proof of the following proposition can be found in [20].
Proposition 3.1. Given x, y ∈ DT (u, c), reachability(x, y) = true in DT (u, c)
if and only if (x1 , x2 ) < (y1 , y2 ).
From the above proposition Corollary 3.4 easily follows.
Corollary 3.4. Given a double tree DT (u, c) such that |Vert(DT (u, c))| = K,
an O(K)-space implicit data structure exists for performing a reachability test in
O(1) time.
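The coordinate idea behind Corollary 3.4 can be illustrated on a single rooted tree, using preorder/postorder numbering as the two "linear extensions". This is our sketch, not the paper's double-tree construction, which combines two such trees at one root:

```python
# Illustration: label every vertex of a rooted tree with a
# (preorder, postorder) pair; then x reaches y along tree edges iff
# x's interval contains y's. O(K) space after one O(K) traversal,
# O(1) per query.

def number_tree(children, root):
    """children: {v: list of children}. Returns {v: (pre, post)} labels."""
    coords, clock = {}, [0]
    def dfs(v):
        pre = clock[0]; clock[0] += 1
        for w in children.get(v, []):
            dfs(w)
        coords[v] = (pre, clock[0]); clock[0] += 1
    dfs(root)
    return coords

def reaches(coords, x, y):
    """O(1) test: y lies in the subtree rooted at x."""
    (px, qx), (py, qy) = coords[x], coords[y]
    return px <= py and qy <= qx

# Tiny tree: r -> a, r -> b, a -> c
coords = number_tree({"r": ["a", "b"], "a": ["c"]}, "r")
```

The pair (pre, post) plays the role of the coordinates (x1 , x2 ) of Proposition 3.1: a single componentwise comparison answers the query.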
3.3. Basic decomposition strategy. The decomposition strategy we propose
first generates a collection of clusters and then, for each cluster, builds the corre-
sponding decomposition elements (double tree collection).

procedure BasicBuildClusters;
1. begin;
2. C := ∅; {Cluster Collection}
3. while V 6= ∅ do
4. begin;
5. Choose a cluster Clus(c); {either Clus+ (c) or Clus− (c)}
6. C := C ∪ Clus(c);
7. V := V − Clus(c);
8. end;
9. return C;
10. end.

The cluster collection returned is the input to the following procedure which
builds, for each cluster, the associated double trees collection representing the decom-
position elements on which we base our data structure.

procedure BasicDecomposition (C : Cluster Collection);


1. for each Clus(c) ∈ C;
2. begin
3. for each u ∈ Clus(c) Build {DT (u, c)};
4. return Clus(c) and {DT (u, c)};
{Returns the current cluster and double tree collection}
5. end;

Hence, a dag G = (V, E) induces two collections of subgraphs,

C = ⟨Clus(c1 ), . . . , Clus(ck )⟩,
T = ⟨{DT (u1,j , c1 )} , . . . , {DT (uk,j , ck )}⟩ for all ui,j ∈ Clus(ci ),
satisfying the following invariants. Let x ∈ V .
(i) x belongs to one and only one cluster Clus(ci ) (by construction; see line 7 of
the BasicBuildClusters procedure);
(ii) given a cluster Clus(cj ) different from the one to which x belongs according
to (i), x belongs to at most one DT (uj,i , cj ) as an external vertex (Lemma 3.3);
(iii) given a cluster Clus(ci ), each element of the collection {DT (ui,j , ci )} is, by
definition, univocally identified by the element ui,j ∈ Clus(ci ).
To prove the correctness of the proposed strategy we show that the double tree
collection is a covering of the given graph.
Theorem 3.5. Given a dag G = (V, E) satisfying the lattice property and x, y ∈
V , reachability(x, y) = true in G = (V, E) if and only if reachability(x, y) = true
in DT (u, c) for at least one DT (u, c) ∈ T .
Proof. (⇒) First, note that by Lemma 3.2 the graph G0 = (V − Clus(c), E 0 )
still satisfies the lattice property. Let reachability(x, y) = true in G = (V, E). We
show that the decomposition strategy returns a double tree DT (u, c) ∈ T such that
reachability(x, y) = true in DT (u, c). Note that, by construction, x and y each
belong to one and only one cluster. Two different cases are possible, according to
the clusters to which x and y belong.
1. x, y ∈ Clus(c).
If Clus(c) is a Clus+ (c), then line 3 of the BasicDecomposition procedure assures
that x belongs to IntTree(y, c), and hence x ∈ DT (y, c). On the other hand, if
Clus(c) is a Clus− (c), then y ∈ DT (x, c).
2. x ∈ Clus(c1 ) and y ∈ Clus(c2 ).
Without loss of generality, let us suppose that the decomposition algorithm first
generates Clus(c1 ) and then Clus(c2 ). Since y is connected to a vertex in Clus(c1 ), it
has an internal representative vertex, say, w. Hence y ∈ ExtTree(w, c1 ). By
Proposition 2.1, either x ∈ IntTree(w, c1 ) or x = w. Thus, reachability(x, y) =
true in DT (w, c1 ).
(⇐) This part of the proof obviously follows from observing that the decomposi-
tion algorithm does not add any new edge.
3.4. Basic data structure. In this section we briefly describe the basic data
structure for representing a dag satisfying the lattice property and explain how to
perform the reachability operation, since this is the basic operation on which
all the others rely.
From the previously described invariants and from applying Proposition 3.1, we
derive the following simple implicit data structure based on look-up tables (see Fig-
ure 3).
The data structure is composed of one look-up table indexed on elements in V
(data structure A) and two sets of look-up tables (data structures B and C).
Data structure A stores for each vertex the unique identifier of Clus(ci ) to which
it belongs (see invariant (i) above) and its sign (“+” or “−” for Clus+ (c) or Clus− (c),
respectively).
Data structure B is a set of look-up tables, each table associated with a vertex
x ∈ V . For each double tree DT (u, ci ) of the decomposition induced by the cluster
Clus(ci ) to which x belongs, data structure B stores x’s coordinates with respect

[Figure content omitted: the look-up tables of the data structure. Data structure A maps each vertex (e.g., x ∈ Clus(ci ), y ∈ Clus(cj )) to its cluster; data structures B and C store the coordinate pairs, such as (x1 , x2 ) and (y1 , y2 ), of vertices with respect to double trees.]

Fig. 3. Data structure.

to the representation of DT (u, ci ), whenever x belongs to DT (u, ci ); otherwise, it
contains a null value.
Data structure C is a set of look-up tables, each table associated with an element
x ∈ V . For each cluster Clus(ci ) of the cluster collection C, if x ∈ Ext(Clus(ci )),
then data structure C stores the identifier of the double tree associated with Clus(ci ),
to which x belongs as an external element, and stores x’s coordinates with respect to
this double tree representation.
Let us now describe how to perform reachability(x, y). From data structure A
we derive the clusters to which x and y belong. Let x ∈ Clus(ci ) and y ∈ Clus(cj ).
Two cases are possible.
(i) Clus(ci ) = Clus(cj ).
In this case we look in the B tables associated with x and y for their coordinates
with respect to both (1) the double tree rooted at x and (2) the double tree rooted
at y. If, in one of the two double trees, x's coordinates are smaller than y's, then
reachability(x, y) = true; otherwise reachability(x, y) = false.
(ii) Clus(ci ) ≠ Clus(cj ).
In this case, we first look in the C table associated with y for the double tree identifier
corresponding to cluster Clus(ci ). Then we search the B table associated
with x for the coordinates of x with respect to this double tree. If x's coordinates are
smaller than y's, then reachability(x, y) = true; otherwise, we repeat the search,
looking in the C table associated with x for the double tree identifier corresponding
to cluster Clus(cj ), and then searching the B table associated with y for y's
coordinates with respect to this double tree. If x's coordinates are smaller than y's, then
reachability(x, y) = true. If both searches fail, then reachability(x, y) = false
(see Figure 3).
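The case analysis just described can be sketched in code. The table layouts below (cluster identifiers in A, per-double-tree coordinate pairs in B, and per-cluster (root, coordinates) entries in C) are our illustrative assumptions about encodings the paper leaves abstract:

```python
# Sketch of the reachability query over the three look-up tables:
#   A[x]       -> identifier of the unique cluster containing x
#   B[x][u]    -> coordinates of x in the double tree rooted at u
#                 (double trees of x's own cluster), if x occurs in it
#   C[x][clus] -> (root u, coordinates of x) for the unique double tree
#                 of cluster `clus` containing x as an external vertex

def dominated(p, q):
    """Proposition 3.1: x reaches y in a double tree iff x's pair is
    componentwise smaller than y's."""
    return p[0] < q[0] and p[1] < q[1]

def reachability(A, B, C, x, y):
    if A[x] == A[y]:
        # case (i): same cluster; test the double trees rooted at x and y
        return any(dt in B[x] and dt in B[y] and dominated(B[x][dt], B[y][dt])
                   for dt in (x, y))
    # case (ii): is y external to x's cluster?
    hit = C[y].get(A[x])
    if hit is not None:
        dt, ycoords = hit
        if dt in B[x] and dominated(B[x][dt], ycoords):
            return True
    # symmetric direction: is x external to y's cluster?
    hit = C[x].get(A[y])
    if hit is not None:
        dt, xcoords = hit
        if dt in B[y] and dominated(xcoords, B[y][dt]):
            return True
    return False

# Toy instance: path a -> b -> y, clusters {a, b} (id 1) and {y} (id 2);
# double trees are named after their roots.
A = {"a": 1, "b": 1, "y": 2}
B = {"a": {"a": (0, 0), "b": (0, 0)},
     "b": {"a": (1, 1), "b": (1, 1)},
     "y": {"y": (0, 0)}}
C = {"a": {}, "b": {}, "y": {1: ("b", (2, 2))}}
```

Each query performs a constant number of table look-ups and coordinate comparisons, matching the O(1) bound of Lemma 3.6.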
From the above discussion it is possible to state the following lemma.

Lemma 3.6. The reachability operation can be performed in constant time.


Obviously, for general clusters the space complexity is O(n²). Therefore, let us
suppose, for instance, that the cluster collection C satisfies the following condition:

∀ Clus(ci ) ∈ C:  √n/4 ≤ |Clus(ci )| ≤ √n/2.

In this case, it is trivial to show that the overall space occupancy is O(n√n). Unfortunately,
this is a special case and, in general, this condition does not hold. Nevertheless,
as shown in the following, it is possible to find a suitable cluster decomposition allowing
us to keep the space occupancy within the required bound.
4. Decomposition strategy. As shown in section 3, a proper choice of cluster
size permits us to bound the space complexity. We claim that a cluster size between
√n/4 and √n/2 is, in fact, the answer to our problem because this size allows us to
balance what is eliminated in one main iteration and what remains to be considered
(see line 7 of the BasicBuildClusters procedure). When it is not possible to find
clusters with the right size, we group them to form a cluster forest.
4.1. Elements classification. We need some more definitions.
Definition 4.1. A vertex c ∈ V is
1. good if

√n/4 ≤ |Clus+ (c)| ≤ √n/2 or √n/4 ≤ |Clus− (c)| ≤ √n/2;

2. fat if it is not good and if one of the following two conditions holds:
(i) |Clus+ (c)| > √n/2 and ∀x ∈ Clus+ (c), |Clus+ (x)| < √n/4; or
(ii) |Clus− (c)| > √n/2 and ∀x ∈ Clus− (c), |Clus− (x)| < √n/4;
3. thin if it is neither good nor fat and if

|Clus+ (c)| < √n/4 or |Clus− (c)| < √n/4.
We establish the correctness of our approach by first showing that it is always
possible to find one of the above defined elements.
Lemma 4.1. Given a dag G = (V, E) satisfying the lattice property, at least one
good, fat, or thin vertex does exist which can be retrieved in time O(n2 ).
Proof. Let us consider the following strategy. We first visit the graph, searching
for good elements and meanwhile assigning a weight to each node to represent the
size of the clusters (both Clus+ and Clus− ) it induces. Whenever this search fails,
we look for fat vertices.
If there is one node having weight greater than √n/2, then at least one fat vertex
exists. In fact, without loss of generality, let c be a vertex such that |Clus+ (c)| > √n/2.
We search the predecessors set of c for an element x inducing the smallest cluster
satisfying condition |Clus+ (x)| > √n/2. If such a vertex x exists, then x is fat because
there are no good vertices and, by a transitive property, none of its predecessors can
induce a cluster of cardinality greater than √n/2. Otherwise, c is a fat vertex. If there
are no fat vertices, then the vertex with the maximum weight is a thin vertex. Hence,
this strategy always returns at least one element.
For the time complexity, let us consider a data structure for representing the graph
which maintains for each vertex the number of its predecessors and an ordered list of
the number of predecessors of its immediate predecessors. With this data structure,
which can be derived in time O(n²) by recursively visiting G = (V, E), the above
strategy can be implemented in time that is linear in the number of vertices.
4.2. Cluster collection. Our aim is to show that, given a dag G = (V, E), it
is possible to build a cluster collection of clusters induced by either good vertices or
thin vertices.
First we need to describe how to manage fat vertices.
Lemma 4.2. Given a fat vertex c, it is always possible to generate a sequence of
clusters induced by good vertices which covers Clus(c). √
Proof. Without loss of generality, let |Clus+ (c)| > √n/2. By definition, each one
of c's immediate predecessors, ⟨c1 , . . . , ct ⟩, satisfies the following condition:

|Clus+ (ci )| < √n/4 for all 1 ≤ i ≤ t.

We can then group clusters induced by these vertices until the cardinality of each
group is between √n/4 and √n/2.
Let Clus(ci1 , . . . , cik ), where ⟨ci1 , . . . , cik ⟩ ⊂ ⟨c1 , . . . , ct ⟩, be one group of clusters
so obtained.
To prevent Clus(ci1 , . . . , cik ) from violating Lemma 3.1 (an external vertex y
could be related to all ⟨ci1 , . . . , cik ⟩ through the fat vertex c), we add a dummy good
vertex di and the following directed edges (see Figure 4):
(i) ⟨di , c⟩;
(ii) ⟨cij , di ⟩ for 1 ≤ j ≤ k.
By construction, the dag obtained by adding dummy vertices still satisfies the
lattice property. Hence, to each fat vertex c there corresponds a collection {Clus(di )}
of clusters induced by dummy good vertices (see Figure 4) which covers
Clus+ (c).
Fig. 4. Fat node: dummy good vertices d1 , . . . , dl , each with √n/4 ≤ |Vert(Clus(di ))| ≤ √n/2.
The following procedure shows how our strategy chooses the required cluster
collection. First, clusters induced by good vertices and by dummy good vertices
associated with fat vertices are chosen, then clusters induced by thin vertices are
considered. More precisely, we have the following:
procedure BuildClusters;
1. begin;
2. C := ∅; {Cluster Collection}
3. while V 6= ∅ do
4. begin;
5. if ∃c ∈ V s.t. c is good then C := C ∪ Clus(c);
6. else if ∃c ∈ V s.t. c is fat then
7. begin
8. Build the associated good cluster collection {Clus(di )};
9. goto 5;
10. end
11. else
12. begin
13. Choose a thin vertex c s.t. Clus(c) has maximum cardinality;
14. C := C ∪ Clus(c);
15. end;
16. V := V − Clus(c);
17. end;
18. return C;
19. end.
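As a rough Python rendition of the good/thin selection order (the fat-vertex branch, lines 6-10, is elided): `clus` and `classify` are hypothetical callbacks standing in for the cluster computation and for the classification of Definition 4.1.

```python
def build_clusters(V, clus, classify):
    """Sketch of BuildClusters: repeatedly pick a good vertex if one
    exists, otherwise a thin vertex inducing a largest cluster, and
    remove the chosen cluster from V.  Fat vertices are assumed to have
    already been replaced by dummy good vertices.

    clus(c, V)     -> set of vertices of Clus(c) restricted to V
    classify(c, V) -> "good" or "thin"   (Definition 4.1)
    """
    V = set(V)
    C = []                                   # cluster collection
    while V:
        goods = [c for c in V if classify(c, V) == "good"]
        if goods:
            c = min(goods)                   # any good vertex will do
        else:                                # neither good nor fat vertices left
            c = max(V, key=lambda v: len(clus(v, V)))   # largest thin cluster
        chosen = clus(c, V)
        C.append((c, chosen))
        V -= chosen
    return C
```

The returned pairs partition the vertex set, mirroring line 16 of the procedure.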
Let C = ⟨Clus(c1 ), . . . , Clus(cg ), . . . , Clus(cg+t )⟩ be the cluster collection so ob-
tained, where for 1 ≤ i ≤ g, Clus(ci ) is a cluster induced by either actual or dummy
good vertices and for g + 1 ≤ i ≤ g + t, Clus(ci ) is induced by thin vertices.
To obtain the required space complexity, we group clusters induced by thin ver-
tices. We define a cluster forest as any collection of clusters induced by thin vertices.
Let F = ⟨Clus(c1 ), . . . , Clus(ck )⟩ be a cluster forest. We define the external vertex
set of a forest F , denoted Ext(F ), as the union of all the external vertex sets of each
cluster composing the cluster forest; i.e.,
Ext(F ) = ∪_{i=1}^{k} Ext(Clus(ci )).
Note that according to our definitions, F ∩ Ext(F ) ≠ ∅ in general. Therefore, for each
cluster Lemma 3.3 still holds.
The following version of the decomposition procedure generates for each cluster
in C the corresponding double tree collection. Moreover, it groups clusters induced
by thin vertices into cluster forests.
Each forest F = ⟨Clus(c1 ), . . . , Clus(ck )⟩ is generated by choosing clusters from
the cluster collection in nonincreasing order of size until one of the following conditions
holds:
1. |F | ≥ √n/4, or
2. mk ≥ n, where m = |Ext(F )|.
As we will see in section 5, condition 2 allows us to obtain the required space
complexity.
procedure Decomposition (C : Cluster Collection);
1. for each Clus(c) ∈ C;
2. begin
3. for each u ∈ Clus(c) Build {DT (u, c)};
4. return Clus(c) and {DT (u, c)};
5. end;
6. for each Clus(c) ∈ ⟨Clus(cg+1 ), . . . , Clus(cg+t )⟩;
{the sequence is in not increasing order of size}
7. begin
8. F = ∅;
9. k = 0;
10. while |F | < √n/4 and mk < n do
11. begin
12. F = F ∪ Clus(c);
13. k = k + 1; {Number of clusters in a forest}
14. m = |Ext(F )|; {Number of external vertices}
15. next Clus(c);
16. end;
17. return F ; {Returns the current forest}
18. end;
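The forest-grouping loop (lines 6-18) can be sketched as follows; clusters are passed in as precomputed (vertices, externals) set pairs already sorted in nonincreasing order of size, an assumption that sidesteps the actual dag machinery.

```python
from math import isqrt

def build_forests(thin_clusters, n):
    """Group thin-vertex clusters into forests.

    thin_clusters: list of (vertices, externals) set pairs in
    nonincreasing order of |vertices|; n is the total element count.
    A forest grows while |F| < sqrt(n)/4 and m*k < n, where m = |Ext(F)|
    and k is the number of clusters taken so far.
    """
    forests, i = [], 0
    while i < len(thin_clusters):
        F_verts, F_ext, k = set(), set(), 0
        while (i < len(thin_clusters)
               and len(F_verts) < isqrt(n) / 4
               and len(F_ext) * k < n):
            verts, ext = thin_clusters[i]
            F_verts |= verts        # F grows by one cluster
            F_ext |= ext            # m = |Ext(F)| is recomputed
            k += 1
            i += 1
        forests.append((F_verts, F_ext, k))
    return forests
```

Either stopping condition closes the current forest and opens the next one.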
The above decomposition strategy returns a sequence ⟨F1 , . . . , Fg , . . . , Fg+f ⟩ of
clusters and cluster forests, where Fi = Clus(ci ), for 1 ≤ i ≤ g, is a cluster induced
by either actual or dummy good vertices, while for g + 1 ≤ i ≤ g + f , Fi is a cluster forest.
Further, for each cluster the associated double tree collection is produced. Both
collections are then inserted into the data structure. It is trivial to show that, even if
cluster forests are considered, the double tree decomposition satisfies Theorem 3.5.
5. Data structure and space complexity. The clusters and cluster forests
collection and the corresponding double tree decomposition returned by the Decompos-
ition procedure satisfy the following invariants. Let x ∈ V :
(i) x belongs to one and only one Fi , where 1 ≤ i ≤ g + f ;
(ii) x belongs to one and only one cluster Clus(ci,j );
(iii) given a cluster Clus(cj,l ) different from the one to which x belongs according
to (ii), at most one u ∈ Clus(cj,l ) exists such that x ∈ DT (u, cj,l );
(iv) given a cluster Clus(ci,j ), each element of the collection {DT (u, ci,j )} is, by
definition, univocally identified by the element u ∈ Clus(ci,j ).
To represent cluster forests, we modify the basic data structure described in sec-
tion 3.3 to take into account the double indirection between a forest and its clusters
(see Figure 5).
The new data structure C is again a set of look-up tables, each table associated
with a vertex x ∈ V . For each forest Fi , if x ∈ Ext(Fi ), then data structure C main-
tains the identifier of a fourth kind of table, D, which stores connectivity information
between x and Fi . If x is not connected to Fi , then it stores a null value.
Data structure D is a set of look-up tables, each table associated with a vertex x
and a cluster forest Fi . Table D(Fi ) exists if and only if x ∈ Ext(Fi ). For each cluster
Clus(cij ) in the cluster forest Fi , the corresponding field in the look-up table D(Fi )
stores the identifier of the unique double tree associated with Clus(cij ) to which x
belongs as an external vertex and stores x’s coordinates with respect to this double
tree representation.

Fig. 5. Extended data structure: tables A, B, C, and D(Fi ).
Data structure A stores, for each vertex x ∈ V , the identifiers of both the cluster
and forest to which it belongs, whenever they are different, and the cluster sign. Data
structure B is the same as in section 3.3.
For the reachability test, it is easy to extend the basic strategy described in
section 3.3 for the new data structure.
We now analyze the space complexity. With reference to the clusters and cluster
forests sequence ⟨F1 , . . . , Fg , . . . , Fg+f ⟩, by construction, the subsequence ⟨F1 , . . . , Fg ⟩,
together with the corresponding double tree decomposition, requires O(n√n)-space.
Let us now analyze the subsequence ⟨Fg+1 , . . . , Fg+f ⟩ of cluster forests induced
by thin vertices.
Recall that forests of clusters are generated choosing clusters in nonincreasing order
of size until one of the following conditions holds:
1. |F | ≥ √n/4, or
2. mk ≥ n, where m = |Ext(F )|.
If we denote mi = |Ext(Fi )|, then the overall space complexity of the D data
structure is O(∑_{i=g+1}^{g+f} mi ki ), since only vertices related to a cluster forest have the
corresponding D look-up table, each table of size O(ki ). Hence, the second condition
is used to bound each term of the summation.
We now show that the decomposition strategy returns a number of cluster forests
less than √n.
Obviously, if it is always possible to generate a forest satisfying both conditions,
then the space complexity of the overall data structure is O(n√n). Unfortunately,
the second condition could prevent us from generating an O(√n) collection of cluster
forests; that is, each cluster forest could be of size less than √n/4. The following
technical lemmas show how to manage this case.
First, it is important to underscore one property, shown in Lemma 5.1, of cluster
forests useful for the following proofs.
Lemma 5.1. Let F = ⟨Clus(c1 ), . . . , Clus(ck )⟩ be a cluster forest, where c1 , . . . , ck
are thin vertices; then

|Clus(ci )| = t ⇒ |Ext(Clus(ci ))| ≤ t² ∀i ∈ {1, . . . , k}.
Proof. The proof easily follows observing that, by construction, forests are gen-
erated only when there are neither good nor fat vertices. Moreover, each cluster is
added to a forest in nonincreasing order of size.
Without loss of generality, we denote the size of a cluster Clus(cij ) ∈ Fi as follows:

(1) |Clus(cij )| = n^(1/2 − ∑_{p=1}^{j} δip),

where δip ≥ 0 ∀ p ∈ {1, . . . , j}.
In fact, if the ordered sequence of clusters ⟨Clus(ci1 ), . . . , Clus(ciki )⟩ composing
a forest is generated by the Decomposition procedure above, then the corresponding
sequence of cluster sizes is monotone and nonincreasing, and by hypothesis each size
is less than √n/4. Moreover,

(2) |Fi | = ∑_{j=1}^{ki} n^(1/2 − ∑_{p=1}^{j} δip),
where ⟨δi1 , . . . , δiki ⟩ is a sequence of nonnegative real values.
Let us suppose that the ith generated cluster forest satisfies the conditions
1. |Fi | < √n/4,
2. mi ki ≥ n,
and let Clus(ci,ki ) be the last cluster chosen. Then we have the following.
Lemma 5.2. |Clus(ci,ki )| < n^(1/2 − δi1) /4, where n^(1/2 − δi1) = |Clus(ci,1 )|.
Proof. From Lemma 5.1, if |Clus(cij )| = tij , then the number of external vertices
related to Clus(cij ) is at most tij². As a consequence, we have

(3) mi ki ≤ ki ∑_{j=1}^{ki} (tij )² = ki ∑_{j=1}^{ki} n^(2(1/2 − ∑_{p=1}^{j} δip))

(4) = ki ∑_{j=1}^{ki} n^(1 − 2 ∑_{p=1}^{j} δip) ≤ ki ∑_{j=1}^{ki} n^(1 − 2δi1) = ki² n^(1 − 2δi1).

Hence, by condition mi ki ≥ n, we get

(5) ki² n^(−2δi1) ≥ 1 ⟹ ki ≥ n^(δi1) > 4.

The last inequality follows from a forest termination condition; i.e.,

|Fi | = ∑_{j=1}^{ki} n^(1/2 − ∑_{p=1}^{j} δip) < √n/4;

hence, n^(−δi1) < 1/4.
Moreover,

(6) √n/4 > |Fi | = ∑_{j=1}^{ki} n^(1/2 − ∑_{p=1}^{j} δip) ≥ ∑_{j=1}^{ki} n^(1/2 − ∑_{p=1}^{ki} δip) = ki n^(1/2 − ∑_{p=1}^{ki} δip)

and, from relation (5),

(7) ki n^(1/2 − ∑_{p=1}^{ki} δip) ≥ n^(δi1) n^(1/2 − ∑_{p=1}^{ki} δip) = n^(1/2 − ∑_{p=2}^{ki} δip).

Hence,

n^(1/2 − ∑_{p=2}^{ki} δip) < √n/4.

Dividing both terms by n^(δi1), we have

(8) n^(1/2 − ∑_{p=1}^{ki} δip) < n^(1/2 − δi1) /4.
The left-hand side of inequality (8) is, by definition, the size of Clus(ci,ki ).
With reference to the cluster forests sequence ⟨Fg+1 , . . . , Fg+f ⟩, let g + 1 ≤ i <
g + f . We have the following.
Lemma 5.3. |Clus(ci+1,1 )| < n^(1/2 − δi1) /4.
Proof. The proof follows trivially from Lemma 5.2, observing that clusters are
taken in not increasing order of size.
From the above technical lemmas, Lemma 5.4 easily follows.
Lemma 5.4. The decomposition strategy returns an O(log n) collection ⟨Fg+1 , . . . ,
Fg+f ⟩ of cluster forests.
From Lemma 5.4, we have the following.
Theorem 5.5. The data structure for the representation of dags satisfying the
lattice property has an O(n√n)-space complexity and allows us to perform the reach-
ability operation in constant time.
6. Queries implementation. In this section, we describe algorithms for the
operations introduced in Theorem 1.1. A sketch of the algorithm for the reachability
operation has already been given in section 3.4, and the time complexity has been
proven in Theorem 5.5.
Let us now describe the path operation. For this operation we need to augment
our data structure. First note that each double tree is, by construction, the union
of two rooted trees. Hence, for each vertex x and for each double tree to which it
belongs, it is possible to associate one and only one vertex u (the parent in the tree)
which is on the path from x to the vertex inducing the double tree. We then store
in the data structures B and D u’s coordinates with respect to this double tree. The
path operation is now straightforward. While testing the presence of a directed path
between x and y using the reachability(x, y) test, we find the double tree to which
they both belong, and then we look at the coordinates of the parents of x and y.
These vertices are the immediate successors of x and y on the path from x to y. Then
we recursively repeat the search starting from these vertices until we reach the root
of the double tree.
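A minimal sketch of this walk, under the simplifying assumption that x lies in one rooted tree and y in the other tree of the same double tree, with `parent` standing in for the stored parent coordinates:

```python
def tree_path(x, y, parent):
    """Return the path x -> ... -> root -> ... -> y inside one double tree.

    `parent` maps each vertex to its parent toward the double-tree root
    (None at the root).  Runs in O(l) steps for a path of length l.
    """
    up = [x]
    while parent[up[-1]] is not None:       # climb from x to the root
        up.append(parent[up[-1]])
    down = [y]
    while parent[down[-1]] is not None:     # climb from y to the root
        down.append(parent[down[-1]])
    assert up[-1] == down[-1], "x and y must share the double-tree root"
    return up + down[-2::-1]                # root listed once; y side reversed
```

Each vertex on the path is touched once, matching the O(l) bound.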
In this paper, we will not furnish the full algorithm which, although simple, is
quite lengthy. The net result is the following.
Proposition 6.1. The path(x,y) operation can be implemented in O(l) time,
where l is the path length.
For implementing the pred(x) and succ(x) operations, we add new data struc-
tures to those described in section 3.4. In particular, for each double tree DT (u, c)
of the double tree decomposition T , we maintain the corresponding internal tree
IntT ree(u, c). Moreover, for each vertex x ∈ V , let Clus(c) be the cluster to which
x belongs; if Clus(c) is a Clus+ (c), then we maintain the set of x’s successors in
Clus+ (c); otherwise, we maintain the set of x’s predecessors. Let us denote the first
set Succ(x, c) and the second P red(x, c). Finally, for each vertex x ∈ V we store the
set of double tree identifiers to which each vertex is connected as an external vertex.
This information eliminates the need to visit all the D tables. By construction, the
space complexity of the new data structure is still O(n√n).
Let us now introduce the pred(x) operation. For the succ(x) operation, the
algorithm is similar.
procedure pred (x)
begin;
1. if A[x].sign = "+"
2. then report (IntTree(x))
3. else report (Pred(x,c));
4. for each double tree to which x is connected as an external vertex
5. begin;
6. Let u be the internal representative vertex of x;
7. report (IntTree(u));
8. end;
end;
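In Python, the same procedure reads as follows; all table names (`A_sign`, `int_tree`, `pred_in_cluster`, `ext_trees`, `rep`) are hypothetical stand-ins for the structures described above, and `int_tree[v]` is assumed to contain v together with its in-cluster predecessors.

```python
def pred(x, A_sign, int_tree, pred_in_cluster, ext_trees, rep):
    """Collect all predecessors of x.

    A_sign[x]          : sign of x's cluster ('+' or '-')
    int_tree[v]        : v plus its predecessors inside v's own cluster
    pred_in_cluster[x] : Pred(x, c) when x's cluster has sign '-'
    ext_trees[x]       : double trees to which x is external
    rep[(x, t)]        : internal representative vertex of x in tree t
    """
    result = set(int_tree[x]) if A_sign[x] == "+" else set(pred_in_cluster[x])
    for t in ext_trees[x]:
        u = rep[(x, t)]            # x's representative inside that cluster
        result |= int_tree[u]      # everything below u precedes x
    result.discard(x)              # report proper predecessors only
    return result
```

Only actual predecessors are visited, which is the source of the O(k) bound of Proposition 6.2.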
Proposition 6.2. The pred(x) and succ(x) operations require O(k) time,
where k is the size of the returned set.
Proof. This follows observing that only actual predecessors (successors) are vis-
ited.
For the range(x, y) operation, we have the following result.
Lemma 6.1. Let x, y ∈ V . If reachability(x, y) = true, then one and only
one double tree DT (u, ci ) exists such that range(x, y) ⊆ V ert(DT (u, ci )).
Proof. Let x ∈ Clus(c) and y ∈ Clus(c′ ). Let us first suppose that Clus(c) has
been generated before Clus(c′ ).
Two cases are possible.
1. Clus(c) is a Clus+ (c).
Let u be the internal representative vertex of y with respect to Clus(c). We claim
that DT (u, c) is the required double tree.
Let z ∈ range(x, y) and z ∈ Clus(c″ ). Clus(c″ ) cannot be generated before Clus(c);
otherwise, by cluster definition and according to the Clus(c″ ) sign, either x ∈ Clus(c″ )
or y ∈ Clus(c″ ).
Hence, Clus(c″ ) has been generated after Clus(c) and z ∈ Ext(Clus(c)). Since
x ≺ z ≺ y, by double tree definition, x, y, z ∈ V ert(DT (u, c)).
2. Clus(c) is a Clus− (c).
By cluster definition, we have Clus(c) = Clus(c′ ). In this case, we want to prove that
DT (x, c) is the required double tree. Since x ≺ y, then y ∈ V ert(DT (x, c)).
Let z ∈ range(x, y) and z ∈ Clus(c″ ). As before, Clus(c″ ) cannot be generated be-
fore Clus(c). Moreover, in this case, Clus(c″ ) cannot be generated after Clus(c) either;
then Clus(c) = Clus(c″ ) and z ∈ V ert(DT (x, c)).
In a similar way, it is possible to prove that, if Clus(c′ ) has been generated before
Clus(c), then either range(x, y) ⊆ V ert(DT (u, c′ )), where u is the internal repre-
sentative vertex of x with respect to Clus(c′ ), or range(x, y) ⊆ V ert(DT (y, c′ )),
depending on the Clus(c′ ) sign.
Proposition 6.3. The range(x, y) operation can be implemented in O(k log d)
time, where k is the size of the returned set and d is the maximum vertex degree (either
in-degree or out-degree) of the transitive reduction graph GT = (V, E T ).
Proof. Representing all double trees using the implicit data structure of Corol-
lary 3.4, we can first identify in constant time the unique double tree that contains
range(x, y), and we can then report range(x, y) by visiting this double tree within
the required time bound.
It is important to emphasize that, by means of the range(x, y) and path oper-
ations, it is possible to report the transitive reduction subgraph having x as a source
and y as a sink.
Let us now describe the glb(x, y) and lub(x, y) operations. We consider only
the glb(x, y), as results obtained can be stated, mutatis mutandis, for lub(x, y).
Let x ∈ Clus(ch ) and y ∈ Clus(ck ) and let {Clus(c1 ), . . . , Clus(cp )} be the
subset of clusters of the cluster collection generated by the BuildCluster procedure,
having as internal vertices at least one predecessor of both x and y. Moreover, let
{x1 , . . . , xp } and {y1 , . . . , yp } be the corresponding internal representative vertices of
x and y, respectively.
Using an argument similar to the one used to prove Lemma 3.1, the following two
lemmas can be established.
Lemma 6.2. If Clus(ci ), for any 1 ≤ i ≤ p, is a Clus+ (ci ) and
IntT ree(xi , ci ) ∩ IntT ree(yi , ci ) = I, then LU B(I) ∈ I.

Lemma 6.3. If Clus(ci ), for any 1 ≤ i ≤ p, is a Clus− (ci ) and
P red(xi , ci ) ∩ P red(yi , ci ) = I, then LU B(I) ∈ I.

Let {u1 , . . . , up } be the corresponding set of least upper bounds. We have the
following.
Lemma 6.4. The partial order associated with {u1 , . . . , up } is a total order.
Proof. To prove the lemma we show that there exists a permutation ⟨ui1 , . . . , uip ⟩
such that uij ≺ uim for all i1 ≤ ij < im ≤ ip .
Two cases are possible.
1. Either Clus(ch ) is a Clus+ (ch ) or Clus(ck ) is a Clus+ (ck ).
Let the clusters in ⟨Clus(c1 ), . . . , Clus(cp )⟩ be ordered according to the generation
order.
If Clus(ch ) is a Clus+ (ch ), then p ≤ h. In fact, by cluster definition, once Clus+ (ch )
has been taken, then all x’s predecessors have been considered. Analogously, if
Clus(ck ) is a Clus+ (ck ), then p ≤ k.
Moreover, all Clus(ci ) in the sequence are Clus+ (ci ). In fact, suppose for contra-
diction that Clus(cj ), for 1 ≤ j ≤ p, is a Clus− (cj ). If Clus(ch ) is a Clus+ (ch ),
then x ∈ Clus− (cj ), or, if Clus(ck ) is a Clus+ (ck ), then y ∈ Clus− (cj ), but this is a
contradiction.
We claim that the sequence of least upper bounds ⟨u1 , . . . , up ⟩, ordered according to
the order in which the corresponding clusters are generated, is the required permuta-
tion. Let us assume that two values 1 ≤ i < j ≤ p exist such that either ui ∼ uj or
uj ≺ ui . In the first case the following relations hold simultaneously:
(i) ui ≺ x and ui ≺ y;
(ii) uj ≺ x and uj ≺ y;
(iii) ui ∼ uj .
However, this contradicts the lattice property.
In the second case, since by hypothesis Clus(ci ) has been generated before cluster
Clus(cj ), we have uj ∈ Clus(ci ). The proof follows by contradiction.
2. Clus(ch ) is a Clus− (ch ) and Clus(ck ) is a Clus− (ck ).
Let us consider the ordered sequence of clusters

⟨Clus+ (ci1 ), . . . , Clus+ (cim ), Clus− (cim+1 ), . . . , Clus− (cip )⟩

defined as follows.
The subsequence ⟨Clus+ (ci1 ), . . . , Clus+ (cim )⟩ is composed of all clusters in {Clus(c1 ),
. . . , Clus(cp )} having sign “+” and ordered according to the generation order. The
subsequence ⟨Clus− (cim+1 ), . . . , Clus− (cip )⟩ is made up of all clusters in {Clus(c1 ),
. . . , Clus(cp )} having sign “−”, ordered according to the inverse generation order.
We claim that the corresponding sequence ⟨ui1 , . . . , uim , uim+1 , . . . , uip ⟩ is the desired
permutation.
By an argument similar to the one used in the previous case, it is possible to prove that
the two subsequences ⟨ui1 , . . . , uim ⟩ and ⟨uim+1 , . . . , uip ⟩ are totally ordered. We have
to prove that for all uij ∈ ⟨ui1 , . . . , uim ⟩ and for all uil ∈ ⟨uim+1 , . . . , uip ⟩, uij ≺ uil .
By the lattice property either uij ≺ uil or uil ≺ uij . In the first case the claim is
proved. Suppose for contradiction that uil ≺ uij ; then, by cluster definition, uil ∈
Clus+ (cij ). This once again contradicts the hypothesis.
This completes the proof.
Using the above lemma, it is possible to state the following proposition.
Proposition 6.4. The glb(x, y) and lub(x, y) operations can be implemented
in O(√n)-time.
Proof. To implement these operations we augment the data structure as follows.
For each internal tree of a cluster, we maintain the set of least upper bounds (greatest
lower bounds) of the sets derived from the intersection with all the other internal
trees in the same cluster. Analogously, we maintain for each vertex x the set of
least upper bounds (greatest lower bounds) of the sets derived from the intersection
between P red(x, c) (Succ(x, c)) and P red(y, c) (Succ(y, c)) for all y in Clus(c).
By construction, the overall space occupancy is still O(n√n).
Given x and y, using data structures D and B, in O(√n)-time, it is possible to
find the set ⟨u1 , . . . , up ⟩. The maximal element of this sequence is the greatest lower
bound of x and y.
It is worth noting that, by means of lub(x, y), glb(x, y) and succ(x), pred(x)
operations it is possible to easily implement the following.
1. CommAnc(x, y). Given x, y ∈ V , returns the set of all common ancestors of
x and y.
2. CommSucc(x, y). Given x, y ∈ V , returns the set of all common successors
of x and y.
In particular, from Propositions 6.2 and 6.4, Proposition 6.5 is derived.
Proposition 6.5. The CommAnc(x, y) and CommSucc(x, y) operations can
be implemented in O(√n + k) time, where k is the size of the returned set.
To prove Theorem 1.1, it remains to describe lub(x1 , . . . , xk ) and glb(x1 , . . . ,
xk ). Note that a straightforward application of Proposition 6.4 leads to an O(k√n)
worst-case time bound. In order to obtain the desired complexity, we have to under-
score some additional properties of the proposed decomposition.
Let us analyze only the glb(x1 , . . . , xk ) operation, as lub(x1 , . . . , xk ) can be
dually derived.
Lemma 6.5. If all vertices (x1 , . . . , xk ) belong to the same cluster, then operation
glb(x1 , . . . , xk ) can be performed in O(k) steps.
Proof. The lemma easily follows by exploiting the information added for the
glb(x, y) operation. Starting from the internal trees of the k vertices, we can first
compute the k/2 least upper bounds associated with pairs of internal trees. Iterating
the process on the k/2 least upper bounds, we find the glb(x1 , . . . , xk ), whenever it
exists.
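The halving process can be sketched as follows; `glb2` stands for the stored two-element greatest lower bound, so the total number of `glb2` calls is at most k − 1, giving the O(k) bound (tested here with `min`, the meet of a totally ordered toy lattice).

```python
def glb_same_cluster(vertices, glb2):
    """Iterated pairwise reduction: k -> k/2 -> k/4 -> ... -> 1 elements."""
    level = list(vertices)
    while len(level) > 1:
        nxt = [glb2(level[i], level[i + 1])      # glb of each adjacent pair
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:                        # odd element carries over
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Since the level sizes form a geometric series k + k/2 + k/4 + ..., the total work is O(k).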
On the other hand, let the vertices (x1 , . . . , xk ) belong to different clusters. Con-
sider any two vertices, say, x1 and x2 .
Let ⟨u1 , . . . , up ⟩ be the sequence of least upper bounds of the set of common
predecessors with respect to clusters ⟨Clus(c1 ), . . . , Clus(cp )⟩ (see Lemma 6.4). By
construction, any other vertex in the sequence (x1 , . . . , xk ) satisfies the following prop-
erty.
Lemma 6.6. Let xi ∈ (x1 , . . . , xk ). If uj is the greatest vertex in the sequence
⟨u1 , . . . , up ⟩ such that uj ≺ xi , then either glb(x1 , x2 , xi ) = uj or glb(x1 , x2 , xi ) ∈
IntT ree(uj+1 , cj+1 ).
Proposition 6.6. The lub(x1 , . . . , xk ) and glb(x1 , . . . , xk ) operations can be
implemented in O(√n + k log n)-time.
Proof. First observe that, by Proposition 6.4, we can derive the sequence ⟨u1 , . . . , up ⟩
in O(√n)-time. Hence, by means of a binary search, in O(k log n)-time, it is possi-
ble to derive the maximum element uj such that uj ≺ xi for 1 < i < k. Then, by
Lemma 6.6 either glb(x1 , . . . , xk ) = uj or glb(x1 , . . . , xk ) ∈ IntT ree(uj+1 , cj+1 ).
In the former case, the proposition is proved. In the latter case, let (v1 , . . . , vk ) be
the internal representative vertices of (x1 , . . . , xk ) with respect to cluster Clus(cj+1 ).
Now the problem reduces to finding the glb(v1 , . . . , vk ), where (v1 , . . . , vk ) belong to
the same cluster. The proof follows by Lemma 6.5.
By a straightforward application of Theorem 5.5 and Propositions 6.1–6.4 and 6.6,
the proof of the main theorem (Theorem 1.1) is completed.
7. Conclusions and open problems. In this paper, a general technique for
the representation of a dag satisfying the lattice property has been presented. This
technique, based on a two-level graph decomposition strategy, is very efficient for the
reachability problem resolution, from either a space or a time complexity point of
view. In fact, when m = Ω(n√n), it performs a compression of the given dag. Note
that the complexity bound we derive is optimal, as it matches the theoretical lower
bound for this problem (see Proposition 1.1).
The data structure proposed can be efficiently used not only for testing the pres-
ence of a path between two given vertices but also for performing a set of basic
operations for this class of graphs: namely, find a path between two vertices whenever
one exists; given two vertices, find all vertices on the directed paths connecting them
in the transitive reduction graph; compute all the successors and/or predecessors of
a given vertex; given two vertices, find the least common ancestor and/or the great-
est common successor; given a set of vertices, find all common ancestors and/or all
common successors.
Furthermore, a stronger result can be derived. In particular, it is possible to
represent a partial lattice in space O(n√t) with the same time bound for all the
operations, where t is the minimum number of disjoint chains which partition the
element set. It is worth noting that this represents an interesting result, since it
provides a tight characterization of the complexity of partial lattices. In fact, based
on Dilworth’s theorem [12], t is equal to the width of the lattice.
Moreover, the graph decomposition strategy introduced in this paper has been
implemented. Performance evaluations indicate that this strategy can provide a stor-
age occupation substantially less than O(n√n)-space while maintaining efficiency in
reachability query resolution.
Note that the class of graphs under investigation has applications in many fields
such as computational geometry, object oriented programming, and distributed sys-
tems.
An interesting research direction is to apply our approach for coping with sec-
ondary memory management problems. The digraphs representation introduced could
represent a powerful clustering technique for minimizing the total number of accessed
pages. Our conjecture is supported by results in [16], where two-dimensional lattices
are used to produce efficient data structures on paged memory for the half-plane
search in two dimensions.
A natural direction for further work is to adopt the same strategy for general directed graphs. The main obstacle is that such graphs usually violate Lemma 3.1: given a general dag G = (V, E) and a cluster Clus(c), let u be a vertex in G − Clus(c); if Clus+(c) ∩ Clus+(u) ≠ ∅, then the intersection can have more than one maximal element and, dually, if Clus−(c) ∩ Clus−(u) ≠ ∅, more than one minimal element. In such a case a constant time reachability test cannot be guaranteed: since there is no longer a one-to-one relation between a given cluster and an external vertex, it is not possible, given a pair of vertices ⟨x, y⟩, to univocally identify a double tree containing both x and y.
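The violated condition can be checked directly. The sketch below (our illustration, not the paper’s implementation; we assume Clus+(v) is modeled simply as the set of vertices reachable from v, and the function names are ours) reports whether a pair of vertices exhibits an intersection of successor sets with more than one maximal element, the situation that breaks Lemma 3.1 on general dags.

```python
def successors(adj, v):
    """Vertices reachable from v (excluding v) by iterative DFS."""
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def maximal_elements(adj, subset):
    """Elements of `subset` that reach no other element of `subset`,
    i.e. the maximal elements of the induced suborder."""
    reach = {v: successors(adj, v) for v in subset}
    return {v for v in subset if not (reach[v] & subset)}

def violates_single_max(adj, c, u):
    """True when the intersection of the successor sets of c and u is
    nonempty but has more than one maximal element -- the case in which
    a constant time reachability test cannot be guaranteed."""
    inter = successors(adj, c) & successors(adj, u)
    return len(inter) > 0 and len(maximal_elements(adj, inter)) > 1
```

For instance, with adj = {0: [2, 3], 1: [2, 3]} the vertices 0 and 1 share the two incomparable successors 2 and 3, so the check reports a violation; adding a common top element 4 above both restores a unique maximal element.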
Nevertheless, the proposed decomposition strategy represents, in this case, a
heuristic method for testing reachability which takes into account the sparseness of
the graph.
Acknowledgments. The authors wish to thank Prof. Jaroslav Nešetřil for many
interesting discussions on the topics presented here and Prof. Rao Kosaraju and the
anonymous referee for their detailed comments and suggestions.

REFERENCES

[1] R. Agrawal, Alpha: An extension of relational algebra to express a class of recursive queries,
IEEE Trans. Software Engrg., 14 (1988), pp. 879–885.
[2] R. Agrawal, A. Borgida, and H. V. Jagadish, Efficient management of transitive relation-
ship in large data and knowledge bases, in Proc. ACM Internat. Conf. on Management of
Data, ACM, New York, 1989, pp. 253–262.
[3] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, Design and Analysis of Computer Algorithms,
Addison-Wesley, Reading, MA, 1974.
[4] M. Ajtai and R. Fagin, Reachability is harder for directed than for undirected finite graphs,
J. Symbolic Logic, 55 (1990), pp. 113–150.
[5] R. Aleliunas, R. M. Karp, R. J. Lipton, L. Lovász, and C. Rackoff, Random walks,
universal traversal sequences, and the complexity of maze problems, in Proceedings 20th
Annual IEEE Symposium on Foundations of Computer Science, San Juan, Puerto Rico,
IEEE Computer Society Press, Los Alamitos, CA, 1979, pp. 218–223.
[6] G. Barnes and W. L. Ruzzo, Undirected s–t connectivity in polynomial and sublinear space,
Comput. Complex., 6 (1996), pp. 1–28.
[7] G. Birkhoff, Lattice Theory, American Mathematical Society Colloquium Publications 25,
AMS, Providence, RI, 1979.
[8] G. Birkhoff and O. Frink, Representation of lattices by sets, Trans. AMS, 64 (1948), pp. 299–
316.
[9] J. Biskup and H. Stiefeling, Evaluation of upper bounds and least nodes as database operations, in Lecture Notes in Comput. Sci. 730, Springer-Verlag, New York, 1993, pp. 197–214.
[10] A. Borodin, S. A. Cook, P. W. Dymond, W. L. Ruzzo, and M. Tompa, Two applications of
inductive counting for complementation problems, SIAM J. Comput., 18 (1989), pp. 559–
578.
[11] B. A. Davey and H. A. Priestley, Introduction to Lattices and Order, Cambridge University
Press, Cambridge, 1990.
[12] R. Dilworth, A decomposition theorem for partially ordered sets, Ann. Math., 51 (1950),
pp. 161–165.
[13] B. Dushnik and E. Miller, Partially ordered sets, Amer. J. Math., 63 (1941), pp. 600–610.
[14] D. Eppstein, Z. Galil, G. Italiano, and A. Nissenzweig, Sparsification—a technique for
speeding up dynamic graph algorithms, J. ACM, 44 (1997), pp. 669–696.
[15] S. Even, Graph Algorithms, Computer Science Press, Rockville, MD, 1979.
[16] P. G. Franciosa and M. Talamo, Orders, k-sets and fast halfplane search on paged memory,
in Orders, Algorithms, and Applications, International Workshop ORDAL ’94, Lecture
Notes in Comput. Sci. 831, V. Bouchitté and M. Morvan, eds., Springer-Verlag, Berlin,
1994, pp. 117–127.
[17] G. Gambosi, M. Protasi, and M. Talamo, An efficient implicit data structure for relation
testing and searching in partially ordered sets, BIT, 33 (1993), pp. 29–45.
[18] M. Habib and L. Nourine, Bit-vector encoding for partially ordered sets, in Orders, Algo-
rithms, and Applications, International Workshop ORDAL ’94, V. Bouchitté and M. Mor-
van, eds., Lecture Notes in Comput. Sci. 831, Springer-Verlag, New York, 1994, pp. 1–12.
[19] H. V. Jagadish, Incorporating hierarchy in a relational model of data, SIGMOD Record (ACM
Special Interest Group on Management of Data), 18 (1989), pp. 78–87.
[20] T. Kameda, On the vector representation of the reachability in planar directed graphs, Inform.
Process. Lett., 3 (1974/75), pp. 75–77.
[21] D. Kelly, On the dimension of partially ordered sets, Discrete Math., 35 (1981), pp. 135–156.
[22] D. J. Kleitman and K. J. Winston, The asymptotic number of lattices, Ann. Discrete Math.,
6 (1980), pp. 243–249.
[23] G. Markowsky, The representation of posets and lattices by sets, Algebra Universalis, 11
(1980), pp. 173–192.
[24] R. H. Möhring, Computationally tractable classes of ordered sets, Tech. Report 87468-OR,
Bonn University, Germany, 1987.
[25] F. P. Preparata and M. I. Shamos, Computational Geometry, Springer-Verlag, Berlin, New
York, 1985.
[26] F. P. Preparata and R. Tamassia, Fully dynamic point location in a monotone subdivision,
SIAM J. Comput., 18 (1989), pp. 811–830.
[27] I. Rival, Graphical data structures for ordered sets, in Algorithms and Order, Kluwer Academic
Publishers, Dordrecht, The Netherlands, 1989, pp. 3–31.
[28] M. Talamo and P. Vocca, Fast lattice browsing on sparse representation, in Orders, Algo-
rithms, and Applications, International Workshop ORDAL ’94, Lecture Notes in Comput.
Sci. 831, V. Bouchitté and M. Morvan, eds., Springer-Verlag, Berlin, 1994, pp. 186–204.
[29] M. Talamo and P. Vocca, A data structure for lattices representation, Theoret. Comput.
Sci., 175 (1997), pp. 373–392.
[30] R. Tamassia and J. G. Tollis, Reachability in planar digraphs with one source and one sink,
Theoret. Comput. Sci., 119 (1993), pp. 331–343.
[31] M. Tompa, Two familiar transitive closure algorithms which admit no polynomial time, sub-
linear space implementations, SIAM J. Comput., 11 (1982), pp. 130–137.
[32] J. W. T. Trotter and J. J. I. Moore, The dimension of planar posets, J. Combin. Theory,
22 (1977), pp. 54–57.
[33] M. Yannakakis, Graph-theoretic methods in database theory, PODS ’90, Proceedings Ninth
ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, ACM, New
York, 1990, pp. 230–242.
