Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Alexandria University
ORIGINAL ARTICLE
a
Dept. of Computer and Systems Engineering, Faculty of Engineering, Alex. University, Egypt
b
Computer Science Faculty, Purdue University, USA
KEYWORDS Abstract This paper presents a distributed algorithm for processing continuous spatiotemporal
Spatiotemporal databases; queries. Distributed query processing that involve multiple servers is inspired by (1) the need for
Distributed algorithms; scalability in terms of supporting a large number of moving objects and a large number of queries,
Continuous queries and (2) the need for providing real-time answers to users’ queries. To rapidly answer queries, incom-
ing data streams are processed in-memory. The load is distributed among a set of regional servers
that collaborate to continuously answer the spatiotemporal queries. The algorithm assumes that
object movement is constrained by a road network and that the continuous queries are stationary,
i.e., the queries’ focal points do not move. Continuous queries are evaluated incrementally by mon-
itoring network positions that are part of existing queries’ search regions. The performance of the
proposed algorithm is measured to analyze their behavior under various work loads and distributed
system settings.
ª 2012 Faculty of Engineering, Alexandria University. Production and hosting by Elsevier B.V.
All rights reserved.
The underlying server architecture is assumed to be a collec- from the issuer to the network nodes encountered during the
tion of distributed servers that collaborate to present the query NN search in the form of an expansion tree. Updates from ob-
answers. Since data storage and retrieval into and from disk jects and edges’ weights falling in the expansion tree can alter
take considerable time and significantly degrade the perfor- the NN set of some queries. The regions of the affected queries
mance, in this paper, we assume that all spatiotemporal data are adjusted. In the second Group Monitoring Algorithm
is streamed and processed in main-memory. (GMA), the k-NN’s of the intersections are monitored using
The evaluation of continuous queries mandates that the ser- IMA and are used to compute the results of all the queries
ver store these queries to re-evaluate or incrementally evaluate in the path. IMA can be easily extended to support range que-
them whenever necessary. In query re-evaluation, the continu- ries. However, GMA focuses on storing pre-computed query
ous query is evaluated from scratch with a certain frequency; results in network nodes to be used in solving subsequent k-
answers are reported with each re-evaluation. In incremental NN queries and thus cannot be easily extended to solve other
evaluation, the query is evaluated once and the answer is types of queries.
stored at the user’s application; only the updates in the inputs MOVNet [9] supports location-based snapshot queries on
are processed and the query is not re-evaluated [1]. Updates to MOVing objects in road Networks. C-MNDR (Continuous
the query answer are then sent to the issuer. In this paper, we Mobile Network-Distance-based Range query algorithm) [10]
follow the incremental evaluation paradigm. is a continuous range query algorithm on top of MOVNet.
Since a large number of clients may be connected, a single Data and index structures are proposed to store the road net-
server may not sustain the streams generated from a large num- work connectivity, the moving objects, and the queries. It is as-
ber of objects. To achieve scalability in terms of the number of sumed that the underlying road network is too dense to be
objects, a network of servers may be used to process the streams. stored in-memory; therefore, the stationary network data is
In this case, servers should collaborate to answer the queries. stored in an on-disk R\-tree [11]. Space is partitioned into a reg-
More specifically, in this paper we present a distributed ular grid of l · l cells. For initial result gathering, edges of the
version of the IMA [2] algorithm where the effect of part of the network that lie in the cell of the issuer are retrieved
pre-computing the shortest paths on the execution time of the from disk, and the corresponding graph is constructed in mem-
algorithm is studied. The algorithm also modifies over the ory. The query is evaluated on that part of the graph and con-
QTP (Query-Track-Participate) distributed query evaluation necting vertices (vertices that have neighboring edges cross the
model in PLACE\, a distributed spatiotemporal data server boundaries of the cell) are recorded to be further used in
[3]. While PLACE\ assumes that objects move freely in space, expanding search to neighboring cells. An SD-Tree (Shortest-
the proposed distributed algorithm abides by road network Distance-based Tree) is built and stored in-memory to be used
constraints. Query issuers are assumed to be static as long as for incremental evaluation of the query when the query focal
they have queries that are continuously evaluated. updates its position. The SD-tree is similar to the expansion tree
The rest of this paper is organized as follows. Section 2 used for incremental evaluation of k-NN queries in IMA. Algo-
highlights the related work. Section 3 presents an overview rithms for maintaining both trees are nearly the same.
of the environment and the data structures. The details of All the algorithms above are single-server based. In con-
the proposed algorithm are presented in Sections 4 and 5. A trast, the algorithm presented in this paper is distributed in
performance study is presented in Section 6. Section 7 con- nature. Several systems and algorithms, e.g., MQM [12] and
cludes the paper. Mobieyes [13] use a limited notion of distributed processing
where they utilize the computing capability of moving object
clients to reduce the load from the server. PLACE\ [3] is a dis-
2. Related work tributed spatiotemporal server that uses a network of regional
servers to answer user queries over moving objects that hop
Research has been conducted for processing continuous range from one server to the next according to the coverage region
and k-NN queries on a single-server. For example, CPM [4], of each server. In contrast to the algorithm proposed in this
YPK-CNN [5], and SEA-CNN [6] monitor exact Continuous paper, PLACE\ does not support road network constraints
k-NN’s (CNNs) in the Euclidean space. In addition to moni- on the motion of objects.
toring CNNs, CPM supports aggregate nearest neighbors This paper is an extension to both IMA and PLACE\.
w.r.t. a set of query points. SEA-CNN monitors the NN PLACE\ does not support road networks. Instead of comput-
changes assuming that the initial result is available and shares ing and storing the distance from the issuer to each node in the
the execution and data structures among multiple concurrent search tree as in IMA, the alternative used in this paper is to
k-NN queries. DISC [7] approximates k-NN queries where store the all-pairs shortest path matrix and the leaf nodes of
the returned kth neighbor lies at most e distance units farther the search tree.
from the focal than the actual kth NN. SINA [8] performs con-
tinuous evaluation of range queries using a three-step spatial
join between moving objects and moving ranges. 3. Data and query representation
All the above algorithms assume that the objects move in
Euclidean space; the constrained motion of objects is not taken The road network is represented as a weighted graph. Roads
into consideration. In [2], two algorithms for processing con- are assumed to be bi-directional. Edges in the graph represent
tinuous k-NN queries on road networks are presented. In the roads, nodes represent road junctions (the intersection points
first Incremental Monitoring Algorithm (IMA) [4], the initial between roads), and the weight of an edge is the length of
result of a query is computed by expanding the network the corresponding road.
around the issuer until k-NN’s are found. To facilitate the pro- Each moving object has a unique identifier. Moving objects
cessing of subsequent updates, IMA stores the shortest paths send their current position periodically to the server in the
Distributed processing of continuous spatiotemporal queries over road networks 87
form (OID, x, y). Since objects in the network are restricted to location. Search starts from the end nodes of O’s road and ex-
move on roads, the system assigns the moving object to a road pands to their neighboring nodes; those nodes are added to a
and calculates the cost of the path between the object’s loca- queue. Nodes are removed from the queue one at a time; ob-
tion and the start and end junctions of the road. Objects’ states jects on the roads connected to these nodes are added to the
(current road ID and path cost to road ends) are kept in-mem- query answer and a reference to the query is added to investi-
ory at the server that the object is connected to. gated roads. Nodes neighboring to visited nodes are then
Query issuers can submit range and k-NN queries to the added to the queue. Search stops when reaching nodes at a dis-
server. A range query returns the objects distanced less than tance greater than the query range from the issuer (focal) or
the query range from the query’s focal point. To issue a range when a loop in the graph is detected (when reaching a road
query, the object sends a request to one of the regional servers that was visited before). Nodes that the search stopped at
in the form (OID, ‘R’, R), where OID is the ID of the query (leaves of the search tree) are stored to help incrementally eval-
issuer, ‘R’ is used to indicate a range query, and R is the re- uate the query answer.
quired range. Algorithm 1 gives the complete listing for the range query
A k-NN query returns the k objects that are closest to the evaluation algorithm. The function sendPositiveUpdates (Lines
query’s focal point. To issue a k-NN query, the object sends 2 and 12) takes, as input, reference to a range query and a set
a request to one of the regional servers in the form (OID, of objects, and then sends positive update tuples to the query
‘K’, k), where OID is the ID of the query issuer, ‘K’ is used issuer for the objects that are within the query range. In Lines
to identify a k-NN query, and k is the number of nearest neigh- 7 and 16, a node, say n, is added as a leaf to the query subtree if
bors requested. To help rapidly evaluate queries, the costs of n is located at a distance greater than the query range or if n
the shortest paths between all pairs of nodes in the network has a fanout of 1, i.e., n designates a dead-end road.
are pre-computed and stored in-memory.
Algorithm 1. evaluateRangeQuery(RangeQuery query).
Queries are continuous, i.e., they are stored in the system.
Changes to the query answer as a result of the objects’ move- 1: initialize an empty queue, queue
ments are sent progressively to the query issuer. Updates to the 2: sendPositiveUpdates (query, query.focal.road.objects)
answer of the query are sent to the query issuer in the form of 3: add query.focal.road.ends to queue
4: add query to query.focal.road.queries
update tuples: (±, QID, OID), to add or remove the object
5: for each, node n in queue do
whose ID is OID from the answer of the query whose ID is
6: x1 ‹ path cost from query.focal to node n
QID. In the following sections, we study how our algorithm 7: if x1 < query.range && n has more than 1 neighboring
behaves in a single-server setting and then show how we gen- node then
eralize it to the multi-server setting. 8: for each edge e(n.k) adjacent to n do
9: if !e.containsQuery(query) then
4. Single-server processing 10: e.addQuery(query)
11: sendPositiveUpdates(query, e.objects)
12: queue.add(k)
In the single-server case, the server handles the following 13: end if
events: (1) the connection and disconnection of objects, (2) 14: end for
the submission of a new query, and (3) the arrival of position 15: else
updates generated by moving objects. The issuers of queries 16: add n to query.leaves
are assumed not to change their location while their issued 17: end if
queries are persistent in the system. 18: end for
(Lines 2 and 10) takes a k-NN query and a set of objects as range. It is also necessary to find objects that lie on those newly
arguments and, for each object in the set, says or, calculates added roads. For this purpose, search is performed in the for-
the distance d from the query focal point to or, adds or to ward direction and stops when nodes at a distance greater than
the query answer if less than k objects were encountered so or equal to the new query range from the query focal point are
far, or if d is less than the cost to the kth neighbor. In the latter reached. The current kth neighbor may be replaced by another
case, the farthest neighbor is removed and is replaced by or. object during this search. If the kth neighbor is replaced, the
search range can change too and the search region is adjusted
Algorithm 2. evaluateKNNQuery(KMN_Qucry query).
accordingly by searching in the backward direction.
1: initialize an empty priority queue PQ A problem occurs in the previous technique if the search re-
2: addObjectsToQueryAnswer(query, query.issuer.road) gion of the query contains a cycle. Some parts of the search
3: add (query to query.focal.road.queries tree may not be reached when updating the search range after
4: add query.focal.road.ends to PQ
the neighbors of a k-NN query have changed. This problem
5: for each node n in PQ do
gets resolved after an object in the query search region updates
6: if n has more than 1 neighboring node then
7: for each edge e(n, k) adjacent to n do its position. Refer to Fig. 1a for illustration. The figure gives
8: if !e.containsQuery(query) then an example of updating the position of an object that affects
9: e.addQuery (query) the range of the 2-NN query issued by f. The rectangular nodes
10: addObjectsToQueryAnswer(query, e.objects) are the leaves of the search tree of the query. Initially, the two
11: x ‹ the cost of the path from query.focal to k nearest neighbors to f are O1 and O2. The costs of the paths
12: if guery.answerSize < k || x < query. from f to O1 and O2 are 10 and 20, respectively. When O1
k-thObjectWeight then changes its position, the server updates the distance from f
13: add k to PQ with weight x to O1 to 30. O1 becomes the kth neighbor to f. The server ex-
14: else
pands the search region of the query from its leaves. Fig. 1b
15: add k to query.leaves
illustrates the direction of the query search region expansion.
16: end if
17: end if The kth neighbor is updated to be O3. Searching backward is
18: end for started from the new query leaves. The direction of search is
19: else given in Fig. 1c and the final search tree is given in Fig. 1d.
20 add n to query.leaves Algorithm 3 gives the complete listing of the algorithm for
21: end if updating the position of a moving object.
22: end for
5.1. New object connection Every moving object sends its position periodically to its Vis-
ited Server. Upon receiving an update from an object O,
The following steps take place when a new object, say O, con- VS(O) performs the following:
nects to the servers.
1. Check if O’s position is in VS(O)’s region. If the O’s new
1. O fi DS: O sends a connection request to DS, to locate position lies outside VS(O)’s region, VS(O) sends the old
HS(O) using O’s location. state of O and queries that O is part of to the new server
2. DS fi O: DS sends the connection details of HS(O) to O. and sends a message to HS(O) containing the ID of the
3. O fi HS(O): O sends a connect message to HS(O). new VS(O). Then, VS(O) performs the following steps.
4. HS(O) attaches the object to a road by checking all roads 2. Locate O’s new road and the cost from O’s position to the
in its range to find the one distanced minimum to the ends of this road.
location of O. HS(O) stores the state of the object. 3. Send +ve updates to the QSs of queries that O entered
5. HS(O) fi O: HS(O) generates O’s unique identifier that is their search range, and send ve updates to the QSs of
prefixed with the ID of HS(O) to easily identify HS(O) queries that O was part of their answer and went out of
using O’s ID. their range as a result of this update.
4. Broadcast O’s location (O’s road and cost from O to road
O sends its updates to HS(O) till O moves to another ser- ends) to querying servers of k-NN queries that O is part
ver’s coverage range. of. The querying server updates the weight of O in the
answers of the queries.
5.2. Query evaluation 5. Send a message to O to confirm the update. If the VS(O)
has changed in Step 1, the confirmation message contains
When an object f issues a new query q to its visited server, the new server’s connection details.
VS(f) or QS(q), the following steps describes how to evaluate
range and k-NN queries.
5.4. Incremental evaluation of queries
1. QS(q) expands the search from f‘s position.
2. QS(q) finds the regional servers, {PS(q)}, whose regions As objects change their position, they can get into or out of the
overlap the query search region. search range of some queries. The regions of k-NN queries are
90 A. Sallam et al.
ject moves out of the search regions of some queries and enters of queries increases, processing time also increases as the que-
the search regions of others increases. For high velocities, the ries that the object is part of increases. Consequently, the
execution time of the multi-server system becomes higher be- data and the state that are sent to the new visited server
cause more objects are likely to change their visited servers, are larger, which requires more time to send and to handle
hence increasing the communication overhead. at each server’s end.
Fig. 5 studies the memory required for the single- and
multi-server cases for different numbers of submitted queries.
7. Conclusion
The figure illustrates that the memory requirements increase
by increasing the number of queries in both systems because
data is stored about the query parameters and answer with In this paper, a multi-server system for processing queries
each submitted query. The memory required in the multi-ser- over spatiotemporal data streams is presented. The experi-
ver system is less than half the memory required at the single- ments show that multi-server setup can reduce the runtime
server system because the complete shortest path matrix is in half. Also, the proposed algorithm requires a very small
stored on the single-server; part of this matrix only is stored portion of the time period to process all updates. Accord-
in each server in the distributed system. As shown in the ingly, the server(s) will be available to process query requests
same figure, the rate of increase in memory requirements in in the rest of the time period. However, the experiments also
the multi-server system is much less than that required by show that the communication and state transmission over-
the single-server. Thus, the multi-server system can sustain heads among the servers can be significant, especially for
road networks and queries of sizes greater than those of large query parameters (e.g., large k or large query range).
the single-server system. In order to gain the benefits of both worlds, we plan to inves-
To illustrate the hand-off overhead in the multi-server case tigate the possibility of reducing the communication overhead
when an object moves from one server to the next, Fig. 6 by using remote Direct-Memory Access protocols in place of
gives the time spent to change the visited server of one object the resource-demanding network protocols. Also, in contrast
for different numbers of submitted queries. When the number to space-partitioning of the regions among the participating
servers, we plan to consider other data partitioning criteria,
e.g., partitioning over the object identifier space, hence possi-
bly reducing or entirely eliminating the need for data and
query migration among servers.
References
[6] X. Xiong, M. Mokbel, W. Aref, SEA-CNN: Scalable processing [10] H. Wang, Member, R. Zimmermann, Processing of continuous
of continuous k-nearest neighbor queries in spatiotemporal location-based range queries on moving objects in road
databases, ICDE (2005). networks, TKDE (2010).
[7] N. Koudas, B. Ooi, K. Tan, R. Zhang, Approximate NN queries [11] N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The R\-tree:
on streams with guaranteed error/performance bounds, VLDB an efficient and robust access method for points and rectangles,
(2004). ACMSIGMOD (1990).
[8] M. Mokbel, X. Xiong, W. Aref, SINA: scalable incremental [12] Y. Cai, K. Hua, G. Cao, Processing range-monitoring queries
processing of continuous queries in spatiotemporal databases, on heterogeneous mobile objects, MDM (2004).
SIGMOD (2004). [13] B. Gedik, L. Liu, MobiEyes: distributed processing of continuously
[9] H. Wang, R. Zimmermann, Snapshot location-based query moving queries on moving objects in a mobile system, EDBT (2004).
processing on moving objects in road networks, in: Proceedings [14] T. Brinkhoff, A framework for generating network-based
of ACM GIS, 2008. moving objects, GeoInformatica 6 (2) (2002) 153–180.