
Information and Software Technology 48 (2006) 869–875

www.elsevier.com/locate/infsof

Online aggregation with tight error bounds in dynamic environments


Seok-Ju Chun a, Ju-Hong Lee b,*, Seok-Lyong Lee c
a Department of Computer Education, Seoul National University of Education, Seoul, South Korea
b School of Computer Science and Engineering, Inha University, 253, Yonghyun-dong, Nam-Ku, Incheon 402-751, South Korea
c School of Industrial and Information Engineering, Hankuk University of Foreign Studies, Seoul, South Korea

Received 12 March 2005; received in revised form 4 November 2005; accepted 4 December 2005
Available online 3 March 2006

Abstract
OLAP is a category of database technology that allows analysts to gain insight into the aggregation of data by enabling them to gain access to a
variety of different views of the information contained in a database. It is very important to provide analysts with guaranteed error bounds for
approximate results to aggregation queries in enterprise applications such as decision support systems. We propose a general method of providing
tight error bounds for approximate results to OLAP range-sum queries. We perform an extensive experiment on diverse data sets and examine the
effectiveness of the proposed method for various data cube dimensions and query sizes.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Information system; Database; OLAP; Online aggregation; Decision support; Approximate query answering

* Corresponding author. Tel.: +82 32 860 7453; fax: +82 32 876 8052.
E-mail addresses: chunsj@snue.ac.kr (S.-J. Chun), juhong@inha.ac.kr (J.-H. Lee), sllee@hufs.ac.kr (S.-L. Lee).
0950-5849/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.infsof.2005.12.005

1. Introduction

On-line analytical processing (OLAP) supports the interactive analysis of large data sets, e.g. information stored in data warehouses [10]. To accomplish this, it is often necessary to summarize data at various levels of detail and for various combinations of attributes. Typical OLAP applications include the assessment of product performance and profitability, the analysis of the effectiveness of a sales program or a marketing campaign, sales forecasting and capacity planning. Naturally, the response time is crucial for OLAP applications that require user interaction. The data cube provides a useful analysis tool called the OLAP range-sum query, which applies an aggregate operation to the measure attribute within the range of the query [5]. A typical example is "find the total amount of sales in Seoul for customers aged from 30 to 39 with auto insurance, in the years 2001–2004". Queries of this form are very popular and important in OLAP. In many enterprise applications such as decision support systems, a number of trial-and-error steps are involved in getting the right answer. This forces OLAP range-sum queries to be executed an excessively large number of times, which makes them expensive in terms of query cost. Thus, it is important to provide a means of obtaining a quick approximate result rather than an accurate one, so that the decision-making process can furnish answers in a timely manner.

Various approximation methods for processing OLAP range-sum queries have been proposed [1,7–9,12,13]. Among them, the pCube [9] and the MRA-tree [7] are solutions that provide for the progressive refinement of an approximate answer with absolute error bounds. The pCube is organized as a quadtree-like structure based on spatial decomposition, in order to avoid overlap between the node-regions of sibling nodes. It provides early feedback information with absolute error bounds for queries on data cubes. However, this approach provides loose error bounds for approximate results to OLAP range-sum queries, because it uses only a trivial bound technique. For example, the upper bound is the maximum aggregate value stored in a node that intersects with a query-region and the value 0 is the trivial lower bound [9]. The MRA-tree also uses a tree structure to obtain a quick response, while relaxing the requirement for an exact answer to be returned. Several multi-dimensional index structures for data cubes have been developed. A space-partitioning method like the quadtree [11] divides the data space along predefined lines regardless of data distribution. A data-partitioning method like the R-tree [4] divides the data space according to the distribution of data nodes inserted into the tree [14]. The MRA-tree is a modified multi-dimensional index structure and is implemented as either a quadtree or an R-tree. This approach provides the answer to an aggregation query with 100%
guaranteed error bounds and improves the quality of the answer, until some constraint (time and/or error bound) is reached. The MRA-tree produces a better answer to the query in a shorter period of time than the pCube. However, the bounding technique used in this approach is the same as that used in the pCube. Furthermore, in applications where frequent updates are commonplace, this approach incurs a high update cost. Recently, we proposed the Δ-tree [3] to manage updates efficiently in the dynamic OLAP environment. The basic precept of this approach is that changes in the data cube are stored in the Δ-tree and managed separately from the data cube. This drastically reduces the update cost at run-time. In addition, by taking advantage of the hierarchical structure of the Δ-tree, we proposed a hybrid method of providing either an approximate result or a precise one, in such a way as to reduce the overall cost of a query. However, we did not discuss the technique that is used to provide error bounds for the approximate results obtained in the hybrid method.

In this paper, we present a general method of providing tight error bounds for approximate results to OLAP range-sum queries. The proposed algorithm is very effective and directly applicable to various approximation techniques that use a tree structure such as the pCube and MRA-tree. We conduct an extensive experiment on diverse data sets, and examine the effectiveness of the proposed method for various data cube dimensions and query sizes. The experimental results show that the proposed method provides tighter error bounds than the MRA-tree.

2. Tree-based index structure

The aggregation functions that are used for range queries are SUM, AVERAGE, COUNT, MIN, MAX, etc. Among these functions, the SUM function is the most popular and important. So, in this paper, we concentrate on range-sum queries.

2.1. The MRA-tree

The multi-resolution aggregate (MRA) tree is a modified multi-dimensional index structure which stores data points of the form $\langle loc, values \rangle$, where $loc \in R_{space}$ and $values \in D_v$. Non-leaf nodes contain entries of the form $\langle ptr, region, aggregates \rangle$ for each of their child nodes, where ptr is a pointer to a child node, region is the space covered by that node and aggregates is a tuple $\langle SUM \rangle$ of aggregate information over all data points covered by that node. A leaf node contains the data points covered by the region of the parent node. The authors of this approach provide a progressive algorithm for approximate aggregate queries. The algorithm maintains two sets of tree nodes $N_c$ and $N_p$, where $N_c$ is a set of nodes completely contained in the query and $N_p$ is a set of nodes that either enclose or partially overlap the query. It is assumed that we find $sum_{Q,X}$ for the query region $R_Q$, that is, the sum of the values of the attribute X for all points contained in $R_Q$. $\widehat{sum}_{Q,X}$ is the estimate for query Q of aggregate type SUM over attribute X (the value stored at each data point). The approximate answer $\widehat{sum}_{Q,X}$ is defined as follows:

\[ \widehat{sum}_{Q,X} = sum_{c,X} + \sum_{N \in N_p} P_N \times sum_{N,X} \]

where $sum_{c,X}$ is the sum of the aggregate values of the nodes in $N_c$ and $P_N$ is the percentage of overlap of node N with query Q. Since all of the nodes in $N_c$ are contained in the query, $sum_{c,X}$ has an exact value. Therefore, it is clear that the lower and upper bounds of $sum_{Q,X}$ are $sum_{c,X}$ and $sum_{c,X} + sum_{p,X}$, respectively, where $sum_{p,X}$ is the sum of the aggregate values of the nodes in $N_p$. However, since the MRA-tree provides the user with only an approximate answer with loose error bounds, we cannot expect him or her to use it as the basis for critical decisions. Moreover, the MRA-tree has a prohibitive update cost in applications in which frequent updates are commonplace and run concurrently with queries [7]. A value update in the MRA-tree consists of changing the value of data point $P = \langle loc, val \rangle$ to $P' = \langle loc, val' \rangle$. The update operation consists of two steps, that is, the deletion of the previous data point, $P$, and the insertion of the new data point, $P'$. The problem is that the cost of searching the MRA-tree in order to delete $P$ is very high.

2.2. The Δ-tree

In this section, we introduce an approximation technique using the Δ-tree. The Δ-tree is a modified version of the R*-tree [2], which is designed to store the updated values of a data cube and to support efficient query processing. The process of constructing the Δ-tree is the same as that of the R*-tree. Initially, the Δ-tree has only a directory node (called the root node). Whenever a data cube cell is updated, the difference (Δ) between the new and old values of the data cube cell and its spatial position are stored into the Δ-tree. We define the Δ-tree formally as follows:

Definition 2.1. (the Δ-tree)

1. Each directory node contains $(L_1, L_2, \ldots, L_n)$, where $L_i$ is the tuple pertaining to the ith child node, $C_i$, and has the form $(\sum\Delta, M, cp_i, MBR_i)$. $\sum\Delta$ is the sum of the $\sum\Delta$ values of the child nodes ($\Delta$ values) of $C_i$, where $C_i$ is a directory node (data node). $cp_i$ is the address of $C_i$ and $MBR_i$ is the MBR enclosing all entries in $C_i$. $M$ has the form $(m_1, m_2, \ldots, m_d)$ where d is the dimension and $m_j$ is the weighted mean position of the jth dimension of $MBR_i$, which is defined as follows:

Let $f(k_1, k_2, \ldots, k_d)$ be the value of an updated position $(k_1, k_2, \ldots, k_d)$ in $MBR_i$ with $1 \le k_j \le n_j$, where $n_j$ is the number of cells in the jth dimension of $MBR_i$. For $1 \le m \le n_j$, let

\[ G_j(m) = \sum_{k_1=1}^{n_1} \cdots \sum_{k_j=1}^{m} \cdots \sum_{k_d=1}^{n_d} f(k_1, \ldots, k_j, \ldots, k_d). \]

Let a be the largest lower bound such that $G_j(a) \le G_j(n_j)/2$, and b be the smallest upper bound such that $G_j(b) \ge G_j(n_j)/2$. Then $m_j = (a + b)/2$. That is, $m_j$ divides the hyperspace $MBR_i$ such that each subspace has about a half of the sum of updated values in $MBR_i$.

2. Each data node is situated at level 0 and it contains $(D_1, D_2, \ldots, D_n)$, where $D_i$ is the tuple pertaining to the ith data entry
and has the form $(P_i, \Delta_i)$. $P_i$ is the position index and $\Delta_i$ is the difference of the changed cell.

Example 2.2. As shown in Fig. 1a, we assume that 12 data cube cells have been changed in the data cube. The value in each of these cells is the difference (Δ) between the new and old values of the changed data cube cell. The Δ value and its spatial position are stored into the Δ-tree as shown in Fig. 1b.

Fig. 1. Data cube cells changed from a two-dimensional data cube and the Δ-tree corresponding to the changed cells.

Ho et al. [6] presented an elegant algorithm for computing range-sum queries in data cubes, which we call the prefix sum approach. Their approach uses an additional cube called a prefix sum cube (PC), to store the cumulative sum of the data. The essential idea behind this approach is to pre-compute prefix sums of the data cube, and to use these pre-computed values to answer ad hoc queries at run-time. This technique turned out to be very powerful. Range-sum queries were processed in constant time, regardless of the size of the data cube. The PC has the same size as the data cube and is used to store various pre-computed prefix sums of the data cube. Each cell of the PC contains the sum of all cells of the data cube up to and including itself. We use both the PC and the Δ-tree in order to answer the range-sum query, with the PC containing the information which was most recently bulk updated and the Δ-tree containing all information updated after this bulk update operation. Those updated cells that are spatially close to each other are clustered into a corresponding MBR.

When a range-sum query, Q, where Q is $(l_1{:}h_1, l_2{:}h_2, \ldots, l_d{:}h_d)$, is formulated, we use the PC and the Δ-tree to obtain the answer. Let $sum^Q$ be a function that returns the answer to Q, $sum^Q_{PC}$ be a function that returns the answer, which is calculated from the PC, and $sum^Q_{\Delta}$ be a function that returns the answer found from the Δ-tree. Then, the answer will be: $sum^Q = sum^Q_{PC} + sum^Q_{\Delta}$. When processing a range-sum query, we can also obtain an approximate answer by searching the Δ-tree partially. That is, searching is performed from the root to an internal node of level k, instead of a leaf node. There exist several MBRs, which participate in finding the answer to a range-sum query. We can classify these MBRs into two groups as follows:

1. Inclusive MBRs. $MBR_i$ $(i = 1, \ldots, m)$, where m is the number of MBRs included in the query MBR, denoted by $MBR_Q$.
2. Intersecting MBRs. $MBR_j$ $(j = m+1, \ldots, n)$, where $n - m$ is the number of MBRs intersecting with $MBR_Q$.

Let $(\sum\Delta)_i$ $(i = 1, \ldots, m)$ be the $\sum\Delta$ value of the ith inclusive MBR, and $(\sum\Delta)_j$ $(j = m+1, \ldots, n)$ be the $\sum\Delta$ value of the jth intersecting MBR. The answer to the range-sum query at level k of the Δ-tree can be approximated by the following equation:

\[ sum^Q = sum^Q_{PC} + sum^Q_{\Delta} = sum^Q_{PC} + \sum_{i=1}^{m} \left(\sum\Delta\right)_i + \sum_{j=m+1}^{n} \frac{Vol(MBR_j \cap MBR_Q)}{Vol(MBR_j)} \times \left(\sum\Delta\right)_j \]

Let $\sum\Delta^{+}$ and $\sum\Delta^{-}$ be the sum of the positive Δ values and the sum of the negative Δ values of the node MBR of the Δ-tree, respectively. Note that $\sum\Delta = \sum\Delta^{+} + \sum\Delta^{-}$. We assume that each MBR of the Δ-tree contains both $\sum\Delta^{+}$ and $\sum\Delta^{-}$ values instead of the $\sum\Delta$ value in order to compute the error bounds for the approximate answer. The upper and lower bounds for the approximate answer, $sum^Q$, at level k of the Δ-tree may be formally defined as:

\[ LB^Q(k) = sum^Q_{PC} + \sum_{i=1}^{m} \left(\sum\Delta\right)_i + \sum_{j=m+1}^{n} \left(\sum\Delta^{-}\right)_j \]

\[ UB^Q(k) = sum^Q_{PC} + \sum_{i=1}^{m} \left(\sum\Delta\right)_i + \sum_{j=m+1}^{n} \left(\sum\Delta^{+}\right)_j \]

Given that searching is performed from the root to the leaf node, the query results along with their error bounds are progressively refined and are ultimately returned to the user. Thus, the user can stop the processing of the query when the stopping criteria (time and/or error bounds) are satisfied.

Definition 2.3. (Error bound ratio)
Let $UB^Q(l)$ and $LB^Q(l)$ be the upper and lower bounds for an approximate answer to the range-sum query at level l of the Δ-tree, respectively. The error bound ratio (EBR) for an approximate answer at level l of the Δ-tree is defined as:

\[ EBR(l) = \frac{UB^Q(l) - LB^Q(l)}{Max(1, |sum^Q|)}, \quad l \ge 0 \]

Note that the level of data nodes (i.e. leaf nodes) is 0. Thus, $EBR(0) = 0$. The result of a range-sum query is large in almost
all cases. When $|sum^Q|$ is below 1, we use 1 in its place in the denominator, to avoid the effect of the very unusual case in which $sum^Q$ is close to 0.
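
The bound formulas and the error bound ratio defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `NodeSummary`, `level_bounds` and `error_bound_ratio` are hypothetical, each MBR is reduced to its positive and negative Δ sums plus the fraction of its volume that overlaps the query, and the PC contribution is passed in as a plain number.

```python
from dataclasses import dataclass


@dataclass
class NodeSummary:
    """Hypothetical summary of one MBR at the level where the search stops."""
    pos_sum: float        # sum of the positive delta values (>= 0)
    neg_sum: float        # sum of the negative delta values (<= 0)
    overlap: float = 1.0  # Vol(MBR ∩ MBR_Q) / Vol(MBR); 1.0 for inclusive MBRs

    @property
    def total(self) -> float:
        return self.pos_sum + self.neg_sum


def level_bounds(sum_pc, inclusive, intersecting):
    """Return (approximation, lower bound, upper bound) at one tree level."""
    # Inclusive MBRs contribute their exact delta sum.
    exact_part = sum_pc + sum(n.total for n in inclusive)
    # Intersecting MBRs are weighted by their overlap for the approximation,
    # and bounded by their negative / positive sums for LB / UB.
    approx = exact_part + sum(n.overlap * n.total for n in intersecting)
    lower = exact_part + sum(n.neg_sum for n in intersecting)
    upper = exact_part + sum(n.pos_sum for n in intersecting)
    return approx, lower, upper


def error_bound_ratio(lower, upper, sum_q):
    """EBR = (UB - LB) / max(1, |sum_q|)."""
    return (upper - lower) / max(1.0, abs(sum_q))


# Illustrative values only (no PC contribution, no negative deltas).
inclusive = [NodeSummary(8.0, 0.0)]
intersecting = [
    NodeSummary(14.0, 0.0, 0.5),
    NodeSummary(6.0, 0.0, 0.5),
    NodeSummary(12.0, 0.0, 0.25),
]
print(level_bounds(0.0, inclusive, intersecting))  # (21.0, 8.0, 40.0)
```

With only non-negative values the lower bound collapses to the exact part, which is the trivial bound the text attributes to the pCube and the MRA-tree.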
Lemma 2.4. (Progressively refined error bounds)
The error bounds are progressively refined as the level of the tree decreases.

Proof. Let l and m be levels of the tree such that $l \le m$. Since $UB^Q(l) - UB^Q(m) \le 0$ and $LB^Q(m) - LB^Q(l) \le 0$, then

\[ EBR(l) - EBR(m) = \frac{UB^Q(l) - UB^Q(m) + LB^Q(m) - LB^Q(l)}{Max(1, |sum^Q|)} \le 0. \]
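
Lemma 2.4 can be observed on a toy tree. The sketch below is an assumption-laden illustration rather than the Δ-tree itself: `Node`, `leaf`, `parent` and `refine` are hypothetical names, boxes are inclusive integer ranges per dimension, the PC contribution is taken to be 0, and no R*-tree balancing or splitting is modelled. It descends one level at a time and reports the naive bounds after each step.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[Tuple[int, int], ...]  # inclusive (low, high) range per dimension


@dataclass
class Node:
    box: Box
    pos_sum: float = 0.0
    neg_sum: float = 0.0
    children: List["Node"] = field(default_factory=list)


def leaf(point, delta):
    """A changed cell: a degenerate box holding one delta value."""
    box = tuple((p, p) for p in point)
    return Node(box, max(delta, 0.0), min(delta, 0.0))


def parent(children):
    """A directory node whose MBR encloses all of its children."""
    dims = len(children[0].box)
    box = tuple(
        (min(c.box[d][0] for c in children), max(c.box[d][1] for c in children))
        for d in range(dims)
    )
    return Node(box, sum(c.pos_sum for c in children),
                sum(c.neg_sum for c in children), list(children))


def contains(query, box):
    return all(q[0] <= b[0] and b[1] <= q[1] for q, b in zip(query, box))


def intersects(query, box):
    return all(q[0] <= b[1] and b[0] <= q[1] for q, b in zip(query, box))


def refine(root, query):
    """Yield (lower, upper) after each level of descent, root level first."""
    exact = 0.0
    frontier = [root]
    while True:
        partial = []
        for node in frontier:
            if not intersects(query, node.box):
                continue                      # disjoint: contributes nothing
            if contains(query, node.box):
                exact += node.pos_sum + node.neg_sum   # inclusive MBR
            else:
                partial.append(node)          # intersecting MBR
        yield (exact + sum(n.neg_sum for n in partial),
               exact + sum(n.pos_sum for n in partial))
        if not partial:
            return
        frontier = [c for n in partial for c in n.children]


n1 = parent([leaf((1, 1), 5), leaf((2, 3), -2)])
n2 = parent([leaf((6, 6), 4), leaf((7, 2), 3)])
root = parent([n1, n2])
for lower, upper in refine(root, ((1, 6), (1, 6))):
    print(lower, upper)
```

Each step replaces an intersecting MBR by its children, so the interval can only shrink; a caller could stop the loop as soon as the width satisfies its own time or error criterion.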

3. Tight bound technique

In this section, we introduce an algorithm that obtains tight error bounds using the spatial relationship between the query MBR and the node MBR. For the sake of simplicity, we assume that the node MBR contains only non-negative values.

Definition 3.1. (Half spaces) Let $D = \{1, 2, \ldots, d\}$ denote a set of dimensions. Let $m_i$ be the weighted mean position of the ith dimension of the node MBR, where $i \in D$. The MBR is divided into two hyperspaces by the weighted mean position of each dimension. We call these two hyperspaces the half spaces of the ith dimension, denoted by $hs_{2i-1}$ and $hs_{2i}$, respectively.

Example 3.2. As shown in Fig. 2, we consider the half spaces of a node MBR in a two-dimensional case. We can find four half spaces in the MBR, since it has two weighted mean positions, $m_1$ and $m_2$. That is, $hs_1$ and $hs_2$ are the half spaces divided by $m_1$, and $hs_3$ and $hs_4$ are the half spaces divided by $m_2$.

Fig. 2. Example of half spaces in a two-dimensional case.

Let $HS = \{hs_1, hs_2, \ldots, hs_{2d}\}$ denote the set of half spaces found in a d-dimensional node MBR. Let $MBR_T$ be the MBR of a node T that intersects with $MBR_Q$.

Lemma 3.3. Let $\sum\Delta$ be the sum of the Δ values of $MBR_T$. For all $hs_j \in HS$, the upper and lower bounds of a part of an approximate answer to the query in the $MBR_T$ at level l of the Δ-tree are as follows:

Case 1. $UB^Q_{MBR_T}(l) = \sum\Delta/2$ and $LB^Q_{MBR_T}(l) = 0$, if $\exists j$ such that $Vol(MBR_T \cap MBR_Q) \subset hs_j$,
Case 2. $UB^Q_{MBR_T}(l) = \sum\Delta$ and $LB^Q_{MBR_T}(l) = \sum\Delta/2$, if $\exists j$ such that $Vol(MBR_T \cap MBR_Q) \supset hs_j$,
Case 3. $UB^Q_{MBR_T}(l) = LB^Q_{MBR_T}(l) = \sum\Delta/2$, if $\exists j$ such that $Vol(MBR_T \cap MBR_Q) = hs_j$,
Case 4. $UB^Q_{MBR_T}(l) = \sum\Delta$ and $LB^Q_{MBR_T}(l) = 0$, otherwise.

Fig. 3. An example of the relationship between a node MBR and query MBR in the Δ-tree.

Example 3.4. As shown in Fig. 3, let us consider the relationship between a node MBR ($MBR_T$) and query MBR ($MBR_Q$) in the Δ-tree. The shaded area indicates the intersecting region between $MBR_T$ and $MBR_Q$. Fig. 3a–d refer to cases 1–4 of Lemma 3.3, respectively.

We need to deal with negative values in order to generalize the error bound technique. Thus, we assume that each node MBR contains negative values as well as non-negative ones.

Definition 3.5. (Positive and negative half spaces) Let $m_i^{+}$ be the weighted mean position of the ith dimension for all positive values of a node MBR and let $m_i^{-}$ be the weighted mean position of the ith dimension for all negative values, where $i \in D$. The MBR is divided into two hyperspaces by $m_i^{+}$ for each dimension. We call these two hyperspaces the positive half spaces of the ith dimension, denoted by $phs_{2i-1}$ and $phs_{2i}$, respectively. The MBR is also divided into two hyperspaces by $m_i^{-}$ for each dimension. We call these two hyperspaces the negative half spaces of the ith dimension, denoted by $nhs_{2i-1}$ and $nhs_{2i}$, respectively.

Let $PHS = \{phs_1, phs_2, \ldots, phs_{2d}\}$ and $NHS = \{nhs_1, nhs_2, \ldots, nhs_{2d}\}$ denote the set of positive half spaces and the set of negative half spaces which are found in a d-dimensional node MBR, respectively. Let $MBR_T$ be the MBR of a node T that intersects with $MBR_Q$.

Lemma 3.6. For all $phs_j \in PHS$ and $nhs_k \in NHS$, the upper and lower bounds of a part of an approximate answer to the query in the $MBR_T$ at level l of the Δ-tree are as follows: Initially, we set $UB^Q_{MBR_T}(l) = \sum\Delta^{+}$ and $LB^Q_{MBR_T}(l) = \sum\Delta^{-}$.
Case 1. $UB^Q_{MBR_T}(l) = \sum\Delta^{+}/2$ and $LB^Q_{MBR_T}(l) = \sum\Delta^{-}/2$, if $\exists j$ such that $Vol(MBR_T \cap MBR_Q) \subset phs_j$ $\wedge$ $\exists k$ such that $Vol(MBR_T \cap MBR_Q) \subset nhs_k$,
Case 2. $UB^Q_{MBR_T}(l) = \sum\Delta^{+}/2 + \sum\Delta^{-}/2$ and $LB^Q_{MBR_T}(l) = \sum\Delta^{-}$, if $\exists j$ such that $Vol(MBR_T \cap MBR_Q) \subset phs_j$ $\wedge$ $\exists k$ such that $Vol(MBR_T \cap MBR_Q) \supset nhs_k$,
Case 3. $UB^Q_{MBR_T}(l) = \sum\Delta^{+}$ and $LB^Q_{MBR_T}(l) = \sum\Delta^{+}/2 + \sum\Delta^{-}/2$, if $\exists j$ such that $Vol(MBR_T \cap MBR_Q) \supset phs_j$ $\wedge$ $\exists k$ such that $Vol(MBR_T \cap MBR_Q) \subset nhs_k$,
Case 4. $UB^Q_{MBR_T}(l) = \sum\Delta^{+} + \sum\Delta^{-}/2$ and $LB^Q_{MBR_T}(l) = \sum\Delta^{+}/2 + \sum\Delta^{-}$, if $\exists j$ such that $Vol(MBR_T \cap MBR_Q) \supset phs_j$ $\wedge$ $\exists k$ such that $Vol(MBR_T \cap MBR_Q) \supset nhs_k$.

The tight bound technique can be applied to various kinds of tree-based index structures such as the MRA-tree, the Δ-tree and the quadtree (pCube). Therefore, it is enough to show an example using the quadtree-based index structure (pCube) in Fig. 4.

Fig. 4. An example of the tight bound technique in a two-dimensional data cube with a quadtree-based index structure (pCube).

As shown in Fig. 4, when a range-sum query (shaded box) is given, we can obtain the approximation value and error bounds for the naive bound technique and the tight bound technique at tree level 1 as follows:

Naive bound technique:
Approximation: 8 + 14/2 + 6/2 + 12/4 = 21
Upper bound: 8 + 14 + 6 + 12 = 40
Lower bound: 8 + 0 + 0 + 0 = 8

Tight bound technique:
Approximation: 8 + 14/2 + 6/2 + 12/4 = 21
Upper bound: 8 + 14/2 + 6/2 + 12/2 = 27
Lower bound: 8 + 14/2 + 6 + 12/2 = 21

In the example, we can find that the EBRs of the naive and tight bound techniques at tree level 1 are 1.46 and 0.27, respectively.

4. Experiments

4.1. Test data sets

In order to evaluate the effectiveness of the proposed method, we conducted an extensive experiment on diverse data sets that are generated synthetically with various dimensions. Our experiment focuses on showing the
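
Table 1
Test data sets used in the experiment

Method    d  Cube volume (V)   Number of cells stored in the tree  Data distributions  Query size (query volume (V))
MRA-tree  3  512×512×512       400,000                             Uniform/Zipf        Large (0.1), medium (0.05), small (0.01)
MRA-tree  4  128×128×128×128   400,000                             Uniform/Zipf        Large (0.1), medium (0.05), small (0.01)
MRA-tree  5  64×64×64×64×64    400,000                             Uniform/Zipf        Large (0.1), medium (0.05), small (0.01)
Δ-tree    3  512×512×512       40,000                              Uniform/Zipf        Large (0.1), medium (0.05), small (0.01)
Δ-tree    4  128×128×128×128   40,000                              Uniform/Zipf        Large (0.1), medium (0.05), small (0.01)
Δ-tree    5  64×64×64×64×64    40,000                              Uniform/Zipf        Large (0.1), medium (0.05), small (0.01)
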
effectiveness of the tight bound technique. All of the experiments were conducted on a Sun Ultra II workstation with 256 MB of main memory and a 10 GB hard disk. Each experimental result was obtained by issuing 30 range-sum queries and averaging their results. Table 1 summarizes the test data sets used in the experiment. The size of the Δ-tree is much smaller than that of the MRA-tree, since the Δ-tree stores only changed cells. We consider that the size of the Δ-tree is generally around 10% of that of the MRA-tree.

4.2. Result

Fig. 5 shows the error bound ratio (EBR) in the Δ-tree when the dimensionality is three. The data distributions are uniform and the query sizes are small, medium and large. The internal sub-cubes are completely contained in the query, while the external sub-cubes only partially overlap it. The larger the query size is, the more internal sub-cubes are included in the query. The partial results obtained from the internal sub-cubes are exact (not approximate) answers, so their EBR is zero. The partial results from the external sub-cubes are approximate answers. Therefore, the experimental results indicate that the large query has the best EBR in both the naive and tight bound techniques, and the subsequent experiments were conducted only on the large query. The X-axis represents the level of the Δ-tree up to which the query is performed. The level of the data nodes (i.e. leaf nodes) is 0, that of their parents is 1, and so on. Tree level 0 indicates that the query is evaluated up to the data node, and tree level 3 indicates that the query is evaluated up to level 3.

Fig. 5. The error bound ratio for a uniform distribution in the Δ-tree when the dimensionality is three and the query sizes are small, medium and large.

Fig. 6 shows a comparison of the EBRs for the naive bound technique and the tight bound technique in both the MRA-tree and the Δ-tree when the query size is large and the data distributions are uniform. Although the MRA-tree in [7] only provided the naive bound, we applied our tight bound technique to the MRA-tree. As shown in Fig. 6, the error bounds are progressively refined as the tree level becomes lower.

Fig. 6. The error bound ratio in the MRA-tree and the Δ-tree for a uniform distribution when the dimension = 3 and the query size = large.

Fig. 7 shows a comparison of the EBRs for the naive bound technique and the tight bound technique in both the MRA-tree and the Δ-tree when the query size is large and the data distributions are Zipf. The results show that the EBR value of the Zipf-distributed data is almost twice that of the uniformly distributed data at the same level.

Fig. 7. The error bound ratio in the MRA-tree and the Δ-tree for a Zipf distribution when the dimension = 3 and the query size = large.

Fig. 8. The error bound ratio for a Zipf distribution in the MRA-tree when the query size is large and the dimensionalities are 3–5.

Fig. 9. The error bound ratio for a Zipf distribution in the Δ-tree when the query size is large and the dimensionalities are 3–5.

Fig. 8 shows experimental results in the MRA-tree with the Zipf-distributed data when the dimensionality is varied from 3 to 5. The experimental results show that the effectiveness of the tight bound technique is more noticeable when the dimensionality is high. Fig. 9 shows the experimental results obtained with
the Δ-tree. The result shows a considerable improvement in the error bounds. There may be several sub-cubes overlapping with the query MBR at the internal nodes when processing a range-sum query. The number of sub-cubes overlapping with the query increases as the dimensionality increases. When using the naive bound technique in a data cube of high dimensionality, the upper and lower bounds of the overlapped sub-cubes are $\sum\Delta$ and 0, respectively. On the other hand, when using the tight bound technique in a data cube of high dimensionality, the upper and lower bounds of most overlapped sub-cubes are $\sum\Delta/2$ and 0, respectively. This means that the tight bound technique is very effective in high dimensions.

5. Conclusions

In this paper, we proposed a general algorithm designed to provide tight error bounds for approximate results to OLAP range-sum queries. We showed that the tight bound technique presented in this paper could be applied to other methods such as the pCube and the MRA-tree. To our knowledge, this is the first approach to provide tight error bounds for approximate results to OLAP range-sum queries. An extensive experiment showed that the tight bound technique was very general and effective on various dimensions of data cubes and query sizes.

Acknowledgement

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2005-041-D00657).

References

[1] S. Acharya, P. Gibbons, V. Poosala, S. Ramaswamy, Join synopses for approximate query answering, Proceedings of ACM SIGMOD Conference (1999) 275–286.
[2] N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of ACM SIGMOD Conference (1990) 322–331.
[3] Seok-Ju Chun, Chin-Wan Chung, Ju-Hong Lee, Seok-Lyong Lee, Dynamic update cube for range-sum queries, Proceedings of VLDB Conference (2001) 521–530.
[4] A. Guttman, R-trees: a dynamic index structure for spatial searching, Proceedings of ACM SIGMOD Conference (1984) 47–57.
[5] S. Geffner, D. Agrawal, A. El Abbadi, T. Smith, Relative prefix sums: an efficient approach for querying dynamic OLAP data cubes, Proceedings of ICDE Conference (1999) 328–335.
[6] C. Ho, R. Agrawal, N. Megiddo, R. Srikant, Range queries in OLAP data cubes, Proceedings of ACM SIGMOD Conference (1997) 73–88.
[7] I. Lazaridis, S. Mehrotra, Progressive approximate aggregate queries with a multi-resolution tree structure, Proceedings of ACM SIGMOD Conference (2001) 401–412.
[8] V. Poosala, V. Ganti, Fast approximate answers to aggregate queries on a data cube, Proceedings of SSDBM Conference (1999) 24–33.
[9] M. Riedewald, D. Agrawal, A.E. Abbadi, pCube: update-efficient online aggregation with progressive feedback and error bounds, Proceedings of SSDBM Conference (2000) 95–108.
[10] M. Riedewald, D. Agrawal, A.E. Abbadi, Flexible data cubes for online aggregation, Proceedings of ICDT Conference (2001) 159–173.
[11] H. Samet, The quadtree and related hierarchical data structures, Computing Surveys 16 (2) (1984) 187–260.
[12] J. Shanmugasundaram, U. Fayyad, P. Bradley, Compressed data cubes for OLAP aggregate query approximation on continuous dimensions, Proceedings of KDD Conference (1999) 223–232.
[13] J.S. Vitter, M. Wang, Approximate computation of multidimensional aggregates of sparse data using wavelets, Proceedings of ACM SIGMOD Conference (1999) 193–204.
[14] R. Weber, H. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, Proceedings of VLDB Conference (1998) 194–205.