
# Orthogonal Range Searching

Lecture 4, CS 631100

## Fall 2011 National Tsing Hua University (NTHU)

## Outline

Reference: Textbook chapter 5; Mount's Lectures 17 and 18

- Problem: querying a database
- Solution in one dimension
- Data structure in $\mathbb{R}^2$: range trees
- Extension to higher dimensions
- $\log n$ factor improvement


## An Example of Application on Database

A database in a bank records transactions. A query: find all the transactions such that

- the amount is between \$1000 and \$2000, and
- it happened between 10:40am and 11:20am.


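Geometric interpretation: each transaction is a point in the plane with coordinates (time, amount), and the query is an axis-parallel rectangle.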

## Query problems

Assume $n$ is the total number of transactions in the database. We will show how to build a data structure in $O(n \log n)$ time that answers this type of query in $O(k + \log n)$ time, where $k$ is the size of the output (the number of transactions reported).

The data structure is built only once; a large number of queries can then be answered quickly.

- $O(n \log n)$ is the preprocessing time
- $O(k + \log n)$ is the query time

## Boxes

- 2D box: also known as a rectangle, parallel to the coordinate axes
- 3D box: for example $[0, 3] \times [0, 2.5] \times [0, 2]$
- Generalizes to any dimension
- Algorithmic problems with boxes are relatively easy


## Problem statement

Let $P$ be a set of $n$ points in $\mathbb{R}^d$. We assume $d = O(1)$. Preprocess $P$ so as to answer queries of the type

- Input: $(a_1, b_1, a_2, b_2, \ldots, a_d, b_d)$
- Output: $P \cap ([a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d])$

We denote $k = |P \cap ([a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d])|$.
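As a point of reference, the query can be answered without any preprocessing by scanning all of $P$ in $O(dn)$ time. The sketch below is only a baseline for checking faster structures; the name `range_query_naive` and the sample points are made up for illustration.

```python
from typing import Sequence

Point = tuple[float, ...]


def range_query_naive(points: Sequence[Point],
                      box: Sequence[tuple[float, float]]) -> list[Point]:
    """Report every point p with a_i <= p[i] <= b_i for all coordinates i."""
    return [p for p in points
            if all(a <= x <= b for x, (a, b) in zip(p, box))]


# Illustrative data: four points in the plane, query box [1.5, 4] x [1, 5]
points = [(1.0, 4.0), (2.0, 2.0), (3.0, 5.0), (5.0, 1.0)]
result = range_query_naive(points, [(1.5, 4.0), (1.0, 5.0)])
k = len(result)  # output size, the k of the problem statement
```

Here `result` holds the two points whose x-coordinate lies in $[1.5, 4]$ and whose y-coordinate lies in $[1, 5]$.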


## One Dimensional Case (d=1): Using BBST


## One Dimensional Case (d=1)

Problem statement

- $P$ is a set of real numbers
- Queries: find all the points in $P$ that are between $a$ and $b$
- Data structure: balanced binary search tree (BBST)
  - Preprocessing time: $\Theta(n \log n)$ to build a BBST
  - Space usage: $\Theta(n)$
  - Query time: $\Theta(k + \log n)$. How?


## One Dimensional Case (d=1)

Algorithm Report($T$, $a$, $b$)

Input: a BBST $T$ storing $P$, an interval $[a, b]$

Output: $P \cap [a, b]$

1. if $T$ = NULL
2. then return
3. $x \leftarrow$ value stored at the root of $T$
4. if $a < x$
5. then Report($T$.left, $a$, $b$)
6. if $a \le x \le b$
7. then output $x$
8. if $x < b$
9. then Report($T$.right, $a$, $b$)
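The Report procedure translates almost line for line into Python. This is a minimal sketch, assuming distinct keys and a tree built from a sorted list by taking medians (one simple way to get a balanced tree, not necessarily the one intended in the lecture); `Node`, `build` and the sample keys are illustrative names and data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_keys: list[float]) -> Optional[Node]:
    """Build a balanced BST from keys already sorted in increasing order."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]),
                build(sorted_keys[mid + 1:]))


def report(t: Optional[Node], a: float, b: float, out: list[float]) -> None:
    """Append to `out` every key of t that lies in [a, b], in increasing order."""
    if t is None:
        return
    x = t.key
    if a < x:                 # keys in [a, b] may exist in the left subtree
        report(t.left, a, b, out)
    if a <= x <= b:
        out.append(x)
    if x < b:                 # keys in [a, b] may exist in the right subtree
        report(t.right, a, b, out)


tree = build(sorted([7, 3, 19, 10, 1, 14, 23]))
found: list[float] = []
report(tree, 5, 15, found)
```

With these keys, `found` ends up as `[7, 10, 14]`: the recursion only descends into subtrees that can still contain keys of the interval.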

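## Analysis of query time

The algorithm visits the path from the root to the split node $v_{split}$, the left path (toward $a$), the right path (toward $b$), and the subtrees in between, all of whose points are reported. Each path has length $O(\log n)$, and the subtrees in between hold at most $k$ points in total.

Query time: $O(k + \log n)$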


## Two Dimensional Case (d=2): Using range tree


## Two Dimensional Case (d=2)

Introduction

A set $P$ of $n$ points in $\mathbb{R}^2$. Query: given $(a_1, b_1, a_2, b_2)$, find all points $(x, y)$ from $P$ in the rectangle $[a_1, b_1] \times [a_2, b_2]$.

Results presented in this section:

- $\Theta(n \log n)$ preprocessing time
- $\Theta(n \log n)$ space usage
- $\Theta(k + \log^2 n)$ query time

The query time will be slightly improved in the last section.


## Two Dimensional Case (d=2)

Canonical sets

First store $P$ in a BBST $T$ using the $x$-coordinates as keys. We associate each node $v$ of $T$ with a canonical set $C_v$ containing the points of $P$ stored in the subtree rooted at $v$.


## Two Dimensional Case (d=2)

Range trees in $\mathbb{R}^2$

Each canonical set $C_v$ is stored in a BBST $T_v$ using the $y$-coordinates as keys. $T_v$ is called the canonical tree at node $v$.

We answer a query in two steps: first on the $x$-coordinates, then on the $y$-coordinates (as shown in the following slides).

## Step 1: Querying x-coordinates

First make the query with range $[a_1, b_1]$ on the $x$-coordinates.

- Let $P' = P \cap ([a_1, b_1] \times (-\infty, \infty))$.
- Let $P''$ be the set of points on the left path and the right path (when searching for $a_1$ and $b_1$).
- We partition $P' \setminus P''$ into $c$ canonical subsets.

Thus $P' \subseteq P'' \cup C_1 \cup C_2 \cup \cdots \cup C_c$.

## Two Dimensional Case (d=2)

Partitioning $P'$

After we make the query with range $[a_1, b_1]$ on the $x$-coordinates:

- We take the nodes on the left path and the right path, which gives $P''$.
- For each node on the left path where the search for $a_1$ goes left, select the canonical tree $T_i$ of its right child (this gives some $C_i$).
- For each node on the right path where the search for $b_1$ goes right, select the canonical tree $T_i$ of its left child (this gives some $C_i$).
- This takes $O(\log n)$ time (the height of the BBST).
- There are $c = O(\log n)$ canonical sets in our partition.

## Step 2: Querying y-coordinates

For every $p \in P''$, check whether $p \in [a_1, b_1] \times [a_2, b_2]$, and report it if so.

For all $i$, use the interval $[a_2, b_2]$ to perform a 1-dimensional search query in $C_i$ using the canonical tree $T_i$.

The union of all these results gives $P \cap ([a_1, b_1] \times [a_2, b_2])$.

Analysis of query time: let $k_i$ be the number of points reported from $T_i$, so that $\sum_{i=1}^{c} k_i \le k$. The query time is

$$\sum_{i=1}^{c} O(\log n + k_i) = O\Big(c \log n + \sum_{i=1}^{c} k_i\Big) = O(\log^2 n + k).$$

## Analysis of total query time

Query on $x$-coordinates on $T$: obtain $P''$ (the points on the left and right paths) and the canonical trees $T_i$. It takes $O(\log n)$ time.

Query on $y$-coordinates on the $T_i$: it takes $O(\log^2 n + k)$ time (refer to the previous slide).

Total query time $= O(\log n) + O(\log^2 n + k) = O(\log^2 n + k)$.
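The two query steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's implementation: each canonical set is kept as a list sorted by $y$ and searched with `bisect`, which stands in for the canonical tree $T_v$ at the same $O(\log n)$ search cost, and every node re-sorts its points, so the build here is slower than the bound discussed later. The names `RangeTreeNode`, `query`, `check` and `report` are made up.

```python
from bisect import bisect_left, bisect_right


class RangeTreeNode:
    """One node of a 2D range tree built over points sorted by x."""

    def __init__(self, pts):
        # pts: non-empty list of (x, y) tuples, already sorted by x
        mid = len(pts) // 2
        self.point = pts[mid]
        self.left = RangeTreeNode(pts[:mid]) if mid > 0 else None
        self.right = RangeTreeNode(pts[mid + 1:]) if mid + 1 < len(pts) else None
        # Canonical set C_v sorted by y (stand-in for the canonical tree T_v)
        self.by_y = sorted(pts, key=lambda p: p[1])
        self.ys = [p[1] for p in self.by_y]


def query(root, a1, b1, a2, b2):
    out = []

    def check(p):              # path node whose x is already known to be in range
        if a2 <= p[1] <= b2:
            out.append(p)

    def report(w):             # 1D query on the canonical structure of w
        if w is not None:
            out.extend(w.by_y[bisect_left(w.ys, a2):bisect_right(w.ys, b2)])

    v = root
    while v is not None and not (a1 <= v.point[0] <= b1):   # find the split node
        v = v.left if b1 < v.point[0] else v.right
    if v is None:
        return out
    check(v.point)
    u = v.left                 # left path: search for a1
    while u is not None:
        if a1 <= u.point[0]:
            check(u.point)
            report(u.right)    # whole right subtree lies in [a1, b1]
            u = u.left
        else:
            u = u.right
    u = v.right                # right path: search for b1
    while u is not None:
        if u.point[0] <= b1:
            check(u.point)
            report(u.left)     # whole left subtree lies in [a1, b1]
            u = u.right
        else:
            u = u.left
    return out


pts = sorted([(1, 4), (2, 2), (3, 5), (5, 1), (6, 3), (8, 6), (9, 2)])
root = RangeTreeNode(pts)
hits = query(root, 2, 8, 2, 5)
```

For the sample points and the rectangle $[2, 8] \times [2, 5]$, `hits` contains `(2, 2)`, `(3, 5)` and `(6, 3)`. Only $O(\log n)$ canonical structures are searched, each in $O(\log n)$ time plus the points it reports.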


## Space complexity (Proof 1)

A point $p$ belongs to all the canonical sets on the path from the vertex of $T$ that stores $p$ to the root (and only these canonical sets). Thus $p$ lies in $O(\log n)$ canonical sets. Hence

$$\sum_{v \in T} |C_v| = O(n \log n),$$

where $C_v$ is the canonical set at node $v$. The memory space used is $O(n \log n)$. Actually, it is $\Theta(n \log n)$. Why?

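Level by level, the canonical trees $T_i$ have the following total sizes: $n$ at the root, $2 \cdot \frac{n}{2} = n$ at the next level, $4 \cdot \frac{n}{4} = n$ at the level below, and so on down to $n \cdot 1 = n$ at the leaves. With $\Theta(\log n)$ levels, the total is $\Theta(n \log n)$.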

## Two Dimensional Case (d=2)

Preprocessing time

$T_v$ can be built in $O(|C_v| \log |C_v|)$ time, and $\log |C_v| \le \log n$. Hence the range tree can be built in time

$$\sum_{v \in T} O(|C_v| \log n) = \log n \cdot O(n \log n) = O(n \log^2 n).$$

We can do better:

- Compute the $T_v$'s from the leaves to the root.
- Computing $T_v$ amounts to merging two sorted sequences, which takes $O(|C_v|)$ time.
- Overall, we can build the range tree in time proportional to $\sum_{v \in T} |C_v| = \Theta(n \log n)$.
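The bottom-up construction can be sketched as below. As assumptions for the illustration, each canonical structure is a list sorted by $y$ rather than a BBST, and a node is a plain tuple `(point, left, right, ys)`; since points are also stored at internal nodes, the merge takes the two children's lists plus the node's own point. `heapq.merge` performs the linear-time merge.

```python
from heapq import merge


def build(pts):
    """Build from points sorted by x; returns (point, left, right, ys) or None.

    ys is the canonical set C_v sorted by y, obtained by merging the
    children's lists instead of sorting from scratch.
    """
    if not pts:
        return None
    mid = len(pts) // 2
    left, right = build(pts[:mid]), build(pts[mid + 1:])
    ys_left = left[3] if left else []
    ys_right = right[3] if right else []
    ys = list(merge(ys_left, [pts[mid]], ys_right, key=lambda p: p[1]))
    return (pts[mid], left, right, ys)


def total_size(v):
    """Sum of |C_v| over all nodes, i.e. the space used by canonical sets."""
    return 0 if v is None else len(v[3]) + total_size(v[1]) + total_size(v[2])


pts = sorted([(1, 4), (2, 2), (3, 5), (5, 1), (6, 3), (8, 6), (9, 2)])
tree = build(pts)
size = total_size(tree)
```

For these 7 points the tree has levels of 1, 2 and 4 nodes with canonical sets of sizes 7, 3 and 1, so `size` is $7 + 6 + 4 = 17$, in line with the level-by-level count of the space bound.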


## Range trees in higher dimensions

Idea

We assume $d > 1$ and $d = O(1)$. We want to perform range searching in $\mathbb{R}^d$.

- We still build $T$ with respect to the $x_1$-coordinate.
- For each canonical set of $T$ we build a $(d-1)$-dimensional range searching data structure using the coordinates $(x_2, x_3, \ldots, x_d)$.

To answer a $d$-dimensional query:

- Find the canonical trees of $T$ associated with $[a_1, b_1]$.
- Make a $(d-1)$-dimensional query on each canonical tree recursively, using $[a_2, b_2] \times [a_3, b_3] \times \cdots \times [a_d, b_d]$.
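The recursion can be sketched in Python as follows. This is a simplified illustration, not an optimized build: every node re-sorts its canonical set for the next coordinate, nodes are plain dictionaries, and the names `build`, `query`, `report_1d` and `in_box` are made up. Points on the search paths are checked directly against the whole box; canonical subtrees are delegated to their associated $(d-1)$-dimensional structure.

```python
def in_box(p, box):
    return all(a <= x <= b for x, (a, b) in zip(p, box))


def build(pts, dim=0):
    """Build the tree on coordinate `dim`; pts must be sorted by that coordinate."""
    if not pts:
        return None
    mid = len(pts) // 2
    node = {"point": pts[mid],
            "left": build(pts[:mid], dim),
            "right": build(pts[mid + 1:], dim),
            "assoc": None}
    if dim + 1 < len(pts[0]):      # associated structure on the next coordinate
        node["assoc"] = build(sorted(pts, key=lambda p: p[dim + 1]), dim + 1)
    return node


def report_1d(v, dim, box, out):
    """Last level: plain 1D reporting on coordinate `dim`."""
    if v is None:
        return
    a, b = box[dim]
    x = v["point"][dim]
    if a <= x:
        report_1d(v["left"], dim, box, out)
    if a <= x <= b:
        out.append(v["point"])
    if x <= b:
        report_1d(v["right"], dim, box, out)


def query(v, box, dim, out):
    if v is None:
        return
    if dim == len(box) - 1:
        report_1d(v, dim, box, out)
        return
    a, b = box[dim]
    while v is not None and not (a <= v["point"][dim] <= b):   # split node
        v = v["left"] if b < v["point"][dim] else v["right"]
    if v is None:
        return
    if in_box(v["point"], box):
        out.append(v["point"])
    for near, far in (("left", "right"), ("right", "left")):
        u = v[near]                # near="left" walks the left path, and so on
        while u is not None:
            x = u["point"][dim]
            inside = a <= x if near == "left" else x <= b
            if inside:
                if in_box(u["point"], box):
                    out.append(u["point"])
                if u[far] is not None:          # canonical subtree: recurse
                    query(u[far]["assoc"], box, dim + 1, out)
                u = u[near]
            else:
                u = u[far]


points = [(1, 2, 3), (2, 5, 1), (3, 3, 3), (4, 1, 2), (5, 4, 4), (6, 6, 6), (7, 2, 5)]
box = [(2, 6), (1, 5), (1, 4)]
root = build(sorted(points, key=lambda p: p[0]))
out = []
query(root, box, 0, out)
```

After the call, `out` holds the four sample points inside the box, in the order in which the search meets them.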


Analysis

Query time: $O(\log^d n + k)$

- Due to the $d$ nested levels in a $d$-dimensional range tree, searching through the $d$ levels takes $O(\log^d n)$ time.
- Reporting all points inside the query range takes $O(k)$ time.

Space complexity: $O(n \log^{d-1} n)$

- By induction on $d$ (see next slide).

Preprocessing time: $O(n \log^{d-1} n)$

- Compute the $T_v$'s from the leaves to the root.
- As the size of the range tree is $O(n \log^{d-1} n)$, building the whole range tree takes $O(n \log^{d-1} n)$ time.


## Space complexity (Proof by Induction)

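Suppose a $(d-1)$-dimensional range tree on $n$ points has size $O(n \log^{d-2} n)$.

In the tree $T$, each node stores a $(d-1)$-dimensional structure $T_i$ on its canonical set. Level by level, these structures have total size:

- root: $O(n \log^{d-2} n)$
- next level: $2 \cdot O(\frac{n}{2} \log^{d-2} \frac{n}{2}) = O(n \log^{d-2} \frac{n}{2})$
- next level: $4 \cdot O(\frac{n}{4} \log^{d-2} \frac{n}{4}) = O(n \log^{d-2} \frac{n}{4})$
- and so on, for $\log n$ levels
- leaves: $n \cdot O(1) = O(n)$

Then the size of the $d$-dimensional range tree is

$$O(n \log^{d-2} n) + O(n \log^{d-2} \tfrac{n}{2}) + O(n \log^{d-2} \tfrac{n}{4}) + \cdots + O(n) = \log n \cdot O(n \log^{d-2} n) = O(n \log^{d-1} n).$$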

## Improved range trees: Fractional cascading


## Improved range trees

Motivation

In $\mathbb{R}^2$ the query time of range trees is $\Theta(k + \log^2 n)$. For comparison-based algorithms, $\Omega(k + \log n)$ is a lower bound. Can we do better and achieve the lower bound? Yes, we'll then show how to obtain the optimal $O(k + \log n)$ query time.

## Step 1: Querying x-coordinates (Same as before:)

Make the query with range $[a_1, b_1]$ on the $x$-coordinates.

- Take the nodes on the left path and the right path.
- Select the canonical set $C_i$ at the right child of a node on the left path; select the canonical set $C_j$ at the left child of a node on the right path.
- This takes $O(\log n)$ time (the height of the BBST $T$).
- Let $\{C_1, C_2, \ldots, C_c\}$ be the canonical sets selected, where $c = O(\log n)$.

## Step 2: Querying y-coordinates (Modified)

When processing a query $(a_1, b_1, a_2, b_2)$, we search the canonical trees $T_v$, always with the same two keys $a_2$ and $b_2$. For each such tree, we spend $O(\log n)$ searching time.

Main idea: as $C_{v.left}$ and $C_{v.right}$ are subsets of $C_v$, we keep pointers from the nodes of $T_v$ to the nodes of $T_{v.left}$ and $T_{v.right}$ that hold the same key, or the next larger key.

Thus, after performing a search for $a_2$ or $b_2$ in $T_v$, we can perform the search for $a_2$ or $b_2$ in $T_{v.left}$ and $T_{v.right}$ in $O(1)$ time.

## Step 2: Querying y-coordinates (Modified)

Minor idea: replace each canonical tree $T_i$ by a canonical array $A_i$ for the canonical set $C_i$:

- Search for the key $a_2$ in the array $A_i$.
- Starting from $a_2$, walk along the array $A_i$ until $b_2$ is exceeded.

## Step 2: Querying y-coordinates (Modified)

First make a binary search for $a_2$ in $A_{root}$, which takes $O(\log n)$ time.

By following the pointer links down the tree, we can locate $a_2$ in a canonical array $A_i$ in $O(1)$ time per step. Starting from $a_2$, walk along the array $A_i$, reporting its points, until $b_2$ is exceeded.
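A compact sketch of the canonical arrays with linking pointers follows. Assumptions made for brevity: the links are computed with `bisect_left` per entry (a linear merge would yield the same links within the stated preprocessing bound), arrays are re-sorted at each node, and the names `Node`, `step`, `walk` and `query` are illustrative. Entry `i` of `to_left` or `to_right` gives the position in the child's array of the first key that is at least `keys[i]`; one extra entry covers the position past the end.

```python
from bisect import bisect_left


class Node:
    """Range-tree node with a canonical array and links into its children."""

    def __init__(self, pts):
        # pts: non-empty list of (x, y) tuples sorted by x
        mid = len(pts) // 2
        self.point = pts[mid]
        self.left = Node(pts[:mid]) if mid > 0 else None
        self.right = Node(pts[mid + 1:]) if mid + 1 < len(pts) else None
        self.arr = sorted(pts, key=lambda p: p[1])       # canonical array A_v
        self.keys = [p[1] for p in self.arr]
        self.to_left = self._links(self.left)
        self.to_right = self._links(self.right)

    def _links(self, child):
        if child is None:
            return None
        return [bisect_left(child.keys, y) for y in self.keys] + [len(child.keys)]


def step(v, i, go_left):
    """Move to a child and carry the position of a2 along in O(1)."""
    child = v.left if go_left else v.right
    if child is None:
        return None, 0
    return child, (v.to_left if go_left else v.to_right)[i]


def query(root, a1, b1, a2, b2):
    out = []

    def check(p):              # path node whose x is already known to be in range
        if a2 <= p[1] <= b2:
            out.append(p)

    def walk(w, i):            # report from A_w[i] until b2 is exceeded
        while i < len(w.arr) and w.arr[i][1] <= b2:
            out.append(w.arr[i])
            i += 1

    v, i = root, bisect_left(root.keys, a2)      # the only binary search
    while v is not None and not (a1 <= v.point[0] <= b1):
        v, i = step(v, i, b1 < v.point[0])
    if v is None:
        return out
    check(v.point)
    for toward_a in (True, False):               # left path, then right path
        u, j = step(v, i, toward_a)
        while u is not None:
            inside = a1 <= u.point[0] if toward_a else u.point[0] <= b1
            if inside:
                check(u.point)
                w, jw = step(u, j, not toward_a)  # canonical child
                if w is not None:
                    walk(w, jw)
                u, j = step(u, j, toward_a)
            else:
                u, j = step(u, j, not toward_a)
    return out


pts = sorted([(1, 4), (2, 2), (3, 5), (5, 1), (6, 3), (8, 6), (9, 2)])
root = Node(pts)
hits = query(root, 2, 8, 2, 5)
```

For the sample points and the rectangle $[2, 8] \times [2, 5]$, `hits` contains `(2, 2)`, `(3, 5)` and `(6, 3)`. After the single binary search at the root, each move down the tree costs $O(1)$, which is where the saved $\log n$ factor comes from.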

## Improving d-dim. range trees

Hence we can answer a 2-dimensional range query in optimal $O(\log n + k)$ time. This technique is known as fractional cascading.

By induction, it also improves the results for $d > 2$ by a factor of $O(\log n)$ (by using canonical arrays at the last level, together with the linking pointers). Hence range trees with fractional cascading in $d \ge 2$ dimensions yield

- Query time: $O(k + \log^{d-1} n)$ (improved by an $O(\log n)$ factor)
- Space usage: $O(n \log^{d-1} n)$ (same as before)
- Preprocessing time: $O(n \log^{d-1} n)$ (same as before)

## Remarks on 2-dim. improved range trees

- $O(\log n + k)$ query time and $O(n \log n)$ preprocessing time are optimal.
- But the space complexity is NOT optimal: $O(n \log n / \log \log n)$ space is possible in 2 dimensions with the same query time, and this is optimal (not covered in this course).

## Improved range trees

Concluding remarks

Range trees:

- simple
- nearly optimal

Spatial databases mainly use R-trees:

- not covered in this course
- good in practice with real data sets
- but no performance guarantee (no good worst-case bound on the query time)

## Next Lecture

Summary of this lecture: Orthogonal Range Searching

- 2-dim. range trees
- $d$-dim. range trees
- Fractional cascading

Next lecture: Segment Trees and Interval Trees

- Segment Trees
- Interval Trees