SAMs
Spatial Indexing
Point Access Methods can index only points. What about regions?
Z-ordering and quadtrees
Use the transformation technique and a PAM
New methods: Spatial Access Methods (SAMs)
Problem
Given a collection of geometric objects (points, lines, polygons, ...), organize them on disk so that spatial queries (range, nearest neighbor, etc.) can be answered efficiently
Transformation Technique
Map a d-dim MBR into a 2d-dim point, e.g. for d = 2: [(xmin, xmax) (ymin, ymax)] => (xmin, xmax, ymin, ymax)
Use a PAM to index the 2d-dim points
Given a range query, map the query into the 2d-dim space and use the PAM to answer it
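A minimal sketch of the transformation for d = 2, assuming closed rectangles and an "intersects" range query. The function names and the toy rectangles are made up for illustration, and a linear scan stands in for the PAM:

```python
import math

def mbr_to_point(xmin, xmax, ymin, ymax):
    """Map a 2-D MBR to a 4-D point, in the order used on the slide."""
    return (xmin, xmax, ymin, ymax)

def query_to_box(qxmin, qxmax, qymin, qymax):
    """Map an 'intersects' range query to a 4-D box (lows, highs).

    An MBR intersects the query iff
    xmin <= qxmax, xmax >= qxmin, ymin <= qymax, ymax >= qymin.
    """
    lows = (-math.inf, qxmin, -math.inf, qymin)
    highs = (qxmax, math.inf, qymax, math.inf)
    return lows, highs

def in_box(point, box):
    lows, highs = box
    return all(lo <= v <= hi for v, lo, hi in zip(point, lows, highs))

# Toy data (made-up coordinates), given as (xmin, xmax, ymin, ymax).
rects = {"r1": (0, 2, 0, 2), "r2": (5, 6, 5, 6), "r3": (1, 4, 3, 7)}
points = {name: mbr_to_point(*r) for name, r in rects.items()}

# Query rectangle x in [1, 3], y in [1, 4]; a real PAM would replace this scan.
box = query_to_box(1, 3, 1, 4)
hits = sorted(name for name, p in points.items() if in_box(p, box))
print(hits)
```

Note that the query box is unbounded on one side in every dimension, which is one reason a PAM can handle transformed queries poorly.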
R-trees
=> guaranteed 50% utilization
=> easier insertion/split algorithms (they only deal with Minimum Bounding Rectangles, MBRs)
R-trees
A multi-way external memory tree
Index nodes and data (leaf) nodes
All leaf nodes appear on the same level
Every node except the root contains between m and M entries
The root node has at least 2 entries (children), unless it is a leaf
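The structural rules above can be written down as a small checker. This is only a sketch: the `Node` class, the `(xmin, ymin, xmax, ymax)` rectangle layout, and the helper names are assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    # Each entry is (mbr, child Node) in an index node,
    # or (mbr, object id) in a leaf node.
    entries: list = field(default_factory=list)

def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur below `node`."""
    if node.is_leaf:
        return {depth}
    depths = set()
    for _, child in node.entries:
        depths |= leaf_depths(child, depth + 1)
    return depths

def fill_ok(node, m, M, is_root=True):
    """Check the entry-count rule for every node in the subtree."""
    n = len(node.entries)
    if is_root:
        ok = n <= M if node.is_leaf else 2 <= n <= M
    else:
        ok = m <= n <= M
    if ok and not node.is_leaf:
        ok = all(fill_ok(child, m, M, False) for _, child in node.entries)
    return ok

def is_valid_rtree(root, m, M):
    """All leaves on one level, and every node within its fill bounds."""
    return fill_ok(root, m, M) and len(leaf_depths(root)) == 1

# Toy trees (made-up coordinates), checked with m = 2, M = 4.
leaf1 = Node(True, [((0, 0, 1, 1), "a"), ((1, 1, 2, 2), "b")])
leaf2 = Node(True, [((5, 5, 6, 6), "c"), ((6, 6, 7, 7), "d")])
good = Node(False, [((0, 0, 2, 2), leaf1), ((5, 5, 7, 7), leaf2)])
short_leaf = Node(True, [((9, 9, 10, 10), "e")])  # only 1 entry < m
bad = Node(False, [((0, 0, 2, 2), leaf1), ((9, 9, 10, 10), short_leaf)])
print(is_valid_rtree(good, 2, 4), is_valid_rtree(bad, 2, 4))
```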
Example
E.g., with fanout 4: group nearby rectangles under parent MBRs; each group is stored in one disk page
[Figure: rectangles A through J grouped into parent MBRs P1 to P4, with fanout F=4. The root node holds P1 P2 P3 P4; the leaf level holds A B C D E H I F G J.]
R-trees:Search
[Figure: a query over the example tree. The search starts at the root node (P1 P2 P3 P4) and continues into the leaf level (A B C D E H I F G J).]
Main points:
every parent node completely covers its children
a child MBR may be covered by more than one parent, but it is stored under ONLY ONE of them (i.e., no need for duplicate elimination)
a point query may follow multiple branches, because sibling MBRs can overlap
everything works for any(?) dimensionality
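The main points above can be seen in a short search sketch. The `Node` class, the `(xmin, ymin, xmax, ymax)` layout, and the toy rectangles are assumptions made for illustration; the two leaf MBRs are chosen to overlap so that the query has to follow both branches.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    # (mbr, child Node) in an index node, (mbr, object id) in a leaf.
    entries: list = field(default_factory=list)

def intersects(a, b):
    """True if closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def mbr_of(node):
    """Smallest rectangle covering all entries of `node`."""
    x0, y0, x1, y1 = zip(*(mbr for mbr, _ in node.entries))
    return (min(x0), min(y0), max(x1), max(y1))

def search(node, query, out=None):
    """Collect ids of all objects whose MBR intersects `query`."""
    if out is None:
        out = []
    for mbr, item in node.entries:
        if intersects(mbr, query):
            if node.is_leaf:
                out.append(item)
            else:
                search(item, query, out)  # descend into every matching child
    return out

# Toy tree (made-up coordinates); the two leaf MBRs overlap.
leaf1 = Node(True, [((0, 0, 2, 2), "a"), ((1, 1, 3, 3), "b")])
leaf2 = Node(True, [((2, 2, 5, 5), "c"), ((6, 6, 7, 7), "d")])
root = Node(False, [(mbr_of(leaf1), leaf1), (mbr_of(leaf2), leaf2)])

# A small query around (2.5, 2.5) falls inside both leaf MBRs.
result = search(root, (2.5, 2.5, 2.6, 2.6))
print(result)
```

Each object is reported once without any duplicate elimination, because every rectangle is stored in exactly one leaf.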
R-trees:Insertion
Insert X
[Figure: rectangle X is added to the example tree; leaf level after the insertion: A B C D E X H I F G J]
Insert Y
[Figure: rectangle Y is added to the example tree; leaf level after the insertion: A B C D E Y H I F G J]
Using ChooseLeaf: at each index node, find the entry whose MBR needs the least enlargement to include Y, and descend into it. Resolve ties by choosing the entry with the smallest area.
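A minimal sketch of this rule, assuming 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` and a hypothetical `Node` class; the node names and coordinates are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    name: str = ""
    # (mbr, child Node) in an index node, (mbr, object id) in a leaf.
    entries: list = field(default_factory=list)

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(mbr, rect):
    """Extra area `mbr` needs in order to include `rect`."""
    return area(union(mbr, rect)) - area(mbr)

def choose_leaf(node, rect):
    """Descend from `node` to the leaf that should receive `rect`."""
    while not node.is_leaf:
        # Least enlargement first; ties broken by the smallest area.
        _, node = min(node.entries,
                      key=lambda e: (enlargement(e[0], rect), area(e[0])))
    return node

p1, p2, p3 = Node(True, "P1"), Node(True, "P2"), Node(True, "P3")

# P1 would grow by 4 area units, P2 only by 2, so P2 is chosen.
root = Node(False, "root", [((0, 0, 4, 4), p1), ((5, 0, 7, 2), p2)])
print(choose_leaf(root, (4, 1, 5, 2)).name)

# Both MBRs already contain the rectangle (enlargement 0): smaller area wins.
root2 = Node(False, "root", [((0, 0, 4, 4), p1), ((1, 1, 3, 3), p3)])
print(choose_leaf(root2, (1.5, 1.5, 2, 2)).name)
```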
[Figure: inserting into a full node. After K and W are inserted, a new leaf P5 appears next to P1, and the root entries are regrouped as P1 P5 P2 and P3 P4 under the new nodes Q1 and Q2.]
R-trees:Split
Pick two rectangles as seeds; assign each remaining rectangle R to the closest seed
Closest: the seed whose group needs the smallest increase in area to include R
[Figure: node P1 being split; rectangle R lies between seed1 and seed2]
How to pick seeds:
Linear: in each dimension, find the rectangle with the highest low side and the one with the lowest high side; normalize their separation by the width of the whole set along that dimension; choose the pair with the greatest normalized separation
Quadratic: for each pair E1 and E2, compute the rectangle J = MBR(E1, E2) and d = area(J) - area(E1) - area(E2). Choose the pair with the largest d
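A sketch of the quadratic seed choice followed by a simplified distribution step, assuming rectangles stored as `(xmin, ymin, xmax, ymax)`. Two simplifications are assumptions of this example: the remaining rectangles are assigned in input order, and the minimum fill m of each group is not enforced.

```python
from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def pick_seeds_quadratic(rects):
    """Return indices (i, j) of the pair with the largest d."""
    def wasted(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=wasted)

def split(rects):
    """Split `rects` into two groups; return (groups, group MBRs)."""
    i, j = pick_seeds_quadratic(rects)
    groups = [[rects[i]], [rects[j]]]
    mbrs = [rects[i], rects[j]]
    for k, r in enumerate(rects):
        if k in (i, j):
            continue
        # "Closest" seed: the group whose MBR grows least when r is added.
        growth = [area(union(m, r)) - area(m) for m in mbrs]
        g = 0 if growth[0] <= growth[1] else 1
        groups[g].append(r)
        mbrs[g] = union(mbrs[g], r)
    return groups, mbrs

# Toy overflowing node (made-up coordinates): two clusters of unit squares.
rects = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 8, 9, 9), (9, 8, 10, 9), (0, 1, 1, 2)]
groups, mbrs = split(rects)
print(pick_seeds_quadratic(rects), mbrs)
```

The seeds end up in opposite clusters because pairing them wastes the most area, which is the point of maximizing d.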
R-trees:Insertion
Use ChooseLeaf to find the leaf node for the new entry E
If the leaf node is full, Split it; otherwise insert E there
Propagate the MBR changes (and any split) up to the root
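The whole insertion path can be sketched recursively. Everything here is illustrative: the `Node` class, the capacity `M = 4`, and in particular `split_entries`, which cuts the entries in half by xmin as a stand-in for the linear or quadratic split.

```python
M = 4  # assumed maximum number of entries per node

class Node:
    def __init__(self, leaf, entries=None):
        self.leaf = leaf
        self.entries = entries or []  # (mbr, child Node or object id)

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def mbr_of(entries):
    x0, y0, x1, y1 = zip(*(mbr for mbr, _ in entries))
    return (min(x0), min(y0), max(x1), max(y1))

def split_entries(entries):
    """Stand-in split (assumption): sort by xmin and cut in half."""
    entries = sorted(entries, key=lambda e: e[0][0])
    half = len(entries) // 2
    return entries[:half], entries[half:]

def insert_rec(node, mbr, obj):
    """Insert below `node`; return a new sibling if `node` was split."""
    if node.leaf:
        node.entries.append((mbr, obj))
    else:
        # ChooseLeaf step: least enlargement, ties by smallest area.
        idx = min(range(len(node.entries)), key=lambda i: (
            area(union(node.entries[i][0], mbr)) - area(node.entries[i][0]),
            area(node.entries[i][0])))
        child = node.entries[idx][1]
        sibling = insert_rec(child, mbr, obj)
        node.entries[idx] = (mbr_of(child.entries), child)  # adjust the MBR
        if sibling is not None:
            node.entries.append((mbr_of(sibling.entries), sibling))
    if len(node.entries) > M:  # overflow: split this node
        left, right = split_entries(node.entries)
        node.entries = left
        return Node(node.leaf, right)
    return None

def insert(root, mbr, obj):
    """Insert and return the (possibly new) root."""
    sibling = insert_rec(root, mbr, obj)
    if sibling is None:
        return root
    # The root was split: grow the tree by one level.
    return Node(False, [(mbr_of(root.entries), root),
                        (mbr_of(sibling.entries), sibling)])

# Five unit squares in a row overflow a single leaf of capacity 4.
root = Node(True)
for i in range(5):
    root = insert(root, (i, 0, i + 1, 1), i)
print([mbr for mbr, _ in root.entries])
```

Because a split root is replaced by a new root one level higher, all leaves stay on the same level.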
R-trees:Deletion
Find the leaf node that contains the entry E
Remove E from this node
If the node underflows (fewer than m entries): eliminate the node by removing its entry from the parent, then reinsert the orphaned entries (the node's remaining entries) into the tree using Insert
R-trees: Variations
R+-tree: do not allow overlapping, so split the objects (similar to z-values)
R*-tree: change the insertion and deletion algorithms (minimize not only area but also perimeter; forced re-insertion)
Hilbert R-tree: use the Hilbert values to insert objects into the tree
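To make the Hilbert R-tree idea concrete, here is a sketch of computing a Hilbert value for a grid cell with the standard bit-manipulation conversion, then ordering rectangle centers by it. Using the center as the representative point and snapping it to an n x n grid (n a power of 2) are assumptions of this example.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid.

    n must be a power of 2, and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Centers of four made-up rectangles, already snapped to a 4 x 4 grid.
centers = [(3, 0), (0, 0), (1, 1), (0, 3)]
ordered = sorted(centers, key=lambda c: hilbert_index(4, *c))
print(ordered)
```

Objects that are close along the curve tend to be close in space, so inserting in this order tends to place nearby rectangles in the same node.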