Sei sulla pagina 1di 23

VIEWS

Prof. Navneet Goyal


Department of Computer Science & I nformation Systems
BI TS, Pilani
Topics
Query Modification
View Materialization
Which Views to Materialize?
How to exploit Materialized
Views to answer queries?
View Maintenance

View Modification
(Evaluate On Demand)
CREATE VIEW RegionalSales(category,sales,state)
AS SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales AS R GROUP BY R.category, R.state
SELECT R.category, R.state, SUM(R.sales)
FROM (SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid) AS R
GROUP BY R.category, R.state
View
Query
Modified
Query
View Materialization
(Precomputation)
Suppose we precompute RegionalSales and
store it with a clustered B+ tree index on
[category,state,sales].
Then, previous query can be answered by an
index-only scan.
SELECT R.state, SUM(R.sales)
FROM RegionalSales R
WHERE R.category=Laptop
GROUP BY R.state
SELECT R.state, SUM(R.sales)
FROM RegionalSales R
WHERE R. state=Wisconsin
GROUP BY R.category
Index on precomputed view
is great!
Index is less useful (must
scan entire leaf level).
Materialized Views
A view whose tuples are stored in the
database is said to be materialized
Provides fast access, like a (very high-
level) cache.
Need to maintain the view as the
underlying tables change.
Ideally, we want incremental view
maintenance algorithms.
Close relationship to Data
Warehousing, OLAP,
Issues in View
Materialization
What views should we materialize, and
what indexes should we build on the
precomputed results?
Given a query and a set of materialized
views, can we use the materialized views
to answer the query?
How frequently should we refresh
materialized views to make them
consistent with the underlying tables?
(And how can we do this incrementally?)
View Materialization:
Example
SELECT P.Category, SUM(S.sales)
FROM Product P, Sales S
WHERE P.pid=S.pid
GROUP BY P.Category
SELECT L.State, SUM(S.sales)
FROM Location L, Sales S
WHERE L.locid=S.locid
GROUP BY L.State
Both queries
require us to
join the Sales
table with
another table &
aggregate the
result
How can we use
materialization
to speed up
these queries?
View Materialization:
Example
Pre-compute the two joins involved
( product & sales & Location & sales)
Pre-compute each query in its entirety
OR let us define the following view:
CREATE VIEW TOTALSALES (pid, lid, total)
AS Select S.pid, S.locid, SUM(S.sales)
FROM Sales S
GROUP BY S.pid, S.locid
View Materialization:
Example
The View TOTALSALES can be
materialized & used instead os Sales in
our two example queries
SELECT P.Category, SUM(T.Total)
FROM Product P, TOTALSALES T
WHERE P.pid=T.pid
GROUP BY P.Category
SELECT L.State, SUM(T.Total)
FROM Location L, TOTALSALES T
WHERE L.locid=T.locid
GROUP BY L.State
View Maintenance
A materialized view is said to be refreshed
when it is made consistent with changes ot
its underlying tables
Often referred to as VIEW MAINTENANCE
Two issues:
HOW do we refresh a view when an underlying
table is refreshed? Can we do it incrementally?
WHEN should we refresh a view in response to a
change in the underlying table?
View Maintenance
The task of keeping a materialized view up-to-date with
the underlying data is known as materialized view
maintenance
Materialized views can be maintained by recomputation on
every update
A better option is to use incremental view maintenance
Changes to database relations are used to compute
changes to materialized view, which is then updated
View maintenance can be done by
Manually defining triggers on insert, delete, and update of
each relation in the view definition
Manually written code to update the view whenever database
relations are updated
Supported directly by the database
View Maintenance
Two steps:
Propagate: Compute changes to view when data
changes.
Refresh: Apply changes to the materialized view
table.
Maintenance policy: Controls when we do
refresh.
Immediate: As part of the transaction that
modifies the underlying data tables. (+
Materialized view is always consistent; - updates
are slowed)
Deferred: Some time later, in a separate
transaction. (- View becomes inconsistent; + can
scale to maintain many views without slowing
updates)
Deferred Maintenance
Three flavors:
Lazy: Delay refresh until next query on view;
then refresh before answering the query.
Periodic (Snapshot): Refresh periodically.
Queries possibly answered using outdated
version of view tuples. Widely used, especially
for asynchronous replication in distributed
databases, and for warehouse applications.
Event-based: E.g., Refresh after a fixed number
of updates to underlying data tables.
View Maintenance:
Incremental Algorithms
Recomputing the view when an
underlying table is modified
straightforward approach
Not feasible to do so for all changes
made
Ideally algorithms for refreshing a
view should be incremental
Cost of refresh is proportional to
the extent of the change
View Maintenance:
Incremental Algorithms
Note that a given row in the
materialized view can appear many
times (duplicates are not eliminated)
Main idea behind incremental
algorithms is to efficiently compute
changes to the rows of the view
New rows
Changes to count associated with a row
A row is deleted if its count becomes 0
View Maintenance
The changes (inserts and deletes) to a relation
or expressions are referred to as its differential
Set of tuples inserted to and deleted from r are denoted
i
r
and d
r
To simplify our description, we only consider
inserts and deletes
We replace updates to a tuple by deletion of the tuple
followed by insertion of the update tuple
We describe how to compute the change to the
result of each relational operation, given
changes to its inputs
We then outline how to handle relational algebra
expressions
Join Operation
Consider the materialized view v = r s and an
update to r
Let rold and rnew denote the old and new states
of relation r
Consider the case of an insert to r:
We can write rnew s as (rold ir) s
And rewrite the above to (rold s) (ir s)
But (rold s) is simply the old value of the materialized
view, so the incremental change to the view is just ir s
Thus, for inserts vnew = vold (ir s)
Similarly for deletes vnew = vold (dr s)
Selection & Projection
Operations
Selection: Consider a view v =

(r).
v
new
= v
old

(i
r
)
v
new
= v
old
-

(d
r
)
Projection is a more difficult operation
R = (A,B), and r(R) = { (a,2), (a,3)}

A
(r) has a single tuple (a).
If we delete the tuple (a,2) from r, we should not delete the
tuple (a) from
A
(r), but if we then delete (a,3) as well, we
should delete the tuple
For each tuple in a projection
A
(r) , we will keep a count
of how many times it was derived
On insert of a tuple to r, if the resultant tuple is already in

A
(r) we increment its count, else we add a new tuple with
count = 1
On delete of a tuple from r, we decrement the count of the
corresponding tuple in
A
(r)
if the count becomes 0, we delete the tuple from
A
(r)
Aggregate Operations
count : v =
A
g
count(B)
(r)
( count of the attribute B, after grouping r by attribute A)
When a set of tuples i
r
is inserted
For each tuple t in i
r
, if the group t.A is present
in v, we increment its count, else we add a new
tuple (t.A, 1) with count = 1
When a set of tuples d
r
is deleted
for each tuple t in i
r
.
we look for the group t.A in
v, and subtract 1 from the count for the group.
If the count becomes 0, we delete from v the tuple
for the group t.A
Example
Relation account grouped by branch-name:
branch_name
g
sum(balance)
(account)
branch_name account_number balance
Perryridge
Perryridge
Brighton
Brighton
Redwood
A-102
A-201
A-217
A-215
A-222
400
900
750
750
700
branch_name sum(balance)
Perryridge
Brighton
Redwood
1300
1500
700
Aggregate Operations
sum: v =
A
g
sum (B)
(r)

We maintain the sum in a manner similar to count,
except we add/subtract the B value instead of
adding/subtracting 1 for the count
Additionally we maintain the count in order to
detect groups with no tuples. Such groups are
deleted from v
Cannot simply test for sum = 0 (why?)
To handle the case of avg, we maintain the
sum and count aggregate values separately,
and divide at the end
Aggregate Operations
min, max: v =
A
g
min (B)
(r).
Handling insertions on r is
straightforward.
Maintaining the aggregate values min
and max on deletions may be more
expensive.

We have to look at the
other tuples of r that are in the same
group to find the new minimum

Example
Test
g
Min(marks)
(student_record))
TEST IDNO Marks
T1
T1
T2
T2
T3
A-102
A-103
A-102
A-103
A-104
15
20
25
10
10
TEST Min(marks)
T1
T2
15
10

Potrebbero piacerti anche