
Primary Horizontal Fragmentation

Primary Horizontal Fragmentation splits a single table horizontally (row-wise) using a set
of simple predicates (conditions).
What is a simple predicate?
Given a table R with a set of attributes $[A_1, A_2, \dots, A_n]$, a simple predicate $P_i$ can be
expressed as follows:
$P_i : A_j \;\theta\; \text{Value}$
where $\theta$ can be any of the symbols in the set $\{=, <, >, \le, \ge, \ne\}$, and Value can be any value
stored in the table for the attribute $A_j$. For example, consider the following table Account given in Figure 1;
Acno    Balance    Branch_Name
A101    5000       Mumbai
A103    10000      New Delhi
A104    2000       Chennai
A102    12000      Chennai
A110    6000       Mumbai
A115    6000       Mumbai
A120    2500       New Delhi
Figure 1: Account table
For the above table, we could define simple predicates such as Branch_name = 'Chennai',
Branch_name = 'Mumbai', Balance < 10000, etc., using the above expression $A_j \;\theta\; \text{Value}$.
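As a rough illustration of the definition above, a simple predicate can be modelled as an (attribute, symbol, value) triple and evaluated row by row. The sketch below is only one possible encoding: the names `ACCOUNT`, `OPS`, `holds` and `select` are hypothetical helpers introduced for this example, and the symbols `<=`, `>=` and `<>` stand in for the less-or-equal, greater-or-equal and not-equal comparisons.

```python
import operator

# Rows of the Account table from Figure 1.
ACCOUNT = [
    {"Acno": "A101", "Balance": 5000, "Branch_Name": "Mumbai"},
    {"Acno": "A103", "Balance": 10000, "Branch_Name": "New Delhi"},
    {"Acno": "A104", "Balance": 2000, "Branch_Name": "Chennai"},
    {"Acno": "A102", "Balance": 12000, "Branch_Name": "Chennai"},
    {"Acno": "A110", "Balance": 6000, "Branch_Name": "Mumbai"},
    {"Acno": "A115", "Balance": 6000, "Branch_Name": "Mumbai"},
    {"Acno": "A120", "Balance": 2500, "Branch_Name": "New Delhi"},
]

# Comparison symbols allowed in a simple predicate, mapped to Python functions.
OPS = {
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge, "<>": operator.ne,
}

def holds(predicate, row):
    """Evaluate one simple predicate (attribute, symbol, value) on one row."""
    attribute, symbol, value = predicate
    return OPS[symbol](row[attribute], value)

def select(predicate, rows):
    """Return the rows that satisfy the simple predicate."""
    return [row for row in rows if holds(predicate, row)]

chennai = select(("Branch_Name", "=", "Chennai"), ACCOUNT)
small = select(("Balance", "<", 10000), ACCOUNT)
print([row["Acno"] for row in chennai])
print([row["Acno"] for row in small])
```

Each predicate on its own already behaves like a selection on the table; the later steps combine several of them.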
What is a set of simple predicates?
A set of simple predicates is the set of all conditions collectively required to fragment a relation into subsets.
For a table R, the set of simple predicates can be defined as:
$P = \{P_1, P_2, \dots, P_n\}$
As an example, for the above table Account, if the simple conditions are Balance < 10000 and Balance ≥ 10000,
then,
Set of simple predicates P1 = {Balance < 10000, Balance ≥ 10000}
As another example, if the simple conditions are Branch_name = 'Chennai', Branch_name = 'Mumbai',
Balance < 10000, and Balance ≥ 10000, then,
Set of simple predicates P2 = {Branch_name = 'Chennai', Branch_name = 'Mumbai', Balance < 10000,
Balance ≥ 10000}
What is a min-term predicate?
When we fragment a relation horizontally, we filter the data using either a single condition or a
combination of simple predicates.
Given a relation R and a set of simple predicates, we can fragment the relation horizontally as follows
(relational algebra expression):
Fragment, $R_i = \sigma_{F_i}(R), \quad 1 \le i \le n$
where $F_i$ is a selection formula built from the simple predicates in conjunctive normal form, otherwise
called a min-term predicate, which can be written as follows:
Min-term predicate, $M_i = P_1^* \wedge P_2^* \wedge P_3^* \wedge \dots \wedge P_n^*$

Here, $P_1^*$ means either $P_1$ or $\neg(P_1)$, $P_2^*$ means either $P_2$ or $\neg(P_2)$, and so on. By taking the
conjunction of the simple predicates in their different plain and negated combinations, we can derive
many such min-term predicates.

For the example stated previously, we can derive the set of min-term predicates using the rules stated
above as follows.
We will get $2^n$ min-term predicates, where n is the number of simple predicates in the given predicate
set. For P1, we have 2 simple predicates. Hence, we will get 4 ($2^2$) possible min-term predicates
as follows:

$m_1 = \{\text{Balance} < 10000 \wedge \text{Balance} \ge 10000\}$
$m_2 = \{\text{Balance} < 10000 \wedge \neg(\text{Balance} \ge 10000)\}$
$m_3 = \{\neg(\text{Balance} < 10000) \wedge \text{Balance} \ge 10000\}$
$m_4 = \{\neg(\text{Balance} < 10000) \wedge \neg(\text{Balance} \ge 10000)\}$
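The enumeration of the $2^n$ combinations can be sketched with `itertools.product`, which yields every way of taking each simple predicate either as-is or negated. The function name `min_terms` and the textual `NOT (...)` rendering are assumptions made for this illustration; the code only generates the candidate min-terms and does not yet decide which of them are useful.

```python
from itertools import product

def min_terms(simple_predicates):
    """Return every conjunction that takes each predicate plain or negated."""
    terms = []
    for signs in product([True, False], repeat=len(simple_predicates)):
        literals = [
            predicate if keep else f"NOT ({predicate})"
            for predicate, keep in zip(simple_predicates, signs)
        ]
        terms.append(" AND ".join(literals))
    return terms

# The set P1 from the text, written with SQL-style comparison symbols.
P1 = ["balance < 10000", "balance >= 10000"]

for number, term in enumerate(min_terms(P1), start=1):
    print(f"m{number} = {term}")
```

Because `product` varies the last position fastest, the four results come out in the same order as m1 to m4 above.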

Our next step is to choose the min-term predicates that can actually be satisfied, and so are useful for
fragmenting the table, and to eliminate the others. For example, each of the above min-term predicates
can be applied as the formula $F_i$ in the rule for fragment $R_i$ stated above, as follows:
$\text{Account}_1 = \sigma_{\text{Balance} < 10000 \,\wedge\, \text{Balance} \ge 10000}(\text{Account})$
which can be written as the equivalent SQL query,
Account1: SELECT * FROM account WHERE balance < 10000 AND balance >= 10000;
$\text{Account}_2 = \sigma_{\text{Balance} < 10000 \,\wedge\, \neg(\text{Balance} \ge 10000)}(\text{Account})$
which can be written as the equivalent SQL query,
Account2: SELECT * FROM account WHERE balance < 10000 AND NOT (balance >= 10000);
where NOT (balance >= 10000) is equivalent to balance < 10000.
$\text{Account}_3 = \sigma_{\neg(\text{Balance} < 10000) \,\wedge\, \text{Balance} \ge 10000}(\text{Account})$
which can be written as the equivalent SQL query,
Account3: SELECT * FROM account WHERE NOT (balance < 10000) AND balance >= 10000;
where NOT (balance < 10000) is equivalent to balance >= 10000.
$\text{Account}_4 = \sigma_{\neg(\text{Balance} < 10000) \,\wedge\, \neg(\text{Balance} \ge 10000)}(\text{Account})$
which can be written as the equivalent SQL query,
Account4: SELECT * FROM account WHERE NOT (balance < 10000) AND NOT (balance >= 10000);
where NOT (balance < 10000) is equivalent to balance >= 10000 and NOT (balance >= 10000) is equivalent to
balance < 10000. This is exactly the same condition as in the query for fragment Account1.

From these examples, it is clear that the first query, for fragment Account1 (min-term predicate m1),
is invalid, because a record cannot hold two different values for the same attribute. That is, the
condition (Balance < 10000 ∧ Balance ≥ 10000) requires the balance to be both less than 10000 and
greater than or equal to 10000, which is not possible. The condition is contradictory, so m1 can
be eliminated. For fragment Account2 (min-term predicate m2), the condition is (balance < 10000 AND
balance < 10000), which reduces to balance < 10000 and is valid. Likewise, fragment Account3 is
valid (its condition reduces to balance >= 10000), and Account4 must be eliminated because its condition
is the same contradiction as that of Account1. Finally, we use the min-term predicates m2 and m3 to fragment
the Account relation. The fragments for Account can be derived as follows:
Account2: SELECT * FROM account WHERE balance < 10000;

Acno    Balance    Branch_Name
A101    5000       Mumbai
A104    2000       Chennai
A110    6000       Mumbai
A115    6000       Mumbai
A120    2500       New Delhi

Account3: SELECT * FROM account WHERE balance >= 10000;

Acno    Balance    Branch_Name
A103    10000      New Delhi
A102    12000      Chennai
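The two fragment queries can be tried out against the Figure 1 data with a small script. This is a minimal sketch that assumes an in-memory SQLite database; the column types and the `ORDER BY acno` clause are choices made for the example and are not part of the original queries.

```python
import sqlite3

# In-memory database holding the Account table from Figure 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acno TEXT PRIMARY KEY, balance INTEGER, branch_name TEXT)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [
        ("A101", 5000, "Mumbai"),
        ("A103", 10000, "New Delhi"),
        ("A104", 2000, "Chennai"),
        ("A102", 12000, "Chennai"),
        ("A110", 6000, "Mumbai"),
        ("A115", 6000, "Mumbai"),
        ("A120", 2500, "New Delhi"),
    ],
)

# Fragment for min-term m2 (balance below 10000).
account2 = conn.execute(
    "SELECT * FROM account WHERE balance < 10000 ORDER BY acno"
).fetchall()

# Fragment for min-term m3 (balance of 10000 or more).
account3 = conn.execute(
    "SELECT * FROM account WHERE balance >= 10000 ORDER BY acno"
).fetchall()

for name, rows in (("Account2", account2), ("Account3", account3)):
    print(name)
    for row in rows:
        print("   ", row)
```

Every account lands in exactly one of the two result sets, since each balance is either below 10000 or not.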

Correctness of Fragmentation

We have chosen a set of min-term predicates to fragment a relation (table) horizontally into pieces.
Our next step is to validate the chosen fragments for correctness, that is, to verify that nothing was
missed. We use the following rules to ensure that fragmentation has not changed the semantic
information of the table.
1. Completeness: If a relation R is fragmented into a set of fragments, then every tuple (record) of R must be
found in at least one of the fragments. This rule ensures that we have not lost any records during
fragmentation.
2. Reconstruction: After fragmenting a table, we must be able to reconstruct its original form
without any data loss through some relational operation. This rule ensures that we can construct the base
table back from its fragments without losing any information. That is, we can write a query that
combines the fragments to get the original relation back.
3. Disjointness: If a relation R is fragmented into a set of sub-tables $R_1, R_2, \dots, R_n$, a record that
belongs to one sub-table is not found in any other sub-table. This ensures that $R_i \cap R_j = \emptyset$
for every $i \ne j$.

For example, consider the Account table in Figure 1 and its fragments Account2 and Account3 created
using the min-term predicates we derived.
From the tables Account2 and Account3 it is clear that the fragmentation is complete. That is, we have
not missed any records: every record is included in one of the sub-tables.
When we apply an operation such as union to Account2 and Account3, we get the
original relation Account back.

(SELECT * FROM account2) UNION (SELECT * FROM account3);

The above query returns Account without loss of any information. Hence, the fragments created
can be reconstructed.
Finally, the following query returns an empty set, which shows that the disjointness
property is satisfied.

(SELECT * FROM account2) INTERSECT (SELECT * FROM account3);

We get an empty set as the result of this query because the relations Account2 and Account3 have no
record in common.
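The three rules can also be checked mechanically. The sketch below treats the relation and its fragments as Python sets of tuples; the helper names `fragment`, `is_complete`, `is_reconstructible` and `is_disjoint` are hypothetical and only illustrate the rules stated above.

```python
from itertools import combinations

# Account table from Figure 1 as a set of (Acno, Balance, Branch_Name) tuples.
ACCOUNT = {
    ("A101", 5000, "Mumbai"),
    ("A103", 10000, "New Delhi"),
    ("A104", 2000, "Chennai"),
    ("A102", 12000, "Chennai"),
    ("A110", 6000, "Mumbai"),
    ("A115", 6000, "Mumbai"),
    ("A120", 2500, "New Delhi"),
}

def fragment(rows, condition):
    """Horizontal fragment: the rows that satisfy the condition."""
    return {row for row in rows if condition(row)}

def is_complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    return all(any(row in frag for frag in fragments) for row in relation)

def is_reconstructible(relation, fragments):
    """The union of the fragments gives back exactly the relation."""
    return set().union(*fragments) == relation

def is_disjoint(fragments):
    """No tuple appears in two different fragments."""
    return all(a.isdisjoint(b) for a, b in combinations(fragments, 2))

account2 = fragment(ACCOUNT, lambda row: row[1] < 10000)
account3 = fragment(ACCOUNT, lambda row: row[1] >= 10000)
fragments = [account2, account3]

print(is_complete(ACCOUNT, fragments))
print(is_reconstructible(ACCOUNT, fragments))
print(is_disjoint(fragments))
```

Checking disjointness pairwise, rather than intersecting all fragments at once, matters as soon as there are more than two fragments.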

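For example 2, recall the set of simple predicates, which was as follows:

Set of simple predicates P2 = {Branch_name = 'Chennai', Branch_name = 'Mumbai', Balance < 10000,
Balance ≥ 10000}

We can derive the following min-term predicates:
$m_1 = \{\text{Branch\_name} = \text{'Chennai'} \wedge \text{Branch\_name} = \text{'Mumbai'} \wedge \text{Balance} < 10000 \wedge \text{Balance} \ge 10000\}$
$m_2 = \{\text{Branch\_name} = \text{'Chennai'} \wedge \text{Branch\_name} = \text{'Mumbai'} \wedge \text{Balance} < 10000 \wedge \neg(\text{Balance} \ge 10000)\}$
$m_3 = \{\text{Branch\_name} = \text{'Chennai'} \wedge \text{Branch\_name} = \text{'Mumbai'} \wedge \neg(\text{Balance} < 10000) \wedge \text{Balance} \ge 10000\}$
$m_4 = \{\text{Branch\_name} = \text{'Chennai'} \wedge \neg(\text{Branch\_name} = \text{'Mumbai'}) \wedge \text{Balance} < 10000 \wedge \text{Balance} \ge 10000\}$
$\vdots$
$m_{16} = \{\neg(\text{Branch\_name} = \text{'Chennai'}) \wedge \neg(\text{Branch\_name} = \text{'Mumbai'}) \wedge \neg(\text{Balance} < 10000) \wedge \neg(\text{Balance} \ge 10000)\}$
As in the previous example, out of the 16 ($2^4$) min-term predicates, those that are
not valid should be eliminated. At last, we would have the following set of valid min-term predicates.
$m_1 = \{\text{Branch\_name} = \text{'Chennai'} \wedge \neg(\text{Branch\_name} = \text{'Mumbai'}) \wedge \neg(\text{Balance} < 10000) \wedge \text{Balance} \ge 10000\}$
$m_2 = \{\text{Branch\_name} = \text{'Chennai'} \wedge \neg(\text{Branch\_name} = \text{'Mumbai'}) \wedge \text{Balance} < 10000 \wedge \neg(\text{Balance} \ge 10000)\}$
$m_3 = \{\neg(\text{Branch\_name} = \text{'Chennai'}) \wedge \text{Branch\_name} = \text{'Mumbai'} \wedge \neg(\text{Balance} < 10000) \wedge \text{Balance} \ge 10000\}$
$m_4 = \{\neg(\text{Branch\_name} = \text{'Chennai'}) \wedge \text{Branch\_name} = \text{'Mumbai'} \wedge \text{Balance} < 10000 \wedge \neg(\text{Balance} \ge 10000)\}$
$m_5 = \{\neg(\text{Branch\_name} = \text{'Chennai'}) \wedge \neg(\text{Branch\_name} = \text{'Mumbai'}) \wedge \neg(\text{Balance} < 10000) \wedge \text{Balance} \ge 10000\}$
$m_6 = \{\neg(\text{Branch\_name} = \text{'Chennai'}) \wedge \neg(\text{Branch\_name} = \text{'Mumbai'}) \wedge \text{Balance} < 10000 \wedge \neg(\text{Balance} \ge 10000)\}$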
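One way to see why only six of the sixteen min-terms survive is to test each combination against a handful of probe tuples, one for every case the four predicates can tell apart. This is an illustrative sketch, not a general satisfiability procedure: the probe values (a branch called 'Other' for any branch not named in P2, and the balances 9999 and 10000) and the names `P2`, `PROBES` and `satisfiable` are assumptions chosen for this example.

```python
from itertools import product

# Simple predicates of P2 as (label, test function) pairs.
P2 = [
    ("Branch_name = 'Chennai'", lambda row: row["branch"] == "Chennai"),
    ("Branch_name = 'Mumbai'", lambda row: row["branch"] == "Mumbai"),
    ("Balance < 10000", lambda row: row["balance"] < 10000),
    ("Balance >= 10000", lambda row: row["balance"] >= 10000),
]

# One probe tuple per distinguishable case: three branch cases times two
# balance cases. 'Other' stands for any branch not named in P2.
PROBES = [
    {"branch": branch, "balance": balance}
    for branch in ("Chennai", "Mumbai", "Other")
    for balance in (9999, 10000)
]

def satisfiable(signs):
    """Keep a min-term if some probe tuple satisfies every one of its literals."""
    return any(
        all(test(row) == keep for (_, test), keep in zip(P2, signs))
        for row in PROBES
    )

# signs[i] is True when predicate i is taken as-is, False when it is negated.
valid = [
    signs
    for signs in product([True, False], repeat=len(P2))
    if satisfiable(signs)
]

for signs in valid:
    print(" AND ".join(
        label if keep else f"NOT ({label})"
        for (label, _), keep in zip(P2, signs)
    ))
```

The ten rejected combinations either ask for two branches at once or combine the two balance conditions in a contradictory way.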
The horizontal fragments using the above set of min-term predicates can be generated as follows:

Fragment 1: SELECT * FROM account WHERE branch_name = 'Chennai' AND balance >= 10000;
Account1

Acno    Balance    Branch_Name
A102    12000      Chennai

Fragment 2: SELECT * FROM account WHERE branch_name = 'Chennai' AND balance < 10000;
Account2

Acno    Balance    Branch_Name
A104    2000       Chennai

Fragment 3: SELECT * FROM account WHERE branch_name = 'Mumbai' AND balance >= 10000;
Account3

Acno    Balance    Branch_Name
(no rows: no Mumbai account in Figure 1 has a balance of 10000 or more)

Fragment 4: SELECT * FROM account WHERE branch_name = 'Mumbai' AND balance < 10000;
Account4

Acno    Balance    Branch_Name
A101    5000       Mumbai
A110    6000       Mumbai
A115    6000       Mumbai

The ACCOUNT table also contains a third branch, New Delhi, which was not specified in the set of simple
predicates. The fragmentation process must not leave out the tuples with the value New Delhi.
That is the reason we have included the min-term predicates m5 and m6, whose fragments can be derived
as follows:

Fragment 5: SELECT * FROM account WHERE branch_name <> 'Mumbai' AND branch_name <>
'Chennai' AND balance >= 10000;
Account5

Acno    Balance    Branch_Name
A103    10000      New Delhi

Fragment 6: SELECT * FROM account WHERE branch_name <> 'Mumbai' AND branch_name <>
'Chennai' AND balance < 10000;
Account6

Acno    Balance    Branch_Name
A120    2500       New Delhi

Correctness of fragmentation:

Completeness: The tuples of the table Account are distributed across the different fragments, and no
record is omitted. Equivalently, performing the union of all the Account table fragments
Account1, Account2, Account3, Account4, Account5, and Account6 gives Account back without any
information loss. Hence, the above fragmentation is complete.

Reconstruction: As said above, performing the union of all the fragments gives the original table
back. Hence, the reconstruction property is satisfied.

Disjointness: When we perform the intersect operation between any two of the above fragments, we get an
empty set as the result, because no record is common to two fragments. Hence, the disjointness
property is satisfied.
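The checks for this second example can be sketched by running the six fragment conditions against the Figure 1 data and comparing the results as sets. The sketch assumes an in-memory SQLite database; the list name `WHERE` and the column types are choices made for this illustration.

```python
import sqlite3
from itertools import combinations

# In-memory database holding the Account table from Figure 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acno TEXT, balance INTEGER, branch_name TEXT)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [
        ("A101", 5000, "Mumbai"),
        ("A103", 10000, "New Delhi"),
        ("A104", 2000, "Chennai"),
        ("A102", 12000, "Chennai"),
        ("A110", 6000, "Mumbai"),
        ("A115", 6000, "Mumbai"),
        ("A120", 2500, "New Delhi"),
    ],
)

# WHERE clauses of Fragment 1 to Fragment 6, in that order.
WHERE = [
    "branch_name = 'Chennai' AND balance >= 10000",
    "branch_name = 'Chennai' AND balance < 10000",
    "branch_name = 'Mumbai' AND balance >= 10000",
    "branch_name = 'Mumbai' AND balance < 10000",
    "branch_name <> 'Mumbai' AND branch_name <> 'Chennai' AND balance >= 10000",
    "branch_name <> 'Mumbai' AND branch_name <> 'Chennai' AND balance < 10000",
]

fragments = [
    set(conn.execute(f"SELECT * FROM account WHERE {condition}"))
    for condition in WHERE
]
whole = set(conn.execute("SELECT * FROM account"))

# Completeness and reconstruction: the union of the fragments equals Account.
complete = set().union(*fragments) == whole
# Disjointness: no pair of fragments shares a row.
disjoint = all(a.isdisjoint(b) for a, b in combinations(fragments, 2))

print("complete and reconstructible:", complete)
print("pairwise disjoint:", disjoint)
```

An empty fragment, such as the one for Mumbai accounts with a balance of 10000 or more, does not affect either check.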












Derived Horizontal Fragmentation

In the previous post, we looked at Primary Horizontal Fragmentation. We use the primary
horizontal technique when we would like to horizontally fragment a table that does not depend
on any other table, or without considering any other table. That is, a table is fragmented based on a
set of conditions where all the conditional attributes are part of that table only. This type of
fragmentation is simple and straightforward. But in most cases, we need to fragment a
database as a whole. For example, consider a relation that is connected to another relation
through a foreign key. That is, whenever a record is inserted into the child table, the foreign
key value of the inserted record must be checked for its availability in the parent table. In
such a situation, we cannot fragment the parent table (the table with the primary key) and the child
table (the table with the foreign key) independently of each other. If we fragment the tables
separately, then every insertion into a child fragment must verify that a matching value exists in
the parent table, which may now reside in a different fragment. Hence, for this case
Primary Horizontal Fragmentation would not work.

Consider an example where an organization maintains information about its customers. It
stores information about the customers in the CUSTOMER table and the customer addresses in the
C_ADDRESS table as follows:
CUSTOMER(CId, CName, Prod_Purchased, Shop_Location)
C_ADDRESS(CId, C_Address)
The table CUSTOMER stores information about the customer, the product purchased from the
shop, and the shop location where the product was purchased. C_ADDRESS stores the
permanent and present addresses of the customer. Here, CUSTOMER is the owner relation and
C_ADDRESS is the member relation.
Figure 1: CUSTOMER table
CID     CNAME       PROD_PURCHASED     SHOP_LOCATION
C001    Ram         Air Conditioner    Mumbai
C002    Guru        Television         Chennai
C010    Murugan     Television         Coimbatore
C003    Yuvraj      DVD Player         Pune
C004    Gopinath    Washing machine    Coimbatore
Figure 2: C_ADDRESS table
CID     C_ADDRESS
C001    Bandra, Mumbai
C001    XYZ, Pune
C002    T.Nagar, Chennai
C002    Kovil street, Madurai
C003    ABX, Pune
C004    Gandhipuram, Ooty
C004    North street, Erode
C010    Peelamedu, Coimbatore

If the organization fragments the relation CUSTOMER on the Shop_Location
attribute, it needs to create 4 fragments using the horizontal fragmentation technique, as given in
Figure 3 below.
Figure 3: Horizontal fragments of Figure 1 on Shop_Location attribute
CUSTOMER1
CID     CNAME       PROD_PURCHASED     SHOP_LOCATION
C001    Ram         Air Conditioner    Mumbai
CUSTOMER2
CID     CNAME       PROD_PURCHASED     SHOP_LOCATION
C002    Guru        Television         Chennai
CUSTOMER3
CID     CNAME       PROD_PURCHASED     SHOP_LOCATION
C010    Murugan     Television         Coimbatore
C004    Gopinath    Washing machine    Coimbatore
CUSTOMER4
CID     CNAME       PROD_PURCHASED     SHOP_LOCATION
C003    Yuvraj      DVD Player         Pune

Now, it is necessary to fragment the second relation C_ADDRESS based on the fragments created on the
CUSTOMER relation. If we fragment C_ADDRESS in any other way, related data may end up in
different locations. For example, if C_ADDRESS is fragmented on the last digit of the
CID attribute, it ends up with a larger number of fragments, and the data may not be stored in the same
location as the corresponding customer information. That is, the information for customer Ram is stored
in Mumbai while his address information might be stored somewhere else. To avoid this, the
table C_ADDRESS, which is a member table of CUSTOMER, must be fragmented into four
fragments based on the CUSTOMER table fragments given in Figure 3. This type of fragmentation,
based on the owner relation, is called Derived Horizontal Fragmentation. It works for relations that
are joined through an equi-join, because an equi-join can be represented as a set of
semi-joins.

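The fragmentation of C_ADDRESS is expressed as a set of semi-joins as follows.
$\text{C\_ADDRESS}_1 = \text{C\_ADDRESS} \ltimes \text{CUSTOMER}_1$
$\text{C\_ADDRESS}_2 = \text{C\_ADDRESS} \ltimes \text{CUSTOMER}_2$
$\text{C\_ADDRESS}_3 = \text{C\_ADDRESS} \ltimes \text{CUSTOMER}_3$
$\text{C\_ADDRESS}_4 = \text{C\_ADDRESS} \ltimes \text{CUSTOMER}_4$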

This will result in four fragments of C_ADDRESS, where the addresses of all customers of
fragment CUSTOMER1 go into C_ADDRESS1, the addresses of all customers of fragment
CUSTOMER2 go into C_ADDRESS2, and so on. The resulting fragments of C_ADDRESS are the
following.

Figure 4: Derived horizontal fragments of Figure 2 as a member relation of the owner relation's
fragments from Figure 3
C_ADDRESS1
CID     C_ADDRESS
C001    Bandra, Mumbai
C001    XYZ, Pune
C_ADDRESS2
CID     C_ADDRESS
C002    T.Nagar, Chennai
C002    Kovil street, Madurai
C_ADDRESS3
CID     C_ADDRESS
C004    Gandhipuram, Ooty
C004    North street, Erode
C010    Peelamedu, Coimbatore
C_ADDRESS4
CID     C_ADDRESS
C003    ABX, Pune
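The derivation of the member fragments can be sketched as a semi-join in a few lines: each C_ADDRESS fragment keeps exactly those address rows whose CID occurs in the matching CUSTOMER fragment. The helper names `primary_fragments` and `semi_join` are hypothetical and introduced only for this illustration, and the fragments are keyed by shop location rather than numbered.

```python
# Owner relation (Figure 1) and member relation (Figure 2) as lists of tuples.
CUSTOMER = [
    ("C001", "Ram", "Air Conditioner", "Mumbai"),
    ("C002", "Guru", "Television", "Chennai"),
    ("C010", "Murugan", "Television", "Coimbatore"),
    ("C003", "Yuvraj", "DVD Player", "Pune"),
    ("C004", "Gopinath", "Washing machine", "Coimbatore"),
]
C_ADDRESS = [
    ("C001", "Bandra, Mumbai"),
    ("C001", "XYZ, Pune"),
    ("C002", "T.Nagar, Chennai"),
    ("C002", "Kovil street, Madurai"),
    ("C003", "ABX, Pune"),
    ("C004", "Gandhipuram, Ooty"),
    ("C004", "North street, Erode"),
    ("C010", "Peelamedu, Coimbatore"),
]

def primary_fragments(customers):
    """Primary horizontal fragmentation: one fragment per Shop_Location value."""
    fragments = {}
    for row in customers:
        fragments.setdefault(row[3], []).append(row)
    return fragments

def semi_join(member_rows, owner_rows):
    """Keep the member rows whose CID matches some row of the owner fragment."""
    owner_ids = {row[0] for row in owner_rows}
    return [row for row in member_rows if row[0] in owner_ids]

customer_fragments = primary_fragments(CUSTOMER)
address_fragments = {
    location: semi_join(C_ADDRESS, owner_rows)
    for location, owner_rows in customer_fragments.items()
}

for location, rows in address_fragments.items():
    print(location, rows)
```

Note that an address such as 'XYZ, Pune' follows its customer into the Mumbai fragment: placement is decided by the owner's fragment, not by the content of the address.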

Checking for correctness
Completeness: Checking the completeness of a derived horizontal fragmentation is more difficult than for
primary horizontal fragmentation, because the predicates used determine the fragmentation of two
relations. Formally, for the fragmentations of two relations R and S, such as $\{R_1, R_2, \dots, R_n\}$ and
$\{S_1, S_2, \dots, S_n\}$, there should be one common attribute such as A. Then, for each tuple t of $R_i$,
there should be a tuple t' of $S_i$ that has the same value for A. This is known as referential integrity.
The derived fragmentation of C_ADDRESS is complete, because the values of the common attribute CID
in the fragments CUSTOMERi and C_ADDRESSi are the same. For example, the value present in CID of
CUSTOMER1 is present in C_ADDRESS1 and in no other C_ADDRESS fragment, and so on.

Reconstruction: A relation is reconstructed from its fragments by the union operator in
both primary and derived horizontal fragmentation.

Disjointness: If the min-term predicates are mutually exclusive, then the disjointness rule is satisfied for
Primary Horizontal Fragmentation. For derived horizontal fragmentation, we state that the fragments
are disjoint if they were created using mutually exclusive simple predicates on the owner
relation. Hence, in our example, as the simple predicates Shop_Location = 'Mumbai', etc. are mutually
exclusive, the derived fragments are also disjoint.
