
Distributed Database Management Systems

CS-600
Rahil But

Lecture - 01

References
Principles of Distributed Database Systems (3rd Edition) by M. Tamer Özsu and Patrick Valduriez

Prerequisites: Database Management Systems, Computer Networks

History
• Traditional File Processing System: the very first form of business data processing
• Each program contains the description of the data that it manipulates
• Redundancy of data
• Problems in maintenance

[Figure: Program and Data Interdependence. The Library, Examination, and Registration applications each work on their own data files]

File Processing Systems

Library        Exam          Registration
Reg_Number     Reg_Number    Reg_Number
Name           Name          Name
Father Name    Address       Father Name
Books Issued   Class         Phone
Fine           Semester      Address
               Grade         Class

Duplication of Data
Vulnerable to Inconsistency

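The inconsistency risk can be illustrated with a minimal sketch. It assumes three in-memory dictionaries standing in for the Exam and Registration files, plus made-up record values; none of these names or values come from the slides.

```python
# Hypothetical per-application files: each keeps its own copy of the address.
exam_file = {"R-101": {"name": "Student A", "address": "Old Street 1"}}
registration_file = {"R-101": {"name": "Student A", "address": "Old Street 1"}}


def update_address(data_file, reg_number, new_address):
    """Update the address in ONE application's file only."""
    data_file[reg_number]["address"] = new_address


def addresses_consistent(reg_number):
    """Check whether both copies of the address still agree."""
    return (
        exam_file[reg_number]["address"]
        == registration_file[reg_number]["address"]
    )


# The Registration program changes the address; the Exam file is never told.
update_address(registration_file, "R-101", "New Street 9")
print(addresses_consistent("R-101"))  # False: the two copies now disagree
```

With a single shared database the address would be stored once, so this kind of disagreement could not arise.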
History continues
Database Approach (also called centralized database)
A database is a shared collection of logically related data

Database Approach
[Figure: PROGRAM 1, PROGRAM 2, PROGRAM 3, ... all reach one shared Database through common data description and data manipulation]
Takes care of all major drawbacks of the File System Environment, plus more

Motivation
[Figure: Database Technology (integration) and Computer Networks (distribution) combine into Distributed Database Systems (integration)]
integration ≠ centralization

Distributed Computing System (DCS)
A number of autonomous processing elements that are connected through a computer network and that cooperate in performing their assigned tasks

Distributed Computing System (DCS)
• Distributed system software enables computers to coordinate and share
• What is being distributed?
  Processing logic
  Functions
  Data
  Control
  All are relevant and important here

Distributed Database:
A collection of logically interrelated databases that are spread physically across multiple locations connected by a data communications link.

Distributed Database Management System:
A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users

Main Characteristics
Data at multiple sites
Data management (DM) at each site is independent
Local requirements
Global perspective

Where to apply
Two major reasons make an application a candidate for a DDBS:
Large number of users
Operations spread over a large geographical area

Example Applications

Banking
Air Ticketing
Business at multiple locations

What is not a DDBS?
Distributed Files: a collection of files stored on different computers of a network is not a DDBS
In a DDBS the data are logically related, the files share a common structure, and they are accessed via the same interface

Multiprocessor System: multiple processors that share some form of memory
[Figure: Shared Everything, Tight Coupling. Three processor units share one memory and the I/O system]

[Figure: Shared Everything, Loose Coupling. Three computer systems, each with its own CPU and memory, share secondary memory]

[Figure: Shared Nothing. Three computer systems, each with its own CPU and memory, connected through a switch]

Client/Server Databases
A DDBS is also different from a centralized client/server (C/S) system that is accessed over a network
[Figure: Centralized DBMS on a Network. Sites 1 to 5 connected by a communication network]
[Figure: Distributed DBMS Environment. Sites 1 to 5 connected by a communication network]

Reasons for DDBS
• Local units want control over data.
• Consolidate data for integrated decisions.
• Reduce telecommunication costs.
• Reduce the risk of telecommunication failures.

[Figure: Global users access the Distributed DBMS through a Global Schema. Each node (Node 1 ... Node n) runs its own DBMS (DBMS 1 ... DBMS n), which local users access directly]

Data Delivery Alternatives
• In distributed databases, data are "delivered" from the sites where they are stored to where the query is posed.
• The data delivery alternatives are characterized along three dimensions:
  delivery modes
  frequency
  communication methods

Delivery modes
• Pull-only: the transfer of data from servers to clients is initiated by a client pull. When a client request is received at a server, the server responds by locating the requested information.
• Push-only: the transfer of data from servers to clients is initiated by a server push in the absence of any specific request from clients.
• Hybrid: combines the client-pull and server-push mechanisms.

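The difference between pull and push can be sketched in a few lines. This is only an illustration under assumed names: the `Server` and `Client` classes, their methods, and the sample keys are hypothetical and not part of the lecture material.

```python
class Server:
    """Hypothetical data server supporting both delivery modes."""

    def __init__(self):
        self.data = {}
        self.subscribers = []

    def handle_request(self, key):
        # Pull: the server only responds when a client asks.
        return self.data.get(key)

    def subscribe(self, client):
        self.subscribers.append(client)

    def publish(self, key, value):
        # Push: the server sends data without any specific client request.
        self.data[key] = value
        for client in self.subscribers:
            client.receive(key, value)


class Client:
    """Hypothetical client that caches whatever it pulls or is pushed."""

    def __init__(self):
        self.cache = {}

    def pull(self, server, key):
        value = server.handle_request(key)
        self.cache[key] = value
        return value

    def receive(self, key, value):
        self.cache[key] = value


server = Server()
client = Client()
server.subscribe(client)
server.publish("exchange_rate", 1.25)   # push: client did not ask
server.data["balance"] = 100            # stored, but not pushed
balance = client.pull(server, "balance")  # pull: client asks explicitly
```

A hybrid system would simply use both paths: clients subscribe for pushed updates and still issue explicit requests when they need something else.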
Frequency
• Periodic: data are sent from the server to clients at regular intervals. Both pull and push can be performed in periodic fashion.
• Conditional: data are sent from servers whenever certain conditions installed by clients in their profiles are satisfied. Mostly used in hybrid or push-only delivery systems.
• Ad-hoc or irregular: delivery is irregular, mostly in a pure pull-based system.

Communication Methods
• Determine the various ways in which servers and clients communicate for delivering information to clients.
• Unicast: the communication from a server to a client is one-to-one.
• One-to-many: the server sends data to a number of clients.

Promises of DDBs
Transparency
• Separation of the higher-level semantics of a system from lower-level implementation issues
• Users have no idea of the distribution
[Figure: User View vs. System View]

Example
Transparent Access

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

[Figure: the sites Tokyo, Boston, Paris, Montreal, and New York are connected by a communication network. Each site stores part of the employee, project, and assignment data, for example Paris employees, Boston projects, New York projects with budget > 200000, and Montreal assignments]

[Figure: Distributed Database - User View. Users see a single distributed database]
[Figure: Distributed DBMS - Reality. User queries and user applications at several sites, each running DBMS software, are linked through a communication subsystem]

Types of Transparency
Data independence
Network transparency (or distribution transparency)
Replication transparency
Fragmentation transparency

Data Independence
Two types: Logical Data Independence and Physical Data Independence
A transparent system hides the implementation details from its users

Network Transparency
The user should not only be free from network management activities but should be unaware of even the existence of the network
Location Transparency and Naming Transparency

Replication Transparency
Replica: the same data stored at multiple sites
Advantages:
Access from a remote site can be served as local access
Data remain available despite a failure at one site
• Users have no idea of the replication

Fragmentation Transparency
Fragmentation: split data into multiple parts/fragments
Alternative to replication
• Users have no idea of the fragmentation

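As a rough sketch of what fragmentation transparency means in practice, the example below splits a small employee relation horizontally by city and lets a caller query it without knowing the fragments exist. The relation contents, the `city` attribute, and the function names are assumptions made for illustration only.

```python
# Hypothetical global relation; rows and the "city" attribute are made up.
EMP = [
    {"eno": "E1", "ename": "Employee 1", "city": "Paris"},
    {"eno": "E2", "ename": "Employee 2", "city": "Boston"},
    {"eno": "E3", "ename": "Employee 3", "city": "Paris"},
]


def horizontal_fragment(relation, predicate):
    """Return the rows of the relation that satisfy the fragment predicate."""
    return [row for row in relation if predicate(row)]


# One fragment per site, defined by a selection predicate on the city.
fragments = {
    city: horizontal_fragment(EMP, lambda row, c=city: row["city"] == c)
    for city in ("Paris", "Boston")
}


def global_select(predicate):
    """Answer a query on EMP by querying every fragment and taking the union.

    The caller never names a fragment, which is the point of transparency.
    """
    result = []
    for rows in fragments.values():
        result.extend(row for row in rows if predicate(row))
    return result


paris_staff = global_select(lambda row: row["city"] == "Paris")
```

A real system would use the fragment predicates to skip sites that cannot contribute to the answer; this sketch visits every fragment for simplicity.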
Responsibility of Transparency
Transparency is desirable, but there is a trade-off between the level of transparency and difficulty/cost
The language/compiler: provides a uniform method of manipulating data and avoids connectivity details.
Operating system: already provides transparency in the form of device drivers
It can also provide network transparency. However, not all operating systems provide it completely.
The third layer is the DBMS. It uses features of the operating system, in particular for network transparency.
In practice, we get a combination of all three.

Layers of Transparency
Compiler: language transparency
DBMS: replication/fragmentation transparency
OS: network transparency
DB approach: data independence
Data

Reliability in DDBS
Reliability through distributed transactions: avoids a single point of failure
Concurrency issues: transactions involving multiple records
Failure recovery: involving multiple sites

Performance Improvement
Through Data Localization
Each site handles a portion of the data, so contention for CPU or I/O is relatively low
Reduces remote access delays; no matter how fast networks are, latency delays remain and may be unacceptable in certain cases
Inter-query and intra-query parallelism

System Expansion
• Easier to accommodate increasing database sizes
• Emergence of microprocessor and workstation technologies
  Demise of Grosch's law
  Client-server model of computing
• Data communication cost vs telecommunication cost

Complicating Factors
In case of replication:
Choose the appropriate copy
Update impact reflected on all replicas
Failure Recovery

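The two replication concerns above (which copy to read, and keeping every copy up to date) can be sketched with a read-one/write-all policy. That policy is one common choice and is an assumption here, not something the slides prescribe; the class and site names are hypothetical.

```python
class ReplicatedItem:
    """Hypothetical data item with one copy per site (read-one/write-all)."""

    def __init__(self, sites, initial_value):
        self.copies = {site: initial_value for site in sites}

    def read(self, local_site):
        # Choose the appropriate copy: prefer the local one, else any replica.
        if local_site in self.copies:
            return self.copies[local_site]
        return next(iter(self.copies.values()))

    def write(self, value):
        # The update must be reflected on all replicas.
        for site in self.copies:
            self.copies[site] = value


balance = ReplicatedItem(["Paris", "Boston"], initial_value=100)
balance.write(250)
local_read = balance.read("Boston")   # served from the Boston copy
remote_read = balance.read("Tokyo")   # no local copy, falls back to a replica
```

The sketch ignores the hard part listed on the slide: what to do when a site is down during a write, which is where failure recovery comes in.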
Complicating Factors
Complexity
Cost: hardware and also the duplication of manpower
Distribution of control may cause problems if not used cautiously

Design Issues
• Database Design
  • How to distribute the database
  • Replicated & non-replicated database distribution
  • A related problem in directory management
• Query Processing
  • Convert user transactions to data manipulation instructions
  • Optimization problem
  • min{cost = data transmission + local processing}
• Concurrency Control
  • Synchronization of concurrent accesses
  • Consistency and isolation of transactions' effects
  • Deadlock management

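The query optimization objective, min{cost = data transmission + local processing}, can be made concrete with a small sketch. The plan names, byte counts, processing costs, and the per-byte transmission cost below are invented numbers chosen only to show the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical execution plan with its two cost components."""
    name: str
    bytes_shipped: int   # data moved between sites
    local_cost: float    # processing done at the sites


def total_cost(plan, cost_per_byte):
    """cost = data transmission + local processing."""
    return plan.bytes_shipped * cost_per_byte + plan.local_cost


def cheapest(plans, cost_per_byte):
    """Pick the plan that minimizes the total cost."""
    return min(plans, key=lambda plan: total_cost(plan, cost_per_byte))


plans = [
    Plan("ship whole relation", bytes_shipped=1_000_000, local_cost=5.0),
    Plan("filter at source, ship result", bytes_shipped=10_000, local_cost=8.0),
]
best = cheapest(plans, cost_per_byte=0.001)
print(best.name)  # filter at source, ship result
```

With these assumed numbers, the second plan does more local work but ships far less data, so it wins once transmission is priced in.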
Design Issues
• OS Support
  • Operating system with proper support for database operations
• Heterogeneity
• Relationship among them

Relationship Between Issues
[Figure: relationships among Directory Management, Query Processing, Distribution Design, Reliability, Concurrency Control, and Deadlock Management]

Architecture
Defines the structure of the system
components identified
functions of each component defined
interrelationships and interactions between components defined

ANSI/SPARC Architecture
[Figure: users see external views defined by the External Schema. These map to the conceptual view defined by the Conceptual Schema, which maps to the internal view defined by the Internal Schema]

Generic DBMS Architecture
DBMS Implementation Alternatives
Dimensions of the Problem
• Distribution
  • Whether the components of the system are located on the same machine or not
• Heterogeneity
  • Various levels (hardware, communications, operating system)
  • DBMS is the important one
  • Data model, query language, transaction management algorithms
• Autonomy
  • Not well understood and most troublesome
  • Various versions
    • Design autonomy: ability of a component DBMS to decide on issues related to its own design.
    • Communication autonomy: ability of a component DBMS to decide whether and how to communicate with other DBMSs.
    • Execution autonomy: ability of a component DBMS to execute local operations in any manner it wants to.

Client/Server Architecture
Advantages of Client-Server Architectures
• More efficient division of labor
• Horizontal and vertical scaling of resources
• Better price/performance on client machines
• Ability to use familiar tools on client machines
• Client access to remote data (via standards)
• Full DBMS functionality provided to client workstations
• Overall better system price/performance

Database Server
Distributed Database Servers
Distributed DBMS Architecture

ES1    ES2    ...   ESn     (external schemas)
          GCS               (global conceptual schema)
LCS1   LCS2   ...   LCSn    (local conceptual schemas)
LIS1   LIS2   ...   LISn    (local internal schemas)

Peer-to-Peer Component Architecture
[Figure: USER PROCESSOR: User Interface Handler, Semantic Data Controller, Global Query Optimizer, and Global Execution Monitor, using the External Schema, Global Conceptual Schema, and GD/D. DATA PROCESSOR: Local Query Processor, Local Recovery Manager, and Runtime Support Processor, using the Local Conceptual Schema, System Log, and Local Internal Schema. User requests enter and system responses leave through the user processor; the data processor accesses the database]

Datalogical Multi-DBMS Architecture

GES1   GES2   ...   GESn
LES11 ... LES1n     GCS     LESn1 ... LESnm
LCS1   LCS2   ...   LCSn
LIS1   LIS2   ...   LISn

MDBS Components & Execution
[Figure: a global user request enters the Multi-DBMS Layer, which sends global subrequests to DBMS1, DBMS2, and DBMS3. Local user requests go directly to the component DBMSs]

Mediator/Wrapper Architecture

Assignment # 1
Distributed Database Architecture Case Study

Discuss the organizational setup (distribution of working units), architecture, and component model of any organization that uses a distributed database system.

Note: Every student should discuss a different case study/organization

Date: //2019
Submission Date: //2019

Assignment # 2
SQL to Relational Algebra transformation

Date:
Submission Date:

