Distributed Database Management Systems: Rahil But

Distributed Database Management Systems
CS-600
Rahil But
Lecture 01
References
Principles of Distributed Database Systems (3rd Edition) by M. T. Özsu and P. Valduriez
Prerequisites: Database Management Systems, Computer Networks
History
• Traditional file processing system: the very first form of business data processing
• Each program contains the description of the data that it manipulates
• Redundancy of data
• Problems in maintenance
Figure: Library, Examination and Registration applications, each with its own data files
Program and Data Interdependence
File Processing Systems
Library file: Reg_Number, Name, Father Name, Books Issued, Fine
Exam file: Reg_Number, Name, Address, Class, Semester, Grade
Registration file: Reg_Number, Name, Father Name, Phone, Address, Class
Duplication of Data
Vulnerable to Inconsistency
History (continued)
Database Approach (also called centralized database):
A database is a shared collection of logically related data
Database Approach
Figure: Programs 1, 2, 3, … all access one shared Database through a common Data Description and Data Manipulation layer
Takes care of all the major drawbacks of the file system environment, and more
Motivation
Database Technology (integration) + Computer Networks (distribution) → Distributed Database Systems
integration ≠ centralization
Distributed Computing System (DCS)
A number of autonomous processing elements that are connected through a computer network and that cooperate in performing their assigned tasks
Distributed Computing System (DCS)
• Distributed system software enables computers to coordinate their activities and share resources
• What is being distributed?
Processing logic
Functions
Data
Control
All are relevant and important here
Distributed Database:
A collection of logically interrelated
databases that are spread physically
across multiple locations connected by a
data communications link.
Distributed Database Management System:
A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users
Main Characteristics
• Data at multiple sites
• Data management at each site is independent
• Local requirements
• Global perspective
Where to Apply
Two major reasons that make an application a candidate for a DDBS:
• Large number of users
• Operations spread over a large geographical area
Example Applications
Banking
Air Ticketing
Business at multiple locations
What is not a DDBS?
Distributed Files: a collection of files stored on different computers of a network is not a DDBS.
In a DDBS the data are logically related, the files share a common structure, and they are accessed via the same interface.
Multiprocessor System: multiple processors that share some form of memory
Shared Everything, Tight Coupling: several processor units share one memory and one I/O system
Shared Everything, Loose Coupling: each computer system has its own CPU and memory; all of them share secondary memory
Shared Nothing: each computer system has its own CPU and memory; the systems are connected only through a switch
Client/Server Databases
A DDBS is also different from a centralized DBMS that serves clients over a network in a client/server setup
Figure: Centralized DBMS on a Network (Sites 1 to 5 connected by a communication network)
Figure: Distributed DBMS Environment (Sites 1 to 5 connected by a communication network)
Reasons for DDBS
• Local units want control over data.
• Consolidate data for integrated decisions.
• Reduce telecommunication costs.
• Reduce the risk of telecommunication failures.
Figure: global users access the Distributed DBMS through the global schema; Nodes 1 to n each run a local DBMS (DBMS 1 … DBMS n) that also serves local users
Data Delivery Alternatives
• In distributed databases, data are “delivered” from the sites where they are stored to where the query is posed.
• The data delivery alternatives are characterized along three dimensions:
delivery modes,
frequency,
communication methods
Delivery modes
• Pull-only: the transfer of data from servers to clients is initiated by a client pull. When a client request is received at a server, the server responds by locating the requested information.
• Push-only: the transfer of data from servers to clients is initiated by a server push in the absence of any specific request from clients.
• Hybrid: the hybrid mode of data delivery combines the client-pull and server-push mechanisms.
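The three delivery modes can be illustrated with a small in-memory sketch. The class `DeliveryServer` and its methods are hypothetical names chosen for this example, not the API of any DDBMS; a real system would deliver over a network rather than through Python callbacks.

```python
from typing import Callable, Dict, List, Optional


class DeliveryServer:
    """Toy server contrasting client pull with server push (hypothetical class)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # Callbacks of clients that asked to receive pushed updates.
        self.subscribers: List[Callable[[str, str], None]] = []

    def pull(self, key: str) -> Optional[str]:
        # Pull: the client initiates the transfer; the server locates the
        # requested item and responds (None if it does not exist).
        return self.data.get(key)

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self.subscribers.append(callback)

    def update(self, key: str, value: str) -> None:
        self.data[key] = value
        # Push: the server sends the new item to subscribers without
        # waiting for a specific request.
        for notify in self.subscribers:
            notify(key, value)


if __name__ == "__main__":
    server = DeliveryServer()
    pushed: List[tuple] = []
    # Hybrid: the client subscribes for pushes and can still pull on demand.
    server.subscribe(lambda k, v: pushed.append((k, v)))
    server.update("flight_101", "delayed")
    print(pushed)                     # [('flight_101', 'delayed')]
    print(server.pull("flight_101"))  # delayed
```

The hybrid case needs no extra machinery here: the same client simply uses both the push channel and `pull`.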
Frequency
• Periodic: data are sent from the server to clients at regular intervals. Both pull and push can be performed in periodic fashion.
• Conditional: data are sent from servers whenever certain conditions installed by clients in their profiles are satisfied. This is mostly used in hybrid or push-only delivery systems.
• Ad-hoc or irregular: delivery is irregular; this is mostly used in pure pull-based systems.
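Conditional delivery can be sketched as a server that evaluates each client's installed profile condition whenever a new item arrives. `ConditionalPublisher`, `install_profile` and the `price` field are hypothetical names for illustration; the "outbox" list stands in for actual network delivery.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Item = Dict[str, float]


@dataclass
class ConditionalPublisher:
    """Pushes an item to a client only when the condition stored in that
    client's profile holds (hypothetical class, in-memory outbox)."""
    profiles: Dict[str, Callable[[Item], bool]] = field(default_factory=dict)
    outbox: List[Tuple[str, Item]] = field(default_factory=list)

    def install_profile(self, client: str, condition: Callable[[Item], bool]) -> None:
        # The client registers its condition once; no further requests needed.
        self.profiles[client] = condition

    def publish(self, item: Item) -> None:
        # Evaluate every installed condition against the new item.
        for client, condition in self.profiles.items():
            if condition(item):
                self.outbox.append((client, item))


if __name__ == "__main__":
    pub = ConditionalPublisher()
    pub.install_profile("trader_a", lambda it: it["price"] < 100.0)
    pub.install_profile("trader_b", lambda it: it["price"] >= 100.0)
    pub.publish({"price": 95.5})
    print(pub.outbox)  # [('trader_a', {'price': 95.5})]
```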
Communication Methods
• Determine the various ways in which servers and clients communicate for delivering information to clients.
• Unicast: the communication from a server to a client is one-to-one.
• One-to-many: the server sends data to a number of clients.
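The difference between the two communication methods shows up in how many transmissions the server makes. The toy `Network` class below is a hypothetical in-memory model, assuming that one-to-many delivery (e.g. multicast or broadcast) costs the server a single transmission regardless of group size.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class Network:
    """Toy in-memory network: each client has an inbox (hypothetical model)."""

    def __init__(self) -> None:
        self.inboxes: Dict[str, List[str]] = defaultdict(list)
        self.transmissions = 0

    def unicast(self, client: str, payload: str) -> None:
        # One-to-one: one transmission reaches exactly one client.
        self.inboxes[client].append(payload)
        self.transmissions += 1

    def one_to_many(self, clients: Iterable[str], payload: str) -> None:
        # One-to-many: in this model a single transmission by the server
        # is delivered to every client in the group.
        for client in clients:
            self.inboxes[client].append(payload)
        self.transmissions += 1


if __name__ == "__main__":
    net = Network()
    for c in ["a", "b", "c"]:
        net.unicast(c, "report")
    print(net.transmissions)  # 3
    net.one_to_many(["a", "b", "c"], "report")
    print(net.transmissions)  # 4
```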
Promises of DDBs
Transparency
• Separation of the higher-level semantics of a system from lower-level implementation issues
• Users have no idea of the distribution
Figure: User View vs. System View
Example: Transparent Access
The user poses the following query as if EMP, ASG and PAY were stored in one place:
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
Figure: sites Tokyo, Boston, Paris, New York and Montreal connected by a communication network; Boston, Paris, New York and Montreal store their own employees, projects and assignments, and some data are also kept at other sites (e.g. Paris projects, Boston projects, New York projects with budget > 200000)
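The effect of transparency can be sketched with Python's built-in `sqlite3`: the user's query text is run unchanged, while the system rebuilds each global relation from site fragments. The site data, the simplified schemas EMP(ENO, ENAME, TITLE), ASG(ENO, PNO, DUR), PAY(TITLE, SAL) and the helper `run_transparently` are all hypothetical; shipping every fragment to one place is the most naive strategy and is shown only for clarity.

```python
import sqlite3
from typing import List

# Hypothetical fragments: each site stores only its own employees and assignments.
SITES = {
    "Paris": {"EMP": [("E1", "J. Doe", "Elect. Eng.")], "ASG": [("E1", "P1", 24)]},
    "Boston": {"EMP": [("E2", "M. Smith", "Syst. Anal.")], "ASG": [("E2", "P2", 6)]},
}
# PAY is assumed to be small and available to the coordinator.
PAY = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000)]

USER_QUERY = """
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
  AND EMP.ENO = ASG.ENO
  AND PAY.TITLE = EMP.TITLE
"""


def run_transparently(query: str) -> List[tuple]:
    """Naive sketch: ship every fragment to one coordinator, rebuild the
    global relations as unions of fragments, then run the user's query as-is."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT)")
    db.execute("CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER)")
    db.execute("CREATE TABLE PAY (TITLE TEXT, SAL INTEGER)")
    for fragments in SITES.values():
        db.executemany("INSERT INTO EMP VALUES (?, ?, ?)", fragments["EMP"])
        db.executemany("INSERT INTO ASG VALUES (?, ?, ?)", fragments["ASG"])
    db.executemany("INSERT INTO PAY VALUES (?, ?)", PAY)
    rows = db.execute(query).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    print(run_transparently(USER_QUERY))  # [('J. Doe', 40000)]
```

The query never mentions Paris or Boston; where the rows live is the system's concern, which is exactly what the user view hides.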
Figure: Distributed Database, User View (users see a single database)
Figure: Distributed DBMS, Reality (several sites, each running DBMS software that serves user queries and applications, linked through a communication subsystem)
Types of Transparency
Data independence
Network transparency (or distribution
transparency)
Replication transparency
Fragmentation transparency
Data Independence
Two types: logical data independence and physical data independence.
A transparent system hides the implementation details from its users.
Network Transparency
The user should not only be free from network management activities but should also be unaware of the very existence of the network.
Two forms: location transparency and naming transparency
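Location and naming transparency are usually supported by a directory that maps a global, location-independent name to physical locations. The dictionary `DIRECTORY`, the site host names and the function `resolve` below are hypothetical, a minimal sketch of such a lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical global directory: global relation name -> (site, local name) pairs.
DIRECTORY: Dict[str, List[Tuple[str, str]]] = {
    "EMP": [("paris.example", "emp_paris"), ("boston.example", "emp_boston")],
    "PROJ": [("montreal.example", "proj")],
}


def resolve(global_name: str) -> List[Tuple[str, str]]:
    """Map the global name a user writes in a query to the physical
    (site, local name) pairs; the user never sees these pairs."""
    try:
        return DIRECTORY[global_name]
    except KeyError:
        raise LookupError(f"unknown global relation: {global_name}") from None


if __name__ == "__main__":
    print(resolve("PROJ"))  # [('montreal.example', 'proj')]
```

If a relation moves, only the directory entry changes; queries written against `EMP` or `PROJ` stay the same.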
Replication Transparency
Replica: the same data stored at multiple sites
Advantages:
• Access from a remote site can be served by a local copy
• Data remain available after a failure at one site
Users have no idea of replication
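One simple way to keep replication invisible is read-one/write-all (ROWA) replica control, a standard technique swapped in here for illustration rather than taken from these slides. The class `ReplicatedItem` is hypothetical: a read uses any live copy, preferring the local one, while a write must reach every copy.

```python
from typing import Dict, List


class ReplicatedItem:
    """Sketch of read-one/write-all replica control (hypothetical class).
    The user reads and writes one logical item; the system maps that to
    physical copies at several sites."""

    def __init__(self, sites: List[str], value: int) -> None:
        self.copies: Dict[str, int] = {s: value for s in sites}
        self.up: Dict[str, bool] = {s: True for s in sites}

    def read(self, preferred: str) -> int:
        # Read one: use the preferred (local) copy if its site is up,
        # otherwise fall back to any live replica.
        if self.up.get(preferred, False):
            return self.copies[preferred]
        for site, alive in self.up.items():
            if alive:
                return self.copies[site]
        raise RuntimeError("no replica is available")

    def write(self, value: int) -> None:
        # Write all: the update must be reflected on every replica, so
        # strict ROWA refuses the write if any site is down.
        if not all(self.up.values()):
            raise RuntimeError("write-all requires every replica to be up")
        for site in self.copies:
            self.copies[site] = value


if __name__ == "__main__":
    item = ReplicatedItem(["Paris", "Boston"], 10)
    item.write(20)
    item.up["Paris"] = False
    print(item.read("Paris"))  # 20, served by the Boston copy
```

The example also shows the trade-off: reads survive a site failure, but strict write-all blocks updates until the failed site returns.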
Fragmentation Transparency
Fragmentation: splitting data into multiple parts (fragments)
Alternative to replication
Users have no idea of fragmentation
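Horizontal fragmentation, one common kind of fragmentation, splits a relation's rows by predicates and rebuilds it by union. The functions and the PROJ rows below are hypothetical, and the sketch assumes the predicates are disjoint and complete, so each row lands in exactly one fragment.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def fragment_horizontally(
    rows: List[Row], predicates: Dict[str, Callable[[Row], bool]]
) -> Dict[str, List[Row]]:
    """Split a relation into horizontal fragments, one per predicate.
    Assumes the predicates are disjoint and complete."""
    fragments: Dict[str, List[Row]] = {name: [] for name in predicates}
    for row in rows:
        for name, predicate in predicates.items():
            if predicate(row):
                fragments[name].append(row)
                break
    return fragments


def reconstruct(fragments: Dict[str, List[Row]]) -> List[Row]:
    """Horizontal fragments are recombined by union."""
    return [row for fragment in fragments.values() for row in fragment]


if __name__ == "__main__":
    proj = [{"PNO": "P1", "BUDGET": 150000}, {"PNO": "P2", "BUDGET": 250000}]
    frags = fragment_horizontally(proj, {
        "small": lambda r: r["BUDGET"] <= 200000,
        "large": lambda r: r["BUDGET"] > 200000,
    })
    print([r["PNO"] for r in frags["large"]])  # ['P2']
```

Under fragmentation transparency a user would query PROJ directly; the system, not the user, decides which fragments to touch.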
Responsibility of Transparency
Transparency is desirable, but there is a trade-off between the level of transparency and the difficulty/cost of providing it.
• The language/compiler: provides a uniform method of manipulating data and hides connectivity details.
• Operating system: already provides transparency in the form of device drivers. It can also provide network transparency, although not all operating systems provide it completely.
• The third layer is the DBMS, which uses features of the operating system, in particular for network transparency.
In practice, we get a combination of all three.
Layers of Transparency (outermost to innermost)
Compiler: Language transparency
DBMS: Replication/Fragmentation transparency
OS: Network transparency
DB approach: Data independence
Data
Reliability in DDBS
• Reliability through distributed transactions: avoids a single point of failure
• Concurrency issues: transactions involving multiple records
• Failure recovery: involving multiple sites
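A distributed transaction must commit at all sites or at none. The usual protocol for this is two-phase commit (2PC), which these slides do not name; the sketch below is a simplified, in-memory version with hypothetical classes, omitting the logging, timeouts and recovery a real implementation needs.

```python
from typing import List


class Participant:
    """A site taking part in a distributed transaction (hypothetical class)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: vote YES only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator side of simplified 2PC: commit only if every site votes YES."""
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision


if __name__ == "__main__":
    sites = [Participant("Paris"), Participant("Boston", can_commit=False)]
    print(two_phase_commit(sites))    # False
    print([p.state for p in sites])   # ['aborted', 'aborted']
```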
Performance Improvement
Through Data Localization
• Each site handles a portion of the data, so contention for CPU and I/O is relatively low.
• Reduces remote access delays: no matter how fast networks are, latency delays remain and may be unacceptable in certain cases.
• Inter-query and intra-query parallelism
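Intra-query parallelism means one query's work is split across fragments that are processed at the same time. In this sketch `parallel_select` is a hypothetical helper and threads stand in for sites; because of Python's global interpreter lock the threads show the structure, not a real speed-up, which comes from separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, int]


def parallel_select(fragments: List[List[Row]], predicate: Callable[[Row], bool]) -> List[Row]:
    """Run the same selection on every fragment concurrently (one worker
    per fragment, standing in for one site) and union the partial results."""

    def scan(fragment: List[Row]) -> List[Row]:
        return [row for row in fragment if predicate(row)]

    with ThreadPoolExecutor(max_workers=max(1, len(fragments))) as pool:
        partials = list(pool.map(scan, fragments))  # map keeps fragment order
    return [row for part in partials for row in part]


if __name__ == "__main__":
    asg_fragments = [[{"DUR": 24}, {"DUR": 6}], [{"DUR": 13}]]
    print(parallel_select(asg_fragments, lambda r: r["DUR"] > 12))
    # [{'DUR': 24}, {'DUR': 13}]
```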
System Expansion
• Easier to accommodate increasing database sizes
• Emergence of microprocessor and workstation technologies
Demise of Grosch's law
Client-server model of computing
• Data communication cost vs. telecommunication cost
Complicating Factors
In case of replication:
• Choose the appropriate copy
• Updates must be reflected on all replicas
Failure recovery
Complicating Factors
• Complexity
• Cost: hardware and also the duplication of manpower
• Distribution of control may cause problems if not handled cautiously
Design Issues
Database Design
• How to distribute the database
• Replicated and non-replicated database distribution
• A related problem is directory management
Query Processing
• Convert user transactions to data manipulation instructions
• Optimization problem
• min{cost = data transmission + local processing}
Concurrency Control
• Synchronization of concurrent accesses
• Consistency and isolation of transactions' effects
• Deadlock management
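The optimization objective min{cost = data transmission + local processing} can be made concrete by scoring candidate execution strategies and picking the cheapest. The unit costs, strategy names and sizes below are hypothetical placeholders; real optimizers estimate these quantities from statistics and calibrate the unit costs per network and site.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Strategy:
    name: str
    bytes_shipped: int     # data transmitted between sites
    tuples_processed: int  # local processing work summed over all sites


# Hypothetical unit costs.
COST_PER_BYTE = 0.001
COST_PER_TUPLE = 0.01


def total_cost(s: Strategy) -> float:
    # cost = data transmission + local processing
    return s.bytes_shipped * COST_PER_BYTE + s.tuples_processed * COST_PER_TUPLE


def choose(strategies: List[Strategy]) -> Strategy:
    """Pick the execution strategy with minimum total cost."""
    return min(strategies, key=total_cost)


if __name__ == "__main__":
    candidates = [
        Strategy("ship EMP to ASG site", bytes_shipped=400_000, tuples_processed=10_000),
        Strategy("ship ASG to EMP site", bytes_shipped=100_000, tuples_processed=10_000),
    ]
    print(choose(candidates).name)  # ship ASG to EMP site
```

With equal local work, the strategy that ships less data wins; on a fast local network the processing term could dominate instead.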
Design Issues
OS Support
• Operating system with proper support for database operations
Heterogeneity
Relationship among them
Relationship Between Issues
Figure: directory management, query processing, distribution design, reliability, concurrency control and deadlock management are all interrelated
Architecture
Defines the structure of the system:
• components identified
• functions of each component defined
• interrelationships and interactions between components defined
ANSI/SPARC Architecture
Figure: users see external views (external schema); these map to the conceptual view (conceptual schema), which maps to the internal view (internal schema)
Generic DBMS Architecture
DBMS Implementation Alternatives
Dimensions of the Problem
• Distribution
Whether the components of the system are located on the same machine or not
• Heterogeneity
Various levels (hardware, communications, operating system)
DBMS heterogeneity is the important one: data model, query language, transaction management algorithms
• Autonomy
Not well understood and most troublesome
Various versions:
Design autonomy: ability of a component DBMS to decide on issues related to its own design.
Communication autonomy: ability of a component DBMS to decide whether and how to communicate with other DBMSs.
Execution autonomy: ability of a component DBMS to execute local operations in any manner it wants to.
Client/Server Architecture
Advantages of Client-Server Architectures
• More efficient division of labor
• Horizontal and vertical scaling of resources
• Better price/performance on client machines
• Ability to use familiar tools on client machines
• Client access to remote data (via standards)
• Full DBMS functionality provided to client workstations
• Overall better system price/performance
Database Server
Distributed Database Servers
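Distributed DBMS Architecture
External schemas: ES1, ES2, ..., ESn
Global conceptual schema: GCS
Local conceptual schemas: LCS1, LCS2, ..., LCSn
Local internal schemas: LIS1, LIS2, ..., LISn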
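The schema levels can be imitated with SQL views in Python's built-in `sqlite3`: local tables play the local conceptual schemas, a union view plays the global conceptual schema, and a projection view plays one external schema. Everything lives in one in-memory database here, an assumption made only to keep the sketch self-contained; in a real DDBMS the local tables sit at different sites, and the local internal schemas correspond to each site's storage structures, which SQLite manages internally. Table and view names are hypothetical.

```python
import sqlite3


def build_schemas() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Local conceptual schemas (LCS1, LCS2): each site's own EMP table.
    db.execute("CREATE TABLE emp_site1 (ENO TEXT, ENAME TEXT, TITLE TEXT)")
    db.execute("CREATE TABLE emp_site2 (ENO TEXT, ENAME TEXT, TITLE TEXT)")
    # Global conceptual schema (GCS): EMP is the union of the local relations.
    db.execute(
        "CREATE VIEW EMP AS "
        "SELECT * FROM emp_site1 UNION ALL SELECT * FROM emp_site2"
    )
    # External schema (ES1): one user group sees only names and titles.
    db.execute("CREATE VIEW ES1_STAFF AS SELECT ENAME, TITLE FROM EMP")
    return db


if __name__ == "__main__":
    db = build_schemas()
    db.execute("INSERT INTO emp_site1 VALUES ('E1', 'J. Doe', 'Elect. Eng.')")
    db.execute("INSERT INTO emp_site2 VALUES ('E2', 'M. Smith', 'Syst. Anal.')")
    print(db.execute("SELECT * FROM ES1_STAFF").fetchall())
```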

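Peer-to-Peer Component Architecture
USER PROCESSOR (uses the external schema, the global conceptual schema and the global directory/dictionary GD/D):
• User Interface Handler
• Semantic Data Controller
• Global Query Optimizer
• Global Execution Monitor
DATA PROCESSOR (uses the local conceptual schema, the system log and the local internal schema):
• Local Query Processor
• Local Recovery Manager
• Runtime Support Processor
User requests enter through the user interface handler, system responses are returned to the user, and the data processor accesses the database.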
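Datalogical Multi-DBMS Architecture
Global external schemas: GES1, GES2, ..., GESn
Global conceptual schema: GCS
Local external schemas: LES11 … LES1n, …, LESn1 … LESnm
Local conceptual schemas: LCS1, LCS2, …, LCSn
Local internal schemas: LIS1, LIS2, …, LISn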

MDBS Components & Execution
Figure: a global user request goes to the Multi-DBMS layer, which issues global subrequests to DBMS1, DBMS2 and DBMS3; local user requests go directly to the component DBMSs
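The execution flow in the figure can be sketched as a layer that splits a global request into one subrequest per component DBMS and merges the answers, while local users keep calling their own DBMS directly. `make_dbms`, `MultiDBMSLayer` and the CUSTOMER relation are hypothetical; real component DBMSs would be separate, autonomous systems reached through their own interfaces.

```python
from typing import Callable, Dict, List

# A component DBMS is modeled as a function: relation name -> rows.
LocalDBMS = Callable[[str], List[tuple]]


def make_dbms(rows: Dict[str, List[tuple]]) -> LocalDBMS:
    # Each component answers requests about the relations it stores.
    return lambda relation: rows.get(relation, [])


class MultiDBMSLayer:
    """Splits a global request into global subrequests, sends one to each
    component DBMS, and merges the answers (hypothetical class)."""

    def __init__(self, components: Dict[str, LocalDBMS]) -> None:
        self.components = components

    def global_request(self, relation: str) -> List[tuple]:
        results: List[tuple] = []
        for dbms in self.components.values():
            results.extend(dbms(relation))  # one global subrequest per component
        return results


if __name__ == "__main__":
    dbms1 = make_dbms({"CUSTOMER": [("c1",)]})
    dbms2 = make_dbms({"CUSTOMER": [("c2",)]})
    dbms3 = make_dbms({})
    layer = MultiDBMSLayer({"DBMS1": dbms1, "DBMS2": dbms2, "DBMS3": dbms3})
    print(layer.global_request("CUSTOMER"))  # [('c1',), ('c2',)]
    print(dbms1("CUSTOMER"))                 # local request bypassing the layer
```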

Mediator/Wrapper Architecture
Assignment # 1
Distributed Database Architecture Case Study
Discuss the organizational setup (distribution of working units), architecture, and component model of any organization that uses a distributed database system.
Note: Every student should discuss a different case study/organization.
Date: //2019
Submission Date: //2019
Assignment # 2
SQL to Relational Algebra transformation
Date:
Submission Date:
