Management Systems
CS-600
Rahil But
Lecture - 01
References
Principles of Distributed Database Systems (3rd
Edition) by M. T. Özsu and P. Valduriez
Prerequisites: Database
Management Systems, Computer
Networks
History
Traditional File Processing System: the
very first form of business data processing
Redundancy of data
Problems in maintenance
Library Examination Registration
Duplication of Data
Vulnerable to Inconsistency
History continues
Database Approach
[Figure: Program 1, Program 2, Program 3, ... all access one database through a shared data description and data manipulation layer]
[Figure: Database technology (integration) combined with computer networks (distribution) gives distributed database systems (integration)]
integration ≠ centralization
Distributed Computing System
(DCS)
Distributed system software enables
computers to coordinate their activities and share resources
What is being distributed?
Processing logic
Functions
Data
Control
All are relevant and important here
Distributed Database:
A collection of logically interrelated
databases that are spread physically
across multiple locations connected by a
data communications link.
Distributed Database Management
System:
A distributed database management
system (D–DBMS) is the software that
manages the DDB and provides an
access mechanism that makes this
distribution transparent to the users
Main Characteristics
Where to apply
Two major reasons make an application a
candidate for a DDBS:
Large number of users
Operations spread over a large geographical
area
Example Applications
Banking
Air Ticketing
Business at multiple locations
What is not a DDBS?
Distributed Files: A collection of files
stored on different computers of a
network is not a DDBS.
In a DDBS the data are logically related, the
files share a common structure, and all are
accessed via the same interface.
Multiprocessor System: multiple
processors that share some form of
memory
[Figure: Shared everything, tight coupling. Three processor units share one memory and one I/O system]
Shared Everything
Loose Coupling
[Figure: Three computer systems, each with its own CPU, share secondary memory]
Shared Nothing
[Figure: Computer systems connected through a switch]
Client/Server Databases
[Figure: Sites connected by a communication network]
Distributed DBMS Environment
[Figure: Sites 1 to 5 connected by a communication network]
Reasons for DDBS
[Figure: A distributed DBMS with a global schema spans Node 1 to Node n; each node runs its own DBMS (DBMS 1 ... DBMS n) and serves its local users]
Data Delivery Alternatives
In distributed databases, data are
“delivered” from the sites where they
are stored to where the query is posed.
The data delivery alternatives are
characterized along three dimensions:
delivery modes,
frequency,
communication methods
Delivery modes
Pull-only: the transfer of data from servers to
clients is initiated by a client pull.
When a client request is received at a server, the
server responds by locating the requested
information.
Push-only: the transfer of data from servers to
clients is initiated by a server push in the absence
of any specific request from clients.
Hybrid: hybrid mode of data delivery combines
the client-pull and server-push mechanisms.
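The difference between the pull and push modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Server class, its method names, and the "rate" item are hypothetical and do not come from the course text.

```python
from collections import defaultdict

class Server:
    """Toy data server supporting client pull and server push (hypothetical)."""

    def __init__(self):
        self.data = {}                        # item name -> current value
        self.subscribers = defaultdict(list)  # item name -> client callbacks

    def pull(self, key):
        # Pull: the transfer is initiated by an explicit client request.
        return self.data.get(key)

    def subscribe(self, key, callback):
        # A client registers interest once; later transfers need no request.
        self.subscribers[key].append(callback)

    def update(self, key, value):
        # Push: the server initiates the transfer when the data change.
        self.data[key] = value
        for callback in self.subscribers[key]:
            callback(key, value)

server = Server()
inbox = []                                    # what the server pushed to us
server.subscribe("rate", lambda k, v: inbox.append((k, v)))
server.update("rate", 1.25)                   # pushed to the subscriber
pulled = server.pull("rate")                  # fetched on demand
print(inbox, pulled)
```

A hybrid system would simply expose both paths, as this toy server does: clients subscribe to the items they care about and pull everything else.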
Frequency
Periodic: data are sent from the server to clients at
regular intervals.
Both pull and push can be performed in periodic
fashion.
Conditional: data are sent from servers whenever
certain conditions installed by clients in their
profiles are satisfied.
Mostly used in hybrid or push-only delivery
systems.
Ad-hoc or irregular: data are sent at irregular times,
mostly in pure pull-based systems.
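As a rough sketch of the periodic and conditional frequencies, the Python below computes the firing times of a periodic delivery and filters a published value through client profiles. The class, the function, and the price thresholds are assumptions made for illustration only.

```python
def periodic_ticks(interval, horizon):
    """Times at which a periodic delivery fires within [0, horizon]."""
    return list(range(interval, horizon + 1, interval))

class ConditionalServer:
    """Sends a value only to clients whose profile condition holds."""

    def __init__(self):
        self.profiles = {}   # client id -> predicate over the new value

    def install_profile(self, client, predicate):
        # Clients install their conditions ahead of time.
        self.profiles[client] = predicate

    def publish(self, value):
        # Return the clients that would receive this value.
        return [c for c, pred in self.profiles.items() if pred(value)]

server = ConditionalServer()
server.install_profile("alice", lambda price: price > 100)
server.install_profile("bob", lambda price: price < 50)

ticks = periodic_ticks(interval=10, horizon=35)
high = server.publish(120)   # satisfies only alice's condition
mid = server.publish(75)     # satisfies nobody's condition
print(ticks, high, mid)
```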
Communication Methods
Determine the various ways in which servers &
clients communicate for delivering information to
clients.
Unicast: the communication from a server to a
client is one-to-one
One-to-many: the server sends data to a number of
clients.
Promises of DDBs
Transparency
• Separation of the higher-level semantics of a system from lower-level implementation
issues
• Users have no idea of the distribution
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
[Figure: Tokyo, Boston, Paris, Montreal and New York connected by a communication network.
Paris: Paris projects, Paris employees, Paris assignments, Boston employees
Boston: Boston projects, Boston employees, Boston assignments
Montreal: Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
New York: Boston projects, New York employees, New York projects, New York assignments]
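To see what transparency means for the user, the query can be run unchanged against a schema whose EMP relation is secretly split into pieces. The sketch below uses Python's sqlite3 module and simulates two sites as two tables in one in-memory database. The fragment names EMP_PARIS and EMP_BOSTON, the column layout, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fragments; in a real DDBS they would live at different sites.
    CREATE TABLE EMP_PARIS  (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE EMP_BOSTON (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER);
    CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
    -- The global relation EMP hides where the rows are stored.
    CREATE VIEW EMP AS
        SELECT * FROM EMP_PARIS UNION ALL SELECT * FROM EMP_BOSTON;
    INSERT INTO EMP_PARIS  VALUES ('E1', 'Dupont', 'Engineer');
    INSERT INTO EMP_BOSTON VALUES ('E2', 'Smith', 'Analyst');
    INSERT INTO ASG VALUES ('E1', 'P1', 24), ('E2', 'P2', 6);
    INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
""")

# The user's query mentions only the global relations EMP, ASG and PAY.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()
print(rows)
```

A view inside a single database is only a stand-in here. A real distributed DBMS would also have to locate the fragments and move data between sites.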
Distributed Database - User
View
Distributed Database
Distributed DBMS - Reality
[Figure: User queries and user applications reach the DBMS software at several sites; the DBMS software at each site is linked through a communication subsystem]
Types of Transparency
Data independence
Network transparency (or distribution
transparency)
Replication transparency
Fragmentation transparency
Data Independence
Two types, Logical Data Independence
and Physical Data Independence
A transparent system hides the
implementation details from its users
Network Transparency
Advantages:
Access from remote site will be treated as
local
Failure at one site
Users have no idea of replication
Fragmentation Transparency
Fragmentation: split data into multiple
parts (fragments)
Alternative to replication
Users have no idea of fragmentation
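A minimal Python sketch of horizontal fragmentation, assuming rows are split by the value of one attribute and the global relation is rebuilt as the union of the fragments. The function names and the sample employee rows are hypothetical.

```python
def fragment(rows, key):
    """Horizontally fragment rows by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def reconstruct(fragments):
    """Rebuild the global relation as the union of its fragments."""
    return [row for part in fragments.values() for row in part]

employees = [
    {"eno": "E1", "ename": "Dupont", "city": "Paris"},
    {"eno": "E2", "ename": "Smith", "city": "Boston"},
    {"eno": "E3", "ename": "Martin", "city": "Paris"},
]
by_site = fragment(employees, "city")   # one fragment per site
whole = reconstruct(by_site)            # what the user is allowed to see
print(sorted(by_site), len(whole))
```

Fragmentation transparency means the user works only with the reconstructed relation and never names an individual fragment.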
Responsibility of Transparency
Transparency is desirable, but there is a
trade-off between the level of
transparency and its difficulty and cost
The language/compiler: provides a
uniform method of manipulating data and
hides connectivity details
Operating system: already provides transparency in
the form of device drivers
The operating system can also provide network transparency,
although not every operating system provides it completely.
The third layer is the DBMS, which uses features of the
operating system, in particular for
network transparency.
In practice, we get a combination of all
three.
Layers of Transparency
Compiler Language transparency
OS Network Transparency
Data
Reliability in DDBS
Reliability through distributed
transactions: avoids a single point of
failure
Concurrency Issues: Transaction
involving multiple records
Failure Recovery: Involving multiple
sites
Performance Improvement
Through Data Localization
Each site handles only a portion of the data, so
contention for CPU and I/O is
lower
Reduces remote access delays; no
matter how fast networks are, latency
remains and may be
unacceptable in certain cases
Inter-query and intra-query parallelism
System Expansion
Easier to accommodate increasing database sizes
In case of replication:
Choose the appropriate copy
Reflect the impact of each update on all
replicas
Failure Recovery
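One simple way to handle the replication points above is a read-one/write-all scheme. The slides do not name a protocol, so this choice is an assumption, and the class and site names in the Python sketch are hypothetical.

```python
class ReplicatedItem:
    """Read-one/write-all handling of a single replicated data item."""

    def __init__(self, sites):
        self.copies = {site: None for site in sites}  # site -> stored value
        self.up = set(sites)                          # sites currently reachable

    def read(self, preferred):
        # Choose the appropriate copy: the preferred (e.g. local) site if it
        # is up, otherwise any other reachable replica.
        site = preferred if preferred in self.up else min(self.up)
        return site, self.copies[site]

    def write(self, value):
        # An update must be reflected on all replicas, so refuse it while
        # some site is down rather than let the copies diverge.
        if self.up != set(self.copies):
            raise RuntimeError("write-all needs every replica available")
        for site in self.copies:
            self.copies[site] = value

item = ReplicatedItem(["boston", "paris", "tokyo"])
item.write(42)
item.up.discard("paris")                  # simulate a failure at one site
read_site, read_value = item.read("paris")
print(read_site, read_value)
```

Reads survive the failure of one site, but writes are blocked until it recovers. That is the price of this particular scheme, and other replica control protocols make a different trade-off.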
Complication Factors
Complexity
Cost: hardware and also the
duplication of manpower
Distribution of Control may cause
problems if not used cautiously
Design Issues
Database Design
• How to distribute the database
• Replicated & non-replicated database distribution
• A related problem in directory management
Query Processing
• Convert user transactions to data manipulation instructions
• Optimization problem
• min{cost = data transmission + local processing}
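The cost expression can be made concrete with a toy comparison of two ways to join relations stored at different sites. All sizes and unit costs in this Python sketch are made-up numbers, and plan_cost is a hypothetical helper, not a real optimizer.

```python
def plan_cost(bytes_shipped, rows_processed,
              cost_per_byte=1.0, cost_per_row=0.1):
    """cost = data transmission + local processing (hypothetical unit costs)."""
    return bytes_shipped * cost_per_byte + rows_processed * cost_per_row

# Hypothetical sizes for two relations stored at different sites.
emp = {"rows": 1_000, "bytes": 50_000}
asg = {"rows": 10_000, "bytes": 200_000}

# Candidate plans for joining EMP and ASG: ship one relation to the other site.
plans = {
    "ship EMP to ASG site": plan_cost(emp["bytes"], emp["rows"] + asg["rows"]),
    "ship ASG to EMP site": plan_cost(asg["bytes"], emp["rows"] + asg["rows"]),
}
best = min(plans, key=plans.get)          # the plan with minimum total cost
print(best)
```

With these numbers the local processing term is the same for both plans, so the optimizer's choice is driven entirely by how much data crosses the network.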
Concurrency Control
• Synchronization of concurrent accesses
• Consistency and isolation of transactions' effects
• Deadlock management
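Deadlock management is often explained with a wait-for graph, in which an edge from T1 to T2 means that transaction T1 waits for a lock held by T2. The slides do not prescribe a technique, so the cycle check below is an assumed illustration in Python with hypothetical transaction names.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on stack, finished
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:             # back edge: a cycle of waits
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in wait_for)

# T1 waits for T2 at one site while T2 waits for T1 at another site.
deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T1"]})
safe = has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": []})
print(deadlocked, safe)
```

In a distributed setting the hard part is assembling this graph, because each site sees only its own local waits.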
Design Issues
OS Support
• Operating system with proper support for database
operations
Heterogeneity
Relationship among them
Relationship Between Issues
[Figure: Relationships among directory management, query processing, distribution design, reliability, concurrency control, and deadlock management]
Architecture
Defines the structure of the system
components identified
functions of each component defined
interrelationships and interactions between components defined
ANSI/SPARC Architecture
[Figure: Users, conceptual view, conceptual schema; a global conceptual schema (GCS) over local conceptual schemas LCS1, LCS2, ..., LCSn]
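[Figure: Components of a distributed DBMS. User side: user interface handler, semantic data controller, global query optimizer, global execution monitor. Data side: local query processor, local recovery manager, runtime support processor. System responses are returned to the user]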
Datalogical Multi-DBMS
Architecture
[Figure: Global external schemas GES1, GES2, ..., GESn. A global user request enters the multi-DBMS layer, which issues global subrequests; local user requests are shown alongside the multi-DBMS layer]
Date: //2019
Submission Date: //2019
Assignment # 2
SQL to Relational Algebra transformation
Date:
Submission Date: