
Introduction

A distributed system is a collection of autonomous computers that appear to the users of the system as a single computer.
Introduction
Computer architectures consisting of multiple interconnected processors are basically of two types:

1. Tightly coupled systems:
* Mainly with shared memory

2. Loosely coupled systems:
* Each processor has its own local memory.
* All physical communication between the processors is done by passing messages across the network that interconnects the processors.

Tightly coupled systems are called parallel processing systems, and loosely coupled systems are called distributed computing systems.
Network Operating System
• It provides an environment where users are aware of the multiplicity of machines.

• Users can access remote resources by:
* Logging into the remote machine, OR
* Transferring data from the remote machine to their own machine

• Users should know where the required files and directories are located and mount them.

• E.g. NFS from Sun Microsystems
Distributed Operating System
Runs on a cluster of machines with no shared memory

Users get the feel of a single processor - virtual uniprocessor

Transparency is the driving force.

Requires
•A single global IPC mechanism
•A global protection mechanism
•Identical process management and system calls at all nodes
•Common file system at all nodes

Distributed Operating System
Advantages

Communication and resource sharing possible


Economics – price-performance ratio
Reliability, scalability
Potential for incremental growth

Disadvantages

Network connectivity essential


Security and privacy
Complexity

Transparency: A Main Design Issue for DOS
Transparency – Description
Access – Hide differences in data representation and how a resource is accessed
Location – Hide where a resource is located
Migration – Hide that a resource may move to another location
Relocation – Hide that a resource may be moved to another location while in use
Replication – Hide that a resource is replicated
Concurrency – Hide that a resource may be shared by several competitive users
Failure – Hide the failure and recovery of a resource
Persistence – Hide whether a (software) resource is in memory or on disk
Distributed Operating System

Divided Mainly in Two Parts:

1. Process Management

2. File Management OR Distributed File Systems

File System
Two main purposes:

1. Permanent storage of information

2. Sharing of information

A file system performs file management activities such as:

 Organization
 Storage
 Retrieval
 Naming
 Sharing
 Protection
Distributed File System Supports…
• Complexity increases, as a distributed file system is physically dispersed.

• Additionally, it supports the following:

1. Remote Information Sharing:
A file can be transparently accessed by processes of any node of the system, irrespective of the file's location.

2. User Mobility:
Users should not be forced to work on a particular node; they should have the flexibility to work on any node at different times.
Distributed File System Supports…
Cont…

3. Availability:
 Files should be available for use even in the event of temporary failure of one or more nodes of the system.
 For this, multiple copies of the file should be kept at different nodes. Each copy is called a "replica".
 Both the existence of multiple copies and their locations are hidden from the clients.

4. Diskless Workstations:
Allows the use of diskless workstations in a system.
Adv.: Disk drives are expensive, emit heat, and are noisy.
Services provided by DFS
Typically three types of services:

1. Storage Service
2. True File Service
3. Name Service

1. Storage Service:
* Used for the storage of files in the file system
* Provides a logical view of the storage system by providing operations for storing and retrieval
* Also called a disk service, as magnetic disks are used for storage
* Also called a block service, as disk space is allocated in units of fixed-size blocks
Services provided by DFS
Cont…

2. True File Service

* Concerned with operations on individual files, e.g. accessing and modifying data in files, creation and deletion of files

* Design issues include:
file access mechanism, file-sharing semantics, file-caching mechanism, multiple-copy update protocol, access control mechanism, concurrency control mechanism
Services provided by DFS
Cont..

3. Name Service

• Provides a mapping between text names for files and references to files (file IDs)
• Mostly implemented through directories, so also known as the directory service
• Responsible for directory-related services such as creation and deletion of directories, adding a file to or deleting a file from a directory, renaming a file, and moving a file from one directory to another
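A minimal sketch of such a name service, keeping an in-memory table per directory that maps text names to file IDs. All class and method names here are illustrative, not taken from any real DFS:

```python
# Toy name service: per-directory tables mapping text names -> file IDs.
# Names and structure are illustrative only.

class NameService:
    def __init__(self):
        self._dirs = {"/": {}}          # directory path -> {name: file_id}
        self._next_id = 0

    def create_dir(self, path):
        self._dirs[path] = {}

    def add_file(self, dir_path, name):
        """Bind a text name to a freshly allocated file ID."""
        fid = self._next_id
        self._next_id += 1
        self._dirs[dir_path][name] = fid
        return fid

    def lookup(self, dir_path, name):
        return self._dirs[dir_path][name]

    def rename(self, dir_path, old, new):
        self._dirs[dir_path][new] = self._dirs[dir_path].pop(old)

    def move(self, src_dir, name, dst_dir):
        self._dirs[dst_dir][name] = self._dirs[src_dir].pop(name)

ns = NameService()
ns.create_dir("/home")
fid = ns.add_file("/home", "notes.txt")
ns.rename("/home", "notes.txt", "todo.txt")
assert ns.lookup("/home", "todo.txt") == fid   # ID survives the rename
```

Note that renaming and moving change only the name-to-ID binding; the file ID itself (and hence the stored data) is untouched, which is what separates the name service from the storage service.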
Desirable Features of the DFS
1. Transparency:
Mainly of four types:

Structure Transparency –
A DFS normally uses multiple file servers, but the multiplicity of file servers should be transparent to the clients; they should not know the number or locations of the file servers and storage devices. Ideally, a DFS should look to its clients like a conventional file system offered by a centralized, time-sharing operating system.

Access Transparency –
Both remote and local files should be accessible in the same way. The file system interface should not distinguish between them.

Naming Transparency –
The name of a file should not give any hint as to where the file is located.

Replication Transparency –
If a file is replicated, the locations of the replicas should be hidden from the clients.
Desirable Features of the DFS
2. User Mobility:

Users should not be forced to work on a particular node.
The performance characteristics of the file system should not discourage users from moving; one way to support this is to bring the user's environment (home directory) to whichever node the user logs in at.

3. Performance:
Normally measured as the average amount of time needed to satisfy client requests. It includes CPU processing time, secondary storage access time, and network communication overhead.

4. Simplicity and Ease of Use:
The user interface should be simple, and the number of commands should be small; the commands themselves should be simple.
Desirable Features of the DFS
5. Scalability
• Addition of new nodes or interconnection of two networks should be allowed.
• Such growth should not cause serious disruption of service or significant loss of performance to users.

6. High Availability
• A DFS should continue to function even when partial failures occur due to the failure of one or more components.
• Therefore, a highly available and scalable DFS should have multiple independent file servers controlling multiple independent storage devices.
• Replication of files at multiple servers is needed.
Desirable Features of the DFS
7. High Reliability

The probability of loss of stored data should be minimized.
Backup copies should be generated automatically.

8. Data Integrity

A file is often shared by multiple users. Concurrent access requests should be synchronized by a proper concurrency control mechanism.
Desirable Features of the DFS
9. Security:
Users must be assured of the privacy of their data. Passing rights to access a file should be performed safely.

10. Heterogeneity:
Provides flexibility for users to use different platforms for different applications; one node may be a UNIX workstation and another a Macintosh.

A DFS should allow a variety of workstations to participate in the sharing of files.

Also, different storage media (magnetic, optical, or any other) should be allowed.
File Models
Conceptual models supported:

1. Unstructured and Structured Models

2. Mutable and Immutable Models

1. Unstructured and Structured Models

In the unstructured model, a file is an unstructured sequence of data, and the operating system is not interested in the information stored in the files. Hence, application programs decide the interpretation and meaning of the file.

In the structured model, a file appears to the file server as an ordered sequence of records. Records of different files of the same file system can be of different sizes.
File Models
1. Unstructured and Structured Models (cont.)

Structured files are of two types:
files with indexed records and files with non-indexed records.

i) In files with non-indexed records, a record is accessed by specifying its position within the file (e.g. the 5th record).

ii) In files with indexed records, records have one or more key fields and can be addressed by specifying the values of the key fields. (A B-tree or hash table is a suitable data structure for this.)

Most modern operating systems use the unstructured file model because of the flexibility it gives to application programs.
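The two access styles can be contrasted in a small sketch. The record layout and field names below are made up for illustration, and a plain dict stands in for the B-tree or hash table mentioned above:

```python
# Structured file as an ordered sequence of records (illustrative data).
records = [
    {"emp_id": 101, "name": "Asha"},
    {"emp_id": 102, "name": "Ravi"},
    {"emp_id": 103, "name": "Meena"},
]

# Non-indexed access: address a record by its position in the file.
assert records[1]["name"] == "Ravi"          # the 2nd record

# Indexed access: build an index on the key field, then address by key.
index = {rec["emp_id"]: rec for rec in records}
assert index[103]["name"] == "Meena"
```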
File Models
2. Mutable and Immutable Files

Classified according to the criterion of file modifiability:

Mutable files: an update to a file overwrites its old contents, and the file is represented as a single stored sequence. Most modern operating systems use this model.

Immutable files: a file cannot be modified once it is created, except to be deleted. Each update creates a new version of the file, and the old version is retained. This model makes sharing, caching, and replication easier.
The Cedar File System uses a parameter 'keep'; its value decides the number of versions to be retained.
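A hedged sketch of an immutable file in the style of the 'keep' parameter: every update appends a new version, and only the most recent `keep` versions are retained. The class and method names are illustrative, not Cedar's actual interface:

```python
# Immutable file model: updates create versions instead of overwriting.
# 'keep' limits how many old versions are retained (Cedar-style idea).

class ImmutableFile:
    def __init__(self, keep=2):
        self.keep = keep
        self.versions = []               # oldest first

    def update(self, content):
        """Create a new version; never overwrite the old one."""
        self.versions.append(content)
        self.versions = self.versions[-self.keep:]   # retain last 'keep'

    def latest(self):
        return self.versions[-1]

f = ImmutableFile(keep=2)
f.update("v1")
f.update("v2")
f.update("v3")
assert f.latest() == "v3"
assert f.versions == ["v2", "v3"]        # "v1" aged out
```

Because a version, once written, never changes, replicas and cached copies of that version can never become stale; this is why the model simplifies sharing, caching, and replication.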
File Accessing Models
 Decides the manner in which a client's request to access a file is serviced.
 Mainly depends on two factors:

1. How remote files are accessed:
* Remote service model
* Data-caching model

2. The unit of data access:
* File-level transfer model
* Block-level transfer model
File Accessing Models
 Remote Service Model
Processing of the client's request is performed at the server's node: the client sends the file access request, the server performs the access, and the server's replies are transferred back across the network as messages.
Disadv.: each remote file access generates network traffic.

 Data-Caching Model
To reduce network traffic, this model copies the data needed to satisfy the client's access request and caches it at the client's node. An algorithm such as LRU is used to manage the cache.

Adv.: decreases network traffic.
Disadv.: a write operation incurs the overhead of modifying the cached data as well as the original data; this is referred to as the cache consistency problem.
Hybrid implementations are also used, e.g. LOCUS and NFS (Network File System) use the remote service model as well as the data-caching model.
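The LRU policy mentioned above can be sketched with Python's `collections.OrderedDict`. This is a generic LRU cache, not any particular DFS's implementation; in a real client, the cached values would be file blocks fetched from the server:

```python
# Minimal LRU cache of the kind a data-caching client might maintain.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # cache miss: fetch from server
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("block1", b"aaa")
cache.put("block2", b"bbb")
cache.get("block1")                        # block1 is now most recent
cache.put("block3", b"ccc")                # evicts block2, the LRU entry
assert cache.get("block2") is None
assert cache.get("block1") == b"aaa"
```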
File Accessing Models
 Unit of Data Transfer

In file systems that use the data-caching mechanism, an important design issue is deciding the unit of data transfer:

1. File-level transfer model
2. Block-level transfer model
3. Byte-level transfer model
4. Record-level transfer model
File Accessing Models
 Unit of Data Transfer

1. File-Level Transfer Model

The whole file is moved when required.

Adv.: simple and more efficient, as fewer accesses to the file server are required; disk access routines can be better optimized; once the file has been transferred, the client is unaffected by subsequent network failures.

Disadv.: sufficient storage space is required on the client's node; not efficient when the file is large; not useful for diskless workstations.

Amoeba uses this approach.
File Accessing Models
 Unit of Data Transfer

2. Block-Level Transfer Model

Data transfer takes place in units of blocks. Also called the page-level transfer model when the block size equals the page size.

Adv.: does not require large storage on the client side; works well when files are large; useful for diskless workstations.
Disadv.: multiple server requests are needed when the whole file must be transferred, which increases network traffic.

LOCUS uses this approach.
File Accessing Models
 Unit of Data Transfer

3. Byte-Level Transfer Model

Data transfer takes place in units of bytes.

Adv.: provides maximum flexibility; does not require large storage on the client side.

Disadv.: cache management is difficult.
File Accessing Models
 Unit of Data Transfer

4. Record-Level Transfer Model

Useful only for the structured file model.
Data is transferred in units of records.
Flexibility comes at the cost of complexity (record sizes may not be fixed).

File sharing and caching semantics follow.
File Sharing Semantics

 For a shared file, the challenge is to define when a modification of file data made by one user becomes observable by other users.

 The main schemes are:

1) UNIX semantics
2) Session semantics
3) Immutable shared-file semantics
4) Transaction-like semantics
File Sharing Semantics
 UNIX Semantics

• Every operation on a file sees the effects of all previous write operations performed on that file, i.e. a write to an open file becomes immediately visible to other users who have the same file open.
• This is easily achieved on a single-processor system.
• It is not advisable for a distributed file system. One possibility for implementing it is to DISALLOW CACHING for shared files.
• Even if caching is disallowed, network delays can still prevent writes from becoming immediately visible.
• The result is poor scalability, poor reliability, and poor performance, since caching is not allowed.
• Therefore, DFSs use relaxed schemes rather than UNIX-like semantics. Whenever an immediate effect is required, a mechanism such as a lock should be used.
File Sharing Semantics
 Session Semantics

• A session is the series of file accesses made between open and close operations.
• All changes made during a session are visible only to the client process that opened the session and are invisible to others (remote clients).
• Once the session is closed, the changes become visible to remote processes only in sessions started later. Already-open instances of the file do not reflect these changes.
• If multiple clients modify the file, what should the final image be? This is the challenging question. The semantics suggest storing the images in the order in which the sessions ended (files closed), but because of network delays this order may not be observed correctly. Therefore, the final file image is non-deterministic.
• These semantics fit the file-level transfer model, as a session reads the image of the entire file when it is opened.
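The open/close behaviour above can be sketched as follows. On open, the client takes a private copy of the whole file (file-level transfer); writes go to that copy and are published only on close. All names are illustrative:

```python
# Session semantics sketch: changes are private until close.

class SessionFS:
    def __init__(self):
        self.files = {}                  # server-side master copies

    def open(self, name):
        return Session(self, name, self.files.get(name, ""))

class Session:
    def __init__(self, fs, name, content):
        self.fs, self.name = fs, name
        self.copy = content              # private working copy of the file

    def write(self, data):
        self.copy += data                # invisible to other sessions

    def read(self):
        return self.copy

    def close(self):
        self.fs.files[self.name] = self.copy   # publish on close

fs = SessionFS()
fs.files["a.txt"] = "hello"
s1 = fs.open("a.txt")
s2 = fs.open("a.txt")
s1.write(" world")
assert s2.read() == "hello"              # s1's change not yet visible
s1.close()
assert fs.open("a.txt").read() == "hello world"   # visible to new sessions
assert s2.read() == "hello"              # already-open instance unchanged
```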
File Sharing Semantics

 Immutable Shared-File Semantics

• Used with the immutable file model.
• Changes to a file become visible as a new version of the file.
• When a file is declared immutable, it becomes read-only; files are shared only in read-only mode.
• The least flexible, but also the least complex, of the semantics.
File Sharing Semantics
Transaction-Like Semantics

• A transaction is a mechanism to control concurrent access to shared, mutable data.
• A transaction is a set of operations enclosed between begin_transaction and end_transaction operations.
• It ensures that partial modifications made to the shared data by a transaction are not visible to other concurrently executing transactions until the transaction ends.
• If the transactions were run one after another in some particular order, the final file content would be the same.
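A minimal sketch of the begin/end idea over a shared dictionary: modifications are staged privately between `begin_transaction` and `end_transaction`, so other readers never see a partial update. The class and keys are made up for illustration; a real implementation would also need concurrency control and abort/rollback:

```python
# Transaction-like semantics sketch: stage writes, publish atomically.

class Transaction:
    def __init__(self, shared):
        self.shared = shared
        self.staged = {}

    def begin_transaction(self):
        self.staged = {}

    def write(self, key, value):
        self.staged[key] = value         # private until the transaction ends

    def end_transaction(self):
        self.shared.update(self.staged)  # all changes become visible at once

data = {"balance": 100}
t = Transaction(data)
t.begin_transaction()
t.write("balance", 80)
t.write("log", "debit 20")
assert data["balance"] == 100            # partial changes are invisible
t.end_transaction()
assert data["balance"] == 80
assert data["log"] == "debit 20"
```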
File Caching Schemes
 File caching is used to improve performance.
 With this mechanism, recently used data is retained in memory so that repeated accesses to the same information can be handled without additional disk transfers.
 It is also possible to cache remotely located data on a client node; along with performance, scalability and reliability improve, and network traffic decreases considerably.

 The following issues should be considered in an implementation:

 Granularity of cached data (large vs small)
 Cache size (large vs small)
 Replacement policy
 Cache location
 Modification propagation
 Cache validation
File Caching Schemes

 Three possible Cache Locations

1) Server’s main memory


2) Client’s disk
3) Client’s main memory

File Caching Schemes
Cache Location

1) Server's Main Memory:

 The total cost involved is one disk access + one network access; still a considerable performance gain over no caching.
 Adv.:
 Transparent to clients
 The original file and the cached data are always consistent
 Easy to implement
 Easy to support UNIX-like file-sharing semantics
 Limitations:
 Does not avoid the network access cost
 Does not help with scalability and reliability
File Caching Schemes
Cache Location

2) Client's Disk:

 Eliminates network access, but disk access remains.

 Adv.:
 Contributes to scalability, and reliability increases (though in case of a crash, modifications are lost)
 Largest storage capacity, so the hit ratio can be increased
 The client can continue to work while disconnected from the network

 Limitations:
 Does not work with diskless workstations
 Disk access is not avoided

According to Tanenbaum, this method is not better than the previous one.
File Caching Schemes
Cache Location

3) Client's Main Memory:

Eliminates both network access and disk access.

 Adv.:
 Provides the maximum performance gain
 Permits workstations to be diskless
 Contributes to scalability and reliability

 Limitations:
 Not good for large files
 Reliability is lower than with a cache on the client's disk
File Caching Schemes
Modification Propagation

Two questions must be answered:

• When to propagate a modification made to cached data to the corresponding file server (the cache may become inconsistent if a proper scheme is not adopted)

• How to verify the validity of cached data
File Caching Schemes
Modification Propagation

• Two approaches for propagation:

1) Write-through scheme:
When a cache entry is modified, the new value is immediately sent to the server and the master copy is updated.

2) Delayed-write scheme:
The write-through scheme does not reduce network traffic, so this method was proposed.
File Caching Schemes
Modification Propagation

2) Delayed-Write Scheme
Three approaches:

I) Write on ejection from cache:
Modified data is sent to the server when the cache replacement policy decides to eject it from the client's cache.
II) Periodic write:
Modified data is sent to the server at regular intervals.
III) Write on close:
When the file is closed, the modifications are sent to the server.

Adv. of delayed write:
• Writes complete quickly, and temporary data that is deleted before write-back need never be sent to the server at all
• Accumulated modifications can be sent to the server together, which is more efficient
Disadv.:
Reliability is lower, since unpropagated modifications are lost if the client crashes.
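The two propagation schemes can be contrasted in a short sketch. Here `server` is a plain dict standing in for the master copy, and the write-on-close variant of delayed write is shown; all names are illustrative:

```python
# Write-through vs delayed write (write-on-close variant).

class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data
        self.server[name] = data         # master copy updated immediately

class WriteOnCloseCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data          # only the local cache changes

    def close(self, name):
        self.server[name] = self.cache[name]   # propagate on close

server = {}
wt = WriteThroughCache(server)
wt.write("f", "v1")
assert server["f"] == "v1"               # visible at the server at once

server2 = {}
dw = WriteOnCloseCache(server2)
dw.write("g", "v1")
assert "g" not in server2                # delayed: server not yet updated
dw.close("g")
assert server2["g"] == "v1"
```

The reliability trade-off is visible in the gap between `dw.write` and `dw.close`: if the client crashes in that window, the modification is lost, which write-through avoids at the cost of network traffic on every write.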
File Caching Schemes
• Cache Validation Schemes:

1) Client-initiated approach:
i) Check before every access
ii) Periodic checking
iii) Check on file open

2) Server-initiated approach:
The server keeps track of which clients have the file open, and in which mode.
It does not allow a file to be opened in read and write modes simultaneously.
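A sketch of client-initiated validation using the check-on-open variant: the client compares the modification timestamp of its cached copy with the server's, and refetches only when the copy is stale. The classes and method names are made up for illustration:

```python
# Client-initiated cache validation (check on file open).

class Server:
    def __init__(self):
        self.files = {}                  # name -> (mtime, content)

    def stat(self, name):
        return self.files[name][0]       # cheap validity check

    def fetch(self, name):
        return self.files[name]          # full transfer

class Client:
    def __init__(self, server):
        self.server, self.cache = server, {}
        self.fetches = 0                 # count full transfers, for the demo

    def open(self, name):
        """Validate the cached copy on open; refetch only if stale."""
        mtime = self.server.stat(name)
        if name not in self.cache or self.cache[name][0] != mtime:
            self.cache[name] = self.server.fetch(name)
            self.fetches += 1
        return self.cache[name][1]

srv = Server()
srv.files["a"] = (1, "old")
cl = Client(srv)
assert cl.open("a") == "old" and cl.fetches == 1
assert cl.open("a") == "old" and cl.fetches == 1   # cache still valid
srv.files["a"] = (2, "new")                        # modified at the server
assert cl.open("a") == "new" and cl.fetches == 2   # stale copy refetched
```

Note that even the "valid" path costs one `stat` round trip per open; the server-initiated approach avoids that traffic by having the server notify clients instead, at the cost of the server tracking open files.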
