
Introduction

A distributed system is a collection of autonomous computers that appear to the users of the system as a single computer.
Introduction
Computer architectures consisting of multiple interconnected processors are basically of two types:

1. Tightly coupled systems:
* Mainly with shared memory

2. Loosely coupled systems:
* Each processor has its own local memory.
* All physical communication between the processors is done by passing messages across the network that interconnects the processors.

Tightly coupled systems are called parallel processing systems, and loosely coupled systems are called distributed computing systems.
Network Operating System
• It provides an environment where users are aware of the multiplicity of machines.

• Users can access remote resources by:
* Logging into the remote machine, OR
* Transferring data from the remote machine to their own machine

• Users should know where the required files and directories are located and mount them.

• E.g. NFS from Sun Microsystems
Distributed Operating System
Runs on a cluster of machines with no shared memory

Users get the feel of a single processor - virtual uniprocessor

Transparency is the driving force.

Requires
•A single global IPC mechanism
•A global protection mechanism
•Identical process management and system calls at all nodes
•Common file system at all nodes

Distributed Operating System
Advantages

Communication and resource sharing possible


Economics – price-performance ratio
Reliability, scalability
Potential for incremental growth

Disadvantages

Network connectivity essential


Security and privacy
Complexity

Transparency: A Main Design Issue for DOS
Transparency – Description
Access – Hide differences in data representation and how a resource is accessed
Location – Hide where a resource is located
Migration – Hide that a resource may move to another location
Relocation – Hide that a resource may be moved to another location while in use
Replication – Hide that a resource is replicated
Concurrency – Hide that a resource may be shared by several competitive users
Failure – Hide the failure and recovery of a resource
Persistence – Hide whether a (software) resource is in memory or on disk
Distributed Operating System

Divided Mainly in Two Parts:

1. Process Management

2. File Management OR Distributed File Systems

File System
Two main purposes:

1. Permanent storage of information

2. Sharing of information

A file system performs file management activities such as:

 Organization
 Storage
 Retrieval
 Naming
 Sharing
 Protection
Distributed File System Supports…
• Complexity increases, as a distributed file system is physically dispersed.

• Additionally, it supports the following:

1. Remote Information Sharing:
A file can be transparently accessed by processes of any node of the system, irrespective of the file's location.

2. User Mobility:
Users should not be forced to work on a particular node; they should have the flexibility to work on any node at different times.
Distributed File System Supports…
Cont…

3. Availability:
 Files should be available for use even in the event of temporary failure of one or more nodes of the system.
 For this, multiple copies of the file should be kept at different nodes. Each copy is called a "replica".
 Both the existence of multiple copies and their locations are hidden from the clients.

4. Diskless Workstations:
Allows the use of diskless workstations in a system.
Adv.: Disk drives are expensive, emit heat, and are noisy.
Services provided by DFS
Typically three types of services:

1. Storage Service
2. True File Service
3. Name Service

1. Storage Service:
* Used for the storage of files in the file system
* Provides a logical view of the storage system by providing operations for storing and retrieval
* Also called a disk service, as magnetic disks are used for storage
* Also called a block service, as disk space is allocated in units of fixed-size blocks
Services provided by DFS
Cont…

2. True File Service

* Concerned with operations on individual files, e.g. accessing and modifying data in files, creation and deletion of files

* Design issues include:
file access mechanism, file-sharing semantics, file-caching mechanism, multiple-copy update protocol, access control mechanism, concurrency control mechanism
Services provided by DFS
Cont..

3. Name Service

• Provides a mapping between text names for files and references to files (file IDs)
• Mostly implemented through directories, so also known as the directory service
• Responsible for directory-related services such as creation and deletion of directories, adding a file to or deleting a file from a directory, renaming a file, and moving a file from one directory to another
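A minimal sketch of such a name service, keeping an in-memory table per directory that maps text names to file IDs. All class and method names here are illustrative, not taken from any real DFS:

```python
# Toy name service: per-directory tables mapping text names -> file IDs.
# Names and structure are illustrative only.

class NameService:
    def __init__(self):
        self._dirs = {"/": {}}          # directory path -> {name: file_id}
        self._next_id = 0

    def create_dir(self, path):
        self._dirs[path] = {}

    def add_file(self, dir_path, name):
        """Bind a text name to a freshly allocated file ID."""
        fid = self._next_id
        self._next_id += 1
        self._dirs[dir_path][name] = fid
        return fid

    def lookup(self, dir_path, name):
        return self._dirs[dir_path][name]

    def rename(self, dir_path, old, new):
        self._dirs[dir_path][new] = self._dirs[dir_path].pop(old)

    def move(self, src_dir, name, dst_dir):
        self._dirs[dst_dir][name] = self._dirs[src_dir].pop(name)

ns = NameService()
ns.create_dir("/home")
fid = ns.add_file("/home", "notes.txt")
ns.rename("/home", "notes.txt", "todo.txt")
assert ns.lookup("/home", "todo.txt") == fid   # ID survives the rename
```

Note that renaming and moving change only the name-to-ID binding; the file ID itself (and hence the stored data) is untouched, which is what separates the name service from the storage service.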
Desirable Features of the DFS
1. Transparency:
Mainly of four types:

Structure Transparency –
A DFS normally uses multiple file servers, but the multiplicity of file servers should be transparent to the clients; they should not know the number or locations of the file servers and storage devices. Ideally, a DFS should look to its clients like a conventional file system offered by a centralized, time-sharing operating system.

Access Transparency –
Both remote and local files should be accessible in the same way. The file system interface should not distinguish between them.

Naming Transparency –
The name of a file should not give any hint as to where the file is located.

Replication Transparency –
If a file is replicated, the locations of the replicas should be hidden from the clients.
Desirable Features of the DFS
2. User Mobility:

Users should not be forced to work on a particular node.
The performance characteristics of the file system should not discourage users from moving; one way to support this is to bring the user's environment (home directory) to whichever node the user logs in at.

3. Performance:
Normally measured as the average amount of time needed to satisfy client requests. It includes CPU processing time, secondary storage access time, and network communication overhead.

4. Simplicity and Ease of Use:
The user interface should be simple, and the number of commands should be small; the commands themselves should be simple.
Desirable Features of the DFS
5. Scalability
• Addition of new nodes or interconnection of two networks should be allowed.
• Such growth should not cause serious disruption of service or significant loss of performance to users.

6. High Availability
• A DFS should continue to function even when partial failures occur due to the failure of one or more components.
• Therefore, a highly available and scalable DFS should have multiple independent file servers controlling multiple independent storage devices.
• Replication of files at multiple servers is needed.
Desirable Features of the DFS
7. High Reliability

The probability of loss of stored data should be minimized.
Backup copies should be generated automatically.

8. Data Integrity

A file is often shared by multiple users. Concurrent access requests should be synchronized by a proper concurrency control mechanism.
Desirable Features of the DFS
9. Security:
Users must be assured of the privacy of their data. Passing rights to access a file should be performed safely.

10. Heterogeneity:
Provides flexibility for users to use different platforms for different applications; one node may be a UNIX workstation and another a Macintosh.

A DFS should allow a variety of workstations to participate in the sharing of files.

Also, different storage media (magnetic, optical, or any other) should be allowed.
File Models
Conceptual models supported:

1. Unstructured and Structured Models

2. Mutable and Immutable Models

1. Unstructured and Structured Models

In the unstructured model, a file is an unstructured sequence of data, and the operating system is not interested in the information stored in the files. Hence, application programs decide the interpretation and meaning of the file.

In the structured model, a file appears to the file server as an ordered sequence of records. Records of different files of the same file system can be of different sizes.
File Models
1. Unstructured and Structured Models (cont.)

Structured files are of two types:
files with indexed records and files with non-indexed records.

i) In files with non-indexed records, a record is accessed by specifying its position within the file (e.g. the 5th record).

ii) In files with indexed records, records have one or more key fields and can be addressed by specifying the values of the key fields. (A B-tree or hash table is a suitable data structure for this.)

Most modern operating systems use the unstructured file model because of the flexibility it gives to application programs.
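The two access styles can be contrasted in a small sketch. The record layout and field names below are made up for illustration, and a plain dict stands in for the B-tree or hash table mentioned above:

```python
# Structured file as an ordered sequence of records (illustrative data).
records = [
    {"emp_id": 101, "name": "Asha"},
    {"emp_id": 102, "name": "Ravi"},
    {"emp_id": 103, "name": "Meena"},
]

# Non-indexed access: address a record by its position in the file.
assert records[1]["name"] == "Ravi"          # the 2nd record

# Indexed access: build an index on the key field, then address by key.
index = {rec["emp_id"]: rec for rec in records}
assert index[103]["name"] == "Meena"
```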
File Models
2. Mutable and Immutable Files

Classified according to the criterion of file modifiability:

Mutable files: an update to a file overwrites its old contents, and the file is represented as a single stored sequence. Most modern operating systems use this model.

Immutable files: a file cannot be modified once it is created, except to be deleted. Each update creates a new version of the file, and the old version is retained. This model makes sharing, caching, and replication easier.
The Cedar File System uses a parameter 'keep'; its value decides the number of versions to be retained.
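A hedged sketch of an immutable file in the style of the 'keep' parameter: every update appends a new version, and only the most recent `keep` versions are retained. The class and method names are illustrative, not Cedar's actual interface:

```python
# Immutable file model: updates create versions instead of overwriting.
# 'keep' limits how many old versions are retained (Cedar-style idea).

class ImmutableFile:
    def __init__(self, keep=2):
        self.keep = keep
        self.versions = []               # oldest first

    def update(self, content):
        """Create a new version; never overwrite the old one."""
        self.versions.append(content)
        self.versions = self.versions[-self.keep:]   # retain last 'keep'

    def latest(self):
        return self.versions[-1]

f = ImmutableFile(keep=2)
f.update("v1")
f.update("v2")
f.update("v3")
assert f.latest() == "v3"
assert f.versions == ["v2", "v3"]        # "v1" aged out
```

Because a version, once written, never changes, replicas and cached copies of that version can never become stale; this is why the model simplifies sharing, caching, and replication.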
File Accessing Models
 Decides the manner in which a client's request to access a file is serviced.
 Mainly depends on two factors:

1. How remote files are accessed:
* Remote service model
* Data-caching model

2. The unit of data access:
* File-level transfer model
* Block-level transfer model
File Accessing Models
 Remote Service Model
Processing of the client's request is performed at the server's node: the client sends the file access request, the server performs the access, and the server's replies are transferred back across the network as messages.
Disadv.: each remote file access generates network traffic.

 Data-Caching Model
To reduce network traffic, this model copies the data needed to satisfy the client's access request and caches it at the client's node. An algorithm such as LRU is used to manage the cache.

Adv.: decreases network traffic.
Disadv.: a write operation incurs the overhead of modifying the cached data as well as the original data; this is referred to as the cache consistency problem.
Hybrid implementations are also used, e.g. LOCUS and NFS (Network File System) use the remote service model as well as the data-caching model.
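The LRU policy mentioned above can be sketched with Python's `collections.OrderedDict`. This is a generic LRU cache, not any particular DFS's implementation; in a real client, the cached values would be file blocks fetched from the server:

```python
# Minimal LRU cache of the kind a data-caching client might maintain.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # cache miss: fetch from server
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("block1", b"aaa")
cache.put("block2", b"bbb")
cache.get("block1")                        # block1 is now most recent
cache.put("block3", b"ccc")                # evicts block2, the LRU entry
assert cache.get("block2") is None
assert cache.get("block1") == b"aaa"
```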
File Accessing Models
 Unit of Data Transfer

In file systems that use the data-caching mechanism, an important design issue is deciding the unit of data transfer:

1. File-level transfer model
2. Block-level transfer model
3. Byte-level transfer model
4. Record-level transfer model
File Accessing Models
 Unit of Data Transfer

1. File-Level Transfer Model

The whole file is moved when required.

Adv.: simple and more efficient, as fewer accesses to the file server are required; disk access routines can be better optimized; once the file has been transferred, the client is unaffected by subsequent network failures.

Disadv.: sufficient storage space is required on the client's node; not efficient when the file is large; not useful for diskless workstations.

Amoeba uses this approach.
File Accessing Models
 Unit of Data Transfer

2. Block-Level Transfer Model

Data transfer takes place in units of blocks. Also called the page-level transfer model when the block size equals the page size.

Adv.: does not require large storage on the client side; works well when files are large; useful for diskless workstations.
Disadv.: multiple server requests are needed when the whole file must be transferred, which increases network traffic.

LOCUS uses this approach.
File Accessing Models
 Unit of Data Transfer

3. Byte-Level Transfer Model

Data transfer takes place in units of bytes.

Adv.: provides maximum flexibility; does not require large storage on the client side.

Disadv.: cache management is difficult.
File Accessing Models
 Unit of Data Transfer

4. Record-Level Transfer Model

Useful only for the structured file model.
Data is transferred in units of records.
Flexibility comes at the cost of complexity (record sizes may not be fixed).

File sharing and caching semantics follow.
File Sharing Semantics

 For a shared file, the challenge is to define when a modification of file data made by one user becomes observable by other users.

 The main schemes are:

1) UNIX semantics
2) Session semantics
3) Immutable shared-file semantics
4) Transaction-like semantics
File Sharing Semantics
 UNIX Semantics

• Every operation on a file sees the effects of all previous write operations performed on that file, i.e. a write to an open file becomes immediately visible to other users who have the same file open.
• This is easily achieved on a single-processor system.
• It is not advisable for a distributed file system. One possibility for implementing it is to DISALLOW CACHING for shared files.
• Even if caching is disallowed, network delays can still prevent writes from becoming immediately visible.
• The result is poor scalability, poor reliability, and poor performance, since caching is not allowed.
• Therefore, DFSs use relaxed schemes rather than UNIX-like semantics. Whenever an immediate effect is required, a mechanism such as a lock should be used.
File Sharing Semantics
 Session Semantics

• A session is the series of file accesses made between open and close operations.
• All changes made during a session are visible only to the client process that opened the session and are invisible to others (remote clients).
• Once the session is closed, the changes become visible to remote processes only in sessions started later. Already-open instances of the file do not reflect these changes.
• If multiple clients modify the file, what should the final image be? This is the challenging question. The semantics suggest storing the images in the order in which the sessions ended (files closed), but because of network delays this order may not be observed correctly. Therefore, the final file image is non-deterministic.
• These semantics fit the file-level transfer model, as a session reads the image of the entire file when it is opened.
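The open/close behaviour above can be sketched as follows. On open, the client takes a private copy of the whole file (file-level transfer); writes go to that copy and are published only on close. All names are illustrative:

```python
# Session semantics sketch: changes are private until close.

class SessionFS:
    def __init__(self):
        self.files = {}                  # server-side master copies

    def open(self, name):
        return Session(self, name, self.files.get(name, ""))

class Session:
    def __init__(self, fs, name, content):
        self.fs, self.name = fs, name
        self.copy = content              # private working copy of the file

    def write(self, data):
        self.copy += data                # invisible to other sessions

    def read(self):
        return self.copy

    def close(self):
        self.fs.files[self.name] = self.copy   # publish on close

fs = SessionFS()
fs.files["a.txt"] = "hello"
s1 = fs.open("a.txt")
s2 = fs.open("a.txt")
s1.write(" world")
assert s2.read() == "hello"              # s1's change not yet visible
s1.close()
assert fs.open("a.txt").read() == "hello world"   # visible to new sessions
assert s2.read() == "hello"              # already-open instance unchanged
```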
File Sharing Semantics

 Immutable Shared-File Semantics

• Used with the immutable file model.
• Changes to a file become visible as a new version of the file.
• When a file is declared immutable, it becomes read-only; files are shared only in read-only mode.
• The least flexible, but also the least complex, of the semantics.
File Sharing Semantics
Transaction-Like Semantics

• A transaction is a mechanism to control concurrent access to shared, mutable data.
• A transaction is a set of operations enclosed between begin_transaction and end_transaction operations.
• It ensures that partial modifications made to the shared data by a transaction are not visible to other concurrently executing transactions until the transaction ends.
• If the transactions were run one after another in some particular order, the final file content would be the same.
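A minimal sketch of the begin/end idea over a shared dictionary: modifications are staged privately between `begin_transaction` and `end_transaction`, so other readers never see a partial update. The class and keys are made up for illustration; a real implementation would also need concurrency control and abort/rollback:

```python
# Transaction-like semantics sketch: stage writes, publish atomically.

class Transaction:
    def __init__(self, shared):
        self.shared = shared
        self.staged = {}

    def begin_transaction(self):
        self.staged = {}

    def write(self, key, value):
        self.staged[key] = value         # private until the transaction ends

    def end_transaction(self):
        self.shared.update(self.staged)  # all changes become visible at once

data = {"balance": 100}
t = Transaction(data)
t.begin_transaction()
t.write("balance", 80)
t.write("log", "debit 20")
assert data["balance"] == 100            # partial changes are invisible
t.end_transaction()
assert data["balance"] == 80
assert data["log"] == "debit 20"
```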
File Caching Schemes
 File caching is used to improve performance.
 With this mechanism, recently used data is retained in memory so that repeated accesses to the same information can be handled without additional disk transfers.
 It is also possible to cache remotely located data on a client node; along with performance, scalability and reliability improve, and network traffic decreases considerably.

 The following issues should be considered in an implementation:

 Granularity of cached data (large vs small)
 Cache size (large vs small)
 Replacement policy
 Cache location
 Modification propagation
 Cache validation
File Caching Schemes

 Three possible Cache Locations

1) Server’s main memory


2) Client’s disk
3) Client’s main memory

File Caching Schemes
Cache Location

1) Server's Main Memory:

 The total cost involved is one disk access + one network access; still a considerable performance gain over no caching.
 Adv.:
 Transparent to clients
 The original file and the cached data are always consistent
 Easy to implement
 Easy to support UNIX-like file-sharing semantics
 Limitations:
 Does not avoid the network access cost
 Does not help with scalability and reliability
File Caching Schemes
Cache Location

2) Client's Disk:

 Eliminates network access, but disk access remains.

 Adv.:
 Contributes to scalability, and reliability increases (though in case of a crash, modifications are lost)
 Largest storage capacity, so the hit ratio can be increased
 The client can continue to work while disconnected from the network

 Limitations:
 Does not work with diskless workstations
 Disk access is not avoided

According to Tanenbaum, this method is not better than the previous one.
File Caching Schemes
Cache Location

3) Client's Main Memory:

Eliminates both network access and disk access.

 Adv.:
 Provides the maximum performance gain
 Permits workstations to be diskless
 Contributes to scalability and reliability

 Limitations:
 Not good for large files
 Reliability is lower than with a cache on the client's disk
File Caching Schemes
Modification Propagation

Two questions must be answered:

• When to propagate a modification made to cached data to the corresponding file server (the cache may become inconsistent if a proper scheme is not adopted)

• How to verify the validity of cached data
File Caching Schemes
Modification Propagation

• Two approaches for propagation:

1) Write-through scheme:
When a cache entry is modified, the new value is immediately sent to the server and the master copy is updated.

2) Delayed-write scheme:
The write-through scheme does not reduce network traffic, so this method was proposed.
File Caching Schemes
Modification Propagation

2) Delayed-Write Scheme
Three approaches:

I) Write on ejection from cache:
Modified data is sent to the server when the cache replacement policy decides to eject it from the client's cache.
II) Periodic write:
Modified data is sent to the server at regular intervals.
III) Write on close:
When the file is closed, the modifications are sent to the server.

Adv. of delayed write:
• Writes complete quickly, and temporary data that is deleted before write-back need never be sent to the server at all
• Accumulated modifications can be sent to the server together, which is more efficient
Disadv.:
Reliability is lower, since unpropagated modifications are lost if the client crashes.
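The two propagation schemes can be contrasted in a short sketch. Here `server` is a plain dict standing in for the master copy, and the write-on-close variant of delayed write is shown; all names are illustrative:

```python
# Write-through vs delayed write (write-on-close variant).

class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data
        self.server[name] = data         # master copy updated immediately

class WriteOnCloseCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data          # only the local cache changes

    def close(self, name):
        self.server[name] = self.cache[name]   # propagate on close

server = {}
wt = WriteThroughCache(server)
wt.write("f", "v1")
assert server["f"] == "v1"               # visible at the server at once

server2 = {}
dw = WriteOnCloseCache(server2)
dw.write("g", "v1")
assert "g" not in server2                # delayed: server not yet updated
dw.close("g")
assert server2["g"] == "v1"
```

The reliability trade-off is visible in the gap between `dw.write` and `dw.close`: if the client crashes in that window, the modification is lost, which write-through avoids at the cost of network traffic on every write.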
File Caching Schemes
• Cache Validation Schemes:

1) Client-initiated approach:
i) Check before every access
ii) Periodic checking
iii) Check on file open

2) Server-initiated approach:
The server keeps track of which clients have the file open, and in which mode.
It does not allow a file to be opened in read and write modes simultaneously.
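A sketch of client-initiated validation using the check-on-open variant: the client compares the modification timestamp of its cached copy with the server's, and refetches only when the copy is stale. The classes and method names are made up for illustration:

```python
# Client-initiated cache validation (check on file open).

class Server:
    def __init__(self):
        self.files = {}                  # name -> (mtime, content)

    def stat(self, name):
        return self.files[name][0]       # cheap validity check

    def fetch(self, name):
        return self.files[name]          # full transfer

class Client:
    def __init__(self, server):
        self.server, self.cache = server, {}
        self.fetches = 0                 # count full transfers, for the demo

    def open(self, name):
        """Validate the cached copy on open; refetch only if stale."""
        mtime = self.server.stat(name)
        if name not in self.cache or self.cache[name][0] != mtime:
            self.cache[name] = self.server.fetch(name)
            self.fetches += 1
        return self.cache[name][1]

srv = Server()
srv.files["a"] = (1, "old")
cl = Client(srv)
assert cl.open("a") == "old" and cl.fetches == 1
assert cl.open("a") == "old" and cl.fetches == 1   # cache still valid
srv.files["a"] = (2, "new")                        # modified at the server
assert cl.open("a") == "new" and cl.fetches == 2   # stale copy refetched
```

Note that even the "valid" path costs one `stat` round trip per open; the server-initiated approach avoids that traffic by having the server notify clients instead, at the cost of the server tracking open files.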
