
Distributed Operating Systems
By:

Akshay Dabholkar
Mayur Palankar
Amol Pandit
Based on the paper by Andrew S. Tanenbaum and Robbert Van Renesse
Outline
„ What is a Distributed Operating System ?

„ How is it different ?

„ Why Distributed Operating Systems ?

„ Problems with Distributed Operating Systems

„ Distributed Operating System Models

„ Design Issues

„ Comparison of some Distributed Operating Systems

„ Conclusion
What is a Distributed Operating System ?

„ A Distributed Operating System is one that runs on multiple autonomous CPUs
and gives its users the illusion of an ordinary Centralized Operating System
running on a Virtual Uniprocessor.

„ Distributed Operating Systems provide resource transparency to user processes.

„ “If you can tell which computer you are using, you are not using a
distributed operating system.” - Tanenbaum
How is it different ?

„ There is a single Distributed Operating System, and it spans all the CPUs.

„ User processes can run on any of the CPUs as allocated by the Distributed
Operating System.

„ Data can reside on any machine that is part of the Distributed System.

„ Not all multi-machine systems are Distributed Systems.

„ “It is the software not the hardware that determines whether a system is
distributed or not” - Tanenbaum
Distributed OS vs. Network OS

Distributed OS | Network OS
User is not aware of the multiple CPUs. | User is aware of the existence of multiple CPUs.
Each machine runs a part of the Distributed Operating System. | Each machine has its own private Operating System.
The system is fault-tolerant. | The system is not fault-tolerant.


Why Distributed Operating Systems ?

„ Price/Performance advantage (availability of cheap and powerful microprocessors).

„ Incremental growth.

„ Reliability and Availability.

„ Simplicity of Software (Theoretically).

„ Provides Transparency.

„ Creates another level of abstraction (e.g. Process creation).


Problems with Distributed Operating Systems

„ Communication Protocol Overhead.

„ Lack of Simplicity.

„ A high degree of fault tolerance is required.

„ Lack of global state information (e.g. No global Process Tables).

„ Atomic Transactions.

„ Process and Data Migration (e.g. During Load Balancing and Paging
respectively).
Distributed Operating System Models

„ Minicomputer Model
¾ It consists of a few minicomputers each with multiple users.
¾ Simple outgrowth of the Central Time-Sharing Systems.
¾ Each user is locally logged-on to one machine and remotely logged-on to other machines.
¾ (Logged-in Users / Available CPUs) > 1, since each minicomputer serves multiple users.

„ Workstation Model
¾ Each user has a personal workstation and nearly all work is done on that workstation.
¾ Each user is locally logged-on to one machine and remotely logged-on to other machines.
¾ It supports a single, global file system that provides location-independent data access.
¾ (Logged-in Users / Available CPUs) ~ 1

„ Processor Pool Model
¾ When a user needs to perform a computation, a processor is allocated from the processor
pool to the user task.
¾ (Logged-in Users / Available CPUs) < 1, since the pool normally holds more CPUs than there are users.
Design Issues

„ Communication Primitives

„ Naming and Protection

„ Resource Management

„ Fault-Tolerance

„ Services
Communication Primitives

ƒ Message Passing

[Figure: Client-Server Model of Communication. The client sends a request message; the server receives the request message.]

™ Types of Message Passing Primitives

¾ Blocking versus Non-Blocking Primitives

¾ Buffered versus Unbuffered Primitives
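
The difference between blocking and non-blocking primitives can be illustrated with a small sketch. This is not taken from the paper: a bounded in-process queue stands in for the receiver's message buffer, and the names `mailbox`, `send_blocking`, `send_nonblocking` and `receive_nonblocking` are hypothetical.

```python
import queue

# Assumption: a two-slot queue plays the role of the receiver's buffer.
mailbox = queue.Queue(maxsize=2)

def send_blocking(msg):
    """Blocking send: the caller waits until buffer space is available."""
    mailbox.put(msg, block=True)

def send_nonblocking(msg):
    """Non-blocking send: return at once and report whether msg was queued."""
    try:
        mailbox.put_nowait(msg)
        return True
    except queue.Full:
        return False

def receive_nonblocking():
    """Non-blocking receive: return a message, or None if the buffer is empty."""
    try:
        return mailbox.get_nowait()
    except queue.Empty:
        return None

# The third send finds the buffer full and is rejected instead of waiting.
results = [send_nonblocking(m) for m in ("req-1", "req-2", "req-3")]
first = receive_nonblocking()
```

With an unbuffered primitive there would be no queue at all: a message arriving while no receiver is waiting would have to be discarded or the sender delayed.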


Communication Primitives
„ Remote Procedure Call (RPC)

¾ The idea is to make the semantics of inter-machine communication as similar as possible
to those of normal procedure calls.

¾ RPC Design Issues:

¾ Parameter Passing: Passing reference parameters over the network is not easy. A unique
system-wide pointer for each object is needed to access it remotely.

¾ Parameter Representation: Machines on the network may represent data in incompatible ways.
Conversion to and from a standard format is expensive, and wasteful when both the receiver
and the sender use the same format.

¾ Client-Server Binding: Sometimes the client needs to know which server will handle an
RPC call (e.g. in systems with multiple file servers). It is difficult to achieve this functionality.
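
The parameter representation issue can be sketched with a client stub that converts its arguments to a standard wire format before "sending" them. This is a minimal illustration under stated assumptions, not the mechanism of any particular system: the procedure `add_stub`, the `server_dispatch` function and the direct function call used as transport are all hypothetical.

```python
import struct

# Assumed standard format: 32-bit integers in network (big-endian) byte order.
REQUEST = "!ii"
REPLY = "!i"

def server_dispatch(request_bytes):
    """Server side: unmarshal the parameters, run the procedure, marshal the reply."""
    a, b = struct.unpack(REQUEST, request_bytes)
    return struct.pack(REPLY, a + b)

def add_stub(a, b, transport=server_dispatch):
    """Client stub: looks like a local call but marshals parameters first."""
    reply = transport(struct.pack(REQUEST, a, b))
    (result,) = struct.unpack(REPLY, reply)
    return result

total = add_stub(2, 40)
```

Both sides pay for the conversion even when they share a native format, which is the waste the slide refers to.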
Naming and Protection
„ An OS supports a large number of objects such as files, directories, segments,
mailboxes, processes, services, servers, nodes and I/O devices.
„ Names are required to identify these objects.
™ Naming as Mapping
¾ Problem of mapping between two domains.
™ Name Servers

¾ Maintain a table or database of the name-to-object mapping.

¾ Services, processes, etc. need to register with the underlying naming system.

¾ Name Server Models:

o Centralized Name Server Model: A single server accepts names in one domain and maps them to
names in another domain.
o Distributed Name Lookup Model: Partition the system into domains with each domain having its own
naming server.
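
A distributed name lookup can be sketched as follows, assuming names of the form `domain/name` and one name server per domain. The `NameServer` class, the `domains` table and the example object identifier are hypothetical and only illustrate the idea of registration followed by a per-domain lookup.

```python
class NameServer:
    """Holds the name-to-object table for one domain."""

    def __init__(self):
        self.table = {}

    def register(self, name, object_id):
        self.table[name] = object_id

    def lookup(self, name):
        return self.table.get(name)

# Assumption: the system is partitioned into two domains, "cs" and "ee".
domains = {"cs": NameServer(), "ee": NameServer()}
domains["cs"].register("printer", "node7:port2")

def resolve(path):
    """Route the lookup to the name server of the domain named in the path."""
    domain, _, name = path.partition("/")
    server = domains.get(domain)
    return server.lookup(name) if server else None
```

Here `resolve("cs/printer")` consults only the "cs" server. In the centralized model the `domains` table would collapse into a single `NameServer` that every client contacts.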
Resource Management
„ Managing resources without having accurate global state information is
difficult.

„ A Distributed OS does not have tables that provide up-to-date status information
on all the resources being managed.

„ Considerations:

¾ Processor Allocation

¾ Scheduling

¾ Load balancing

¾ Distributed Deadlock Detection


Processor Allocation
„ Processors are organized in a logical hierarchy independent of the physical
structure of the network (MICROS).

¾ Each manager keeps track of the free processors under its control.

¾ If it has enough free processors for a request, it allocates them;
otherwise it forwards the request to its immediate boss.
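
The forwarding rule can be sketched as a chain of managers. This is a simplification made for illustration: each hypothetical `Manager` tracks only a count of the free processors it controls directly, and the real MICROS bookkeeping is not modeled.

```python
class Manager:
    """One node in the logical hierarchy of processor managers."""

    def __init__(self, name, free, boss=None):
        self.name = name
        self.free = free      # free processors this manager can hand out
        self.boss = boss      # next manager up the hierarchy, if any

    def allocate(self, needed):
        """Return the name of the manager that satisfied the request, or None."""
        if self.free >= needed:
            self.free -= needed
            return self.name
        if self.boss is not None:
            return self.boss.allocate(needed)   # escalate to the immediate boss
        return None

root = Manager("root", free=8)
dept = Manager("dept", free=2, boss=root)
small = dept.allocate(1)    # satisfied locally
large = dept.allocate(4)    # too big for dept, forwarded to root
```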
Scheduling
„ With multiple processors, processes that communicate frequently should run
simultaneously, so they need to be scheduled together as a group on different
processors.

„ It is difficult to dynamically determine the inter-process communication (IPC) patterns.

„ Ousterhout has proposed several algorithms based on the concept of Coscheduling,
which takes IPC patterns into account while scheduling to ensure that all members of
a group run at the same time.

„ One idea is to have each processor use a round-robin scheduling algorithm and
schedule all processes that communicate with each other on different processors in
the same slot, to achieve N-fold parallelism.

„ The disadvantage of this approach is the high overhead incurred for performing IPC
between processes of a group that run on different processors over the network.

„ To avoid high cost of IPC over the network, the closely related groups of processes
should be scheduled on the same processor.
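
The slot-based idea can be sketched as a matrix in which each row is a time slot and each column a processor, with every process group placed entirely within one row. The first-fit placement below is an assumption made for illustration and is not claimed to be one of Ousterhout's algorithms; `coschedule` is a hypothetical name.

```python
def coschedule(groups, num_processors):
    """Return a list of time slots; position i in a slot is the process on CPU i."""
    slots = []
    for group in groups:
        if len(group) > num_processors:
            raise ValueError("group does not fit in one time slot")
        # First fit: reuse the earliest slot with enough idle processors.
        for slot in slots:
            if len(slot) + len(group) <= num_processors:
                break
        else:
            slot = []
            slots.append(slot)
        slot.extend(group)
    return slots

schedule = coschedule([["a1", "a2", "a3"], ["b1", "b2"], ["c1"]], num_processors=4)
```

Each processor then cycles through the slots round-robin, so all members of a group are running during the same slot.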
Load balancing
„ Load balancing is required to keep any one processor from becoming heavily
loaded.

„ Techniques:
™ Graph-theoretic Model:
¾ Requires the CPU and memory requirements of each process and the average traffic
between each pair of processes to be known in advance.
¾ The system can be represented as a graph with each process as a node and each pair of
communicating processes joined by an arc.
¾ The problem of allocating all the processes to k processors reduces to the problem of
partitioning the graph into k disjoint subgraphs.
¾ Drawback: This model is mainly of theoretical importance, because the required
information is rarely known in advance.

™ Heuristic Load Balancing:
¾ Each processor estimates its own load continuously. Processors exchange load
information, and this information is used for process creation and migration.

„ Practical Considerations of load balancing (How to do process migration?).
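
The graph-theoretic model can be made concrete with a small brute-force sketch. It assumes the traffic between each pair of processes is known, replaces the CPU and memory requirements with a simple per-processor process limit (`capacity`), and searches every assignment, which is only feasible for tiny examples. All names and numbers are hypothetical.

```python
from collections import Counter
from itertools import product

def cut_traffic(assignment, traffic):
    """Total traffic on arcs whose endpoints sit on different processors."""
    return sum(w for (p, q), w in traffic.items() if assignment[p] != assignment[q])

def best_partition(processes, traffic, k, capacity):
    """Try every assignment to k processors and keep the cheapest valid one."""
    best_cost, best_assignment = None, None
    for choice in product(range(k), repeat=len(processes)):
        if max(Counter(choice).values()) > capacity:
            continue    # some processor would hold too many processes
        assignment = dict(zip(processes, choice))
        cost = cut_traffic(assignment, traffic)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

traffic = {("A", "B"): 5, ("C", "D"): 4, ("B", "C"): 1, ("A", "D"): 1}
cost, placement = best_partition(["A", "B", "C", "D"], traffic, k=2, capacity=2)
```

In this example the heavy talkers A-B and C-D end up paired, leaving only the two light arcs crossing the network.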


Fault-Tolerance

„ A fault-tolerant system is one that can continue functioning, perhaps in a
degraded form, even if something goes wrong.

„ One of the advantages of Distributed Operating Systems is that there are
enough resources to achieve fault tolerance.

„ Two radically different approaches:

¾ Redundancy Techniques

¾ Atomic Transactions
Redundancy Techniques

™ Redundancy through backup process.

¾ Provides every process with a backup process on a different processor.
¾ All messages sent to a process are also sent to the backup process.
¾ If one process crashes, the other can clone itself to make a new backup and
continue.

™ Redundancy through recorder process.

¾ A special recorder process records all messages sent on the network.
¾ Every process checkpoints itself onto a remote disk periodically.
¾ On a crash, the process is restarted on an idle processor from the most recent
checkpoint. The recorder process sends it all the messages the original process
received between the checkpoint and the crash.
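
The recorder approach can be sketched with a deterministic process, a message log and a checkpoint. This is an in-memory illustration under simplifying assumptions: a Python list stands in for the recorder, a dictionary for the checkpoint on the remote disk, and the `Accumulator` process and helper names are hypothetical.

```python
class Accumulator:
    """A deterministic process whose whole state is a running total."""

    def __init__(self, total=0):
        self.total = total

    def handle(self, msg):
        self.total += msg

recorder_log = []   # the recorder keeps every message sent on the network

def deliver(proc, msg):
    recorder_log.append(msg)
    proc.handle(msg)

def take_checkpoint(proc):
    """Save the state together with the log position it corresponds to."""
    return {"total": proc.total, "log_position": len(recorder_log)}

def recover(checkpoint):
    """Restart from the checkpoint and replay the messages received since."""
    proc = Accumulator(checkpoint["total"])
    for msg in recorder_log[checkpoint["log_position"]:]:
        proc.handle(msg)
    return proc

original = Accumulator()
deliver(original, 5)
deliver(original, 7)
saved = take_checkpoint(original)
deliver(original, 3)
deliver(original, 10)
restarted = recover(saved)   # stands in for the restart after a crash
```

Replay reproduces the lost state only because the process is deterministic: the same messages in the same order yield the same result.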
Atomic Transactions

„ The property of either running to completion or doing nothing is called an
atomic update.

„ One technique for achieving Atomic Transactions, proposed by Lampson, is to
build up a hierarchy of abstractions.

¾ It makes use of abstraction layers such as careful disk, stable storage
and stable processors to implement multicomputer atomic transactions.

„ How to implement Mutual Exclusion?

¾ Two processes on different CPUs can guard access to shared memory
using remote semaphores.
¾ The network then becomes the bottleneck.
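
The stable storage layer mentioned under the hierarchy of abstractions can be sketched as a pair of checksummed copies written one after the other. This is a simplified in-memory illustration, not Lampson's actual design: dictionaries stand in for disk pages, CRC-32 for the careful-disk error check, and the function names are hypothetical.

```python
import zlib

def careful_write(disk, data):
    """Write data together with a checksum so a bad copy can be detected."""
    disk["data"], disk["crc"] = data, zlib.crc32(data)

def careful_read(disk):
    """Return the data if its checksum matches, otherwise None."""
    data = disk.get("data")
    if data is not None and zlib.crc32(data) == disk.get("crc"):
        return data
    return None

def stable_write(pair, data, crash_after_first=False):
    """Update the first copy, then the second; a crash may fall in between."""
    careful_write(pair[0], data)
    if crash_after_first:
        return
    careful_write(pair[1], data)

def stable_read(pair):
    first = careful_read(pair[0])
    return first if first is not None else careful_read(pair[1])

def recover(pair):
    """After a crash, make both copies agree with the first readable one."""
    good = stable_read(pair)
    if good is not None:
        for disk in pair:
            careful_write(disk, good)

pair = [{}, {}]
stable_write(pair, b"v1")
stable_write(pair, b"v2", crash_after_first=True)
recover(pair)
```

Because at most one copy is being written at any moment, a reader always finds at least one complete value, which is what lets higher layers treat an update as all-or-nothing.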
Services
In a Distributed Operating System, it is useful to have user-level server
processes provide functions that have traditionally been provided by the
operating system. This leads to the microkernel approach to operating
system design.

„ Server Structure (Single-threaded or Multi-threaded).


„ File Service (disk, flat file & directory services).
„ Print Service.
„ Process Service (Remote process creation and caching of servers
possible).
„ Terminal Service.
„ Mail service.
„ Time Service.
„ Boot Service.
„ Gateway Service.
Comparison of some Distributed Operating Systems

| | Cambridge Project | Amoeba | V Kernel | Eden |
|---|---|---|---|---|
| Developed By | Computing Laboratory @ Univ. of Cambridge | Tanenbaum @ Vrije Universiteit, Amsterdam | David Cheriton @ Stanford University | University of Washington, Seattle |
| Communication Primitives | RPC | RPC | RPC | RPC |
| Naming and Protection | Single Name Server | Sparse capabilities with encryption | Three-level naming mechanism | Capabilities without protection |
| Resources | Processor Bank | Processor Pool | Workstation Model | Workstation Model |
| Fault tolerance | Small server to start up services | Some fault tolerance through boot server | No fault tolerance | Uses Recorder process |
| File Server | Universal file service and Filing Machine | Several file services | Similar to Unix | No file server. One process for each file |
Conclusion
„ Distributed systems are an interesting and fruitful area of research for the
future.
„ They advocate the use of Microkernel approach to Operating Systems
Design.

„ Latest Research:

¾ Plan 9 @ Bell Labs


¾ 2K @ UIUC
¾ Inferno @ Vita Nuova
¾ The Sprite OS @ Berkeley
¾ Mach @ CMU
¾ AgentOS @ UCI
¾ WebOS @ Berkeley
