
SEMINAR REPORT ON

Threads in Distributed Systems

Submitted by-

Chayana Das (CSI17026)

Anjan Kumar Sarma(CSI17015)


Outline:
 Introduction

 Issues in Designing DS

 Multithreaded Clients and Multithreaded Servers

 Thread Scheduling in DS

 Dynamic Thread Scheduling Algorithms

 Multithreaded RPC: A Case Study


Introduction

What is a Thread?
A thread is the smallest unit of dispatchable code, that is, the smallest unit of a program that
the CPU can schedule and execute independently of the other parts of the program. A thread is
essentially a part of a program that defines a separate path of execution.

Every thread comprises a thread id, a program counter, a register set and a stack. It shares its
code section, data section and other operating system resources with all other threads
belonging to the same process.

A multithreaded program contains two or more parts that can run concurrently. Thus,
multithreading is a specialized form of multitasking.

Multitasking can be achieved in two ways: process-based and thread-based. Thread-based
multitasking requires less overhead than process-based multitasking.

Thread-based multitasking (or multithreading) allows programmers to write very efficient
programs that make maximum use of the CPU, keeping idle time to a minimum.

Threads Vs Processes
A process is a program in execution, that is, an instance of a program. Traditional processes are
called heavyweight processes. Every process contains the following resources:

 An image of the executable machine code associated with a program.


 Memory (typically some region of virtual memory); which includes the executable code,
process-specific data (input and output), a call stack (to keep track of active subroutines
and/or other events), and a heap to hold intermediate computation data generated during
run time.
 Operating system descriptors of resources that are allocated to the process, such as file
descriptors (Unix terminology) or handles (Windows), and data sources and sinks.
 Security attributes, such as the process owner and the process' set of permissions
(allowable operations).
 Processor state (context), such as the content of registers and physical memory addressing.
The state is typically stored in computer registers when the process is executing, and in
memory otherwise.[1]

A process may contain multiple threads of control, each of which provides a separate path of
execution.
Processes have their own address space whereas all threads belonging to a single process share
the same address space.

Inter-process communication is expensive and limited, and context switching from one process
to another is also costly. Inter-thread communication, on the other hand, is inexpensive, and
context switching from one thread to another is low-cost.

Threads are also called lightweight processes.
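The shared address space of threads can be illustrated with a small sketch (Python is used here purely for illustration; the shared counter and the thread count are invented for this example):

```python
import threading

# Two or more threads in one process share the same data. Each thread
# has its own stack, but "counter" below lives in the shared address space.
counter = {"value": 0}
lock = threading.Lock()

def worker(increments):
    for _ in range(increments):
        with lock:                      # guard the shared data
            counter["value"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 4000: all four threads updated the same object
```

Separate processes, by contrast, would each get their own copy of the counter and would need explicit inter-process communication to combine the results.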

Single Vs Multithreaded Processes


A single-threaded process has only one thread of control; that is, the whole program has only
one execution path. Every process has its own process id, program counter, code section,
data section, set of registers, data structures and stack.

On the other hand, a multithreaded process has multiple threads of execution. Each thread
has its own thread id, register set and stack. However, threads belonging to the same
process share their code section, data section, files and data structures.
Thread Life Cycle
 Threads exist in several states.

 Running

 Ready to run (runnable)

 Suspended

 Resumed

 Blocked

 Terminated

A thread is ready to run (runnable) while it waits to be assigned CPU time, and runs once the
CPU is allocated to it. A running thread can be suspended, and a suspended thread can be
resumed to execute from where it left off. A thread is blocked while waiting for a resource. At
any time, a thread may be terminated. Once terminated, a thread cannot be resumed.
Introduction to Distributed Systems
A Distributed System is a collection of autonomous and independent systems (computers). It
appears to its users as a single coherent system.

Neither a Distributed OS nor a Network OS fully qualifies as a DS. A Distributed Operating
System manages the hardware of a tightly coupled computer system. A Network Operating
System, on the other hand, manages the hardware of a group of loosely coupled heterogeneous
multi-computers.

A DS is a middleware that sits between the NOS and the applications. It hides the heterogeneity
of the computers from the users and presents the system as a single coherent system.

Each computer in a DS has its own CPU, memory, IO bandwidth etc. Synchronization among the
computers is a challenge.

Many models of DS have evolved:

 File-based: In this model, everything (including keyboard, mouse, disk, printer,
network interface etc.) is treated as a file. Local and remote files are treated
exactly the same way by this model. Examples are UNIX, Sun NFS, Andrew FS,
CODA, Plan 9.

 RPC-based: In this model, remote procedures are called by local objects in
the same way as local procedures. This model is very useful in situations
where complex operations are to be performed remotely by a powerful
machine. The remote machine performs the task on behalf of the local
machine and returns the result to it. The most popular example of the
RPC-based model is Java RMI.

 Object-based: Same as the RPC-based model, but here methods of
remote objects are invoked by local objects to perform the required task. In other
words, the object-based model is the object-oriented version of the RPC-based
model. Examples are CORBA, DCOM, GLOBE.

 Document-based: Like the file-based model, the document-based model
considers everything a document. Documents can be stored either on a
local machine or on a more powerful remote machine, and standard protocols
have been developed to access them over a network. Examples are the
WWW and Lotus Notes.
Organization of the Distributed Systems:

As we can see in the diagram, the Distributed System is an additional layer that sits over
multiple machines and provides the applications a common interface. The various machines in
the network have their own operating systems. But the middleware hides the complexity and
heterogeneity of the network and offers its users the same interface. The users feel like they
are dealing with a single coherent system.
Issues in Designing DS
The main issues in designing a Distributed System are:

1. Heterogeneity

2. Openness

3. Security

4. Scalability

5. Failure handling

6. Concurrency

7. Transparency

 Access Transparency
 Location Transparency
 Migration Transparency
 Replication Transparency
 Relocation Transparency
 Concurrency Transparency
 Failure Transparency
8. Quality of Service

9. Reliability

10. Performance

Among these issues, concurrency is a major one. Since a distributed system comprises many
autonomous systems, interaction and synchronization among them is a major concern.
Concurrency transparency is another major issue: the users of the system should be unaware
that they are dealing with a system made up of many individual computers and that their job is
done by one of them. Concurrency in a DS is achieved through multitasking, which Distributed
Systems employ in two ways: multiprocessing and multithreading. In multiprocessing, multiple
processors of the DS execute the processes, and context switching may occur to switch a CPU
from one process to another. In multithreading, on the other hand, a process is divided into
multiple threads of control that are executed concurrently, either by the same processor or by
two or more different processors; context switching may occur to switch the CPU from one
thread to another. Inter-thread communication is much easier than inter-process
communication, since threads belonging to the same process share the same address space.
Threads in Distributed Systems
Before discussing the use of threads in Distributed Systems, we should note that communication in a
DS is always in terms of message passing, and that the communication model is Client-Server: the
Server provides some services and the Clients request one or more of those services.

Like other typical Client-Server models, Distributed Systems communicate over a network. In a
network-based environment, idle time is common. It may be due to transmission errors on the
network, or to the fact that the transmission rate of data over the network is much slower than the
rate at which the computer can process it. In either case, the computers sit idle most of the time and
CPU utilization is minimal. To maximize CPU utilization, Distributed Systems use multithreading: idle
time is reduced by allowing a single process to run multiple threads of control. For example, while one
thread is waiting for a reply to some message, other threads in the process may continue executing
instead of the entire process blocking.

Threads provide a convenient means of allowing blocking system calls without blocking the entire
process in which they run. This property makes threads particularly attractive for use in a DS.

Another benefit of multithreading is that communication in a DS can be expressed in terms of
maintaining multiple logical connections at the same time.

Multithreading in a DS can be achieved in terms of Multithreaded Clients and Multithreaded Servers.


Multithreaded Clients:
The round-trip delay in a wide area network is typically hundreds of milliseconds. This delay is due to
many factors, including the time to find the route, the time taken by the DNS to resolve a domain
name to its IP address, the time to find the subnet, the time to find the receiving node, and more. A
DS that operates in a wide area network may need to hide communication latencies in order to achieve
a high degree of distribution transparency. Instead of waiting idly after initiating the communication,
clients may proceed with other tasks. This is what multithreaded clients actually do.

Multithreaded Client Example:

We can better understand the concept of multithreaded clients with the help of an example. A typical
example is a Web browser.

A Web browser is used to access a web document, which contains plain text, images, icons etc. In a
single-threaded browser, to fetch each element the browser sets up a connection, reads the incoming
data and passes it to a display module. The display module waits until all the contents of the web page
have been received. Web browsers normally use a buffer to hold the incoming data until it has been
received fully.

Connection set-up and reading data in such a browser are blocking operations. The browser cannot set
up multiple connections at the same time; similarly, it reads data sequentially, one element at a time.
All data are directed to the buffer, and only once all data for the document have been received does
the browser start displaying its contents. This is tedious for the user, who has to wait a long time to
see the document if the network connection is slow.

A multithreaded browser works in a different way: it may start displaying data while it is still coming
in. For example, images take longer to load; instead of waiting for an entire image to arrive, the
browser can display other parts of the document, such as text and facilities for scrolling.

Other multithreaded clients also work in a similar manner. Multithreaded clients can perform several
tasks at the same time.

Multithreaded Clients’ Benefits:
• In a single-threaded environment, clients send a request and wait until the entire reply is
received. A multithreaded client, on the other hand, initiates the communication and starts
doing some other task; it may even begin processing a partially received message. That is,
waiting time is reduced.
• Another important benefit of using multithreaded Clients (e.g. Browsers) is that several
connections to the Server can be opened simultaneously. Each thread sets up a separate
connection and pulls in the data. Each thread is responsible for fetching different data.

• Web Servers are normally replicated across multiple machines. Multithreaded clients allow
setting up connections to different replicas so that data can be transferred in parallel.
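The latency-hiding benefit above can be sketched in a few lines (a hedged illustration: fetch_part stands in for a network read, and the part names and the 0.2-second delay are invented, not from the report):

```python
import threading
import time
import queue

# fetch_part simulates fetching one element of a web document.
def fetch_part(name, delay, results):
    time.sleep(delay)                     # simulated network latency
    results.put((name, "data:" + name))

results = queue.Queue()
parts = [("text", 0.2), ("image1", 0.2), ("image2", 0.2)]

start = time.time()
threads = [threading.Thread(target=fetch_part, args=(n, d, results))
           for n, d in parts]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The three fetches overlap, so the total time is close to a single
# delay (~0.2 s), not the 0.6 s sum of all three.
print(results.qsize())  # 3
```

A single-threaded client would perform the three fetches one after another and pay the full sum of the delays.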
Multithreaded Servers:
The main use of multithreading in a DS is found on the server side.

Multithreaded servers allow simpler server code and exploit parallelism to attain high
performance. As the name suggests, multithreaded servers use multiple threads of control to
perform several tasks at the same time, with different threads responsible for different tasks.
In this way, multithreaded servers not only provide parallelism to the clients, but also simplify
the server code by distributing the tasks among different threads.
Multithreaded Server Example:
A typical example is a multithreaded file server.

A file server normally waits for an incoming request for a file operation, carries out the request,
and then sends back the reply. A multithreaded file server uses two kinds of threads. One
thread, called the dispatcher, reads incoming requests for file operations; associated with it are
many worker threads. After examining a request, the dispatcher chooses an idle worker thread
and hands the request over to it. The worker thread performs a blocking read on the local file
system, which may cause it to be suspended. If the thread is suspended, another thread is
selected to execute, e.g. the dispatcher may be selected to acquire more work, or another
worker thread may be selected to run.

A single-threaded server, on the other hand, would sit idle until the read operation completed
and would not process any other requests in the meantime.
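The dispatcher/worker pattern described above can be sketched as follows (an illustrative sketch only: the request names, the pool size and the simulated disk read are invented, and the dispatcher is shown as the main thread for brevity):

```python
import threading
import queue
import time

requests = queue.Queue()   # hand-off from the dispatcher to the workers
replies = queue.Queue()

def worker():
    while True:
        req = requests.get()
        if req is None:              # shutdown sentinel
            break
        time.sleep(0.01)             # blocking "file read" on the local FS
        replies.put("contents of " + req)

# A pool of worker threads waits for requests from the dispatcher.
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# The dispatcher reads incoming requests and hands each to an idle worker.
for name in ["a.txt", "b.txt", "c.txt"]:
    requests.put(name)
for _ in workers:
    requests.put(None)               # tell each worker to exit
for w in workers:
    w.join()

print(replies.qsize())  # 3: all requests were served by the pool
```

While one worker blocks on its read, the dispatcher and the other workers keep running, which is exactly the behavior a single-threaded server cannot provide.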
Another Example:

Single threaded Web Server Vs Multithreaded Web Server:


In a single-threaded Web server, the server receives a connection request from the Web
browser, creates a request structure and performs a blocking system call until the requested
document is ready to send back. It accepts another connection only after the first HTTP
response has been sent. A multithreaded Web server, on the other hand, can accept multiple
connection requests at the same time. It receives an HTTP request, creates a request structure
for it, and creates a new worker thread to handle that request. The worker thread is then
responsible for processing the Web page and sending back the HTTP response. While the
worker thread performs the required processing of the document, the main thread may accept
another connection for a Web document instead of performing a blocking system call.
Thread Scheduling in DS
Scheduling plays an important role in distributed multithreaded systems.

There are two kinds of thread scheduling:

• Scheduling within a process and

• Scheduling among processes.

Scheduling within a process is similar to traditional single-processor scheduling algorithms
(preemptive, round-robin scheduling).

Thread scheduling among processes is somewhat more complicated, especially for dynamically
growing multithreaded computations. The requirements are:

• Provide enough threads to prevent a processor from sitting idle.

• Limit resource consumption (i.e. keep the total number of active threads within the
space constraint).

• Provide computation locality (try to keep related threads on the same processor
to minimize communication overhead).

Dynamic Scheduling Strategies


There are mainly two classes of dynamic scheduling strategies:

- Work sharing (sender-initiated)

- Work stealing (receiver-initiated)

In the work sharing strategy, whenever a processor generates new threads, the scheduler
decides whether to migrate some of them to other processors on the basis of the current
system state.

In the work stealing strategy, whenever processors become idle or under-utilized, they attempt
to steal threads from busier processors.

Examples- Chowdhury’s Greedy Load Sharing Algorithm, Eager’s Adaptive Load Sharing
Algorithm, and Karp’s Work Stealing Algorithm.
Chowdhury’s Greedy Load Sharing Algorithm (Work Sharing Algorithm):
Operation:

In this strategy, the current state of each processor P is represented by a function f(n),
where n is the number of tasks currently at the processor.

If a task arrives at P and the number of tasks n is greater than zero, then this processor looks
for a remote processor whose state is less than or equal to f(n).

If a remote processor is found with this property, then the task is transferred there.

The performance of this strategy depends on the selection of the function f(n). It has
been proved that f(n) < n must hold in order to achieve good performance.

Possible choices for f(n) are n-1, n div 2, n div 3, n div 4, etc. Furthermore, it has
been shown that f(n) = n div 3 yields the best results and that the greedy strategy
outperforms the threshold strategy with T=1 in all experiments.

The greedy strategy adopts a cyclic probing mechanism instead of the random selection
used in the threshold strategy. In this cyclic probing mechanism, processor i probes
processor (i+j) mod N, N representing the number of processors in the system, in the jth
probe to locate a suitable destination processor.

For example, in a system with 5 processors numbered 0, 1, 2, 3 and 4, processor 1 will first
probe processor 2. If this attempt is not successful, it will probe processor 3, and so on.

As in the threshold strategy, once a task is transferred to a remote processor, it must be
executed there.

Despite the similarities between the two strategies, it has been demonstrated using
simulation results that the greedy strategy outperforms the threshold strategy. This
improvement is attributed to the fact that the greedy strategy attempts to transfer
every task that arrives at a busy processor whereas the threshold strategy attempts to
transfer only when a task arrives at a processor which has reached the threshold T or
higher.
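The greedy placement rule with f(n) = n div 3 and cyclic probing can be sketched as follows (a hedged illustration: the queue lengths are invented, and comparing the remote queue length against f(n) is one reading of the description above; the published algorithm may differ in details):

```python
N = 5
load = [4, 1, 0, 3, 2]        # tasks currently at processors 0..4 (invented)

def f(n):
    return n // 3              # the choice reported to perform best

def place_task(i):
    """A task arrives at processor i; try to transfer it greedily."""
    n = load[i]
    if n > 0:                  # busy: probe the other processors cyclically
        for j in range(1, N):
            target = (i + j) % N
            if load[target] <= f(n):
                load[target] += 1    # transfer; it must execute there
                return target
    load[i] += 1               # no suitable destination: execute locally
    return i

# Processor 0 has n = 4, so f(n) = 1: it probes 1, 2, ... until it finds
# a processor with load <= 1.
dest = place_task(0)
print(dest, load)  # 1 [4, 2, 0, 3, 2]
```

The cyclic probe order (i+j) mod N is exactly the mechanism described above: processor 0 probes 1 first, then 2, and so on.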
Eager’s Adaptive Load Sharing Algorithm
A load sharing policy has two components: a transfer policy that determines whether to
process a task locally or remotely, and a location policy that determines to which node a
task selected for transfer should be sent.

In Eager’s Adaptive Load Sharing Algorithm, the transfer policy is a THRESHOLD policy: a
distributed, adaptive policy in which each node uses only local state information. No
exchange of state information among the nodes is required in deciding whether to
transfer a task.

A task originating at a node is accepted for processing there if and only if the number of
tasks already in service or waiting for service (the node queue length) is less than some
threshold T.

Otherwise, an attempt is made to transfer that task to another node.

Note that only newly received tasks are eligible for transfer.

Location Policies:
Three Location policies have been suggested:

Random

 Simplest location policy

 Uses no information at all

 A destination is selected at random and task is transferred to that node.

 No exchange of state information is required in deciding where to transfer a
task.

Threshold

 Uses a small amount of information about potential destination nodes.

 A node is selected at random and probed to determine whether transferring a
task to it would place it above the threshold. If not, the task is transferred;
the destination node must then process the task regardless of its state when
the task actually arrives.
 If the probed node is above the threshold, another node is selected at random
and probed in the same manner. This continues until either a suitable
destination node is found, or the number of probes exceeds a static probe
limit Lp. In the latter case, the originating node must process the task.

Shortest

 Acquires additional system state information and attempts to make the “best”
choice given this information.

 Lp distinct nodes are chosen at random, and each is polled in turn to determine
its queue length. The task is transferred to a node with the shortest queue
length, unless that queue length is greater than or equal to the threshold, in
which case the originating node must process the task.

 Uses more state information.
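The Threshold and Shortest location policies can be sketched together (an illustration only: T, Lp and the queue lengths are invented, and for reproducibility this sketch probes nodes in index order rather than at random, as the real policies do):

```python
T, Lp = 2, 3
queue_len = [3, 0, 2, 1, 4]    # invented per-node queue lengths

def threshold_policy(origin):
    """Probe up to Lp nodes; transfer to the first one below threshold."""
    probes = 0
    for node in range(len(queue_len)):
        if node == origin:
            continue
        probes += 1
        if queue_len[node] + 1 <= T:   # transfer would not exceed T
            return node
        if probes >= Lp:
            break
    return origin                      # probe limit reached: run locally

def shortest_policy(origin):
    """Poll Lp nodes and pick the shortest queue, if below threshold."""
    candidates = [n for n in range(len(queue_len)) if n != origin][:Lp]
    best = min(candidates, key=lambda n: queue_len[n])
    return best if queue_len[best] < T else origin

# Node 0 has queue length 3 >= T, so its new task is eligible for transfer.
print(threshold_policy(0), shortest_policy(0))  # 1 1
```

Here both policies pick node 1, but Threshold stops at the first acceptable node while Shortest polls all Lp candidates before choosing, which is why it uses more state information.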


Multithreaded RPC: A Case Study

MT Client Overview

Two Client Threads Using Different Client Handles (Real Time)

The client program creates a thread for each host. Each thread creates its own client
handle and makes various RPC calls to the given host. Because the client threads are
using different handles to make the RPC calls, they can carry out the RPC calls
concurrently.
MT Server Overview

 The server automatically creates a new thread for every incoming client request. This
thread processes the request, sends a response, and exits.
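The thread-per-request behavior described above can be sketched with plain sockets (an illustration only, not ONC+ RPC: the upper-casing "service", the loopback address and the two-request limit are invented for this sketch):

```python
import socket
import threading

def handle(conn):
    # One thread per incoming request: process, respond, then exit.
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def serve(sock, n_requests):
    for _ in range(n_requests):
        conn, _ = sock.accept()
        threading.Thread(target=handle, args=(conn,)).start()

server = socket.socket()
server.bind(("127.0.0.1", 0))        # OS-assigned port
server.listen()
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server, 2), daemon=True).start()

replies = []
for msg in (b"ping", b"pong"):
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(msg)
        c.shutdown(socket.SHUT_WR)   # signal end of the request
        replies.append(c.recv(1024))
server.close()

print(replies)  # [b'PING', b'PONG']
```

Each request is handled by a freshly spawned thread, so the accept loop never blocks on request processing, mirroring the automatic-thread behavior of the multithreaded RPC server.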
Summary and Conclusion
 Multithreading plays an important role in Distributed environment.

 Multithreading can be achieved in a DS in terms of multithreaded clients and
multithreaded servers.

 Thread scheduling is a major issue in Distributed environment.

 Conventional RPC-based communication in a DS can be modified by applying
multithreading to attain concurrency.

Bibliography
1. Andrew S. Tanenbaum, “Distributed Systems: Principles and Paradigms”, Second Edition,
2007
2. Ravi Mirchandaney, Don Towsley, “Adaptive Load Sharing in Heterogeneous Systems”,
February 1989
3. Ali M. Alakeel, “Load Balancing in Distributed Computer Systems”, International Journal
of Computer Science and Information Security, May 2015
4. Chapter 7, “Multithreaded RPC Programming”, Oracle ONC+ Developer’s Guide
5. Derek L. Eager, Edward D. Lazowska and John Zahorjan, “A Comparison of Receiver-
Initiated and Sender-Initiated Adaptive Load Sharing”
6. Ali M. Alakeel, “A Guide to Dynamic Load Balancing in Distributed Computer Systems”,
International Journal of Computer Science and Network Security, Vol. 10 No. 6, June
2010
7. T.A. Marsland, Yaoqing Gao, Francis C.M. Lau, “A Study of Software Multithreading in
Distributed Systems”, Technical Report TR 95-23, November 1995
8. Internet resources

*******************
