Thread Scheduling in DS
What is a Thread?
A thread is the smallest unit of dispatchable code, that is, the smallest unit of a program that the CPU can schedule and execute independently of the other parts of the program. A thread is essentially a part of a program that defines a separate path of execution.
Every thread has its own thread ID, program counter, register set, and stack. It shares its code section, data section, and other operating-system resources with all other threads belonging to the same process.
A multithreaded program contains two or more parts that can run concurrently. Thus,
multithreading is a specialized form of multitasking.
Thread-based multitasking (or multithreading) allows programmers to write very efficient programs that make maximum use of the CPU, keeping idle time to a minimum.
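As a concrete illustration, here is a minimal Python sketch (the language is chosen only for brevity; any thread library behaves similarly) of two threads of one process running concurrently while sharing the process's data:

```python
import threading

# Two parts of the same program run concurrently, sharing the data section.
results = {}  # shared data: visible to all threads of the process

def count_up(name, n):
    # Each thread has its own stack, so `total` is private to this thread.
    total = 0
    for i in range(1, n + 1):
        total += i
    results[name] = total  # write into the shared data section

t1 = threading.Thread(target=count_up, args=("a", 100))
t2 = threading.Thread(target=count_up, args=("b", 50))
t1.start()
t2.start()
t1.join()
t2.join()
print(results)  # both threads wrote into the shared dict
```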
Threads Vs Processes
A process is a program that is executing; that is, a process is an instance of a running program. Traditional processes are called heavyweight processes. A process may contain multiple threads of control, each of which provides a separate path of execution.
Processes have their own address space whereas all threads belonging to a single process share
the same address space.
Inter-process communication is expensive and limited, and context switching from one process to another is also costly. Inter-thread communication, on the other hand, is inexpensive, and context switching from one thread to another is low cost.
A multithreaded process, in contrast, has multiple threads of execution. Each thread has its own thread ID, register set, and stack. However, threads belonging to the same process share its code section, data section, open files, and other data structures.
Thread Life Cycle
A thread can exist in any one of the following states:
Running
Suspended
Resumed
Blocked
Terminated
A thread starts running as soon as it gets CPU time. A running thread can be suspended, and a suspended thread can be resumed to execute from where it left off. A thread can be blocked while waiting for a resource. At any time, a thread may be terminated; once terminated, a thread cannot be resumed.
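The transitions above can be sketched with a thread that blocks while waiting for a resource, resumes when it becomes available, and then terminates; the `threading.Event` standing in for the awaited resource is an illustrative choice:

```python
import threading

# Sketch of the blocked -> resumed -> terminated transitions.
resource_ready = threading.Event()
states = []

def worker():
    states.append("running")
    resource_ready.wait()        # blocked until another thread signals
    states.append("resumed")
    states.append("terminated")  # returning from the function ends the thread

t = threading.Thread(target=worker)
t.start()
resource_ready.set()             # the resource becomes available: unblock
t.join()                         # once terminated, the thread cannot restart
print(states)
```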
Introduction to Distributed Systems
A Distributed System is a collection of autonomous and independent systems (computers). It
appears to its users as a single coherent system.
A DS is implemented as a middleware layer that sits between the network operating system (NOS) and the applications. It hides the heterogeneity of the computers from the users and presents the system as a single coherent whole.
Each computer in a DS has its own CPU, memory, I/O bandwidth, etc. Synchronization among the computers is a challenge.
RPC-based: In this model, remote procedures are called by local objects in the same way as local procedures. This model is very useful when complex operations are to be performed remotely by a powerful machine: the remote machine performs the task on behalf of the local machine and returns the result to it. The most popular example of the RPC-based model is Java RMI.
As we can see in the diagram, the distributed system is an additional layer that sits over multiple machines and provides the applications with a common interface. The various machines in the network run their own operating systems, but the middleware hides the complexity and heterogeneity of the network and offers all its users the same interface. The users feel as if they are dealing with a single coherent system.
Issues in Designing DS
The main issues in designing a Distributed System are:
1. Heterogeneity
2. Openness
3. Security
4. Scalability
5. Failure handling
6. Concurrency
7. Transparency
Access Transparency
Location Transparency
Migration Transparency
Replication Transparency
Relocation Transparency
Concurrency Transparency
Failure Transparency
8. Quality of Service
9. Reliability
10. Performance
Among these issues, concurrency is a major one. Since a distributed system comprises many autonomous systems, interaction and synchronization among them is a major concern. Concurrency transparency is another major issue: the users should be unaware that they are dealing with a system made up of many individual computers, and that their job is done by one of them. Concurrency in a DS is achieved through multitasking. Distributed systems employ multitasking in two ways: multiprocessing and multithreading. In multiprocessing, multiple processors of the DS execute the processes, and context switching among the processes may occur to switch a CPU from one process to another. In multithreading, on the other hand, a process is divided into multiple threads of control, and the threads are executed concurrently either by the same processor or by two or more different processors. Context switching among threads may occur to switch control of the CPU from one thread to another. Inter-thread communication is much easier than inter-process communication, since threads belonging to the same process share the same address space.
Threads in Distributed Systems
Before discussing the use of threads in distributed systems, note that communication in a DS is always in terms of message passing. Also, the communication model is Client-Server: the server provides some services, and the clients request one or more of those services.
Like other typical client-server systems, distributed systems communicate over a network. In a network-based environment, idle time is common. It may be due to transmission errors, or because the transmission rate of the network is much slower than the rate at which the computer can process data. In either case, the computers sit idle most of the time and CPU utilization is minimal. To maximize CPU utilization, distributed systems use multithreading: idle time is reduced by allowing a single process to run multiple threads of control. For example, while one thread is waiting for a reply to some message, other threads in the process may continue executing instead of the entire process blocking.
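A minimal sketch of this idea, with the awaited network reply simulated by a queue (the message names are made up for illustration):

```python
import threading
import queue

# While one thread blocks waiting for a "reply", another thread of the
# same process keeps doing useful work.
replies = queue.Queue()
log = []

def waiter():
    msg = replies.get()          # blocking call: only this thread blocks
    log.append(("reply", msg))

def worker():
    total = sum(range(1000))     # useful work done while waiter is blocked
    log.append(("work", total))

tw = threading.Thread(target=waiter)
tc = threading.Thread(target=worker)
tw.start()
tc.start()
tc.join()                        # worker finishes while waiter is still blocked
replies.put("pong")              # the awaited message finally arrives
tw.join()
print(log)
```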
Threads provide a convenient means of allowing blocking system calls without blocking the entire process in which they run. This property makes threads particularly attractive for use in a DS.
A web browser is used to access a web document. A web document contains plain text, images, icons, etc. In a single-threaded browser, to fetch each element the browser sets up a connection, reads the incoming data, and passes it to a display module. The display module waits until all the contents of the web page are received. Web browsers normally use a buffer to hold the incoming data until it has been received fully.
Connection set-up and reading data in a browser are blocking operations. The browser cannot set up multiple connections at the same time; similarly, it can only read data sequentially, one element at a time. All data are directed to the buffer, and only once all data for the document have been received does the browser start displaying its contents. This is tedious for the user, who has to wait a long time to see the document if the network connection is slow.
A multithreaded browser works in a different way: it may start displaying data while it is still coming in. For example, images take longer to load; instead of waiting for an entire image to arrive, the browser can display other parts of the document, such as the text, facilities for scrolling, and so on.
Other multithreaded clients also work in a similar manner. Multithreaded clients can perform several
tasks at the same time.
• Web Servers are normally replicated across multiple machines. Multithreaded clients allow
setting up connections to different replicas so that data can be transferred in parallel.
Multithreaded Servers:
The main use of multithreading in DS is found at the Server side.
A multithreaded server simplifies the server code and exploits parallelism to attain high performance. As the name suggests, multithreaded servers use multiple threads of control to perform several tasks at the same time, with different threads responsible for different tasks. In this way, multithreaded servers not only provide parallelism to the clients, but also simplify the server code by distributing the tasks among different threads.
Multithreaded Server Example:
A typical example is a multithreaded file server.
A file server normally waits for an incoming request for a file operation, carries out the request, and then sends back the reply. A multithreaded file server splits this work between two kinds of threads. One thread, called the dispatcher, reads incoming requests for file operations. Many worker threads are associated with the dispatcher thread. After examining a request, the dispatcher chooses an idle worker thread and hands the request over to it. The worker thread performs a blocking read on the local file system, which may cause it to be suspended. If the worker is suspended, another thread is selected to execute: for example, the dispatcher may be selected so that it can acquire more work, or another worker thread may be selected to run.
A single-threaded server, on the other hand, would sit idle until the completion of the read operation; it would not process any other requests.
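The dispatcher/worker organization described above can be sketched as follows; the file read is simulated, and the queue-based hand-off is one plausible way to pick an idle worker, not the only one:

```python
import threading
import queue

# Dispatcher/worker file server sketch: the dispatcher reads incoming
# requests and hands each to an idle worker; workers do the (potentially
# blocking) file operation. File I/O is simulated.
requests = queue.Queue()   # incoming client requests
work_q = queue.Queue()     # dispatcher -> workers hand-off
results = {}
N_WORKERS = 3

def worker():
    while True:
        req = work_q.get()
        if req is None:                      # shutdown signal
            break
        # a blocking read on the local file system would go here
        results[req] = f"contents of {req}"

workers = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for w in workers:
    w.start()

def dispatcher():
    while True:
        req = requests.get()                 # read the next incoming request
        if req is None:
            break
        work_q.put(req)                      # hand the request to an idle worker

d = threading.Thread(target=dispatcher)
d.start()

for name in ("a.txt", "b.txt", "c.txt"):
    requests.put(name)
requests.put(None)                           # stop the dispatcher
d.join()
for _ in workers:
    work_q.put(None)                         # stop the workers
for w in workers:
    w.join()
print(sorted(results))
```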
Thread Scheduling:
Thread scheduling among processors is somewhat more complicated, especially for dynamically growing multithreaded computations. The requirements are:
• Provide computation locality (try to keep related threads on the same processor
to minimize the communication overhead)
In the work-sharing strategy, whenever a processor generates new threads, the scheduler decides whether to migrate some of them to other processors on the basis of the current system state.
In the work-stealing strategy, whenever processors become idle or under-utilized, they attempt to steal threads from busier processors.
Examples: Chowdhury’s Greedy Load Sharing Algorithm, Eager’s Adaptive Load Sharing Algorithm, and Karp’s Work Stealing Algorithm.
Chowdhury’s Greedy Load Sharing Algorithm (Work Sharing Algorithm):
Operation:
In this strategy, the current state of each processor P is represented by a function f(n),
where n is the number of tasks currently at the processor.
If a task arrives at P and the number of tasks n is greater than zero, the processor looks
for a remote processor whose state is less than or equal to f(n).
If a remote processor with this property is found, the task is transferred there.
The performance of this strategy depends on the selection of the function f(n). It has
been proved that f(n) < n must hold in order to achieve good performance.
Possible values for f(n) include n-1, n div 2, n div 3, n div 4, etc. Furthermore, it has
been shown that f(n) = n div 3 yields the best results and that the greedy strategy
outperforms the threshold strategy with T = 1 in all experiments.
The greedy strategy adopts a cyclic probing mechanism instead of the random selection
used in the threshold strategy. In this mechanism, processor i probes processor
(i+j) mod N in the jth probe, where N is the number of processors in the system, to
locate a suitable destination processor.
Despite the similarities between the two strategies, it has been demonstrated using
simulation results that the greedy strategy outperforms the threshold strategy. This
improvement is attributed to the fact that the greedy strategy attempts to transfer
every task that arrives at a busy processor whereas the threshold strategy attempts to
transfer only when a task arrives at a processor which has reached the threshold T or
higher.
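A sketch of the greedy placement decision, assuming f(n) = n div 3 and the cyclic probing order described above; representing processor state as a plain list of task counts is a simplification for illustration:

```python
# Greedy load-sharing decision sketch: loads[i] is the number of tasks
# currently at processor i.
def f(n):
    return n // 3              # f(n) = n div 3, shown to give the best results

def place_task(loads, i):
    """A task arrives at processor i; return the processor that should run it."""
    n = loads[i]
    if n > 0:                  # processor already has tasks: try to share
        N = len(loads)
        for j in range(1, N):  # cyclic probing: probe (i + j) mod N in probe j
            cand = (i + j) % N
            if loads[cand] <= f(n):
                return cand    # found a suitably lightly loaded processor
    return i                   # otherwise keep the task locally

loads = [6, 0, 3, 5]
dest = place_task(loads, 0)    # n = 6, f(n) = 2 -> first probe finds load 0
print(dest)
```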
Eager’s Adaptive Load Sharing Algorithm
A load sharing policy has two components: a transfer policy that determines whether to
process a task locally or remotely, and a location policy that determines to which node a
task selected for transfer should be sent.
In Eager’s Adaptive Load Sharing Algorithm, the transfer policy is a THRESHOLD policy: a
distributed, adaptive policy in which each node uses only local state information. No
exchange of state information among the nodes is required in deciding whether to
transfer a task.
A task originating at a node is accepted for processing there if and only if the number of
tasks already in service or waiting for service (the node queue length) is less than some
threshold T.
Note that only newly received tasks are eligible for transfer.
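The transfer policy is simple enough to state directly in code; T = 2 here is an arbitrary illustrative threshold:

```python
# THRESHOLD transfer policy sketch: a newly arrived task is accepted
# locally iff the node's queue length is below the threshold T; otherwise
# it is selected for transfer. Only local state is consulted.
T = 2  # illustrative threshold value

def transfer_policy(queue_length, threshold=T):
    """True -> process the new task locally, False -> select it for transfer."""
    return queue_length < threshold

print(transfer_policy(0))  # an empty node accepts the task
print(transfer_policy(2))  # queue already at the threshold: transfer
```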
Location Policies:
Three location policies have been suggested:
Random
Threshold
Shortest: acquires additional system state information and attempts to make the “best”
choice given this information. Lp distinct nodes are chosen at random, and each is polled
in turn to determine its queue length. The task is transferred to the node with the
shortest queue length, unless that queue length is greater than or equal to the
threshold, in which case the originating node must process the task.
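A sketch of the Shortest policy under the stated rules; polling is simulated by reading a list of queue lengths directly:

```python
import random

# Shortest location policy sketch: poll Lp randomly chosen nodes, transfer
# to the one with the shortest queue unless even that queue is at or above
# the threshold T. queues[i] is node i's queue length.
def shortest_location_policy(queues, origin, Lp, T, rng=random):
    candidates = [i for i in range(len(queues)) if i != origin]
    polled = rng.sample(candidates, min(Lp, len(candidates)))
    best = min(polled, key=lambda i: queues[i])  # poll each for its queue length
    if queues[best] >= T:
        return origin          # no suitable node: origin processes the task
    return best

queues = [5, 0, 3, 4]
dest = shortest_location_policy(queues, origin=0, Lp=3, T=2)
print(dest)                    # node 1 has the shortest queue (0 < T)
```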
MT Client Overview
The client program creates a thread for each host. Each thread creates its own client
handle and makes various RPC calls to the given host. Because the client threads are
using different handles to make the RPC calls, they can carry out the RPC calls
concurrently.
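A sketch of this structure; the client handle and the RPC call are simulated, since real handles would come from an RPC library such as ONC RPC or Java RMI:

```python
import threading

# Multithreaded-client sketch: one thread per host, each with its own
# "client handle", so the calls can proceed concurrently. Host names and
# the fake call are illustrative.
hosts = ["hostA", "hostB", "hostC"]
responses = {}

class ClientHandle:
    def __init__(self, host):
        self.host = host
    def call(self, proc):                  # stand-in for a real RPC call
        return f"{proc} result from {self.host}"

def per_host(host):
    handle = ClientHandle(host)            # each thread creates its own handle
    responses[host] = handle.call("getstats")

threads = [threading.Thread(target=per_host, args=(h,)) for h in hosts]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(responses))
```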
MT Server Overview
The server automatically creates a new thread for every incoming client request. This
thread processes the request, sends a response, and exits.
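Python's `socketserver.ThreadingTCPServer` follows this thread-per-connection pattern, so a minimal sketch (the "ack:" protocol is made up for illustration) looks like:

```python
import socket
import socketserver
import threading

# Thread-per-request server sketch: ThreadingTCPServer spawns a new thread
# for each incoming connection; the handler replies and then exits.
class AckHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(1024)        # read the client's request
        self.request.sendall(b"ack:" + data)  # send the response; the thread then exits

server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), AckHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as s:
    s.sendall(b"hello")
    reply = s.recv(1024)
print(reply)
server.shutdown()
server.server_close()
```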
Summary and Conclusion
Multithreading plays an important role in a distributed environment: it keeps CPUs busy, exploits parallelism, and simplifies both client and server code.
Bibliography
1. Andrew S. Tanenbaum and Maarten van Steen, “Distributed Systems: Principles and Paradigms”, Second Edition, 2007
2. Ravi Mirchandaney, Don Towsley, “Adaptive Load Sharing in Heterogeneous Systems”, February 1989
3. Ali M. Alakeel, “Load Balancing in Distributed Computer Systems”, International Journal of Computer Science and Information Security, May 2015
4. Chapter 7, “Multithreaded RPC Programming”, Oracle ONC+ Developer’s Guide
5. Derek L. Eager, Edward D. Lazowska, and John Zahorjan, “A Comparison of Receiver-Initiated and Sender-Initiated Adaptive Load Sharing”
6. Ali M. Alakeel, “A Guide to Dynamic Load Balancing in Distributed Computer Systems”, International Journal of Computer Science and Network Security, Vol. 10, No. 6, June 2010
7. T.A. Marsland, Yaoqing Gao, Francis C.M. Lau, “A Study of Software Multithreading in Distributed Systems”, Technical Report TR 95-23, November 1995
8. Internet resources
*******************