
Multithreading in Java (AW3)

Assignment for the M.Sc. Course on Distributed Systems


Adrian Reber Table of Contents
1. Cooperation
2. Introduction
   2.1. Threads
   2.2. Threads in Java
   2.3. Monitors in Java
3. Aims and objectives
   3.1. Thread-per-request architecture
   3.2. Thread-pool architecture
4. Implementation
   4.1. The common part
   4.2. The thread-per-request architecture
   4.3. The thread-pool architecture
   4.4. The jobs
5. Evaluation
   5.1. I/O job
   5.2. Arithmetic job
   5.3. Cryptographic job
6. Conclusion
   6.1. Queues
   6.2. Cooperation
Bibliography
Glossary

1. Cooperation
Some parts of this assignment were carried out in cooperation with Volker Heymann, Jörg Seitter and David Vogler. The team agreed to use the same interface for the jobs as well as for the jobserver. A common interface for jobs and jobserver makes it possible to exchange jobservers and jobs very easily, and thus very easy to compare different architectures and different kinds of jobs.


2. Introduction
2.1. Threads
The traditional approach to concurrent processing is to spawn (fork()) a new process for every request for the service this process is providing. A different approach to concurrency is threads. A thread, in programming terms, is a lightweight process that runs in parallel (or rather quasi-parallel, at least on a single-processor machine) with other threads within the same address space. The difference from spawning a new process is that creating a thread incurs much less overhead.

2.2. Threads in Java


There are two different ways to create a thread in Java. Either the new class extends the class Thread, or the class implements the interface Runnable. In both cases, the code in the method run() is the part which is executed as a separate thread of the process that starts it. The thread is started by calling the method start() on the thread instance.
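A minimal sketch of both variants (class names are illustrative, not from the assignment code):

```java
class Greeter extends Thread {                 // variant 1: extend Thread
    public void run() { /* work of the thread goes here */ }
}

class GreeterTask implements Runnable {        // variant 2: implement Runnable
    public void run() { /* work of the thread goes here */ }
}

public class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Greeter();
        Thread t2 = new Thread(new GreeterTask());
        t1.start();        // start() spawns the new thread, which then calls run()
        t2.start();
        t1.join();         // wait until both threads have terminated
        t2.join();
        System.out.println("both threads finished");
    }
}
```

Calling run() directly would execute the code in the current thread; only start() actually creates a new one.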

2.3. Monitors in Java


Monitors are used to protect shared data. In Java, mutual exclusion is assured by the monitor concept: shared parts of a program are only ever accessed by one thread at a time. Implementing a monitor is rather simple. The methods which should be synchronized (so that only one of them can be executing at any given time) are simply marked with the keyword synchronized.
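A small sketch of the monitor concept (the counter class is illustrative): two threads increment a shared value, and the synchronized methods guarantee the final result is correct.

```java
// Minimal monitor: synchronized methods guarantee that only one thread
// at a time executes any of them on the same instance.
class SharedCounter {
    private int value = 0;
    public synchronized void increment() { value++; }
    public synchronized int get() { return value; }
}

public class MonitorDemo {
    public static void main(String[] args) throws InterruptedException {
        final SharedCounter counter = new SharedCounter();
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 10000; i++) counter.increment();
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println(counter.get()); // always 20000 thanks to the monitor
    }
}
```

Without the synchronized keyword, the two unsynchronized increments could interleave and lose updates, so the result would usually be below 20000.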

3. Aims and objectives


The goal of this assignment is to design and implement a jobserver using threads in Java. A jobserver is a system which receives different kinds of jobs from different clients, executes them, and returns the results to the clients. Every job has to access some shared data of the jobserver; this data has to be protected by a mechanism that ensures mutual exclusion. The following two jobserver architectures are studied:

- thread-per-request architecture
- thread-pool architecture

3.1. Thread-per-request architecture


This architecture creates a thread for every request reaching the jobserver, which has obvious drawbacks. For every job the jobserver receives, a new thread has to be created, and the creation of a new thread or process is always a resource-consuming act that takes a lot of time. Furthermore, every thread that has finished executing its job simply exits and cannot be used for another job. A better solution would be to create a number of threads at startup and merely activate (rather than create) one of them when a job is received.

3.2. Thread-pool architecture


This architecture avoids the overhead of thread creation by creating a pool of threads at startup and keeping them running as long as the jobserver is running. This requires a queue to manage the incoming jobs: every job submitted by a client is immediately accepted by the server and put in an incoming queue. The queue is then served by the running threads whenever they have finished their current job and are idle again. A possible extension to this architecture is to create more threads if there are many objects in the incoming queue. This would be a combination of both architectures: a given number of threads is spawned at startup and, if there are many requests for the service, the jobserver can increase the number of threads to react dynamically to the new circumstances. These new threads are created once and then used for many jobs, in contrast to the thread-per-request architecture. It is important to set a limit on the number of concurrent threads in the system so that the workload remains appropriate for the environment (e.g. hardware) the jobserver is running on. From a theoretical point of view, the thread-pool architecture seems much more intelligent and efficient; this will be discussed later in more detail in the analysis of the two systems. Another factor in the efficiency of the system is probably the type of job the jobserver has to perform: jobs which are very calculation-intensive probably behave differently from jobs which are more I/O-intensive.

4. Implementation
Both architectures (thread-pool and thread-per-request) were implemented. The implementation of both was straightforward, and some parts of the implementations are very similar.

4.1. The common part


As mentioned at the beginning, some parts of this assignment were carried out in cooperation. The following figure shows the class AbstractJob and the interface JobEngine which are the basis of both implementations.

aw3.common.AbstractJob (class): run(), execute(), getCommunicationTime(), getComputingTime(), getIncomingQueueTime(), getOutgoingQueueTime(), getTotalTime(), isDone()/setDone(), getId()/setId(), getter/setter pairs for the client, in-queue, out-queue and processing timestamps (getClientReceived()/setClientReceived(), getClientSubmitted()/setClientSubmitted(), getInQueueReceived()/setInQueueReceived(), getInQueueSubmitted()/setInQueueSubmitted(), getOutQueueReceived()/setOutQueueReceived(), getOutQueueSubmitted()/setOutQueueSubmitted(), getProcessTimeStart()/setProcessTimeStart(), getProcessTimeStop()/setProcessTimeStop()), setContext(), dumpStat(), getInQueueSize()/setInQueueSize(), getOutQueueSize()/setOutQueueSize().

aw3.common.JobEngine (interface): submitJob(), pollResult() (two overloads), getJobCount(), getAverageJobTime(), getMaxJobTime(), setJobTime(), setJobCounter().

Figure 1. The common parts

The advantage of using a common and well-defined interface is that it is easy to exchange jobservers and jobs between all members of the team to evaluate different jobserver architectures and different kinds of job. Because the interface JobEngine is designed with RMI (Remote Method Invocation) in mind, the jobserver implementations are RMI-aware and the clients connect to the server via RMI. This was not a requirement of the assignment, but by using RMI, the idea of a jobserver running on a different machine than the client was realized. Another common attribute of the jobservers is that finished jobs are not returned to the clients directly. The jobs are always put in a queue where they wait to be picked up by the client. This polling architecture was chosen because, with this design, the job does not need to hold a reference to the client which submitted it. In general, holding such a reference would not be a bad idea, but our common design was chosen to be extremely flexible and extensible: while using RMI, it would be no problem to keep a reference to the client, but if the communication between server and client changed, it would become very difficult. Another requirement was that the jobs have to access shared data at the jobserver. The methods providing write access to the shared data are synchronized using the monitor concept to ensure mutual exclusion.

4.2. The thread-per-request architecture


This architecture is really simple. The jobserver receives a job from a client and immediately creates a new thread. This thread executes the method execute() of the job and waits until the job has finished. After the job has been completed, it is put in a queue where it waits to be picked up by the client. The thread dies after executing the job and is not reused. This implementation places no limit on the number of running threads.

My expectation is that this architecture is slower due to the overhead of creating a new thread for every job and never reusing it.
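The dispatch loop described above can be sketched as follows. This is a hypothetical reconstruction, not the assignment's actual code: Job stands in for the real AbstractJob, and the class names are made up.

```java
import java.util.LinkedList;
import java.util.Queue;

interface Job { void execute(); }

class ThreadPerRequestServer {
    private final Queue<Job> outgoing = new LinkedList<Job>();

    public void submitJob(final Job job) {
        new Thread(new Runnable() {            // a fresh thread for every request
            public void run() {
                job.execute();
                synchronized (outgoing) {      // finished jobs wait to be polled
                    outgoing.add(job);
                }
            }                                  // the thread dies here, never reused
        }).start();
    }

    public Job pollResult() {
        synchronized (outgoing) { return outgoing.poll(); }
    }
}

public class PerRequestDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPerRequestServer server = new ThreadPerRequestServer();
        server.submitJob(new Job() { public void execute() { } });
        Thread.sleep(200);                     // crude wait for the worker thread
        System.out.println(server.pollResult() != null);
    }
}
```

Note the absence of any limit on thread count: every submitJob() call spawns a thread unconditionally, which is exactly the drawback discussed above.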

4.3. The thread-pool architecture


This architecture is somewhat more sophisticated. The main difference to the thread-per-request architecture is that there are a number of threads which are always running, even if there is nothing to do. To prevent idle threads from using CPU time, the threads are put in a wait state by calling the wait() method. The jobs submitted by any number of clients are not directly executed by the waiting threads but are put in an incoming queue. As soon as there are elements in this queue, the sleeping (waiting) threads are woken by calling notify() on the monitor they are waiting on. The thread then executes the job and, if there are no more jobs in the incoming queue, goes back to its wait state until it is notified again. The finished jobs are put in an outgoing queue, just as in the thread-per-request architecture. An additional feature of this implementation is that the jobserver creates more threads if the incoming queue grows rather large. The number of threads started at startup and the total maximum number of threads are parameters which strongly affect the performance of this architecture and have to be selected according to the environment the jobserver is running in and the kinds of jobs it is executing.
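The wait()/notify() mechanism described above can be sketched like this. Again a hypothetical reconstruction under assumed names (the dynamic growth of the pool is left out for brevity):

```java
import java.util.LinkedList;

interface Job { void execute(); }

class ThreadPoolServer {
    private final LinkedList<Job> incoming = new LinkedList<Job>();

    public ThreadPoolServer(int poolSize) {    // pool is created once, at startup
        for (int i = 0; i < poolSize; i++) {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        Job job;
                        synchronized (incoming) {
                            while (incoming.isEmpty()) {
                                try { incoming.wait(); }   // idle: uses no CPU
                                catch (InterruptedException e) { return; }
                            }
                            job = incoming.removeFirst();
                        }
                        job.execute();         // run the job outside the lock
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    public void submitJob(Job job) {
        synchronized (incoming) {
            incoming.add(job);
            incoming.notify();                 // wake one waiting worker
        }
    }
}

public class ThreadPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        final int[] done = {0};
        ThreadPoolServer server = new ThreadPoolServer(6);
        for (int i = 0; i < 20; i++) {
            server.submitJob(new Job() {
                public void execute() { synchronized (done) { done[0]++; } }
            });
        }
        Thread.sleep(500);                     // crude wait until the queue drains
        synchronized (done) { System.out.println(done[0]); }
    }
}
```

The inner while loop around wait() guards against spurious wakeups, and executing the job outside the synchronized block keeps the queue available to the other workers.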

4.4. The jobs


There are three different kinds of job implemented:

- Arithmetic job: calculates prime numbers. It starts at a given point and then searches for a specified number of following prime numbers.
- Cryptographic job: loads a given file and encrypts it using the Blowfish algorithm. This job combines an I/O job with a computing job.
- I/O job: not a real I/O job but a simple simulation. Since an I/O job means that the thread just has to wait until the I/O operation has finished, this job simply sleeps for a given time.

As the jobs themselves are not a very important part of this assignment, they were not implemented from scratch but downloaded [SnipSfNet]. Only minor modifications were made to make them more suitable as jobs for the jobserver.
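To illustrate what the arithmetic job computes, here is a minimal prime-search sketch; the downloaded snippet [SnipSfNet] may differ in detail.

```java
// Illustrative version of the arithmetic job: starting from a given number,
// find the next n primes by trial division.
public class PrimeJob {
    static boolean isPrime(long x) {
        if (x < 2) return false;
        for (long d = 2; d * d <= x; d++)
            if (x % d == 0) return false;
        return true;
    }

    static long[] nextPrimes(long start, int count) {
        long[] primes = new long[count];
        long candidate = start;
        for (int found = 0; found < count; candidate++)
            if (isPrime(candidate)) primes[found++] = candidate;
        return primes;
    }

    public static void main(String[] args) {
        long[] p = nextPrimes(1000, 3);
        System.out.println(p[0] + " " + p[1] + " " + p[2]); // 1009 1013 1019
    }
}
```

Trial division is deliberately CPU-bound, which is exactly what makes this job useful for stressing the schedulers of the two architectures.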

5. Evaluation
To evaluate the different architectures of threaded jobservers, the time each implementation needs to execute the different jobs is measured. Each job has methods with which the server can set timestamps at the different stages the job passes through in the jobserver.

- Communication time: the time the job needs to travel from the client to the server and back.
- Computation time: the time the jobserver actually needs to execute the job.
- Incoming queue time: the time the job spent in the incoming queue, if the jobserver implementation uses one.
- Outgoing queue time: the same, but for the time the job spends in the outgoing queue after it has finished.
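These intervals can be derived from the timestamps the server sets on each job. A small sketch with made-up timestamp values (the real AbstractJob stores them via the setter methods shown in Figure 1):

```java
public class TimingDemo {
    public static void main(String[] args) {
        // Timestamps in ms as the job passes through the server (made-up values)
        long clientSubmitted  = 0;    // client handed the job over
        long inQueueSubmitted = 5;    // job entered the incoming queue
        long processStart     = 40;   // a thread picked the job up
        long processStop      = 160;  // job finished executing
        long outQueueSubmitted = 160; // job entered the outgoing queue
        long clientReceived   = 200;  // client polled the result

        long inQueueTime   = processStart - inQueueSubmitted;
        long computingTime = processStop - processStart;
        long outQueueTime  = clientReceived - outQueueSubmitted;
        long totalTime     = clientReceived - clientSubmitted;
        // whatever is left of the total is time spent on the wire
        long commTime      = totalTime - inQueueTime - computingTime - outQueueTime;

        System.out.println(inQueueTime + " " + computingTime + " "
                + outQueueTime + " " + commTime);
    }
}
```

With these example values, the job spent 35 ms in the incoming queue, 120 ms computing, 40 ms in the outgoing queue and 5 ms in communication.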

5.1. I/O job


The I/O job described above is the first to be analyzed. More precisely, it is not the job itself that is analyzed but the behaviour of the two different jobserver architectures.

[Plot: per-job execution time in milliseconds (roughly 100-200 ms), one series per architecture: thread-pool and thread-per-request.]

Figure 2. I/O Job execution time

The difference between the thread-per-request architecture and the thread-pool architecture is obvious. The execution times of the thread-pool architecture are always very similar, in contrast to the execution times of the thread-per-request architecture, where it is almost impossible to predict how long the next job will take. This difference between the two architectures is easy to explain. The thread-per-request architecture has to create a new thread for every job it receives, and the thread creation time depends strongly on the state of the machine the jobserver is running on and cannot easily be predicted. As the thread-pool architecture has no thread-creation overhead, its execution time is equivalent to the real time the jobserver needs to execute the job.

The total time (as opposed to the computing time) that the thread-pool architecture needs to complete 1000 jobs is much greater than that required by the thread-per-request architecture. The total time includes communication time, inqueue time, computing time and outqueue time.

- thread-per-request architecture total time: 474,388 milliseconds
- thread-pool architecture total time: 3,528,618 milliseconds

This big difference (almost a factor of 10) is due to the fact that the thread-per-request architecture has no incoming queue: every job is executed immediately. The difference can be seen in the following figure:

[Plot: per-job inqueue time in milliseconds (up to about 2500 ms) over 1000 jobs, thread-pool vs thread-per-request.]

Figure 3. I/O Job inqueue time

With this kind of job, it is no problem if over a thousand jobs are executed simultaneously: as soon as a job is started, it requires no additional resources to finish, as it only sleeps for 100 milliseconds. With a more computing-intensive kind of job, the result will probably be different, because parallel execution of computing-intensive jobs extends the execution time dramatically, and an architecture with fewer parallel threads and an incoming queue becomes much more effective.


5.2. Arithmetic job


The next job used to analyze the different architectures is a pure computing job whose performance strongly depends on the CPU of the machine running the jobserver. The following figure shows the execution time of this job:

[Plot: per-job execution time in milliseconds (roughly 2000-16000 ms) over 100 jobs, thread-pool vs thread-per-request.]

Figure 4. Arithmetic job execution time
Figure 4. Arithmetic job execution time

It can be seen very well that the thread-pool architecture performs much better. This can again be explained by the overhead of creating each thread. Another reason for the better performance of the thread-pool architecture is that fewer threads run in parallel. With the thread-per-request architecture, many more threads run in parallel, every one of them competing for the CPU, and so another overhead is introduced: context switching. Each time a thread is scheduled on or off the CPU, its context has to be saved and restored, and the more active threads there are, the more time is consumed by this task. Again, the total time for 100 jobs is much greater for the thread-pool architecture than for the thread-per-request architecture, whereas the execution time is lower for the thread-pool architecture.

- thread-per-request architecture total time: 694,955 milliseconds
- thread-per-request architecture execution time: 663,027 milliseconds
- thread-pool architecture total time: 4,024,001 milliseconds
- thread-pool architecture execution time: 489,525 milliseconds

One thing these numbers cannot illustrate: the total time for the thread-pool architecture may look much greater than the total time of the thread-per-request architecture but, as already mentioned, this is because the thread-per-request architecture has no incoming queue where the jobs have to wait to be executed. In this implementation of the thread-per-request architecture, the jobserver stops responding to job-submission requests from clients when it has too much work to do. When this happens, the jobs have to wait at the client until they can be submitted to the server, and this time is not recorded anywhere. So another difference between the architectures is that with the thread-pool architecture all jobs can be submitted to the server immediately, whereas with the thread-per-request architecture the incoming queue effectively moves from the server to the client, without this ever being intended. The effect of the incoming queue can be seen very nicely in the following diagram:

[Plot: per-job inqueue time (cpu time in milliseconds, up to about 80000 ms) over 100 jobs, thread-pool vs thread-per-request.]

Figure 5. Arithmetic job inqueue time
Figure 5. Arithmetic job inqueue time

The thread-pool architecture accepts all jobs from the clients, which are then held in the incoming queue, as can be seen very well in this figure. The thread-per-request architecture seems to have no incoming queue time. That is, as already mentioned, because it has no incoming queue: either all jobs are executed simultaneously, or the jobserver simply does not accept the jobs and they wait on the client side.


5.3. Cryptographic job


The execution of this kind of job gives a very interesting outcome:

[Plot: per-job execution time (cpu time in milliseconds, up to about 350000 ms) over 100 jobs, thread-pool vs thread-per-request.]

Figure 6. Cryptographic job execution time

Figure 6. Cryptographic job execution time

This result is really remarkable because it displays the effect that was anticipated all along: the thread-pool architecture is much more effective than the thread-per-request architecture. Another nice effect which can be seen is that the first 6 jobs of the thread-pool architecture take noticeably longer than the rest. This is rather easy to explain. The job loads a file from disk and encrypts it using the Blowfish algorithm. As the number of parallel threads is six, these are the jobs waiting for the I/O to finish; the later jobs do not have to wait because the operating system reads the file from the file cache and not from the real device. This effect cannot be seen in the thread-per-request architecture, probably because this analysis was run after the thread-pool analysis and the file was already in the file cache. A better analysis would have forced the file to be read again from the real device. One very interesting point cannot be seen in the figure. As the first task of this job is to read a file, the CPU is not heavily loaded, so the thread-per-request jobserver accepts all 100 jobs at once, which results in 100 cryptographic jobs running in parallel. This is also the reason why there is such a big difference in the execution times of the two architectures. The queueing behaviour is almost the same as for the arithmetic job and is not investigated further here.


- thread-per-request architecture total time: 30,908,326 milliseconds
- thread-per-request architecture execution time: 30,821,532 milliseconds
- thread-pool architecture total time: 15,205,009 milliseconds
- thread-pool architecture execution time: 1,665,369 milliseconds

6. Conclusion
The results of this analysis are very interesting. Some were as expected; others were totally different. For example, the thread-pool architecture was expected to be much more efficient for every kind of job, with a much bigger difference to the thread-per-request architecture. This result may originate from the implementation of the two architectures, or one type of job may simply fit one of the two architectures better than the other. So when implementing a real-world jobserver application, the decision which architecture to implement depends heavily on the type of job it has to perform.

The expected drawback of thread-creation overhead for the thread-per-request architecture was not as great as expected. At least for a small number of jobs, the difference between the two architectures was sometimes so small that it could have been just a measuring error. The big disadvantage of unlimited parallel threads was only discovered in the last analysis, with the cryptographic job, because there all jobs a client submitted were running at once. This disadvantage does not come from the architecture alone but also from the implementation; it would have been no problem to implement a maximum number of active threads. That change would introduce another problem: as this implementation has no incoming queue, the server would simply refuse client requests to submit more jobs.

If an incoming queue is implemented, then both implementations are very similar and the difference is not very obvious. It could only be measured with a huge number of very short jobs, so that the thread creation time is the biggest part of the job execution; this can be seen in the analysis of the I/O job. The main difference is the number of jobs running in parallel. The overhead of thread creation becomes significant when there are a large number of jobs. If the ratio of job time to job count is large, then a thread-pool architecture will show its advantages.

6.1. Queues
The implementation of the queues is not very sophisticated. Elements are always put at the end of the queue and fetched from the start. This may sound like a fair approach, but there is no control over whether an item has just been submitted to the queue or has already been waiting for a long time. In practice, most of the jobs were in the queues for a short time, but some jobs stayed in the queues extremely long. As queueing theory is a rather complex matter and not part of this assignment, this strange behaviour of the queues was not investigated further.

6.2. Cooperation
The idea of cooperating with others in this assignment was a good one but, unfortunately, it did not work as well as expected. Because everybody in the team had a different schedule, I was not able to use any of the other implementations of a jobserver or a job in the analysis for this assignment. Only the design of the jobs and the interface for the jobserver was shared, as they were defined together in the beginning.

Bibliography

Internet

[SnipSfNet] Snippet Library: http://sourceforge.net/snippet/
[JavaApi] Java 2 Platform, Standard Edition, v 1.4.1 API Specification: http://java.sun.com/j2se/1.4.1/docs/api/

Books
[Lea02] Doug Lea, Concurrent Programming in Java, Second Edition: Design Principles and Patterns, Addison-Wesley, 2000, ISBN 0-201-31009-0.

Glossary

C
CPU - Central Processing Unit

I
I/O - Input/Output

R
RMI - Remote Method Invocation. A mechanism that enables an object on one Java virtual machine to invoke methods on an object in another Java virtual machine.

