
CS 550 Operating Systems

Spring 2019

“OSTEP Piece #2” – Concurrency


Threads

If you want to do one task
• Start one process

P1
If you want to do two tasks “concurrently”
• Start two processes
• Maybe P1 forks P2
• and P3…PN etc if
more than two tasks

P1 P2
• Problem:
• fork is expensive
• cold-start penalty
If P1 and P2 want to talk to each other?
• E.g. access the same data or
synchronize?
• Two different address spaces
• Need to use IPC
• shared memory, pipes, sockets,
signals
• Problem:
• Kernel transitions are expensive
• May need to copy data
• user -> kernel -> user
• Inter-process shared memory is
a pain to set up.
Option 1: Event-driven programming
• Make one process do all the tasks
• Busy loop polls for events and
executes tasks for each event
• No IPC needed
• Length of the busy loop determines
response latency
• Stateful event responses complicate
the code
• What if the ith occurrence of event 1
affects the jth event processing?

while (1)
{
    if (event 1) do task 1;
    if (event 2) do task 2;
    ...
    if (event N) do task N;
}
Option 2: Use threads
• Multiple threads of execution per
process
• Each thread has its own
• Program counter
• Stack, stack pointer
• Registers
• All threads share
• one virtual address space
• code, heap and static data
• Lower context switching overhead
• No IPC
• Zero data transfer cost
• Only need inter-thread
synchronization
[Figure: process P1 with threads T1, T2, T3, T4 in one shared address space]
Other Shared and non-shared components
• Shared components
• Open descriptors (files, sockets etc)
• Signals and Signal handlers

• Not shared
• Thread ID
• Errno
• Priority
Address space layout
Example: A word processor with three threads

• First thread handles keyboard input


• Second thread handles screen display
• Third thread handles saving the document to disk
Example: a multi-threaded web server

• A dispatcher thread waits for and accepts network connections


• Several worker threads
• Each worker processes one network connection concurrently
Advantages of threads
• Lower inter-thread context switching overhead than processes

• No Inter-process communication
• Zero data transfer cost between threads
• Only need inter-thread synchronization

• Threads can be pre-empted at any point


• Long-running threads are OK
• As opposed to event-driven tasks that must be short.

• Threads can exploit parallelism


• But it depends…more later

• Threads could block without blocking other threads


• But it depends…more later
Disadvantages of Threads
• Shared State!
• Global variables are shared between threads.
• Accidental data changes can cause errors.

• Threads and signals don’t mix well


• Common signal handler for all threads in a process
• Which thread to signal? Everybody!
• Royal pain to program correctly.

• Lack of robustness
• Crash in one thread will crash the entire process.

• Some library functions may not be thread-safe


• Library Functions that return pointers to static internal memory. E.g. gethostbyname()
• Less of a problem these days.
Two types of threads: user-level and kernel-level

User-level threads
• User-level libraries provide multiple
threads
• OS kernel does not recognize user-level
threads
• Threads execute when the process is
scheduled

Kernel-level threads
• OS kernel provides multiple threads
per process
• Each thread is scheduled
independently by the kernel’s CPU
scheduler
Hybrid Implementations

Multiplexing user-level threads within each kernel-level thread


Local Thread Scheduling
• Next thread is picked from among the threads
belonging to the current process
• Each process gets a timeslice from the kernel.
• Then the timeslice is divided up among the
threads within the current process
• Local scheduling can be implemented with
either
• Kernel-level threads OR
• User-level threads.
• Scheduling decision requires only local
knowledge of threads within the current
process.
• For example, the process timeslice may be
50 ms, and each thread within the process
runs for 5 ms per CPU burst
Global Thread Scheduling
• Next thread to be scheduled is picked
from ANY process in the system.
• Not just the current process
• Timeslice is allocated at the
granularity of threads
• No notion of per-process timeslice
• Global scheduling can be
implemented only with kernel-level
threads
• Picking the next thread requires
global knowledge of threads in all
processes.
• For example, each thread
runs for 10 ms per CPU burst
POSIX threads API: pthread
• Implementations of the API are available on many
Unix-like POSIX-conformant operating systems.
• There are around 100 Pthreads procedures, all
prefixed "pthread_" and they can be categorized
into four groups:
• Thread management - creating, joining threads
etc.
• Mutexes
• Condition variables
• Synchronization between threads using
read/write locks and barriers
Pthread API: thread identification
• A thread ID is represented by the pthread_t data type.
• On many implementations the pthread_t type is
represented using integers (e.g., unsigned long in
Linux).
• Unlike the pid_t type, implementations are
allowed to use a structure to represent the pthread_t
data type.
• Therefore, portable applications can’t treat the
pthread_t type as an integer → we need a function,
pthread_equal(), to compare thread IDs instead of
using the “==” operator.
Pthread API: thread identification
• A thread can obtain its own thread ID by calling the
pthread_self() function.

• Why do threads need to know their own thread IDs?


• Various pthreads functions use thread IDs to identify the
thread on which they are to act.
• In some applications, it can be useful to tag dynamic data
structures with the ID of a particular thread. This can
serve to identify the thread that created or “owns” a data
structure.
Pthread API: thread creation

• The traditional UNIX process model supports
only one thread of control per process.
• Conceptually, this is the same as a threads-based model
whereby each process is made up of only one thread.
• With pthreads, when a program runs, it also
starts out as a single process with a single
thread of control.
• As the program runs, its behavior should be
indistinguishable from the traditional process, until
it creates more threads of control.
Pthread API: thread creation

• The thread argument is set to the thread ID of the newly
created thread before pthread_create() returns.
• The attr argument is a pointer to a pthread_attr_t object
that specifies various attributes for the new thread.
• If attr is specified as NULL, then the thread is created
with various default attributes
Pthread API: thread creation

• The new thread commences execution by calling the
function identified by start, with the argument arg (i.e., run
as start(arg)).
• start is a pointer to a function that takes a void pointer as
input, and returns a void pointer.
• arg points to a global or heap variable, but it can also be
specified as NULL.
• Can we pass a pointer to a local variable to arg?
Pthread API: thread creation

• Why are the argument and the return value void
pointers?
• To pass multiple arguments or return multiple values
• For example, if we need to pass multiple arguments to
start, then arg can be specified as a pointer to a structure
containing the arguments as separate fields.
• The return value of the thread start function is also a void
pointer (void *). It can be captured by pthread_join().
Pthread API: thread creation

• When a thread is created, there is no guarantee which will
run first: the newly created thread or the calling thread.
• Note the difference between the return values of process-related
functions and thread-related functions.
• Process functions → 0: success, -1: error, errno: failure
reason
• Thread functions → 0: success, positive number: failure
reason
Note: treating a thread ID as an integer may not give the correct
thread ID if the thread ID type is not implemented as an integer.
Pthread API: thread termination

• If any thread within a process calls exit() or
_exit(), or the main thread performs a return in the
main() function, then the entire process terminates.
• A single thread can exit in three ways, thereby
stopping its flow of control, without terminating
the entire process.
• The thread can simply return from the start
routine. The return value is the thread’s exit
code.
• The thread can be canceled by another thread in
the same process.
• The thread can call pthread_exit().
Pthread API: thread termination

• Calling pthread_exit() is equivalent to performing a
return in the thread’s start function.
• The difference is that pthread_exit() can be called from
any function called by the thread’s start function.
• The rval_ptr argument is a void pointer.
• Other threads in the process can obtain this pointer
by calling the pthread_join() function.
Pthread API: joining a terminated thread

• The pthread_join() function waits for the thread identified by thread
to terminate. This operation is termed joining.
• The calling thread will block until the specified thread calls
pthread_exit(), returns from its start routine, or is canceled.
• If the thread called pthread_exit(), the memory location pointed to by
rval_ptr receives the argument that was passed to pthread_exit().
• If the thread simply returned from its start routine, that location
receives the return value of the start routine (which is also a void pointer).
• If the thread was canceled, the memory location specified by
rval_ptr is set to PTHREAD_CANCELED.
Pthread API: joining a terminated thread
• Why do we need to join a terminated thread?
• If a thread is not detached, then we must join with it using
pthread_join().
• Otherwise, when the thread terminates, it produces the thread
equivalent of a zombie process.
• Aside from wasting system resources, if enough thread
zombies accumulate, we won’t be able to create additional
threads

Pthread API: detaching a thread
• By default, a thread is joinable, meaning that when it
terminates, another thread can obtain its return status
using pthread_join().
• Sometimes we don’t care about the thread’s return status and
simply want the system to automatically clean up and remove
the thread when it terminates.
• In this case, we can mark the thread as detached, by
making a call to pthread_detach() specifying the
thread’s identifier in thread.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int idata = 111; /* Allocated in data segment */

int main(int argc, char *argv[])
{
    int istack = 222; /* Allocated in stack segment */
    pid_t childPid;

    childPid = fork();
    if (childPid == -1) {
        exit(EXIT_FAILURE);
    } else if (childPid == 0) { /* Child process: modify its own copies of the data */
        idata *= 333;
        istack *= 666;
    } else {                    /* Parent process: give child a chance to execute */
        sleep(3);
    }

    /* Both parent and child come here */
    printf("PID=%ld %s idata=%d istack=%d\n", (long) getpid(),
           (childPid == 0) ? "(child) " : "(parent)", idata, istack);

    exit(0);
}

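#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int idata = 111; /* Allocated in data segment */

void *mythread(void *arg) {
    idata *= 333;
    /* istack *= 666;   Undefined variable here: istack is local to main(),
                        and each thread has its own stack */
    return (void *) 0;
}

int main(int argc, char *argv[])
{
    int istack = 222; /* Allocated in stack segment */
    pthread_t tid;

    pthread_create(&tid, NULL, mythread, NULL);

    sleep(1); /* Give the new thread a chance to execute */
    printf("idata=%d istack=%d\n", idata, istack);
    exit(0);  /* Terminates the entire process, including all of its threads */
}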

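Expected output: idata=36963 istack=222
(the thread’s update to idata is visible in main(), since 111 * 333 = 36963)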
Threads race condition: sharing data

What we expect …

But actually …

Source of the problem: uncontrolled
scheduling and unprotected shared data
