
OPERATING

SYSTEMS

SEMESTER – V
INFORMATION TECHNOLOGY

NILESH M. PATIL
CHAPTER 1
OVERVIEW OF OPERATING SYSTEM

1.1 Introduction
An Operating System (OS) is an intermediary between users and computer hardware. It provides
users with an environment in which they can execute programs conveniently and efficiently.

In technical terms, it is software which manages hardware. An operating System controls the
allocation of resources and services such as memory, processors, devices and information.

An operating system is a program that acts as an interface between the user and the computer
hardware and controls the execution of all kinds of programs.

1.2 Abstract View of Computer System

1. Hardware provides basic computing resources like CPU, memory, I/O devices.
2. Operating system controls and coordinates the use of the hardware among the various
application programs for the various users.
3. Application programs define the ways in which the system resources are used to solve
the computing problems of the users. These can be compilers, database systems, video
games, or business programs.
4. Users are the ones who interact with the OS. They can be people, machines, or other
computers.

1.3 Functions of Operating System

Following are some of the important functions of an operating system.

Memory Management
Processor Management
Device Management
File Management
Security
Control over system performance
Job accounting
Error detecting aids
Coordination between other software and users

Memory Management

Memory management refers to management of Primary Memory or Main Memory. Main
memory is a large array of words or bytes where each word or byte has its own address.
Main memory provides fast storage that can be accessed directly by the CPU. So for a program to
be executed, it must be in main memory. Operating System does the following activities for
memory management.
 Keeps track of primary memory, i.e. which parts of it are in use and by whom, and which
parts are not in use.
 In multiprogramming, OS decides which process will get memory when and how much.
 Allocates the memory when the process requests it to do so.
 De-allocates the memory when the process no longer needs it or has been terminated.

Processor Management

In multiprogramming environment, OS decides which process gets the processor when and how
much time. This function is called process scheduling. Operating System does the following
activities for processor management.

 Keeps track of the processor and the status of processes. The program responsible for this task is
known as the traffic controller.
 Allocates the processor (CPU) to a process.
 De-allocates processor when processor is no longer required.

Device Management

OS manages device communication via their respective drivers. Operating System does the
following activities for device management.
 Keeps track of all devices. The program responsible for this task is known as the I/O
controller.
 Decides which process gets the device, when, and for how much time.
 Allocates the device in an efficient way.
 De-allocates devices.
File Management

A file system is normally organized into directories for easy navigation and usage. These
directories may contain files and other directories. Operating System does the following activities
for file management.

 Keeps track of information, location, uses, status etc. The collective facilities are often
known as the file system.
 Decides who gets the resources.
 Allocates the resources.
 De-allocates the resources.

Other Important Activities


Following are some of the important activities that Operating System does.
 Security -- By means of passwords and other similar techniques, it prevents unauthorized
access to programs and data.
 Control over system performance -- Recording delays between request for a service
and response from the system.
 Job accounting -- Keeping track of time and resources used by various jobs and users.
 Error detecting aids -- Production of dumps, traces, error messages and other debugging
and error detecting aids.
 Coordination between other software and users -- Coordination and assignment of
compilers, interpreters, assemblers and other software to the various users of the
computer systems.

1.4 Types of Operating Systems

Operating systems have existed since the very first generation of computers and have kept
evolving over time. Following are a few of the important types of operating systems
which are most commonly used.

1. Batch operating system


 The users of batch operating system do not interact with the computer directly.
 Each user prepares his job on an off-line device like punch cards and submits it to the
computer operator.
 To speed up processing, jobs with similar needs are batched together and run as a
group. Thus, the programmers left their programs with the operator.
 The operator then sorts programs into batches with similar requirements.

The problems with Batch Systems are following.

 Lack of interaction between the user and job.


 CPU is often idle, because the speeds of the mechanical I/O devices are slower than
CPU.
 Difficult to provide the desired priority.

2. Time-sharing operating systems


 Time sharing is a technique which enables many people, located at various terminals,
to use a particular computer system at the same time.
 Time-sharing or multitasking is a logical extension of multiprogramming.
 Processor's time which is shared among multiple users simultaneously is termed as
time-sharing.
 The main difference between Multiprogrammed Batch Systems and Time-Sharing
Systems is that in case of multiprogrammed batch systems, objective is to maximize
processor use, whereas in Time-Sharing Systems objective is to minimize response
time.
 Multiple jobs are executed by the CPU by switching between them, and the switches
occur very frequently.
 Thus, the user can receive an immediate response. For example, in transaction
processing, the processor executes each user program in a short burst or quantum of
computation.
 That is, if n users are present, each user gets a time quantum. When the user submits
a command, the response time is a few seconds at most.
 Operating system uses CPU scheduling and multiprogramming to provide each user
with a small portion of a time.
 Computer systems that were designed primarily as batch systems have been modified
to time-sharing systems.

Advantages of Timesharing operating systems are following


 Provides the advantage of quick response.
 Avoids duplication of software.
 Reduces CPU idle time.

Disadvantages of Timesharing operating systems are following.


 Problem of reliability.
 Question of security and integrity of user programs and data.
 Problem of data communication.

3. Distributed operating System


 Distributed systems use multiple central processors to serve multiple real-time applications
and multiple users.
 Data processing jobs are distributed among the processors according to which one can
perform each job most efficiently.
 The processors communicate with one another through various communication lines
(such as high-speed buses or telephone lines). These are referred to as loosely coupled
systems or distributed systems.
 Processors in a distributed system may vary in size and function. These processors are
referred to as sites, nodes, computers, and so on.
The advantages of distributed systems are following.
 With resource sharing facility user at one site may be able to use the resources available
at another.
 Speeds up the exchange of data between sites, for example via electronic mail.
 If one site fails in a distributed system, the remaining sites can potentially continue
operating.
 Better service to the customers.
 Reduction of the load on the host computer.
 Reduction of delays in data processing.

4. Network operating System


 A Network Operating System runs on a server and provides the server with the capability to manage
data, users, groups, security, applications, and other networking functions.
 The primary purpose of the network operating system is to allow shared file and printer
access among multiple computers in a network, typically a local area network (LAN), a
private network or to other networks.
 Examples of network operating systems are Microsoft Windows Server 2003, Microsoft
Windows Server 2008, UNIX, Linux, Mac OS X, Novell NetWare, and BSD.

The advantages of network operating systems are following.


 Centralized servers are highly stable.
 Security is server managed.
 Upgrades to new technologies and hardware can be easily integrated into the system.
 Remote access to servers is possible from different locations and types of systems.

The disadvantages of network operating systems are following.


 High cost of buying and running a server.
 Dependency on a central location for most operations.
 Regular maintenance and updates are required.

5. Real Time operating System


 Real time system is defined as a data processing system in which the time interval
required to process and respond to inputs is so small that it controls the environment.
 Real time processing is always online, whereas an online system need not be real time.
 The time taken by the system to respond to an input and display the required updated
information is termed as response time. So in this method the response time is very low as
compared to online processing.
 Real-time systems are used when there are rigid time requirements on the operation of a
processor or the flow of data and real-time systems can be used as a control device in a
dedicated application.
 Real-time operating system has well-defined, fixed time constraints otherwise system
will fail. For example Scientific experiments, medical imaging systems, industrial control
systems, weapon systems, robots, and home-appliance controllers, Air traffic control
system etc.
There are two types of real-time operating systems.
Hard real-time systems
 Hard real-time systems guarantee that critical tasks complete on time.
 In hard real-time systems secondary storage is limited or missing with data stored in
ROM.
 In these systems virtual memory is almost never found.

Soft real-time systems


 Soft real time systems are less restrictive.
 Critical real-time task gets priority over other tasks and retains the priority until it
completes.
 Soft real-time systems have more limited utility than hard real-time systems.
 For example, Multimedia, virtual reality, Advanced Scientific Projects like undersea
exploration and planetary rovers etc.

1.5 Operating System Services


An Operating System provides services to both the users and to the programs.
 It provides programs an environment in which to execute.
 It provides users services to execute programs in a convenient manner.

Following are few common services provided by operating systems.


 Program execution
 I/O operations
 File System manipulation
 Communication
 Error Detection
 Resource Allocation

Program execution
 Operating system handles many kinds of activities from user programs to system
programs like printer spooler, name servers, file server etc. Each of these activities is
encapsulated as a process.
 A process includes the complete execution context (code to execute, data to manipulate,
registers, OS resources in use).
 Following are the major activities of an operating system with respect to program
management.
 Loads a program into memory.
 Executes the program.
 Handles program's execution.
 Provides a mechanism for process synchronization.
 Provides a mechanism for process communication.
 Provides a mechanism for deadlock handling.

I/O Operation
 The I/O subsystem is comprised of I/O devices and their corresponding driver software.
 Drivers hide the peculiarities of specific hardware devices from the user, as the device
driver knows the peculiarities of the specific device.
 Operating System manages the communication between user and device drivers.
 Following are the major activities of an operating system with respect to I/O Operation.
 I/O operation means read or write operation with any file or any specific I/O device.
 Program may require any I/O device while running.
 Operating system provides the access to the required I/O device when required.

File system manipulation


 A file represents a collection of related information.
 Computer can store files on the disk (secondary storage), for long term storage purpose.
 Few examples of storage media are magnetic tape, magnetic disk and optical disk drives
like CD, DVD.
 Each of these media has its own properties like speed, capacity, data transfer rate and
data access methods.
 A file system is normally organized into directories for easy navigation and usage. These
directories may contain files and other directories.
 Following are the major activities of an operating system with respect to file
management.
 Program needs to read a file or write a file.
 The operating system gives the permission to the program for operation on file.
 Permission varies from read-only, read-write, denied and so on.
 Operating System provides an interface to the user to create/delete files.
 Operating System provides an interface to the user to create/delete directories.
 Operating System provides an interface to create the backup of file system.

Communication
 In case of distributed systems which are a collection of processors that do not share
memory, peripheral devices, or a clock, operating system manages communications
between processes.
 Multiple processes communicate with one another through communication lines in the network.
 OS handles routing and connection strategies, and the problems of contention and
security.
 Following are the major activities of an operating system with respect to communication.
 Two processes often require data to be transferred between them.
 Both processes can be on one computer or on different computers connected
through a computer network.
 Communication may be implemented by two methods either by Shared Memory or
by Message Passing.

Error handling
 Error can occur anytime and anywhere.
 Error may occur in CPU, in I/O devices or in the memory hardware.
 Following are the major activities of an operating system with respect to error handling.
 OS constantly remains aware of possible errors.
 OS takes the appropriate action to ensure correct and consistent computing.

Resource Management
 In case of a multi-user or multi-tasking environment, resources such as main memory, CPU
cycles and file storage are to be allocated to each user or job.
 Following are the major activities of an operating system with respect to resource
management.
 OS manages all kind of resources using schedulers.
 CPU scheduling algorithms are used for better utilization of CPU.
Protection
 Considering a computer system having multiple users and the concurrent execution of multiple
processes, the various processes must be protected from one another's activities.
 Protection refers to mechanism or a way to control the access of programs, processes, or
users to the resources defined by computer systems.
 Following are the major activities of an operating system with respect to protection.
 OS ensures that all access to system resources is controlled.
 OS ensures that external I/O devices are protected from invalid access attempts.
 OS provides authentication feature for each user by means of a password.

1.6 Operating System Properties/ Characteristics


Following are few of very important tasks that Operating System handles.

1. Batch processing
Batch processing is a technique in which the Operating System collects programs and data
together in a batch before processing starts. Operating system does the following activities
related to batch processing.
 OS defines a job which has predefined sequence of commands, programs and data as a
single unit.
 OS keeps a number of jobs in memory and executes them without any manual
intervention.
 Jobs are processed in the order of submission i.e. first come first served fashion.
 When job completes its execution, its memory is released and the output for the job gets
copied into an output spool for later printing or processing.

Advantages
 Batch processing shifts much of the work of the operator to the computer.
 Increased performance, as a new job gets started as soon as the previous job finishes,
without any manual intervention.
Disadvantages
 Difficult to debug program.
 A job could enter an infinite loop.
 Due to lack of protection scheme, one batch job can affect pending jobs.

2. Multitasking
Multitasking refers to the execution of multiple jobs by the CPU, apparently simultaneously, by
switching between them. Switches occur so frequently that the users may interact with each
program while it is running. Operating system does the following activities related to
multitasking.
 The user gives instructions to the operating system or to a program directly, and receives
an immediate response.
 Operating System handles multitasking in the way that it can handle multiple operations /
executes multiple programs at a time.
 Multitasking Operating Systems are also known as Time-sharing systems.
 These Operating Systems were developed to provide interactive use of a computer system
at a reasonable cost.
 A time-shared operating system uses concept of CPU scheduling and multiprogramming
to provide each user with a small portion of a time-shared CPU.
 Each user has at least one separate program in memory.
 A program that is loaded into memory and is executing is commonly referred to as a
process.
 When a process executes, it typically executes for only a very short time before it either
finishes or needs to perform I/O.
 Since interactive I/O typically runs at people speeds, it may take a long time to complete.
During this time a CPU can be utilized by another process.
 Operating system allows the users to share the computer simultaneously. Since each
action or command in a time-shared system tends to be short, only a little CPU time is
needed for each user.
 As the system switches CPU rapidly from one user/program to the next, each user is
given the impression that he/she has his/her own CPU, whereas actually one CPU is
being shared among many users.

Cooperative and Preemptive Multitasking

Preemptive multitasking means that task switches can be initiated directly out of interrupt
handlers. With cooperative (non-preemptive) multitasking, a task switch is only performed
when a task calls the kernel, i.e., it behaves "cooperatively" and voluntarily gives the kernel a
chance to perform a task switch.

Example:
A receive interrupt handler for a serial port writes data to a mailbox. If a task is waiting at the
mailbox, it is immediately activated by the scheduler during preemptive scheduling. In
cooperative scheduling, however, the task is only brought into the state "Ready". A task
switch does not immediately take place; after the interrupt handler has completed, the task
having been interrupted continues to run. Such a "pending" task switch is performed by the
kernel at some later time, as soon as the active task calls the kernel.
RTKernel-32 supports both cooperative and preemptive scheduling.
3. Multiprogramming
When two or more programs reside in memory at the same time, sharing the
processor is referred to as multiprogramming. Multiprogramming assumes a single shared
processor. Multiprogramming increases CPU utilization by organizing jobs so that the CPU
always has one to execute.
Following figure shows the memory layout for a multiprogramming system.

Operating system does the following activities related to multiprogramming.


 The operating system keeps several jobs in memory at a time.
 This set of jobs is a subset of the jobs kept in the job pool.
 The operating system picks and begins to execute one of the jobs in the memory.
 Multiprogramming operating system monitors the state of all active programs and system
resources using memory management programs to ensure that the CPU is never idle,
unless there are no jobs to process.

Advantages
 High and efficient CPU utilization.
 User feels that many programs are allotted CPU almost simultaneously.

Disadvantages
 CPU scheduling is required.
 To accommodate many jobs in memory, memory management is required.

4. Multiprocessing System
 Multi-processing refers to the ability of a system to support more than one processor at
the same time.
 Applications in a multi-processing system are broken into smaller routines that run
independently.
 The operating system allocates these threads to the processors improving performance of
the system.
 In symmetric multi-processing, a single OS instance controls two or more identical
processors connected to a single shared main memory. Most multi-processing PC
motherboards utilize symmetric multiprocessing.
 On the other hand, asymmetric multi-processing designates system tasks to be
performed by some processors and applications on others. This is generally not as
efficient as symmetric processing due to the fact that under certain conditions a single
processor might be completely engaged while the other is idle.

5. Interactivity
Interactivity refers to the ability of a user to interact with the computer system. Operating system
does the following activities related to interactivity.
 OS provides the user an interface to interact with the system.
 OS manages input devices to take inputs from the user. For example, keyboard.
 OS manages output devices to show outputs to the user. For example, monitor.
 OS response time needs to be short, since the user submits a request and waits for the result.

6. Real Time System


Real time systems are usually dedicated, embedded systems. Operating system
does the following activities related to real time system activity.
 In such systems, Operating Systems typically read from and react to sensor data.
 The Operating system must guarantee response to events within fixed periods of time to
ensure correct performance.

7. Distributed Environment
Distributed environment refers to multiple independent CPUs or processors in a computer
system. Operating system does the following activities related to distributed environment.
 OS distributes computation logic among several physical processors.
 The processors do not share memory or a clock.
 Instead, each processor has its own local memory.
 OS manages the communications between the processors. They communicate with each
other through various communication lines.

8. Spooling
Spooling is an acronym for Simultaneous Peripheral Operations on Line. Spooling refers to
putting data of various I/O jobs in a buffer. This buffer is a special area in memory or hard
disk which is accessible to I/O devices. Operating system does the following activities related
to spooling.
 OS handles I/O device data spooling as devices have different data access rates.
 OS maintains the spooling buffer which provides a waiting station where data can rest
while the slower device catches up.
 OS maintains parallel computation because of spooling process as a computer can
perform I/O in parallel fashion.
 It becomes possible to have the computer read data from a tape, write data to disk and
write out to a printer while it is doing its computing task.

Advantages
 The spooling operation uses a disk as a very large buffer.
 Spooling is capable of overlapping I/O operation for one job with processor
operations for another job.

1.7 System Calls


 System calls provide an interface between a process and the operating system.
 System calls allow user-level processes to request services from the operating
system which the process itself is not allowed to perform.
 In handling the trap, the operating system enters kernel mode, where it has
access to privileged instructions, and can perform the desired service on behalf of the
user-level process.
 It is because of the critical nature of operations that the operating system itself does them
every time they are needed.
 For example, for I/O a process invokes a system call telling the operating system to read
or write a particular area, and this request is satisfied by the operating system.
 System programs provide basic functioning to users so that they do not need to write their
own environment for program development (editors, compilers) and program execution
(shells). In some sense, they are bundles of useful system calls.
System Call Parameters
Three general methods exist for passing parameters to the OS:
1. Parameters can be passed in registers.
2. When there are more parameters than registers, parameters can be stored in a block and
the block address can be passed as a parameter to a register.
3. Parameters can also be pushed on or popped off the stack by the operating system.

Types of System Calls


1. Process Control
 A process is basically a single running program. It may be a "system" program
(e.g login, update) or program initiated by the user (textedit).
 When UNIX runs a process it gives each process a unique number - a process
ID, pid.
The system calls for process management are as follows:
i. execl() : execl stands for execute and leave which means that a process will get
executed and then terminated by execl.

It is defined by:

execl(char *path, char *arg0,...,char *argn, 0);


The last parameter must always be 0. It is a NULL terminator. Since the argument list is
variable, we must have some way of telling C when it is to end; the NULL terminator
does this job. Here path points to the name of a file holding the command that is to be
executed, and arg0 points to a string that is the same as path (or at least its last component).

arg1 ... argn are pointers to arguments for the command and 0 simply marks the end of
the (variable) list of arguments.

Example:
#include <stdio.h>
#include <unistd.h>

int main()
{   printf("Files in Directory are:\n");
    execl("/bin/ls", "ls", "-l", (char *)0);   /* NULL terminator marks the end of the argument list */
}

ii. fork()

int fork() turns a single process into 2 identical processes, known as the parent and
the child. On success, fork() returns 0 to the child process and returns the process ID of
the child process to the parent process. On failure, fork() returns -1 to the parent process,
sets errno to indicate the error, and no child process is created.

NOTE: The child process will have its own unique PID.

The following program illustrates a simple use of fork, where two copies are made and
run together (multitasking)

#include <stdio.h>
#include <unistd.h>

int main()
{   int return_value;

    printf("Forking process\n");
    return_value = fork();   /* returns 0 in the child and the child's PID in the parent */
    printf("The process id is %d and return value is %d\n", getpid(),
           return_value);
}

The output of this would be (the actual process ids differ from run to run):

Forking process
The process id is 6753 and return value is 6754
The process id is 6754 and return value is 0

iii. wait()

int wait (int *status_location) -- will force a parent process to wait for a child process to
stop or terminate. wait() returns the pid of the child or -1 for an error. The exit status of the
child is returned to status_location.
iv. exit()

void exit(int status) -- terminates the process which calls this function and returns the
exit status value. Both UNIX and C (forked) programs can read the status value.

By convention, a status of 0 means normal termination, any other value indicates an error or
unusual occurrence.
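
The following short program is an illustrative sketch added here (it is not part of the original
notes); it ties fork(), wait() and exit() together: the parent creates a child, the child exits with
status 0, and the parent waits for it and reads that status.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    int status;
    pid_t pid = fork();                      /* create the child process */
    if (pid == 0) {                          /* child branch */
        printf("Child %d running\n", getpid());
        exit(0);                             /* normal termination, status 0 */
    }
    wait(&status);                           /* parent blocks until the child terminates */
    printf("Child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}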

2. File management.

i. open():

open system call can be used to open an existing file or to create a new file if it does not exist
already. The syntax of open has the following form:

int open(const char *path, int flags);

If the file cannot be opened or created, it returns -1. The first parameter path specifies the file
name to be opened or created. The second parameter (flags) specifies how the file may be
used.

ii. read():

The system call for reading from a file is read. Its syntax is

read(int fd, void *buf, int nbytes);

The first parameter fd is the file descriptor of the file you want to read from; it is normally
returned from open. The second parameter buf is a pointer to the memory location
where the input data should be stored. The last parameter nbytes specifies the maximum
number of bytes you want to read. The system call returns the number of bytes it actually
read, and normally this number is smaller than or equal to nbytes.

iii. write():

The system call write is to write data to a file. Its syntax is

write(int fd, const void *buf, int nbytes);

It writes nbytes of data to the file referenced by the file descriptor fd from the buffer pointed
to by buf. The write starts at the position pointed to by the offset of the file. Upon returning
from write, the offset is advanced by the number of bytes which were successfully written.
The function returns the number of bytes that were actually written, or the value -1
if it failed.

iv. close():

The close system call closes a file. Its syntax is


int close(int fd);

It returns the value 0 if successful; otherwise the value -1 is returned.
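
As an illustrative sketch (not from the original notes), the four calls above can be combined to
copy one file to another. The file names are assumed for the example; note that when O_CREAT
is used, open takes a third argument giving the permission bits of the new file.

#include <fcntl.h>
#include <unistd.h>

int main()
{
    char buf[512];
    int nbytes;
    int in  = open("input.txt", O_RDONLY);                 /* existing input file (assumed name) */
    int out = open("copy.txt", O_WRONLY | O_CREAT, 0644);  /* output file, created if necessary */
    if (in == -1 || out == -1)
        return 1;                                          /* open failed */
    while ((nbytes = read(in, buf, sizeof(buf))) > 0)      /* read up to 512 bytes at a time */
        write(out, buf, nbytes);                           /* write back exactly what was read */
    close(in);
    close(out);
    return 0;
}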

3. Directory Handling

A directory can be read as a file by anyone who has read permission for it. Writing a
directory as a file can only be done by the kernel. The structure of the directory appears to
the user as a succession of structures named directory entries. A directory entry contains,
among other information, the name of the file and its inode number. For reading the directory
entries one after the other we can use the following functions:

#include <sys/types.h>
#include <dirent.h>
DIR* opendir(const char* pathname);
struct dirent* readdir(DIR* dp);
void rewinddir(DIR* dp);
int closedir(DIR* dp);

The opendir() function opens a directory. It returns a valid pointer if the opening was
successful and NULL otherwise.

The readdir() function, at every call, reads another directory entry from the current directory.
The first readdir will read the first directory entry; the second call will read the next entry
and so on. In case of a successful reading the function will return a valid pointer to a
structure of type dirent and NULL otherwise (in case it reached the end of the directory, for
example).

The rewinddir() function repositions the file pointer to the first directory entry (the beginning
of the directory).

The closedir() function closes a previously opened directory. In case of an error it returns the
value -1.

The structure dirent is defined in the dirent.h file.
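
A small sketch (added for illustration) that uses these functions to print the name of every entry
in the current directory:

#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main()
{
    DIR *dp = opendir(".");                   /* open the current directory */
    struct dirent *entry;
    if (dp == NULL)
        return 1;                             /* opendir failed */
    while ((entry = readdir(dp)) != NULL)     /* one directory entry per call */
        printf("%s\n", entry->d_name);        /* d_name holds the file name */
    closedir(dp);
    return 0;
}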

4. Device Management.
o request device, release device
o read, write, reposition
o get/set device attributes
o logically attach or detach devices
o execute device specific operation
5. Information Maintenance.
o get/set time or date
o get/set system data
o get/set process, file, or device attributes
6. Communication.
o create, delete communication connection
o send, receive messages
o transfer status information
o attach or detach remote devices

1.8 Shell and Kernel of the OS

The Kernel

The kernel is the heart and brain of the operating system. The kernel is a layer of software that
sits between the user of a computer and its hardware, and is responsible for efficiently managing
the system's resources. It also schedules the work being done on the system so that each task gets
its fair share of system resources.

There are two different concepts of kernels:


 Monolithic kernel
 μ-kernel (Micro kernel)

Monolithic kernel: The older approach is the monolithic kernel, of which Unix, MS-DOS and
the early Mac OS are typical examples. It runs every basic system service like process and
memory management, interrupt handling and I/O communication, file system, etc. in kernel
space (see the figure below). It is constructed in a layered fashion, built up from the fundamental
process management up to the interfaces to the rest of the operating system (libraries and, on top
of them, the applications). The inclusion of all basic services in kernel space has three big
drawbacks.
 The kernel size increases.
 Lack of extensibility.
 Poor maintainability.
Bug-fixing or the addition of new features means a recompilation of the whole kernel. This is
time and resource consuming because the compilation of a new kernel can take several hours and
a lot of memory. Every time someone adds a new feature or fixes a bug, it means recompilation
of the whole kernel.

To overcome these limitations of extensibility and maintainability, the idea of μ-kernels
appeared at the end of the 1980s.

Microkernel: The concept (figure below) was to reduce the kernel to basic process
communication and I/O control, and let the other system services reside in user space in the form of
normal processes (so called servers). There is a server for managing memory issues, one
server does process management, another one manages drivers, and so on. Because the servers
do not run in kernel space anymore, so called "context switches" are needed to allow user
processes to enter privileged mode (and to exit again). That way, the μ-kernel is not a block of
system services anymore, but represents just several basic abstractions and primitives to control
the communication between the processes and between a process and the underlying hardware.
Because communication is not done in a direct way anymore, a message system is introduced,
which allows independent communication and favours extensibility.
Comparison between Microkernel and Monolithic Kernel

Basic: In a microkernel, user services and kernel services are kept in separate address spaces.
In a monolithic kernel, both user services and kernel services are kept in the same address space.

Size: Microkernels are smaller in size. A monolithic kernel is larger than a microkernel.

Execution: Microkernel - slower execution. Monolithic kernel - faster execution.

Extendibility: The microkernel is easily extendible. The monolithic kernel is hard to extend.

Security: In a microkernel, if a service crashes, it does not affect the working of the rest of the
system. In a monolithic kernel, if a service crashes, the whole system crashes.

Code: To write a microkernel, more code is required. To write a monolithic kernel, less code is
required.

Examples: Microkernel - QNX, Symbian, L4Linux, Singularity, K42, Mac OS X, Integrity,
PikeOS, HURD, Minix, and Coyotos. Monolithic kernel - Linux, BSDs (FreeBSD, OpenBSD,
NetBSD), Microsoft Windows (95, 98, Me), Solaris, OS-9, AIX, HP-UX, DOS, OpenVMS,
XTS-400, etc.

The Shell

As you can see from the diagram above, the shell is not part of the kernel, but it does
communicate directly with the kernel. It is the "shell around the kernel."
The shell is a command line interpreter that executes the commands you type in. It translates
your commands from a human-readable format into a format that can be understood by the
computer. In addition to carrying out your commands, the shell also allows you to manage your
working environment and is an effective programming language.
Since the shell is a program, just like a word processor or spreadsheet application, different
shells can be used on a single system. This allows users to work with the shell they like the best,
and can also make the computer system appear different to users using different shells because
each shell has its own way of doing things.

The following is a list of commonly used shell programs:


 The Bourne Shell
 The C Shell
 The Bourne Again Shell
 The Korn Shell
CHAPTER 2
PROCESS MANAGEMENT

2.1 Process
 A process is a program in execution.
 The execution of a process must progress in a sequential fashion.
 A process is defined as an entity which represents the basic unit of work to be
implemented in the system.

2.2 Components of Process


1 Object Program: Code to be executed.
2 Data: Data to be used for executing the program.
3 Resources: While executing the program, it may require some resources (e.g. I/O devices).
4 Status: Verifies the status of the process execution. A process can run to completion only
  when all requested resources have been allocated to the process. Two or more processes
  could be executing the same program, each using their own data and resources.

2.3 Program Vs Process


Sr. No.  Program                                   Process
1        It consists of instructions in any        It is the program in execution (instructions
         programming language.                     in the form of machine code).
2        It is a static object.                    It is a dynamic object.
3        Resides in secondary storage devices.     Resides in main memory.
4        Lifetime is unlimited.                    Lifetime is limited.
5        It is a passive entity.                   It is an active entity.

2.4 Process States


As a process executes, it changes state. The state of a process is defined as the current activity of
the process. A process can be in one of the following five states at a time.
Sr. No. State & Description

New
1
The process is being created.

Ready
2 The process is waiting to be assigned to a processor. Ready processes are waiting to
have the processor allocated to them by the operating system so that they can run.

Running
3 Process instructions are being executed (i.e. The process that is currently being
executed).

Waiting
4
The process is waiting for some event to occur (such as the completion of an I/O
operation).

Terminated
5
The process has finished execution.

 Processes entering the system must initially go into the ready state.
 A process can only enter the running state from the ready state.
 A process can normally only leave the system from the running state, although a process
in the ready or blocked state may be aborted by the system (in the event of an error, for
example), or by the user.
 Although the model shown above is sufficient to describe the behavior of processes
generally, the model must be extended to allow for other possibilities, such as the
suspension and resumption of a process.
 For example, the process may be swapped out of working memory by the operating
system's memory manager in order to free up memory for another process.
 When a process is suspended, it essentially becomes dormant until resumed by the system
(or by a user).
 Because a process can be suspended while it is either ready or blocked, it may also exist
in one of two further states - ready suspended and blocked suspended (a running process
may also be suspended, in which case it becomes ready suspended).
 The queue of ready processes is maintained in priority order, so the next process to
execute will be the one at the head of the ready queue.
 The queue of blocked process is typically unordered, since there is no sure way to tell
which of these processes will become unblocked first (although if several processes are
blocked awaiting the same event, they may be prioritized within that context).
 To prevent one process from monopolizing the processor, a system timer is started each
time a new process starts executing.
 The process will be allowed to run for a set period of time, after which the timer
generates an interrupt that causes the operating system to regain control of the processor.
 The operating system sends the previously running process to the end of the ready queue,
changing its status from running to ready, and assigns the first process in the ready queue
to the processor, changing its status from ready to running.

2.5 Process Control Block


 Each process is represented in the operating system by a process control block (PCB).
 It is also called a task control block or process descriptor.
 PCB is the data structure used by the operating system. The operating system groups together
all the information that it needs about a particular process.
 PCB contains many pieces of information associated with a specific process which is
described below.

Sr. No. Information & Description

Pointer
1 Pointer points to another process control block. Pointer is used for maintaining the
scheduling list.

Process State
2
Process state may be new, ready, running, waiting and so on.

Process Number
3
Process Number indicates the id of the process executing.

Program Counter
4 Program Counter indicates the address of the next instruction to be executed for this
process.

CPU registers
5 CPU registers include general purpose registers, stack pointers, index registers,
accumulators etc. The number and type of registers depend entirely upon the
computer architecture.

Memory management information


This information may include the value of base and limit registers, the page tables, or
6
the segment tables depending on the memory system used by the operating system.
This information is useful for de-allocating the memory when the process terminates.

Accounting information
7 This information includes the amount of CPU and real time used, time limits, job or
process numbers, account numbers etc.

 Process control block includes CPU scheduling, I/O resource management, file
management information etc.
 The PCB serves as the repository for any information which can vary from process to
process.
 Loader/linker sets flags and registers when a process is created.
 If that process gets suspended, the contents of the registers are saved on a stack and the
pointer to the particular stack frame is stored in the PCB.
 By this technique, the hardware state can be restored so that the process can be scheduled
to run again.
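
Purely as an illustrative sketch (the field names below are assumptions; a real operating system
uses far richer, architecture-specific structures), the PCB fields listed above could be represented
in C roughly as follows:

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
    struct pcb     *next;             /* pointer used to link PCBs into scheduling lists  */
    enum proc_state state;            /* new, ready, running, waiting or terminated       */
    int             pid;              /* process number (id)                              */
    unsigned long   program_counter;  /* address of the next instruction to execute       */
    unsigned long   registers[16];    /* saved general purpose registers                  */
    unsigned long   base, limit;      /* memory management information                    */
    unsigned long   cpu_time_used;    /* accounting information                           */
};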

2.6 Threads and Thread Management


 A thread is a flow of execution through the process code, with its own program counter,
system registers and stack.
 A thread is also called a light weight process.
 Threads provide a way to improve application performance through parallelism.
 Threads represent a software approach to improving the performance of the operating system by
reducing the overhead; a thread is equivalent to a classical process.
 Each thread belongs to exactly one process and no thread can exist outside a process.
 Each thread represents a separate flow of control.
 Threads have been successfully used in implementing network servers and web servers.
 They also provide a suitable foundation for parallel execution of applications on shared
memory multiprocessors.
 Following figure shows the working of the single and multithreaded processes.

Difference between Process and Thread


Sr. No.  Process                                        Thread
1        A process is a program in execution.           A thread is a path of execution within a
                                                        process.
2        Processes are generally used to execute        Threads are used to carry out much smaller
         large, 'heavyweight' jobs such as running      or 'lightweight' jobs such as auto saving a
         different applications.                        document in a program, downloading files, etc.
3        Each process runs in a separate address        Threads, on the other hand, may share address
         space in the CPU.                              space with other threads within the same
                                                        process.
4        Switching between processes is complex         Sharing of address space facilitates
         and slow.                                      communication between threads. Therefore,
                                                        switching between threads is much simpler
                                                        and faster.
5        Processes only have control over child         Threads also have a great degree of control
         processes.                                     over other threads of the same process.
6        Changes made to the parent process do          Any changes made to the main thread may
         not affect the child processes.                affect the behavior of other threads within
                                                        the same process.
7        System calls are required to communicate       System calls are not required.
         with each other.
8        Processes are loosely coupled.                 Threads are tightly coupled.
9        It requires more resources to execute.         Requires fewer resources to execute.
10       Processes are not suitable for parallel        Threads are suitable for parallel activities.
         activities.
Advantages of Thread
 Threads minimize context switching time.
 Use of threads provides concurrency within a process.
 Efficient communication.
 Economy- It is more economical to create and context switch threads.
 Utilization of multiprocessor architectures to a greater scale and efficiency.
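
A minimal POSIX threads sketch (added for illustration; the notes themselves do not prescribe a
particular thread API) showing two threads of one process sharing the same address space:

#include <stdio.h>
#include <pthread.h>

int shared_counter = 0;                 /* visible to every thread of the process */

void *worker(void *arg)
{
    shared_counter++;                   /* both threads update the same variable
                                           (real code would protect this with a mutex) */
    printf("Hello from thread %ld\n", (long)arg);
    return NULL;
}

int main()
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);   /* spawn two threads */
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);                          /* wait for both to finish */
    pthread_join(t2, NULL);
    printf("shared_counter = %d\n", shared_counter);
    return 0;
}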

Types of Thread
Threads are implemented in following two ways
 User Level Threads -- User managed threads
 Kernel Level Threads -- Operating System managed threads acting on kernel, an
operating system core.

User Level Threads


In this case, the application manages thread management; the kernel is not aware of the existence of
threads. The thread library contains code for creating and destroying threads, for passing
messages and data between threads, for scheduling thread execution and for saving and restoring
thread contexts. The application begins with a single thread and begins running in that thread.

ADVANTAGES
 Thread switching does not require Kernel mode privileges.
 User level thread can run on any operating system.
 Scheduling can be application specific in the user level thread.
 User level threads are fast to create and manage.

DISADVANTAGES
 In a typical operating system, most system calls are blocking.
 Multithreaded application cannot take advantage of multiprocessing.
Kernel Level Threads
In this case, thread management done by the Kernel. There is no thread management code in the
application area. Kernel threads are supported directly by the operating system. Any application
can be programmed to be multithreaded. All of the threads within an application are supported
within a single process.
The Kernel maintains context information for the process as a whole and for individual threads
within the process. Scheduling by the Kernel is done on a thread basis. The Kernel performs
thread creation, scheduling and management in Kernel space. Kernel threads are generally
slower to create and manage than user threads.

ADVANTAGES
 Kernel can simultaneously schedule multiple threads from the same process on multiple
processors.
 If one thread in a process is blocked, the Kernel can schedule another thread of the same
process.
 Kernel routines themselves can be multithreaded.

DISADVANTAGES
 Kernel threads are generally slower to create and manage than the user threads.
 Transfer of control from one thread to another within same process requires a mode
switch to the Kernel.
Difference between User Level & Kernel Level Thread
Sr. No.  User Level Threads                            Kernel Level Threads
1        User level threads are faster to create       Kernel level threads are slower to
         and manage.                                   create and manage.
2        Implementation is by a thread library at      Operating system supports creation of
         the user level.                               Kernel threads.
3        User level thread is generic and can run      Kernel level thread is specific to the
         on any operating system.                      operating system.
4        Multi-threaded application cannot take        Kernel routines themselves can be
         advantage of multiprocessing.                 multithreaded.

Multithreading Models
Some operating systems provide a combined user level thread and Kernel level thread facility.
Solaris is a good example of this combined approach. In a combined system, multiple threads
within the same application can run in parallel on multiple processors and a blocking system call
need not block the entire process. There are three types of multithreading models:
 Many to many relationship.
 Many to one relationship.
 One to one relationship.

Many to Many Model


In this model, many user level threads are multiplexed onto a smaller or equal number of kernel
threads. The number of kernel threads may be specific to either a particular application or a
particular machine.
Following diagram shows the many to many model. In this model, developers can create as
many user threads as necessary and the corresponding kernel threads can run in parallel on a
multiprocessor.

Many to One Model


Many to one model maps many user level threads to one kernel level thread. Thread
management is done in user space. When a thread makes a blocking system call, the entire process
is blocked. Only one thread can access the kernel at a time, so multiple threads are unable to
run in parallel on multiprocessors.
If the user level thread libraries are implemented in the operating system in such a way that the
system does not support them, then the kernel threads use the many to one relationship mode.

One to One Model


There is a one to one relationship of user level threads to kernel level threads. This model
provides more concurrency than the many to one model. It also allows another thread to run
when a thread makes a blocking system call. It supports multiple threads executing in parallel on
multiprocessors.
The disadvantage of this model is that creating a user thread requires creating the corresponding
kernel thread. OS/2, Windows NT and Windows 2000 use the one to one relationship model.

2.7 Process Scheduling


The objective of multiprogramming is to have some process running at all times, to maximize
CPU utilization. The objective of time sharing is to switch the CPU among processes so
frequently that users can interact with each program while it is running. In a uniprocessor
system, only one process runs at a time. A process migrates between various
scheduling queues throughout its lifetime. The process of selecting processes from among these
queues is carried out by a scheduler. The aim of processor scheduling is to assign processes to
be executed by the processor. Scheduling affects the performance of the system, because it
determines which processes will wait and which will progress.

Types of Schedulers
Schedulers are special system software which handles process scheduling in various ways. Their
main task is to select the jobs to be submitted into the system and to decide which process to run.
Schedulers are of three types
 Long Term Scheduler
 Short Term Scheduler
 Medium Term Scheduler
Long Term Scheduler
 It is also called job scheduler.
 Long term scheduler determines which programs are admitted to the system for
processing.
 Job scheduler selects processes from the queue and loads them into memory for
execution.
 Process loads into the memory for CPU scheduling.
 The primary objective of the job scheduler is to provide a balanced mix of jobs, such as
I/O bound and processor bound.
 It also controls the degree of multiprogramming.
 If the degree of multiprogramming is stable, then the average rate of process creation
must be equal to the average departure rate of processes leaving the system.
 On some systems, the long term scheduler may not be available or minimal.
 Time-sharing operating systems have no long term scheduler.
 When process changes the state from new to ready, then there is use of long term
scheduler.

Short Term Scheduler


 It is also called CPU scheduler.
 Main objective is increasing system performance in accordance with the chosen set of
criteria.
 It is the change of ready state to running state of the process.
 CPU scheduler selects process among the processes that are ready to execute and
allocates CPU to one of them.
 The short term scheduler, also known as the dispatcher, executes most frequently and makes the
fine-grained decision of which process to execute next.
 Short term scheduler is faster than long term scheduler.

Medium Term Scheduler


 Medium term scheduling is a part of swapping.
 It removes processes from memory.
 It reduces the degree of multiprogramming.
 The medium term scheduler is in charge of handling the swapped-out processes.

 Running process may become suspended if it makes an I/O request.


 Suspended processes cannot make any progress towards completion.
 In this condition, to remove the process from memory and make space for other process,
the suspended process is moved to the secondary storage.
 This process is called swapping, and the process is said to be swapped out or rolled out.
 Swapping may be necessary to improve the process mix.

Comparison between Schedulers

Sr. No. 1
  Long Term Scheduler: It is a job scheduler.
  Short Term Scheduler: It is a CPU scheduler.
  Medium Term Scheduler: It is a process swapping scheduler.

Sr. No. 2
  Long Term Scheduler: Speed is lesser than the short term scheduler.
  Short Term Scheduler: Speed is fastest among the other two.
  Medium Term Scheduler: Speed is in between both the short and long term schedulers.

Sr. No. 3
  Long Term Scheduler: It controls the degree of multiprogramming.
  Short Term Scheduler: It provides lesser control over the degree of multiprogramming.
  Medium Term Scheduler: It reduces the degree of multiprogramming.

Sr. No. 4
  Long Term Scheduler: It is almost absent or minimal in a time sharing system.
  Short Term Scheduler: It is also minimal in a time sharing system.
  Medium Term Scheduler: It is a part of time sharing systems.

Sr. No. 5
  Long Term Scheduler: It selects processes from the pool and loads them into memory for execution.
  Short Term Scheduler: It selects those processes which are ready to execute.
  Medium Term Scheduler: It can re-introduce the process into memory so that execution can be continued.
2.8 Context Switch
 A context switch is the mechanism to store and restore the state or context of a CPU in
Process Control block so that a process execution can be resumed from the same point at
a later time.
 Using this technique, a context switcher enables multiple processes to share a single CPU.
 Context switching is an essential feature of a multitasking operating system.
 When the scheduler switches the CPU from executing one process to execute another, the
context switcher saves the content of all processor registers for the process being
removed from the CPU, in its process descriptor.
 The context of a process is represented in the process control block of a process.
 Context switch time is pure overhead.
 Context switching can significantly affect performance as modern computers have a lot
of general and status registers to be saved.
 Context switching times are highly dependent on hardware support.
 A context switch requires roughly b × K time units to save the state of the processor
with n general registers, assuming b store operations are required to save the n and m
registers of the two process control blocks and each store instruction requires K time units.

Some hardware systems employ two or more sets of processor registers to reduce the amount of
context switching time. When the process is switched, the following information is stored.
 Program Counter
 Scheduling Information
 Base and limit register value
 Currently used register
 Changed State
 I/O State
 Accounting

2.9 Scheduling Criteria


 Scheduling criteria are also called scheduling methodology.
 The key to multiprogramming is scheduling.
 Different CPU scheduling algorithms have different properties.
 The criteria used for comparing these algorithms include the following:

 CPU Utilization: Keep the CPU as busy as possible. It ranges from 0 to 100%. In
practice, it ranges from 40 to 90%.
 Throughput: Throughput is the rate at which processes are completed per unit of
time.
 Turnaround time: This is the total time taken to execute a particular process. It is
calculated as the time gap between the submission of a process and its completion.
 Waiting time: Waiting time is the sum of the time periods spent in waiting in the
ready queue.
 Response time: Response time is the time it takes to start responding from
submission time. It is calculated as the amount of time it takes from when a request
was submitted until the first response is produced.
 Fairness: Each process should have a fair share of CPU.
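
For the worked examples later in this chapter, these criteria are related by the standard
expressions (assuming arrival, burst and completion times are known):

Turnaround Time = Completion Time - Arrival Time
Waiting Time = Turnaround Time - Burst Time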

Non-preemptive Scheduling:
In non-preemptive mode, once a process enters the running state, it continues to execute until
it terminates or blocks itself to wait for input/output or for some operating
system service.

Preemptive Scheduling:
In preemptive mode, currently running process may be interrupted and moved to the ready state
by the operating system.
When a new process arrives or when an interrupt occurs, preemptive policies may incur greater
overhead than the non-preemptive version, but may provide better service.
It is desirable to maximize CPU utilization and throughput, and to minimize turnaround time,
waiting time and response time.

Scheduling Algorithms

Scheduling algorithms or scheduling policies are mainly used for short-term scheduling. The
main objective of short-term scheduling is to allocate processor time in such a way as to optimize
one or more aspects of system behavior.
For these scheduling algorithms, assume only a single processor is present. Scheduling
algorithms decide which of the processes in the ready queue is to be allocated the CPU, based
on the type of scheduling policy and on whether that policy is preemptive or non-preemptive.
For scheduling, arrival time and service time also play a role.

List of scheduling algorithms are as follows:


 First-come, first-served scheduling (FCFS) algorithm
 Shortest Job First Scheduling (SJF) algorithm
 Shortest Remaining time (SRT) algorithm
 Non-preemptive priority Scheduling algorithm
 Preemptive priority Scheduling algorithm
 Round-Robin Scheduling algorithm
 Multilevel Queue Scheduling algorithm
 Multilevel Feedback Queue Scheduling algorithm

For describing the various scheduling policies, we will use the process information given
below:
Process Arrival Time Burst Time Priority
P1 0 4 2
P2 3 6 1
P3 5 3 3
P4 8 2 1

1. First-come First-served Scheduling (FCFS)


 First-come First-served Scheduling follows the first in, first out (FIFO) method.
 As each process becomes ready, it joins the ready queue.
 When the current running process ceases to execute, the oldest process in the ready
queue is selected for running, i.e. the process that entered first among the available
processes in the ready queue.
 The average waiting time for FCFS is often quite long.
 It is non-preemptive.

Advantages
 Better for long processes.
 Simple method (i.e., minimum overhead on the processor).
 No starvation.
Disadvantages
 Convoy effect occurs: even a very small process must wait for its turn to use the CPU. A
short process stuck behind a long process results in lower CPU utilization.
 Throughput is not emphasized.
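
The waiting and turnaround times under FCFS can be computed mechanically. The short C program
below is an added illustration (not part of the original notes); the arrays simply restate the
process table above, and priorities are ignored because FCFS does not use them.

#include <stdio.h>

#define N 4

int main(void)
{
    /* process table from above: P1..P4, already sorted by arrival time */
    int arrival[N] = {0, 3, 5, 8};
    int burst[N]   = {4, 6, 3, 2};
    int time = 0;
    double total_wait = 0, total_tat = 0;

    for (int i = 0; i < N; i++) {
        if (time < arrival[i])
            time = arrival[i];            /* CPU idles until the process arrives */
        total_wait += time - arrival[i];  /* waiting time = start time - arrival time */
        time += burst[i];
        total_tat += time - arrival[i];   /* turnaround time = completion - arrival */
    }
    printf("AWT = %.2f, ATAT = %.2f\n", total_wait / N, total_tat / N);
    /* prints AWT = 2.75, ATAT = 6.50 for this process set */
    return 0;
}
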
2. Shortest Job First Scheduling (SJF)
 This algorithm associates with each process the length of the next CPU burst.
 Shortest-job-first scheduling is also called shortest process next (SPN).
 The process with the shortest expected processing time is selected for execution, among
the available processes in the ready queue.
 Thus, a short process will jump to the head of the queue over long jobs.
 If the next CPU bursts of two processes are the same then FCFS scheduling is used to
break the tie.
 The SJF scheduling algorithm is provably optimal.
 It gives the minimum average waiting time for a given set of processes.
 It cannot be implemented exactly at the level of short-term CPU scheduling, because there is
no way of knowing the length of the next CPU burst in advance; it can only be estimated from
previous bursts.
 SJF can be pre-emptive or non-preemptive.
 A pre-emptive SJF algorithm will preempt the currently executing process if the CPU burst
of a newly arrived process is shorter than what is left of the currently executing process.
 A non-preemptive SJF algorithm will allow the currently running process to finish.
 Preemptive SJF Scheduling is sometimes called Shortest Remaining Time First
algorithm.

Advantages
 It gives superior average turnaround time compared to FCFS, because a short job is given
immediate preference over a longer running job.
 Throughput is high.
Disadvantages
 Elapsed time (i.e., the execution time completed so far) must be recorded, which results in
additional overhead on the processor.
 Starvation may be possible for the longer processes.
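
For illustration, a minimal sketch of non-preemptive SJF over the same process table is given
below (added to these notes, not part of the original text): at every scheduling point the
arrived, unfinished process with the smallest burst time is chosen. Replacing the burst-time
comparison with a priority comparison turns the same loop into non-preemptive priority
scheduling, discussed next.

#include <stdio.h>

#define N 4

int main(void)
{
    int arrival[N] = {0, 3, 5, 8};
    int burst[N]   = {4, 6, 3, 2};
    int done[N]    = {0, 0, 0, 0};
    int time = 0, finished = 0;
    double total_wait = 0, total_tat = 0;

    while (finished < N) {
        int pick = -1;
        for (int i = 0; i < N; i++)               /* pick the shortest arrived job */
            if (!done[i] && arrival[i] <= time &&
                (pick < 0 || burst[i] < burst[pick]))
                pick = i;
        if (pick < 0) { time++; continue; }       /* nothing has arrived yet: idle */
        total_wait += time - arrival[pick];       /* waiting time of the chosen job */
        time += burst[pick];
        total_tat += time - arrival[pick];        /* turnaround time of the chosen job */
        done[pick] = 1;
        finished++;
    }
    printf("AWT = %.2f, ATAT = %.2f\n", total_wait / N, total_tat / N);
    return 0;
}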

3. Priority Scheduling

 SJF is a special case of the general priority scheduling algorithm, in which the priority is
the inverse of the (predicted) next CPU burst.


 A Priority (an integer) is associated with each process.
 The CPU is allocated to the process with the highest priority.
 Generally, the smallest integer is considered the highest priority.
 Equal priority processes are scheduled in First Come First serve order.
 It can be preemptive or Non-preemptive.
Non-preemptive Priority Scheduling
In this type of scheduling the CPU is allocated to the process with the highest priority after
completing the present running process.

Advantage
 Good response for the highest priority processes.
Disadvantage
 Starvation may be possible for the lowest priority processes.

Preemptive Priority Scheduling

 In this type of scheduling the CPU is allocated to the process with the highest priority
immediately upon the arrival of the highest priority process.
 If a process of equal priority arrives while another process is running, it does not preempt
the running process; the CPU is allocated to the newly arrived process only after the present
running process completes (equal priorities are served in FCFS order).

Advantage
 Very good response for the highest priority process over non-preemptive version of it.
Disadvantage
 Starvation may be possible for the lowest priority processes.

4. Round-Robin Scheduling
 This type of scheduling algorithm is basically designed for time-sharing systems.
 It is similar to FCFS with preemption added.
 Round-Robin scheduling is also called time-slicing scheduling. It is a preemptive policy
driven by a clock: a clock interrupt is generated at periodic intervals, usually 10-100 ms.
 When the interrupt occurs, the currently running process is placed in the ready queue and
the next ready job is selected on a First-come, First-serve basis.
 This process is known as time-slicing, because each process is given a slice of time
before being preempted.

One of the following happens:


 The process may have a CPU burst of less than the time quantum, in which case it releases
the CPU voluntarily, or
 the CPU burst of the currently executing process is longer than the time quantum, in which
case a context switch occurs and the process is put at the tail of the ready queue.

In round-robin scheduling, the principal design issue is the length of the time quantum or time-
slice to be used. If the quantum is very short, short processes move through quickly, but the
overhead of handling clock interrupts and context switches rises; if it is very long, round-robin
degenerates into FCFS.
Advantages
 Round-robin is effective in a general-purpose, time-sharing system or transaction-
processing system.
 Fair treatment for all the processes.
 Overhead on processor is low.
 Good response time for short processes.
Disadvantages
 Care must be taken in choosing quantum value.
 Processing overhead is there in handling clock interrupt.
 Throughput is low if time quantum is too small.
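
A simplified round-robin sketch is added below for illustration. It sweeps the process table in
index order and gives every arrived, unfinished process one quantum per sweep, instead of
maintaining an exact FIFO ready queue; the approximation is adequate for hand-sized examples.
The process set is the one used in the solved question that follows, and the quantum of 2 is
chosen arbitrarily.

#include <stdio.h>

#define N 5
#define QUANTUM 2

int main(void)
{
    int arrival[N] = {0, 1, 2, 3, 4};
    int burst[N]   = {8, 1, 3, 2, 6};
    int remaining[N], completion[N];
    int time = 0, left = N;
    double awt = 0, atat = 0;

    for (int i = 0; i < N; i++)
        remaining[i] = burst[i];

    while (left > 0) {
        int progressed = 0;
        for (int i = 0; i < N; i++) {                  /* one sweep of the table */
            if (remaining[i] > 0 && arrival[i] <= time) {
                int slice = remaining[i] < QUANTUM ? remaining[i] : QUANTUM;
                time += slice;
                remaining[i] -= slice;
                progressed = 1;
                if (remaining[i] == 0) { completion[i] = time; left--; }
            }
        }
        if (!progressed)
            time++;                                    /* nothing ready: CPU idles */
    }
    for (int i = 0; i < N; i++) {
        atat += completion[i] - arrival[i];            /* turnaround = completion - arrival */
        awt  += completion[i] - arrival[i] - burst[i]; /* waiting = turnaround - burst */
    }
    printf("AWT = %.2f, ATAT = %.2f\n", awt / N, atat / N);
    return 0;
}
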
UNIVERSITY QUESTION
Using the following scheduling algorithms, calculate the ATAT (average turnaround time) and
AWT (average waiting time) for the processes given below:
i) FCFS ii) Pre-emptive and non-pre-emptive SJF iii) Pre-emptive Priority

Process   Arrival Time   Burst Time   Priority
P1        0              8            3
P2        1              1            1
P3        2              3            2
P4        3              2            3
P5        4              6            4
Solution:

i) FCFS

Gantt chart:
| P1 | P2 | P3 | P4 | P5 |
0    8    9    12   14   20

Process   Waiting Time   Turnaround Time
P1        0-0 = 0        8-0 = 8
P2        8-1 = 7        9-1 = 8
P3        9-2 = 7        12-2 = 10
P4        12-3 = 9       14-3 = 11
P5        14-4 = 10      20-4 = 16
Sum       33             53
Average   6.6            10.6

ii) Pre-emptive SJF (Shortest Remaining Time First)

Gantt chart:
| P1 | P2 | P3 | P4 | P5 | P1 |
0    1    2    5    7    13   20

Process   Waiting Time        Turnaround Time
P1        (0-0)+(13-1) = 12   20-0 = 20
P2        1-1 = 0             2-1 = 1
P3        2-2 = 0             5-2 = 3
P4        5-3 = 2             7-3 = 4
P5        7-4 = 3             13-4 = 9
Sum       17                  37
Average   3.4                 7.4

iii) Non-Pre-emptive SJF

Gantt chart:
| P1 | P2 | P4 | P3 | P5 |
0    8    9    11   14   20

Process   Waiting Time   Turnaround Time
P1        0-0 = 0        8-0 = 8
P2        8-1 = 7        9-1 = 8
P3        11-2 = 9       14-2 = 12
P4        9-3 = 6        11-3 = 8
P5        14-4 = 10      20-4 = 16
Sum       32             52
Average   6.4            10.4

iv) Pre-emptive Priority

Gantt chart:
| P1 | P2 | P3 | P4 | P1 | P5 |
0    1    2    5    7    14   20

Process   Waiting Time       Turnaround Time
P1        (0-0)+(7-1) = 6    14-0 = 14
P2        1-1 = 0            2-1 = 1
P3        2-2 = 0            5-2 = 3
P4        5-3 = 2            7-3 = 4
P5        14-4 = 10          20-4 = 16
Sum       18                 38
Average   3.6                7.6

5. Multilevel Queue Scheduling


 When processes can be readily categorized, then multiple separate queues can be
established, each implementing whatever scheduling algorithm is most appropriate for
that type of job, and/or with different parametric adjustments.
 Scheduling must also be done between queues; that is scheduling one queue to get time
relative to other queues. Two common options are strict priority (no job in a lower
priority queue runs until all higher priority queues are empty) and round-robin (each
queue gets a time slice in turn, possibly of different sizes).
 Note that under this algorithm jobs cannot switch from queue to queue: once they are
assigned a queue, that is their queue until they finish.

6. Multilevel Feedback-Queue Scheduling


 Multilevel feedback queue scheduling is similar to the ordinary multilevel queue
scheduling described above, except jobs may be moved from one queue to another for a
variety of reasons:
o If the characteristics of a job change between CPU-intensive and I/O intensive,
then it may be appropriate to switch a job from one queue to another.
o Aging can also be incorporated, so that a job that has waited for a long time can
get bumped up into a higher priority queue for a while.
 Multilevel feedback queue scheduling is the most flexible, because it can be tuned for
any situation. But it is also the most complex to implement because of all the adjustable
parameters. Some of the parameters which define one of these systems include:
o The number of queues.
o The scheduling algorithm for each queue.
o The methods used to upgrade or demote processes from one queue to another
(which may be different).
o The method used to determine which queue a process enters initially.

Consider a system running ten I/O-bound tasks and one CPU-bound task. Assume that the I/O-
bound tasks issue an I/O operation once for every millisecond of CPU computing and that each
I/O operation takes 10 milliseconds to complete. Also assume that the context-switching
overhead is 0.1 millisecond and that all processes are long-running tasks. Describe the CPU
utilization for a round-robin scheduler when:
a. The time quantum is 1 millisecond
b. The time quantum is 10 milliseconds
Solution:
(a) Time quantum is 1 ms.
Whether the process is CPU-bound or I/O-bound, a switch occurs every millisecond, and each
switch incurs a 0.1 ms overhead. Thus, out of every 1.1 ms, the CPU does useful work for only
1 ms, so CPU utilization is (1/1.1) * 100 ≈ 91%.
(b) Time quantum is 10 ms.
Here, there is a difference between CPU-bound and I/O-bound processes. A CPU-bound process
can use the full 10 ms time slice, whereas an I/O-bound process uses only 1 ms of it before
blocking for I/O. In one round, the CPU-bound process therefore does 10 ms of useful work and
the ten I/O-bound processes do 10 * 1 = 10 ms, for a total of 20 ms. The elapsed time for the
round is 10 * (1 + 0.1) + (10 + 0.1) = 21.1 ms. Thus the CPU utilization is
(20/21.1) * 100 ≈ 95%.
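
For completeness, the arithmetic above can be checked with a few lines of C (added here for
illustration; the notes round the results to 91% and 95%):

#include <stdio.h>

int main(void)
{
    double cs = 0.1;                               /* context-switch overhead in ms */

    /* (a) quantum = 1 ms: 1 ms of useful work for every 1.1 ms of elapsed time */
    double util_a = 1.0 / (1.0 + cs);

    /* (b) quantum = 10 ms: per round, the CPU-bound task does 10 ms of work and
       each of the ten I/O-bound tasks does 1 ms before blocking for I/O */
    double useful = 10.0 + 10 * 1.0;               /* 20 ms of useful work */
    double total  = (10.0 + cs) + 10 * (1.0 + cs); /* 21.1 ms of elapsed time */
    double util_b = useful / total;

    printf("quantum 1 ms : %.1f%%\n", 100 * util_a);   /* 90.9 */
    printf("quantum 10 ms: %.1f%%\n", 100 * util_b);   /* 94.8 */
    return 0;
}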
CHAPTER 3
INTERPROCESS COMMUNICATION

3.1 Introduction
In chapter 2, we have studied the concept of processes. In addition to process scheduling,
another important responsibility of the operating system is process synchronization.
Synchronization involves the orderly sharing of system resources by processes.

Concurrency specifies two or more sequential programs (a sequential program specifies


sequential execution of a list of statements) that may be executed concurrently as parallel
processes. For example, an airline reservation system that involves processing transactions from
many terminals has a natural specification as a concurrent program in which each terminal is
controlled by its own sequential process. Even when processes are not executed simultaneously,
it is often easier to structure a system as a collection of cooperating sequential processes rather
than as a single sequential program.

Concurrent processing is the basis of operating system which supports multiprogramming.

The operating system supports concurrent execution of a program without necessarily supporting
elaborate form of memory and file management. This form of operation is also known as
multitasking. One of the benefits of multitasking is that several processes can be made to
cooperate in order to achieve their goals. To do this, they must do one of the following:

Communicate: Interprocess communication (IPC) involves sending information from one


process to another. This can be achieved using a "mailbox" system, a socket which behaves like
a virtual communication network (loopback), or through the use of "pipes". Pipes are system
constructions which enable one process to open another process as if it were a file for writing or
reading.

Share Data: A segment of memory must be available to both the processes. (Most memory is
locked to a single process).

Waiting: Some processes wait for other processes to give a signal before continuing. This is an
issue of synchronization.

In order to cooperate, concurrently executing processes must communicate and synchronize.


Interprocess communication is based on the use of shared variables (variables that can be
referenced by more than one process) or message passing.

Synchronization is often necessary when processes communicate. Processes are executed with
unpredictable speeds. Yet to communicate one process must perform some action such as setting
the value of a variable or sending a message that the other detects. This only works if the two
events (performing the action and detecting it) are constrained to happen in that order. Thus one can view
synchronization as a set of constraints on the ordering of events. The programmer employs a
synchronization mechanism to delay execution of a process in order to satisfy such constraints.
In this chapter, we will study the concept of interprocess communication and synchronization,
need of semaphores, classical problems in concurrent processing, critical regions, monitors and
message passing.

3.2 Interprocess Communication

Interprocess communication (IPC) is a capability supported by operating system that allows


one process to communicate with another process. The processes can be running on the same
computer or on different computers connected through a network. IPC enables one application to
control another application, and for several applications to share the same data without
interfering with one another. IPC is required in all multiprogramming systems, but it is not
generally supported by single-process operating systems such as DOS. OS/2 and MS-Windows
support an IPC mechanism called Dynamic Data Exchange.

IPC allows the process to communicate and to synchronize their actions without sharing the
same address space. This concept can be illustrated with the example of a shared printer as given
below:

Consider a machine with a single printer running a time-sharing operation system. If a process
needs to print its results, it must request that the operating system gives it access to the printer‘s
device driver. At this point, the operating system must decide whether to grant this request,
depending upon whether the printer is already being used by another process. If it is not, the
operating system should grant the request and allow the process to continue; otherwise, the
operating system should deny the request and perhaps classify the process as a waiting process
until the printer becomes available. Indeed, if two processes were given simultaneous access to the
machine‘s printer, the results would be worthless to both.

Consider the following related definitions to understand the example in a better way:

Critical Resource: It is a resource shared with constraints on its use (e.g., memory, files,
printers, etc).

Critical Section: It is code that accesses a critical resource.

Mutual Exclusion: At most one process may be executing a critical section with respect to a
particular critical resource simultaneously.

In the example given above, the printer is the critical resource. Let‘s suppose that the processes
which are sharing this resource are called process A and process B. The critical sections of
process A and process B are the sections of the code which issue the print command. In order to
ensure that both processes do not attempt to use the printer at the same, they must be granted
mutually exclusive access to the printer driver.

First we consider the interprocess communication part. There exist two complementary inter-
process communication types: a) shared-memory system and b) message-passing system. It is
clear that these two schemes are not mutually exclusive, and could be used simultaneously
within a single operating system.

3.2.1 Shared-Memory System

Shared-memory systems require communicating processes to share some variables. The


processes are expected to exchange information through the use of these shared variables. In a
shared-memory scheme, the responsibility for providing communication rests with the
application programmers. The operating system only needs to provide shared memory.

A critical problem occurring in shared-memory system is that two or more processes are reading
or writing some shared variables or shared data, and the final results depend on who runs
precisely and when. Such situations are called race conditions. In order to avoid race conditions
we must find some way to prevent more than one process from reading and writing shared
variables or shared data at the same time, i.e., we need the concept of mutual exclusion (which
we will discuss in the later section). It must be sure that if one process is using a shared variable,
the other process will be excluded from doing the same thing.

3.2.2 Message-Passing System

Message-passing systems allow communicating processes to exchange messages. In this scheme,
the responsibility rests with the operating system itself.

The function of a message-passing system is to allow processes to communicate with each other
without the need to resort to shared variable. An interprocess communication facility basically
provides two operations: send (message) and receive (message). In order to send and to receive
messages, a communication link must exist between two involved processes. This link can be
implemented in different ways. The possible basic implementation questions are:

• How are links established?

• Can a link be associated with more than two processes?

• How many links can there be between every pair of processes?

• What is the capacity of a link? That is, does the link have some buffer space? If so, how much?

• What is the size of the message? Can the link accommodate variable size or fixed-size
message?

• Is the link unidirectional or bi-directional?

In the following we consider several methods for logically implementing a communication link
and the send/receive operations. These methods can be classified into two categories: a) Naming,
consisting of direct and indirect communication, and b) Buffering, consisting of link capacity
and message properties.
Direct Communication

In direct communication, each process that wants to send or receive a message must explicitly
name the recipient or sender of the communication. In this case, the send and receive primitives
are defined as follows:

• send (P, message): To send a message to process P

• receive (Q, message): To receive a message from process Q

This scheme shows the symmetry in addressing, i.e., both the sender and the receiver have to
name one another in order to communicate. In contrast to this, asymmetry in addressing can be
used, i.e., only the sender has to name the recipient; the recipient is not required to name the
sender. So the send and receive primitives can be defined as follows:

• send (P, message): To send a message to the process P

• receive (id, message): To receive a message from any process; id is set to the name of the
process with whom the communication has taken place.

Indirect Communication

With indirect communication, the messages are sent to, and received from a mailbox. A mailbox
can be abstractly viewed as an object into which messages may be placed and from which
messages may be removed by processes. In order to distinguish one from the other, each mailbox
owns a unique identification. A process may communicate with some other process by a number
of different mailboxes. The send and receive primitives are defined as follows:

• send (A, message): To send a message to the mailbox A

• receive (A, message): To receive a message from the mailbox A

Mailboxes may be owned either by a process or by the system. If the mailbox is owned by a
process, then we distinguish between the owner, who can only receive from this mailbox, and the
user, who can only send messages to the mailbox. When a process that owns a mailbox terminates, its
mailbox disappears. Any process that sends a message to this mailbox must be notified in the
form of an exception that the mailbox no longer exists.

If the mailbox is owned by the operating system, then it has an existence of its own, i.e., it is
independent and not attached to any particular process. The operating system provides a
mechanism that allows a process to: a) create a new mailbox, b) send and receive message
through the mailbox and c) destroy a mailbox. Since all processes with access rights to a mailbox
may terminate, a mailbox may no longer be accessible by any process after some time. In this
case, the operating system should reclaim whatever space was used for the mailbox.
Capacity Link

A link has some capacity that determines the number of messages that can temporarily reside in
it. This property can be viewed as a queue of messages attached to the link. Basically there are
three ways through which such a queue can be implemented:

Zero capacity: This link has a message queue length of zero, i.e., no message can wait in it. The
sender must wait until the recipient receives the message. The two processes must be
synchronized for a message transfer to take place. The zero-capacity link is referred to as a
message-passing system without buffering.

Bounded capacity: This link has a limited message queue length of n, i.e., at most n messages
can reside in it. If a new message is sent, and the queue is not full, it is placed in the queue either
by copying the message or by keeping a pointer to the message and the sender should continue
execution without waiting. Otherwise, the sender must be delayed until space is available in the
queue.

Unbounded capacity: This queue has potentially infinite length, i.e., any number of messages
can wait in it. That is why the sender is never delayed.

Bounded and unbounded capacity link provide message-passing system with automatic
buffering.

Messages

Messages sent by a process may be one of three varieties: a) fixed-sized, b) variable-sized and c)
typed messages. If only fixed-sized messages can be sent, the physical implementation is
straightforward. However, this makes the task of programming more difficult. On the other hand,
variable-size messages require more complex physical implementation, but the programming
becomes simpler. Typed messages, i.e., associating a type with each mailbox, are applicable only
to indirect communication. The messages that can be sent to, and received from a mailbox are
restricted to the designated type.

3.3 Race Conditions

In some operating systems, processes that are working together may share some common storage
that each one can read and write. The shared storage may be in main memory (possibly in a
kernel data structure) or it may be a shared file: the location of the shared memory does not
change the nature of the communication or the problems that arise. To see how interprocess
communication works in practice, let us consider a simple but common example: a print spooler.
When a process wants to print a file, it enters the file name in a special spooler directory.
Another process, the printer daemon, periodically checks to see if there are any files to be
printed, and if there are, it prints them and then removes their names from the directory.

Imagine that our spooler directory has a very large number of slots, numbered 0, 1, 2, …, each
one capable of holding a file name. Also imagine that there are two shared variables, out, which
points to the next file to be printed, and in, which points to the next free slot in the directory.
These two variables might well be kept in a two-word file available to all processes. At a certain
instant, slots 0 to 3 are empty (the files have already been printed) and slots 4 to 6 are full (with
the names of files queued for printing). More or less simultaneously, processes A and B decide
they want to queue a file for printing. This situation is shown in Fig. 3.1.

Figure 3.1: Two processes want to access shared memory at same time

In jurisdictions where Murphy‘s law is applicable, the following might happen. Process A reads
in and stores the value, 7, in a local variable called next_free_slot. Just then a clock interrupt
occurs and the CPU decides that process A has run long enough, so it switches to process B,
Process B also reads in, and also gets a 7. It too stores it in its local variable next_free_slot. At
this instant both processes think that the next available slot is 7.
Process B now continues to run. It stores the name of its file in slot 7 and updates in to be an 8.
Then it goes off and does other things.

Eventually, process A runs again, starting from the place it left off. It looks at next_free_slot,
finds a 7 there, and writes its file name in slot 7, erasing the name that process B just put there.
Then it computes next_free_slot + 1, which is 8, and sets in to 8. The spooler directory is now
internally consistent, so the printer daemon will not notice anything wrong, but process B will
never receive any output. User B will hang around the printer room for years, wistfully hoping
for output that never comes. Situations like this, where two or more processes are reading or
writing some shared data and the final result depends on who runs precisely when, are called
race conditions.
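
The same effect can be demonstrated on any modern system. The short POSIX-threads program below
(an added illustration, not the spooler itself) lets two threads perform an unprotected
read-modify-write on a shared counter; the final value is usually less than the expected
2,000,000 because the increments of the two threads interleave, exactly as in the spooler
example. Compile with, for example, gcc -pthread.

#include <stdio.h>
#include <pthread.h>

#define ITERATIONS 1000000

int counter = 0;                       /* shared data, no mutual exclusion */

void *worker(void *arg)
{
    (void)arg;                         /* unused */
    for (int i = 0; i < ITERATIONS; i++)
        counter = counter + 1;         /* read, add, write back: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %d (expected %d)\n", counter, 2 * ITERATIONS);
    return 0;
}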

3.4 Critical Regions

How do we avoid race conditions? The key to preventing trouble here and in many other
situations involving shared memory, shared files, and shared everything else is to find some way
to prohibit more than one process from reading and writing the shared data at the same time. Put
in other words, what we need is mutual exclusion, that is, some way of making sure that if one
process is using a shared variable or file, the other processes will be excluded from doing the
same thing. The difficulty above occurred because process B started using one of the shared
variables before process A was finished with it. The choice of appropriate primitive operations
for achieving mutual exclusion is a major design issue in any operating system, and a subject that
we will examine in great detail in the following sections.

The problem of avoiding race conditions can also be formulated in an abstract way. Part of the
time, a process is busy doing internal computations and other things that do not lead to race
conditions. However, sometimes a process has to access shared memory or files, or do other
critical things that can lead to races. That part of the program where the shared memory is
accessed is called the critical region or critical section. If we could arrange matters such that no
two processes were ever in their critical regions at the same time, we could avoid races.

Although this requirement avoids race conditions, this is not sufficient for having parallel
processes cooperate correctly and efficiently using shared data. We need four conditions to hold
to have a good solution:

1. No two processes may be simultaneously inside their critical regions.

2. No assumptions may be made about speeds or the number of CPUs.

3. No process running outside its critical region may block other processes.

4. No process should have to wait forever to enter its critical region.

In an abstract sense, the behavior that we want is shown in Fig. 3.2. Here process A enters its
critical region at time T1. A little later, at time T2, process B attempts to enter its critical region
but fails because another process is already in its critical region and we allow only one at a time.
Consequently, B is temporarily suspended until time T3 when A leaves its critical region,
allowing B to enter immediately. Eventually B leaves (at T4) and we are back to the original
situation with no processes in their critical regions.

Figure 3.2: Mutual exclusion using critical regions

3.5 Mutual Exclusion with Busy Waiting

In this section we will examine various proposals for achieving mutual exclusion, so that while
one process is busy updating shared memory in its critical region, no other process will enter its
critical region and cause trouble.

Disabling Interrupts

 The simplest solution is to have each process disable all interrupts just after entering its
critical region and re-enable them just before leaving it.
 With interrupts disabled, no clock interrupts can occur.
 The CPU is only switched from process to process as a result of clock or other interrupts,
after all, and with interrupts turned off the CPU will not be switched to another process.
 Thus, once a process has disabled interrupts, it can examine and update the shared
memory without fear that any other process will intervene.
 This approach is generally unattractive because it is unwise to give user processes the
power to turn off interrupts.
 Suppose that one of them did it and never turned them on again? That could be the end of
the system.
 Furthermore if the system is a multiprocessor, with two or more CPUs, disabling
interrupts affects only the CPU that executed the disable instruction. The other ones will
continue running and can access the shared memory.
 On the other hand, it is frequently convenient for the kernel itself to disable interrupts for
a few instructions while it is updating variables or lists.
 If an interrupt occurred while the list of ready processes, for example, was in an
inconsistent state, race conditions could occur.
 The conclusion is: disabling interrupts is often a useful technique within the operating
system itself but is not appropriate as a general mutual exclusion mechanism for user
processes.

Lock Variables

As a second attempt, let us look for a software solution.

 Consider having a single, shared (lock) variable, initially 0. When a process wants to
enter its critical region, it first tests the lock.
 If the lock is 0, the process sets it to 1 and enters the critical region. If the lock is already
1, the process just waits until it becomes 0.
 Thus, a 0 means that no process is in its critical region, and a 1 means that some process
is in its critical region.
 Unfortunately, this idea contains exactly the same fatal flaw that we saw in the spooler
directory.
 Suppose that one process reads the lock and sees that it is 0. Before it can set the lock to
1, another process is scheduled, runs, and sets the lock to 1.
 When the first process runs again, it will also set the lock to 1, and two processes will be
in their critical regions at the same time.

Strict Alternation
A third approach to the mutual exclusion problem is shown in Fig.3.3. This program fragment is
written in C.

 In Fig. 3.3, the integer variable turn, initially 0, keeps track of whose turn it is to enter
the critical region and examine or update the shared memory.
 Initially, process 0 inspects turn, finds it to be 0, and enters its critical region.
 Process 1 also finds it to be 0 and therefore sits in a tight loop continually testing turn to
see when it becomes 1.
 Continuously testing a variable until some value appears is called busy waiting.
 It should usually be avoided, since it wastes CPU time. Only when there is a reasonable
expectation that the wait will be short is busy waiting used.
 A lock that uses busy waiting is called a spin lock.

Figure 3.3: Proposed solution to critical region problem


(a) Process 0. (b) Process 1.
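The code itself is not reproduced in these notes; in outline, the two fragments the figure shows
are the following, where turn is a shared integer initially 0, and critical_region and
noncritical_region stand for the application's own code:

/* Process 0 */
while (TRUE) {
    while (turn != 0) ;        /* busy wait until it is our turn */
    critical_region();
    turn = 1;                  /* hand the turn to process 1 */
    noncritical_region();
}

/* Process 1 */
while (TRUE) {
    while (turn != 1) ;        /* busy wait until it is our turn */
    critical_region();
    turn = 0;                  /* hand the turn back to process 0 */
    noncritical_region();
}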
 When process 0 leaves the critical region, it sets turn to 1, to allow process 1 to enter its
critical region.
 Suppose that process 1 finishes its critical region quickly, so both processes are in their
noncritical regions, with turn set to 0.
 Now process 0 executes its whole loop quickly, exiting its critical region and setting turn
to 1.
 At this point turn is 1 and both processes are executing in their noncritical regions.
 Suddenly, process 0 finishes its noncritical region and goes back to the top of its loop.
 Unfortunately, it is not permitted to enter its critical region now, because turn is 1 and
process 1 is busy with its noncritical region.
 It hangs in its while loop until process 1 sets turn to 0. Put differently, taking turns is not
a good idea when one of the processes is much slower than the other.
 This situation violates condition 3 set out above: process 0 is being blocked by a process
not in its critical region.
 Going back to the spooler directory discussed above, if we now associate the critical
region with reading and writing the spooler directory, process 0 would not be allowed to
print another file because process 1 was doing something else.
 In fact, this solution requires that the two processes strictly alternate in entering their
critical regions, for example, in spooling files. Neither one would be permitted to spool
two in a row. While this algorithm does avoid all races, it is not really a serious candidate
as a solution because it violates condition 3.

3.6 Peterson’s Solution

By combining the idea of taking turns with the idea of lock variables and warning variables, a
Dutch mathematician, T. Dekker, was the first one to devise a software solution to the mutual
exclusion problem that does not require strict alternation. In 1981, G.L. Peterson discovered a
much simpler way to achieve mutual exclusion, thus rendering Dekker‘s solution obsolete.
Peterson‘s algorithm is shown in Fig. 3.4.

Figure 3.4: Peterson's solution for achieving mutual exclusion
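The code of the figure is not reproduced above; in its standard C form, with the variable names
used in the discussion below, it is:

#define FALSE 0
#define TRUE  1
#define N     2                        /* number of processes */

int turn;                              /* whose turn is it? */
int interested[N];                     /* all values initially 0 (FALSE) */

void enter_region(int process)         /* process is 0 or 1 */
{
    int other = 1 - process;           /* number of the other process */
    interested[process] = TRUE;        /* show that you are interested */
    turn = process;                    /* set flag */
    while (turn == process && interested[other] == TRUE)
        ;                              /* busy wait */
}

void leave_region(int process)         /* process: who is leaving */
{
    interested[process] = FALSE;       /* indicate departure from critical region */
}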


 Before using the shared variables (i.e., before entering its critical region), each process
calls enter_region with its own process number, 0 or 1, as parameter.
 This call will cause it to wait, if need be, until it is safe to enter.
 After it has finished with the shared variables, the process calls leave_region to indicate
that it is done and to allow the other process to enter, if it so desires.

Let us see how this solution works.


 Initially neither process is in its critical region. Now process 0 calls enter_region. It
indicates its interest by setting its array element and sets turn to 0.
 Since process 1 is not interested, enter_region returns immediately. If process 1 now calls
enter_region, it will hang there until interested[0] goes to FALSE, an event that only
happens when process 0 calls leave_region to exit the critical region.
 Now consider the case that both processes call enter_region almost simultaneously. Both
will store their process number in turn.
 Whichever store is done last is the one that counts; the first one is overwritten and lost.
Suppose that process 1 stores last, so turn is 1.
 When both processes come to the while statement, process 0 executes it zero times and
enters its critical region. Process 1 loops and does not enter its critical region until
process 0 exits its critical region.

3.7 The TSL Instruction


Now let us look at a proposal that requires a little help from the hardware. Many computers,
especially those designed with multiple processors in mind, have an instruction

TSL RX,LOCK

(Test and Set Lock) that works as follows.

 It reads the contents of the memory word lock into register RX and then stores a nonzero
value at the memory address lock.
 The operations of reading the word and storing into it are guaranteed to be indivisible—
no other processor can access the memory word until the instruction is finished.
 The CPU executing the TSL instruction locks the memory bus to prohibit other CPUs
from accessing memory until it is done.
 To use the TSL instruction, we will use a shared variable, lock, to coordinate access to
shared memory.
 When lock is 0, any process may set it to 1 using the TSL instruction and then read or
write the shared memory.
 When it is done, the process sets lock back to 0 using an ordinary move instruction.
How can this instruction be used to prevent two processes from simultaneously entering their
critical regions? The solution is given in Fig. 3.5, where a four-instruction subroutine in a
fictitious (but typical) assembly language is shown.
 The first instruction copies the old value of lock to the register and then sets lock to 1.
 Then the old value is compared with 0. If it is nonzero, the lock was already set, so the
program just goes back to the beginning and tests it again.
 Sooner or later it will become 0 (when the process currently in its critical region is done
with its critical region), and the subroutine returns, with the lock set.
 Clearing the lock is simple. The program just stores a 0 in lock. No special instructions
are needed.
enter_region:
TSL REGISTER,LOCK | copy lock to register and set lock to 1
CMP REGISTER,#0 | was lock zero?
JNE enter_region | if it was non zero, lock was set, so loop
RET | return to caller; critical region entered

leave_region:
MOVE LOCK,#0 | store a 0 in lock
RET | return to caller
Figure 3.5. Entering and leaving a critical region using the TSL instruction.

One solution to the critical region problem is now straightforward.

 Before entering its critical region, a process calls enter_region, which does busy waiting
until the lock is free; then it acquires the lock and returns.
 After the critical region the process calls leave_region, which stores a 0 in lock.
 As with all solutions based on critical regions, the processes must call enter_region and
leave_region at the correct times for the method to work.
 If a process cheats, the mutual exclusion will fail.
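
Modern C exposes an equivalent of TSL through its atomic operations. The sketch below, assuming
a compiler that supports C11's <stdatomic.h>, is added for illustration and is not part of the
original notes; atomic_flag_test_and_set atomically reads the old value of the flag and sets it,
just like the TSL instruction.

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear means the critical region is free */

void enter_region(void)
{
    while (atomic_flag_test_and_set(&lock))
        ;                              /* busy wait: the flag was already set */
}

void leave_region(void)
{
    atomic_flag_clear(&lock);          /* store a 0 in lock, as in Fig. 3.5 */
}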

3.8 Sleep and Wakeup


Both Peterson’s solution and the solution using TSL are correct, but both have the defect of
requiring busy waiting. In essence, what these solutions do is this: when a process wants to enter
its critical region, it checks to see if the entry is allowed. If it is not, the process just sits in a tight
loop waiting until it is.

Not only does this approach waste CPU time, but it can also have unexpected effects.

Consider a computer with two processes, H, with high priority and L, with low priority. The
scheduling rules are such that H is run whenever it is in ready state. At a certain moment, with L
in its critical region, H becomes ready to run (e.g., an I/O operation completes). H now begins
busy waiting, but since L is never scheduled while H is running, L never gets the chance to leave
its critical region, so H loops forever. This situation is sometimes referred to as the priority
inversion problem.
Now let us look at some interprocess communication primitives that block instead of wasting
CPU time when they are not allowed to enter their critical regions. One of the simplest is the pair
sleep and wakeup.

 Sleep is a system call that causes the caller to block, that is, be suspended until another
process wakes it up.

 The wakeup call has one parameter, the process to be awakened.

 Alternatively, both sleep and wakeup each have one parameter, a memory address used to
match up sleeps with wakeups.

3.8.1 The Producer-Consumer Problem

As an example of how these primitives can be used, let us consider the producer-consumer
problem (also known as the bounded-buffer problem).

 Two processes share a common, fixed-size buffer. One of them, the producer, puts
information into the buffer, and the other one, the consumer, takes it out. (It is also
possible to generalize the problem to have m producers and n consumers, but we will
only consider the case of one producer and one consumer because this assumption
simplifies the solutions).

 Trouble arises when the producer wants to put a new item in the buffer, but it is already
full. The solution is for the producer to go to sleep, to be awakened when the consumer
has removed one or more items.

 Similarly, if the consumer wants to remove an item from the buffer and sees that the
buffer is empty, it goes to sleep until the producer puts something in the buffer and wakes
it up.

This approach sounds simple enough, but it leads to the same kinds of race conditions we
saw earlier with the spooler directory.

 To keep track of the number of items in the buffer, we will need a variable, count. If the
maximum number of items the buffer can hold is N, the producer‘s code will first test to
see if count is N. If it is, the producer will go to sleep; if it is not, the producer will add an
item and increment count.

 The consumer‘s code is similar: first test count to see if it is 0. If it is, go to sleep, if it is
nonzero, remove an item and decrement the counter. Each of the processes also tests to
see if the other should be awakened, and if so, wakes it up. The code for both producer
and consumer is shown in Fig. 3.6.
#define N 100                                 /* number of slots in the buffer */
int count = 0;                                /* number of items in the buffer */

void producer(void)
{
    int item;

    while (TRUE) {                            /* repeat forever */
        item = produce_item();                /* generate next item */
        if (count == N) sleep();              /* if buffer is full, go to sleep */
        insert_item(item);                    /* put item in buffer */
        count = count + 1;                    /* increment count of items in buffer */
        if (count == 1) wakeup(consumer);     /* was buffer empty? */
    }
}

void consumer(void)
{
    int item;

    while (TRUE) {                            /* repeat forever */
        if (count == 0) sleep();              /* if buffer is empty, go to sleep */
        item = remove_item();                 /* take item out of buffer */
        count = count - 1;                    /* decrement count of items in buffer */
        if (count == N - 1) wakeup(producer); /* was buffer full? */
        consume_item(item);                   /* print item */
    }
}
Figure 3.6. The producer-consumer problem with a fatal race condition.

 The procedures insert_item and remove_item, which are not shown, handle the
bookkeeping of putting items into the buffer and taking items out of the buffer.

Now let us get back to the race condition.


 It can occur because access to count is unconstrained.
 The following situation could possibly occur.
 The buffer is empty and the consumer has just read count to see if it is 0.
 At that instant, the scheduler decides to stop running the consumer temporarily and start
running the producer.
 The producer inserts an item in the buffer, increments count, and notices that it is now 1.
 Reasoning that count was just 0, and thus the consumer must be sleeping, the producer
calls wakeup to wake the consumer up.
 Unfortunately, the consumer is not yet logically asleep, so the wakeup signal is lost.
 When the consumer next runs, it will test the value of count it previously read, find it to
be 0, and go to sleep.
 Sooner or later the producer will fill up the buffer and also go to sleep. Both will sleep
forever.
 The essence of the problem here is that a wakeup sent to a process that is not (yet)
sleeping is lost.
 If it were not lost, everything would work. A quick fix is to modify the rules to add a
wakeup waiting bit to the picture. When a wakeup is sent to a process that is still awake,
this bit is set. Later, when the process tries to go to sleep, if the wakeup waiting bit is on,
it will be turned off, but the process will stay awake. The wakeup waiting bit is a piggy
bank for wakeup signals.

3.9 Semaphores

This was the situation in 1965, when E. W. Dijkstra (1965) suggested using an integer variable to
count the number of wakeups saved for future use. In his proposal, a new variable type, called a
semaphore, was introduced. A semaphore could have the value 0, indicating that no wakeups
were saved, or some positive value if one or more wakeups were pending.

Dijkstra proposed having two operations, down and up (generalizations of sleep and wakeup,
respectively).
 The down operation on a semaphore checks to see if the value is greater than 0. If so, it
decrements the value (i.e., uses up one stored wakeup) and just continues. If the value is
0, the process is put to sleep without completing the down for the moment.

 The up operation increments the value of the semaphore addressed. If one or more
processes were sleeping on that semaphore, unable to complete an earlier down
operation, one of them is chosen by the system (e.g., at random) and is allowed to
complete its down.

 Thus, after an up on a semaphore with processes sleeping on it, the semaphore will still
be 0, but there will be one fewer process sleeping on it.

 The operation of incrementing the semaphore and waking up one process is also
indivisible.

 No process ever blocks doing an up, just as no process ever blocks doing a wakeup in the
earlier model.
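
On POSIX systems, counting semaphores with exactly these semantics are available directly: down
corresponds to sem_wait and up to sem_post. A tiny, self-contained illustration (added here, not
part of the original notes; on some systems it must be linked with -pthread):

#include <stdio.h>
#include <semaphore.h>

int main(void)
{
    sem_t s;
    int value;

    sem_init(&s, 0, 2);              /* semaphore starts with two stored wakeups */
    sem_wait(&s);                    /* down: consumes one wakeup, does not block here */
    sem_post(&s);                    /* up: puts a wakeup back */
    sem_getvalue(&s, &value);
    printf("value = %d\n", value);   /* prints 2 */
    sem_destroy(&s);
    return 0;
}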

3.9.1 Solving the Producer-Consumer Problem using Semaphores

Semaphores solve the lost-wakeup problem, as shown in Fig.3.7.

#define N 100                       /* number of slots in the buffer */

typedef int semaphore;              /* semaphores are a special kind of int */

semaphore mutex = 1;                /* controls access to critical region */
semaphore empty = N;                /* counts empty buffer slots */
semaphore full = 0;                 /* counts full buffer slots */

void producer(void)
{
    int item;

    while (TRUE) {                  /* TRUE is the constant 1 */
        item = produce_item();      /* generate something to put in buffer */
        down(&empty);               /* decrement empty count */
        down(&mutex);               /* enter critical region */
        insert_item(item);          /* put new item in buffer */
        up(&mutex);                 /* leave critical region */
        up(&full);                  /* increment count of full slots */
    }
}

void consumer(void)
{
    int item;

    while (TRUE) {                  /* infinite loop */
        down(&full);                /* decrement full count */
        down(&mutex);               /* enter critical region */
        item = remove_item();       /* take item from buffer */
        up(&mutex);                 /* leave critical region */
        up(&empty);                 /* increment count of empty slots */
        consume_item(item);         /* do something with the item */
    }
}
Figure 3.7. The producer-consumer problem using semaphores.

 This solution uses three semaphores: one called full for counting the number of slots that
are full, one called empty for counting the number of slots that are empty, and one called
mutex to make sure the producer and consumer do not access the buffer at the same time.
 Full is initially 0, empty is initially equal to the number of slots in the buffer, and mutex is
initially 1.
 Semaphores that are initialized to 1 and used by two or more processes to ensure that
only one of them can enter its critical region at the same time are called binary
semaphores.
 If each process does a down just before entering its critical region and an up just after
leaving it, mutual exclusion is guaranteed.

 In the example of Fig. 3.7, we have actually used semaphores in two different ways. This
difference is important enough to make explicit.
 The mutex semaphore is used for mutual exclusion. It is designed to guarantee that only
one process at a time will be reading or writing the buffer and the associated variables.
This mutual exclusion is required to prevent chaos.
 The other use of semaphores is for synchronization. The full and empty semaphores are
needed to guarantee that certain event sequences do or do not occur. In this case, they
ensure that the producer stops running when the buffer is full, and the consumer stops
running when it is empty. This use is different from mutual exclusion.

3.11 Monitors

A monitor is a collection of procedures, variables, and data structures that are all grouped
together in a special kind of module or package. Processes may call the procedures in a monitor
whenever they want to, but they cannot directly access the monitor‘s internal data structures
from procedures declared outside the monitor. Figure 3.8 illustrates a monitor written in an
imaginary language, Pidgin Pascal.
monitor example
    integer i;
    condition c;

    procedure producer( );
    . . .
    end;

    procedure consumer( );
    . . .
    end;
end monitor;
Figure 3.8. A monitor
 Monitors have an important property that makes them useful for achieving mutual
exclusion: only one process can be active in a monitor at any instant.
 Monitors are a programming language construct, so the compiler knows they are special
and can handle calls to monitor procedures differently from other procedure calls.
 Typically, when a process calls a monitor procedure, the first few instructions of the
procedure will check to see, if any other process is currently active within the monitor.
 If so, the calling process will be suspended until the other process has left the monitor. If
no other process is using the monitor, the calling process may enter.
 It is up to the compiler to implement the mutual exclusion on monitor entries, but a
common way is to use a mutex or binary semaphore. Because the compiler, not the
programmer, is arranging for the mutual exclusion, it is much less likely that something
will go wrong.
 In any event, the person writing the monitor does not have to be aware of how the
compiler arranges for mutual exclusion. It is sufficient to know that by turning all the
critical regions into monitor procedures, no two processes will ever execute their critical
regions at the same time.
 Although monitors provide an easy way to achieve mutual exclusion, as we have seen
above, that is not enough. We also need a way for processes to block when they cannot
proceed. In the producer-consumer problem, it is easy enough to put all the tests for
buffer-full and buffer-empty in monitor procedures, but how should the producer block
when it finds the buffer full?
 The solution lies in the introduction of condition variables, along with two operations on
them, wait and signal. When a monitor procedure discovers that it cannot continue (e.g.,
the producer finds the buffer full), it does a wait on some condition variable, say, full.
This action causes the calling process to block. It also allows another process that had
been previously prohibited from entering the monitor to enter now.
 Condition variables are not counters. They do not accumulate signals for later use the
way semaphores do. Thus if a condition variable is signaled with no one waiting on it, the
signal is lost forever. In other words, the wait must come before the signal. This rule
makes the implementation much simpler. In practice it is not a problem because it is easy
to keep track of the state of each process with variables, if need be. A process that might
otherwise do a signal can see that this operation is not necessary by looking at the
variables.
 A skeleton of the producer-consumer problem with monitors is given in Fig. 3.9 in an
imaginary language, Pidgin Pascal.

monitor ProducerConsumer
    condition full, empty;
    integer count;

    procedure insert(item: integer);
    begin
        if count = N then wait(full);
        insert_item(item);
        count := count + 1;
        if count = 1 then signal(empty)
    end;

    function remove: integer;
    begin
        if count = 0 then wait(empty);
        remove = remove_item;
        count := count - 1;
        if count = N - 1 then signal(full)
    end;

    count := 0;
end monitor;

procedure producer;
begin
    while true do
    begin
        item = produce_item;
        ProducerConsumer.insert(item)
    end
end;

procedure consumer;
begin
    while true do
    begin
        item = ProducerConsumer.remove;
        consume_item(item)
    end
end;

Figure 3.9. An outline of the producer-consumer problem with monitors. Only one monitor
procedure at a time is active. The buffer has N slots.
 You may be thinking that the operations wait and signal look similar to sleep and
wakeup, which we saw earlier had fatal race conditions.
 They are very similar, but with one crucial difference: sleep and wakeup failed because
while one process was trying to go to sleep, the other one was trying to wake it up. With
monitors, that cannot happen.
 The automatic mutual exclusion on monitor procedures guarantees that if, say, the
producer inside a monitor procedure discovers that the buffer is full, it will be able to
complete the wait operation without having to worry about the possibility that the
scheduler may switch to the consumer just before the wait completes.
 The consumer will not even be let into the monitor at all until the wait is finished and the
producer has been marked as no longer runnable.
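
C has no monitor construct, but the same structure is commonly written with a POSIX mutex and
condition variables, where the explicit lock plays the role the compiler plays in a real monitor.
The sketch below is an added illustration (the names and buffer layout are chosen here, not taken
from the notes):

#include <pthread.h>

#define N 100

static int buffer[N], count = 0, in = 0, out = 0;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void insert(int item)
{
    pthread_mutex_lock(&lock);               /* enter the "monitor" */
    while (count == N)
        pthread_cond_wait(&not_full, &lock); /* like wait(full): releases the lock while asleep */
    buffer[in] = item;
    in = (in + 1) % N;
    count++;
    pthread_cond_signal(&not_empty);         /* like signal(empty) */
    pthread_mutex_unlock(&lock);             /* leave the "monitor" */
}

int remove_item(void)
{
    int item;

    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    item = buffer[out];
    out = (out + 1) % N;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return item;
}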

3.12 Message Passing

This method of interprocess communication uses two primitives, send and receive, which, like
semaphores and unlike monitors, are system calls rather than language constructs. As such, they
can easily be put into library procedures, such as

send(destination, &message);
and
receive(source, &message);

The former call sends a message to a given destination and the latter one receives a message
from a given source (or from ANY, if the receiver does not care). If no message is available, the
receiver can block until one arrives. Alternatively, it can return immediately with an error code.

Design Issues for Message Passing Systems

Message passing systems have many challenging problems and design issues that do not arise
with semaphores or monitors, especially if the communicating processes are on different
machines connected by a network. For example, messages can be lost by the network. To guard
against lost messages, the sender and receiver can agree that as soon as a message has been
received, the receiver will send back a special acknowledgement message. If the sender has not
received the acknowledgement within a certain time interval, it retransmits the message.

Now consider what happens if the message itself is received correctly, but the acknowledgement
is lost. The sender will retransmit the message, so the receiver will get it twice. It is essential that
the receiver be able to distinguish a new message from the retransmission of an old one. Usually,
this problem is solved by putting consecutive sequence numbers in each original message. If the
receiver gets a message bearing the same sequence number as the previous message, it knows
that the message is a duplicate that can be ignored. Successfully communicating in the face of
unreliable message passing is a major part of the study of computer networks.

Message systems also have to deal with the question of how processes are named, so that the
process specified in a send or receive call is unambiguous. Authentication is also an issue in
message systems: how can the client tell that he is communicating with the real file server, and
not with an imposter?
At the other end of the spectrum, there are also design issues that are important when the sender
and receiver are on the same machine. One of these is performance. Copying messages from one
process to another is always slower than doing a semaphore operation or entering a monitor.
Much work has gone into making message passing efficient.

The Producer-Consumer Problem with Message Passing

Now let us see how the producer-consumer problem can be solved with message passing and no
shared memory. A solution is given in Fig. 3.10. We assume that all messages are the same size
and that messages sent but not yet received are buffered automatically by the operating system.
In this solution, a total of N messages is used, analogous to the N slots in a shared memory
buffer. The consumer starts out by sending N empty messages to the producer. Whenever the
producer has an item to give to the consumer, it takes an empty message and sends back a full
one. In this way, the total number of messages in the system remains constant in time, so they
can be stored in a given amount of memory known in advance.

If the producer works faster than the consumer, all the messages will end up full, waiting for the
consumer: the producer will be blocked, waiting for an empty to come back. If the consumer
works faster, then the reverse happens: all the messages will be empties waiting for the producer
to fill them up: the consumer will be blocked, waiting for a full message.

#define N 100 /* number of slots in the buffer */


void producer(void)
{
int item;
message m; /* message buffer */
while (TRUE) {
item = produce_item( ); /* generate something to put in buffer */
receive(consumer, &m); /* wait for an empty to arrive */
build_message (&m, item); /* construct a message to send */
send(consumer, &m); /* send item to consumer */
}
}
void consumer(void)
{
int item, i;
message m;
for (i = 0; i < N; i++) send(producer, &m); /* send N empties */
while (TRUE) {
receive(producer, &m); /* get message containing item */
item = extract_item(&m); /* extract item from message */
send(producer, &m); /* send back empty reply */
consume_item(item); /* do something with the item */
}
}

Figure 3.10. The producer-consumer problem with N messages.


Many variants are possible with message passing. For starters, let us look at how messages are
addressed. One way is to assign each process a unique address and have messages be addressed
to processes. A different way is to invent a new data structure, called a mailbox. A mailbox is a
place to buffer a certain number of messages, typically specified when the mailbox is created.
When mailboxes are used, the address parameters, in the send and receive calls, are mailboxes,
not processes. When a process tries to send to a mailbox that is full, it is suspended until a
message is removed from that mailbox, making room for a new one.

For the producer-consumer problem, both the producer and consumer would create mailboxes
large enough to hold N messages. The producer would send messages containing data to the
consumer‘s mailbox, and the consumer would send empty messages to the producer‘s mailbox.
When mailboxes are used, the buffering mechanism is clear: the destination mailbox holds
messages that have been sent to the destination process but have not yet been accepted.

The other extreme from having mailboxes is to eliminate all buffering. When this approach is
followed, if the send is done before the receive, the sending process is blocked until the receive
happens, at which time the message can be copied directly from the sender to the receiver, with
no intermediate buffering. Similarly, if the receive is done first, the receiver is blocked until a
send happens. This strategy is often known as a rendezvous. It is easier to implement than a
buffered message scheme but is less flexible since the sender and receiver are forced to run in
lockstep.

Message passing is commonly used in parallel programming systems. One well-known message-
passing system, for example, is MPI (Message-Passing Interface). It is widely used for
scientific computing.
CHAPTER 4
DEADLOCKS

4.1 Introduction

Computer systems are full of resources that can only be used by one process at a time. Common
examples include printers, tape drives, and slots in the system‘s internal tables. Having two
processes simultaneously writing to the printer leads to gibberish. Having two processes using
the same file system table slot will invariably lead to a corrupted file system. Consequently, all
operating systems have the ability to (temporarily) grant a process exclusive access to certain
resources.

For many applications, a process needs exclusive access to not one resource, but several.
Suppose, for example, two processes each want to record a scanned document on a CD. Process
A requests permission to use the scanner and is granted it. Process B is programmed differently
and requests the CD recorder first and is also granted it. Now A asks for the CD recorder, but the
request is denied until B releases it. Unfortunately, instead of releasing the CD recorder B asks
for the scanner. At this point both processes are blocked and will remain so forever. This
situation is called a deadlock.

Deadlocks can occur in a variety of situations besides requesting dedicated I/O devices. In a
database system, for example, a program may have to lock several records it is using, to avoid
race conditions. If process A locks record R1 and process B locks record R2, and then each
process tries to lock the other one‘s record, we also have a deadlock. Thus deadlocks can occur
on hardware resources or on software resources.

4.2 Resources

A resource can be a hardware device (e.g., a tape drive) or a piece of information (e.g., a locked
record in a database). A computer will normally have many different resources that can be
acquired. For some resources, several identical instances may be available, such as three tape
drives. When several copies of a resource are available, any one of them can be used to satisfy
any request for the resource. In short, a resource is anything that can be used by only a single
process at any instant of time.

4.2.1 Preemptable and Nonpreemptable Resources

Resources come in two types: preemptable and nonpreemptable.

A preemptable resource is one that can be taken away from the process owning it with no ill
effects. Memory is an example of a preemptable resource.

A nonpreemptable resource, in contrast, is one that cannot be taken away from its current
owner without causing the computation to fail. If a process has begun to burn a CD-ROM,
suddenly taking the CD recorder away from it and giving it to another process will result in a
garbled CD. Hence CD recorders are not preemptable at an arbitrary moment.
In general, deadlocks involve nonpreemptable resources. Potential deadlocks that involve
preemptable resources can usually be resolved by reallocating resources from one process to
another. Thus our treatment will focus on nonpreemptable resources.

The sequence of events required to use a resource is given below in an abstract form.
1. Request the resource.
2. Use the resource.
3. Release the resource.

If the resource is not available when it is requested, the requesting process is forced to wait. In
some operating systems, the process is automatically blocked when a resource request fails, and
awakened when it becomes available. In other systems, the request fails with an error code, and
it is up to the calling process to wait a little while and try again.

A process whose resource request has just been denied will normally sit in a tight loop requesting
the resource, then sleeping, then trying again. Although this process is not blocked, for all intents
and purposes, it is as good as blocked, because it cannot do any useful work. In our further
treatment, we will assume that when a process is denied a resource request, it is put to sleep.

The exact nature of requesting a resource is highly system dependent. In some systems, a request
system call is provided to allow processes to explicitly ask for resources. In others, the only
resources that the operating system knows about are special files that only one process can have
open at a time. These are opened by the usual open call. If the file is already in use, the caller is
blocked until its current owner closes it.

4.3 Introduction to Deadlock

Definition: A set of processes is in a deadlock state if each process in the set is waiting for an
event that can be caused by only another process in the set.

In other words, each member of the set of deadlocked processes is waiting for a resource that can
be released only by another deadlocked process. None of the processes can run, none of them can release
any resources and none of them can be awakened. It is important to note that the number of
processes and the number and kind of resources possessed and requested are unimportant.

Let us understand the deadlock situation with the help of examples.

Example 1: The simplest example of deadlock is where process 1 has been allocated a non-
shareable resource A, say, a tape drive, and process 2 has been allocated a non-sharable resource
B, say, a printer. Now, if it turns out that process 1 needs resource B (printer) to proceed and
process 2 needs resource A (the tape drive) to proceed and these are the only two processes in the
system, each has blocked the other and all useful work in the system stops. This situation is
termed as deadlock.

The system is in deadlock state because each process holds a resource being requested by the
other process and neither process is willing to release the resource it holds.
Example 2: Consider a system with three disk drives. Suppose there are three processes, each is
holding one of these three disk drives. If each process now requests another disk drive, three
processes will be in a deadlock state, because each process is waiting for the event ―disk drive is
released‖, which can only be caused by one of the other waiting processes. Deadlock state
involves processes competing not only for the same resource type, but also for different resource
types.

4.4 Conditions for Deadlock to Occur

Coffman (1971) identified four necessary conditions that must hold simultaneously for a
deadlock to occur.
4.4.1 Mutual Exclusion Condition
The resources involved are non-shareable. At least one resource must be held in a non-shareable
mode, that is, only one process at a time claims exclusive control of the resource. If another
process requests that resource, the requesting process must be delayed until the resource has been
released.

4.4.2 Hold and Wait Condition


In this condition, a requesting process already holds resources while waiting for additional
resources. A process holding a resource allocated to it waits for additional resource(s) that
is/are currently being held by other processes.

4.4.3 No-Preemptive Condition


Resources already allocated to a process cannot be preempted. Resources cannot be removed
forcibly from the processes. After completion, they will be released voluntarily by the process
holding it.

4.4.4 Circular Wait Condition


The processes in the system form a circular list or chain where each process in the list is waiting
for a resource held by the next process in the list.

Let us understand this by a common example. Consider the traffic deadlock shown in the Figure
1.

Consider each section of the street as a resource. In this situation:

• Mutual exclusion condition applies, since only one vehicle can be on a section of the street at a
time.

• Hold-and-wait condition applies, since each vehicle is occupying a section of the street, and
waiting to move on to the next section of the street.
• Non-preemptive condition applies, since a section of the street that is occupied by a vehicle
cannot be taken away from it.

• Circular wait condition applies, since each vehicle is waiting for the next vehicle to move. That
is, each vehicle in the traffic is waiting for a section of the street held by the next vehicle in the
traffic.

The simple rule to avoid traffic deadlock is that a vehicle should only enter an intersection if it is
assured that it will not have to stop inside the intersection.

It is not possible to have a deadlock involving only one single process. The deadlock involves a
circular ―hold-and-wait‖ condition between two or more processes, so ―one‖ process cannot hold
a resource, yet be waiting for another resource that it is holding. In addition, deadlock is not
possible between two threads in a process, because it is the process that holds resources, not the
thread, that is, each thread has access to the resources held by the process.

4.5 Resource Allocation Graph

The idea behind the resource allocation graph is to have a graph which has two different types of
nodes, the process nodes and resource nodes (process represented by circles, resource node
represented by rectangles). For different instances of a resource, there is a dot in the resource
node rectangle. For example, if there are two identical printers, the printer resource might have
two dots to indicate that we don‘t really care which is used, as long as we acquire the resource.
The edges among these nodes represent resource allocation and release. Edges are directed, and
if the edge goes from resource to process node that means the process has acquired the resource.
If the edge goes from process node to resource node that means the process has requested the
resource.
We can use these graphs to determine if a deadlock has occurred or may occur. If, for example, all
resources have only one instance (all resource node rectangles have one dot) and the graph is
circular, then a deadlock has occurred. If on the other hand some resources have several
instances, then a deadlock may occur. If the graph is not circular, a deadlock cannot occur (the
circular wait condition wouldn‘t be satisfied).
The following are the tips which will help you to check the graph easily to predict the presence
of cycles.

• If no cycle exists in the resource allocation graph, there is no deadlock.

• If there is a cycle in the graph and each resource has only one instance, then there is a deadlock.
In this case, a cycle is a necessary and sufficient condition for deadlock.

• If there is a cycle in the graph, and each resource has more than one instance, there may or may
not be a deadlock. (A cycle may be broken if some process outside the cycle has a resource
instance that can break the cycle). Therefore, a cycle in the resource allocation graph is a
necessary but not sufficient condition for deadlock, when multiple resource instances are
considered.

Figure 4.2: Resource Allocation Graphs


Figure 4.3: Resource Allocation Graph Showing Deadlock

The above graph shown in Figure 4.3 has a cycle and is in Deadlock.

R1 P1 P1 R2
R2 P2 P2 R1

Figure 4.4: Resource Allocation Graph having a cycle and not in a Deadlock

The above graph shown in Figure 4.4 has a cycle and is not in Deadlock.
(Resource 1 has one instance shown by a star)
(Resource 2 has two instances a and b, shown as two stars)

R1 P1 P1 R2 (a)


R2 (b) P2 P2 R1

If P1 finishes, P2 can get R1 and finish, so there is no Deadlock.
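The cycle test described above is straightforward to mechanize for single-instance resources. The
following C sketch (the node numbering, adjacency matrix and sample edges are illustrative
assumptions, not taken from the figures) runs a depth-first search over a small resource-allocation
graph shaped like Figure 4.3 and reports whether a cycle, and hence a deadlock, exists.

#include <stdio.h>

#define NODES 4              /* illustrative graph: P1, P2, R1, R2 */

/* adj[i][j] = 1 means an edge i -> j (request or assignment edge) */
int adj[NODES][NODES];
int visited[NODES], on_stack[NODES];

/* DFS that reports a cycle as soon as it meets a node already on the current path */
int has_cycle(int u)
{
    visited[u] = on_stack[u] = 1;
    for (int v = 0; v < NODES; v++) {
        if (!adj[u][v]) continue;
        if (on_stack[v]) return 1;              /* back edge -> cycle */
        if (!visited[v] && has_cycle(v)) return 1;
    }
    on_stack[u] = 0;
    return 0;
}

int main(void)
{
    /* nodes 0,1 = P1,P2; nodes 2,3 = R1,R2 (same shape as Figure 4.3) */
    adj[2][0] = 1;   /* R1 -> P1 : R1 assigned to P1 */
    adj[0][3] = 1;   /* P1 -> R2 : P1 requests R2    */
    adj[3][1] = 1;   /* R2 -> P2 : R2 assigned to P2 */
    adj[1][2] = 1;   /* P2 -> R1 : P2 requests R1    */

    int deadlock = 0;
    for (int i = 0; i < NODES && !deadlock; i++)
        if (!visited[i]) deadlock = has_cycle(i);
    printf(deadlock ? "cycle found: deadlock\n" : "no cycle: no deadlock\n");
    return 0;
}

For graphs in which some resources have multiple instances, a cycle found this way only signals a
possible deadlock, as noted in the tips above.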


4.6 Deadlock Prevention

Havender in his pioneering work showed that since all four of the conditions are necessary for
deadlock to occur, it follows that deadlock might be prevented by denying any one of the
conditions. Let us study Havender‘s algorithm.

Havender’s Algorithm

Elimination of “Mutual Exclusion” Condition


The mutual exclusion condition must hold for non-shareable resources. That is, several processes
cannot simultaneously share a single resource. This condition is difficult to eliminate because
some resources, such as the tape drive and printer, are inherently non-shareable. Note that
shareable resources like read-only-file do not require mutually exclusive access and thus cannot
be involved in deadlock.

Elimination of “Hold and Wait” Condition


There are two possibilities for the elimination of the second condition. The first alternative is that
a process request be granted all the resources it needs at once, prior to execution. The second
alternative is to disallow a process from requesting resources whenever it has previously
allocated resources. This strategy requires that all the resources a process will need must be
requested at once. The system must grant resources on ―all or none‖ basis. If the complete set of
resources needed by a process is not currently available, then the process must wait until the
complete set is available. While the process waits, however, it may not hold any resources. Thus
the ―wait for‖ condition is denied and deadlocks simply cannot occur. This strategy can lead to
serious waste of resources.

For example, a program requiring ten tape drives must request and receive all ten drives before it
begins executing. If the program needs only one tape drive to begin execution and then does not
need the remaining tape drives for several hours, then substantial computer resources (9 tape
drives) will sit idle for several hours. This strategy can cause indefinite postponement
(starvation), since not all the required resources may become available at once.

Elimination of “No-preemption” Condition


The no-preemption condition can be alleviated by forcing a process waiting for a resource that
cannot immediately be allocated, to relinquish all of its currently held resources, so that other
processes may use them to finish their needs. Suppose a system does allow processes to hold
resources while requesting additional resources. Consider what happens when a request cannot
be satisfied. A process holds resources a second process may need in order to proceed, while the
second process may hold the resources needed by the first process. This is a deadlock. This
strategy requires that when a process that is holding some resources is denied a request for
additional resources, the process must release its held resources and, if necessary, request them
again together with additional resources. Implementation of this strategy denies the ―no-
preemptive‖ condition effectively.

High Cost
When a process releases resources, the process may lose all its work to that point. One serious
consequence of this strategy is the possibility of indefinite postponement (starvation). A process
might be held off indefinitely as it repeatedly requests and releases the same resources.
Elimination of “Circular Wait” Condition
The last condition, the circular wait, can be denied by imposing a total ordering on all of the
resource types and then forcing all processes to request the resources in order (increasing or
decreasing). This strategy imposes a total ordering of all resource types, and requires that each
process requests resources in a numerical order of enumeration. With this rule, the resource
allocation graph can never have a cycle.

For example, provide a global numbering of all the resources, as shown in the given Table 1:
Table 1: Numbering the resources
Number Resource
1 Floppy drive
2 Printer
3 Plotter
4 Tape Drive
5 CD Drive
Now we will see the rule for this:
Rule: Processes can request resources whenever they want to, but all requests must be made in
numerical order. A process may request first printer and then a tape drive (order: 2, 4), but it may
not request first a plotter and then a printer (order: 3, 2). The problem with this strategy is that it
may be impossible to find an ordering that satisfies everyone.
This strategy, if adopted, may result in low resource utilization and in some cases starvation is
possible too.
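In program code, the ordered-request rule amounts to always acquiring locks in increasing
resource number. The sketch below is a minimal pthreads illustration (the mutex names and the
use of the numbers of Table 1 are assumptions for the example): every thread locks the printer
(resource 2) before the tape drive (resource 4), so a circular wait can never arise between them.

#include <pthread.h>

/* Global numbering: 2 = printer, 4 = tape drive (as in Table 1). Every thread
 * that needs both must lock them in increasing order of their numbers. */
pthread_mutex_t printer    = PTHREAD_MUTEX_INITIALIZER;   /* resource #2 */
pthread_mutex_t tape_drive = PTHREAD_MUTEX_INITIALIZER;   /* resource #4 */

void *job(void *arg)
{
    pthread_mutex_lock(&printer);      /* resource #2 first ...            */
    pthread_mutex_lock(&tape_drive);   /* ... then #4, never the reverse   */
    /* use both resources here */
    pthread_mutex_unlock(&tape_drive);
    pthread_mutex_unlock(&printer);
    return arg;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, job, NULL);
    pthread_create(&t2, NULL, job, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}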

4.7 Deadlock Avoidance

Deadlock Avoidance requires that the system has some additional a priori information
available.
 Simplest and most useful model requires that each process declare the maximum
number of resources of each type that it may need
 The deadlock-avoidance algorithm dynamically examines the resource-allocation
state to ensure that there can never be a circular-wait condition
 Resource-allocation state is defined by the number of available and allocated
resources, and the maximum demands of the processes

• Safe state: Such a state occurs when the system can allocate resources to each process (up to
its maximum) in some order and avoid a deadlock. This state will be characterized by a safe
sequence. It must be mentioned here that we should not falsely conclude that all unsafe states are
deadlocked, although an unsafe state may eventually lead to a deadlock.

• Unsafe State: If the system did not follow the safe sequence of resource allocation from the
beginning and it is now in a situation, which may lead to a deadlock, then it is in an unsafe state.

• Deadlock State: If the system has some circular wait condition existing for some processes,
then it is in deadlock state.

When a process requests an available resource, system must decide if immediate allocation
leaves the system in a safe state
The system is in a safe state if there exists a sequence <P1, P2, …, Pn> of ALL the processes in the
system such that, for each Pi, the resources that Pi can still request can be satisfied by the currently
available resources + the resources held by all the Pj, with j < i

That is:
 If Pi resource needs are not immediately available, then Pi can wait until all Pj have
finished
 When Pj is finished, Pi can obtain needed resources, execute, return allocated resources,
and terminate
 When Pi terminates, Pi +1 can obtain its needed resources, and so on

From this, we can draw the following conclusions:


 If a system is in safe state → no deadlocks
 If a system is in unsafe state → possibility of deadlock
 Avoidance → ensure that a system will never enter an unsafe state.

4.7.1 Avoidance algorithms

1. Single instance of a resource type


 Use a resource-allocation graph
2. Multiple instances of a resource type
 Use the banker‘s algorithm

4.7.1.1 Resource-Allocation Graph Scheme

 Claim edge Pi → Rj indicates that process Pi may request resource Rj; it is
represented by a dashed line
 Claim edge converts to request edge when a process requests a resource
 Request edge is converted to an assignment edge when the resource is allocated to
the process
 When a resource is released by a process, assignment edge is reconverted to a claim
edge
 Resources must be claimed a priori in the system

(a)Resource-Allocation Graph

(b)Unsafe State In Resource-Allocation Graph

Resource-Allocation Graph Algorithm

 Suppose that process Pi requests a resource Rj


 The request can be granted only if converting the request edge to an assignment edge
does not result in the formation of a cycle in the resource allocation graph

4.7.1.2 Banker’s Algorithm

The most famous deadlock avoidance algorithm, from Dijkstra [1965], is the Banker‘s algorithm.
It is named the Banker‘s algorithm because the process is analogous to that used by a banker in
deciding whether a loan can be safely made or not.
The Banker‘s Algorithm is based on the banking system, which never allocates its available cash
in such a manner that it can no longer satisfy the needs of all its customers. Here we must have
the advance knowledge of the maximum possible claims for each process, which is limited by
the resource availability. During the run of the system we should keep monitoring the resource
allocation status to ensure that no circular wait condition can exist.

If the necessary conditions for a deadlock are in place, it is still possible to avoid deadlock by
being careful when resources are allocated. The following are the features that are to be
considered for avoidance of deadlock as per the Banker‘s Algorithm.

 Each process declares maximum number of resources of each type that it may need.
 Keep the system in a safe state in which we can allocate resources to each process in
some order and avoid deadlock.
 Check for the safe state by finding a safe sequence: <P1, P2, ..., Pn> where resources that
Pi needs can be satisfied by available resources plus resources held by Pj where j < i.

Data Structures for the Banker’s Algorithm


Let n = number of processes, and m = number of resources types.

 Available: Vector of length m. If available [j] = k, there are k instances of resource


type Rj available

 Max: n x m matrix. If Max [i,j] = k, then process Pi may request at most k instances
of resource type Rj

 Allocation: n x m matrix. If Allocation[i,j] = k then Pi is currently allocated k


instances of Rj

 Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of Rj to


complete its task

Need [i,j] = Max[i,j] – Allocation [i,j]

Safety Algorithm

1. Let Work and Finish be vectors of length m and n, respectively. Initialize:


Work = Available
Finish [i] = false for i = 0, 1, …, n- 1

2. Find an i such that both:


(a) Finish [i] = false
(b) Needi ≤ Work
If no such i exists, go to step 4

3. Work = Work + Allocationi


Finish[i] = true
go to step 2
4. If Finish [i] == true for all i, then the system is in a safe state

Resource-Request Algorithm for Process Pi

Request = request vector for process Pi. If Requesti [j] = k then process Pi wants k instances
of resource type Rj
1. If Requesti ≤ Needi go to step 2. Otherwise, raise error condition, since process has
exceeded its maximum claim
2. If Requesti ≤ Available, go to step 3. Otherwise Pi must wait, since resources are not
available
3. Pretend to allocate requested resources to Pi by modifying the state as follows:
Available = Available – Requesti;
Allocationi = Allocationi + Requesti;
Needi = Needi – Requesti;
 If the resulting state is safe → the resources are allocated to Pi
 If unsafe → Pi must wait, and the old resource-allocation state is restored
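The safety algorithm above translates almost line for line into C. The sketch below is a minimal,
illustrative implementation (the function name is_safe and the array layout are my own choices);
it uses the data of worked example 1 given later in this section, and the resource-request
algorithm would call the same check on a tentatively updated copy of Available, Allocation and
Need.

#include <stdio.h>
#include <stdbool.h>

#define N 5   /* processes      */
#define M 4   /* resource types */

/* Returns true and fills seq[] with a safe sequence if one exists. */
bool is_safe(int avail[M], int max[N][M], int alloc[N][M], int seq[N])
{
    int need[N][M], work[M];
    bool finish[N] = { false };

    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            need[i][j] = max[i][j] - alloc[i][j];      /* Need = Max - Allocation */
    for (int j = 0; j < M; j++)
        work[j] = avail[j];                            /* Work = Available */

    for (int count = 0; count < N; ) {
        bool found = false;
        for (int i = 0; i < N; i++) {
            if (finish[i]) continue;
            bool ok = true;
            for (int j = 0; j < M; j++)
                if (need[i][j] > work[j]) { ok = false; break; }
            if (ok) {                                   /* Need_i <= Work */
                for (int j = 0; j < M; j++)
                    work[j] += alloc[i][j];             /* Work = Work + Allocation_i */
                finish[i] = true;
                seq[count++] = i;
                found = true;
            }
        }
        if (!found) return false;                       /* some process can never finish */
    }
    return true;
}

int main(void)
{
    /* snapshot from worked example 1 below */
    int avail[M]    = { 2, 5, 3, 2 };
    int alloc[N][M] = { {0,2,1,2}, {1,1,0,2}, {2,2,5,4}, {0,3,1,2}, {2,4,1,4} };
    int max[N][M]   = { {0,3,2,2}, {2,7,5,2}, {2,3,7,6}, {1,6,4,2}, {3,6,5,8} };
    int seq[N];

    if (is_safe(avail, max, alloc, seq)) {
        printf("safe sequence:");
        for (int i = 0; i < N; i++) printf(" P%d", seq[i]);
        printf("\n");
    } else {
        printf("unsafe state\n");
    }
    return 0;
}

Run on that data, the sketch prints the safe sequence P0, P2, P3, P4, P1, matching the hand
calculation worked out below.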

Limitations of the Banker’s Algorithm

There are some problems with the Banker‘s algorithm as given below:

• It is time consuming, since the algorithm must be executed for every resource request.


• If the claim information is not accurate, system resources may be underutilized.
• Another difficulty can occur when a system is heavily loaded.
• New processes arriving may cause a problem.
 The process‘s claim must be less than the total number of units of the resource in the
system. If not, the process is not accepted by the manager.
 Since the state without the new process is safe, so is the state with the new process. Just
use the order you had originally and put the new process at the end.
 Ensuring fairness (starvation freedom) needs a little more work, but isn‘t too hard either
(once every hour stop taking new processes until all current processes finish).
 A resource becoming unavailable (e.g., a tape drive breaking), can result in an unsafe
state.
Example of Banker’s Algorithm
1. Consider the following snapshot of the system.

Process Allocation Max Available

A B C D A B C D A B C D
P0 0 2 1 2 0 3 2 2 2 5 3 2
P1 1 1 0 2 2 7 5 2
P2 2 2 5 4 2 3 7 6
P3 0 3 1 2 1 6 4 2
P4 2 4 1 4 3 6 5 8

Answer the following questions using Banker‘s algorithm:


(i) What is the content of the matrix Need?
(ii) Is the system in safe state?
(iii) If a request from process P1 arrives for (1,3,2,1) can the request be granted immediately?

Solution:
(i) The content of Need Matrix: Need = Max - Allocation

Process Need

A B C D
P0 0 1 1 0
P1 1 6 5 0
P2 0 1 2 2
P3 1 3 3 0
P4 1 2 4 4

(ii) To check whether system is in safe state: Safe sequence is calculated as follows:

1. Need of each process is compared with Available. If Need[i,j]≤Available[i,j], then


resources are allocated to that process. After its safe execution, process will release the
resources.

2. If Need[i,j]>Available[i,j], next process is taken for comparison.

Process P0:
Need = <0,1,1,0> and Available = <2,5,3,2>
Need[i,j]≤Available[i,j], then resources are allocated to P0.
Process P0 executes safely.
Available becomes <2,5,3,2> + <0,2,1,2> = <2,7,4,4>

Process P1:
Need = <1,6,5,0> and Available = <2,7,4,4>. Since Need[i,j]>Available[i,j] (for resource C, 5 > 4),
resources cannot be allocated to P1 at this stage.

Process P2:
Need = <0,1,2,2> and Available = <2,7,4,4>
Need[i,j]≤Available[i,j], then resources are allocated to P2.
Process P2 executes safely.
Available becomes <2,7,4,4> + <2,2,5,4> = <4,9,9,8>

Process P3:
Need = <1,3,3,0> and Available = <4,9,9,8>
Need[i,j]≤Available[i,j], then resources are allocated to P3.
Process P3 executes safely.
Available becomes <4,9,9,8> + <0,3,1,2> = <4,12,10,10>

Process P4:
Need = <1,2,4,4> and Available = <4,12,10,10>
Need[i,j]≤Available[i,j], then resources are allocated to P4.
Process P4 executes safely.
Available becomes <4,12,10,10> + <2,4,1,4> = <6,16,11,14>

Process P1:

Need = <1,6,5,0> and Available = <6,16,11,14>

Need[i,j]≤Available[i,j], then resources are allocated to P1.
Process P1 executes safely.
Available becomes <6,16,11,14> + <1,1,0,2> = <7,17,11,16>
Hence, the system is in safe state.

Safe sequence is <P0,P2,P3,P4,P1>

(iii) If a request from process P1 arrives for (1,3,2,1): Request = <1,3,2,1>, while Need of P1 =
<1,6,5,0>. For resource D the request (1) exceeds P1‘s remaining need (0), i.e. Request > Need.
By step 1 of the resource-request algorithm this raises an error condition (the process would
exceed its maximum claim), so the request cannot be granted immediately.

2. Consider the following snapshot of the system.


Process Max Allocation Available
A B C A B C A B C
P0 0 0 1 0 0 1 1 5 2
P1 1 7 5 1 0 0
P2 2 3 5 1 3 5
P3 0 6 5 0 6 3
Using Banker‘s algorithm answer the following questions
i) How many resources are there of type (A, B, C)?
ii) What are the contents of Need matrix?
iii) Is the system in safe state? Why?

i. Total Number of resources


Total resources = Allocation + Available
Hence, resources of type A are 0+1+1+0+1=3
Resources of type B are 0+0+3+6+5=14
Resources of type C are 1+0+5+3+2=11
Therefore, resources of type (A, B, C) are (3, 14, 11)

ii. The contents of Need matrix


Need = Max – Allocation
Hence, the Need matrix is
Process Need
A B C
P0 0 0 0
P1 0 7 5
P2 1 0 0
P3 0 0 2
iii. Is the system in safe state? Why?
Here, we will check whether the processes are executed safely.
For Process P0,
<Need> = <0, 0, 0>
<Available> = <1, 5, 2>
Need < Available.
Hence P0 can be executed safely.
Now, the available resources become <Available>+ <Allocation> of P0
<Available>=<1, 5, 2> + <0, 0, 1> = <1, 5, 3>

For Process P1,


<Need> = <0, 7, 5>
<Available> = <1, 5, 3>
Need > Available for resources B and C.
Hence, we can‘t execute P1 now.

Consider Process P2,


<Need> = <1, 0, 0>
<Available> = <1, 5, 3>
Need < Available.
Hence P2 can be executed safely.
Now, the available resources become <Available>+ <Allocation> of P2
<Available>=<1, 5, 3> + <1, 3, 5> = <2, 8, 8>

Now, we can execute either process P1 first and then P3 or vice versa.
Consider Process P1,
<Need> = <0, 7, 5>
<Available> = <2, 8, 8>
Need < Available.
Hence P1 can be executed safely.
Now, the available resources become <Available>+ <Allocation> of P1
<Available>=<2, 8, 8> + <1, 0, 0> = <3, 8, 8>

Consider Process P3,


<Need> = <0, 0, 2>
<Available> = <3, 8, 8>
Need < Available.
Hence P3 can be executed safely.
Now, the available resources become <Available>+ <Allocation> of P3
<Available>= <3, 8, 8> + <0, 6, 3> = <3, 14, 11>

Hence, the system is in Safe State if processes are executed in the order <P0, P2, P1,
P3> or <P0, P2, P3, P1>

4.8 Deadlock Detection and Recovery

Detection of deadlocks is the most practical policy, which being both liberal and cost efficient,
most operating systems deploy. To detect a deadlock, we must go about in a recursive manner
and simulate the most favored execution of each unblocked process.
 An unblocked process may acquire all the needed resources and will execute.
 It will then release all the acquired resources and remain dormant thereafter.
 The now released resources may wake up some previously blocked process.
 Continue the above steps as long as possible.
 If any blocked processes remain, they are deadlocked.
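This simulation is essentially the Banker‘s safety check with the Need matrix replaced by the
matrix of outstanding requests. A compact C sketch is shown below (the matrix sizes and the
sample snapshot are illustrative, chosen so that two processes block each other); any process still
unfinished when no further progress is possible is deadlocked.

#include <stdio.h>
#include <stdbool.h>

#define N 3   /* processes      */
#define M 2   /* resource types */

int main(void)
{
    /* illustrative snapshot: what each process holds and what it still requests */
    int alloc[N][M]   = { {1,0}, {0,1}, {0,0} };
    int request[N][M] = { {0,1}, {1,0}, {0,0} };
    int work[M]       = { 0, 0 };               /* currently available units */
    bool finish[N]    = { false, false, false };

    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < N; i++) {
            if (finish[i]) continue;
            bool can_run = true;
            for (int j = 0; j < M; j++)
                if (request[i][j] > work[j]) { can_run = false; break; }
            if (can_run) {                       /* favoured execution: run, then release */
                for (int j = 0; j < M; j++) work[j] += alloc[i][j];
                finish[i] = true;
                progress = true;
            }
        }
    }
    for (int i = 0; i < N; i++)
        if (!finish[i]) printf("P%d is deadlocked\n", i);
    return 0;
}

With this snapshot P2 can run and finish, but P0 and P1 each wait for a unit the other holds, so the
sketch reports both of them as deadlocked.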

4.8.1 Recovery from Deadlock

(a) Recovery by process termination

In this approach we terminate deadlocked processes in a systematic way taking into account their
priorities. The moment, enough processes are terminated to recover from deadlock, we stop the
process terminations. Though the policy is simple, there are some problems associated with it.

Consider the scenario where a process is in the state of updating a data file and it is terminated.
The file may be left in an incorrect state by the unexpected termination of the updating process.
Further, processes should be terminated based on some criterion/policy. Some of the criteria may
be as follows:

• Priority of a process
• CPU time used and expected usage before completion
• Number and type of resources being used (can they be preempted easily?)
• Number of resources needed for completion
• Number of processes needed to be terminated
• Are the processes interactive or batch?

(b) Recovery by Checkpointing and Rollback (Resource preemption)

Some systems facilitate deadlock recovery by implementing checkpointing and rollback.


Checkpointing is saving enough state of a process so that the process can be restarted at the point
in the computation where the checkpoint was taken. Autosaving file edits are a form of
checkpointing. Checkpointing costs depend on the underlying algorithm. Very simple algorithms
(like linear primality testing) can be checkpointed with a few words of data. More complicated
processes may have to save all the process state and memory.

If a deadlock is detected, one or more processes are restarted from their last checkpoint.
Restarting a process from a checkpoint is called rollback. It is done with the expectation that the
resource requests will not interleave again to produce deadlock.

Deadlock recovery is generally used when deadlocks are rare, and the cost of recovery (process
termination or rollback) is low.

Process checkpointing can also be used to improve reliability (long running computations), assist
in process migration, or reduce startup costs.
CHAPTER 5
MEMORY MANAGEMENT

5.1 Introduction
 Memory is central to the operation of a modern computer system.
 Memory is a large array of words or bytes, each with its own address.
 Memory management is the functionality of an operating system which handles or
manages primary memory.
 Memory management keeps track of each and every memory location either it is
allocated to some process or it is free.
 It checks how much memory is to be allocated to processes.
 It decides which process will get memory at what time.
 It tracks whenever some memory gets freed or unallocated and correspondingly it updates
the status.
 Memory management provides protection by using two registers, a base register and a
limit register.
 The base register holds the smallest legal physical memory address and the limit
register specifies the size of the range. For example, if the base register holds 300000
and the limit register is 120900, then the program can legally access all addresses from
300000 through 420899 (a minimal sketch of this check follows below).
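A minimal sketch of the hardware check these two registers perform is given below (the register
values are the illustrative ones from the bullet above, and the function name is my own): every
reference outside the range [base, base + limit) would cause a trap to the operating system.

#include <stdio.h>
#include <stdbool.h>

/* illustrative register contents */
static const unsigned base  = 300000;
static const unsigned limit = 120900;   /* size of the legal range */

/* returns true if the reference is legal, false if it would trap */
bool legal(unsigned address)
{
    return address >= base && address < base + limit;
}

int main(void)
{
    printf("%u -> %s\n", 350000u, legal(350000) ? "ok" : "trap");
    printf("%u -> %s\n", 500000u, legal(500000) ? "ok" : "trap");
    return 0;
}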

5.2 Address Binding


The binding of instructions and data to memory addresses can be done in following ways:
 Compile time -- When it is known at compile time where the process will reside,
compile time binding is used to generate the absolute code.
 Load time -- When it is not known at compile time where the process will reside in
memory, then the compiler generates re-locatable code.
 Execution time -- If the process can be moved during its execution from one memory
segment to another, then binding must be delayed to be done at run time
5.3 Dynamic Loading
 In dynamic loading, a routine of a program is not loaded until it is called by the program.
 All routines are kept on disk in a re-locatable load format.
 The main program is loaded into memory and is executed.
 Other routines methods or modules are loaded on request.
 Dynamic loading makes better memory space utilization and unused routines are never
loaded.
5.4 Dynamic Linking
 Linking is the process of collecting and combining various modules of code and data into
a executable file that can be loaded into memory and executed.
 Operating system can link system level libraries to a program. When it combines the
libraries at load time, the linking is called static linking and when this linking is done at
the time of execution, it is called as dynamic linking.
 In static linking, libraries are linked at compile time, so program code size becomes
bigger whereas in dynamic linking libraries are linked at execution time so program code
size remains smaller.
 The entire program and data of a process must be in physical memory for the process to
execute.
 The size of a process is limited to the size of physical memory. So that a process can be
larger than the amount of memory allocated to it, a technique called overlays is
sometimes used.
 The idea of overlays is to keep in memory only those instructions and data that are
needed at any given time.
 When other instructions are needed, they are loaded into space that was occupied
previously by instructions that are no longer needed.

5.5 Logical versus Physical Address Space


 The address generated by the CPU is a logical address, whereas the address actually seen
by the memory hardware is a physical address.
 Addresses bound at compile time or load time have identical logical and physical
addresses.
 Addresses created at execution time, however, have different logical and physical
addresses.
o In this case the logical address is also known as a virtual address, and the two
terms are used interchangeably by our text.
o The set of all logical addresses used by a program composes the logical address
space, and the set of all corresponding physical addresses composes the physical
address space.
 The run time mapping of logical to physical addresses is handled by the memory-
management unit, MMU.
o The MMU can take on many forms. One of the simplest is a modification of the
base-register scheme described earlier.
o The base register is now termed a relocation register, whose value is added to
every memory request at the hardware level.
 Note that user programs never see physical addresses. User programs work entirely in
logical address space, and any memory references or manipulations are done using purely
logical addresses. Only when the address gets sent to the physical memory chips is the
physical memory address generated.

5.6 Swapping
 A process, can be swapped temporarily out of memory to a backing store, and then
brought back into memory for continued execution.
 Backing store is usually a hard disk drive or any other secondary storage which is fast in
access and large enough to accommodate copies of all memory images for all users. It
must be capable of providing direct access to these memory images.
 Assume a multiprogramming environment with a round robin CPU-scheduling algorithm.
When a quantum expires, the memory manager will start to swap out the process that just
finished, and to swap in another process to the memory space that has been freed as
shown in figure. When each process finishes its quantum, it will be swapped with another
process.

 A variant of this swapping policy is used for priority - based scheduling algorithms. If a
higher - priority process arrives and wants service, the memory manager can swap out the
lower - priority process so that it can load and execute the higher priority process. When
the higher priority process finishes, the lower – priority process can be swapped back in
and continued. This variant of swapping is sometimes called roll out, roll in.
 A process that is swapped out will be swapped back into the same memory space that it
occupied previously.
 If binding is done at assembly or load time, then the process cannot be moved to different
location. If execution-time binding is being used, then it is possible to swap a process into
a different memory space.
 Major time consuming part of swapping is transfer time. Total transfer time is directly
proportional to the amount of memory swapped.
 Let us assume that the user process is of size 100KB and the backing store is a standard
hard disk with transfer rate of 1 MB per second. The actual transfer of the 100K process
to or from memory will take
100KB / 1000KB per second
= 1/10 second
= 100 milliseconds
5.7 Contiguous Memory Allocation
 One approach to memory management is to load each process into a contiguous space.
The operating system is allocated space first, usually at either low or high memory
locations, and then the remaining available memory is allocated to processes as needed.
( The OS is usually loaded low, because that is where the interrupt vectors are located,
but on older systems part of the OS was loaded high to make more room in low memory (
within the 640K barrier ) for user processes. )
5.7.1 Memory Protection
 The system shown in Figure below allows protection against user programs accessing
areas that they should not, allows programs to be relocated to different memory starting
addresses as needed, and allows the memory space devoted to the OS to grow or shrink
dynamically as needs change.

5.7.2 Memory Allocation


 One method of allocating contiguous memory is to divide all available memory into
equal sized partitions, and to assign each process to their own partition. This restricts both
the number of simultaneous processes and the maximum size of each process, and is no
longer used.
 An alternate approach is to keep a list of unused (free) memory blocks (holes), and to
find a hole of a suitable size whenever a process needs to be loaded into memory. There
are many different strategies for finding the "best" allocation of memory to processes,
including the three most commonly discussed:
1. First fit - Search the list of holes until one is found that is big enough to satisfy
the request, and assign a portion of that hole to that process. Whatever fraction of
the hole not needed by the request is left on the free list as a smaller hole.
Subsequent requests may start looking either from the beginning of the list or
from the point at which this search ended.
2. Best fit - Allocate the smallest hole that is big enough to satisfy the request. This
saves large holes for other process requests that may need them later, but the
resulting unused portions of holes may be too small to be of any use, and will
therefore be wasted. Keeping the free list sorted can speed up the process of
finding the right hole.
3. Worst fit - Allocate the largest hole available, thereby increasing the likelihood
that the remaining portion will be usable for satisfying future requests.
 Simulations show that either first or best fit are better than worst fit in terms of both time
and storage utilization. First and best fits are about equal in terms of storage utilization,
but first fit is faster.

Example: Given five memory partitions of 100kB, 500kB, 200kB, 300kB, and 600kB (in
order), how would the first-fit, best-fit, and worst-fit algorithms place processes of
212kB, 417kB, 112kB, and 426kB (in order)? Which algorithm makes the most efficient
use of memory?

Solution:

First-fit: 212kB goes into the 500kB partition (288kB left), 417kB into the 600kB partition
(183kB left), 112kB into the 288kB leftover (176kB left), and 426kB must wait because no
remaining hole is large enough. Memory used = 212 + 417 + 112 = 741kB.
Best-fit: 212kB goes into the 300kB partition, 417kB into the 500kB partition, 112kB into the
200kB partition, and 426kB into the 600kB partition. Memory used = 212 + 417 + 112 + 426 =
1167kB.
Worst-fit: 212kB goes into the 600kB partition (388kB left), 417kB into the 500kB partition,
112kB into the 388kB leftover, and 426kB must wait. Memory used = 741kB.

The total memory size that is not used for each algorithm (Fragmentation):
First-fit = 1700 – 741 = 959
Best-fit = 1700 – 1167 = 533
Worst-fit = 1700 – 741 = 959

Memory Utilization Ratio:


First-fit = 741 / 1700 = 43.6%
Best-fit = 1167 / 1700 = 68.6%
Worst-fit = 741 / 1700 = 43.6%
Best-fit has the most efficient use of the memory.
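A first-fit allocator over such a free list is only a few lines of C. The sketch below (array and
function names are illustrative) reproduces the first-fit placements of the example; a best-fit
version would scan the entire list and remember the smallest hole that is still large enough
instead of stopping at the first.

#include <stdio.h>

#define HOLES 5

/* sizes of the free partitions, in kB, in list order */
int hole[HOLES] = { 100, 500, 200, 300, 600 };

/* first fit: return the index of the first hole big enough, shrinking it in place */
int first_fit(int request)
{
    for (int i = 0; i < HOLES; i++) {
        if (hole[i] >= request) {
            hole[i] -= request;     /* leftover stays on the free list */
            return i;
        }
    }
    return -1;                      /* no hole large enough: process must wait */
}

int main(void)
{
    int proc[] = { 212, 417, 112, 426 };
    for (int p = 0; p < 4; p++) {
        int i = first_fit(proc[p]);
        if (i >= 0) printf("%d kB placed in hole %d\n", proc[p], i);
        else        printf("%d kB must wait\n", proc[p]);
    }
    return 0;
}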

5.7.3 Fragmentation
 All the memory allocation strategies suffer from external fragmentation, though first and
best fits experience the problems more so than worst fit. External fragmentation means
that the available memory is broken up into lots of little pieces, none of which is big
enough to satisfy the next memory requirement, although the sum total could.
 The amount of memory lost to fragmentation may vary with algorithm, usage patterns,
and some design decisions such as which end of a hole to allocate and which end to save
on the free list.
 Statistical analysis of first fit, for example, shows that for N blocks of allocated memory,
another 0.5 N will be lost to fragmentation.
 Internal fragmentation also occurs, with all memory allocation strategies. This is caused
by the fact that memory is allocated in blocks of a fixed size, whereas the actual memory
needed will rarely be that exact size. For a random distribution of memory requests, on
the average 1/2 block will be wasted per memory request, because on the average the last
allocated block will be only half full.
o Note that the same effect happens with hard drives, and that modern hardware
gives us increasingly larger drives and memory at the expense of ever larger block
sizes, which translates to more memory lost to internal fragmentation.
o Some systems use variable size blocks to minimize losses due to internal
fragmentation.
 If the programs in memory are relocatable, ( using execution-time address binding ), then
the external fragmentation problem can be reduced via compaction, i.e. moving all
processes down to one end of physical memory. This only involves updating the
relocation register for each process, as all internal work is done using logical addresses.
 Another solution as we will see in upcoming sections is to allow processes to use non-
contiguous blocks of physical memory, with a separate relocation register for each block.

5.8 Paging
 External fragmentation is avoided by using paging technique.
 The basic idea behind paging is to divide physical memory into a number of equal
sized blocks called frames, and to divide programs logical memory space into blocks
of the same size called pages.
 When a process is to be executed, its corresponding pages are loaded into any available
memory frames.
 Logical address space of a process can be non-contiguous and a process is allocated
physical memory whenever the free memory frame is available.
 Operating system keeps track of all free frames. Operating system needs n free frames to
run a program of size n pages.
 The page table is used to look up what frame a particular page is stored in at the moment.
In the following example, for instance, page 2 of the program's logical memory is
currently stored in frame 3 of physical memory:
 Address generated by CPU is divided into
 Page number (p) -- page number is used as an index into a page table which
contains base address of each page in physical memory.
 Page offset (d) -- page offset is combined with base address to define the physical
memory address.
 The page table maps the page number to a frame number, to yield a physical address
which also has two parts: The frame number and the offset within that frame. The number
of bits in the frame number determines how many frames the system can address, and the
number of bits in the offset determines the size of each frame.
 Page numbers, frame numbers, and frame sizes are determined by the architecture, but
are typically powers of two, allowing addresses to be split at a certain number of bits. For
example, if the logical address size is 2^m and the page size is 2^n, then the high-order (m-n)
bits of a logical address designate the page number and the remaining n bits represent the
offset.
 Note also that the number of bits in the page number and the number of bits in the frame
number do not have to be identical. The former determines the address range of the
logical address space, and the latter relates to the physical address space.

 For an ‗m‘ bit processor, the logical address will be m bits long. Let the page size be 2^n
bytes.
 Then, the lower order ‗n‘ bits of a logical address L will represent the page offset (d) and
the higher order ‗(m-n)‘ bits will represent the page number (p).
 Then, p = L div 2^n and d = L mod 2^n.
 Let ‗f‘ be the frame number that holds the page referenced by logical address L.
 Then, f can be obtained by indexing into the page table by using page number p as index,
i.e., f = page_table[p].
 Corresponding physical address = f ∗ 2^n + d
Example: A 16 bit computer is implementing the paging scheme. The page size is of 4096
bytes. The page table for process A is as follows:
Page Number Frame Number
0 7
1 2
2 5
3 1
4 12
5 6
6 0
Convert the following logical addresses into corresponding physical addresses:
(i) 3720
(ii) 7512
(iii) 22340
(iv) 17510
(v) 11225
Solution: The page size is 4096 bytes, so p = L div 4096, d = L mod 4096 and the
physical address = f ∗ 4096 + d.

(i) L = 3720: p = 0, d = 3720. Page 0 maps to frame 7, so f = 7.
Physical address = 7 ∗ 4096 + 3720 = 28672 + 3720 = 32392.

(ii) L = 7512: p = 1, d = 3416. Page 1 maps to frame 2, so f = 2.
Physical address = 2 ∗ 4096 + 3416 = 8192 + 3416 = 11608.

(iii) L = 22340: p = 5, d = 1860. Page 5 maps to frame 6, so f = 6.
Physical address = 6 ∗ 4096 + 1860 = 24576 + 1860 = 26436.

(iv) L = 17510: p = 4, d = 1126. Page 4 maps to frame 12, so f = 12.
Physical address = 12 ∗ 4096 + 1126 = 49152 + 1126 = 50278.

(v) L = 11225: p = 2, d = 3033. Page 2 maps to frame 5, so f = 5.
Physical address = 5 ∗ 4096 + 3033 = 20480 + 3033 = 23513.
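The arithmetic used in this example can be expressed directly in C. The sketch below (the
translate function is my own illustration; the page table is the one from the example) splits each
logical address into page number and offset and recombines the frame number with the offset,
reproducing the physical addresses computed above.

#include <stdio.h>

#define PAGE_SIZE 4096                         /* 2^12 bytes, as in the example */

/* page table from the example: index = page number, value = frame number */
int page_table[] = { 7, 2, 5, 1, 12, 6, 0 };

unsigned translate(unsigned logical)
{
    unsigned p = logical / PAGE_SIZE;          /* page number  */
    unsigned d = logical % PAGE_SIZE;          /* page offset  */
    unsigned f = page_table[p];                /* frame number */
    return f * PAGE_SIZE + d;                  /* physical address */
}

int main(void)
{
    unsigned addrs[] = { 3720, 7512, 22340, 17510, 11225 };
    for (int i = 0; i < 5; i++)
        printf("logical %5u -> physical %u\n", addrs[i], translate(addrs[i]));
    return 0;
}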

5.9 Segmentation
 Segmentation is a technique to break memory into logical pieces where each piece
represents a group of related information.
 For example, data segments or code segment for each process, data segment for operating
system and so on.
 Segmentation can be implemented with or without paging.
 Unlike paging, segments have varying sizes and thus eliminate internal
fragmentation. External fragmentation still exists, but to a lesser extent.

 Address generated by CPU is divided into


 Segment number (s) -- segment number is used as an index into a segment table
which contains base address of each segment in physical memory and a limit of
segment.
 Segment offset (o) -- segment offset is first checked against limit and then is
combined with base address to define the physical memory address.
 A segment table maps segment-offset addresses to physical addresses, and
simultaneously checks for invalid addresses, using a system similar to the page tables and
relocation base registers.

Example 1: Consider the following segment table:


Segment Base Length
0 219 600
1 2300 14
2 90 100
3 1327 580
4 1952 96
What are the physical addresses for the following logical addresses?
(i) 0,430, (ii) 1,10, (iii) 2,500, (iv) 3,400, (v) 4,112.
Solution:
(i) Logical address (0, 430)
Segment Number = 0, Length = 600, Offset = 430
Offset < Length
Physical address = Base + Offset = 219 + 430 = 649

(ii) Logical address (1, 10)


Segment Number = 1, Length = 14, Offset = 10
Offset < Length
Physical address = Base + Offset = 2300 + 10 = 2310

(iii) Logical address (2, 500)


Segment Number = 2, Length = 100, Offset = 500
Offset > Length
Hence, illegal reference occurs and OS goes into trap state.

(iv) Logical address (3, 400)


Segment Number = 3, Length = 580, Offset = 400
Offset < Length
Physical address = Base + Offset = 1327 + 400 = 1727
(v) Logical address (4, 112)
Segment Number = 4, Length = 96, Offset = 112
Offset > Length
Hence, illegal reference occurs and OS goes into trap state.

Example 2: Consider the following segment table:


Segment Base Length
0 330 124
1 876 211
2 111 99
3 498 302
What are the physical addresses for the following logical addresses?
(i) 0,99, (ii) 2,78, (iii) 1,256, (iv) 3,222, (v) 0,111.
Solution:
(i) Logical address (0, 99)
Segment Number = 0, Length = 124, Offset = 99
Offset < Length
Physical address = Base + Offset = 330 + 99 = 429

(ii) Logical address (2, 78)


Segment Number = 2, Length = 99, Offset = 78
Offset < Length
Physical address = Base + Offset = 111 + 78 = 189

(iii) Logical address (1, 256)


Segment Number = 1, Length = 211, Offset = 256
Offset > Length
Hence, illegal reference occurs and OS goes into trap state.

(iv) Logical address (3, 222)


Segment Number = 3, Length = 302, Offset = 222
Offset < Length
Physical address = Base + Offset = 498 + 222 = 720

(v) Logical address (0, 111)


Segment Number = 0, Length = 124, Offset = 111
Offset < Length
Physical address = Base + Offset = 330 + 111 = 441
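Both examples apply the same check-then-add rule, which the following C sketch makes explicit
(the struct and function names are illustrative; the segment table is the one from Example 1). An
offset that is not smaller than the segment length is reported as a trap.

#include <stdio.h>

struct segment { int base, length; };

/* segment table from Example 1 */
struct segment seg_table[] = {
    { 219, 600 }, { 2300, 14 }, { 90, 100 }, { 1327, 580 }, { 1952, 96 }
};

/* returns the physical address, or -1 to signal an addressing trap */
int translate(int s, int offset)
{
    if (offset >= seg_table[s].length)
        return -1;                              /* illegal reference: trap to the OS */
    return seg_table[s].base + offset;
}

int main(void)
{
    int queries[][2] = { {0,430}, {1,10}, {2,500}, {3,400}, {4,112} };
    for (int i = 0; i < 5; i++) {
        int pa = translate(queries[i][0], queries[i][1]);
        if (pa < 0) printf("(%d,%d) -> trap\n", queries[i][0], queries[i][1]);
        else        printf("(%d,%d) -> %d\n", queries[i][0], queries[i][1], pa);
    }
    return 0;
}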

Example 3: On a simple paging system with 2^24 bytes of physical memory, 256 pages of
logical address space, and a page size of 2^10 bytes,
1. How many bytes are in a page frame?
2. How many bits in the physical address specify the page frame?
3. How many entries are in the page table?
4. How many bits are in a logical address?
Solution:
1. A frame is where a page can be mapped into memory, so a frame has to be the same size
as a page - 2^10 bytes.
2. You have 24 bits of physical address, and a frame is 2^10 bytes big, so that leaves 14 of the
bits (24 - 10) for the frame number.
3. The page table is the full list of pages, whether mapped or unmapped - so there are 256
entries in the page table, since there are 256 pages.
4. That is the page-number bits plus the offset bits. The upper portion of a logical
address is the page number (8 bits, since 2^8 = 256), and the lower portion is the offset
within that page (10 bits), so the whole address size is 18 bits. Total logical address
space = 2^18 bytes.

5.10 Virtual Memory

Virtual memory is a technique that allows the execution of processes which are not
completely available in memory. The main visible advantage of this scheme is that programs
can be larger than physical memory. Virtual memory is the separation of user logical memory
from physical memory.
This separation allows an extremely large virtual memory to be provided for programmers when
only a smaller physical memory is available. Following are the situations, when entire program is
not required to be loaded fully in main memory.
 User written error handling routines are used only when an error occurred in the data or
computation.
 Certain options and features of a program may be used rarely.
 Many tables are assigned a fixed amount of address space even though only a small
amount of the table is actually used.
 The ability to execute a program that is only partially in memory would confer many
benefits:
 Fewer I/O operations would be needed to load or swap each user program into memory.
 A program would no longer be constrained by the amount of physical memory that is
available.
 Each user program could take less physical memory, and more programs could be run at the
same time, with a corresponding increase in CPU utilization and throughput.
Virtual memory is commonly implemented by demand paging. It can also be implemented in a
segmentation system. Demand segmentation can also be used to provide virtual memory.
5.11 Demand Paging
A demand paging system is quite similar to a paging system with swapping. When we want to
execute a process, we swap it into memory. Rather than swapping the entire process into
memory, however, we use a lazy swapper called a pager.

When a process is to be swapped in, the pager guesses which pages will be used before the
process is swapped out again. Instead of swapping in a whole process, the pager brings only those
necessary pages into memory. Thus, it avoids reading into memory pages that will not be used
anyway, decreasing the swap time and the amount of physical memory needed.

Hardware support is required to distinguish between those pages that are in memory and those
pages that are on the disk. This is done using the valid-invalid bit scheme, where each page table
entry carries a bit indicating whether the page is memory resident. Marking a page invalid has no
effect as long as the process never attempts to access that page. While the process executes and
accesses pages that are memory resident, execution proceeds normally.

Access to a page marked invalid causes a page-fault trap. This trap is the result of the operating
system's failure to bring the desired page into memory. A page fault is handled as follows:
Step 1: Check an internal table for this process to determine whether the reference was a valid
or an invalid memory access.
Step 2: If the reference was invalid, terminate the process. If it was valid, but the page has not
yet been brought in, page it in.
Step 3: Find a free frame.
Step 4: Schedule a disk operation to read the desired page into the newly allocated frame.
Step 5: When the disk read is complete, modify the internal table kept with the process and the
page table to indicate that the page is now in memory.
Step 6: Restart the instruction that was interrupted by the illegal address trap. The process can
now access the page as though it had always been in memory. In this way, the operating system
reads the desired page into memory and restarts the process as though the page had always been
in memory.
Advantages
Following are the advantages of Demand Paging
 Large virtual memory.
 More efficient use of memory.
 Unconstrained multiprogramming. There is no limit on degree of multiprogramming.
Disadvantages
Following are the disadvantages of Demand Paging
 Number of tables and amount of processor overhead for handling page interrupts are
greater than in the case of the simple paged management techniques.
 Due to the lack of explicit constraints on a job‘s address space, its size may grow without bound.

5.12 Page Replacement Algorithm


Page replacement algorithms are the techniques by which the operating system decides which
memory pages to swap out (write to disk) when a new page of memory needs to be allocated. Page
replacement happens whenever a page fault occurs and no free frame can be used for the
allocation, either because no frames are free or because the number of free frames is lower than
required.
When a page that was selected for replacement and paged out is referenced again, it
has to be read back in from disk, and this requires waiting for I/O completion. This determines the quality
of the page replacement algorithm: the less time spent waiting for page-ins, the better the
algorithm. A page replacement algorithm looks at the limited information about accessing the
pages provided by hardware, and tries to select which pages should be replaced to minimize the
total number of page misses, while balancing it with the costs of primary storage and processor
time of the algorithm itself. There are many different page replacement algorithms. We evaluate
an algorithm by running it on a particular string of memory reference and computing the number
of page faults.
Reference String
The string of memory references is called reference string. Reference strings are generated
artificially or by tracing a given system and recording the address of each memory reference. The
latter choice produces a large number of data, where we note two things.
 For a given page size we need to consider only the page number, not the entire address.
 If we have a reference to a page p, then any immediately following references to page p
will never cause a page fault. Page p will be in memory after the first reference; the
immediately following references will not fault.
 For example, consider the following sequence of addresses - 123,215,600,1234,76,96
 If page size is 100 then the reference string is 1,2,6,12,0,0
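Deriving the reference string from raw addresses is just an integer division by the page size, as in
the small sketch below (variable names are illustrative).

#include <stdio.h>

int main(void)
{
    int addr[] = { 123, 215, 600, 1234, 76, 96 };
    int page_size = 100;
    printf("reference string:");
    for (int i = 0; i < 6; i++)
        printf(" %d", addr[i] / page_size);    /* keep only the page number */
    printf("\n");                              /* prints 1 2 6 12 0 0 */
    return 0;
}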
First In First Out (FIFO) algorithm
 Oldest page in main memory is the one which will be selected for replacement.
 Easy to implement, keep a list, replace pages from the tail and add new pages at the head.

Page Faults = 15
Hits = 5

 Although FIFO is simple and easy, it is not always optimal, or even efficient.
 An interesting effect that can occur with FIFO is Belady's anomaly, in which increasing
the number of frames available can actually increase the number of page faults that
occur! Consider, for example, the following chart based on the page sequence ( 1, 2, 3, 4,
1, 2, 5, 1, 2, 3, 4, 5 ) and a varying number of available frames. Obviously the maximum
number of faults is 12 ( every request generates a fault ), and the minimum number is 5 (
each page loaded only once ), but in between there are some interesting results:
Optimal Page algorithm
 An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms.
An optimal page-replacement algorithm exists, and has been called OPT or MIN.
 Replace the page that will not be used for the longest period of time. Use the time when a
page is to be used.

Page Faults = 9
Hits = 11
Least Recently Used (LRU) algorithm
 Page which has not been used for the longest time in main memory is the one which will
be selected for replacement.
 Easy to implement, keep a list, replace pages by looking back into time.
Page Faults = 12
Hits = 8
Calculate page faults and Hits using FIFO, LRU and Optimal page replacement algorithm for the
following page sequence (2, 3, 5, 4, 2, 5, 7, 3, 8, 7). Assume page frame size is 3.
FIFO:
Page Sequence 2 3 5 4 2 5 7 3 8 7
Frame 1 2 2 2 4 4 4 4 3 3 3
Frame 2 3 3 3 2 2 2 2 8 8
Frame 3 5 5 5 5 7 7 7 7
F F F F F H F F F H
Page Faults = 8,
Hits = 2
LRU:
Page Sequence 2 3 5 4 2 5 7 3 8 7
Frame 1 2 2 2 4 4 4 7 7 7 7
Frame 2 3 3 3 2 2 2 3 3 3
Frame 3 5 5 5 5 5 5 8 8
F F F F F H F F F H
Page Faults = 8,
Hits = 2

OPT:
Page Sequence 2 3 5 4 2 5 7 3 8 7
Frame 1 2 2 2 2 2 2 7 7 7 7
Frame 2 3 3 4 4 4 4 3 8 8
Frame 3 5 5 5 5 5 5 5 5
F F F F H H F F F H
Page Faults = 7,
Hits = 3
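The FIFO trace above can be reproduced with a short simulation. The sketch below (array names
are illustrative) keeps the three frames in a circular buffer, replaces the oldest page on each fault,
and counts faults and hits for the same reference string; an LRU simulation would differ only in
how the victim frame is chosen.

#include <stdio.h>

#define FRAMES 3

int main(void)
{
    int ref[] = { 2, 3, 5, 4, 2, 5, 7, 3, 8, 7 };
    int n = sizeof ref / sizeof ref[0];
    int frame[FRAMES] = { -1, -1, -1 };        /* -1 = empty frame */
    int next = 0, faults = 0;                  /* next = FIFO victim pointer */

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < FRAMES; j++)
            if (frame[j] == ref[i]) { hit = 1; break; }
        if (!hit) {
            frame[next] = ref[i];              /* replace the oldest page */
            next = (next + 1) % FRAMES;
            faults++;
        }
    }
    printf("faults = %d, hits = %d\n", faults, n - faults);   /* 8 and 2 */
    return 0;
}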

5.13 Thrashing
 If a process cannot maintain its minimum required number of frames, then it must be
swapped out, freeing up frames for other processes. This is an intermediate level of CPU
scheduling.
 But what about a process that can keep its minimum, but cannot keep all of the frames
that it is currently using on a regular basis? In this case it is forced to page out pages that
it will need again in the very near future, leading to large numbers of page faults.
 A process that is spending more time paging than executing is said to be thrashing.

Cause of Thrashing
 Early process scheduling schemes would control the level of multiprogramming allowed
based on CPU utilization, adding in more processes when CPU utilization was low.
 The problem is that when memory filled up and processes started spending lots of time
waiting for their pages to page in, then CPU utilization would drop, causing the scheduler
to add in even more processes and exacerbating the problem! Eventually the system
would essentially grind to a halt.
 Local page replacement policies can prevent one thrashing process from taking pages
away from other processes, but it still tends to clog up the I/O queue, thereby slowing
down any other process that needs to do even a little bit of paging (or any other I/O for
that matter. )
CHAPTER 6
FILE MANAGEMENT
File
 A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.
 In general, a file is a sequence of bits, bytes, lines or records whose meaning is defined
by the files creator and user.

6.1 File Structure


A file structure is the internal layout of a file, organized according to a required format that the
operating system can understand.
 A file has a certain defined structure according to its type.
 A text file is a sequence of characters organized into lines.
 A source file is a sequence of procedures and functions.
 An object file is a sequence of bytes organized into blocks that are understandable by the
machine.
 When operating system defines different file structures, it also contains the code to
support these file structure.
 UNIX and MS-DOS support only a minimal number of file structures.

6.2 File Attributes


Different OSes keep track of different file attributes, including:
 Name – only information kept in human-readable form
 Identifier – unique tag (number) identifies file within file system
 Type – needed for systems that support different types
 Location – pointer to file location on device
 Size – current file size
 Protection – controls who can do reading, writing, executing
 Time, date, and user identification – data for protection, security, and usage monitoring
 Information about files are kept in the directory structure, which is maintained on the disk

6.3 File Operations


 The file ADT supports many common operations:
o Creating a file
o Writing a file
o Reading a file
o Repositioning within a file
o Deleting a file
o Truncating a file.
 Most OSes require that files be opened before access and closed after all access is
complete. Normally the programmer must open and close files explicitly, but some rare
systems open the file automatically at first access. Information about currently open files
is stored in an open file table, containing for example:
o File pointer - records the current position in the file, for the next read or write
access.
o File-open count - How many times has the current file been opened
(simultaneously by different processes) and not yet closed? When this counter
reaches zero the file can be removed from the table.
o Disk location of the file.
o Access rights
 Some systems provide support for file locking.
o A shared lock is for reading only.
o An exclusive lock is for writing as well as reading.
o An advisory lock is informational only, and not enforced. A mandatory lock is
enforced.
o UNIX uses advisory locks, and Windows uses mandatory locks.
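As an illustration of advisory locking on UNIX, the sketch below takes an exclusive (write) lock on
a file with fcntl(); the file name is only an example. Because the lock is advisory, it protects the
file only against other processes that also check the lock.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.txt", O_RDWR);            /* example file name */
        if (fd < 0) { perror("open"); return 1; }

        struct flock lock = {0};
        lock.l_type   = F_WRLCK;      /* exclusive (write) lock; F_RDLCK would be shared */
        lock.l_whence = SEEK_SET;
        lock.l_start  = 0;
        lock.l_len    = 0;            /* length 0 means "lock the whole file" */

        if (fcntl(fd, F_SETLKW, &lock) == -1) {       /* F_SETLKW blocks until granted */
            perror("fcntl");
            return 1;
        }

        /* ... read or update the file while holding the lock ... */

        lock.l_type = F_UNLCK;                        /* release the lock */
        fcntl(fd, F_SETLK, &lock);
        close(fd);
        return 0;
    }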

6.4 File Type


File type refers to the ability of the operating system to distinguish different types of files, such as
text files, source files and binary files. Many operating systems support many types of files.
Operating systems like MS-DOS and UNIX have the following types of files:

 Ordinary files
 These are the files that contain user information.
 These may have text, databases or executable program.
 The user can apply various operations on such files like add, modify, delete or even
remove the entire file.
 Directory files
 These files contain list of file names and other information related to these files.
 Special files:
 These files are also known as device files.
 These files represent physical device like disks, terminals, printers, networks, tape
drive etc.
 These files are of two types
1. Character special files - data is handled character by character as in case of
terminals or printers.
2. Block special files - data is handled in blocks as in the case of disks and
tapes.
6.5 File Access Mechanisms
File access mechanism refers to the manner in which the records of a file may be accessed.
There are several ways to access files
 Sequential access
 Direct/Random access
 Indexed sequential access

Sequential access
 A sequential access is that in which the records are accessed in some sequence i.e. the
information in the file is processed in order, one record after the other.
 This access method is the most primitive one.
 Example: Compilers usually access files in this fashion.
 A sequential access file emulates magnetic tape operation, and generally supports a few
operations:
 read next - read a record and advance the tape to the next position.
 write next - write a record and advance the tape to the next position.
 rewind
 skip N records - May or may not be supported. N may be limited to positive numbers, or
may be limited to +/- 1.
Direct/Random access
 Random access file organization allows the records to be accessed directly.
 Each record has its own address on the file, by which it can be directly
accessed for reading or writing.
 The records need not be in any sequence within the file and they need not be in adjacent
locations on the storage medium.
 Jump to any record and read that record. Operations supported include:
 read n - read record number n. ( Note an argument is now required. )
 write n - write record number n. ( Note an argument is now required. )
 jump to record n - could be 0 or the end of file.
 Query current record - used to return back to this record later.
 Sequential access can be easily emulated using direct access. The inverse is complicated
and inefficient.
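The "read n" operation of direct access maps naturally onto lseek() on UNIX: seek to n times the
record size, then read one record. A minimal sketch, assuming fixed-length 128-byte records and an
example file name:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define RECORD_SIZE 128           /* assumed fixed record length */

    /* Read record number n (counting from 0) into buf. */
    ssize_t read_record(int fd, long n, char *buf) {
        if (lseek(fd, n * (off_t)RECORD_SIZE, SEEK_SET) == (off_t)-1)
            return -1;                           /* jump directly to the record  */
        return read(fd, buf, RECORD_SIZE);       /* then read exactly one record */
    }

    int main(void) {
        char buf[RECORD_SIZE];
        int fd = open("records.dat", O_RDONLY);  /* example file name */
        if (fd < 0) { perror("open"); return 1; }

        if (read_record(fd, 5, buf) < 0)         /* fetch record #5 without reading 0..4 */
            perror("read_record");

        close(fd);
        return 0;
    }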

Indexed sequential access


 This mechanism is built up on base of sequential access.
 An index is created for each file which contains pointers to various blocks.
 Index is searched sequentially and its pointer is used to access the file directly.

6.6 File Allocation Methods


Files are allocated disk spaces by operating system. Operating systems deploy following three
main ways to allocate disk space to files.
 Contiguous Allocation
 Linked Allocation
 Indexed Allocation

Contiguous Allocation
 Each file occupies a contiguous address space on disk.
 Assigned disk address is in linear order.
 Easy to implement.
 External fragmentation is a major issue with this type of allocation technique.

Linked Allocation
 Each file carries a list of links to disk blocks.
 Directory contains link / pointer to first block of a file.
 No external fragmentation
 Effectively used in sequential access file.
 Inefficient in case of direct access file.
Indexed Allocation
 Provides solutions to problems of contiguous and linked allocation.
 An index block is created holding all the pointers to the file's data blocks.
Each file has its own index block which stores the addresses of disk space occupied by the file.
Directory contains the addresses of index blocks of files.
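A sketch of how indexed allocation turns a logical block number of a file into a disk block: the
file's index block is simply an array of pointers, so the lookup is one table access. The structure
and values below are illustrative, not taken from any real file system.

    #include <stdio.h>

    #define BLOCKS_PER_INDEX 128          /* assumed capacity of one index block */

    /* Illustrative index block: one disk-block pointer per logical block of the file. */
    struct index_block {
        int data_block[BLOCKS_PER_INDEX];
    };

    /* Map a logical block of the file to its physical disk block (-1 if out of range). */
    int lookup(const struct index_block *idx, int logical_block) {
        if (logical_block < 0 || logical_block >= BLOCKS_PER_INDEX)
            return -1;
        return idx->data_block[logical_block];
    }

    int main(void) {
        struct index_block idx = { .data_block = { [0] = 19, [1] = 620, [2] = 75 } };
        printf("logical block 2 -> disk block %d\n", lookup(&idx, 2));   /* prints 75 */
        return 0;
    }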

6.7 Directory Structure

6.7.1 Storage Structure


 A disk can be used in its entirety for a file system.
 Alternatively a physical disk can be broken up into multiple partitions, slices, or mini-
disks, each of which becomes a virtual disk and can have its own filesystem. ( or be used
for raw storage, swap space, etc. )
 Or, multiple physical disks can be combined into one volume, i.e. a larger virtual disk,
with its own filesystem spanning the physical disks.
Figure - A typical file-system organization.

6.7.2 Directory Overview


 Directory operations to be supported include:
o Search for a file
o Create a file - add to the directory
o Delete a file - erase from the directory
o List a directory - possibly ordered in different ways.
o Rename a file - may change sorting order
o Traverse the file system.
6.7.3. Single-Level Directory
 Simple to implement, but each file must have a unique name.

Figure - Single-level directory.

6.7.4 Two-Level Directory


 Each user gets their own directory space.
 File names only need to be unique within a given user's directory.
 A master file directory is used to keep track of each user's directory, and must be
maintained when users are added to or removed from the system.
 A separate directory is generally needed for system ( executable ) files.
 Systems may or may not allow users to access other directories besides their own
o If access to other directories is allowed, then provision must be made to specify
the directory being accessed.
o If access is denied, then special consideration must be made for users to run
programs located in system directories. A search path is the list of directories in
which to search for executable programs, and can be set uniquely for each user.

Figure - Two-level directory structure.

6.7.5 Tree-Structured Directories


 An obvious extension to the two-tiered directory structure, and the one with which we are
all most familiar.
 Each user / process has the concept of a current directory from which all ( relative )
searches take place.
 Files may be accessed using either absolute pathnames ( relative to the root of the tree )
or relative pathnames ( relative to the current directory. )
 Directories are stored the same as any other file in the system, except there is a bit that
identifies them as directories, and they have some special structure that the OS
understands.
 One question for consideration is whether or not to allow the removal of directories that
are not empty - Windows requires that directories be emptied first, and UNIX provides an
option for deleting entire sub-trees.
Figure - Tree-structured directory structure.

6.7.6 Acyclic-Graph Directories


 When the same files need to be accessed in more than one place in the directory structure
( e.g. because they are being shared by more than one user / process ), it can be useful to
provide an acyclic-graph structure. ( Note the directed arcs from parent to child. )
 UNIX provides two types of links for implementing the acyclic-graph structure. ( See
"man ln" for more details. )
o A hard link ( usually just called a link ) involves multiple directory entries that
both refer to the same file. Hard links are only valid for ordinary files in the same
filesystem.
o A symbolic link, that involves a special file, containing information about where
to find the linked file. Symbolic links may be used to link directories and/or files
in other filesystems, as well as ordinary files in the current filesystem.
 Windows only supports symbolic links, termed shortcuts.
 Hard links require a reference count, or link count for each file, keeping track of how
many directory entries are currently referring to this file. Whenever one of the references
is removed the link count is reduced, and when it reaches zero, the disk space can be
reclaimed.
 For symbolic links there is some question as to what to do with the symbolic links when
the original file is moved or deleted:
o One option is to find all the symbolic links and adjust them also.
o Another is to leave the symbolic links dangling, and discover that they are no
longer valid the next time they are used.
o What if the original file is removed, and replaced with another file having the
same name before the symbolic link is next used?
Figure - Acyclic-graph directory structure.
6.7.7 General Graph Directory
 If cycles are allowed in the graphs, then several problems can arise:
o Search algorithms can go into infinite loops. One solution is to not follow links in
search algorithms. (or not to follow symbolic links, and to only allow symbolic
links to refer to directories. )
o Sub-trees can become disconnected from the rest of the tree and still not have
their reference counts reduced to zero. Periodic garbage collection is required to
detect and resolve this problem. (chkdsk in DOS and fsck in UNIX search for
these problems, among others, even though cycles are not supposed to be allowed
in either system. Disconnected disk blocks that are not marked as free are added
back to the file systems with made-up file names, and can usually be safely
deleted. )

Figure - General graph directory.

6.8 File Sharing

6.8.1 Multiple Users


 On a multi-user system, more information needs to be stored for each file:
o The owner (user) who owns the file, and who can control its access.
o The group of other user IDs that may have some special access to the file.
o What access rights are afforded to the owner (User), the Group, and to the rest of
the world ( the universe, a.k.a. Others. )
o Some systems have more complicated access control, allowing or denying
specific accesses to specifically named users or groups.

6.8.2 Remote File Systems


 The advent of the Internet introduces issues for accessing files stored on remote
computers
o The original method was ftp, allowing individual files to be transported across
systems as needed. Ftp can be either account or password controlled, or
anonymous, not requiring any user name or password.
o Various forms of distributed file systems allow remote file systems to be mounted
onto a local directory structure, and accessed using normal file access commands.
(The actual files are still transported across the network as needed, possibly using
ftp as the underlying transport mechanism. )
o The WWW has made it easy once again to access files on remote systems without
mounting their file systems, generally using (anonymous) ftp as the underlying
file transport mechanism.

6.8.2.1 The Client-Server Model


 When one computer system remotely mounts a file system that is physically located on
another system, the system which physically owns the files acts as a server, and the
system which mounts them is the client.
 User IDs and group IDs must be consistent across both systems for the system to work
properly. (i.e. this is most applicable across multiple computers managed by the same
organization, shared by a common group of users )
 The same computer can be both a client and a server ( e.g. cross-linked file systems ).
 There are a number of security concerns involved in this model:
o Servers commonly restrict mount permission to certain trusted systems only.
Spoofing ( a computer pretending to be a different computer ) is a potential
security risk.
o Servers may restrict remote access to read-only.
o Servers restrict which file systems may be remotely mounted. Generally the
information within those subsystems is limited, relatively public, and protected by
frequent backups.
 The NFS ( Network File System ) is a classic example of such a system.

6.8.2.2 Distributed Information Systems


 The Domain Name System, DNS, provides for a unique naming system across all of the
Internet.
 Within an organization, user and host information can be maintained by the Network
Information Service, NIS, which unfortunately has several security issues. NIS+ is a more secure
version, but has not yet gained the same widespread acceptance as NIS.
 Microsoft's Common Internet File System, CIFS, establishes a network login for each
user on a networked system with shared file access. Older Windows systems used
domains, and newer systems ( XP, 2000 ), use active directories. User names must match
across the network for this system to be valid.
 A newer approach is the Lightweight Directory-Access Protocol, LDAP, which provides
a secure single sign-on for all users to access all resources on a network. This is a secure
system which is gaining in popularity, and which has the maintenance advantage of
combining authorization information in one central location.

6.9 Protection
 Files must be kept safe for reliability (against accidental damage ), and protection
( against deliberate malicious access. ) The former is usually managed with backup
copies. This section discusses the latter.
 One simple protection scheme is to remove all access to a file. However this makes the
file unusable, so some sort of controlled access must be arranged.

6.9.1 Types of Access


 The following low-level operations are often controlled:
o Read - View the contents of the file
o Write - Change the contents of the file.
o Execute - Load the file onto the CPU and follow the instructions contained
therein.
o Append - Add to the end of an existing file.
o Delete - Remove a file from the system.
o List -View the name and other attributes of files on the system.
 Higher-level operations, such as copy, can generally be performed through combinations
of the above.

6.9.2 Access Control


 One approach is to have complicated Access Control Lists, ACL, which specify exactly
what access is allowed or denied for specific users or groups.
o The AFS (Andrew File System) uses this system for distributed access.
o Control is very finely adjustable, but may be complicated, particularly when the
specific users involved are unknown. ( AFS allows some wild cards, so for
example all users on a certain remote system may be trusted, or a given username
may be trusted when accessing from any remote system. )
 UNIX uses a set of 9 access control bits, in three groups of three. These correspond to R,
W, and X permissions for each of the Owner, Group, and Others. ( See "man chmod" for
full details. ) The RWX bits control the following privileges for ordinary files and
directories:
Bit   For files                          For directories
R     Read (view) file contents.         Read directory contents. Required to get a listing of the
                                         directory.
W     Write (change) file contents.      Change directory contents. Required to create or delete files.
X     Execute file contents as a         Access detailed directory information. Required to get a long
      program.                           listing, or to access any specific file in the directory. Note
                                         that if a user has X but not R permission on a directory, they
                                         can still access specific files, but only if they already know
                                         the name of the file they are trying to access.
 In addition there are some special bits that can also be applied:
o The set user ID (SUID) bit and/or the set group ID (SGID) bits applied to
executable files temporarily change the identity of whoever runs the program to
match that of the owner / group of the executable program. This allows users
running specific programs to have access to files (while running that program )
which they would otherwise be unable to access. Setting of these two bits is
usually restricted to root, and must be done with caution, as it introduces a
potential security leak.
o The sticky bit on a directory modifies write permission, allowing users to only
delete files for which they are the owner. This allows everyone to create files in
/tmp, for example, but to only delete files which they have created, and not
anyone else's.
o The SUID, SGID, and sticky bits are indicated with an s, s, and t in the positions
for execute permission for the user, group, and others, respectively. If the letter is
lower case, ( s, s, t ), then the corresponding execute permission is also given. If it is
upper case, ( S, S, T ), then the corresponding execute permission is NOT given.
o The numeric form of chmod is needed to set these advanced bits.
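A hedged illustration of how the numeric (octal) form of chmod encodes these bits: an extra
high-order octal digit holds SUID (4), SGID (2), and sticky (1) in front of the usual
owner/group/others digits, so "chmod 4755 file" sets SUID plus rwxr-xr-x. The paths below are only
examples.

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void) {
        /* rwsr-xr-x : SUID (04000) plus 0755 -- the program runs with the owner's identity */
        if (chmod("/usr/local/bin/example", S_ISUID | 0755) == -1)
            perror("chmod suid");

        /* rwxrwxrwt : sticky bit (01000) plus 0777, as used on /tmp */
        if (chmod("/tmp/shared-example", S_ISVTX | 0777) == -1)
            perror("chmod sticky");

        return 0;
    }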

Sample permissions in a UNIX system.


CHAPTER 7
I/O MANAGEMENT

7.1 Overview
 Management of I/O devices is a very important part of the operating system - so
important and so varied that entire I/O subsystems are devoted to its operation. (Consider
the range of devices on a modern computer, from mice, keyboards, disk drives, display
adapters, USB devices, network connections, audio I/O, printers, special devices for the
handicapped, and many special-purpose peripherals.)
 I/O Subsystems must contend with two (conflicting?) trends: (1) The gravitation towards
standard interfaces for a wide range of devices, making it easier to add newly developed
devices to existing systems, and (2) the development of entirely new types of devices, for
which the existing standard interfaces are not always easy to apply.
 Device drivers are modules that can be plugged into an OS to handle a particular device
or category of similar devices.

I/O Devices
External Devices that are used in I/O with computer systems can be roughly grouped into three
classes
 Human readable: Suitable for communicating with the computer user. Eg.Display,
Keyboard, and perhaps other devices such as mouse
 Machine Readable: Suitable for communication with electronic equipment Eg. disk
,tape drives, sensors, controllers, and actuators
 Communication: Suitable for communication with remote devices. Eg. Digital line
drivers and modems.

There are great differences across these classes:

 Data rate: There may be differences of several orders of magnitude in the data transfer rate.
 Application: The use to which a device is put influences the software and policies in the
operating system and supporting utilities. For example, a disk used for files requires the support
of file management software.
 Complexity of control: A printer requires a relatively simple control interface; a disk is much
more complex.
 Unit of transfer: Data may be transferred as a stream of bytes or characters (e.g. terminal I/O)
or in larger blocks (e.g. disk I/O).
 Data representation: Different data encoding schemes are used by different devices,
including differences in character code and parity conventions.
 Error conditions: The nature of errors, the way in which they are reported, their
consequences, and the available range of responses differ widely from one device to another.

7.2 I/O Hardware


 I/O devices can be roughly categorized as storage, communications, user-interface, and
other
 Devices communicate with the computer via signals sent over wires or through the air.
 Devices connect with the computer via ports, e.g. a serial or parallel port.
 A common set of wires connecting multiple devices is termed a bus.
o Buses include rigid protocols for the types of messages that can be sent across the
bus and the procedures for resolving contention issues.
o Figure 7.1 below illustrates three of the four bus types commonly found in a
modern PC:
1. The PCI bus connects high-speed high-bandwidth devices to the memory
subsystem (and the CPU.)
2. The expansion bus connects slower low-bandwidth devices, which
typically deliver data one character at a time (with buffering.)
3. The SCSI bus connects a number of SCSI devices to a common SCSI
controller.
4. A daisy-chain bus, (not shown) is when a string of devices is connected to
each other like beads on a chain, and only one of the devices is directly
connected to the host.

Figure 7.1 - A typical PC bus structure.

 One way of communicating with devices is through registers associated with each port.
Registers may be one to four bytes in size, and may typically include ( a subset of ) the
following four:
1. The data-in register is read by the host to get input from the device.
2. The data-out register is written by the host to send output.
3. The status register has bits read by the host to ascertain the status of the device,
such as idle, ready for input, busy, error, transaction complete, etc.
4. The control register has bits written by the host to issue commands or to change
settings of the device such as parity checking, word length, or full- versus half-
duplex operation.
 Another technique for communicating with devices is memory-mapped I/O.

 In this case a certain portion of the processor's address space is mapped to the device,
and communications occur by reading and writing directly to/from those memory
areas.
 Memory-mapped I/O is suitable for devices which must move large quantities of data
quickly, such as graphics cards.
 Memory-mapped I/O can be used either instead of or more often in combination with
traditional registers. For example, graphics cards still use registers for control
information such as setting the video mode.
 A potential problem exists with memory-mapped I/O, if a process is allowed to write
directly to the address space used by a memory-mapped I/O device.

7.2.1 Polling
 One simple means of device handshaking involves polling:
1. The host repeatedly checks the busy bit on the device until it becomes clear.
2. The host writes a byte of data into the data-out register, and sets the write bit in
the command register ( in either order. )
3. The host sets the command ready bit in the command register to notify the device
of the pending command.
4. When the device controller sees the command-ready bit set, it first sets the busy
bit.
5. Then the device controller reads the command register, sees the write bit set,
reads the byte of data from the data-out register, and outputs the byte of data.
6. The device controller then clears the error bit in the status register, the command-
ready bit, and finally clears the busy bit, signaling the completion of the
operation.
 Polling can be very fast and efficient, if both the device and the controller are fast and if
there is significant data to transfer. It becomes inefficient, however, if the host must wait
a long time in the busy loop waiting for the device, or if frequent checks need to be made
for data that is infrequently there.
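The handshake above can be pictured as the loop below. The register layout, addresses, and status
bits are entirely hypothetical; on a real machine these registers would be reached through port I/O
or a memory mapping set up by platform-specific kernel code before this function is ever called.

    #include <stdint.h>

    /* Hypothetical status and command bits, for illustration only. */
    #define STATUS_BUSY 0x01
    #define CMD_WRITE   0x02
    #define CMD_READY   0x04

    /* Assumed to be mapped to the device's port registers by platform code. */
    volatile uint8_t *status_reg;
    volatile uint8_t *command_reg;
    volatile uint8_t *data_out_reg;

    void polled_write_byte(uint8_t byte) {
        while (*status_reg & STATUS_BUSY)    /* 1. spin until the device is not busy        */
            ;
        *data_out_reg = byte;                /* 2. place the byte in the data-out register  */
        *command_reg |= CMD_WRITE;           /*    and set the write bit                    */
        *command_reg |= CMD_READY;           /* 3. signal that a command is ready           */
        while (*status_reg & STATUS_BUSY)    /* 4-6. device sets busy, does the transfer,   */
            ;                                /*      then clears busy when it is done       */
    }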

7.2.2 Interrupts
 Interrupts allow devices to notify the CPU when they have data to transfer or when an
operation is complete, allowing the CPU to perform other duties when no I/O transfers
need its immediate attention.
 The CPU has an interrupt-request line that is sensed after every instruction.
o A device's controller raises an interrupt by asserting a signal on the interrupt
request line.
o The CPU then performs a state save, and transfers control to the interrupt handler
routine at a fixed address in memory. ( The CPU catches the interrupt and
dispatches the interrupt handler. )
o The interrupt handler determines the cause of the interrupt, performs the
necessary processing, performs a state restore, and executes a return from
interrupt instruction to return control to the CPU. ( The interrupt handler clears
the interrupt by servicing the device. )
 ( Note that the state restored does not need to be the same state as the one
that was saved when the interrupt went off. See below for an example
involving time-slicing. )
 Figure 7.3 illustrates the interrupt-driven I/O procedure:

Figure 7.3 - Interrupt-driven I/O cycle.


 The above description is adequate for simple interrupt-driven I/O, but there are three
needs in modern computing which complicate the picture:
1. The need to defer interrupt handling during critical processing,
2. The need to determine which interrupt handler to invoke, without having to poll
all devices to see which one needs attention, and
3. The need for multi-level interrupts, so the system can differentiate between high-
and low-priority interrupts for proper response.
 These issues are handled in modern computer architectures with interrupt-controller
hardware.
o Most CPUs now have two interrupt-request lines: One that is non-maskable for
critical error conditions and one that is maskable, that the CPU can temporarily
ignore during critical processing.
o The interrupt mechanism accepts an address, which is usually one of a small set
of numbers for an offset into a table called the interrupt vector. This table
( usually located in low physical memory ) holds the addresses of routines
prepared to process specific interrupts.
o The number of possible interrupt handlers still exceeds the range of defined
interrupt numbers, so multiple handlers can be interrupt chained. Effectively the
addresses held in the interrupt vectors are the head pointers for linked-lists of
interrupt handlers.
o Figure 7.4 shows the Intel Pentium interrupt vector. Interrupts 0 to 31 are non-
maskable and reserved for serious hardware and other errors. Maskable interrupts,
including normal device I/O interrupts begin at interrupt 32.
o Modern interrupt hardware also supports interrupt priority levels, allowing
systems to mask off only lower-priority interrupts while servicing a high-priority
interrupt, or conversely to allow a high-priority signal to interrupt the processing
of a low-priority one.

Figure 7.4 - Intel Pentium processor event-vector table.

 At boot time the system determines which devices are present, and loads the appropriate
handler addresses into the interrupt table.
 During operation, devices signal errors or the completion of commands via interrupts.
 Exceptions, such as dividing by zero, invalid memory accesses, or attempts to execute
privileged ( kernel-mode ) instructions can be signaled via interrupts.
 Time slicing and context switches can also be implemented using the interrupt
mechanism.
o The scheduler sets a hardware timer before transferring control over to a user
process.
o When the timer raises the interrupt request line, the CPU performs a state-save,
and transfers control over to the proper interrupt handler, which in turn runs the
scheduler.
o The scheduler does a state-restore of a different process before resetting the timer
and issuing the return-from-interrupt instruction.
 A similar example involves the paging system for virtual memory - A page fault causes
an interrupt, which in turn issues an I/O request and a context switch as described above,
moving the interrupted process into the wait queue and selecting a different process to
run. When the I/O request has completed ( i.e. when the requested page has been loaded
up into physical memory ), then the device interrupts, and the interrupt handler moves the
process from the wait queue into the ready queue, ( or depending on scheduling
algorithms and policies, may go ahead and context switch it back onto the CPU. )
 System calls are implemented via software interrupts, a.k.a. traps. When a ( library )
program needs work performed in kernel mode, it sets command information and
possibly data addresses in certain registers, and then raises a software interrupt. ( E.g. 21
hex in DOS. ) The system does a state save and then calls on the proper interrupt handler
to process the request in kernel mode. Software interrupts generally have low priority, as
they are not as urgent as devices with limited buffering space.
 Interrupts are also used to control kernel operations, and to schedule activities for optimal
performance. For example, the completion of a disk read operation involves two
interrupts:
o A high-priority interrupt acknowledges the device completion, and issues the next
disk request so that the hardware does not sit idle.
o A lower-priority interrupt transfers the data from the kernel memory space to the
user space, and then transfers the process from the waiting queue to the ready
queue.
 The Solaris OS uses a multi-threaded kernel and priority threads to assign different
threads to different interrupt handlers. This allows for the "simultaneous" handling of
multiple interrupts, and the assurance that high-priority interrupts will take precedence
over low-priority ones and over user processes.

7.2.3 Direct Memory Access


 For devices that transfer large quantities of data ( such as disk controllers ), it is wasteful
to tie up the CPU transferring data in and out of registers one byte at a time.
 Instead this work can be off-loaded to a special processor, known as the Direct Memory
Access, DMA, Controller.
 The host issues a command to the DMA controller, indicating the location where the data
is located, the location where the data is to be transferred to, and the number of bytes of
data to transfer. The DMA controller handles the data transfer, and then interrupts the
CPU when the transfer is complete.
 A simple DMA controller is a standard component in modern PCs, and many bus-
mastering I/O cards contain their own DMA hardware.
 Handshaking between DMA controllers and their devices is accomplished through two
wires called the DMA-request and DMA-acknowledge wires.
 While the DMA transfer is going on the CPU does not have access to the PCI bus (
including main memory ), but it does have access to its internal registers and primary and
secondary caches.
 DMA can be done in terms of either physical addresses or virtual addresses that are
mapped to physical addresses. The latter approach is known as Direct Virtual Memory
Access, DVMA, and allows direct data transfer from one memory-mapped device to
another without using the main memory chips.
 Direct DMA access by user processes can speed up operations, but is generally forbidden
by modern systems for security and protection reasons. ( I.e. DMA is a kernel-mode
operation. )
 Figure 7.5 below illustrates the DMA process.

Figure 7.5 - Steps in a DMA transfer.


7.3 Magnetic Disks
 Traditional magnetic disks have the following basic structure:
o One or more platters in the form of disks covered with magnetic media. Hard
disk platters are made of rigid metal, while "floppy" disks are made of more
flexible plastic.
o Each platter has two working surfaces. Older hard disk drives would sometimes
not use the very top or bottom surface of a stack of platters, as these surfaces were
more susceptible to potential damage.
o Each working surface is divided into a number of concentric rings called tracks.
The collection of all tracks that are the same distance from the edge of the platter,
( i.e. all tracks immediately above one another in the following diagram ) is called
a cylinder.
o Each track is further divided into sectors, traditionally containing 512 bytes of
data each, although some modern disks occasionally use larger sector sizes.
(Sectors also include a header and a trailer, including checksum information
among other things. Larger sector sizes reduce the fraction of the disk consumed
by headers and trailers, but increase internal fragmentation and the amount of disk
that must be marked bad in the case of errors.)
o The data on a hard drive is read by read-write heads. The standard configuration
(shown below) uses one head per surface, each on a separate arm, and controlled
by a common arm assembly which moves all heads simultaneously from one
cylinder to another. (Other configurations, including independent read-write
heads, may speed up disk access, but involve serious technical difficulties.)
o The storage capacity of a traditional disk drive is equal to the number of heads
(i.e. the number of working surfaces), times the number of tracks per surface,
times the number of sectors per track, times the number of bytes per sector. A
particular physical block of data is specified by providing the head-sector-cylinder
number at which it is located.

Figure - Moving-head disk mechanism.


 In operation the disk rotates at high speed, such as 7200 rpm ( 120 revolutions per
second. ) The rate at which data can be transferred from the disk to the computer is
composed of several steps:
o The positioning time, a.k.a. the seek time or random access time is the time
required to move the heads from one cylinder to another, and for the heads to
settle down after the move. This is typically the slowest step in the process and
the predominant bottleneck to overall transfer rates.
o The rotational latency is the amount of time required for the desired sector to
rotate around and come under the read-write head. This can range anywhere from
zero to one full revolution, and on the average will equal one-half revolution. This
is another physical step and is usually the second slowest step behind seek time. (
For a disk rotating at 7200 rpm, the average rotational latency would be 1/2
revolution / 120 revolutions per second, or just over 4 milliseconds, a long time
by computer standards. )
o The transfer rate, which is the time required to move the data electronically from
the disk to the computer. ( Some authors may also use the term transfer rate to
refer to the overall transfer rate, including seek time and rotational latency as well
as the electronic data transfer rate. )
 Disk heads "fly" over the surface on a very thin cushion of air. If they should accidentally
contact the disk, then a head crash occurs, which may or may not permanently damage
the disk or even destroy it completely. For this reason it is normal to park the disk heads
when turning a computer off, which means to move the heads off the disk or to an area of
the disk where there is no data stored.
 Floppy disks are normally removable. Hard drives can also be removable, and some are
even hot-swappable, meaning they can be removed while the computer is running, and a
new hard drive inserted in their place.
 Disk drives are connected to the computer via a cable known as the I/O Bus. Some of the
common interface formats include Enhanced Integrated Drive Electronics, EIDE;
Advanced Technology Attachment, ATA; Serial ATA, SATA; Universal Serial Bus,
USB; Fibre Channel, FC; and Small Computer Systems Interface, SCSI.
 The host controller is at the computer end of the I/O bus, and the disk controller is built
into the disk itself. The CPU issues commands to the host controller via I/O ports. Data is
transferred between the magnetic surface and onboard cache by the disk controller, and
then the data is transferred from that cache to the host controller and the motherboard
memory at electronic speeds.

7.4 I/O Buffering


There is a speed mismatch between I/O devices and CPU. This leads to inefficiency in processes
being completed. To increase the efficiency, it may be convenient to perform input transfers in
advance of requests being made and to perform output transfers some time after the request is
made. This technique is known as buffering.
In discussing the various approaches to buffering, it is sometimes important to make a distinction
between two types of I/O devices:
 Block-oriented devices store information in blocks that are usually of fixed size and
transfers are made one block at a time. Eg.Disk,tape
 Stream-oriented devices transfer data in and out as a stream of bytes, with no block
structure. Eg. Terminals, printers, communications port, mouse.

7.4.1 Single Buffer


The simplest type of buffering is single buffering. When a user process issues an I/O request,
the operating system assigns a buffer in the system portion of main memory to the operation. For
block-oriented devices, the single buffering scheme can be described as follows: Input transfers
are made to the system buffer. When the transfer is complete, the process moves the block into
user space and immediately requests another block.

For stream-oriented I/O, the single buffering scheme can be used in a line-at-a-time fashion or a
byte-at-a-time fashion. In line-at-a-time fashion, user input and output to the terminal is one line
at a time, e.g. scroll-mode terminals and line printers.

Suppose that T is the time required to input one block and that C is the computation time per block.
Without buffering, the execution time per block is essentially T + C. With a single buffer, the
time is max[C, T] + M, where M is the time required to move the data from the system buffer to
user memory.

7.4.2 Double Buffer


An improvement over single buffering can be made by assigning two system buffers to the operation.
A process now transfers data to (or from) one buffer while the operating system empties (or fills)
the other. This technique is known as double buffering or buffer swapping.

For block-oriented transfer, we can roughly estimate the execution time per block as max[C, T]. In
both cases (C <= T and C > T) an improvement over single buffering is achieved. Again, this
improvement comes at the cost of increased complexity. For stream-oriented input, there are two
alternative modes of operation. For line-at-a time I/O, the user process need not be suspended for
input or output, unless the process runs ahead of the double buffers. For byte-at-a time operation,
the double buffer offers no particular advantage over a single buffer of twice the length.
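As a hedged numerical illustration (the figures are invented, not from the text): suppose T = 100 ms
to input a block, C = 50 ms of computation per block, and M = 5 ms to copy a block from the system
buffer to user space. With no buffering each block costs roughly T + C = 150 ms; with a single
buffer max[C, T] + M = 105 ms; with double buffering about max[C, T] = 100 ms, i.e. the process is
then limited purely by the speed of the device.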

Figure: I/O Buffering schemes

7.4.3 Circular Buffering


Double buffering may be inadequate if the process performs rapid bursts of I/O. In this case, the
problem can be solved by using more than one buffer. When more than two buffers are used, the
collection of buffers is itself referred to as a circular buffer.

7.4.4 The utility of buffering


Buffering is a technique that smoothens out peaks in I/O demand. However, no amount of
buffering will allow an I/O device to keep pace with a process indefinitely when average demand
of the process is greater than the I/O device can service. Even with multiple buffers, all of the
buffers will eventually fill up and the process will have to wait after processing each chunk of
data.

7.5 Disk scheduling


In this section, we highlight some of the key issues of the performance of disk system.

7.5.1 Disk performance parameters


The actual details of disk I/O operation depend on the computer system, operating system and
the nature of I/O channel and disk controller hardware. When the disk drive is operating, the disk
is rotating at constant speed. To read or write, the head must be positioned at the desired track
and at the beginning of desired sector on that track. Track can be selected using either moving-
head or fixed-head. On a movable-head system, the time it takes to position the head at the track
is known as seek time. In either case, once the track is selected, the disk controller waits until the
appropriate sector is reached. The time it takes to find the right sector is known as rotational
delay or rotational latency. The sum of the seek time and the rotational delay is the access time (the
time taken to get into position to read or write).
In addition to these times, there are several queuing times associated with disk operation. When a
process issues an I/O request, it must first wait in a queue for the device to be available. At that
time, the device is assigned to the process. If the device shares a single I/O channel or a set of
I/O channels with other disk drives, then there may be an additional wait for the channel to be
available.

Seek time
Seek time is the time required to move the disk arm to the required track. The seek time consists
of two key components: the initial startup time, and the time taken to traverse the tracks. The
traversal time is not a linear function of the number of tracks but includes a startup time and a
settling time.

Rotational delay
Magnetic disks, other than floppy disks, have rotational speeds in the range 400 to 10,000 rpm.
Floppy disks typically rotate at between 300 and 600 rpm. Thus the average rotational delay for a
floppy disk will be between 50 and 100 ms (half a revolution).

Transfer Time
The transfer time to or from the disk depends on the rotation speed of the disk in the following
fashion:
T = b / (r N)
where
T = transfer time
b = number of bytes to be transferred
N = number of bytes on a track
r = rotation speed, in revolutions per second
Thus the total average access time can be expressed as Ta = Ts + 1/(2r) + b/(rN),
where Ts is the average seek time.
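As a hedged worked example (the parameters are assumed, not from the text): take an average seek
time Ts = 4 ms, a 7200 rpm spindle (r = 120 revolutions per second) and a track capacity of
N = 163,840 bytes (320 sectors of 512 bytes). Reading one 512-byte sector then gives
Ta = 4 ms + 1/(2 x 120) s + 512/(120 x 163,840) s = 4 + 4.17 + 0.03 ms, or roughly 8.2 ms; the seek
time and rotational delay clearly dominate the electronic transfer.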

7.5.2 The following are the disk scheduling algorithms:

 First Come-First Serve (FCFS)


 Shortest Seek Time First (SSTF)
 Elevator (SCAN)
 Circular SCAN (C-SCAN)
 LOOK
 C-LOOK

These algorithms are similar and easy to confuse. What all of them strive for is keeping the head
movement (measured in number of tracks) as small as possible: the less the head has to move, the
shorter the seek time. The examples below show why C-LOOK gives the smallest head movement
for this request queue.
Given the following queue -- 95, 180, 34, 119, 11, 123, 62, 64 -- with the read-write head initially
at track 50 and the highest-numbered track being 199, let us now discuss the different algorithms.
1. First Come -First Serve (FCFS)

All incoming requests are placed at the end of the queue. Whatever number that is next in the
queue will be the next number served. Using this algorithm doesn't provide the best results. To
determine the number of head movements you would simply find the number of tracks it took to
move from one request to the next. For this case it went from 50 to 95 to 180 and so on. From 50
to 95 it moved 45 tracks. If you tally up the total number of tracks you will find how many tracks
it had to go through before finishing the entire request queue. In this example, it had a total head
movement of 644 tracks ( 45 + 85 + 146 + 85 + 108 + 112 + 61 + 2 ). The disadvantage of this
algorithm is the oscillation from track 50 up to track 180, back down to track 11, up to 123 and
back to 64. As you will soon see, this is the worst algorithm that one can use.

2. Shortest Seek Time First (SSTF)


In this case each request is serviced according to the shortest seek distance from the current head
position. Starting at 50, the closest request is 62 rather than 34, since the head is only 12 tracks
away from 62 and 16 tracks away from 34. The process continues until all the requests are taken
care of. For example, the next move would be from 62 to 64 instead of 34, since there are only 2
tracks between them and not 28 if it were to go the other way. Although this seems to be better
service, since it moved a total of only 236 tracks, it is not optimal. There is a real chance that
starvation would take place: if many requests keep arriving close to each other, requests far from
the head may never be handled, since their distance will always be greater.

3. Elevator (SCAN)

This approach works like an elevator does. It scans down towards the nearest end and then when
it hits the bottom it scans up servicing the requests that it didn't get going down. If a request
comes in after it has been scanned it will not be serviced until the process comes back down or
moves back up. This process moved a total of 230 tracks. Once again this is more optimal than
the previous algorithm, but it is not the best.

4. Circular Scan (C-SCAN)

Circular scanning works like the elevator algorithm to some extent. It begins its scan toward the
nearest end and works its way all the way to the end of the disk. Once it hits the bottom or top,
it jumps to the other end and continues moving in the same direction. Keep in mind that the huge
jump doesn't count as head movement. The total head movement for this algorithm is only 187 tracks,
but this still isn't the most efficient.
5. C-LOOK
This is just an enhanced version of C-SCAN. In this the scanning doesn't go past the last
request in the direction that it is moving. It too jumps to the other end but not all the way to
the end. Just to the furthest request. C-SCAN had a total movement of 187 but this scan (C-
LOOK) reduced it down to 157 tracks.
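Total head movement for any ordering is just the sum of absolute track differences, so the figures
quoted above can be checked with a few lines of C. The sketch below (names are illustrative)
computes FCFS and SSTF for the queue used in this section.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 8

    /* Sum of |distance| travelled when servicing requests in the given order. */
    static int movement(const int *order, int n, int head) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += abs(order[i] - head);
            head = order[i];
        }
        return total;
    }

    int main(void) {
        int queue[N] = {95, 180, 34, 119, 11, 123, 62, 64};
        int head = 50;

        /* FCFS simply services the queue in arrival order. */
        printf("FCFS head movement = %d\n", movement(queue, N, head));   /* 644 */

        /* SSTF repeatedly picks the pending request closest to the current head. */
        int pending[N], order[N], pos = head;
        for (int i = 0; i < N; i++) pending[i] = queue[i];
        for (int k = 0; k < N; k++) {
            int best = -1;
            for (int i = 0; i < N; i++)
                if (pending[i] >= 0 &&
                    (best < 0 || abs(pending[i] - pos) < abs(pending[best] - pos)))
                    best = i;
            order[k] = pending[best];
            pos = pending[best];
            pending[best] = -1;              /* mark this request as serviced */
        }
        printf("SSTF head movement = %d\n", movement(order, N, head));   /* 236 */
        return 0;
    }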

7.6 Record Blocking:


• Records are the logical unit of access of a structured file
• Blocks are the unit for I/O with secondary storage
• For I/O to be performed, records must be organized as blocks.
• Three methods of blocking are common
– Fixed length blocking
– Variable length spanned blocking
– Variable-length unspanned blocking

Fixed Blocking
• Fixed-length records are used, and an integral number of records are stored in a block
• Unused space at the end of a block is internal fragmentation
• Common for sequential files with fixed-length records
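For example, with 512-byte blocks and 120-byte fixed-length records, each block holds
floor(512/120) = 4 records, and 512 - 480 = 32 bytes per block are lost to internal fragmentation.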

Variable Length Spanned Blocking


• Variable-length records are used and are packed into blocks with no unused space
• Some records may span multiple blocks
– Continuation is indicated by a pointer to the successor block
•  Efficient for storage and does not limit the size of records
•  Difficult to implement
•  Records that span two blocks require two I/O operations

Variable-length unspanned blocking


• Uses variable length records without spanning
•  Wasted space in most blocks because of the inability to use the remainder of a block if
the next record is larger than the remaining unused space
•  Limits record size to the size of a block
CHAPTER 8
CASE STUDY

8.1 What is file system?


 Any computer file is stored on some kind of storage with a given capacity.
 Each storage device is a linear space that can be read, or both read and written, as digital information.
 Each byte of information on the storage has its own offset from the storage start (address)
and is referenced by this address.
 A storage can be presented as a grid with a set of numbered cells (each cell – single byte).
Any file saved to the storage takes a number of these cells.
 Generally, computer storages use a pair of sector and in-sector offset to reference any
byte of information on the storage.
 The sector is a group of bytes (usually 512 bytes) that is a minimum addressable unit of
the physical storage. For example, with 512-byte sectors, byte 1030 on a hard disk will be referenced as
sector #3 (counting from 1) with an in-sector offset of 6 bytes (512 + 512 + 6 bytes).
 This scheme is applied to optimize storage addressing and use a smaller number to
reference any portion of information on the storage.
 As a whole, file system is a structured data representation and a set of metadata that
describe the stored data.
 File system can not only serve for the purposes of the whole storage but also be a part of
an isolated storage segment – disk partition.
 Usually the file system operates blocks, not sectors.
 File system blocks are groups of sectors that optimize storage addressing.
 Modern file systems generally use block sizes from 1 up to 128 sectors (512-65536
bytes).
 Files are usually stored from the start of a block and take entire blocks.

8.2 Windows file systems


Microsoft Windows OSes use two major file systems: FAT, inherited from old DOS with its later
extension FAT32, and the widely-used NTFS file system. The recently released ReFS file system was
developed by Microsoft as a new generation file system for Windows 8 Servers.

FAT (File Allocation Table):


 FAT file system is one of the simplest types of file systems.
 It consists of file system descriptor sector (boot sector or superblock), file system block
allocation table (referenced as File Allocation Table) and plain storage space to store
files and folders.
 Files on FAT are stored in directories. Each directory is an array of 32-byte records, each
defines file or file extended attributes (e.g. long file name).
 File record references the first block of file. Any next block can be found through block
allocation table by using it as linked-list.
 The block allocation table contains an array of block descriptors. A zero value indicates that the
block is not used; a non-zero value is a reference to the next block of the file, or a special value
marking the end of the file (a minimal chain-walking sketch is given after this list).
 The number in FAT12, FAT16, and FAT32 stands for the number of bits used to
enumerate file system block. This means that FAT12 may use up to 4096 different block
references, FAT16- 65536 and FAT32 - 4294967296. Actual maximum count of blocks
is even less and depends on implementation of file system driver.
 FAT12 was used for old floppy disks. FAT16 (or simply FAT) and FAT32 are widely
used for flash memory cards and USB flash sticks. It is supported by mobile phones,
digital cameras and other portable devices.
 FAT or FAT32 is a file system, used on Windows-compatible external storages or disk
partitions with size below 2GB (for FAT) or 32GB (for FAT32). Windows cannot create
FAT32 file system over 32GB (however Linux supports FAT32 up to 2TB).
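Reading a file under FAT therefore amounts to following a linked list through the allocation table.
A minimal chain-walking sketch (the table contents and the end-of-chain marker are illustrative only):

    #include <stdio.h>

    #define FAT_EOC (-1)     /* illustrative end-of-chain marker */

    /* Tiny illustrative allocation table: fat[b] holds the block that follows block b. */
    int fat[16] = { [2] = 7, [7] = 5, [5] = FAT_EOC };

    void print_chain(int first_block) {
        for (int b = first_block; b != FAT_EOC; b = fat[b])
            printf("block %d\n", b);     /* read this block, then look up the next one */
    }

    int main(void) {
        print_chain(2);   /* a file whose directory entry points at block 2: 2 -> 7 -> 5 */
        return 0;
    }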

NTFS (New Technology File System):


 NTFS was introduced in Windows NT and at present is major file system for Windows.
 This is a default file system for disk partitions and the only file system that is supported
for disk partitions over 32GB.
 The file system is quite extensible and supports many file properties, including access
control, encryption etc.
 Each file on NTFS is stored as file descriptor in Master File Table and file content.
 Master file table contains all information about the file: size, allocation, name etc. The
first and the last sectors of the file system contain file system settings (boot record
or superblock).
 This file system uses 48 and 64 bit values to reference files, thus supporting quite large
disk storages.
ReFS (Resilient File System):
 ReFS is the latest development of Microsoft presently available for Windows 8 Servers.
 File system architecture absolutely differs from other Windows file systems and is mainly
organized in form of B+-tree.
 ReFS has high tolerance to failures achieved due to new features included into the
system.
 And, namely, Copy-on-Write (CoW): no metadata is modified without being copied; no
data is written over the existing ones and rather into a new disk space.
 With any file modifications a new copy of metadata is created into any free storage space,
and then the system creates a link from older metadata to the newer ones.
 As a result a system stores significant quantity of older backups in different places which
provides for easy file recovery unless this storage space is overwritten.

8.3 Linux File System


Ext2
 Ext2 stands for second extended file system.
 It was introduced in 1993. Developed by Rémy Card.
 This was developed to overcome the limitation of the original ext file system.
 Ext2 does not have journaling feature.
 On flash drives and USB drives, ext2 is recommended, as it doesn't need to do the overhead
of journaling.
 Maximum individual file size can be from 16 GB to 2 TB
 Overall ext2 file system size can be from 2 TB to 32 TB

Ext3
 Ext3 stands for third extended file system.
 It was introduced in 2001. Developed by Stephen Tweedie.
 Starting from Linux Kernel 2.4.15 ext3 was available.
 The main benefit of ext3 is that it allows journaling.
 Journaling has a dedicated area in the file system, where all the changes are tracked.
When the system crashes, the possibility of file system corruption is less because of
journaling.
 Maximum individual file size can be from 16 GB to 2 TB
 Overall ext3 file system size can be from 2 TB to 32 TB
 There are three types of journaling available in ext3 file system.
 Journal – Metadata and content are saved in the journal.
 Ordered – Only metadata is saved in the journal. Metadata are journaled only after
writing the content to disk. This is the default.
 Writeback – Only metadata is saved in the journal. Metadata might be journaled
either before or after the content is written to the disk.
 You can convert a ext2 file system to ext3 file system directly (without backup/restore).

Ext4
 Ext4 stands for fourth extended file system.
 It was introduced in 2008.
 Starting from Linux Kernel 2.6.19 ext4 was available.
 Supports huge individual file size and overall file system size.
 Maximum individual file size can be from 16 GB to 16 TB
 Overall maximum ext4 file system size is 1 EB (exabyte). 1 EB = 1024 PB (petabyte). 1
PB = 1024 TB (terabyte).
 Directory can contain a maximum of 64,000 subdirectories (as opposed to 32,000 in ext3)
 You can also mount an existing ext3 fs as ext4 fs (without having to upgrade it).
 Several other new features are introduced in ext4: multiblock allocation, delayed
allocation, journal checksum. fast fsck, etc. All you need to know is that these new
features have improved the performance and reliability of the filesystem when compared
to ext3.
 In ext4, you also have the option of turning the journaling feature "off".

Reiserfs file system

 ReiserFS is a general-purpose, journaled computer file system designed and implemented
by a team at Namesys led by Hans Reiser.
 ReiserFS is currently supported on Linux (without quota support).
 Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to
be included in the standard kernel.

ReiserFS offered features that had not been available in existing Linux file systems:
 Metadata-only journaling (also block journaling, since Linux 2.6.8), its most-publicized
advantage over what was the stock Linux file system at the time, ext2.
 Online resizing (growth only), with or without an underlying volume manager such
as LVM. Since then, Namesys has also provided tools to resize (both grow and shrink)
ReiserFS file systems offline.
 Tail packing, a scheme to reduce internal fragmentation. Tail packing, however, can
have a significant performance impact. Reiser4 may have improved this by packing tails
where it does not hurt performance
Btrfs
 Btrfs is a new copy on write (CoW) filesystem for Linux aimed at implementing
advanced features while focusing on fault tolerance, repair and easy administration.
 It addresses concerns regarding huge storage backend volumes, multi-device spanning,
snapshotting and more.
 Although its primary target was enterprise usage, it also offers interesting features to
home users such as online grow/shrink (both on file system as well as underlying storage
level), object-level redundancy, transparent compression and cloning.

Xfs
 The xfs file system is an enterprise-ready, high performance journaling file system. It
offers very high parallel throughput and is therefore a common choice amongst
enterprises.

Zfs
 The zfs file system (ZFSonLinux) is a multi-featured file system offering block-level
checksumming, compression, snapshotting, copy-on-write, deduplication, extremely
large volumes, remote replication and more.
 It was ported relatively recently from (Open)Solaris to Linux and is gaining ground. A
conceptual sketch of block-level checksumming and deduplication follows.
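As with the Btrfs sketch above, the following Python fragment only illustrates the idea behind block-level checksumming and deduplication, not how ZFS implements them: each block is addressed by a hash of its contents, so duplicate blocks are stored once and corruption can be detected on read.

```python
# Conceptual sketch of block-level checksumming and deduplication (two of the
# ZFS features listed above). This illustrates the idea only, not ZFS itself.
import hashlib

class DedupStore:
    def __init__(self):
        self.blocks = {}                        # checksum -> block contents

    def put(self, data: bytes) -> str:
        checksum = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(checksum, data)  # store only the first copy
        return checksum                         # caller keeps this reference

    def get(self, checksum: str) -> bytes:
        data = self.blocks[checksum]
        # Re-hash on read to detect silent corruption of the stored block.
        if hashlib.sha256(data).hexdigest() != checksum:
            raise IOError("checksum mismatch: block is corrupt")
        return data

store = DedupStore()
ref1 = store.put(b"same block contents")
ref2 = store.put(b"same block contents")        # deduplicated, nothing new stored
print(ref1 == ref2, len(store.blocks))          # -> True 1
```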

Inodes In Unix

In a standard Unix file system, files are made up of two different types of objects. Every file has
an index node (inode for short) associated with it that contains the metadata about that file:
permissions, ownerships, timestamps, etc. The contents of the file are stored in a collection of
data blocks. At this point in the discussion, a lot of people just wave their hands and say
something like, "And there are pointers in the inode that link to the data blocks."
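The metadata held in an inode is easy to inspect from user space. As a small sketch using Python's standard os.stat on a Unix/Linux system (the exact fields available can vary by platform), the following prints the main items of inode metadata for a file:

```python
# Small sketch (Unix/Linux): inspect the metadata stored in a file's inode
# using Python's os.stat.
import os, stat, sys, time

path = sys.argv[1] if len(sys.argv) > 1 else __file__
st = os.stat(path)

print("inode number   :", st.st_ino)
print("permissions    :", stat.filemode(st.st_mode))   # e.g. -rw-r--r--
print("hard link count:", st.st_nlink)
print("owner uid/gid  :", st.st_uid, st.st_gid)
print("size in bytes  :", st.st_size)
print("blocks (512 B) :", st.st_blocks)                 # allocated storage
print("last modified  :", time.ctime(st.st_mtime))
```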
As it turns out, there are only fifteen block pointers in the inode. Assuming standard 4K data
blocks, that means that the largest possible file that could be addressed directly would be 60K--
obviously not nearly large enough. In fact, only the first 12 block pointers in the inode are
reserved for direct block pointers. This means you can address files of up to 48K just using the
direct pointers in the inode.
Beyond that, you start getting into indirect blocks:
 The thirteenth pointer is the indirect block pointer. Once the file grows beyond 48K, the
file system grabs a data block and starts using it to store additional block pointers, setting
the thirteenth block pointer in the inode to the address of this block. Block pointers are 4-
byte quantities, so the indirect block can store 1024 of them. That means that the total file
size that can be addressed via the indirect block is 4MB (plus the 48K of storage
addressed by the direct blocks in the inode).
 Once the file size grows beyond 4MB + 48KB, the file system starts using doubly
indirect blocks. The fourteenth block pointer points to a data block that contains the
addresses of other indirect blocks, which in turn contain the addresses of the actual data
blocks that make up the file's contents. That means we have up to 1024 indirect blocks
that in turn point to up to 1024 data blocks-- in other words up to 1M total 4K blocks, or
up to 4GB of storage.
 At this point, you've probably figured out that the fifteenth inode pointer is the trebly
indirect block pointer. With three levels of indirect blocks, you can address up to 4TB
(+4GB from the doubly indirect pointer, +4M from the indirect block pointer, +48K from
the direct block pointers) for a single file.
(Figure: an inode's 12 direct block pointers plus its single, double and triple indirect block pointers; the original diagram is not reproduced here. The sketch below checks the arithmetic instead.)
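To make the arithmetic above concrete, here is a short sketch that recomputes the limit for each pointer class, assuming the same 4 KB data blocks and 4-byte block pointers used in the text:

```python
# Worked check of the arithmetic above, assuming 4 KB data blocks and 4-byte
# block pointers (so each indirect block holds 1024 pointers).

BLOCK = 4 * 1024                 # data block size in bytes
PTRS = BLOCK // 4                # pointers per indirect block = 1024

direct = 12 * BLOCK              # 12 direct pointers    -> 48 KB
single = PTRS * BLOCK            # single indirect block -> 4 MB
double = PTRS ** 2 * BLOCK       # doubly indirect       -> 4 GB
triple = PTRS ** 3 * BLOCK       # trebly indirect       -> 4 TB

for name, size in [("direct", direct), ("single indirect", single),
                   ("double indirect", double), ("triple indirect", triple)]:
    print(f"{name:>16}: {size:,} bytes")
print(f"{'maximum file':>16}: {direct + single + double + triple:,} bytes")
```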

UNIX PROCESS STATE TRANSITION DIAGRAM (figure not reproduced)


8.4 ANDROID OPERATING SYSTEM

 Android is an open source and Linux-based operating system for mobile devices such as
Smartphone and tablet computers.
 Android was developed by the Open Handset Alliance, led by Google, and other
companies.
 Android offers a unified approach to application development for mobile devices which
means developers need to only develop for Android, and their applications should be able
to run on different devices powered by Android.
 The first beta version of the Android Software Development Kit (SDK) was released by
Google in 2007 where as the first commercial version, Android 1.0, was released in
September 2008.
 On June 27, 2012, at the Google I/O conference, Google announced the next Android
version, 4.1 Jelly Bean. Jelly Bean is an incremental update, with the primary aim of
improving the user interface, both in terms of functionality and performance.
 The source code for Android is available under free and open source software licenses.
Google publishes most of the code under the Apache License version 2.0 and the rest,
Linux kernel changes, under the GNU General Public License version 2.

8.4.1 Features of Android


Android is a powerful operating system competing with Apple's iOS, and it supports many great
features. A few of them are listed below:
 Beautiful UI – The Android OS basic screen provides a beautiful and intuitive user interface.
 Connectivity – GSM/EDGE, IDEN, CDMA, EV-DO, UMTS, Bluetooth, Wi-Fi, LTE, NFC and WiMAX.
 Storage – SQLite, a lightweight relational database, is used for data storage purposes.
 Media support – H.263, H.264, MPEG-4 SP, AMR, AMR-WB, AAC, HE-AAC, AAC 5.1, MP3, MIDI, Ogg Vorbis, WAV, JPEG, PNG, GIF, and BMP.
 Messaging – SMS and MMS.
 Web browser – Based on the open-source WebKit layout engine, coupled with Chrome's V8 JavaScript engine supporting HTML5 and CSS3.
 Multi-touch – Android has native support for multi-touch, which was initially made available in handsets such as the HTC Hero.
 Multi-tasking – The user can jump from one task to another, and multiple applications can run simultaneously.
 Resizable widgets – Widgets are resizable, so users can expand them to show more content or shrink them to save space.
 Multi-Language – Supports single-direction and bi-directional text.
 GCM – Google Cloud Messaging (GCM) is a service that lets developers send short message data to their users on Android devices, without needing a proprietary sync solution.
 Wi-Fi Direct – A technology that lets apps discover and pair directly, over a high-bandwidth peer-to-peer connection.
 Android Beam – A popular NFC-based technology that lets users instantly share, just by touching two NFC-enabled phones together.

8.4.2 Android Applications


 Android applications are usually developed in the Java language using the Android
Software Development Kit.
 Once developed, Android applications can be packaged easily and sold either through
a store such as Google Play or the Amazon Appstore.

8.4.3 Android Architecture

The Android operating system is a stack of software components which is roughly divided into five
sections and four main layers, described below.
Linux kernel
 At the bottom of the layers is Linux kernel - Linux 2.6 with approximately 115 patches.
 This provides basic system functionality like process management, memory management,
device management like camera, keypad, display etc.
 Also, the kernel handles all the things that Linux is really good at such as networking and
a vast array of device drivers, which take the pain out of interfacing to peripheral
hardware.
Libraries
 On top of Linux kernel there is a set of libraries including open-source Web browser
engine WebKit, well known library libc, SQLite database which is a useful repository for
storage and sharing of application data, libraries to play and record audio and video, SSL
libraries responsible for Internet security etc.
Android Runtime
 This is the third section of the architecture and available on the second layer from the
bottom. This section provides a key component called Dalvik Virtual Machine which is
a kind of Java Virtual Machine specially designed and optimized for Android.
 The Dalvik VM makes use of Linux core features like memory management and multi-
threading, which is intrinsic in the Java language. The Dalvik VM enables every Android
application to run in its own process, with its own instance of the Dalvik virtual machine.
 The Android runtime also provides a set of core libraries which enable Android
application developers to write Android applications using standard Java programming
language.
Application Framework
 The Application Framework layer provides many higher-level services to applications in
the form of Java classes. Application developers are allowed to make use of these
services in their applications.
Applications
 You will find all the Android applications at the top layer. You will write your application
to be installed on this layer only. Examples of such applications are Contacts, Browser,
Games, etc.

8.4.4 Android File System Structure


 Most of the Android users are using their Android phone just for calls, SMS, browsing
and basic apps.
 But from the development perspective, we should know about Android's internal structure.
 Android uses several partitions (like boot, system, recovery, data etc) to organize files
and folders on the device just like Windows OS.
 Each of these partitions has its own functionality.
 There are mainly 6 partitions in Android phones, tablets and other Android devices.
 Below is the list of partitions for the Android file system. Note that there might be some
other partitions available; it differs from model to model, but logically the below 6 partitions
can be found in any Android device.
 /boot
 /system
 /recovery
 /data
 /cache
 /misc

 Also, below are the SD card file system partitions.


 /sdcard
 /sd-ext

/boot
 This is the boot partition of your Android device, as the name suggests. It includes the
Android kernel and the RAM disk. The device will not boot without this partition.
 Wiping this partition from recovery should only be done if absolutely required and once
done, the device must NOT be rebooted before installing a new one, which can be done
by installing a ROM that includes a /boot partition.

/system
 As the name suggests, this partition contains the entire Android OS, other than the kernel
and the RAM disk.
 This includes the Android GUI and all the system applications that come pre-installed on
the device.
 Wiping this partition will remove Android from the device without rendering it
unbootable, and you will still be able to put the phone into recovery or bootloader mode
to install a new ROM.
/recovery
 This is specially designed for backup.
 The recovery partition can be considered as an alternative boot partition, that lets the
device boot into a recovery console for performing advanced recovery and maintenance
operations on it.

/data
 Again, as the name suggests, this is the user data partition.
 This partition contains the user's data such as contacts, SMS, settings and all the Android
applications that you have installed.
 When you perform a factory reset on your device, this partition is wiped out, leaving the
device in the state it was in when you used it for the first time, or the way it was after the
last official or custom ROM installation.

/cache
 This is the partition where Android stores frequently accessed data and app components.
 Wiping the cache doesn't affect your personal data but simply gets rid of the existing data
there, which gets automatically rebuilt as you continue using the device.
/misc
 This partition contains miscellaneous system settings in form of on/off switches.
 These settings may include CID (Carrier or Region ID), USB configuration and certain
hardware settings etc.
 This is an important partition and if it is corrupt or missing, several of the device's
features will not function normally.

/sdcard
 This is not a partition on the internal memory of the device but rather the SD card.
 In terms of usage, this is your storage space to use as you see fit, to store your media,
documents, ROMs etc. on it.
 Wiping it is perfectly safe as long as you first back up all the data you require from it to
your computer. Note, however, that several user-installed apps save their data and settings
on the SD card, so wiping this partition will make you lose that data.
 On devices with both an internal and an external SD card – devices like the Samsung
Galaxy S and several tablets – the /sdcard partition is always used to refer to the internal
SD card.
 For the external SD card – if present – an alternative partition is used, which differs from
device to device.
 In case of Samsung Galaxy S series devices, it is /sdcard/sd while in many other devices,
it is /sdcard2.
 Unlike /sdcard, no system or app data whatsoever is stored automatically on this external
SD card and everything present on it has been added there by the user.
 You can safely wipe it after backing up any data from it that you need to save.

/sd-ext
 This is not a standard Android partition, but has become popular in the custom ROM
scene.
 It is basically an additional partition on your SD card that acts as the /data partition when
used with certain ROMs that have special features called APP2SD+ or data2ext enabled.
 It is especially useful on devices with little internal memory allotted to the /data partition.
 Thus, users who want to install more programs than the internal memory allows can
make this partition and use it with a custom ROM that supports this feature, to get
additional storage for installing their apps.
 Wiping this partition is essentially the same as wiping the /data partition – you lose your
contacts, SMS, market apps and settings.
 Now that you know what you stand to lose when you flash a new binary, make sure to
back up your data before flashing a new binary on your Android device.

8.5 RAID Levels


In 1987, Patterson, Gibson and Katz at the University of California, Berkeley published a paper
entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)", which described the
various types of disk arrays, referred to by the acronym RAID.
The basic idea of RAID is to combine multiple small, inexpensive disk drives into an array of
disk drives which yields performance exceeding that of a Single Large Expensive Drive (SLED).
Additionally, this array of drives appears to the computer as a single logical storage unit or drive.

 Data are distributed across the array of disk drives.
 Redundant disk capacity is used to store parity information, which guarantees data
recoverability in case of a disk failure.
 RAID levels are defined according to schemes that provide redundancy at lower cost by
using striping and "parity" bits.
 The different levels offer different cost-performance trade-offs.

Mirroring provides reliability but is expensive; striping improves performance, but does not
improve reliability. Accordingly, there are a number of different schemes that combine the
principles of mirroring and striping in different ways, in order to balance reliability versus
performance versus cost. These are described by the different RAID levels, as follows (in the
diagrams commonly used to illustrate these levels, "C" indicates a copy, and "P" indicates parity,
i.e. checksum bits):

o Raid Level 0 - This level includes striping only, with no mirroring.


o Raid Level 1 - This level includes mirroring only, no striping.
o Raid Level 2 - This level stores error-correcting codes on additional disks,
allowing for any damaged data to be reconstructed by subtraction from the
remaining undamaged data. Note that this scheme requires only three extra disks
to protect 4 disks worth of data, as opposed to full mirroring. (The number of
disks required is a function of the error-correcting algorithms, and the means by
which the particular bad bit(s) is(are) identified.)
o Raid Level 3 - This level is similar to level 2, except that it takes advantage of the
fact that each disk is still doing its own error-detection, so that when an error
occurs, there is no question about which disk in the array has the bad data. As a
result a single parity bit is all that is needed to recover the lost data from an array
of disks. Level 3 also includes striping, which improves performance. The
downside with the parity approach is that every disk must take part in every disk
access, and the parity bits must be constantly calculated and checked, reducing
performance. Hardware-level parity calculations and NVRAM cache can help
with both of those issues. In practice level 3 is greatly preferred over level 2.
o Raid Level 4 - This level is similar to level 3, employing block-level striping
instead of bit-level striping. The benefits are that multiple blocks can be read
independently, and changes to a block only require writing two blocks ( data and
parity ) rather than involving all disks. Note that new disks can be added
seamlessly to the system provided they are initialized to all zeros, as this does not
affect the parity results.
o Raid Level 5 - This level is similar to level 4, except the parity blocks are
distributed over all disks, thereby more evenly balancing the load on the system.
For any given block on the disk(s), one of the disks will hold the parity
information for that block and the other N-1 disks will hold the data. Note that the
same disk cannot hold both data and parity for the same block, as both would be
lost in the event of a disk crash. (A small sketch of parity-based block
reconstruction follows this list.)
o Raid Level 6 - This level extends raid level 5 by storing multiple bits of error-
recovery codes (such as Reed-Solomon codes) for each bit position of data, rather
than a single parity bit. For example, with 2 bits of ECC stored for every 4 bits of
data, the array can recover from up to two simultaneous disk failures. Note that
this still involves only a 50% increase in storage needs, as opposed to 100% for
simple mirroring, which could only tolerate a single disk failure.
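To make the parity idea used by levels 3 to 5 concrete, here is a minimal sketch in plain Python (not any particular RAID implementation): parity is the bytewise XOR of the data blocks, and a single lost block can be rebuilt by XOR-ing the parity with the surviving blocks.

```python
# Minimal sketch of parity-based recovery as used by RAID levels 3-5.
# Parity is the bytewise XOR of the data blocks; XOR-ing the parity with the
# surviving blocks reconstructs the one block that was lost.

def xor_blocks(blocks):
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Three equal-sized data blocks striped across three disks, parity on a fourth.
data = [b"disk0 blk", b"disk1 blk", b"disk2 blk"]
parity = xor_blocks(data)

# Simulate losing disk 1, then rebuild its block from parity + survivors.
survivors = [data[0], data[2]]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[1]
print("recovered:", rebuilt)
```

A hardware RAID controller performs the same XOR computation per stripe; the principle is identical, only the implementation differs.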
