OPERATING SYSTEMS
SEMESTER – V
INFORMATION TECHNOLOGY
NILESH M. PATIL
CHAPTER 1
OVERVIEW OF OPERATING SYSTEM
1.1 Introduction
An operating system (OS) is an intermediary between users and computer hardware. It provides
an environment in which a user can execute programs conveniently and efficiently.
In technical terms, it is software that manages hardware. An operating system controls the
allocation of resources and services such as memory, processors, devices and information.
An operating system is a program that acts as an interface between the user and the computer
hardware and controls the execution of all kinds of programs.
1. Hardware provides the basic computing resources: CPU, memory, I/O devices.
2. The operating system controls and coordinates the use of the hardware among the various
application programs for the various users.
3. Application programs define the ways in which the system resources are used to solve
the computing problems of the users. These can be compilers, database systems, video
games, or business programs.
4. Users are the ones who interact with the OS. They can be people, machines, or other
computers.
Following are some important functions of an operating system.
Memory Management
Processor Management
Device Management
File Management
Security
Control over system performance
Job accounting
Error detecting aids
Coordination between other software and users
Memory Management
Memory management refers to the management of primary memory. The operating system
keeps track of primary memory (which parts are in use and by whom), decides which process
gets memory, when, and how much, allocates memory when a process requests it, and
de-allocates memory when the process terminates or no longer needs it.
Processor Management
In a multiprogramming environment, the OS decides which process gets the processor, when,
and for how much time. This function is called process scheduling. The operating system does
the following activities for processor management.
Keeps track of the processor and the status of each process. The program responsible for
this task is known as the traffic controller.
Allocates the processor (CPU) to a process.
De-allocates the processor when it is no longer required.
Device Management
The OS manages device communication via the respective device drivers. The operating
system does the following activities for device management.
Keeps track of all devices. The program responsible for this task is known as the I/O
controller.
Decides which process gets the device, when, and for how much time.
Allocates devices in an efficient way.
De-allocates devices.
File Management
A file system is normally organized into directories for easy navigation and usage. These
directories may contain files and other directories. The operating system does the following
activities for file management.
Keeps track of information, location, uses, status etc. These collective facilities are often
known as the file system.
Decides who gets the resources.
Allocates the resources.
De-allocates the resources.
Operating systems have existed since the very first computer generation and have kept evolving
over time. An operating system provides users and programs with the following services.
Program execution
Operating system handles many kinds of activities from user programs to system
programs like printer spooler, name servers, file server etc. Each of these activities is
encapsulated as a process.
A process includes the complete execution context (code to execute, data to manipulate,
registers, OS resources in use).
Following are the major activities of an operating system with respect to program
management.
Loads a program into memory.
Executes the program.
Handles program's execution.
Provides a mechanism for process synchronization.
Provides a mechanism for process communication.
Provides a mechanism for deadlock handling.
I/O Operation
The I/O subsystem comprises I/O devices and their corresponding driver software.
Drivers hide the peculiarities of specific hardware devices from the user, as the device
driver knows the peculiarities of the specific device.
The operating system manages the communication between the user and the device drivers.
Following are the major activities of an operating system with respect to I/O operation.
An I/O operation means a read or write operation on a file or a specific I/O device.
A program may require any I/O device while running.
The operating system provides access to the required I/O device when required.
Communication
In the case of distributed systems, which are collections of processors that do not share
memory, peripheral devices, or a clock, the operating system manages communications
between processes.
Multiple processes communicate with one another through communication lines in the
network.
The OS handles routing and connection strategies, and the problems of contention and
security.
Following are the major activities of an operating system with respect to communication.
Two processes often require data to be transferred between them.
Both processes can be on the same computer or on different computers connected
through a computer network.
Communication may be implemented by two methods either by Shared Memory or
by Message Passing.
Error handling
Error can occur anytime and anywhere.
Error may occur in CPU, in I/O devices or in the memory hardware.
Following are the major activities of an operating system with respect to error handling.
OS constantly remains aware of possible errors.
OS takes the appropriate action to ensure correct and consistent computing.
Resource Management
In a multi-user or multi-tasking environment, resources such as main memory, CPU
cycles and file storage must be allocated to each user or job.
Following are the major activities of an operating system with respect to resource
management.
The OS manages all kinds of resources using schedulers.
CPU scheduling algorithms are used for better utilization of the CPU.
Protection
Considering a computer system with multiple users and the concurrent execution of multiple
processes, the various processes must be protected from one another's activities.
Protection refers to a mechanism or a way to control the access of programs, processes, or
users to the resources defined by computer systems.
Following are the major activities of an operating system with respect to protection.
OS ensures that all access to system resources is controlled.
OS ensures that external I/O devices are protected from invalid access attempts.
OS provides authentication feature for each user by means of a password.
1. Batch processing
Batch processing is a technique in which the operating system collects programs and data
together in a batch before processing starts. The operating system does the following activities
related to batch processing.
The OS defines a job, which has a predefined sequence of commands, programs and data as a
single unit.
The OS keeps a number of jobs in memory and executes them without any manual
intervention.
Jobs are processed in the order of submission i.e. first come first served fashion.
When job completes its execution, its memory is released and the output for the job gets
copied into an output spool for later printing or processing.
Advantages
Batch processing shifts much of the operator's work to the computer.
Performance increases because a new job gets started as soon as the previous job finishes,
without any manual intervention.
Disadvantages
Programs are difficult to debug.
A job could enter an infinite loop.
Due to the lack of a protection scheme, one batch job can affect pending jobs.
2. Multitasking
Multitasking refers to the execution of multiple jobs by the CPU simultaneously by
switching between them. Switches occur so frequently that the users may interact with each
program while it is running. The operating system does the following activities related to
multitasking.
The user gives instructions to the operating system or to a program directly, and receives
an immediate response.
The operating system handles multitasking in the way that it can handle multiple
operations / execute multiple programs at a time.
Multitasking Operating Systems are also known as Time-sharing systems.
These Operating Systems were developed to provide interactive use of a computer system
at a reasonable cost.
A time-shared operating system uses concept of CPU scheduling and multiprogramming
to provide each user with a small portion of a time-shared CPU.
Each user has at least one separate program in memory.
A program that is loaded into memory and is executing is commonly referred to as a
process.
When a process executes, it typically executes for only a very short time before it either
finishes or needs to perform I/O.
Since interactive I/O typically runs at people speeds, it may take a long time to complete.
During this time a CPU can be utilized by another process.
Operating system allows the users to share the computer simultaneously. Since each
action or command in a time-shared system tends to be short, only a little CPU time is
needed for each user.
As the system switches CPU rapidly from one user/program to the next, each user is
given the impression that he/she has his/her own CPU, whereas actually one CPU is
being shared among many users.
Preemptive multitasking means that task switches can be initiated directly out of interrupt
handlers. With cooperative (non-preemptive) multitasking, a task switch is only performed
when a task calls the kernel, i.e., it behaves "cooperatively" and voluntarily gives the kernel a
chance to perform a task switch.
Example:
A receive interrupt handler for a serial port writes data to a mailbox. If a task is waiting at the
mailbox, it is immediately activated by the scheduler during preemptive scheduling. In
cooperative scheduling, however, the task is only brought into the state "Ready". A task
switch does not immediately take place; after the interrupt handler has completed, the task
having been interrupted continues to run. Such a "pending" task switch is performed by the
kernel at some later time, as soon as the active task calls the kernel.
RTKernel-32 supports both cooperative and preemptive scheduling.
3. Multiprogramming
When two or more programs reside in memory at the same time, sharing the processor is
referred to as multiprogramming. Multiprogramming assumes a single shared processor. It
increases CPU utilization by organizing jobs so that the CPU always has one to execute.
Following figure shows the memory layout for a multiprogramming system.
Advantages
High and efficient CPU utilization.
User feels that many programs are allotted CPU almost simultaneously.
Disadvantages
CPU scheduling is required.
To accommodate many jobs in memory, memory management is required.
4. Multiprocessing System
Multi-processing refers to the ability of a system to support more than one processor at
the same time.
Applications in a multi-processing system are broken into smaller routines that run
independently.
The operating system allocates these threads to the processors improving performance of
the system.
In symmetric multi-processing, a single OS instance controls two or more identical
processors connected to a single shared main memory. Most multi-processing PC
motherboards use symmetric multiprocessing.
Asymmetric multi-processing, on the other hand, designates system tasks to be
performed by some processors and applications by others. This is generally less
efficient than symmetric multiprocessing because, under certain conditions, a single
processor might be completely engaged while another is idle.
5. Interactivity
Interactivity means that a user is capable of interacting with the computer system. The
operating system does the following activities related to interactivity.
The OS provides the user an interface to interact with the system.
The OS manages input devices to take inputs from the user. For example, the keyboard.
The OS manages output devices to show outputs to the user. For example, the monitor.
The OS response time needs to be short, since the user submits a request and waits for the
result.
7. Distributed Environment
Distributed environment refers to multiple independent CPUs or processors in a computer
system. Operating system does the following activities related to distributed environment.
The OS distributes computation logic among several physical processors.
The processors do not share memory or a clock.
Instead, each processor has its own local memory.
OS manages the communications between the processors. They communicate with each
other through various communication lines.
8. Spooling
Spooling is an acronym for Simultaneous Peripheral Operations On-Line. Spooling refers to
putting the data of various I/O jobs in a buffer. This buffer is a special area in memory or on
hard disk which is accessible to I/O devices. The operating system does the following activities
related to spooling.
OS handles I/O device data spooling as devices have different data access rates.
OS maintains the spooling buffer which provides a waiting station where data can rest
while the slower device catches up.
The OS supports parallel computation through spooling, as the computer can perform
I/O in parallel with computation.
It becomes possible to have the computer read data from a tape, write data to disk, and
write out to a line printer while it is doing its computing task.
Advantages
The spooling operation uses a disk as a very large buffer.
Spooling is capable of overlapping I/O operation for one job with processor
operations for another job.
1. Process Management
i. execl()
execl() replaces the calling process image with a new program. It is defined by:
int execl(const char *path, const char *arg0, ..., const char *argn, 0);
arg0 ... argn are pointers to arguments for the command and 0 simply marks the end of
the (variable) list of arguments.
Example:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("Files in Directory are:\n");
    execl("/bin/ls", "ls", "-l", (char *)0);
    return 1;   /* reached only if execl fails */
}
ii. fork()
int fork() turns a single process into 2 identical processes, known as the parent and
the child. On success, fork() returns 0 to the child process and returns the process ID of
the child process to the parent process. On failure, fork() returns -1 to the parent process,
sets errno to indicate the error, and no child process is created.
NOTE: The child process will have its own unique PID.
The following program illustrates a simple use of fork, where two copies are made and
run together (multitasking)
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int return_value;
    printf("Forking process\n");
    return_value = fork();   /* 0 in the child, child's PID in the parent */
    printf("The process id is %d and return value is %d\n", getpid(),
           return_value);
    return 0;
}
Sample output (process IDs will vary):
Forking process
The process id is 6753 and return value is 6754
The process id is 6754 and return value is 0
iii. wait()
int wait(int *status_location) -- forces a parent process to wait for a child process to
stop or terminate. wait() returns the PID of the child, or -1 on error. The exit status of the
child is stored at status_location.
iv. exit()
void exit(int status) -- terminates the process which calls this function and returns the
exit status value. Both UNIX and C (forked) programs can read the status value.
By convention, a status of 0 means normal termination, any other value indicates an error or
unusual occurrence.
2. File management.
i. open():
The open system call can be used to open an existing file or to create a new file if it does not
already exist. The syntax of open has the following form:
int open(const char *path, int flags);
If the file cannot be opened or created, it returns -1. The first parameter, path, specifies the
file name to be opened or created. The second parameter, flags, specifies how the file may be
used.
ii. read():
The system call for reading from a file is read. Its syntax is:
int read(int fd, char *buf, int nbytes);
The first parameter, fd, is the file descriptor of the file you want to read from; it is normally
returned by open. The second parameter, buf, is a pointer to the memory location where the
input data should be stored. The last parameter, nbytes, specifies the maximum number of
bytes you want to read. The system call returns the number of bytes it actually read, and
normally this number is smaller than or equal to nbytes.
iii. write():
It writes nbytes of data to the file referenced by file descriptor fd from the buffer pointed
by buf. The write starts at the position pointed by the offset of the file. Upon returning
from write, the offset is advanced by the number of bytes which were successfully written.
The function returns the number of bytes that were actually written, or it returns the value -1
if failed.
iv. close():
int close(int fd) closes the file referenced by the file descriptor fd and releases the descriptor
for reuse. It returns 0 on success and -1 on failure.
3. Directory Handling
A directory can be read as a file by anyone who has read permission for it. Writing a
directory as a file can only be done by the kernel. The structure of the directory appears to
the user as a succession of structures named directory entries. A directory entry contains,
among other information, the name of the file and its inode number. The directory entries
can be read one after the other using the following functions:
#include <sys/types.h>
#include <dirent.h>
DIR* opendir(const char* pathname);
struct dirent* readdir(DIR* dp);
void rewinddir(DIR* dp);
int closedir(DIR* dp);
The opendir() function opens a directory. It returns a valid pointer if the opening was
successful and NULL otherwise.
The readdir() function, at every call, reads another directory entry from the current directory.
The first readdir will read the first directory entry; the second call will read the next entry
and so on. In case of a successful reading the function will return a valid pointer to a
structure of type dirent and NULL otherwise (in case it reached the end of the directory, for
example).
The rewinddir() function repositions the file pointer to the first directory entry (the beginning
of the directory).
The closedir() function closes a previously opened directory. In case of an error it returns the
value -1.
4. Device Management.
o request device, release device
o read, write, reposition
o get/set device attributes
o logically attach or detach devices
o execute device specific operation
5. Information Maintenance.
o get/set time or date
o get/set system data
o get/set process, file, or device attributes
6. Communication.
o create, delete communication connection
o send, receive messages
o transfer status information
o attach or detach remote devices
The Kernel
The kernel is the heart and brain of the operating system. The kernel is a layer of software that
sits between the user of a computer and its hardware, and is responsible for efficiently managing
the system's resources. It also schedules the work being done on the system so that each task gets
its fair share of system resources.
Monolithic kernel: The older approach is the monolithic kernel, of which Unix, MS-DOS and
the early Mac OS are typical examples. It runs every basic system service, like process and
memory management, interrupt handling, I/O communication, the file system, etc., in kernel
space (see figure below). It is constructed in a layered fashion, built up from the fundamental
process management up to the interfaces to the rest of the operating system (libraries and, on
top of them, the applications). The inclusion of all basic services in kernel space has three big
drawbacks:
Increased kernel size.
Lack of extensibility.
Bad maintainability.
Bug-fixing or the addition of new features means a recompilation of the whole kernel. This is
time and resource consuming because the compilation of a new kernel can take several hours and
a lot of memory. Every time someone adds a new feature or fixes a bug, it means recompilation
of the whole kernel.
Microkernel: The concept (figure below) was to reduce the kernel to basic process
communication and I/O control, and let the other system services reside in user space in form of
normal processes (as so called servers). There is a server for managing memory issues, one
server does process management, another one manages drivers, and so on. Because the servers
do not run in kernel space anymore, so-called "context switches" are needed to allow user
processes to enter privileged mode (and to exit again). That way, the μ-kernel is not a block of
system services anymore, but represents just several basic abstractions and primitives to control
the communication between the processes and between a process and the underlying hardware.
Because communication is not done in a direct way anymore, a message system is introduced,
which allows independent communication and favours extensibility.
Comparison of microkernel and monolithic kernel:
Basic: In a microkernel, user services and kernel services are kept in separate address
spaces. In a monolithic kernel, both user services and kernel services are kept in the same
address space.
Size: Microkernels are smaller in size. A monolithic kernel is larger than a microkernel.
Execution: A microkernel has slow execution; a monolithic kernel has fast execution.
Extensibility: The microkernel is easily extensible; the monolithic kernel is hard to extend.
Security: If a service crashes, it does not affect the working of the microkernel. If a service
crashes in a monolithic kernel, the whole system crashes.
Code: To write a microkernel, more code is required. To write a monolithic kernel, less
code is required.
Examples: Microkernel: QNX, Symbian, L4Linux, Singularity, K42, Mac OS X, Integrity,
PikeOS, HURD, Minix, and Coyotos. Monolithic kernel: Linux, BSDs (FreeBSD, OpenBSD,
NetBSD), Microsoft Windows (95, 98, Me), Solaris, OS-9, AIX, HP-UX, DOS, OpenVMS,
XTS-400, etc.
The Shell
As you can see from the diagram above, the shell is not part of the kernel, but it does
communicate directly with the kernel. It is the "shell around the kernel."
The shell is a command line interpreter that executes the commands you type in. It translates
your commands from a human-readable format into a format that can be understood by the
computer. In addition to carrying out your commands, the shell also allows you to manage your
working environment and is an effective programming language.
Since the shell is a program, just like a word processor or spreadsheet application, different
shells can be used on a single system. This allows users to work with the shell they like the best,
and can also make the computer system appear different to users using different shells because
each shell has its own way of doing things.
2.1 Process
A process is a program in execution.
The execution of a process must progress in a sequential fashion.
A process is defined as an entity which represents the basic unit of work to be
implemented in the system.
1. New – The process is being created.
2. Ready – The process is waiting to be assigned to a processor. Ready processes are waiting
to have the processor allocated to them by the operating system so that they can run.
3. Running – Process instructions are being executed (i.e. the process that is currently being
executed).
4. Waiting – The process is waiting for some event to occur (such as the completion of an
I/O operation).
5. Terminated – The process has finished execution.
Processes entering the system must initially go into the ready state.
A process can only enter the running state from the ready state.
A process can normally only leave the system from the running state, although a process
in the ready or blocked state may be aborted by the system (in the event of an error, for
example), or by the user.
Although the model shown above is sufficient to describe the behavior of processes
generally, the model must be extended to allow for other possibilities, such as the
suspension and resumption of a process.
For example, the process may be swapped out of working memory by the operating
system's memory manager in order to free up memory for another process.
When a process is suspended, it essentially becomes dormant until resumed by the system
(or by a user).
Because a process can be suspended while it is either ready or blocked, it may also exist
in one of two further states - ready suspended and blocked suspended (a running process
may also be suspended, in which case it becomes ready suspended).
The queue of ready processes is maintained in priority order, so the next process to
execute will be the one at the head of the ready queue.
The queue of blocked processes is typically unordered, since there is no sure way to tell
which of these processes will become unblocked first (although if several processes are
blocked awaiting the same event, they may be prioritized within that context).
To prevent one process from monopolizing the processor, a system timer is started each
time a new process starts executing.
The process will be allowed to run for a set period of time, after which the timer
generates an interrupt that causes the operating system to regain control of the processor.
The operating system sends the previously running process to the end of the ready queue,
changing its status from running to ready, and assigns the first process in the ready queue
to the processor, changing its status from ready to running.
1. Pointer – Points to another process control block. The pointer is used for maintaining the
scheduling list.
2. Process State – The process state may be new, ready, running, waiting and so on.
3. Process Number – Indicates the ID of the executing process.
4. Program Counter – Indicates the address of the next instruction to be executed for this
process.
5. CPU Registers – Include general purpose registers, stack pointers, index registers,
accumulators, etc. The number and type of registers depend entirely on the computer
architecture.
7. Accounting Information – Includes the amount of CPU and real time used, time limits,
job or process numbers, account numbers, etc.
Process control block includes CPU scheduling, I/O resource management, file
management information etc.
The PCB serves as the repository for any information which can vary from process to
process.
Loader/linker sets flags and registers when a process is created.
If that process gets suspended, the contents of the registers are saved on a stack and the
pointer to the particular stack frame is stored in the PCB.
By this technique, the hardware state can be restored so that the process can be scheduled
to run again.
Types of Thread
Threads are implemented in the following two ways:
User Level Threads -- user-managed threads.
Kernel Level Threads -- operating-system-managed threads acting on the kernel, an
operating system core.
User Level Threads
In this case, the kernel is not aware of the existence of threads; thread creation, scheduling
and management are done in user space by a thread library.
ADVANTAGES
Thread switching does not require Kernel mode privileges.
User level thread can run on any operating system.
Scheduling can be application specific in the user level thread.
User level threads are fast to create and manage.
DISADVANTAGES
In a typical operating system, most system calls are blocking.
Multithreaded application cannot take advantage of multiprocessing.
Kernel Level Threads
In this case, thread management is done by the kernel. There is no thread management code in
the application area. Kernel threads are supported directly by the operating system. Any
application can be programmed to be multithreaded. All of the threads within an application
are supported within a single process.
The kernel maintains context information for the process as a whole and for individual
threads within the process. Scheduling by the kernel is done on a thread basis. The kernel
performs thread creation, scheduling and management in kernel space. Kernel threads are
generally slower to create and manage than user threads.
ADVANTAGES
The kernel can simultaneously schedule multiple threads from the same process on multiple
processors.
If one thread in a process is blocked, the Kernel can schedule another thread of the same
process.
Kernel routines themselves can be multithreaded.
DISADVANTAGES
Kernel threads are generally slower to create and manage than the user threads.
Transfer of control from one thread to another within same process requires a mode
switch to the Kernel.
Difference between User Level & Kernel Level Thread
User Level Threads vs. Kernel Level Threads:
1. User level threads are faster to create and manage; kernel level threads are slower to
create and manage.
2. User level threads are implemented by a thread library at the user level; kernel threads
are created and supported by the operating system.
3. User level threads are generic and can run on any operating system; kernel level threads
are specific to the operating system.
4. A multi-threaded application built on user level threads cannot take advantage of
multiprocessing; kernel routines themselves can be multithreaded.
Multithreading Models
Some operating systems provide a combined user level thread and kernel level thread facility.
Solaris is a good example of this combined approach. In a combined system, multiple threads
within the same application can run in parallel on multiple processors, and a blocking system
call need not block the entire process. There are three multithreading models:
Many to many relationship.
Many to one relationship.
One to one relationship.
Types of Schedulers
Schedulers are special system software which handle process scheduling in various ways. Their
main task is to select the jobs to be submitted into the system and to decide which process to
run. There are three types of schedulers:
Long Term Scheduler
Short Term Scheduler
Medium Term Scheduler
Long Term Scheduler
It is also called job scheduler.
Long term scheduler determines which programs are admitted to the system for
processing.
Job scheduler selects processes from the queue and loads them into memory for
execution.
Process loads into the memory for CPU scheduling.
The primary objective of the job scheduler is to provide a balanced mix of jobs, such as
I/O bound and processor bound.
It also controls the degree of multiprogramming.
If the degree of multiprogramming is stable, then the average rate of process creation
must be equal to the average departure rate of processes leaving the system.
On some systems, the long term scheduler may not be available or minimal.
Time-sharing operating systems have no long term scheduler.
The long term scheduler is used when a process changes state from new to ready.
Comparison of the three schedulers:
3. Degree of multiprogramming: the long term scheduler controls it; the short term scheduler
provides lesser control over it; the medium term scheduler reduces it.
4. In time-sharing systems: the long term scheduler is almost absent or minimal; the short
term scheduler is also minimal; the medium term scheduler is a part of time-sharing systems.
Some hardware systems employ two or more sets of processor registers to reduce the amount of
context switching time. When the process is switched, the following information is stored.
Program Counter
Scheduling Information
Base and limit register value
Currently used register
Changed State
I/O State
Accounting
CPU Utilization: Keep the CPU as busy as possible. It ranges from 0 to 100%. In
practice, it ranges from 40 to 90%.
Throughput: Throughput is the rate at which processes are completed per unit of
time.
Turnaround time: The total time taken to execute a particular process. It is
calculated as the time gap between the submission of a process and its completion.
Waiting time: Waiting time is the sum of the time periods spent in waiting in the
ready queue.
Response time: Response time is the time it takes to start responding from
submission time. It is calculated as the amount of time it takes from when a request
was submitted until the first response is produced.
Fairness: Each process should have a fair share of CPU.
Non-preemptive Scheduling:
In non-preemptive mode, once a process enters the running state, it continues to execute until
it terminates or blocks itself to wait for input/output or to request some operating system
service.
Preemptive Scheduling:
In preemptive mode, currently running process may be interrupted and moved to the ready state
by the operating system.
When a new process arrives or when an interrupt occurs, preemptive policies may incur greater
overhead than non-preemptive ones, but the preemptive version may provide better service.
It is desirable to maximize CPU utilization and throughput, and to minimize turnaround time,
waiting time and response time.
Scheduling Algorithms
Scheduling algorithms or scheduling policies are mainly used for short-term scheduling. The
main objective of short-term scheduling is to allocate processor time in such a way as to optimize
one or more aspects of system behavior.
These scheduling algorithms assume that only a single processor is present. Scheduling
algorithms decide which of the processes in the ready queue is to be allocated the CPU, based
on the type of scheduling policy and on whether that policy is preemptive or non-preemptive.
For scheduling, arrival time and service time also play a role.
To describe the various scheduling policies, we will use the process information given below:
Process Arrival Time Burst Time Priority
P1 0 4 2
P2 3 6 1
P3 5 3 3
P4 8 2 1
1. First Come First Serve Scheduling (FCFS)
In FCFS scheduling, the process that requests the CPU first is allocated the CPU first; the
ready queue is managed as a simple FIFO queue.
Advantages
Better for long processes
Simple method (i.e., minimum overhead on processor)
No starvation
Disadvantages
Convoy effect occurs. Even very small process should wait for its turn to come to utilize
the CPU. Short process behind long process results in lower CPU utilization.
Throughput is not emphasized.
2. Shortest Job First Scheduling (SJF)
This algorithm associates with each process the length of the next CPU burst.
Shortest-job-first scheduling is also called as shortest process next (SPN).
The process with the shortest expected processing time is selected for execution, among
the available processes in the ready queue.
Thus, a short process will jump to the head of the queue over long jobs.
If the next CPU bursts of two processes are the same then FCFS scheduling is used to
break the tie.
SJF scheduling algorithm is provably optimal.
It gives the minimum average waiting time for a given set of processes.
It cannot be implemented at the level of short term CPU scheduling.
There is no way of knowing the shortest CPU burst.
SJF can be pre-emptive or non-preemptive.
A pre-emptive SJF algorithm will preempt the currently executing process if the next
CPU burst of newly arrived process may be shorter than what is left to the currently
executing process.
A non-preemptive SJF algorithm will allow the currently running process to finish.
Preemptive SJF Scheduling is sometimes called Shortest Remaining Time First
algorithm.
Advantages
It gives good turnaround-time performance, because a short job is given preference
over longer jobs.
Throughput is high.
Disadvantages
Elapsed time (i.e., execution-completed-time) must be recorded, it results an additional
overhead on the processor.
Starvation may be possible for the longer processes.
3. Priority Scheduling
A priority is associated with each process, and the CPU is allocated to the process with
the highest priority among those in the ready queue.
Non-preemptive Priority Scheduling:
Once a process is running, it is not preempted; even if a higher priority process arrives,
the CPU is allocated to it only after the completion of the present running process.
Processes of equal priority are handled in FCFS order.
Advantage
Good response for the highest priority processes.
Disadvantage
Starvation may be possible for the lowest priority processes.
Preemptive Priority Scheduling:
In this type of scheduling, the CPU is allocated to the process with the highest priority
immediately upon the arrival of that process, preempting the currently running lower
priority process.
Advantage
Very good response for the highest priority process over non-preemptive version of it.
Disadvantage
Starvation may be possible for the lowest priority processes.
4. Round-Robin Scheduling
This type of scheduling algorithm is basically designed for time-sharing systems.
It is similar to FCFS with preemption added.
Round-Robin Scheduling is also called time-slicing scheduling. It is a preemptive
version of FCFS based on a clock: a clock interrupt is generated at periodic intervals,
usually 10-100 ms.
When the interrupt occurs, the currently running process is placed in the ready queue and
the next ready job is selected on a First-come, First-serve basis.
This process is known as time-slicing, because each process is given a slice of time
before being preempted.
In round-robin scheduling, the principal design issue is the length of the time quantum or time-
slice to be used. If the quantum is very short, then short processes will move quickly.
Advantages
Round-robin is effective in a general-purpose, time-sharing system or transaction-
processing system.
Fair treatment for all the processes.
Overhead on processor is low.
Good response time for short processes.
Disadvantages
Care must be taken in choosing quantum value.
Processing overhead is there in handling clock interrupt.
Throughput is low if time quantum is too small.
UNIVERSITY QUESTION
Use the following scheduling algorithms to calculate the average turnaround time (ATAT) and
average waiting time (AWT) for the following processes.
i) FCFS ii) Pre-emptive and non-pre-emptive SJF iii) Pre-emptive Priority
Process Arrival Time Burst Time Priority
P1 0 8 3
P2 1 1 1
P3 2 3 2
P4 3 2 3
P5 4 6 4
Solution:
i) FCFS
Gantt chart:
| P1 | P2 | P3 | P4 | P5 |
0    8    9    12   14   20
Turnaround time = completion time - arrival time: P1 = 8, P2 = 8, P3 = 10, P4 = 11, P5 = 16,
so ATAT = 53/5 = 10.6.
Waiting time = turnaround time - burst time: P1 = 0, P2 = 7, P3 = 7, P4 = 9, P5 = 10,
so AWT = 33/5 = 6.6.
3.1 Introduction
In chapter 2, we have studied the concept of processes. In addition to process scheduling,
another important responsibility of the operating system is process synchronization.
Synchronization involves the orderly sharing of system resources by processes.
The operating system supports concurrent execution of a program without necessarily supporting
elaborate form of memory and file management. This form of operation is also known as
multitasking. One of the benefits of multitasking is that several processes can be made to
cooperate in order to achieve their goals. To do this, they must do one of the following:
Share Data: A segment of memory must be available to both the processes. (Most memory is
locked to a single process).
Waiting: Some processes wait for other processes to give a signal before continuing. This is an
issue of synchronization.
Synchronization is often necessary when processes communicate. Processes execute at
unpredictable speeds, yet to communicate, one process must perform some action, such as setting
the value of a variable or sending a message, that the other detects. This works only if the event
of performing the action and the event of detecting it are constrained to happen in that order.
Thus one can view synchronization as a set of constraints on the ordering of events. The
programmer employs a synchronization mechanism to delay execution of a process in order to
satisfy such constraints.
In this chapter, we will study the concept of interprocess communication and synchronization,
need of semaphores, classical problems in concurrent processing, critical regions, monitors and
message passing.
IPC allows the process to communicate and to synchronize their actions without sharing the
same address space. This concept can be illustrated with the example of a shared printer as given
below:
Consider a machine with a single printer running a time-sharing operating system. If a process
needs to print its results, it must request that the operating system give it access to the printer's
device driver. At this point, the operating system must decide whether to grant this request,
depending upon whether the printer is already being used by another process. If it is not, the
operating system should grant the request and allow the process to continue; otherwise, the
operating system should deny the request and perhaps classify the process as a waiting process
until the printer becomes available. Indeed, if two processes were given simultaneous access to the
machine's printer, the results would be worthless to both.
Consider the following related definitions to understand the example in a better way:
Critical Resource: It is a resource shared with constraints on its use (e.g., memory, files,
printers, etc).
Mutual Exclusion: At most one process may be executing a critical section with respect to a
particular critical resource simultaneously.
In the example given above, the printer is the critical resource. Let's suppose that the processes
which are sharing this resource are called process A and process B. The critical sections of
process A and process B are the sections of the code which issue the print command. In order to
ensure that both processes do not attempt to use the printer at the same time, they must be
granted mutually exclusive access to the printer driver.
First we consider the interprocess communication part. There exist two complementary inter-
process communication types: a) shared-memory system and b) message-passing system. It is
clear that these two schemes are not mutually exclusive, and could be used simultaneously
within a single operating system.
A critical problem occurring in shared-memory system is that two or more processes are reading
or writing some shared variables or shared data, and the final results depend on who runs
precisely and when. Such situations are called race conditions. In order to avoid race conditions
we must find some way to prevent more than one process from reading and writing shared
variables or shared data at the same time, i.e., we need the concept of mutual exclusion (which
we will discuss in the later section). It must be sure that if one process is using a shared variable,
the other process will be excluded from doing the same thing.
Message passing systems allow communication processes to exchange messages. In this scheme,
the responsibility rests with the operating system itself.
The function of a message-passing system is to allow processes to communicate with each other
without the need to resort to shared variable. An interprocess communication facility basically
provides two operations: send (message) and receive (message). In order to send and to receive
messages, a communication link must exist between two involved processes. This link can be
implemented in different ways. The possible basic implementation questions are:
• What is the capacity of a link? That is, does the link have some buffer space? If so, how much?
• What is the size of the message? Can the link accommodate variable size or fixed-size
message?
In the following we consider several methods for logically implementing a communication link
and the send/receive operations. These methods can be classified into two categories: a) Naming,
consisting of direct and indirect communication and b) Buffering, consisting of capacity and
message properties.
Direct Communication
In direct communication, each process that wants to send or receive a message must explicitly
name the recipient or sender of the communication. In this case, the send and receive primitives
are defined as follows:
• send (P, message): To send a message to process P.
• receive (Q, message): To receive a message from process Q.
This scheme shows the symmetry in addressing, i.e., both the sender and the receiver have to
name one another in order to communicate. In contrast to this, asymmetry in addressing can be
used, i.e., only the sender has to name the recipient; the recipient is not required to name the
sender. So the send and receive primitives can be defined as follows:
• send (P, message): To send a message to process P.
• receive (id, message): To receive a message from any process; id is set to the name of the
process with whom the communication has taken place.
Indirect Communication
With indirect communication, the messages are sent to, and received from a mailbox. A mailbox
can be abstractly viewed as an object into which messages may be placed and from which
messages may be removed by processes. In order to distinguish one from the other, each mailbox
owns a unique identification. A process may communicate with some other process by a number
of different mailboxes. The send and receive primitives are defined as follows:
• send (A, message): To send a message to mailbox A.
• receive (A, message): To receive a message from mailbox A.
Mailboxes may be owned either by a process or by the system. If the mailbox is owned by a
process, then we distinguish between the owner who can only receive from this mailbox and user
who can only send message to the mailbox. When a process that owns a mailbox terminates, its
mailbox disappears. Any process that sends a message to this mailbox must be notified in the
form of an exception that the mailbox no longer exists.
If the mailbox is owned by the operating system, then it has an existence of its own, i.e., it is
independent and not attached to any particular process. The operating system provides a
mechanism that allows a process to: a) create a new mailbox, b) send and receive message
through the mailbox and c) destroy a mailbox. Since all processes with access rights to a mailbox
may terminate, a mailbox may no longer be accessible by any process after some time. In this
case, the operating system should reclaim whatever space was used for the mailbox.
Capacity Link
A link has some capacity that determines the number of messages that can temporarily reside in
it. This property can be viewed as a queue of messages attached to the link. Basically there are
three ways through which such a queue can be implemented:
Zero capacity: This link has a message queue length of zero, i.e., no message can wait in it. The
sender must wait until the recipient receives the message. The two processes must be
synchronized for a message transfer to take place. The zero-capacity link is referred to as a
message-passing system without buffering.
Bounded capacity: This link has a limited message queue length of n, i.e., at most n messages
can reside in it. If a new message is sent, and the queue is not full, it is placed in the queue either
by copying the message or by keeping a pointer to the message and the sender should continue
execution without waiting. Otherwise, the sender must be delayed until space is available in the
queue.
Unbounded capacity: This queue has potentially infinite length, i.e., any number of messages
can wait in it. That is why the sender is never delayed.
Bounded and unbounded capacity link provide message-passing system with automatic
buffering.
Messages
Messages sent by a process may be one of three varieties: a) fixed-sized, b) variable-sized and c)
typed messages. If only fixed-sized messages can be sent, the physical implementation is
straightforward. However, this makes the task of programming more difficult. On the other hand,
variable-size messages require more complex physical implementation, but the programming
becomes simpler. Typed messages, i.e., associating a type with each mailbox, are applicable only
to indirect communication. The messages that can be sent to, and received from a mailbox are
restricted to the designated type.
In some operating systems, processes that are working together may share some common storage
that each one can read and write. The shared storage may be in main memory (possibly in a
kernel data structure) or it may be a shared file: the location of the shared memory does not
change the nature of the communication or the problems that arise. To see how interprocess
communication works in practice, let us consider a simple but common example: a print spooler.
When a process wants to print a file, it enters the file name in a special spooler directory.
Another process, the printer daemon, periodically checks to see if there are any files to be
printed, and if there are, it prints them and then removes their names from the directory.
Imagine that our spooler directory has a very large number of slots, numbered 0, 1, 2, …, each
one capable of holding a file name. Also imagine that there are two shared variables, out, which
points to the next file to be printed, and in, which points to the next free slot in the directory.
These two variables might well be kept on a two-word file available to all processes. At a certain
instant, slots 0 to 3 are empty (the files have already been printed) and slots 4 to 6 are full (with
the names of files queued for printing). More or less simultaneously, processes A and B decide
they want to queue a file for printing. This situation is shown in Fig. 3.1.
Figure 3.1: Two processes want to access shared memory at same time
In jurisdictions where Murphy's law is applicable, the following might happen. Process A reads
in and stores the value, 7, in a local variable called next_free_slot. Just then a clock interrupt
occurs and the CPU decides that process A has run long enough, so it switches to process B,
Process B also reads in, and also gets a 7. It too stores it in its local variable next_free_slot. At
this instant both processes think that the next available slot is 7.
Process B now continues to run. It stores the name of its file in slot 7 and updates in to be an 8.
Then it goes off and does other things.
Eventually, process A runs again, starting from the place it left off. It looks at next_free_slot,
finds a 7 there, and writes its file name in slot 7, erasing the name that process B just put there.
Then it computes next_free_slot + 1, which is 8, and sets in to 8. The spooler directory is now
internally consistent, so the printer daemon will not notice anything wrong, but process B will
never receive any output. User B will hang around the printer room for years, wistfully hoping
for output that never comes. Situations like this, where two or more processes are reading or
writing some shared data and the final result depends on who runs precisely when, are called
race conditions.
How do we avoid race conditions? The key to preventing trouble here and in many other
situations involving shared memory, shared files, and shared everything else is to find some way
to prohibit more than one process from reading and writing the shared data at the same time. Put
in other words, what we need is mutual exclusion, that is, some way of making sure that if one
process is using a shared variable or file, the other processes will be excluded from doing the
same thing. The difficulty above occurred because process B started using one of the shared
variables before process A was finished with it. The choice of appropriate primitive operations
for achieving mutual exclusion is a major design issue in any operating system, and a subject that
we will examine in great detail in the following sections.
The problem of avoiding race conditions can also be formulated in an abstract way. Part of the
time, a process is busy doing internal computations and other things that do not lead to race
conditions. However, sometimes a process has to access shared memory or files, or doing other
critical things that can lead to races. That part of the program where the shared memory is
accessed is called the critical region or critical section. If we could arrange matters such that no
two processes were ever in their critical regions at the same time, we could avoid races.
Although this requirement avoids race conditions, this is not sufficient for having parallel
processes cooperate correctly and efficiently using shared data. We need four conditions to hold
to have a good solution:
1. No two processes may be simultaneously inside their critical regions.
2. No assumptions may be made about speeds or the number of CPUs.
3. No process running outside its critical region may block other processes.
4. No process should have to wait forever to enter its critical region.
In an abstract sense, the behavior that we want is shown in Fig. 3.2. Here process A enters its
critical region at time T1, A little later, at time T2 process B attempts to enter its critical region
but fails because another process is already in its critical region and we allow only one at a time.
Consequently, B is temporarily suspended until time T3 when A leaves its critical region,
allowing B to enter immediately. Eventually B leaves (at T4) and we are back to the original
situation with no processes in their critical regions.
In this section we will examine various proposals for achieving mutual exclusion, so that while
one process is busy updating shared memory in its critical region, no other process will enter its
critical region and cause trouble.
Disabling Interrupts
The simplest solution is to have each process disable all interrupts just after entering its
critical region and re-enable them just before leaving it.
With interrupts disabled, no clock interrupts can occur.
The CPU is only switched from process to process as a result of clock or other interrupts,
after all, and with interrupts turned off the CPU will not be switched to another process.
Thus, once a process has disabled interrupts, it can examine and update the shared
memory without fear that any other process will intervene.
This approach is generally unattractive because it is unwise to give user processes the
power to turn off interrupts.
Suppose that one of them did it and never turned them on again? That could be the end of
the system.
Furthermore if the system is a multiprocessor, with two or more CPUs, disabling
interrupts affects only the CPU that executed the disable instruction. The other ones will
continue running and can access the shared memory.
On the other hand, it is frequently convenient for the kernel itself to disable interrupts for
a few instructions while it is updating variables or lists.
If an interrupt occurred while the list of ready processes, for example, was in an
inconsistent state, race conditions could occur.
The conclusion is: disabling interrupts is often a useful technique within the operating
system itself but is not appropriate as a general mutual exclusion mechanism for user
processes.
Lock Variables
Consider having a single, shared (lock) variable, initially 0. When a process wants to
enter its critical region, it first tests the lock.
If the lock is 0, the process sets it to 1 and enters the critical region. If the lock is already
1, the process just waits until it becomes 0.
Thus, a 0 means that no process is in its critical region, and a 1 means that some process
is in its critical region.
Unfortunately, this idea contains exactly the same fatal flaw that we saw in the spooler
directory.
Suppose that one process reads the lock and sees that it is 0. Before it can set the lock to
1, another process is scheduled, runs, and sets the lock to 1.
When the first process runs again, it will also set the lock to 1, and two processes will be
in their critical regions at the same time.
Strict Alternation
A third approach to the mutual exclusion problem is shown in Fig.3.3. This program fragment is
written in C.
In Fig. 3.3, the integer variable turn, initially 0, keeps track of whose turn it is to enter
the critical region and examine or update the shared memory.
Initially, process 0 inspects turn, finds it to be 0, and enters its critical region.
Process 1 also finds it to be 0 and therefore sits in a tight loop continually testing turn to
see when it becomes 1.
Continuously testing a variable until some value appears is called busy waiting.
It should usually be avoided, since it wastes CPU time. Only when there is a reasonable
expectation that the wait will be short is busy waiting used.
A lock that uses busy waiting is called a spin lock.
By combining the idea of taking turns with the idea of lock variables and warning variables, a
Dutch mathematician, T. Dekker, was the first one to devise a software solution to the mutual
exclusion problem that does not require strict alternation. In 1981, G.L. Peterson discovered a
much simpler way to achieve mutual exclusion, thus rendering Dekker's solution obsolete.
Peterson's algorithm is shown in Fig. 3.4.
Many computers, especially those designed with multiple processors in mind, have a Test and
Set Lock (TSL) instruction that works as follows:
TSL RX,LOCK
It reads the contents of the memory word lock into register RX and then stores a nonzero
value at the memory address lock.
The operations of reading the word and storing into it are guaranteed to be indivisible—
no other processor can access the memory word until the instruction is finished.
The CPU executing the TSL instruction locks the memory bus to prohibit other CPUs
from accessing memory until it is done.
To use the TSL instruction, we will use a shared variable, lock, to coordinate access to
shared memory.
When lock is 0, any process may set it to 1 using the TSL instruction and then read or
write the shared memory.
When it is done, the process sets lock back to 0 using an ordinary move instruction.
How can this instruction be used to prevent two processes from simultaneously entering their
critical regions? The solution is given in Fig. 3.5, where a four-instruction subroutine in a
fictitious (but typical) assembly language is shown.
The first instruction copies the old value of lock to the register and then sets lock to 1.
Then the old value is compared with 0. If it is nonzero, the lock was already set, so the
program just goes back to the beginning and tests it again.
Sooner or later it will become 0 (when the process currently in its critical region is done
with its critical region), and the subroutine returns, with the lock set.
Clearing the lock is simple. The program just stores a 0 in lock. No special instructions
are needed.
enter_region:
TSL REGISTER,LOCK | copy lock to register and set lock to 1
CMP REGISTER,#0 | was lock zero?
JNE enter_region | if it was nonzero, lock was set, so loop
RET | return to caller; critical region entered
leave_region:
MOVE LOCK,#0 | store a 0 in lock
RET | return to caller
Figure 3.5. Entering and leaving a critical region using the TSL instruction.
Before entering its critical region, a process calls enter_region, which does busy waiting
until the lock is free; then it acquires the lock and returns.
After the critical region the process calls leave_region, which stores a 0 in lock.
As with all solutions based on critical regions, the processes must call enter_region and
leave_region at the correct times for the method to work.
If a process cheats, the mutual exclusion will fail.
Not only does this approach waste CPU time, but it can also have unexpected effects.
Consider a computer with two processes, H, with high priority and L, with low priority. The
scheduling rules are such that H is run whenever it is in ready state. At a certain moment, with L
in its critical region, H becomes ready to run (e.g., an I/O operation completes). H now begins
busy waiting, but since L is never scheduled while H is running, L never gets the chance to leave
its critical region, so H loops forever. This situation is sometimes referred to as the priority
inversion problem.
Now let us look at some interprocess communication primitives that block instead of wasting
CPU time when they are not allowed to enter their critical regions. One of the simplest is the pair
sleep and wakeup.
Sleep is a system call that causes the caller to block, that is, be suspended until another
process wakes it up. The wakeup call has one parameter, the process to be awakened.
Alternatively, both sleep and wakeup can each have one parameter, a memory address used to
match up sleeps with wakeups.
As an example of how these primitives can be used, let us consider the producer-consumer
problem (also known as the bounded-buffer problem).
Two processes share a common, fixed-size buffer. One of them, the producer, puts
information into the buffer, and the other one, the consumer, takes it out. (It is also
possible to generalize the problem to have m producers and n consumers, but we will
only consider the case of one producer and one consumer because this assumption
simplifies the solutions).
Trouble arises when the producer wants to put a new item in the buffer, but it is already
full. The solution is for the producer to go to sleep, to be awakened when the consumer
has removed one or more items.
Similarly, if the consumer wants to remove an item from the buffer and sees that the
buffer is empty, it goes to sleep until the producer puts something in the buffer and wakes
it up.
This approach sounds simple enough, but it leads to the same kinds of race conditions we
saw earlier with the spooler directory.
To keep track of the number of items in the buffer, we will need a variable, count. If the
maximum number of items the buffer can hold is N, the producer's code will first test to
see if count is N. If it is, the producer will go to sleep; if it is not, the producer will add an
item and increment count.
The consumer's code is similar: first test count to see if it is 0. If it is, go to sleep; if it is
nonzero, remove an item and decrement the counter. Each of the processes also tests to
see if the other should be awakened, and if so, wakes it up. The code for both producer
and consumer is shown in Fig. 3.6.
#define N 100            /* number of slots in the buffer */
int count = 0;           /* number of items in the buffer */

void producer(void)
{
    int item;
    while (TRUE) {                            /* repeat forever */
        item = produce_item();                /* generate next item */
        if (count == N) sleep();              /* if buffer is full, go to sleep */
        insert_item(item);                    /* put item in buffer */
        count = count + 1;                    /* increment count of items in buffer */
        if (count == 1) wakeup(consumer);     /* was buffer empty? */
    }
}

void consumer(void)
{
    int item;
    while (TRUE) {                            /* repeat forever */
        if (count == 0) sleep();              /* if buffer is empty, go to sleep */
        item = remove_item();                 /* take item out of buffer */
        count = count - 1;                    /* decrement count of items in buffer */
        if (count == N - 1) wakeup(producer); /* was buffer full? */
        consume_item(item);                   /* print item */
    }
}
Figure 3.6. The producer-consumer problem with a fatal race condition.
The procedures insert_item and remove_item, which are not shown, handle the
bookkeeping of putting items into the buffer and taking items out of the buffer.
3.9 Semaphores
This was the situation in 1965, when E. W. Dijkstra (1965) suggested using an integer variable to
count the number of wakeups saved for future use. In his proposal, a new variable type, called a
semaphore, was introduced. A semaphore could have the value 0, indicating that no wakeups
were saved, or some positive value if one or more wakeups were pending.
Dijkstra proposed having two operations, down and up (generalizations of sleep and wakeup,
respectively).
The down operation on a semaphore checks to see if the value is greater than 0. If so, it
decrements the value (i.e., uses up one stored wakeup) and just continues. If the value is
0, the process is put to sleep without completing the down for the moment.
The up operation increments the value of the semaphore addressed. If one or more
processes were sleeping on that semaphore, unable to complete an earlier down
operation, one of them is chosen by the system (e.g., at random) and is allowed to
complete its down.
Thus, after an up on a semaphore with processes sleeping on it, the semaphore will still
be 0, but there will be one fewer process sleeping on it.
The operation of incrementing the semaphore and waking up one process is also
indivisible.
No process ever blocks doing an up, just as no process ever blocks doing a wakeup in the
earlier model.
Figure 3.7. The producer-consumer problem using semaphores.
This solution uses three semaphores: one called full for counting the number of slots that
are full, one called empty for counting the number of slots that are empty, and one called
mutex to make sure the producer and consumer do not access the buffer at the same time.
Full is initially 0, empty is initially equal to the number of slots in the buffer, and mutex is
initially 1.
Semaphores that are initialized to 1 and used by two or more processes to ensure that
only one of them can be in its critical region at a given time are called binary
semaphores.
If each process does a down just before entering its critical region and an up just after
leaving it, mutual exclusion is guaranteed.
In the example of Fig. 3.7, we have actually used semaphores in two different ways. This
difference is important enough to make explicit.
The mutex semaphore is used for mutual exclusion. It is designed to guarantee that only
one process at a time will be reading or writing the buffer and the associated variables.
This mutual exclusion is required to prevent chaos.
The other use of semaphores is for synchronization. The full and empty semaphores are
needed to guarantee that certain event sequences do or do not occur. In this case, they
ensure that the producer stops running when the buffer is full, and the consumer stops
running when it is empty. This use is different from mutual exclusion.
3.11 Monitors
A monitor is a collection of procedures, variables, and data structures that are all grouped
together in a special kind of module or package. Processes may call the procedures in a monitor
whenever they want to, but they cannot directly access the monitor's internal data structures
from procedures declared outside the monitor. Figure 3.8 illustrates a monitor written in an
imaginary language, Pidgin Pascal.
monitor example
    integer i;
    condition c;

    procedure producer( );
    . . .
    end;

    procedure consumer( );
    . . .
    end;
end monitor;
Figure 3.8. A monitor
Monitors have an important property that makes them useful for achieving mutual
exclusion: only one process can be active in a monitor at any instant.
Monitors are a programming language construct, so the compiler knows they are special
and can handle calls to monitor procedures differently from other procedure calls.
Typically, when a process calls a monitor procedure, the first few instructions of the
procedure will check to see if any other process is currently active within the monitor.
If so, the calling process will be suspended until the other process has left the monitor. If
no other process is using the monitor, the calling process may enter.
It is up to the compiler to implement the mutual exclusion on monitor entries, but a
common way is to use a mutex or binary semaphore. Because the compiler, not the
programmer, is arranging for the mutual exclusion, it is much less likely that something
will go wrong.
In any event, the person writing the monitor does not have to be aware of how the
compiler arranges for mutual exclusion. It is sufficient to know that by turning all the
critical regions into monitor procedures, no two processes will ever execute their critical
regions at the same time.
Although monitors provide an easy way to achieve mutual exclusion, as we have seen
above, that is not enough. We also need a way for processes to block when they cannot
proceed. In the producer-consumer problem, it is easy enough to put all the tests for
buffer-full and buffer-empty in monitor procedures, but how should the producer block
when it finds the buffer full?
The solution lies in the introduction of condition variables, along with two operations on
them, wait and signal. When a monitor procedure discovers that it cannot continue (e.g.,
the producer finds the buffer full), it does a wait on some condition variable, say, full.
This action causes the calling process to block. It also allows another process that had
been previously prohibited from entering the monitor to enter now.
Condition variables are not counters. They do not accumulate signals for later use the
way semaphores do. Thus if a condition variable is signaled with no one waiting on it, the
signal is lost forever. In other words, the wait must come before the signal. This rule
makes the implementation much simpler. In practice it is not a problem because it is easy
to keep track of the state of each process with variables, if need be. A process that might
otherwise do a signal can see that this operation is not necessary by looking at the
variables.
A skeleton of the producer-consumer problem with monitors is given in Fig. 3.9 in an
imaginary language, Pidgin Pascal.
monitor ProducerConsumer
    condition full, empty;
    integer count;

    procedure insert(item: integer);
    begin
        if count = N then wait(full);
        insert_item(item);
        count := count + 1;
        if count = 1 then signal(empty)
    end;

    function remove: integer;
    begin
        if count = 0 then wait(empty);
        remove := remove_item;
        count := count − 1;
        if count = N − 1 then signal(full)
    end;

    count := 0;
end monitor;

procedure producer;
begin
    while true do
    begin
        item := produce_item;
        ProducerConsumer.insert(item)
    end
end;

procedure consumer;
begin
    while true do
    begin
        item := ProducerConsumer.remove;
        consume_item(item)
    end
end;
Figure 3.9. An outline of the producer-consumer problem with monitors. Only one monitor
procedure at a time is active. The buffer has N slots.
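Python has no monitor construct, but the monitor of Fig. 3.9 can be approximated: one lock models the one-process-at-a-time rule, and threading.Condition models the condition variables. Note the while loops around wait, needed because Python does not guarantee that the signalled condition still holds when the waiter resumes:

```python
import threading

class ProducerConsumer:
    """Monitor-style class: one lock makes at most one method body active;
    two condition variables play the roles of full and empty."""
    def __init__(self, n):
        self.n = n
        self.buffer = []
        self.lock = threading.Lock()                     # monitor entry lock
        self.not_full = threading.Condition(self.lock)   # wait here when buffer is full
        self.not_empty = threading.Condition(self.lock)  # wait here when buffer is empty

    def insert(self, item):
        with self.lock:                          # enter the monitor
            while len(self.buffer) == self.n:    # buffer full: block
                self.not_full.wait()
            self.buffer.append(item)
            self.not_empty.notify()              # signal a waiting consumer

    def remove(self):
        with self.lock:
            while not self.buffer:               # buffer empty: block
                self.not_empty.wait()
            item = self.buffer.pop(0)
            self.not_full.notify()               # signal a waiting producer
            return item

pc = ProducerConsumer(3)
out = []
c = threading.Thread(target=lambda: out.extend(pc.remove() for _ in range(10)))
p = threading.Thread(target=lambda: [pc.insert(i) for i in range(10)])
c.start(); p.start(); c.join(); p.join()
print(out)
```

Because wait atomically releases the lock and blocks, the lost-wakeup race of sleep and wakeup cannot occur here.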
You may be thinking that the operations wait and signal look similar to sleep and
wakeup, which we saw earlier had fatal race conditions.
They are very similar, but with one crucial difference: sleep and wakeup failed because
while one process was trying to go to sleep, the other one was trying to wake it up. With
monitors, that cannot happen.
The automatic mutual exclusion on monitor procedures guarantees that if, say, the
producer inside a monitor procedure discovers that the buffer is full, it will be able to
complete the wait operation without having to worry about the possibility that the
scheduler may switch to the consumer just before the wait completes.
The consumer will not even be let into the monitor at all until the wait is finished and the
producer has been marked as no longer runnable.
3.12 Message Passing
This method of interprocess communication uses two primitives, send and receive, which, like
semaphores and unlike monitors, are system calls rather than language constructs. As such, they
can easily be put into library procedures, such as
send(destination, &message);
and
receive(source, &message);
The former call sends a message to a given destination and the latter one receives a message
from a given source (or from ANY, if the receiver does not care). If no message is available, the
receiver can block until one arrives. Alternatively, it can return immediately with an error code.
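A minimal sketch of send and receive built on Python's queue module, assuming one mailbox per process (the process names and the non-blocking error return are illustrative):

```python
import queue
import threading

# Each process owns a mailbox; send puts into the destination's mailbox,
# receive blocks until a message arrives at the source's mailbox.
mailboxes = {"A": queue.Queue(), "B": queue.Queue()}

def send(destination, message):
    mailboxes[destination].put(message)

def receive(source, block=True, timeout=None):
    try:
        return mailboxes[source].get(block=block, timeout=timeout)
    except queue.Empty:
        return None   # analogous to returning immediately with an error code

def replier():
    msg = receive("B")            # blocks until a message arrives
    send("A", "got: " + msg)

t = threading.Thread(target=replier)
t.start()
send("B", "hello")
reply = receive("A")
t.join()
print(reply)  # got: hello
```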
Message passing systems have many challenging problems and design issues that do not arise
with semaphores or monitors, especially if the communicating processes are on different
machines connected by a network. For example, messages can be lost by the network. To guard
against lost messages, the sender and receiver can agree that as soon as a message has been
received, the receiver will send back a special acknowledgement message. If the sender has not
received the acknowledgement within a certain time interval, it retransmits the message.
Now consider what happens if the message itself is received correctly, but the acknowledgement
is lost. The sender will retransmit the message, so the receiver will get it twice. It is essential that
the receiver be able to distinguish a new message from the retransmission of an old one. Usually,
this problem is solved by putting consecutive sequence numbers in each original message. If the
receiver gets a message bearing the same sequence number as the previous message, it knows
that the message is a duplicate that can be ignored. Successfully communicating in the face of
unreliable message passing is a major part of the study of computer networks.
Message systems also have to deal with the question of how processes are named, so that the
process specified in a send or receive call is unambiguous. Authentication is also an issue in
message systems: how can the client tell that he is communicating with the real file server, and
not with an imposter?
At the other end of the spectrum, there are also design issues that are important when the sender
and receiver are on the same machine. One of these is performance. Copying messages from one
process to another is always slower than doing a semaphore operation or entering a monitor.
Much work has gone into making message passing efficient.
Now let us see how the producer-consumer problem can be solved with message passing and no
shared memory. A solution is given in Fig. 3.10. We assume that all messages are the same size
and that messages sent but not yet received are buffered automatically by the operating system.
In this solution, a total of N messages is used, analogous to the N slots in a shared memory
buffer. The consumer starts out by sending N empty messages to the producer. Whenever the
producer has an item to give to the consumer, it takes an empty message and sends back a full
one. In this way, the total number of messages in the system remains constant in time, so they
can be stored in a given amount of memory known in advance.
If the producer works faster than the consumer, all the messages will end up full, waiting for the
consumer: the producer will be blocked, waiting for an empty to come back. If the consumer
works faster, then the reverse happens: all the messages will be empties waiting for the producer
to fill them up: the consumer will be blocked, waiting for a full message.
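The N-message scheme can be sketched with two queues standing in for the two directions of message flow: empties travel to the producer, fulls travel to the consumer (N and the item count are illustrative):

```python
import queue
import threading

N = 4
to_producer = queue.Queue()   # carries empty messages back to the producer
to_consumer = queue.Queue()   # carries full messages to the consumer
ITEMS = 12
received = []

def producer():
    for item in range(ITEMS):
        to_producer.get()         # take an empty message (blocks if none)
        to_consumer.put(item)     # send it back full

def consumer():
    for _ in range(N):
        to_producer.put("empty")  # start out by sending N empty messages
    for _ in range(ITEMS):
        item = to_consumer.get()  # take a full message (blocks if none)
        received.append(item)
        to_producer.put("empty")  # return the now-empty message

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start(); p.start(); c.join(); p.join()
print(received)
```

At every instant exactly N messages are circulating, so the storage requirement is known in advance, as the text describes.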
One common addressing scheme is to introduce a mailbox: a place to buffer a certain number of
messages, typically specified when the mailbox is created.
For the producer-consumer problem, both the producer and consumer would create mailboxes
large enough to hold N messages. The producer would send messages containing data to the
consumer's mailbox, and the consumer would send empty messages to the producer's mailbox.
When mailboxes are used, the buffering mechanism is clear: the destination mailbox holds
messages that have been sent to the destination process but have not yet been accepted.
The other extreme from having mailboxes is to eliminate all buffering. When this approach is
followed, if the send is done before the receive, the sending process is blocked until the receive
happens, at which time the message can be copied directly from the sender to the receiver, with
no intermediate buffering. Similarly, if the receive is done first, the receiver is blocked until a
send happens. This strategy is often known as a rendezvous. It is easier to implement than a
buffered message scheme but is less flexible since the sender and receiver are forced to run in
lockstep.
Message passing is commonly used in parallel programming systems. One well-known message-
passing system, for example, is MPI (Message-Passing Interface). It is widely used for
scientific computing.
CHAPTER 4
DEADLOCKS
4.1 Introduction
Computer systems are full of resources that can only be used by one process at a time. Common
examples include printers, tape drives, and slots in the system's internal tables. Having two
processes simultaneously writing to the printer leads to gibberish. Having two processes using
the same file system table slot will invariably lead to a corrupted file system. Consequently, all
operating systems have the ability to (temporarily) grant a process exclusive access to certain
resources.
For many applications, a process needs exclusive access to not one resource, but several.
Suppose, for example, two processes each want to record a scanned document on a CD. Process
A requests permission to use the scanner and is granted it. Process B is programmed differently
and requests the CD recorder first and is also granted it. Now A asks for the CD recorder, but the
request is denied until B releases it. Unfortunately, instead of releasing the CD recorder B asks
for the scanner. At this point both processes are blocked and will remain so forever. This
situation is called a deadlock.
Deadlocks can occur in a variety of situations besides requesting dedicated I/O devices. In a
database system, for example, a program may have to lock several records it is using, to avoid
race conditions. If process A locks record R1 and process B locks record R2, and then each
process tries to lock the other one's record, we also have a deadlock. Thus deadlocks can occur
on hardware resources or on software resources.
4.2 Resources
A resource can be a hardware device (e.g., a tape drive) or a piece of information (e.g., a locked
record in a database). A computer will normally have many different resources that can be
acquired. For some resources, several identical instances may be available, such as three tape
drives. When several copies of a resource are available, any one of them can be used to satisfy
any request for the resource. In short, a resource is anything that can be used by only a single
process at any instant of time.
A preemptable resource is one that can be taken away from the process owning it with no ill
effects. Memory is an example of a preemptable resource.
A nonpreemptable resource, in contrast, is one that cannot be taken away from its current
owner without causing the computation to fail. If a process has begun to burn a CD-ROM,
suddenly taking the CD recorder away from it and giving it to another process will result in a
garbled CD; CD recorders are not preemptable at an arbitrary moment.
In general, deadlocks involve nonpreemptable resources. Potential deadlocks that involve
preemptable resources can usually be resolved by reallocating resources from one process to
another. Thus our treatment will focus on nonpreemptable resources.
The sequence of events required to use a resource is given below in an abstract form.
1. Request the resource.
2. Use the resource.
3. Release the resource.
If the resource is not available when it is requested, the requesting process is forced to wait. In
some operating systems, the process is automatically blocked when a resource request fails, and
awakened when it becomes available. In other systems, the request fails with an error code, and
it is up to the calling process to wait a little while and try again.
A process whose resource request has just been denied will normally sit in a tight loop requesting
the resource, then sleeping, then trying again. Although this process is not blocked, for all intents
and purposes, it is as good as blocked, because it cannot do any useful work. In our further
treatment, we will assume that when a process is denied a resource request, it is put to sleep.
The exact nature of requesting a resource is highly system dependent. In some systems, a request
system call is provided to allow processes to explicitly ask for resources. In others, the only
resources that the operating system knows about are special files that only one process can have
open at a time. These are opened by the usual open call. If the file is already in use, the caller is
blocked until its current owner closes it.
Definition: A set of processes is in a deadlock state if each process in the set is waiting for an
event that can be caused by only another process in the set.
In other words, each member of the set of deadlock processes is waiting for a resource that can
be released only by a deadlock process. None of the processes can run, none of them can release
any resources and none of them can be awakened. It is important to note that the number of
processes and the number and kind of resources possessed and requested are unimportant.
Example 1: The simplest example of deadlock is where process 1 has been allocated a non-
shareable resource A, say, a tape drive, and process 2 has been allocated a non-shareable resource
B, say, a printer. Now, if it turns out that process 1 needs resource B (printer) to proceed and
process 2 needs resource A (the tape drive) to proceed and these are the only two processes in the
system, each has blocked the other and all useful work in the system stops. This situation is
termed a deadlock.
The system is in deadlock state because each process holds a resource being requested by the
other process and neither process is willing to release the resource it holds.
Example 2: Consider a system with three disk drives. Suppose there are three processes, each is
holding one of these three disk drives. If each process now requests another disk drive, three
processes will be in a deadlock state, because each process is waiting for the event "disk drive is
released", which can only be caused by one of the other waiting processes. Deadlock state
involves processes competing not only for the same resource type, but also for different resource
types.
Coffman (1971) identified four necessary conditions that must hold simultaneously for a
deadlock to occur.
4.4.1 Mutual Exclusion Condition
The resources involved are non-shareable. At least one resource must be held in a non-shareable
mode, that is, only one process at a time claims exclusive control of the resource. If another
process requests that resource, the requesting process must be delayed until the resource has been
released.
Let us understand this by a common example. Consider the traffic deadlock shown in the Figure
1.
• Mutual exclusion condition applies, since only one vehicle can be on a section of the street at a
time.
• Hold-and-wait condition applies, since each vehicle is occupying a section of the street, and
waiting to move on to the next section of the street.
• Non-preemptive condition applies, since a section of the street that is occupied by a vehicle
cannot be taken away from it.
• Circular wait condition applies, since each vehicle is waiting for the next vehicle to move. That
is, each vehicle in the traffic is waiting for a section of the street held by the next vehicle in the
traffic.
The simple rule to avoid traffic deadlock is that a vehicle should only enter an intersection if it is
assured that it will not have to stop inside the intersection.
It is not possible to have a deadlock involving only one single process. The deadlock involves a
circular "hold-and-wait" condition between two or more processes, so "one" process cannot hold
a resource, yet be waiting for another resource that it is holding. In addition, deadlock is not
possible between two threads in a process, because it is the process that holds resources, not the
thread, that is, each thread has access to the resources held by the process.
The idea behind the resource allocation graph is to have a graph which has two different types of
nodes, the process nodes and resource nodes (process represented by circles, resource node
represented by rectangles). For different instances of a resource, there is a dot in the resource
node rectangle. For example, if there are two identical printers, the printer resource might have
two dots to indicate that we don't really care which is used, as long as we acquire the resource.
The edges among these nodes represent resource allocation and release. Edges are directed, and
if the edge goes from resource to process node that means the process has acquired the resource.
If the edge goes from process node to resource node that means the process has requested the
resource.
We can use these graphs to determine if a deadlock has occurred or may occur. If, for example, all
resources have only one instance (all resource node rectangles have one dot) and the graph is
circular, then a deadlock has occurred. If on the other hand some resources have several
instances, then a deadlock may occur. If the graph is not circular, a deadlock cannot occur (the
circular wait condition wouldn't be satisfied).
The following are the tips which will help you to check the graph easily to predict the presence
of cycles.
• If there is a cycle in the graph and each resource has only one instance, then there is a deadlock.
In this case, a cycle is a necessary and sufficient condition for deadlock.
• If there is a cycle in the graph, and each resource has more than one instance, there may or may
not be a deadlock. (A cycle may be broken if some process outside the cycle has a resource
instance that can break the cycle). Therefore, a cycle in the resource allocation graph is a
necessary but not sufficient condition for deadlock, when multiple resource instances are
considered.
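The cycle test on a resource-allocation graph is an ordinary depth-first search. A sketch, with the graph encoded as adjacency lists (the node names mirror the figures below and are illustrative):

```python
def has_cycle(graph):
    """Detect a cycle in a directed resource-allocation graph.
    graph maps each node to the list of nodes its edges point to.
    With one instance per resource, a cycle means deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited / on current path / done
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for succ in graph.get(node, []):
            if color.get(succ, WHITE) == GREY:    # back edge: cycle found
                return True
            if color.get(succ, WHITE) == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

# P1 holds R1 and requests R2; P2 holds R2 and requests R1: a cycle.
deadlocked = {"R1": ["P1"], "P1": ["R2"], "R2": ["P2"], "P2": ["R1"]}
# P1 holds R1 and requests R2; P2 just holds R2: no cycle.
fine = {"R1": ["P1"], "P1": ["R2"], "R2": ["P2"], "P2": []}
print(has_cycle(deadlocked), has_cycle(fine))  # True False
```

Remember the caveat from the list above: with multiple resource instances, a cycle found this way is necessary but not sufficient for deadlock.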
The above graph shown in Figure 4.3 has a cycle and is in Deadlock.
(Cycle: R1 → P1, P1 → R2, R2 → P2, P2 → R1)
Figure 4.4: Resource Allocation Graph having a cycle and not in a Deadlock
The graph shown in Figure 4.4 has a cycle and is not in Deadlock.
(Resource 1 has one instance, shown by a star)
(Resource 2 has two instances, a and b, shown as two stars)
Havender in his pioneering work showed that since all four of the conditions are necessary for
deadlock to occur, it follows that deadlock might be prevented by denying any one of the
conditions. Let us study Havender's algorithm.
Havender's Algorithm
Elimination of "Hold and Wait" Condition
A process must request all its required resources at once, and it cannot proceed until all of them
have been granted.
For example, a program requiring ten tape drives must request and receive all ten drives before it
begins executing. If the program needs only one tape drive to begin execution and then does not
need the remaining tape drives for several hours then substantial computer resources (9 tape
drives) will sit idle for several hours. This strategy can cause indefinite postponement
(starvation), since not all the required resources may become available at once.
Elimination of "No-preemption" Condition
Under this strategy, a process whose request is denied must release all the resources it currently
holds and request them again together with the new one. The cost is high: when a process
releases resources, it may lose all its work to that point. One serious consequence of this strategy
is the possibility of indefinite postponement (starvation). A process might be held off indefinitely
as it repeatedly requests and releases the same resources.
Elimination of “Circular Wait” Condition
The last condition, the circular wait, can be denied by imposing a total ordering on all of the
resource types and then forcing all processes to request the resources in order (increasing or
decreasing). This strategy imposes a total ordering of all resource types, and requires that each
process requests resources in a numerical order of enumeration. With this rule, the resource
allocation graph can never have a cycle.
For example, provide a global numbering of all the resources, as shown in the given Table 1:
Table 1: Numbering the resources
Number Resource
1 Floppy drive
2 Printer
3 Plotter
4 Tape Drive
5 CD Drive
Now we will see the rule for this:
Rule: Processes can request resources whenever they want to, but all requests must be made in
numerical order. A process may request first a printer and then a tape drive (order: 2, 4), but it may
not request first a plotter and then a printer (order: 3, 2). The problem with this strategy is that it
may be impossible to find an ordering that satisfies everyone.
This strategy, if adopted, may result in low resource utilization and in some cases starvation is
possible too.
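The numerical-order rule can be sketched by simply sorting the requested resource numbers before acquiring their locks; the resource table below mirrors Table 1 and is illustrative:

```python
import threading

# Give every resource a global number; always acquire in increasing order.
locks = {1: ("floppy", threading.Lock()),
         2: ("printer", threading.Lock()),
         4: ("tape", threading.Lock())}

def acquire_in_order(numbers):
    """Acquire the locks for the given resource numbers in global order."""
    order = sorted(numbers)        # sorting enforces the numerical order
    for n in order:
        locks[n][1].acquire()
    return order

def release_all(numbers):
    for n in sorted(numbers, reverse=True):
        locks[n][1].release()

# Even if a thread asks for (4, 2), the locks are taken as 2 then 4, so
# no set of threads can ever wait on one another in a cycle.
order = acquire_in_order([4, 2])
release_all([4, 2])
print(order)  # [2, 4]
```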
Deadlock Avoidance requires that the system has some additional a priori information
available.
Simplest and most useful model requires that each process declare the maximum
number of resources of each type that it may need
The deadlock-avoidance algorithm dynamically examines the resource-allocation
state to ensure that there can never be a circular-wait condition
Resource-allocation state is defined by the number of available and allocated
resources, and the maximum demands of the processes
• Safe state: Such a state occurs when the system can allocate resources to each process (up to
its maximum) in some order and avoid a deadlock. This state will be characterized by a safe
sequence. It must be mentioned here that we should not falsely conclude that all unsafe states are
deadlocked although it may eventually lead to a deadlock.
• Unsafe State: If the system did not follow the safe sequence of resource allocation from the
beginning and it is now in a situation, which may lead to a deadlock, then it is in an unsafe state.
• Deadlock State: If the system has some circular wait condition existing for some processes,
then it is in deadlock state.
When a process requests an available resource, system must decide if immediate allocation
leaves the system in a safe state
System is in a safe state if there exists a sequence <P1, P2, …, Pn> of ALL the processes in the
system such that for each Pi, the resources that Pi can still request can be satisfied by the
currently available resources plus the resources held by all the Pj, with j < i
That is:
If Pi resource needs are not immediately available, then Pi can wait until all Pj have
finished
When Pj is finished, Pi can obtain needed resources, execute, return allocated resources,
and terminate
When Pi terminates, Pi+1 can obtain its needed resources, and so on
Claim edge Pi → Rj indicates that process Pi may request resource Rj; it is
represented by a dashed line
Claim edge converts to request edge when a process requests a resource
Request edge is converted to an assignment edge when the resource is allocated to
the process
When a resource is released by a process, assignment edge is reconverted to a claim
edge
Resources must be claimed a priori in the system
(a) Resource-Allocation Graph
The most famous deadlock avoidance algorithm, from Dijkstra [1965], is the Banker's algorithm.
It is named the Banker's algorithm because the process is analogous to that used by a banker in
deciding if a loan can be safely made or not.
The Banker's Algorithm is based on the banking system, which never allocates its available cash
in such a manner that it can no longer satisfy the needs of all its customers. Here we must have
the advance knowledge of the maximum possible claims for each process, which is limited by
the resource availability. During the run of the system we should keep monitoring the resource
allocation status to ensure that no circular wait condition can exist.
If the necessary conditions for a deadlock are in place, it is still possible to avoid deadlock by
being careful when resources are allocated. The following are the features that are to be
considered for avoidance of deadlock as per the Banker's Algorithm.
Each process declares maximum number of resources of each type that it may need.
Keep the system in a safe state in which we can allocate resources to each process in
some order and avoid deadlock.
Check for the safe state by finding a safe sequence: <P1, P2, ..., Pn> where resources that
Pi needs can be satisfied by available resources plus resources held by Pj where j < i.
Let n be the number of processes and m the number of resource types. The algorithm uses the
following data structures:
Available: vector of length m. If Available[j] = k, then k instances of resource type Rj
are available
Max: n x m matrix. If Max[i,j] = k, then process Pi may request at most k instances
of resource type Rj
Allocation: n x m matrix. If Allocation[i,j] = k, then Pi is currently allocated k
instances of Rj
Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of Rj,
where Need[i,j] = Max[i,j] – Allocation[i,j]
Resource-Request Algorithm
Requesti = request vector for process Pi. If Requesti[j] = k, then process Pi wants k instances
of resource type Rj
1. If Requesti ≤ Needi, go to step 2. Otherwise, raise an error condition, since the process has
exceeded its maximum claim
2. If Requesti ≤ Available, go to step 3. Otherwise Pi must wait, since the resources are not
available
3. Pretend to allocate the requested resources to Pi by modifying the state as follows:
Available = Available – Requesti;
Allocationi = Allocationi + Requesti;
Needi = Needi – Requesti;
If the resulting state is safe, the resources are allocated to Pi
If unsafe, Pi must wait, and the old resource-allocation state is restored
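The safety check at the heart of the Banker's algorithm can be sketched in a few lines and run on the worked example of this section (five processes, resource types A to D):

```python
def is_safe(available, allocation, need):
    """Banker's safety check: return a safe sequence of process indices,
    or None if the state is unsafe."""
    work = available[:]
    finished = [False] * len(allocation)
    sequence = []
    while len(sequence) < len(allocation):
        progressed = False
        for i, done in enumerate(finished):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # Pi can run to completion and return everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                sequence.append(i)
                progressed = True
        if not progressed:
            return None        # some processes can never finish: unsafe
    return sequence

# The matrices from the example in this section.
allocation = [[0,2,1,2], [1,1,0,2], [2,2,5,4], [0,3,1,2], [2,4,1,4]]
maximum    = [[0,3,2,2], [2,7,5,2], [2,3,7,6], [1,6,4,2], [3,6,5,8]]
available  = [2,5,3,2]
need = [[m - a for m, a in zip(mrow, arow)]
        for mrow, arow in zip(maximum, allocation)]

print(is_safe(available, allocation, need))  # [0, 2, 3, 4, 1]
```

The sequence it finds, P0, P2, P3, P4, P1, matches the hand calculation: P1's need cannot be met initially, so it runs last.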
There are some problems with the Banker's algorithm: in practice, processes rarely know their
maximum resource needs in advance, and the number of processes is not fixed.
Example: Consider a system with five processes P0 to P4 and four resource types A, B, C and D.
The Allocation and Max matrices and the Available vector are given below:
Process  Allocation   Max          Available
         A B C D      A B C D      A B C D
P0       0 2 1 2      0 3 2 2      2 5 3 2
P1       1 1 0 2      2 7 5 2
P2       2 2 5 4      2 3 7 6
P3       0 3 1 2      1 6 4 2
P4       2 4 1 4      3 6 5 8
(i) Find the content of the Need matrix. (ii) Check whether the system is in a safe state.
(iii) Can a request from process P1 for (1,3,2,1) be granted immediately?
Solution:
(i) The content of Need Matrix: Need = Max - Allocation
Process Need
A B C D
P0 0 1 1 0
P1 1 6 5 0
P2 0 1 2 2
P3 1 3 3 0
P4 1 2 4 4
(ii) To check whether system is in safe state: Safe sequence is calculated as follows:
Process P0:
Need = <0,1,1,0> and Available = <2,5,3,2>
Need[i,j]≤Available[i,j], then resources are allocated to P0.
Process P0 executes safely.
Available becomes <2,5,3,2> + <0,2,1,2> = <2,7,4,4>
Process P1:
Need = <1,6,5,0> and Available = <2,7,4,4>. Need[i,j]>Available[i,j], then resources cannot be
allocated to P1.
Process P2:
Need = <0,1,2,2> and Available = <2,7,4,4>
Need[i,j]≤Available[i,j], then resources are allocated to P2.
Process P2 executes safely.
Available becomes <2,7,4,4> + <2,2,5,4> = <4,9,9,8>
Process P3:
Need = <1,3,3,0> and Available = <4,9,9,8>
Need[i,j]≤Available[i,j], then resources are allocated to P3.
Process P3 executes safely.
Available becomes <4,9,9,8> + <0,3,1,2> = <4,12,10,10>
Process P4:
Need = <1,2,4,4> and Available = <4,12,10,10>
Need[i,j]≤Available[i,j], then resources are allocated to P4.
Process P4 executes safely.
Available becomes <4,12,10,10> + <2,4,1,4> = <6,16,11,14>
Process P1 (revisited):
Need = <1,6,5,0> and Available = <6,16,11,14>
Need[i,j]≤Available[i,j], then resources are allocated to P1.
Process P1 executes safely.
Hence, the system is in a safe state, and the safe sequence is <P0, P2, P3, P4, P1>.
(iii) If a request from process P1 arrives for (1,3,2,1): Request = <1,3,2,1> and Available =
<2,5,3,2>. Request[j]≤Available[j] for all j, so the request can be granted immediately.
Example: In another system, with three resource types, suppose processes P0 and P2 have already
been shown to execute safely. Now, we can execute either process P1 first and then P3, or vice versa.
Consider Process P1,
<Need> = <0, 7, 5>
<Available> = <2, 8, 8>
Need < Available.
Hence P1 can be executed safely.
Now, the available resources become <Available>+ <Allocation> of P1
<Available>=<2, 8, 8> + <1, 0, 0> = <3, 8, 8>
Hence, the system is in Safe State if processes are executed in the order <P0, P2, P1,
P3> or <P0, P2, P3, P1>
Detection of deadlocks is the most practical policy; being both liberal and cost-efficient, it is the
one most operating systems deploy. To detect a deadlock, we must proceed in a recursive manner
and simulate the most favored execution of each unblocked process.
An unblocked process may acquire all the needed resources and will execute.
It will then release all the acquired resources and remain dormant thereafter.
The now released resources may wake up some previously blocked process.
Continue the above steps as long as possible.
If any blocked processes remain, they are deadlocked.
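The steps above can be sketched as a small function: each allocation row records what a process holds, each request row what it is still waiting for (the two-process example is illustrative):

```python
def find_deadlocked(available, allocation, request):
    """Deadlock detection: repeatedly let any process whose outstanding
    request can be met run to completion and release what it holds.
    Whatever remains blocked at the end is deadlocked."""
    work = available[:]
    # Processes holding nothing finish trivially.
    done = [not any(row) for row in allocation]
    progressed = True
    while progressed:
        progressed = False
        for i, finished in enumerate(done):
            if not finished and all(r <= w for r, w in zip(request[i], work)):
                # Simulate running Pi: it releases everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                done[i] = True
                progressed = True
    return [i for i, finished in enumerate(done) if not finished]

# P0 holds a unit of R0 and wants R1; P1 holds R1 and wants R0: deadlock.
allocation = [[1, 0], [0, 1]]
request    = [[0, 1], [1, 0]]
print(find_deadlocked([0, 0], allocation, request))  # [0, 1]
```

With one free unit of R0 the same requests can be satisfied in turn and the function returns an empty list: no deadlock.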
In this approach we terminate deadlocked processes in a systematic way, taking into account their
priorities. The moment enough processes have been terminated to recover from the deadlock, we
stop the terminations. Though the policy is simple, there are some problems associated with it.
Consider the scenario where a process is in the state of updating a data file and it is terminated.
The file may be left in an incorrect state by the unexpected termination of the updating process.
Further, processes should be terminated based on some criterion/policy. Some of the criteria may
be as follows:
• Priority of a process
• CPU time used and expected usage before completion
• Number and type of resources being used (can they be preempted easily?)
• Number of resources needed for completion
• Number of processes needed to be terminated
• Are the processes interactive or batch?
If a deadlock is detected, one or more processes are restarted from their last checkpoint.
Restarting a process from a checkpoint is called rollback. It is done with the expectation that the
resource requests will not interleave again to produce deadlock.
Deadlock recovery is generally used when deadlocks are rare, and the cost of recovery (process
termination or rollback) is low.
Process checkpointing can also be used to improve reliability (long running computations), assist
in process migration, or reduce startup costs.
CHAPTER 5
MEMORY MANAGEMENT
5.1 Introduction
Memory is central to the operation of a modern computer system.
Memory is a large array of words or bytes, each with its own address.
Memory management is the functionality of an operating system which handles or
manages primary memory.
Memory management keeps track of each and every memory location, whether it is
allocated to some process or it is free.
It checks how much memory is to be allocated to processes.
It decides which process will get memory at what time.
It tracks whenever some memory gets freed or unallocated and correspondingly it updates
the status.
Memory management provides protection by using two registers, a base register and a
limit register.
The base register holds the smallest legal physical memory address and the limit
register specifies the size of the range. For example, if the base register holds 300000
and the limit register is 112000, then the program can legally access all addresses from
300000 through 411999.
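The protection check amounts to one comparison per memory access; a sketch, with the limit taken as 112000 so that the legal range is 300000 through 411999:

```python
def is_legal(address, base, limit):
    """A user address is legal iff base <= address < base + limit;
    the hardware traps (addressing error) otherwise."""
    return base <= address < base + limit

base, limit = 300000, 112000
print(is_legal(300000, base, limit),   # first legal address
      is_legal(411999, base, limit),   # last legal address
      is_legal(412000, base, limit))   # one past the end: trap
```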
5.6 Swapping
A process can be swapped temporarily out of memory to a backing store, and then
brought back into memory for continued execution.
Backing store is usually a hard disk drive or other secondary storage which is fast in
access and large enough to accommodate copies of all memory images for all users. It
must be capable of providing direct access to these memory images.
Assume a multiprogramming environment with a round robin CPU-scheduling algorithm.
When a quantum expires, the memory manager will start to swap out the process that just
finished, and to swap in another process to the memory space that has been freed as
shown in figure. When each process finishes its quantum, it will be swapped with another
process.
A variant of this swapping policy is used for priority-based scheduling algorithms. If a
higher-priority process arrives and wants service, the memory manager can swap out the
lower-priority process so that it can load and execute the higher-priority process. When
the higher-priority process finishes, the lower-priority process can be swapped back in
and continued. This variant of swapping is sometimes called roll out, roll in.
A process that is swapped out will be swapped back into the same memory space that it
occupied previously.
If binding is done at assembly or load time, then the process cannot be moved to different
location. If execution-time binding is being used, then it is possible to swap a process into
a different memory space.
Major time consuming part of swapping is transfer time. Total transfer time is directly
proportional to the amount of memory swapped.
Let us assume that the user process is of size 100KB and the backing store is a standard
hard disk with transfer rate of 1 MB per second. The actual transfer of the 100K process
to or from memory will take
100KB / 1000KB per second
= 1/10 second
= 100 milliseconds
5.7 Contiguous Memory Allocation
One approach to memory management is to load each process into a contiguous space.
The operating system is allocated space first, usually at either low or high memory
locations, and then the remaining available memory is allocated to processes as needed.
( The OS is usually loaded low, because that is where the interrupt vectors are located,
but on older systems part of the OS was loaded high to make more room in low memory (
within the 640K barrier ) for user processes. )
5.7.1 Memory Protection
The system shown in Figure below allows protection against user programs accessing
areas that they should not, allows programs to be relocated to different memory starting
addresses as needed, and allows the memory space devoted to the OS to grow or shrink
dynamically as needs change.
5.7.2 Memory Allocation
Example: Given five memory partitions of 100kB, 500kB, 200kB, 300kB, and 600kB (in
order), how would the first-fit, best-fit, and worst-fit algorithms place processes of
212kB, 417kB, 112kB, and 426kB (in order)? Which algorithm makes the most efficient
use of memory?
Solution:
First-fit: 212 kB goes into the 500 kB partition (leaving 288 kB), 417 kB into 600 kB
(leaving 183 kB), and 112 kB into the 288 kB remainder (leaving 176 kB); 426 kB must
wait, as no remaining hole is large enough.
Best-fit: 212 kB goes into 300 kB, 417 kB into 500 kB, 112 kB into 200 kB, and 426 kB
into 600 kB; all four processes are placed.
Worst-fit: 212 kB goes into 600 kB (leaving 388 kB), 417 kB into 500 kB, and 112 kB
into the 388 kB remainder; 426 kB must wait.
The total memory size that is not used for each algorithm (fragmentation):
First-fit = 1700 – 741 = 959
Best-fit = 1700 – 1167 = 533
Worst-fit = 1700 – 741 = 959
Best-fit makes the most efficient use of memory here: it is the only algorithm that places
all four processes.
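The three placement strategies can be simulated. The sketch below assumes dynamic partitioning (a hole is split when a process is placed in it) and reproduces the totals above:

```python
def allocate(holes, processes, choose):
    """Place each process in the hole selected by `choose`, splitting the hole.
    Returns total memory successfully allocated."""
    holes = list(holes)
    placed = 0
    for size in processes:
        candidates = [i for i, h in enumerate(holes) if h >= size]
        if not candidates:
            continue  # process must wait; no hole is big enough
        i = choose(candidates, holes)
        holes[i] -= size  # split the hole, leaving the remainder
        placed += size
    return placed

holes = [100, 500, 200, 300, 600]
procs = [212, 417, 112, 426]

first_fit = allocate(holes, procs, lambda c, h: c[0])                       # first hole that fits
best_fit  = allocate(holes, procs, lambda c, h: min(c, key=lambda i: h[i])) # smallest hole that fits
worst_fit = allocate(holes, procs, lambda c, h: max(c, key=lambda i: h[i])) # largest hole

print(first_fit, best_fit, worst_fit)  # 741 1167 741
```

Subtracting each total from 1700 kB gives exactly the 959 / 533 / 959 figures in the solution.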
5.7.3 Fragmentation
All the memory allocation strategies suffer from external fragmentation, though first and
best fits experience the problems more so than worst fit. External fragmentation means
that the available memory is broken up into lots of little pieces, none of which is big
enough to satisfy the next memory requirement, although the sum total could.
The amount of memory lost to fragmentation may vary with algorithm, usage patterns,
and some design decisions such as which end of a hole to allocate and which end to save
on the free list.
Statistical analysis of first fit, for example, shows that for N blocks of allocated memory,
another 0.5 N blocks will be lost to fragmentation (the so-called 50-percent rule).
Internal fragmentation also occurs with all memory allocation strategies. This is caused
by the fact that memory is allocated in blocks of a fixed size, whereas the actual memory
needed will rarely be that exact size. For a random distribution of memory requests, on
average half a block will be wasted per memory request, because on average the last
allocated block will be only half full.
o Note that the same effect happens with hard drives, and that modern hardware
gives us increasingly larger drives and memory at the expense of ever larger block
sizes, which translates to more memory lost to internal fragmentation.
o Some systems use variable size blocks to minimize losses due to internal
fragmentation.
If the programs in memory are relocatable, ( using execution-time address binding ), then
the external fragmentation problem can be reduced via compaction, i.e. moving all
processes down to one end of physical memory. This only involves updating the
relocation register for each process, as all internal work is done using logical addresses.
Another solution as we will see in upcoming sections is to allow processes to use non-
contiguous blocks of physical memory, with a separate relocation register for each block.
5.8 Paging
External fragmentation is avoided by using the paging technique.
The basic idea behind paging is to divide physical memory into a number of equal-
sized blocks called frames, and to divide a program's logical memory space into blocks
of the same size called pages.
When a process is to be executed, its corresponding pages are loaded into any available
memory frames.
Logical address space of a process can be non-contiguous and a process is allocated
physical memory whenever the free memory frame is available.
Operating system keeps track of all free frames. Operating system needs n free frames to
run a program of size n pages.
The page table is used to look up what frame a particular page is stored in at the moment.
In the following example, for instance, page 2 of the program's logical memory is
currently stored in frame 3 of physical memory:
Address generated by CPU is divided into
Page number (p) -- page number is used as an index into a page table which
contains base address of each page in physical memory.
Page offset (d) -- page offset is combined with base address to define the physical
memory address.
The page table maps the page number to a frame number, to yield a physical address
which also has two parts: The frame number and the offset within that frame. The number
of bits in the frame number determines how many frames the system can address, and the
number of bits in the offset determines the size of each frame.
Page numbers, frame numbers, and frame sizes are determined by the architecture, but
are typically powers of two, allowing addresses to be split at a certain number of bits. For
example, if the logical address size is 2^m and the page size is 2^n, then the high-order m-n
bits of a logical address designate the page number and the remaining n bits represent the
offset.
Note also that the number of bits in the page number and the number of bits in the frame
number do not have to be identical. The former determines the address range of the
logical address space, and the latter relates to the physical address space.
For an m-bit processor, the logical address will be m bits long. Let the page size be 2^n
bytes.
Then, the lower-order n bits of a logical address L will represent the page offset (d) and
the higher-order (m-n) bits will represent the page number (p).
Then, p = L div 2^n and d = L mod 2^n.
Let f be the frame number that holds the page referenced by logical address L.
Then, f can be obtained by indexing into the page table by using page number p as index,
i.e., f = page_table[p].
Corresponding physical address = f * 2^n + d
Example: A 16-bit computer implements the paging scheme. The page size is 4096
bytes. The page table for process A is as follows:
Page Number Frame Number
0 7
1 2
2 5
3 1
4 12
5 6
6 0
Convert the following logical addresses into corresponding physical addresses:
(i) 3720
(ii) 7512
(iii) 22340
(iv) 17510
(v) 11225
Solution: The page size is 4096 = 2^12 bytes, so p = L div 4096, d = L mod 4096, and the
physical address = f * 4096 + d.
(i) L = 3720: p = 0, d = 3720, f = 7, so physical address = 7 * 4096 + 3720 = 32392
(ii) L = 7512: p = 1, d = 3416, f = 2, so physical address = 2 * 4096 + 3416 = 11608
(iii) L = 22340: p = 5, d = 1860, f = 6, so physical address = 6 * 4096 + 1860 = 26436
(iv) L = 17510: p = 4, d = 1126, f = 12, so physical address = 12 * 4096 + 1126 = 50278
(v) L = 11225: p = 2, d = 3033, f = 5, so physical address = 5 * 4096 + 3033 = 23513
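The translation in this example can be checked mechanically. A small sketch, using process A's page table and the 4096-byte page size:

```python
PAGE_SIZE = 4096  # 2^12 bytes, as in the example
page_table = {0: 7, 1: 2, 2: 5, 3: 1, 4: 12, 5: 6, 6: 0}  # page -> frame

def translate(logical):
    """Split a logical address into (page, offset) and map it to a physical address."""
    p, d = divmod(logical, PAGE_SIZE)   # p = L div 2^n, d = L mod 2^n
    f = page_table[p]                   # index into the page table
    return f * PAGE_SIZE + d            # physical address = f * 2^n + d

for addr in (3720, 7512, 22340, 17510, 11225):
    print(addr, "->", translate(addr))  # e.g. 3720 -> 32392
```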
5.9 Segmentation
Segmentation is a technique to break memory into logical pieces where each piece
represents a group of related information.
For example, data segments or code segment for each process, data segment for operating
system and so on.
Segmentation can be implemented with or without paging.
Unlike paging, segments have varying sizes, which eliminates internal fragmentation.
External fragmentation still exists, but to a lesser extent.
Example 3: On a simple paging system with 2^24 bytes of physical memory, 256 pages of
logical address space, and a page size of 2^10 bytes,
1. How many bytes are in a page frame?
2. How many bits in the physical address specify the page frame?
3. How many entries are in the page table?
4. How many bits are in a logical address?
Solution:
1. A frame is where a page can be mapped into memory, so a frame has to be the same size
as a page: 2^10 bytes.
2. You have 24 bits of physical address, and a frame is 2^10 bytes big, so that leaves 14 of the
bits (24 - 10) for the frame's base address.
3. The page table is the full list of pages, whether mapped or unmapped, so there are 256
entries in the page table, since there are 256 pages.
4. That is the page-number bits plus the offset bits. The upper portion of an address is the
page number (8 bits, since 2^8 = 256), and the lower portion is the offset within that page
(10 bits), so the whole address size is 18 bits. Total logical address space = 2^18 bytes.
5.10 Virtual Memory
Virtual memory is a technique that allows the execution of processes which are not
completely available in memory. The main visible advantage of this scheme is that programs
can be larger than physical memory. Virtual memory is the separation of user logical memory
from physical memory.
This separation allows an extremely large virtual memory to be provided for programmers when
only a smaller physical memory is available. Following are situations when the entire program
is not required to be loaded fully in main memory.
User-written error handling routines are used only when an error occurs in the data or
computation.
Certain options and features of a program may be used rarely.
Many tables are assigned a fixed amount of address space even though only a small
amount of the table is actually used.
The ability to execute a program that is only partially in memory would confer many
benefits.
Fewer I/O operations would be needed to load or swap each user program into memory.
A program would no longer be constrained by the amount of physical memory that is
available.
Each user program could take less physical memory, and more programs could be run at
the same time, with a corresponding increase in CPU utilization and throughput.
Virtual memory is commonly implemented by demand paging. It can also be implemented in a
segmentation system. Demand segmentation can also be used to provide virtual memory.
5.11 Demand Paging
A demand paging system is quite similar to a paging system with swapping. When we want to
execute a process, we swap it into memory. Rather than swapping the entire process into
memory, however, we use a lazy swapper called pager.
When a process is to be swapped in, the pager guesses which pages will be used before
the process is swapped out again. Instead of swapping in a whole process, the pager brings only
those necessary pages into memory. Thus, it avoids reading into memory pages that will not be
used anyway, decreasing the swap time and the amount of physical memory needed.
Hardware support is required to distinguish between the pages that are in memory and the pages
that are on the disk. This is done with the valid-invalid bit scheme: each page-table entry carries
a bit marking the page as valid (in memory) or invalid (on disk). Marking a page invalid will
have no effect if the process never attempts to access that page. While the process executes and
accesses pages that are memory resident, execution proceeds normally.
Access to a page marked invalid causes a page-fault trap. This trap is the result of the operating
system's failure to bring the desired page into memory. A page fault is handled as follows:
Step 1: Check an internal table for this process to determine whether the reference was a
valid or an invalid memory access.
Step 2: If the reference was invalid, terminate the process. If it was valid but the page has
not yet been brought in, page it in.
Step 3: Find a free frame.
Step 4: Schedule a disk operation to read the desired page into the newly allocated
frame.
Step 5: When the disk read is complete, modify the internal table kept with the process
and the page table to indicate that the page is now in memory.
Step 6: Restart the instruction that was interrupted by the illegal address trap. The
process can now access the page as though it had always been in memory. Therefore, the
operating system reads the desired page into memory and restarts the process as though
the page had always been in memory.
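These steps can be sketched as a loop around a valid-invalid bit. Everything below (the table layout, the stand-in `DISK` contents) is illustrative, not a real kernel interface:

```python
# Illustrative demand-paging sketch: each page-table entry carries a valid
# bit; an access to an invalid page triggers the fault steps above.
DISK = {0: "code", 1: "data", 2: "stack"}   # pages resident on backing store
page_table = {p: {"valid": False, "frame": None} for p in DISK}
free_frames = [0, 1, 2]
memory = {}
faults = 0

def access(page):
    global faults
    if page not in page_table:
        raise MemoryError("invalid reference - terminate process")  # steps 1-2
    entry = page_table[page]
    if not entry["valid"]:                      # page fault
        faults += 1
        frame = free_frames.pop()               # step 3: find a free frame
        memory[frame] = DISK[page]              # step 4: read page from disk
        entry["frame"], entry["valid"] = frame, True  # step 5: update tables
        # step 6: the faulting instruction would now be restarted
    return memory[entry["frame"]]

access(0); access(1); access(0)   # third access hits: page 0 is now resident
print(faults)  # 2
```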
Advantages
Following are the advantages of Demand Paging
Large virtual memory.
More efficient use of memory.
Unconstrained multiprogramming. There is no limit on degree of multiprogramming.
Disadvantages
Following are the disadvantages of Demand Paging
The number of tables and the amount of processor overhead for handling page interrupts
are greater than in the case of simple paged management techniques.
Due to the lack of an explicit constraint on a job's address space, address space size may
increase without limit.
5.12 Page Replacement Algorithms
First In First Out (FIFO) algorithm
The oldest page in main memory is the one selected for replacement.
Page Faults = 15
Hits = 5
Although FIFO is simple and easy, it is not always optimal, or even efficient.
An interesting effect that can occur with FIFO is Belady's anomaly, in which increasing
the number of frames available can actually increase the number of page faults that
occur! Consider, for example, the following chart based on the page sequence ( 1, 2, 3, 4,
1, 2, 5, 1, 2, 3, 4, 5 ) and a varying number of available frames. Obviously the maximum
number of faults is 12 ( every request generates a fault ), and the minimum number is 5 (
each page loaded only once ), but in between there are some interesting results:
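Belady's anomaly for this exact sequence can be demonstrated with a short FIFO simulation, counting faults for 3 versus 4 frames:

```python
from collections import deque

def fifo_faults(pages, nframes):
    """Count page faults for FIFO replacement with `nframes` frames."""
    frames, queue, faults = set(), deque(), 0
    for p in pages:
        if p in frames:
            continue                    # hit: FIFO order is unchanged
        faults += 1
        if len(frames) == nframes:      # evict the oldest resident page
            frames.discard(queue.popleft())
        frames.add(p)
        queue.append(p)
    return faults

seq = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(seq, 3), fifo_faults(seq, 4))  # 9 10 - more frames, more faults!
```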
Optimal Page algorithm
An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms.
An optimal page-replacement algorithm exists, and has been called OPT or MIN.
Replace the page that will not be used for the longest period of time: the algorithm uses
the future time at which each page is next to be used.
Page Faults = 9
Hits = 11
Least Recently Used (LRU) algorithm
Page which has not been used for the longest time in main memory is the one which will
be selected for replacement.
Easy to implement, keep a list, replace pages by looking back into time.
Page Faults = 12
Hits = 8
Calculate page faults and Hits using FIFO, LRU and Optimal page replacement algorithm for the
following page sequence (2, 3, 5, 4, 2, 5, 7, 3, 8, 7). Assume page frame size is 3.
FIFO:
Page Sequence 2 3 5 4 2 5 7 3 8 7
Frame 1 2 2 2 4 4 4 4 3 3 3
Frame 2 3 3 3 2 2 2 2 8 8
Frame 3 5 5 5 5 7 7 7 7
F F F F F H F F F H
Page Faults = 8,
Hits = 2
LRU:
Page Sequence 2 3 5 4 2 5 7 3 8 7
Frame 1 2 2 2 4 4 4 7 7 7 7
Frame 2 3 3 3 2 2 2 3 3 3
Frame 3 5 5 5 5 5 5 8 8
F F F F F H F F F H
Page Faults = 8,
Hits = 2
OPT:
Page Sequence 2 3 5 4 2 5 7 3 8 7
Frame 1 2 2 2 2 2 2 7 7 7 7
Frame 2 3 3 4 4 4 4 3 8 8
Frame 3 5 5 5 5 5 5 5 5
F F F F H H F F F H
Page Faults = 7,
Hits = 3
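All three algorithms can be run on this sequence with one small simulator; it reproduces the fault counts above (8, 8, and 7):

```python
def simulate(pages, nframes, victim):
    """Count page faults, delegating victim selection to `victim`."""
    frames, faults = [], 0
    for i, p in enumerate(pages):
        if p in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            frames.remove(victim(frames, pages, i))
        frames.append(p)
    return faults

def fifo(frames, pages, i):   # oldest loaded page is first in the list
    return frames[0]

def lru(frames, pages, i):    # least recently referenced before position i
    return min(frames, key=lambda f: max(j for j in range(i) if pages[j] == f))

def opt(frames, pages, i):    # page whose next use is farthest in the future
    def next_use(f):
        later = [j for j in range(i + 1, len(pages)) if pages[j] == f]
        return later[0] if later else float("inf")
    return max(frames, key=next_use)

seq = [2, 3, 5, 4, 2, 5, 7, 3, 8, 7]
print(simulate(seq, 3, fifo), simulate(seq, 3, lru), simulate(seq, 3, opt))  # 8 8 7
```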
5.13 Thrashing
If a process cannot maintain its minimum required number of frames, then it must be
swapped out, freeing up frames for other processes. This swapping is an intermediate
level of CPU scheduling.
But what about a process that can keep its minimum, but cannot keep all of the frames
that it is currently using on a regular basis? In this case it is forced to page out pages that
it will need again in the very near future, leading to large numbers of page faults.
A process that is spending more time paging than executing is said to be thrashing.
Cause of Thrashing
Early process scheduling schemes would control the level of multiprogramming allowed
based on CPU utilization, adding in more processes when CPU utilization was low.
The problem is that when memory filled up and processes started spending lots of time
waiting for their pages to page in, then CPU utilization would drop, causing the scheduler
to add in even more processes and exacerbating the problem! Eventually the system
would essentially grind to a halt.
Local page replacement policies can prevent one thrashing process from taking pages
away from other processes, but it still tends to clog up the I/O queue, thereby slowing
down any other process that needs to do even a little bit of paging (or any other I/O, for
that matter).
CHAPTER 6
FILE MANAGEMENT
File
A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.
In general, a file is a sequence of bits, bytes, lines or records whose meaning is defined
by the file's creator and user.
Ordinary files
These are the files that contain user information.
These may have text, databases or executable program.
The user can apply various operations on such files like add, modify, delete or even
remove the entire file.
Directory files
These files contain list of file names and other information related to these files.
Special files:
These files are also known as device files.
These files represent physical device like disks, terminals, printers, networks, tape
drive etc.
These files are of two types
1. Character special files - data is handled character by character as in case of
terminals or printers.
2. Block special files - data is handled in blocks as in the case of disks and
tapes.
6.5 File Access Mechanisms
File access mechanism refers to the manner in which the records of a file may be accessed.
There are several ways to access files
Sequential access
Direct/Random access
Indexed sequential access
Sequential access
A sequential access is that in which the records are accessed in some sequence i.e. the
information in the file is processed in order, one record after the other.
This access method is the most primitive one.
Example: Compilers usually access files in this fashion.
A sequential access file emulates magnetic tape operation, and generally supports a few
operations:
read next - read a record and advance the tape to the next position.
write next - write a record and advance the tape to the next position.
rewind
skip N records - May or may not be supported. N may be limited to positive numbers, or
may be limited to +/- 1.
Direct/Random access
Random access file organization provides direct access to the records.
Each record has its own address on the file, with the help of which it can be directly
accessed for reading or writing.
The records need not be in any sequence within the file and they need not be in adjacent
locations on the storage medium.
Jump to any record and read that record. Operations supported include:
read n - read record number n. ( Note an argument is now required. )
write n - write record number n. ( Note an argument is now required. )
jump to record n - could be 0 or the end of file.
Query current record - used to return back to this record later.
Sequential access can be easily emulated using direct access. The inverse is complicated
and inefficient.
Contiguous Allocation
Each file occupies a contiguous address space on disk.
Assigned disk address is in linear order.
Easy to implement.
External fragmentation is a major issue with this type of allocation technique.
Linked Allocation
Each file carries a list of links to disk blocks.
Directory contains link / pointer to first block of a file.
No external fragmentation
Effectively used in sequential access file.
Inefficient in case of direct access file.
Indexed Allocation
Provides solutions to problems of contiguous and linked allocation.
An index block is created having all pointers to files.
Each file has its own index block which stores the addresses of disk space occupied by the file.
Directory contains the addresses of index blocks of files.
6.9 Protection
Files must be kept safe for reliability (against accidental damage ), and protection
( against deliberate malicious access. ) The former is usually managed with backup
copies. This section discusses the latter.
One simple protection scheme is to remove all access to a file. However this makes the
file unusable, so some sort of controlled access must be arranged.
CHAPTER 7
I/O MANAGEMENT
7.1 Overview
Management of I/O devices is a very important part of the operating system - so
important and so varied that entire I/O subsystems are devoted to its operation. (Consider
the range of devices on a modern computer, from mice, keyboards, disk drives, display
adapters, USB devices, network connections, audio I/O, printers, special devices for the
handicapped, and many special-purpose peripherals.)
I/O Subsystems must contend with two (conflicting?) trends: (1) The gravitation towards
standard interfaces for a wide range of devices, making it easier to add newly developed
devices to existing systems, and (2) the development of entirely new types of devices, for
which the existing standard interfaces are not always easy to apply.
Device drivers are modules that can be plugged into an OS to handle a particular device
or category of similar devices.
I/O Devices
External Devices that are used in I/O with computer systems can be roughly grouped into three
classes
Human readable: Suitable for communicating with the computer user. E.g. display,
keyboard, and perhaps other devices such as a mouse.
Machine readable: Suitable for communication with electronic equipment. E.g. disk
and tape drives, sensors, controllers, and actuators.
Communication: Suitable for communication with remote devices. E.g. digital line
drivers and modems.
One way of communicating with devices is through registers associated with each port.
Registers may be one to four bytes in size, and may typically include ( a subset of ) the
following four:
1. The data-in register is read by the host to get input from the device.
2. The data-out register is written by the host to send output.
3. The status register has bits read by the host to ascertain the status of the device,
such as idle, ready for input, busy, error, transaction complete, etc.
4. The control register has bits written by the host to issue commands or to change
settings of the device such as parity checking, word length, or full- versus half-
duplex operation.
Another technique for communicating with devices is memory-mapped I/O.
In this case a certain portion of the processor's address space is mapped to the device,
and communications occur by reading and writing directly to/from those memory
areas.
Memory-mapped I/O is suitable for devices which must move large quantities of data
quickly, such as graphics cards.
Memory-mapped I/O can be used either instead of or more often in combination with
traditional registers. For example, graphics cards still use registers for control
information such as setting the video mode.
A potential problem exists with memory-mapped I/O, if a process is allowed to write
directly to the address space used by a memory-mapped I/O device.
7.2.1 Polling
One simple means of device handshaking involves polling:
1. The host repeatedly checks the busy bit on the device until it becomes clear.
2. The host writes a byte of data into the data-out register, and sets the write bit in
the command register ( in either order. )
3. The host sets the command ready bit in the command register to notify the device
of the pending command.
4. When the device controller sees the command-ready bit set, it first sets the busy
bit.
5. Then the device controller reads the command register, sees the write bit set,
reads the byte of data from the data-out register, and outputs the byte of data.
6. The device controller then clears the error bit in the status register, the command-
ready bit, and finally clears the busy bit, signaling the completion of the
operation.
Polling can be very fast and efficient, if both the device and the controller are fast and if
there is significant data to transfer. It becomes inefficient, however, if the host must wait
a long time in the busy loop waiting for the device, or if frequent checks need to be made
for data that is infrequently there.
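The handshake steps above can be modeled in a few lines. The "registers" here are just dictionary fields, a sketch of the protocol rather than a real device interface:

```python
# Toy model of the polling handshake. Bit names follow the text.
BUSY, COMMAND_READY, WRITE = 0x1, 0x2, 0x4

device = {"status": 0, "command": 0, "data_out": None, "output": []}

def device_step():
    """Device-controller side: honor a pending write command."""
    if device["command"] & COMMAND_READY:
        device["status"] |= BUSY                    # step 4: set busy bit
        if device["command"] & WRITE:               # step 5: see write bit set
            device["output"].append(device["data_out"])
        device["command"] = 0                       # step 6: clear command-ready
        device["status"] &= ~BUSY                   # ...and the busy bit

def host_write(byte):
    """Host side: poll the busy bit, then issue a write command."""
    while device["status"] & BUSY:                  # step 1: busy-wait
        device_step()
    device["data_out"] = byte                       # step 2: load data-out register
    device["command"] = WRITE | COMMAND_READY       # step 3: notify the device
    device_step()                                   # let the controller run

for b in b"ok":
    host_write(b)
print(device["output"])  # [111, 107]
```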
7.2.2 Interrupts
Interrupts allow devices to notify the CPU when they have data to transfer or when an
operation is complete, allowing the CPU to perform other duties when no I/O transfers
need its immediate attention.
The CPU has an interrupt-request line that is sensed after every instruction.
o A device's controller raises an interrupt by asserting a signal on the interrupt
request line.
o The CPU then performs a state save, and transfers control to the interrupt handler
routine at a fixed address in memory. ( The CPU catches the interrupt and
dispatches the interrupt handler. )
o The interrupt handler determines the cause of the interrupt, performs the
necessary processing, performs a state restore, and executes a return from
interrupt instruction to return control to the CPU. ( The interrupt handler clears
the interrupt by servicing the device. )
( Note that the state restored does not need to be the same state as the one
that was saved when the interrupt went off. See below for an example
involving time-slicing. )
Figure 7.3 illustrates the interrupt-driven I/O procedure:
At boot time the system determines which devices are present, and loads the appropriate
handler addresses into the interrupt table.
During operation, devices signal errors or the completion of commands via interrupts.
Exceptions, such as dividing by zero, invalid memory accesses, or attempts to execute
kernel-mode instructions, can be signaled via interrupts.
Time slicing and context switches can also be implemented using the interrupt
mechanism.
o The scheduler sets a hardware timer before transferring control over to a user
process.
o When the timer raises the interrupt request line, the CPU performs a state-save,
and transfers control over to the proper interrupt handler, which in turn runs the
scheduler.
o The scheduler does a state-restore of a different process before resetting the timer
and issuing the return-from-interrupt instruction.
A similar example involves the paging system for virtual memory - A page fault causes
an interrupt, which in turn issues an I/O request and a context switch as described above,
moving the interrupted process into the wait queue and selecting a different process to
run. When the I/O request has completed ( i.e. when the requested page has been loaded
up into physical memory ), then the device interrupts, and the interrupt handler moves the
process from the wait queue into the ready queue, ( or depending on scheduling
algorithms and policies, may go ahead and context switch it back onto the CPU. )
System calls are implemented via software interrupts, a.k.a. traps. When a ( library )
program needs work performed in kernel mode, it sets command information and
possibly data addresses in certain registers, and then raises a software interrupt. ( E.g. 21
hex in DOS. ) The system does a state save and then calls on the proper interrupt handler
to process the request in kernel mode. Software interrupts generally have low priority, as
they are not as urgent as devices with limited buffering space.
Interrupts are also used to control kernel operations, and to schedule activities for optimal
performance. For example, the completion of a disk read operation involves two
interrupts:
o A high-priority interrupt acknowledges the device completion, and issues the next
disk request so that the hardware does not sit idle.
o A lower-priority interrupt transfers the data from the kernel memory space to the
user space, and then transfers the process from the waiting queue to the ready
queue.
The Solaris OS uses a multi-threaded kernel and priority threads to assign different
threads to different interrupt handlers. This allows for the "simultaneous" handling of
multiple interrupts, and the assurance that high-priority interrupts will take precedence
over low-priority ones and over user processes.
For stream-oriented I/O, the single buffering scheme can be used in a line-at-a-time fashion or a
byte-at-a-time fashion. In line-at-a-time fashion, user input and output to the terminal is one line
at a time. E.g. scroll-mode terminals, line printers.
Suppose that T is the time required to input one block and that C is the computation time.
Without buffering, the execution time per block is essentially T+C. With a single buffer, the
time is max[C, T] +M, Where M is the time required to move the data from the system buffer to
user memory.
For block-oriented transfer with double buffering, we can roughly estimate the execution time as
max[C, T]. In both cases (C <= T and C > T) an improvement over single buffering is achieved.
Again, this improvement comes at the cost of increased complexity. For stream-oriented input,
there are two alternative modes of operation. For line-at-a-time I/O, the user process need not be
suspended for input or output, unless the process runs ahead of the double buffers. For
byte-at-a-time operation, the double buffer offers no particular advantage over a single buffer of
twice the length.
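The three timing estimates can be compared directly. The sample figures below are assumed purely for illustration:

```python
def no_buffer_time(T, C):
    return T + C            # input and computation strictly alternate

def single_buffer_time(T, C, M):
    return max(C, T) + M    # next block's input overlaps with computation

def double_buffer_time(T, C):
    return max(C, T)        # rough estimate: the move cost M disappears

# Assumed sample figures (milliseconds), not from the text:
T, C, M = 100, 60, 5
print(no_buffer_time(T, C), single_buffer_time(T, C, M), double_buffer_time(T, C))
# 160 105 100
```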
Seek time
Seek time is the time required to move the disk arm to the required track. The seek time consists
of two key components: the initial startup time, and the time taken to traverse the tracks. The
traversal time is not a linear function of the number of tracks but includes a startup time and a
settling time.
Rotational delay
Magnetic disks, other than floppy disks, have rotational speeds in the range 400 to 10,000 rpm.
Floppy disks typically rotate at between 300 and 600 rpm. Thus the average rotational delay for
a floppy disk will be between 50 and 100 ms.
Transfer Time
The transfer time to or from the disk depends on the rotation speed of the disk in the following
fashion:
T = b / (rN)
where
T = transfer time
b = number of bytes to be transferred
N = number of bytes on a track
r = rotation speed, in revolutions per second
Thus the total average access time can be expressed as Ta = Ts + 1/(2r) + b/(rN),
where Ts is the average seek time.
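The access-time formula is easy to evaluate. The figures below (4 ms average seek, 7200 rpm, a 4 KB transfer on a 256 KB track) are assumed values for illustration only:

```python
def access_time(Ts, r, b, N):
    """Average disk access time: seek + half a revolution + transfer.
    Ts in seconds, r in revolutions per second, b and N in bytes."""
    return Ts + 1 / (2 * r) + b / (r * N)

# Assumed illustrative figures: 7200 rpm gives r = 120 rev/s.
t = access_time(Ts=0.004, r=120, b=4096, N=262144)
print(round(t * 1000, 2), "ms")  # 8.3 ms
```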
These algorithms are not hard to understand, but they can be confusing because they are
so similar. What we are striving for with these algorithms is to keep head movement
(number of tracks traversed) as small as possible: the less the head has to move, the faster the
seek time will be. The sections below show why C-LOOK achieves the least seek time.
Given the following queue -- 95, 180, 34, 119, 11, 123, 62, 64 -- with the read-write head initially
at track 50 and the highest track number being 199, let us now discuss the different algorithms.
1. First Come -First Serve (FCFS)
All incoming requests are placed at the end of the queue. Whatever number that is next in the
queue will be the next number served. Using this algorithm doesn't provide the best results. To
determine the number of head movements you would simply find the number of tracks it took to
move from one request to the next. For this case it went from 50 to 95 to 180 and so on. From 50
to 95 it moved 45 tracks. If you tally up the total number of tracks, you will find how many tracks
it had to go through before finishing the entire request. In this example, it had a total head
movement of 644 tracks. The disadvantage of this algorithm is noted by the oscillation from track
50 to track 180 and then back to track 11 to 123 then to 64. As you will soon see, this is the
worst algorithm one can use.
3. Elevator (SCAN)
This approach works like an elevator does. It scans down towards the nearest end and then when
it hits the bottom it scans up servicing the requests that it didn't get going down. If a request
comes in after it has been scanned it will not be serviced until the process comes back down or
moves back up. This process moved a total of 230 tracks. Once again this is more optimal than
the previous algorithm, but it is not the best.
4. Circular SCAN (C-SCAN)
Circular scanning works just like the elevator to some extent. It begins its scan toward the
nearest end and works it way all the way to the end of the system. Once it hits the bottom or top
it jumps to the other end and moves in the same direction. Keep in mind that the huge jump
doesn't count as a head movement. The total head movement for this algorithm is only 187
tracks, but it is still not the most efficient.
5. C-LOOK
This is just an enhanced version of C-SCAN. In this, the scanning doesn't go past the last
request in the direction it is moving. It too jumps to the other end, but not all the way to
the end: just to the furthest request. C-SCAN had a total movement of 187, but C-LOOK
reduced it to 157 tracks.
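The head-movement totals for this example can be computed with a few one-line formulas. These are tailored to this example's geometry (head at 50, moving toward track 0 first); note that the FCFS distances sum to 644 when tallied carefully:

```python
def fcfs(head, queue):
    """Sum the head movement between consecutive requests in arrival order."""
    total = 0
    for track in queue:
        total += abs(head - track)
        head = track
    return total

def scan(head, queue):
    # Head sweeps down to track 0, then reverses up to the highest
    # pending request, as counted in the text: movement = head + max request.
    upper = [t for t in queue if t > head]
    return head + (max(upper) if upper else 0)

def c_scan(head, queue, max_track=199):
    # Down to 0, free jump to max_track, then continue down to the
    # lowest still-pending request above the start position.
    upper = [t for t in queue if t > head]
    return head + (max_track - min(upper) if upper else 0)

def c_look(head, queue):
    # Down only to the lowest request, free jump to the highest, down again.
    lower = [t for t in queue if t <= head]
    upper = [t for t in queue if t > head]
    return (head - min(lower)) + (max(upper) - min(upper))

q = [95, 180, 34, 119, 11, 123, 62, 64]
print(fcfs(50, q), scan(50, q), c_scan(50, q), c_look(50, q))  # 644 230 187 157
```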
Fixed Blocking
• Fixed-length records are used, and an integral number of records are stored in a block
• Unused space at the end of a block is internal fragmentation
• Common for sequential files with fixed-length records
Ext3
Ext3 stands for third extended file system.
It was introduced in 2001. Developed by Stephen Tweedie.
Starting from Linux Kernel 2.4.15 ext3 was available.
The main benefit of ext3 is that it allows journaling.
Journaling has a dedicated area in the file system, where all the changes are tracked.
When the system crashes, the possibility of file system corruption is less because of
journaling.
Maximum individual file size can be from 16 GB to 2 TB
Overall ext3 file system size can be from 2 TB to 32 TB
There are three types of journaling available in ext3 file system.
Journal – Metadata and content are saved in the journal.
Ordered – Only metadata is saved in the journal. Metadata are journaled only after
writing the content to disk. This is the default.
Writeback – Only metadata is saved in the journal. Metadata might be journaled
either before or after the content is written to the disk.
You can convert an ext2 file system to an ext3 file system directly (without backup/restore).
Ext4
Ext4 stands for fourth extended file system.
It was introduced in 2008.
An early development version was merged in Linux kernel 2.6.19; ext4 was marked stable starting from kernel 2.6.28.
Supports huge individual file size and overall file system size.
Maximum individual file size can be from 16 GB to 16 TB
Overall maximum ext4 file system size is 1 EB (exabyte). 1 EB = 1024 PB (petabyte). 1
PB = 1024 TB (terabyte).
Directory can contain a maximum of 64,000 subdirectories (as opposed to 32,000 in ext3)
You can also mount an existing ext3 fs as ext4 fs (without having to upgrade it).
Several other new features are introduced in ext4: multiblock allocation, delayed
allocation, journal checksumming, fast fsck, etc. These new features improve the
performance and reliability of the file system compared to ext3.
In ext4, you also have the option of turning the journaling feature "off".
ReiserFS
ReiserFS offered features that had not been available in existing Linux file systems:
Metadata-only journaling (also block journaling, since Linux 2.6.8), its most-publicized
advantage over what was the stock Linux file system at the time, ext2.
Online resizing (growth only), with or without an underlying volume manager such
as LVM. Since then, Namesys has also provided tools to resize (both grow and shrink)
ReiserFS file systems offline.
Tail packing, a scheme to reduce internal fragmentation. Tail packing, however, can
have a significant performance impact. Reiser4 may have improved this by packing tails
where it does not hurt performance.
Btrfs
Btrfs is a new copy on write (CoW) filesystem for Linux aimed at implementing
advanced features while focusing on fault tolerance, repair and easy administration.
It addresses concerns regarding huge storage backend volumes, multi-device spanning,
snapshotting and more.
Although its primary target was enterprise usage, it also offers interesting features to
home users such as online grow/shrink (both on file system as well as underlying storage
level), object-level redundancy, transparent compression and cloning.
XFS
The XFS file system is an enterprise-ready, high-performance journaling file system. It
offers very high parallel throughput and is therefore a common choice among
enterprises.
ZFS
The ZFS file system (ZFS on Linux) is a multi-featured file system offering block-level
checksumming, compression, snapshotting, copy-on-write, deduplication, extremely
large volumes, remote replication and more.
It has recently been ported from (Open)Solaris to Linux and is gaining ground.
Inodes In Unix
In a standard Unix file system, files are made up of two different types of objects. Every file has
an index node (inode for short) associated with it that contains the metadata about that file:
permissions, ownerships, timestamps, etc. The contents of the file are stored in a collection of
data blocks. At this point in the discussion, a lot of people just wave their hands and say
something like, "And there are pointers in the inode that link to the data blocks."
As it turns out, there are only fifteen block pointers in the inode. Assuming standard 4K data
blocks, that means that the largest possible file that could be addressed directly would be 60K,
obviously not nearly large enough. In fact, only the first 12 block pointers in the inode are
reserved for direct block pointers. This means you can address files of up to 48K just using the
direct pointers in the inode.
Beyond that, you start getting into indirect blocks:
The thirteenth pointer is the indirect block pointer. Once the file grows beyond 48K, the
file system grabs a data block and starts using it to store additional block pointers, setting
the thirteenth block pointer in the inode to the address of this block. Block pointers are 4-
byte quantities, so the indirect block can store 1024 of them. That means that the total file
size that can be addressed via the indirect block is 4MB (plus the 48K of storage
addressed by the direct blocks in the inode).
Once the file size grows beyond 4MB + 48KB, the file system starts using doubly
indirect blocks. The fourteenth block pointer points to a data block that contains the
addresses of other indirect blocks, which in turn contain the addresses of the actual data
blocks that make up the file's contents. That means we have up to 1024 indirect blocks,
each in turn pointing to up to 1024 data blocks; in other words, up to 1M total 4K blocks, or
up to 4GB of storage.
At this point, you've probably figured out that the fifteenth inode pointer is the triply
indirect block pointer. With three levels of indirect blocks, you can address up to 4TB
(+4GB from the doubly indirect pointer, +4MB from the indirect block pointer, +48K from
the direct block pointers) for a single file.
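The capacities quoted above follow directly from the block and pointer sizes; a quick sketch to check the arithmetic (assuming 4K blocks and 4-byte pointers, as in the text):

```python
BLOCK = 4 * 1024                  # 4K data blocks
PTRS_PER_BLOCK = BLOCK // 4       # 4-byte pointers -> 1024 per indirect block

direct = 12 * BLOCK                       # 12 direct pointers  -> 48K
single = PTRS_PER_BLOCK * BLOCK           # indirect block      -> 4MB
double = PTRS_PER_BLOCK ** 2 * BLOCK      # doubly indirect     -> 4GB
triple = PTRS_PER_BLOCK ** 3 * BLOCK      # triply indirect     -> 4TB

max_file = direct + single + double + triple   # total addressable file size
```

With a different block size the same formulas apply, but both the per-block data and the number of pointers per indirect block change, so the limits scale quickly.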
Android
Android is an open source and Linux-based operating system for mobile devices such as
Smartphone and tablet computers.
Android was developed by the Open Handset Alliance, led by Google, and other
companies.
Android offers a unified approach to application development for mobile devices which
means developers need to only develop for Android, and their applications should be able
to run on different devices powered by Android.
The first beta version of the Android Software Development Kit (SDK) was released by
Google in 2007, whereas the first commercial version, Android 1.0, was released in
September 2008.
On June 27, 2012, at the Google I/O conference, Google announced the next Android
version, 4.1 Jelly Bean. Jelly Bean is an incremental update, with the primary aim of
improving the user interface, both in terms of functionality and performance.
The source code for Android is available under free and open source software licenses.
Google publishes most of the code under the Apache License version 2.0 and the rest,
Linux kernel changes, under the GNU General Public License version 2.
Android operating system is a stack of software components which is roughly divided into five
sections and four main layers as shown below in the architecture diagram.
Linux kernel
At the bottom of the layers is Linux kernel - Linux 2.6 with approximately 115 patches.
This provides basic system functionality like process management, memory management,
device management like camera, keypad, display etc.
Also, the kernel handles all the things that Linux is really good at such as networking and
a vast array of device drivers, which take the pain out of interfacing to peripheral
hardware.
Libraries
On top of Linux kernel there is a set of libraries including open-source Web browser
engine WebKit, well known library libc, SQLite database which is a useful repository for
storage and sharing of application data, libraries to play and record audio and video, SSL
libraries responsible for Internet security etc.
Android Runtime
This is the third section of the architecture and available on the second layer from the
bottom. This section provides a key component called Dalvik Virtual Machine which is
a kind of Java Virtual Machine specially designed and optimized for Android.
The Dalvik VM makes use of Linux core features like memory management and multi-
threading, which is intrinsic in the Java language. The Dalvik VM enables every Android
application to run in its own process, with its own instance of the Dalvik virtual machine.
The Android runtime also provides a set of core libraries which enable Android
application developers to write Android applications using standard Java programming
language.
Application Framework
The Application Framework layer provides many higher-level services to applications in
the form of Java classes. Application developers are allowed to make use of these
services in their applications.
Applications
You will find all the Android applications at the top layer. You will write your application
to be installed on this layer only. Examples of such applications are Contacts, Browser,
Games, etc.
/boot
This is the boot partition of your Android device, as the name suggests. It includes the
Android kernel and the RAM disk. The device will not boot without this partition.
Wiping this partition from recovery should only be done if absolutely required and once
done, the device must NOT be rebooted before installing a new one, which can be done
by installing a ROM that includes a /boot partition.
/system
As the name suggests, this partition contains the entire Android OS, other than the kernel
and the RAM disk.
This includes the Android GUI and all the system applications that come pre-installed on
the device.
Wiping this partition will remove Android from the device without rendering it
unbootable, and you will still be able to put the phone into recovery or bootloader mode
to install a new ROM.
/recovery
This is specially designed for backup.
The recovery partition can be considered as an alternative boot partition, that lets the
device boot into a recovery console for performing advanced recovery and maintenance
operations on it.
/data
Again, as the name suggests, this is the user data partition.
This partition contains the user's data such as your contacts, SMS, settings and all the
Android applications that you have installed.
Performing a factory reset wipes this partition, returning the device to the state it was in
when you used it for the first time, or the way it was after the last official or custom
ROM installation.
/cache
This is the partition where Android stores frequently accessed data and app components.
Wiping the cache doesn't affect your personal data; it simply clears the existing data
there, which gets automatically rebuilt as you continue using the device.
/misc
This partition contains miscellaneous system settings in form of on/off switches.
These settings may include CID (Carrier or Region ID), USB configuration and certain
hardware settings etc.
This is an important partition; if it is corrupt or missing, several of the device's
features will not function normally.
/sdcard
This is not a partition on the internal memory of the device but rather the SD card.
In terms of usage, this is your storage space to use as you see fit, to store your media,
documents, ROMs etc. on it.
Wiping it is perfectly safe as long as you first back up all the data you need from it to
your computer. Note, however, that several user-installed apps save their data and
settings on the SD card, so wiping this partition will make you lose that data.
On devices with both an internal and an external SD card – devices like the Samsung
Galaxy S and several tablets – the /sdcard partition is always used to refer to the internal
SD card.
For the external SD card – if present – an alternative partition is used, which differs from
device to device.
In case of Samsung Galaxy S series devices, it is /sdcard/sd while in many other devices,
it is /sdcard2.
Unlike /sdcard, no system or app data whatsoever is stored automatically on this external
SD card and everything present on it has been added there by the user.
You can safely wipe it after backing up any data from it that you need to save.
/sd-ext
This is not a standard Android partition, but has become popular in the custom ROM
scene.
It is basically an additional partition on your SD card that acts as the /data partition when
used with certain ROMs that have special features called APP2SD+ or data2ext enabled.
It is especially useful on devices with little internal memory allotted to the /data partition.
Thus, users who want to install more programs than the internal memory allows can
make this partition and use it with a custom ROM that supports this feature, to get
additional storage for installing their apps.
Wiping this partition is essentially the same as wiping the /data partition – you lose your
contacts, SMS, market apps and settings.
Now, when you flash a new binary, you know what you stand to lose; make sure to back
up your data before flashing a new binary to your Android device.
Mirroring provides reliability but is expensive; striping improves performance but does not
improve reliability. Accordingly, there are a number of different schemes that combine the
principles of mirroring and striping in different ways, in order to balance reliability versus
performance versus cost. These are described by the different RAID levels, as follows. (In the
diagrams that follow, "C" indicates a copy, and "P" indicates parity, i.e. checksum bits.)
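The parity mentioned above is typically a byte-wise XOR across the data strips of a stripe. This hypothetical sketch (RAID 4/5 style, with made-up strip contents) shows how a parity strip is computed and how a lost strip is rebuilt from it:

```python
from functools import reduce

def xor_parity(strips):
    """Parity strip = byte-wise XOR of all data strips in one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def rebuild(surviving_strips, parity_strip):
    """XOR of the parity strip with the surviving strips recovers the
    lost strip, since x ^ x = 0 cancels every surviving contribution."""
    return xor_parity(surviving_strips + [parity_strip])

# Three data strips and their parity; lose one strip and recover it.
d0, d1, d2 = b"\x0f\x00", b"\xf0\xff", b"\xaa\x55"
p = xor_parity([d0, d1, d2])
recovered = rebuild([d0, d2], p)   # equals the lost strip d1
```

This is why single-parity RAID tolerates exactly one failed disk per stripe: with two strips missing, the XOR equation has two unknowns and cannot be solved.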