Introduction
What is an Operating System?
History of Operating Systems
References
Before we look at what an operating system does, we ought to know what an operating
system is.
If we just build a computer, using its basic physical components, then we end up with a
lot of assembled metal, plastic and silicon. In this state the computer is useless, though it could still go
on display as a piece of modern art! To turn it into one of the most useful tools known to man we
need software. We need applications that allow us to write letters, write software, perform
numerical modeling, calculate cash flow forecasts and so on.
But, if all we have are just the applications, then each programmer has to deal with the
complexities of the hardware. If a program requires data from a disc, the programmer would need
to know how every type of disc worked and then be able to program at a low level in order to
extract the data. In addition, the programmer would have to deal with all the error conditions that
could arise. For example, it is a lot easier for a programmer to say READ NEXT RECORD than
have to worry about: spinning the motor up, moving the read/write heads, waiting for the correct
sector to come around and then reading the data, and eventually de-spinning the motor when it
has not been used for a while (N.B. this is a very simplified view of what really happens).
It was clear, from an early stage in the development of computers, that there needed to be
a “layer of software” that sat between the hardware and the applications, to hide such
complexities from the user, and to hide the ‘breakable’ parts of the computer from human error or stupidity. An
example of stupidity, or human error, might be to send instructions to a motor to spin up and then
de-spin a disc drive several times per second. This would not be a good thing to let users
knowingly, or unknowingly, do, as the motor might burn out.
All this leads to the following “layered” view of the computer.
[Diagram: the layered view of a computer. The hardware consists of (bottom to top) Physical Devices, Microprogramming and Machine Language; the operating system, system software and application programs sit above.]
1) The bottom layer of the hardware consists of the integrated circuits, cathode ray tubes, wires and
everything else that electrical engineers use to build physical devices such as CD-ROM drives, keyboards,
etc.
2) The next layer of hardware, microprogramming, is actually software providing basic operations that
allow communication with the physical devices. This software is normally held in Read Only Memory (ROM)
and hence is also known as firmware, because it cannot easily be changed.
3) The machine language layer defines the instruction set that is available to the computer. Again, this
layer is really software but it is considered as hardware as the machine language will form part of the
package supplied by the hardware manufacturer.
4) The operating system is a layer between this hardware and the software that we use. It allows
programmers to access the hardware in a user friendly way. Furthermore, it provides a layer of abstraction
from the hardware, e.g. we can issue a print command without worrying about how the printer is
physically connected to the computer. In fact, we can even have different operating systems running on the
same hardware (e.g. DOS, Windows and UNIX) so that we may utilize the hardware using an operating
system that suits us.
5) On top of the operating system sits the system software. This is the software that allows us to start doing
practical things with the computer, but does not directly allow us to use the computer for anything that is
useful in the real world (a broad statement which we could argue about, but system software really
only allows us to use the computer more effectively).
It is important to realize that system software is not part of the operating system. However, much
system software is supplied by the computer manufacturer that provided the operating system. But, system
software can be written by programmers, and many large companies have teams of programmers (called
“system programmers”) whose main aim is to make the operating system easier to use for other people
(typically programmers) in the organization.
The main difference between the operating system and system software is that the operating
system runs in kernel (or supervisor) mode, whereas system software and applications run in user mode. This
means that the operating system stops user programs directly accessing the hardware; hence, for safety, a
misbehaving program cannot damage the hardware or interfere with other programs.
6) Finally, at the top level we have the application programs that, at last, allow us to do something really
useful e.g. word processors.
So what exactly is an operating system? There are two common views.
(1) One view considers the operating system as a resource manager. In this view the operating
system is seen as a way of providing the users of the computer with the resources they need at any given
time. Some of these resource requests may not be able to be met (memory, CPU usage etc.) but, as we shall
see later in the course, the operating system is able to deal with scheduling problems such as these.
Other resources have a layer of abstraction placed between them and the physical resource. An example of
this is a printer. If your program wants to write to a printer, in this day and age, it is unlikely that the
program will be directly connected to a physical printer. The operating system will step in and take the
print requests and spool the data to disc. It will then schedule the prints, making the best use of the printer
possible. During all of this it will appear to the user and the program as if their print requests are going to a
physical printer.
(2) Another view of an operating system sees it as a way of not having to deal with the complexity
of the hardware. In (Tanenbaum, 1992) the example is given of a floppy disc controller (using an NEC
PD765 controller chip). This chip has sixteen commands which allow the programmer to read and write
data, move the disc heads, format tracks etc. Just carrying out a simple READ or WRITE command
requires thirteen parameters, which are packed into nine bytes. The operation, when complete, returns 23
status and error fields packed into seven bytes. Now if you think that is complicated, then consider
concerning ourselves with: 1) whether the floppy disc is spinning, 2) what type of recording method we
should use, 3) the fact that hard discs are just as complicated but work differently. It is all these
complexities that the operating system conveniently hides from us simple minded users. So in this view of
the machine, the operating system can be seen as an extended machine or a virtual machine.
You are probably aware that Charles Babbage is credited with designing the first digital computer, which
he called the Analytical Engine. It is unfortunate that he never managed to build the computer as, being of
a mechanical design, the technology of the day could not produce the components to the needed precision.
Of course, Babbage’s machine did not have an operating system, but it would have been incredibly useful all
the same for its era, for generating nautical navigation tables.
These first generation computers filled entire rooms with thousands of vacuum tubes. Like the Analytical
Engine they did not have an operating system; they did not even have programming languages, and
programmers had to physically wire the computer to carry out their intended instructions. The programmers
also had to book time on the computer as a programmer had to have dedicated use of the machine.
As computers were so expensive methods were developed that allowed the computer to be as productive as
possible. One method of doing this (which is still in use today) is the concept of batch jobs. Instead of
submitting one job at a time, many jobs were placed onto a single tape and these were processed one after
another by the computer. The ability to do this can be seen as the first real operating system (although, as
we said above, depending on your view of an operating system, much of the complexity of the hardware
had been abstracted away by this time).
Up until this time, computers were single tasking. The third generation saw the start of multiprogramming.
That is, the computer could give the illusion of running more than one task at a time. Being able to do this
meant that the CPU could be kept busy running one job while another was waiting for I/O.
Another feature of third generation machines was that they implemented spooling. This allowed reading of
punch cards onto disc as soon as they were brought into the computer room. This eliminated the need to
store the jobs on tape, with all the problems this brings.
Similarly, the output from jobs could also be stored to disc, thus allowing programs that produced output to
run at the speed of the disc, and not the printer.
Although third generation machines were far superior to first and second generation machines, they did
have a downside. Up until this point programmers were used to giving their job to an operator
(in the case of second generation machines) and watching it run (often through the computer room door –
which the operator kept closed but allowed the programmers to press their nose up against the glass). The
turnaround of the jobs was fairly fast.
This changed. With the introduction of batch processing the turnaround could be hours if not days. This
problem led to the concept of time sharing. This allowed programmers to access the computer from a
terminal and work in an interactive manner.
Obviously, with the advent of multiprogramming, spooling and time sharing, operating systems had to
become a lot more complex in order to deal with all these issues.
It is still (largely) true today that there are “mainframe” operating systems (such as VME which runs on
ICL mainframes) and “PC” operating systems (such as MS-Windows and UNIX), although the distinctions
are starting to blur. For example, you can run a version of UNIX on ICL’s mainframes and, similarly, ICL
were planning to make a version of VME that could be run on a PC.
Just being able to accept (and understand!) the spoken word and carry out reasoning on that data requires
many things to come together before we have a fifth generation computer. For example, advances need to
be made in AI (Artificial Intelligence) so that the computer can mimic human reasoning. It is also likely
that computers will need to be more powerful. Maybe parallel processing will be required. Maybe a
completely new way of building computers will be needed.
Another View
The view of how computers have developed, with regard to where the generation gaps lie, differs slightly
depending on who you ask; others might prefer a slightly amended version of the model given earlier.
Most commentators agree on what the first generation is, and on the fact that its machines were
developed during the war and used vacuum tubes. Similarly, most people agree
that the transistor heralded the second generation. The third generation came about because of the
development of the IC and operating systems that allowed multiprogramming. But, in the model above, we
stated that the third generation ran from 1965 to 1980. Some people would argue that the fourth generation
actually started in 1971 with the introduction of LSI, then VLSI (Very Large Scale Integration) and then
ULSI (Ultra Large Scale Integration). Really, all we are arguing about is when the PC revolution started.
Was it in the early 70’s when LSI first became available? Or was it in 1981, when the IBM PC was
launched?
Case Study
To show, via an example, how an operating system developed, we give a brief history of ICL’s mainframe
operating systems.
One of ICL’s first operating systems was known as manual exec (short for executive). It ran on its 1900
mainframe computers and provided a level of abstraction between the hardware and also allowed multi-
programming. However, it was very much a manual operating system. The operators had to load and run
each program. Commands such as these were used:
LO#RA15#REP3
GO#RA15 21
The first instruction told the computer to load the program called RA15 from a program library called
REP3. This loaded the program from disc into memory. The “GO 21” instruction told the program to start
running, using entry point 21. This (typically) told the program to read one or more punched cards from the card
reader, which held information to control the program.
The important point is that the computer operator had control over every program in the computer. It had to
be manually loaded into memory, initiated and finally deleted from the memory of the computer (which
was typically 32K). In between, any prompts had to be dealt with. This might mean allowing the computer
to use tape decks, allowing the program to print special stationery or dealing with events that were unusual.
ICL then brought out an operating system they called GEORGE (GEneral ORGanisational Environment).
The first version was called George 1 (G1). G2 and G2+ quickly followed. The idea behind G1/2/2+ was
that it ran on top of the operating system. So it was not an operating system as such (in the same way that
Windows 3.1 is not a true operating system as it is only a GUI that runs on top of DOS).
What G2+ (we’ll ignore the previous versions for now) allowed you to do was submit jobs to the machine
and then G2+ would schedule those jobs and process them accordingly. Some of the features of G2+
included:
• It allowed you to batch many programs into a single job. For example, you could run a program that
extracted data from a masterfile, run a sort and then run a print program to print the results. Under
manual exec you would need to run each program manually. It was not unusual to have a typical job
process twenty or thirty separate programs.
Under G2+, the operators still looked after individual jobs (albeit, they now consisted of several programs).
When ICL released George 3 (G3) and later G4, all this changed. The operators no longer looked after
individual jobs. Instead they looked after the system as a whole. Jobs could now be submitted via
interactive terminals. Whereas the operators used to submit the jobs, this role was now typically carried out
by a dedicated scheduling team who would set up the workload that had to be run over night, and would set
up dependencies between the jobs. In addition, development staff would be able to issue their own batch
jobs and also run jobs in an interactive environment. If there were any problems with any of the jobs, the
output would either go to the development staff or to the technical support staff where the problem would
be resolved and the job resubmitted.
Operators, under this type of operating system, were, in some people’s opinion, little more than “tape
monkeys”, although the amount of technical knowledge held by the operators varied greatly from site to
site.
In addition to being an operating system in its own right, G3 also had the following features:
• To use the machine you had to run the job under a user (account). This is a widely used concept today but was not a
requirement of G2+.
• The Job Control Language (JCL) was much more extensive than that of G2+.
• It allowed interactive sessions.
• It had a concept of filestore. When you created a file you had no idea where it was stored. G3 simply
placed it in filestore. This was a vast amount of disc space used to store files. In fact the filestore was
virtual in that some of it was on tape. Which files were placed on tape was controlled by G3. For
example, you could set the parameters so that files over a certain size, or files that had not been used for
a certain length of time were more likely to be placed onto tape. If your job requested a file that was in
filestore but had been copied to tape the operator would be asked to load that tape. The operator had no
idea what file was being requested or who it was for (although they could find out). G3 simply asked
for a TSN (Tape Serial Number) to be loaded.
• The operators ran the system, rather than individual jobs.
After G3/G4, ICL released their VME (Virtual Machine Environment) operating system. This is still the
operating system used on ICL mainframes today. VME, as its name suggests, creates virtual machines for
jobs to run in. If you log onto (or run a job on) VME, a virtual machine will be created for your session.
In addition, VME is written to cater for the many different workloads that mainframes have to perform. It
supports databases (using ICL’s DBMS – Database Management System – and, more recently, relational
databases such as Ingres and Oracle), TP (Transaction Processing) systems as well as batch and interactive
working. The job control language, which under VME is called SCL (System Control Language), is a lot
more sophisticated and you can often carry out tasks without having to use another language for operations
such as file I/O.
There is still the concept of filestore but, due to the (relatively) low cost of disc space and the problems
associated with having to wait for tapes, all filestore is now on disc. In addition, the amount of filestore
available to users or group of users is under the control of the operating system (and thus the technical
support teams). Like G3, the operators control the entire system and are not normally concerned with
individual jobs. In fact, there is a move towards having lights-out working. This removes the need for
operators entirely; if there are any problems, VME will telephone a pager.
References
• Levy, S. 1994. Hackers.
• Tanenbaum, A. S. 1992. Modern Operating Systems (1st ed.). Prentice Hall.
• Tanenbaum, A. S. 2001. Modern Operating Systems (2nd ed.). Prentice Hall.
Introduction (II)
Handout Introduction
These notes are largely based on (Tanenbaum, 1st Ed. 1992 or 2nd Ed. 2001). Where applicable, the notes
will point you to the relevant part of those books in case you want to read about the subject in a little more
detail. Actually you will find it a fairly smart thing to read chapter 1 of Tanenbaum, and to think about
starting to read chapter 2. Reading about the same subject but using the lectures, these handouts, and the
book will hopefully clear up any misunderstandings and prepare you well for what is to come!
Processes
One of the key tasks for an operating system is to run processes (see Tanenbaum, 2001, p34-36). For now,
we can consider a process as a running program with all the other information that is needed to control its
execution (e.g. program counter, stack, registers, file pointers etc.).
If the CPU is running one process and, for whatever reason, another process now needs to run, the
operating system must save the details of the currently running process and start the new process from
exactly the same point as it was left.
All the data for a process is normally held in a process table. This is a data structure that contains a list of
active processes which the operating system uses to decide which process to run next and to restore a
process to the state it was in before it was stopped from running.
A process may create a child process. For example, if we issue a command from the command shell (one
process), this will create another process to execute the command. This can lead to a tree like structure of
processes. When that command finishes the child process will issue a system call to destroy itself.
One of the main tasks of an operating system is to schedule all the processes which are currently competing
for the CPU.
Processes may also communicate with other processes. This might sound simple but, as we shall see later
in the course, it leads to all sorts of complications which the operating system must handle.
Files
Another broad class of system calls relate to the file system (See Tanenbaum, p38-41). We said above that
one of the tasks of an operating system is to hide the complexities of the hardware. In order to do this
system calls must be provided to (for example) create files, delete files, move files, rename files, copy files,
open files, close files, read files, write files etc. etc.
As an example of this “abstraction”, consider that some operating systems provide you with different types
of file. You may be able to open a file in text mode or in binary mode. The way you open the file
determines how the operating system treats its contents (for example, whether end-of-line characters are
translated).
There is even the concept of one process acting as the input file for another process: the first process writes
as if to an ordinary file, and the second reads that output as its input. This is normally presented to us as a
pipe. For example, in MS-DOS if you type
DIR | SORT
It pipes the output from the DIR command to the SORT command, so that the display comes out sorted (in
fact, it sorts the lines in alphabetical order – and might not be what you expect).
Similar to pipes is redirection. This allows the output from a program to be redirected to a file. For
example
DIR > dir.txt
will redirect the output of the DIR command from the standard output (the screen) to the file called dir.txt.
(If you have never used MS-DOS (or UNIX) you might like to experiment with redirection – also try “>>”
and “<”).
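The behaviour of pipes and redirection can be sketched with Python's subprocess module; here `printf`, `sort` and `ls` stand in for DIR and SORT, and the file and directory names are illustrative:

```python
import os
import subprocess
import tempfile

# DIR | SORT: a pipe connects one process's standard output to the next
# process's standard input.
producer = subprocess.Popen(
    ["printf", "banana\napple\ncherry\n"], stdout=subprocess.PIPE)
sorter = subprocess.Popen(
    ["sort"], stdin=producer.stdout, stdout=subprocess.PIPE, text=True)
producer.stdout.close()              # let sort see end-of-file
sorted_output, _ = sorter.communicate()
print(sorted_output)                 # apple, banana, cherry on separate lines

# DIR > dir.txt: redirection sends standard output to a file instead of
# the screen.
listing = os.path.join(tempfile.mkdtemp(), "dir.txt")
with open(listing, "w") as f:
    subprocess.run(["ls", "/"], stdout=f, check=True)
```

In both cases the commands themselves are unchanged; only where their standard output goes differs, which is exactly the abstraction the operating system provides.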
The whole point of mentioning standard input, standard output, pipes and redirection is to demonstrate that
the operating system is hiding a lot of complexity from us as well as providing us with many features which
we might find useful.
Many file systems also support the concept of directory (or folder) hierarchies. That is, there is a top level
view of the file system and by creating folders you can build a tree (conceptually) which represents your
view of the data. Therefore, the file system must provide system calls to maintain these directory structures.
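A sketch of these directory-maintenance calls, using Python's thin wrappers over the underlying system calls (the directory names are illustrative):

```python
import os
import tempfile

# Creating, renaming, listing and deleting directories through Python's
# thin wrappers over the corresponding system calls.
root = tempfile.mkdtemp()                   # a scratch top-level directory
os.mkdir(os.path.join(root, "letters"))     # create a directory
os.rename(os.path.join(root, "letters"),
          os.path.join(root, "reports"))    # rename it
entries = os.listdir(root)                  # read the directory
print(entries)                              # -> ['reports']
os.rmdir(os.path.join(root, "reports"))     # delete it again
```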
When discussing processes we said that it is possible to build a tree of processes, in the same way we can
build a directory hierarchy. But there are many differences between these trees.
• Process trees are normally short lived. Directory trees can last for years.
• Files (assuming access rights are granted) can be maintained by any user of the system. A process can
only be controlled by its parent.
• Directories (and files) can have access rights associated with them. Processes have no such
information.
• Directories (and files) can be accessed in a number of ways (e.g. relative or absolute pathnames).
Process trees have no such concept.
System Calls
Access to the operating system is done through system calls (See Tanenbaum, 2001, p44-48). Each system
call has a procedure associated with it so that calls to the operating system can be done in a familiar way.
When you call one of these procedures it places the parameters into registers and informs the operating
system that there is work to be done. Notifying the operating system is done via a TRAP instruction
(sometimes known as a kernel call or a supervisor call). This instruction switches the CPU from user
mode to kernel (or supervisor) mode. In user mode, for safety, certain instructions are unavailable to the
programs. In kernel mode all the CPU instructions are available. The operating system now carries out the
work, which includes validating the parameters. Eventually, the operating system will have completed the
work and will return a result, in just the same way as calling a user-written function in a high level
language such as C.
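As a sketch of this, each call below looks like an ordinary function from the program's point of view, but inside the wrapper the parameters are loaded and a TRAP into the kernel is executed (Python's os module is a thin layer over these calls; the file is a scratch file):

```python
import os
import tempfile

# Each call below is an ordinary function call to the program; inside,
# the wrapper places the parameters in registers and executes a TRAP
# into kernel mode, where the real work (and validation) happens.
fd, path = tempfile.mkstemp()       # create a scratch file
os.write(fd, b"hello")              # write(2)
os.close(fd)                        # close(2)
fd = os.open(path, os.O_RDONLY)     # open(2)
data = os.read(fd, 100)             # read(2): the result comes back like
os.close(fd)                        #   any function's return value
os.unlink(path)                     # unlink(2): remove the scratch file
print(data)                         # -> b'hello'
```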
The Shell
The operating system is the mechanism that carries out the system calls requested by the various parts of
the system. Tools such as compilers, editors etc. are not part of the operating system. Similarly, the shell is not part
of the operating system. The shell is the part of (for example) UNIX and MS-DOS where you can type
commands to the operating system and receive a response. You may also hear the shell called the
Command Line Interpreter (CLI) or the “C” prompt. However, it is worth mentioning the shell as it makes
heavy use of operating system features and is a good way to experiment.
We have already seen one example of a command line (DIR > dir.txt). A more complicated
command (in UNIX this time) could be: cat file1 file2 file3 | sort > /dev/lp &
This command concatenates three files and pipes them to the sort program. It then redirects the sorted file
to a line printer. The ampersand “&” at the end of the command instructs UNIX to issue the command as a
background job. This results in the command prompt being returned immediately, whilst another process
carries out the requested work. You can appreciate, by looking at the above command that there will be a
series of system calls to the operating system in order to satisfy the whole request.
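The heart of a shell can be sketched as a parse/fork/exec/wait cycle (a minimal POSIX-only sketch; a real shell also handles pipes, redirection and error reporting):

```python
import os
import shlex

# One pass of the shell's cycle: read a command line, parse it, fork a
# child, exec the command in the child, and (unless it ends in "&")
# wait for the child to finish before returning to the prompt.
def run(line):
    args = shlex.split(line)
    background = args[-1] == "&"
    if background:
        args = args[:-1]             # strip the "&" before exec
    pid = os.fork()                  # system call: create a child
    if pid == 0:
        try:
            os.execvp(args[0], args) # system call: run the command
        finally:
            os._exit(127)            # only reached if exec failed
    if not background:
        os.waitpid(pid, 0)           # system call: wait; skipped for "&"

run("echo hello from the child")
```

The only difference a background job makes is that the shell skips the wait, which is why the prompt comes back immediately.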
Monolithic Systems
One possible approach is to have no structure to the operating system at all i.e. it is simply a collection of
procedures. Each procedure has a well defined interface and any procedure is able to call any other
procedure. The operating system is constructed by compiling all the procedures into one huge monolithic
system. There is no concept of encapsulation, data hiding or structure amongst the procedures. However,
in practice, the way the procedures are written means they naturally fall into a structure, whereby some
procedures are high level procedures which call on other utility procedures (see diagram). The
main procedure is called by the user programs; it calls service procedures which, in turn, call utility
procedures, e.g.
[Diagram: Main Procedure → Service Procedures → Utility Procedures.]
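The calling pattern can be sketched as ordinary procedures in one program; all the names here are illustrative:

```python
# One address space, plain procedure calls: the "structure" is only a
# calling convention between the procedures.
def utility_read_block(device, block):       # utility procedure
    return "data from " + device + " block " + str(block)

def service_read_file(name):                 # service procedure
    return name + ": " + utility_read_block("disc0", 0)

def main_procedure(request, *args):          # main procedure: dispatches
    services = {"read_file": service_read_file}
    return services[request](*args)

result = main_procedure("read_file", "letter.txt")
print(result)                                # -> letter.txt: data from disc0 block 0
```

Note that nothing stops `service_read_file` calling `main_procedure`, or a user program calling `utility_read_block` directly; the layering is a convention, not an enforced structure.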
Layered Systems
A generalisation of the monolithic approach is to organise the operating system as a hierarchy of layers,
each built upon the one below it. The classic example is Dijkstra’s THE system, which had six layers:
Layer 0 was responsible for the multiprogramming aspects of the operating system. It decided which
process was allocated to the CPU. It dealt with interrupts and performed the context switches
when a process change was required.
Layer 1 was concerned with allocating memory to processes.
Layer 2 dealt with inter-process communication and communication between the operating system and
the console.
Layer 3 managed all I/O between the devices attached to the computer. This included buffering
information from the various devices.
Layer 4 was where the user programs were stored.
Layer 5 was the overall control of the system (called the system operator).
As you move up this hierarchy (from 0 to 5) you do not need to worry about the aspects you have “left
behind”. For example, user programs (layer 4) do not have to worry about where they are stored in
memory, or whether they are currently allocated to the processor, as these concerns are handled in the
lower layers (0 and 1).
Virtual Machines
Virtual machines mean different things to different people (Tanenbaum, 2001, p59). For example, if you run
an MS-DOS prompt from within Windows 95/98/NT you are running what Microsoft call a virtual machine.
It is given this name as the MS-DOS program is fooled into thinking that it is running on a machine that it
has sole use of.
ICL’s mainframe operating system is called VME (Virtual Machine Environment). The idea is that when
you log onto the machine a VM (Virtual Machine) is built and it looks as if you have the computer all to
yourself (in an abstract sense – nobody really expects to have an entire mainframe to themselves).
Both of these (Windows 95/98/NT and VME) are fairly recent developments but one of the first operating
systems (VM/370) was able to provide a virtual machine to each user. In addition, each user was able to run
different operating systems if they so desired. This is a major achievement, if you think about it, as
different operating systems will access the hardware in different ways (to name just one problem). The way
the system operated was that the bare hardware was “protected” by VM/370 (called a virtual machine
monitor). This provided access to the hardware when needed by the various processes running on the
computer. In addition, VM/370 created virtual machines when a user required one. But, instead of simply
providing an extension of the hardware that abstracted away the complexities of the hardware, VM/370
provided an exact copy of the bare hardware, which included I/O, interrupts and user/kernel mode. Any
instructions to the hardware were trapped by VM/370, which carried out the instructions on the physical
hardware and returned the results to the calling process. The diagram below shows a model of the VM/370
computer. Note that CMS (Conversational Monitor System) is just one of the many operating systems that
could be run – a single user OS intended for interactive time sharing.
[Diagram: several virtual 370s, each running its own operating system (such as CMS), sit on top of VM/370; hardware instructions TRAP into VM/370, which runs on the bare 370 hardware.]
Client-Server Model
One of the recent advances in computing is the idea of a client/server model (Tanenbaum, 2001, p61). A
server provides services to any client that requests it. This model is heavily used in distributed systems
where a central computer acts as a server to many other computers. The server may be something as simple
as a print server, which handles print requests from clients. Or, it could be relatively complex, and the
server could provide access to a database which only it is allowed to access directly.
Operating systems can be designed along similar lines. Take, for example, the part of the operating system
that deals with file management. This could be written as a server so that any process which requires access
to any part of the filing system asks the file management server to carry out a request, which presents the
calling client with the results. Similarly, there could be servers which deal with memory management,
process scheduling etc. Structuring the operating system in this way has several advantages:
• It can result in a minimal kernel. This results in easier maintenance as not so many processes are
running in kernel mode. All the kernel does is provide the communication between the clients and the
servers.
• As each server is managing one part of the operating system, the procedures can be better structured
and more easily maintained.
• If a server crashes it is less likely to bring the entire machine down as it will not be running in kernel
mode. Only the service that has crashed will be affected.
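The client-server structure can be sketched with message queues standing in for the kernel's message passing (a toy model; real microkernels use rather more involved IPC, and the file name is illustrative):

```python
import queue
import threading

# The "kernel" does nothing but carry messages; the file server is an
# ordinary user-mode process (modelled here as a thread).
requests = queue.Queue()       # client -> server messages
replies = queue.Queue()        # server -> client messages

def file_server():
    while True:
        operation, name = requests.get()      # receive a request
        if operation == "read":
            replies.put("contents of " + name)

threading.Thread(target=file_server, daemon=True).start()

requests.put(("read", "letter.txt"))          # client sends its request
reply = replies.get()                         # ...and blocks for the reply
print(reply)                                  # -> contents of letter.txt
```

If the server thread crashed here, only requests to it would fail; the queues (the "kernel") and the client would survive, which is the point made in the last bullet above.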
Processes
This section is based on (Tanenbaum, 1992,27-29; 2001, p 34-44).
The concept of a process is fundamental to an operating system; a process can be viewed as an abstraction of
a program. To be strict, a program is an algorithm expressed in some suitable notation, and a process is the
activity of executing that algorithm; associated with it are input, output and a state.
Computers nowadays can do many things at the same time. They can be writing to a printer, reading from a
disc and scanning an image. The computer (more strictly the operating system) is also responsible for
running the users’ processes, seemingly at the same time, by switching the CPU rapidly between them.
Therefore, the main point of this part of the course is to consider how an operating system deals with
processes when we allow many to run in pseudoparallelism.
With a single CPU, the computer can only execute a single process at any given moment in time, and it is
important to realise that this is the case. It is also important to realise that one process can have an effect on
another process which is not currently running, as we shall see later.
Process States
This section is based on (Tanenbaum, 1992, p29-31; 2001, p77-79).
A process can be in one of three states:
Running. Only one process can be running at any one time (assuming a single processor machine). A
running process is the process that is actually using the CPU at that time.
Ready. A process that is ready is runnable but cannot get access to the CPU due to another process
using it.
Blocked. A blocked process is unable to run until some external event has taken place. For example, it
may be waiting for data to be retrieved from a disc.
A state transition diagram can be used to represent the various states and the transition between those
states.
[Diagram: state transition diagram between the Running, Ready and Blocked states.]
You can see from this that a running process can either be blocked (i.e. it needs to wait for an external
event) or it can go to a ready state (for example, the scheduler allows another process to use the CPU). A
ready process can only move to a running state whilst a blocked process can only move to a ready state. It
should be apparent that the job of the scheduler is concerned with deciding which one of the processes in a
ready state should be allowed to move to a running state (and thus use the CPU).
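These transition rules can be captured in a few lines of C (a sketch; the enum and function names are our own illustration, not code from the handout):

```c
#include <assert.h>
#include <stdbool.h>

/* The three process states described above. */
typedef enum { RUNNING, READY, BLOCKED } pstate;

/* Returns true for the four legal arcs of the state transition diagram:
   Running->Ready, Running->Blocked, Ready->Running, Blocked->Ready. */
bool legal_transition(pstate from, pstate to) {
    switch (from) {
    case RUNNING: return to == READY || to == BLOCKED;
    case READY:   return to == RUNNING;   /* only the scheduler moves it */
    case BLOCKED: return to == READY;     /* the awaited event occurred */
    }
    return false;
}
```

Note that a blocked process can never move straight to running: the awaited event makes it ready, and only the scheduler then dispatches it.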
Process Management
To manage processes, the operating system keeps one entry (a process control block) per process, holding
fields such as:
• Registers
• Program Counter
• Program Status Word
• Stack Pointer
• Process State
• Time when process started
• CPU time used
• Time of next alarm
• Process id
You will notice that, as well as information to ensure the process can start again (e.g. Program Counter),
the process control block also holds accounting information such as the time the process started and how
much CPU time has been used. You should note that this is only a sample of the information held. There
will be other information, not least of all concerned with the files being used and the memory the process is
using.
Race Conditions
This section is based on (Tanenbaum, 1992, p33-34; 2001, p100-101)
It is sometimes necessary for two processes to communicate with one another. This can either be done via
shared memory or via a file on disc. It does not really matter. We are not discussing the situation where a
process can write some data to a file that is read by another process at a later time e.g. days, weeks or
months. We are talking about two processes that need to communicate at the time they are running. Take,
as an example, one type of process (i.e. there could be more than one process of this type running) that
checks a counter when it starts running. If the counter is at a certain value, say x, then the process
terminates, as only x copies of the process are allowed to run at any one time. This is how it goes wrong:
checking the counter and incrementing it are separate steps, so two starting processes can both read the
counter before either writes back the incremented value, and one update is lost. We can then reach the
situation where we have five processes running but the counter is only set to four. This
problem is known as a race condition.
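The lost update can be replayed deterministically in C. The helper names below are invented for illustration; the "preemption" is simulated simply by the order of the calls:

```c
#include <assert.h>

int counter = 4;                 /* shared: copies believed to be running */

int  read_counter(void)   { return counter; }  /* step 1 of an increment */
void write_counter(int v) { counter = v; }     /* step 2: write back     */

/* Replay the unlucky interleaving: both processes read the old value
   before either writes back, so one of the two increments is lost. */
int lost_update_demo(void) {
    int a = read_counter();      /* process A reads 4, then is preempted */
    int b = read_counter();      /* process B also reads 4 */
    write_counter(b + 1);        /* B writes 5 and starts running */
    write_counter(a + 1);        /* A resumes and also writes 5 */
    return counter;              /* 5, although two processes started */
}
```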
Critical Sections
This section is based on (Tanenbaum, 1992, p34-35; 2001, p102-103)
One way to avoid race conditions is not to allow two processes to be in their critical sections at the same
time (by critical section we mean the part of the process that accesses a shared variable). That is, we need a
mechanism of mutual exclusion: some way of ensuring that one process, whilst using the shared
variable, does not allow another process to access that variable. In fact, to provide a good solution to the
problem of race conditions we need four conditions to hold:
• No two processes may be simultaneously inside their critical sections.
• No assumptions may be made about the speed or the number of CPUs.
• No process running outside its critical section may block other processes.
• No process should have to wait forever to enter its critical section.
As we shall see in the next lecture, it is difficult to devise a method that meets all these conditions.
References
• Courtois P. J., Heymans F. and Parnas D. L. 1971. Concurrent Control with Readers and Writers.
Communications of the ACM, Vol. 14, No. 10, pp. 667-668
• Dijkstra E. W. 1965. Co-operating Sequential Processes. Programming Languages, Genuys, F. (ed),
London : Academic Press
• Peterson G., L. 1981. Myths about the Mutual Exclusion Problem. Information Processing Letters, Vol
12, No. 3
• Silberschatz A. et al. 2003/1994. Operating System Concepts. Addison-Wesley Publishing Company
• Tanenbaum, A., S. 2001/1992. Modern Operating Systems. Prentice Hall.
OPS Processes
Implementing Mutual Exclusion with Busy Waiting
This section is based on Tanenbaum (1992, p35-39; 2001, p103-108).
The first attempt has each process busy-wait on a shared integer variable, turn. Assume the variable turn is
initially set to zero. Process 0 is allowed to run: it finds that turn is zero and is allowed to enter its
critical region. If process 1 tries to run, it will also find that turn is zero and will have to wait (the while
statement) until turn becomes equal to 1. When process 0 exits its critical region it sets turn to 1, which
allows process 1 to enter its critical region. If process 0 tries to enter its critical region again it will be
blocked as turn is no longer zero. However, there is one major flaw in this approach. Consider this
sequence of events....
• Process 0 runs, enters its critical section and exits; setting turn to 1. Process 0 is now in its non-
critical section. Assume this non-critical procedure takes a long time.
• Process 1, which is a much faster process, now runs and once it has left its critical section turn is
set to zero.
• Process 1 executes its non-critical section very quickly and returns to the top of the procedure.
• The situation is now that process 0 is in its non-critical section and process 1 is waiting for turn to
be set to one. In fact, there is no reason why process 1 cannot enter its critical region, as process 0
is not in its critical region.
What we can see here is a violation of one of the required conditions: a process that is not
in its critical section is blocking another process. If you work through a few iterations of this solution
you will see that the processes must enter their critical sections in turn; thus this solution is called strict
alternation.
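The fragments described above follow the classic strict alternation pattern; a minimal C sketch of the shared turn logic (the function names are our own — the real fragments simply spin with while (turn != p) ;):

```c
#include <assert.h>
#include <stdbool.h>

int turn = 0;                       /* shared: whose turn it is to enter */

/* Process p may enter its critical region only when turn == p. */
bool may_enter(int p) { return turn == p; }

/* On leaving the critical region, hand the turn to the other process. */
void leave_region(int p) { turn = 1 - p; }
```

Note that after leave_region(0), process 0 cannot re-enter until process 1 has taken (and left) its turn — exactly the flaw described above.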
F:\Lectures\Handouts\2003\lecture03-Feb03.rtf
Tony Cook - 03/02/2003 - Page 1 of 5
Operating Systems
A software solution to the mutual exclusion problem was first published in (Dijkstra,
1965); in fact the original idea came from a Dutch mathematician, T. Dekker. This was the first time
the mutual exclusion problem had been solved by software alone. Later, (Peterson, 1981) came up
with a much simpler solution.
The solution consists of two procedures, shown here in a C style syntax - the “//” of course marking the
start of comments
A process that is about to enter its critical region has to call enter_region; when it leaves the critical
region it must call leave_region. Initially, neither process is in its
critical region and the array interested has both its elements (one per process) set to false.
Assume that process 0 calls enter_region. The variable other is set to one (the other process number)
and it indicates its interest by setting the relevant element of interested. Next it sets the turn variable,
before coming across the while loop. In this instance, the process will be allowed to enter its critical
region, as process 1 is not interested in running.
Now process 1 could call enter_region. It will be forced to wait as the other process (0) is still
interested. Process 1 will only be allowed to continue when interested[0] is set to false which can only
come about from process 0 calling leave_region.
If we ever arrive at the situation where both processes call enter_region at almost the same time, both
processes will set the turn variable, but the first assignment is immediately overwritten and only the last
one counts. Assume that process 0 sets turn to zero and then process 1 immediately sets it to 1. Under
these conditions process 0 will be allowed to enter its critical region and process 1 will be forced to wait.
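The two procedures referred to above follow Peterson's standard formulation; a C sketch, with an extra helper must_wait() (our own addition) exposing the busy-wait condition so the walkthrough can be traced step by step:

```c
#include <assert.h>
#include <stdbool.h>

#define FALSE 0
#define TRUE  1
#define N     2                  /* number of processes */

int turn;                        /* whose turn is it? */
int interested[N];               /* all values initially FALSE */

/* True while `process` must keep spinning at the while loop below. */
bool must_wait(int process) {
    int other = 1 - process;
    return interested[other] == TRUE && turn == process;
}

void enter_region(int process)   /* process is 0 or 1 */
{
    interested[process] = TRUE;  /* show that you are interested */
    turn = process;              /* set flag */
    while (must_wait(process))
        ;                        /* busy wait */
}

void leave_region(int process)
{
    interested[process] = FALSE; /* indicate departure from critical region */
}
```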
This busy-waiting solution relies on the TSL (test and set lock) instruction, which reads the lock flag into
a register and stores a non-zero value in the flag as a single, indivisible operation:

enter_region:
    tsl register, flag    ; copy flag to register and set flag to 1
    cmp register, #0      ; was flag zero?
    jnz enter_region      ; if flag was non-zero, lock was set, so loop
    ret                   ; return (and enter critical region)

leave_region:
    mov flag, #0          ; store zero in flag
    ret                   ; return
Assume, again, two processes. Process 0 calls enter_region. The tsl instruction copies the flag to a
register and sets the flag to a non-zero value. The register is then compared with zero (cmp - compare)
and, if found to be non-zero (jnz – jump if non-zero), the routine loops back to the top. Only when
process 1 has set the flag to zero (or under initial conditions), by calling leave_region, will process 0 be
allowed to continue.
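C11 provides the same indivisible test-and-set via atomic_flag; a sketch mirroring the assembly routine, with try_enter_region as our own extra helper so the lock state can be inspected without spinning:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag flag = ATOMIC_FLAG_INIT;         /* the lock flag, initially 0 */

/* One tsl+cmp step: atomically set the flag and report whether it was 0. */
bool try_enter_region(void) {
    return !atomic_flag_test_and_set(&flag); /* true => lock acquired */
}

/* The spinning version, exactly mirroring the assembly loop. */
void enter_region(void) {
    while (atomic_flag_test_and_set(&flag))
        ;                                    /* jnz enter_region */
}

void leave_region(void) {
    atomic_flag_clear(&flag);                /* mov flag, #0 */
}
```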
Comments
Of all the solutions we have looked at, both Peterson’s and the TSL solutions solve the mutual
exclusion problem. However, both of these solutions have the problem of busy waiting. That is, if the
process is not allowed to enter its critical section it sits in a tight loop waiting for a condition to be met.
This is obviously wasteful in terms of CPU usage, and it can also have less obvious disadvantages.
Suppose we have two processes, one of high priority, h, and one of low priority, l. The scheduler is set
so that whenever h is in ready state it must be run. If l is in its critical section when h becomes ready to
run, l will be placed in a ready state so that h can be run. However, if h tries to enter its critical section
then it will be blocked by l, which will never be given the opportunity of running and leaving its
critical section. Meantime, h will simply sit in a loop forever. This is sometimes called the priority
inversion problem.
In this section, instead of a process busy waiting, we will look at procedures that send the
process to sleep. In reality, the process is placed in a blocked state; the important point is that it is not using the
CPU by sitting in a tight loop. To implement a sleep and wakeup system we need access to two system
calls (SLEEP and WAKEUP). These can be implemented in a number of ways. One method is for
SLEEP to simply block the calling process and for WAKEUP to have one parameter; that is the process
it has to wakeup. An alternative is for both calls to have one parameter, this being a memory address
which is used to match the SLEEP and WAKEUP calls.
• Once the producer has added an item to the buffer, and incremented count, it checks to see if count
= 1 (i.e. the buffer was empty before). If it is, it wakes up the consumer.
• Once the consumer has removed an item from the buffer, it decrements count. Now it checks count
to see if it equals n-1 (i.e. the buffer was full). If it does it wakes up the producer.
void producer(void) {
    int item;

    while (TRUE) {
        produce_item(&item);                 // generate next item
        if (count == BUFFER_SIZE) sleep();   // if buffer full, sleep
        enter_item(item);                    // put item in buffer
        count = count + 1;                   // increment count
        if (count == 1) wakeup(consumer);    // was buffer empty?
    }
}
void consumer(void) {
    int item;

    while (TRUE) {
        if (count == 0) sleep();                        // if buffer is empty, sleep
        remove_item(&item);                             // remove item from buffer
        count = count - 1;                              // decrement count
        if (count == BUFFER_SIZE - 1) wakeup(producer); // was buffer full?
        consume_item(&item);                            // print item
    }
}
This seems logically correct but we have the problem of race conditions with count. The following
situation could arise....
• The buffer is empty and the consumer has just read count to see if it is equal to zero.
• At this very instant the scheduler stops running the consumer and starts running the producer,
before it can be put to sleep.
• The producer places an item in the buffer and increments count.
• The producer checks to see if count is equal to one. Finding that it is, it assumes that it was
previously zero which implies that the consumer is sleeping – so it sends a wakeup.
• In fact, the consumer is not asleep so the call to wakeup is lost.
• The consumer now runs – continuing from where it left off – and checks the value of count. Finding
that it is zero, it goes to sleep. As its wakeup call has already been sent (and lost), the consumer
will sleep forever.
• Eventually the buffer will become full and the producer will send itself to sleep.
• Both producer and consumer will sleep forever.
One solution is to have a wakeup waiting bit that is turned on when a wakeup is sent to a process that
is already awake. If a process goes to sleep, it first checks the wakeup bit. If set the bit will be turned
off, but the process will not go to sleep. Whilst seeming a workable solution, it suffers from the
drawback that you need an ever-increasing number of wakeup bits to cater for larger numbers of processes.
The Producer-Consumer Problem (Tanenbaum 2001, p109)
Assume there is a producer (which produces goods) and a consumer (which consumes goods). The
producer places the goods in a fixed size buffer and the consumer takes the goods from
the buffer. The buffer has a finite capacity, so that if it is full the producer must stop producing.
Similarly, if the buffer is empty, the consumer must stop consuming. This problem is also referred to as
the bounded buffer problem. The type of situations we must cater for are when the buffer is full, so the
producer cannot place new items into it. Another potential problem is when the buffer is empty, so the
consumer cannot take from the buffer.
The problem is to program the producer and the consumer without getting into race conditions.
References
• Courtois P. J., Heymans F. and Parnas D. L. 1971. Concurrent Control with Readers and Writers.
Communications of the ACM, Vol. 14, No. 10, pp. 667-668
• Dijkstra E. W. 1965. Co-operating Sequential Processes. Programming Languages, Genuys, F.
(ed), London :Academic Press
• Peterson G., L. 1981. Myths about the Mutual Exclusion Problem. Information Processing Letters,
Vol 12, No. 3
• Silberschatz A. and Galvin P. 1994. Operating System Concepts (4th Ed). Addison-Wesley Publishing
Company
• Tanenbaum, A., S. 1992. Modern Operating Systems. Prentice Hall.
Introduction .....................................................................................................................................................1
The Semaphore and the Mutex........................................................................................................................2
Process Scheduling..........................................................................................................................................3
References .......................................................................................................................................................6
Introduction
Of all the solutions we have looked at previously, both Peterson's and the TSL solutions solve the
mutual exclusion problem. However, both of these solutions have the problem of busy waiting. That is,
if a process is not allowed to enter its critical section it sits in a tight loop waiting for a condition to be
met. This is obviously wasteful in terms of CPU usage. Also, in the case of two processes where one (h)
is of high priority and the other (l) of low priority, we can have a priority inversion problem. This
happens when the scheduler is set so that whenever h is in ready state it must be run. If l is in its critical
section when h becomes ready to run, l will be placed in a ready state so that h can be run. However, if
h tries to enter its critical section then it will be blocked by l, which will never be given the opportunity
of running and leaving its critical section. Meantime, h will simply sit in a loop forever. We also
discussed sleep and wakeup, where instead of busy waiting a process can be sent to
sleep. In reality this means a blocked state, but with the difference that the sleeping process is not
tying up the CPU in a wait loop. This was implemented using two system calls, SLEEP and WAKEUP:
one method is for SLEEP to block the calling process and for WAKEUP to take one parameter,
that is, the process it has to wake up. An alternative is for both calls to take one parameter, this being a
memory address which is used to match the SLEEP and WAKEUP calls. SLEEP/WAKEUP was
described in the producer-consumer problem, where the solution involved maintaining a variable, count,
that keeps track of the number of items in the buffer. The producer checks count against n
(the maximum number of items in the buffer). If count = n then the producer sends itself to sleep; otherwise it adds
the item to the buffer and increments count. Similarly, when the consumer retrieves an item from the buffer,
it first checks whether count is zero. If it is, it sends itself to sleep; otherwise it removes an item from the buffer
and decrements count. The calls to WAKEUP occur under the following conditions.
• Once the producer has added an item to the buffer, and incremented count, it checks to see if count
= 1 (i.e. the buffer was empty before). If it is, it wakes up the consumer.
• Once the consumer has removed an item from the buffer, it decrements count. Now it checks count
to see if it equals n-1 (i.e. the buffer was full). If it does it wakes up the producer.
void producer(void) {
    int item;

    while (TRUE) {
        produce_item(&item);                 // generate next item
        if (count == BUFFER_SIZE) sleep();   // if buffer full, sleep
        enter_item(item);                    // put item in buffer
        count = count + 1;                   // increment count
        if (count == 1) wakeup(consumer);    // was buffer empty?
    }
}
F:\Lecture04\lecture04-Feb07.rtf
Tony Cook- 07/02/2003 - Page 1 of 6
void consumer(void) {
    int item;

    while (TRUE) {
        if (count == 0) sleep();                        // if buffer empty, sleep
        remove_item(&item);                             // remove item from buffer
        count = count - 1;                              // decrement count
        if (count == BUFFER_SIZE - 1) wakeup(producer); // was buffer full?
        consume_item(&item);                            // print item
    }
}
This seems logically correct but we have the problem of race conditions with count. The following
situation could arise.
• The buffer is empty and the consumer has just read count to see if it is equal to zero.
• All of a sudden the scheduler stops running the consumer and starts running the producer.
• The producer places an item in the buffer and increments count.
• The producer checks to see if count is equal to one. Finding that it is, it assumes that it was
previously zero which implies that the consumer is sleeping – so it sends a wakeup.
• In fact, the consumer is not asleep so the call to wakeup is lost.
• The consumer eventually gets switched back in and now runs – continuing from where it left off –
it checks the value of count. Finding that it is zero it goes to sleep. As the wakeup call has already
been issued the consumer will sleep forever.
• Eventually the buffer will become full and the producer will send itself to sleep.
• Both producer and consumer will sleep forever.
One solution is to have a wakeup waiting bit that is turned on when a wakeup is sent to a process that
is already awake. If a process goes to sleep, it first checks the wakeup bit. If set the bit will be turned
off, but the process will not go to sleep. Whilst seeming a workable solution, it suffers from the
drawback that you need an ever-increasing number of wakeup bits to cope with a larger number of processes.
The Semaphore and the Mutex (Tanenbaum 2001, p110-114; Bic and Shaw 2003, p59-
64; Silberschatz 2003, p201-207)
In (Dijkstra, 1965) the suggestion was made that an integer variable be used that recorded how many
wakeups had been saved. Dijkstra called this variable a semaphore. If it was equal to zero it indicated
that no wakeup’s were saved. A positive value shows that one or more wakeup’s are pending. Now the
sleep operation (which Dijkstra called DOWN) checks the semaphore to see if it is greater than zero. If
it is, it decrements the value (using up a stored wakeup) and continues. If the semaphore is zero the
process sleeps. The wakeup operation (which Dijkstra called UP) increments the value of the
semaphore. If one or more processes were sleeping on that semaphore then one of the processes is
chosen and allowed to complete its DOWN. Checking and updating the semaphore must be done as an
atomic action to avoid race conditions. Here is an example of a series of Downs and Ups. We are
assuming we have a semaphore called mutex (for mutual exclusion). It is initially set to 1. The
subscript figure, in this example, represents the process, p, that is issuing the Down.
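The DOWN/UP semantics can be sketched in C. In this simplified version, down() returns whether the caller may continue (0 meaning the caller would have to sleep), and the atomicity of the check-and-update — which a real implementation must guarantee — is simply assumed:

```c
#include <assert.h>

typedef struct { int value; } semaphore;   /* value = stored wakeups */

/* DOWN (Dijkstra's P): consume a stored wakeup, or report "must sleep". */
int down(semaphore *s) {
    if (s->value > 0) {
        s->value--;            /* use up a stored wakeup and continue */
        return 1;
    }
    return 0;                  /* value is zero: the caller must sleep */
}

/* UP (Dijkstra's V): store a wakeup; a real UP would also rouse one
   of the processes sleeping on the semaphore, if any. */
void up(semaphore *s) {
    s->value++;
}
```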
From this example, you can see that we can use semaphores to ensure that only one process is in its
critical section at any one time, i.e. the principle of mutual exclusion. We can also use semaphores to
synchronise processes. For example, the produce and consume functions in the producer-consumer
problem. Take a look at this program fragment.
semaphore mutex = 1;
semaphore empty = BUFFER_SIZE;
semaphore full = 0;
void producer(void) {
    int item;

    while (TRUE) {
        produce_item(&item);  // generate next item
        down(&empty);         // decrement empty count
        down(&mutex);         // enter critical region
        enter_item(item);     // put item in buffer
        up(&mutex);           // leave critical region
        up(&full);            // increment count of full slots
    }
}

void consumer(void) {
    int item;

    while (TRUE) {
        down(&full);          // decrement full count
        down(&mutex);         // enter critical region
        remove_item(&item);   // remove item from buffer
        up(&mutex);           // leave critical region
        up(&empty);           // increment count of empty slots
        consume_item(&item);  // print item
    }
}
The mutex semaphore (given the above example) should be self-explanatory. The empty and full
semaphores provide a method of synchronising adding items to, and removing items from, the buffer. Each
time an item is removed from the buffer a down is done on full. This decrements the semaphore and,
should it reach zero, the consumer will sleep until the producer adds another item. The consumer also
does an up on empty. This is so that, should the producer try to add an item to a full buffer, it will sleep
(via the down on empty) until the consumer has removed an item.
Process Scheduling
Scheduling Objectives (Tanenbaum, 1992 p61-70; 2001 p132-140)
If we assume we only have one processor and there are two or more processes in a ready state, how do
we decide which process to schedule next? Or more precisely, which scheduling algorithm does the
scheduler use in deciding which process should be moved to a running state? These questions are the
subject of this section. In trying to schedule processes, the scheduler typically tries to meet five
objectives: fairness (each process gets its fair share of the CPU), efficiency (keeping the CPU busy),
response time (minimised for interactive users), turnaround time (minimised for batch users) and
throughput (maximising the number of jobs processed).
Of course, the scheduler cannot meet all of these objectives to an optimum level. For example, in trying
to give interactive users good response times, the batch jobs may have to suffer. Many large
companies that use mainframes address these types of problem by taking many of the scheduling
decisions themselves. For example, a company the last year’s lecturer (Dr Graham Kendall) used to
work for did not allow batch work during the day. Instead they gave the TP (Transaction Processing)
system all the available resources so that the response times for the users (many of which were dealing
with the public in an interactive way) was as fast as possible. The batch work was run overnight when
the interactive workload was much less, typically only operations staff and technical support personnel.
However, these types of problem are likely to increase as the world becomes “smaller.” If a company
operates a mainframe that is accessible from all over the world then the concept of night and day no
longer holds and there may be a requirement for TP access 24 hours a day and the batch work somehow
has to be fitted in around this workload.
Preemptive Scheduling
A simple scheduling algorithm would allow the currently active process to run until it has completed.
This would have several advantages:
1. We would no longer have to concern ourselves with race conditions as we could be sure that one
process could not interrupt another and update a shared variable.
2. Scheduling the next process to run would simply be a case of taking the highest priority job (or
using some other algorithm, such as FIFO algorithm).
Note: we could define “completed” as the point at which a process decides to give up the CPU. The
process may not have finished its work, but it would only give up control when safe (e.g. not during the
update of a shared variable).
Therefore, it is usual for the scheduler to have the ability to decide which process can use the CPU and,
once it has had its slice of time, to place it into a ready state and allow the next process to run.
This type of scheduling is called preemptive scheduling. The disadvantage of this method is that we
need to cater for race conditions as well as having the responsibility of scheduling the processes.
First Come First Served
The simplest scheduling algorithm is First Come First Served (FCFS): processes are run in the order in
which they arrive, and each runs until it is completed. The problem with FCFS is that the average waiting
time can be long. Consider three processes, P1, P2 and P3, arriving in that order with CPU burst times of
27ms, 9ms and 2ms respectively (figures inferred from the waiting times that follow).
P1 will start immediately, with a waiting time of 0 milliseconds (ms). P2 will have to wait 27ms. P3
will have to wait 36ms before starting. This gives us an average waiting time of 21ms (i.e. (0 + 27 +
36) /3 ).
Now consider if the processes had arrived in the order P2, P3, P1. The average waiting time would now
be 6.6ms (i.e. (0 + 9 + 11) / 3). This is obviously a big saving, due simply to the order in which the jobs
arrived. It can be shown that FCFS is not generally minimal with regard to average waiting time, and
this figure varies depending on the process burst times. The FCFS algorithm can also have other
undesirable effects. A CPU bound job may make I/O bound jobs (once they have finished their I/O) wait
for the processor. At this point the I/O devices are sitting idle. When the CPU bound job finally does
some I/O, the mainly I/O processes use the CPU quickly and now the CPU sits idle waiting for the
mainly CPU bound job to complete its I/O. Although this is a simplistic example, you can appreciate
that FCFS can lead to I/O devices and the CPU both being idle for long periods.
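The waiting-time arithmetic above can be checked with a small C helper (the burst times 27ms, 9ms and 2ms are inferred from the quoted waits):

```c
#include <assert.h>

/* Average waiting time under FCFS for bursts served in the given order.
   Each process waits for the sum of the bursts that run before it. */
double fcfs_avg_wait(const int burst[], int n) {
    int wait = 0, total = 0;
    for (int i = 0; i < n; i++) {
        total += wait;          /* waiting time of the i-th process */
        wait  += burst[i];      /* later processes wait this much longer */
    }
    return (double)total / n;
}
```

With the order P1, P2, P3 this gives 21ms; with P2, P3, P1 it gives about 6.7ms, as in the text.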
Shortest Job First
An alternative is Shortest Job First (SJF): when the CPU is free, the ready process with the shortest burst
time is run next. Take, as an example, four processes. If we schedule them in the order they arrive then
the average wait time is 19.5ms (78/4). If we run the processes using the burst time as a priority then the
wait times will be 0, 4, 11 and 23; giving an
average wait time of 9.50. In fact, the SJF algorithm is provably optimal with regard to the average
waiting time. And, intuitively, this is the case as shorter jobs add less to the average time, thus giving a
shorter average. The problem is we do not know the burst time of a process before it starts. For some
systems (notably batch systems) we can make fairly accurate estimates but for interactive processes it
is not so easy. One approach is to try and estimate the length of the next CPU burst, based on the
process's previous activity. To do this we can use the following formula:
Tn+1 = a*tn + (1 - a)*Tn
where
a is a weighting factor, 0 <= a <= 1
Tn stores the past history (the previous estimate)
tn contains the most recent information (the length of the last burst)
What this formula allows us to do is weight both the history of the burst times and the most recent burst
time. The weight is controlled by a. If a = 0 then Tn+1 = Tn and recent history (the most recent burst
time) has no effect. If a = 1 then the history has no effect and the guess is equal to the most recent burst
time.
A value of 0.5 for a is often used so that equal weight is given to recent and past history.
This formula has been reproduced on a spreadsheet (which I hope to make available on the WWW site
associated with this course) so that you can experiment with the various values.
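The estimator is a one-line function in C:

```c
#include <assert.h>

/* Tn+1 = a*tn + (1 - a)*Tn : weight the most recent burst tn against
   the accumulated history Tn, with 0 <= a <= 1. */
double next_estimate(double a, double tn, double Tn) {
    return a * tn + (1.0 - a) * Tn;
}
```

With a = 0 the estimate never changes; with a = 1 only the last burst counts; a = 0.5 averages the two.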
Priority Scheduling
Shortest Job First is just a special case of priority scheduling, and of course we can use a number of
different measures as the priority. One example is to set priorities based on the resources a process has
previously used, as follows.
Assume each process is allowed 100ms before the scheduler preempts it. If a process only used, say, 2ms,
then it is likely to be a job that is I/O bound and it is in our interest to allow this job to run as soon as it
has completed I/O – in the hope that it will go away and do some more I/O; thus making effective use
of the processor as well as the I/O devices. If a job used all its 100ms we might want to give this job a
lower priority, in the belief that we can get smaller jobs completed first before we allow the longer jobs
to run.
One method of calculating priorities based on this reasoning is to use the formula 1 / (n / p), where n is
the CPU time the process used in its last quantum and p is the length of the quantum: the 2ms job above
would receive priority 1 / (2/100) = 50, whilst a job using its full quantum would receive priority 1.
Another way of assigning priorities is to set them externally. During the day interactive jobs may be
given a high priority and batch jobs are only allowed to run when there are no interactive jobs. Another
alternative is to allow users who pay more for their computer time to be given higher priority for their
jobs.
One of the problems with priority scheduling is that some processes may never run. There may always
be higher priority jobs that get assigned the CPU. This is known as indefinite blocking or starvation.
One solution to this problem is called aging. This means that the priority of a job is gradually
increased until eventually even the lowest priority job becomes the highest priority job in the system. This
could be done, for example, by increasing the priority of a job after it has been in the system for a
certain length of time.
References
• Bic L.F. and Shaw A.C. 2003. Operating Systems Principles (1st Ed). Prentice Hall
• Dijkstra E. W. 1965. Co-operating Sequential Processes. Programming Languages, Genuys, F.
(ed), London :Academic Press
• Silberschatz A. et al. 2003. Operating System Concepts (6th Ed). Addison-Wesley Publishing
Company
• Tanenbaum, A., S. 2001. Modern Operating Systems. Prentice Hall.
G53OPS Peter Siepmann (pxs02u)
Dr. Cook 13/5/2005
G53OPS – Revision Notes: Lecture 5
adapted from the notes by Tony Cook
There are two typical classes of process in a system: interactive jobs, which tend to be shorter, and batch
jobs, which tend to be longer. We can set up different queues to cater for the different process types. Each
queue may have its own scheduling algorithm – the background queue will typically use the FCFS
algorithm while the interactive queue may use the RR algorithm. The scheduler has to decide which
queue to run. Either higher priority queues can be processed until they are empty before the lower
priority queues are executed or each queue can be given a certain amount of the CPU. There could
be other queues in addition to the two mentioned, such as a high priority system queue.
Multilevel Queue Scheduling assigns a process to a queue and it remains in that queue. It may be
advantageous to move processes between queues (multilevel feedback queue scheduling). If we
consider processes with different CPU burst characteristics, a process which uses too much of the
CPU will be moved to a lower priority queue. We would leave I/O bound and (fast) interactive
processes in the higher priority queues.
Example
Assume three queues (Q0, Q1 and Q2)
Scheduler executes Q0 and only considers Q1 and Q2 when Q0 is empty
A Q1 process is preempted if a Q0 process arrives
New jobs are placed in Q0
Q0 runs with a quantum of 8ms
If a process is preempted it is placed at the end of the Q1 queue
Q1 has a time quantum of 16ms associated with it
Any processes preempted in Q1 are moved to Q2, which is FCFS
Observations:
Any jobs that require less than 8ms of the CPU are serviced very quickly
Any processes that require between 8ms and 24ms are also serviced fairly quickly
Any jobs that need more than 24ms are executed with any spare CPU capacity once Q0 and Q1
processes have been serviced
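The observations can be verified with a small C function splitting a job's total CPU demand across the three queues (a sketch of this particular example only, ignoring competing jobs):

```c
#include <assert.h>

/* CPU time (ms) a job with the given total burst receives in each queue:
   up to 8ms in Q0, up to a further 16ms in Q1, and the remainder in Q2. */
void service_split(int burst, int out[3]) {
    out[0] = burst < 8 ? burst : 8;
    burst -= out[0];
    out[1] = burst < 16 ? burst : 16;
    burst -= out[1];
    out[2] = burst;            /* anything beyond 24ms is served FCFS in Q2 */
}
```

A 5ms job is handled entirely in Q0; a 20ms job needs 8ms in Q0 plus 12ms in Q1; a 30ms job spills 6ms into Q2.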
Up to now, we have assumed that the processes are all available in memory so that the context
switching is fast. However, if the computer is low on memory then some processes may be swapped
out to disk. Context switching takes longer in this case so it is sensible to schedule only those
processes in memory. This is the responsibility of a top level scheduler.
A second scheduler is invoked periodically to remove processes from memory to disk and vice versa.
Parameters to decide which processes to move include
How long has it been since a process has been swapped in or out?
How much CPU time has the process recently had?
How big is the process (on the basis that small ones do not get in the way)?
What is the priority of the process?
Linux caters for real time scheduling and non real time scheduling with three queue scheduling
systems:
FIFO for real time threads
Round Robin for real time threads
Other for non real time threads
A FIFO thread will not be interrupted by any other FIFO-queued thread except when i) a higher
priority FIFO thread becomes ready, ii) the current FIFO thread blocks, e.g. waiting for I/O, or iii) the
current FIFO thread voluntarily yields the CPU. If the executing FIFO thread is interrupted it is put in a
queue associated with its priority. If a FIFO thread becomes ready and has a higher priority than the
currently executing FIFO thread, the executing thread is kicked out in favour of the higher priority FIFO
thread. If more than one candidate of the same priority is waiting to kick out a lower priority thread,
the one that has waited the longest is chosen.
The Round Robin system is similar to FIFO except that a quantum is involved now. When a thread
gets kicked out of the CPU due to using up its quantum, it is put to the back of the queue and
another real time process of greater than or equal to the former’s priority is selected for execution.
To cope with the increasing number of processes and processors, the 2.6 kernel developed the O(1)
scheduler for non-real time processes. It is based on the premise that the time to select and assign
a process to the CPU is a constant, i.e. it is independent of processes or CPUs. The kernel
maintains two scheduling data structures for each CPU and separate queues for each process
priority level. All non-real-time tasks are assigned a priority from 100 to 139, with 120 being the default.
Typical quantum values are 10 to 200ms. I/O-bound tasks are given higher priorities, and higher-priority
tasks are given larger quantum values.
UNIX SVR4 scheduling involves a preemptive static priority scheduler with 160 priority levels divided
into three classes:
Real time processes given highest preference (priority 159-100)
Kernel mode processes given medium preference (99-60)
User mode (time shared) processes given the lowest preference (59-0)
With real time processes, preemption points can be used. These are safe points between processing
steps, at which the kernel data structures are consistent (fully updated or safely locked with a
semaphore) and so the kernel may be interrupted. In UNIX SVR4, a dispatch queue is available
for each priority level. Processes within a priority level are executed using round robin scheduling, with
real time processes given very high priorities. In the time-share queues process priority is
variable: it is lowered each time the quantum is used up and raised if the process blocks on an event or
resource. Typical quanta range from 100ms for the priority 0 queue to 10ms for the priority 59
queue.
Windows is designed for a single-user/interactive environment or as a server. There are two bands of
priorities: real time and variable. There are sixteen real time priority levels (31-16),
which stay fixed. Round robin scheduling is used within the real time band, and these threads
have priority over all variable-band processes.
There are also sixteen variable priority levels (15-0). Processes may move up or down in priority,
but never exceed 15; FIFO is used within a level. A thread's base priority can be up to two levels above
or below the process priority, but threads have their priorities adjusted dynamically: lowered if the
quantum is used up (processor bound threads) and raised if interrupted by an I/O event (I/O bound
threads). Interactive threads therefore tend to have higher priorities.
On an n processor system the n-1 highest priority threads are always executed – one per processor.
The lower priority threads are run on the remaining single processor.
Operating Systems
D:\lecture06-Feb14.rtf
Tony Cook- 17/02/2003 - Page 1 of 7
Example: waiting times (ms) accumulated by five processes
P1 : 0 + 23 = 23
P2 : 8 + 16 + 6 = 30
P3 : 16
P4 : 18
P5 : 23 + 9 = 32
Therefore, the average waiting time is (23 + 30 + 16 + 18 + 32) / 5 = 23.8 milliseconds.
Threads
This section is based on (Tanenbaum, 1992, p507-523; 2001, p81-100). So far we have only considered
processes. In this section we take a brief look at threads, which are sometimes called lightweight
processes. One definition of a process is that it has an address space and a single thread of execution, or,
as Tanenbaum (2001) says of the process model, it is “based on two independent concepts:
resource grouping and execution”. Sometimes it would be beneficial if two (or more) processes could
share the same address space (i.e. the same global variables, open files etc.) and run parts of the process in
parallel. This is what threads do.
Firstly, let us consider why we might need to use threads. Assume we have a server application
running. Its purpose is to accept messages and then act upon those messages. Consider the situation
where the server receives a message and, in processing that message, it has to issue an I/O request.
Whilst waiting for the I/O request to be satisfied it goes to a blocked state. If new messages are
received, whilst the process is blocked, they cannot be processed until the process has finished
processing the last request.
One way we could achieve our objective is to have two processes running. One process deals with
incoming messages and another process deals with the requests that are raised. However, this approach
gives us two problems
1. We still have the problem that either of the processes could become blocked (although there are
ways around this by using child processes)
2. The two processes will have to update shared variables. This is far easier if they share the same
address space.
The answer to these types of problem is to use threads. Threads are like mini-processes that operate
within a single process. Each thread has its own program counter and stack so that it knows where it is.
Apart from this they can be considered the same as processes, with the exception that they share the
same address space. This means that all threads from the same process have access to the same global
variables and the same files.
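A minimal Python sketch of this sharing: two threads inside one process update the same global variable, something two separate processes could not do without explicitly shared memory. The lock is needed precisely because the address space is shared.

```python
import threading

counter = 0                  # a global: every thread in the process sees it
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:           # protect the shared variable from interleaved updates
            counter += 1

# Two threads, one process, one shared address space.
threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)               # 20000
```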
This table shows the items that belong to the process as a whole, compared to the items that are private to each thread.
Items private to each thread:
• Stack
• Register Set
• Child Threads
• State
Items shared by all threads in a process:
• Global Variables
• Open Files
• Child Processes
• Timers
• Signals
• Semaphores
• Accounting Information
If you have a multi-processor machine, then different threads from the same process can run in parallel.
Thread states are like process states in that a thread can be: running, blocked, ready or terminated.
Each thread in a process has its own stack. This contains one frame for each procedure called (but not
yet returned from); a frame contains the procedure's local variables and return address.
2) if a thread causes a page fault (which we will learn about later) then the kernel (which is unaware of
threads in a user-space thread implementation) will block the whole process until the I/O is complete,
so other potentially runnable threads will be unable to run!
3) a rogue thread may block the other threads in a process from running, i.e. there are no clock
interrupts to force round robin turn-taking between threads within a process
4) for CPU-bound processes there is not much advantage in using threads
Kernel implementation
1) the kernel has a table of all threads in the system; there are no thread tables inside processes now
2) the thread table holds a subset of the information in the kernel's process table
3) to create or destroy a thread a kernel call (not a library call) is made
4) when a thread blocks, the kernel can choose to run a thread from the same process or a thread
from another process
5) it is costly to create and destroy threads in the kernel, therefore some systems recycle threads,
i.e. re-use the data structures
6) kernels are better at coping with page faults, i.e. they just switch in any other runnable thread
7) the disadvantage of implementing threads with kernel system calls is that creating and destroying
a thread is a substantial exercise, so a lot of thread creation and destruction becomes expensive
What is a deadlock?
Tanenbaum (2001, p163) describes a deadlock as follows: “a set of processes is deadlocked if each
process in the set is waiting for an event that only another process in the set can cause”.
Typically deadlocks occur across a network or with interfaced and shared machines and devices. They
can occur in hardware, in software, or both. The typical sequence of events to acquire a device /
software / data is:
1) Request the resource (if it is not available the process must wait: the solution is either blocking or an error code being issued)
2) Use the resource
3) Release the resource
1) Ignore the problem (the “ostrich algorithm”): “computers do this from time to time; solution: just
reboot”. Or wait and see: if a system table is full when a fork fails, wait a random time interval and try again
2) Deadlock detection and recovery: wait for a deadlock to occur, detect it, then do something about it
3) Deadlock avoidance: make a graph of resources and specify safe and unsafe regions of the graph, i.e.
the Banker's algorithms for single and multiple resources
4) Deadlock prevention: undo one of Coffman's conditions, i.e. 1) Mutual exclusion, 2) Hold and wait,
3) No preemption, 4) Circular wait
In a resource allocation graph, circles are processes and squares are resources: here, initially, A is
holding resource R while at the same time requesting resource S. Deadlock can be detected by searching
such a graph for cycles.
1) Recovery through “Preemption”: take a resource away temporarily from its owner and give it to
another process, perhaps through manual intervention
2) Recovery through “Rollback”: checkpoint processes periodically to record states, memory images
etc.; on detection of deadlock, just roll back, restart and allocate resources differently. The
disadvantage of this is that we lose everything that has happened since the checkpoint unless it can be
recomputed
3) Recovery by Killing Processes: this is crude but effective, i.e. kill a process in the cycle, or even
kill a process not in the cycle so as to free up its resources
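Detection itself, mentioned in strategy 2 above, amounts to searching the resource allocation graph for a cycle. A minimal sketch using depth-first search (the graph representation and node names are our own, invented for illustration):

```python
def find_cycle(graph):
    """graph maps each node (process or resource) to the nodes its arcs point at.
    Returns a cycle as a list of nodes (start repeated at the end), or None."""
    def dfs(node, path, on_path):
        if node in on_path:                   # back to a node on the current path
            return path[path.index(node):]
        on_path.add(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path + [nxt], on_path)
            if cycle:
                return cycle
        on_path.discard(node)
        return None
    for start in graph:
        cycle = dfs(start, [start], set())
        if cycle:
            return cycle
    return None

# A holds R and requests S; B holds S and requests R: a deadlock cycle.
g = {"A": ["S"], "S": ["B"], "B": ["R"], "R": ["A"]}
print(find_cycle(g))  # ['A', 'S', 'B', 'R', 'A']
```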
References
• Tanenbaum, A. S. 1992/2001. Modern Operating Systems. Prentice Hall.
Deadlocks (II)
Deadlock Avoidance - Resource Trajectories
Consider two processes (A, B) and two resources (printer and plotter). The horizontal axis is the
sequence of instructions that could be executed by process A; the vertical axis is the sequence of
instructions that could be executed by process B. For example:
1) process A needs the printer from I1 to I3, and the plotter from I2 to I4
2) process B needs the plotter from I5 to I7, and the printer from I6 to I8
So if the system enters the box bounded by I1,I3 and I6,I8 it will eventually deadlock at the double
hatched section, when the plotter is also requested. The system must therefore pre-emptively decide,
at point t, how to allocate the resources.
D:\lecture07-Feb17.rtf
Tony Cook- 17/02/2003 - Page 1 of 7
Safe States
Take 3 processes A, B, C - we can consider how many resources they actually have, and the maximum
that they may need. The maximum number of resources available is 10 in this example. Now
consider...
Unsafe States
Unsafe States are not strictly deadlocks, but are to be avoided! Take 3 processes A, B, C - we can
consider how many resources they actually have, and the maximum that they may need. The maximum
number of resources available in this example is 10...
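The safe/unsafe distinction can be checked mechanically: a state is safe if some ordering lets every process run up to its declared maximum and finish. A sketch of the single-resource Banker's check (the allocation figures are invented for illustration, in the same spirit as the A, B, C example):

```python
def is_safe(alloc, maxneed, total):
    """Single-resource Banker's check: True if some completion order exists
    even when every process demands up to its declared maximum."""
    free = total - sum(alloc.values())
    done = set()
    while len(done) < len(alloc):
        # any process whose remaining need fits the free pool can surely finish
        runnable = [p for p in alloc
                    if p not in done and maxneed[p] - alloc[p] <= free]
        if not runnable:
            return False        # no process is guaranteed to finish: unsafe
        p = runnable[0]
        free += alloc[p]        # p runs to completion, releasing its holdings
        done.add(p)
    return True

maxneed = {"A": 9, "B": 4, "C": 7}
print(is_safe({"A": 3, "B": 2, "C": 2}, maxneed, total=10))  # True: B, C, then A
print(is_safe({"A": 4, "B": 2, "C": 2}, maxneed, total=10))  # False: unsafe
```

In the second call, giving A one more unit leaves only B guaranteed to finish; after B releases its holdings neither A nor C can be satisfied, so the state is unsafe even though no deadlock has yet occurred.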
In this handout we consider some ways in which these functions are achieved.
Monoprogramming
If we only allow a single process in memory at a time we can make life simple for ourselves; that is,
the system does not permit multi-programming. Using this model we do not have to worry about
swapping processes out to disc when we run out of memory. Nor do we have to worry about keeping
processes separate in memory. All we have to do is load a process into memory, execute it and then
unload it before loading the next process.
But, even if a monoprogramming model did not have these memory problems, we would still be faced
with other problems. In this day and age monoprogramming is unacceptable, as multi-programming is not
only expected by the users of a computer but also allows us to make more effective use of the CPU.
For example, we can allow a process to use the CPU whilst another process carries out I/O or we can
allow two people to run interactive jobs and both receive reasonable response times.
We could allow only a single process in memory at one instance in time and still allow multi-
programming. This means that a process, when in a running state, is loaded in memory. When a context
switch occurs the process is copied from memory to disc and then another process is loaded into
memory. This method allows us to have a relatively simple memory module in the operating system
but still allows multi-programming.
The drawback with the method is the amount of time a context switch takes. If we assume that a
quantum is 100ms and a context switch takes 200ms then the CPU spends a disproportionate amount of
time switching processes. We could increase the amount of time for a quantum but then the interactive
users will receive poor response times as processes will have to wait longer to run.
Modelling Multiprogramming
We assume (and have stated) that multiprogramming can improve the utilisation of the CPU, and
intuitively, this is the case. If we have five processes that use the processor twenty percent of the time
(spending eighty percent doing I/O) then we should be able to achieve one hundred percent CPU
utilisation. Of course, in reality, this will not happen as there may be times when all five processes are
waiting for I/O. However, it seems reasonable that we will achieve better than twenty percent
utilisation that we would achieve with monoprogramming. But, can we model this?
We can build a model from a probabilistic viewpoint. Assume that a process spends a fraction p of its
time waiting for I/O. With n processes in memory the probability that all n processes are waiting for
I/O simultaneously (meaning the CPU is idle) is p^n. The CPU utilisation is then given by:
CPU utilisation = 1 - p^n
The following graph shows this formula being used (the spreadsheet that produced this graph is
available from the web site for this course).
[Graph: CPU utilisation against degree of multiprogramming (0 to 10 processes), one curve per I/O wait time.]
You can see that with an I/O wait time of 20%, almost 100% CPU utilisation can be achieved with four
processes. If the I/O wait time is 90% then with ten processes, we only achieve just above 60%
utilisation.
The important point is that, as we introduce more processes the CPU utilisation rises.
The model is a little contrived, as it assumes that the n processes are independent, i.e. that several
processes could be running at the same time, which on a single processor machine is obviously not possible.
More complex models could be built using queuing theory but we can still use this simplistic model to
make approximate predictions.
Assume a computer with one megabyte of memory. The operating system takes up 200K, leaving room
for four 200K processes. If we have an I/O wait time of 80% then we will achieve just under 60% CPU
utilisation. If we add another megabyte, it allows us to run another five processes (nine in all). We can
now achieve about 86% CPU utilisation. You might now consider adding another megabyte of
memory, allowing fourteen processes to run. If we extend the above graph, we will find that the CPU
utilisation will increase to about 96%. Adding the second megabyte allowed us to go from 59% to 86%.
The third megabyte only took us from 86% to 96% (14 processes + 1 O.S.). It is a commercial decision
if the expense of the third megabyte is worth it.
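The percentages in the memory-sizing argument above come straight from the formula. A quick check (the function name is our own):

```python
def utilisation(p, n):
    """CPU utilisation with n processes, each waiting for I/O a fraction p of the time."""
    return 1 - p ** n

# The memory upgrade argument, with an 80% I/O wait (p = 0.8):
for n in (4, 9, 14):
    print(f"{n} processes: {utilisation(0.8, n) * 100:.1f}% utilisation")
# 4 processes: 59.0%, 9 processes: 86.6%, 14 processes: 95.6%
```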
The diagram above shows how this scheme might work. The memory is divided into four partitions
(we’ll ignore the operating system). When a job arrives it is placed in the input queue for the smallest
partition that will accommodate it. There are a few drawbacks to this scheme.
1. As the partition sizes are fixed, any space not used by a particular job is lost.
2. It may not be easy to state how big a partition a particular job needs.
3. It is possible that a job placed in a queue may be prevented from running by other jobs waiting
for (and using) that partition.
To cater for the last problem we could have a single input queue where all jobs are held. When a
partition becomes free we search the queue looking for the first job that fits into the partition. An
alternative search strategy is to search the entire input queue looking for the largest job that fits into the
partition. This has the advantage that we do not waste a large partition on a small job but has the
disadvantage that smaller jobs are discriminated against. Smaller jobs are typically interactive jobs
which we normally want to service first. To ensure small jobs do get run we could have at least one
small partition or ensure that small jobs only get skipped a certain number of times. Using fixed
partitions is easy to understand and implement, although there are a number of drawbacks which we
have outlined above.
In order to cater for relocation we could make the loader modify all the relevant addresses as the binary
file is loaded. The OS/360 worked in this way but the scheme suffers from the following problems
• The program cannot be moved after it has been loaded without going through the same process again.
• Using this scheme does not help the protection problem as the program can still generate illegal
addresses (maybe by using absolute addressing).
• The program needs to have some sort of map that tells the loader which addresses need to be
modified.
A solution, which solves both the relocation and protection problem is to equip the machine with two
registers called the base and limit registers. The base register stores the start address of the partition and
the limit register holds the length of the partition. Any address that is generated by the program has the
base register added to it. In addition, all addresses are checked to ensure they are within the range of
the partition. An additional benefit of this scheme is that if a program is moved within memory, only its
base register needs to be amended. This is obviously a lot quicker than having to modify every address
reference within the program. The IBM PC uses a scheme similar to this, although it does not have a
limit register.
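The base/limit mechanism is simple enough to sketch directly (the register values are invented; a real machine does this in hardware on every memory reference):

```python
def translate(virtual_addr, base, limit):
    """Relocate a program-generated address and check it stays inside the partition."""
    if not 0 <= virtual_addr < limit:
        raise MemoryError(f"address {virtual_addr} outside partition of size {limit}")
    return base + virtual_addr

# A partition loaded at physical address 30000, 12000 bytes long:
print(translate(4096, base=30000, limit=12000))  # 34096
try:
    translate(15000, base=30000, limit=12000)    # outside the partition
except MemoryError as err:
    print(err)
```

Moving the partition means changing only `base`; the program's own addresses are untouched.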
Swapping
This section is based on (Tanenbaum, 1992, p81-88; 2001,p196-199). Using fixed partitions is a simple
method but it becomes ineffective when we have more processes than we can fit into memory at one
time. For example, in a timesharing situation where many people want to access the computer, more
processes will need to be run than can be fitted into memory at the same time. The answer is to hold
some of the processes on disc and swap processes between disc and main memory as necessary. In this
section we look at how we can manage this swapping procedure.
Variable partitions make for a much more effective memory management system, but they make the
process of maintaining memory much more difficult. For example, as memory is allocated and
deallocated, holes will appear in the memory: it will become fragmented. Eventually, there will be holes
that are too small for any process to be allocated to them. We could simply shuffle all the memory being
used downwards (called memory compaction), thus closing up all the holes. But this could take a long
time and, for this reason, it is not usually done; compacting 256MB might take around 3 seconds, for example.
Another problem arises if processes are allowed to grow in size once they are running. That is, they are
allowed to dynamically request more memory (e.g. via the new operator in C++). What happens if a
process requests extra memory such that increasing its partition size is impossible without
overwriting another partition's memory? We obviously cannot do that, so do we wait until memory
becomes available so that the process is able to grow into it, do we terminate the process, or do we move
the process to a hole in memory that is large enough to accommodate it? The only realistic option is the
last one, although it is obviously wasteful to have to copy a process from one part of memory to another,
perhaps by swapping it out first.
None of the solutions are ideal so it would seem a good idea to allocate more memory than is initially
required. This means that a process has somewhere to grow before it runs out of memory (see left hand
figure below). Most processes have two growing data segments: data created on the stack and data
created on the heap. Instead of having the two data segments grow upwards in memory, a neat
arrangement has one growing downwards and the other growing upwards. This means that neither
area is restricted to just its own space: if a process allocates more memory on the heap then it is able to
use space that might otherwise have been reserved for the stack (see the right hand figure below).
On Friday we are going to look at three ways in which the operating system can keep track of memory
usage, that is, which memory is free and which memory is being used.
References
• Tanenbaum, A. S. 1992/2001. Modern Operating Systems. Prentice Hall.
F:\lecture08-Feb21.rtf
Tony Cook- 21/02/2003 - Page 1 of 5
The main decision with this scheme is the size of the allocation unit. The smaller the allocation unit, the
larger the bit map has to be. But, if we choose a larger allocation unit, we could waste memory as we
may not use all the space allocated in each allocation unit. The other problem with a bit map memory
scheme is when we need to allocate memory to a process. Assume the allocation size is 4 bytes. If a
process requests 256 bytes of memory, we must search the bit map for 64 consecutive zeroes. This is a
slow operation and for this reason bit maps are not often used.
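The slow scan the paragraph describes looks like this (a sketch; the bit map contents are invented, with 1 marking an allocated unit and 0 a free one):

```python
def find_run(bitmap, need):
    """Index of the first run of `need` consecutive free (0) units, or -1.
    The linear scan over the whole map is exactly why allocation is slow."""
    run_start = run_len = 0
    for i, bit in enumerate(bitmap):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == need:
                return run_start
        else:
            run_len = 0
    return -1

# With 4-byte allocation units, a 256-byte request means finding 64 zeroes in a row.
bitmap = [1] * 10 + [0] * 64 + [1] * 6
print(find_run(bitmap, 64))  # 10
```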
Memory Usage with Linked Lists
Free and allocated memory can be represented as a linked list. The memory shown above as a bit map
can be represented as a linked list as follows.
P 0 1 | H 1 3 | P 4 3 | H 7 1 | P 8 1
(each entry gives its type, P for process or H for hole, its start address, and its length in allocation units)
In the list above, processes follow holes and vice versa (with the exception of the start and the end of
the list). Also see Fig 4-7 on p199 of Tanenbaum (2001). But, it does not have to be this way. It is
possible that two processes can be next to each other and we need to keep them as separate elements in
the list so that if one process ends we only return the memory for that process. Consecutive holes, on
the other hand, can always be merged into a single list entry. This leads to the following observations
when a process terminates and returns its memory.
A terminating process can have four combinations of neighbours (we’ll ignore the start and the end of
the list to simplify the discussion). If X is the terminating process then the four combinations are...
• In the first option we simply have to replace the P by an H, other than that the list remains the
same.
• In the second option we merge two list entries into one and make the list one entry shorter.
• Option three is effectively the same as option 2.
• For the last option we merge three entries into one and the list becomes two entries shorter.
In order to implement this scheme it is normally better to have a doubly linked list so that we have
access to the previous entry. When we need to allocate memory, storing the list in segment address
order allows us to implement various strategies.
First Fit : This algorithm searches along the list looking for the first segment that is large
enough to accommodate the process. The segment is then split into a hole and a
process. This method is fast as the first available hole that is large enough to
accommodate the process is used.
Best Fit : Best fit searches the entire list and uses the smallest hole that is large enough to
accommodate the process. The idea is that it is better not to split up a larger hole that
might be needed later. Best fit is slower than first fit as it must search the entire list
every time. It has also been shown that best fit performs worse than first fit as it tends
to leave lots of small gaps.
Worst Fit : As best fit leaves many small, useless holes it might be a good idea to always use the
largest hole available. The idea is that splitting a large hole into two will leave a large
enough hole to be useful. It has been shown that this algorithm is not very good
either.
These three algorithms can all be speeded up if we maintain two lists; one for processes and one for
holes. This allows the allocation of memory to a process to be speeded up as we only have to search the
hole list. The downside is that list maintenance is complicated. If we allocate a hole to a process we
have to move the list entry from one list to another. However, maintaining two lists allows us to
introduce another optimisation. If we hold the hole list in size order (rather than segment address order)
we can make the best fit algorithm stop as soon as it finds a hole that is large enough. In fact, first fit
and best fit effectively become the same algorithm.
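The three strategies differ only in which candidate hole they pick from the list. A sketch over a hole list kept in address order (hole positions and sizes invented):

```python
def pick_hole(holes, size, strategy):
    """holes is a list of (start, length) pairs in address order.
    Returns the chosen hole, or None if nothing fits."""
    candidates = [h for h in holes if h[1] >= size]
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]                        # first hole big enough
    if strategy == "best":
        return min(candidates, key=lambda h: h[1])  # smallest adequate hole
    if strategy == "worst":
        return max(candidates, key=lambda h: h[1])  # largest hole available
    raise ValueError(f"unknown strategy: {strategy}")

holes = [(1, 5), (7, 2), (12, 6)]
print(pick_hole(holes, 2, "first"))  # (1, 5)
print(pick_hole(holes, 2, "best"))   # (7, 2)
print(pick_hole(holes, 2, "worst"))  # (12, 6)
```

With a separate hole list held in size order, as described above, best fit can simply stop at the first adequate hole instead of scanning everything.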
The Quick Fit algorithm takes a different approach to those we have considered so far. Separate lists
are maintained for some of the common memory sizes that are requested. For example, we could have
a list for holes of 4K, a list for holes of size 8K etc. One list can be kept for large holes or holes which
do not fit into any of the other lists. Quick fit allows a hole of the right size to be found very quickly,
but it suffers in that there is even more list maintenance.
Quick Intro to Disk I/O to aid with the Assessed Course Work
Fig 1-8 on p25 in Tanenbaum (2001) shows a typical layout of a computer’s hard disk. This is divided
up into several disks (often writable on both surfaces) and almost (but not quite) touching each surface
is a read/write disk head e.g. a small magnetic coil - electric currents are generated in this by the
magnetic bits on the disk as they sweep past underneath. The disk heads can move in and out on an arm
controlled by a digital stepping motor. Now imagine that each disk is divided up into tracks (concentric
circles), and each track has N sectors of say 512 bytes (see Fig 5-25 on p 316 of Tanenbaum (2001)).
This is a rough description of how a computer's hard drive is organised, i.e. disks, tracks and sectors.
You will also hear the term “cylinder” a lot. This refers to the set of tracks at the same radius on every
disk, which the heads can read simultaneously: a kind of parallelism. It is also common, because the
circumference of a circle changes with radius (2.Pi.R), for the number of sectors per track to vary from
the inside of the disk to the outside. This avoids the magnetic storage density per sector having to
increase as we move towards the centre of the disk. Anyway, the good news for our course work is that
we have a very simplified disk system with NO cylinders, just one disk and one surface, and always
the same number of sectors per track.
A company is planning to build a small, non-standard single surface disk drive storage device
with characteristics outlined below. You must build a simplified software model of the disk drive
to demonstrate example disk file writes/reads, bit error checking, and disk arm scheduling. You
may write your program in C, C++ or Java. Your code must contain comments and be well laid
out. The code must also be designed so that the three experiments detailed below can be
performed.
OK, so this is non-standard, and we shall discuss the differences between the course work example and
real-world disks later. However, to help you with the course work you should be reading
Tanenbaum (2001) p315-327, as we will not be covering I/O for a few weeks. The software model
could be taken as, say, an array of memory where each byte represents one byte on the proposed disk
drive. The three languages have been picked because you should know at least one of them. Comments
within software are very useful for people reading your code, and for when you come back to it after a
while. Marks will be lost for unintelligible, poorly commented code.
Experiment 1
Write code to create 50 imaginary files, of lengths given in the table below, on an imaginary disk
drive. Assume 512 bytes per sector, 180 sectors per track, 80 tracks, a disk head seek time
between adjacent tracks of 1 ms, and a rotational speed of 1200 revolutions per minute (or 20 per
sec). Note buffering is not allowed. Data from each sector can be read and completely transferred
in 0.1 microseconds. Also assume that the disk head starts initially
at track zero, sector zero, and that each sector is divided up as follows:
F:\lecture08-Feb21.rtf
Tony Cook- 21/02/2003 - Page 4 of 5
Operating Systems
Bytes 496-511 ECC (16 byte) check for subsequent bit errors (see experiment 2) – you may devise
your own error check scheme or implement one from a concept in a book (but do not copy code).
You may use all (or some) of the 16 bytes available for this task.
Calculate the time needed to write each file in sequence – assume first in first out and no cylinder
or track skew between tracks. Assume that the disk head starts on track 0, sector 0. For the
contents of the file (bytes 14-496) just enter the numerical value of the file name.
OK so for the 50 imaginary files, use the table in the course work to look up the file name, and then its length, to reserve the correct number of bytes in the array that you are using to represent the model of the disk drive. You can fill the bytes for these imaginary files with whatever you like e.g. the file's numerical name for each byte.
We are adopting 512 bytes per sector; however, as you can see from Fig 5-24 on p315 of Tanenbaum (2001), not all of a disk sector is allocated to data! So if you write or read data, and it is > 483 bytes, then it will end up strewn across several adjacent sectors. If it is really big, then it may be strewn across adjacent tracks as well. Once again, the 483 is specific to this design and can vary between disk drives - as can the other allocated bytes above.
Now there are, on this proposed disk drive, 180 sectors per track. In reality 8 to 32 sectors per track are commonly used - but please use 180 sectors per track in our example.
Assume it takes the disk head, on its digital stepping motor arm, 1 ms to move past one track, 2 ms to move past two tracks, 3 ms to move past three tracks, etc. In reality it needs time to start up and accelerate, and time to stop, but we shall ignore this and stick with 1 ms per track. The 0.1 microseconds to read and write the data in this example is very fast, and in this time the disk has had little time to rotate past a sector. Which sector the disk head is looking at will become important when the disk head is moving between tracks - this takes time, and during this time the disk has rotated past sector(s) and may have to wait for the correct sequential sector to come into view again.
The ECC is an error check to see if any bytes in a sector have been corrupted or misread - it is up to you to design some form of error check, utilizing all or some of the allocated bytes. Do not spend too much time on this; it is important that you show some understanding of the problem though. Errors on disks can occur for several reasons e.g. dust particles, magnetic drop-outs, electrical interference during read or write, cosmic ray damage. So it is important to know when this has occurred and where (or at least in which sector). As will be discussed in Experiment 3, the word "skew" is used. This is where numerically identical sectors are offset from one another between adjacent tracks, so as to allow for the time lag and rotational offset when the disk head jumps between tracks.
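As an illustration only (you must devise or adapt your own scheme), one very simple check that fits in the 16 ECC bytes is an XOR of the data bytes, column by column. It detects any single corrupted byte and localises it to one of 16 "columns", though it cannot correct it:

```python
def compute_ecc(data):
    """Toy 16-byte check: byte i of the ECC is the XOR of every 16th data
    byte starting at offset i."""
    ecc = [0] * 16
    for i, b in enumerate(data):
        ecc[i % 16] ^= b
    return bytes(ecc)

def sector_ok(data, stored_ecc):
    """Recompute the check over the data and compare with the stored ECC."""
    return compute_ecc(data) == stored_ecc
```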
"First in, first out" just means that you write one file after the other in numerical order of sectors, and then, once these are full on a given track, move onto the next track, starting at sector zero again. Finally, assume that you do not store more than one file per sector, i.e. never attempt to store files 01 and 02 on the same sector. If file 01 does not completely fill its last sector, just leave the remainder of the sector's contents as zeros.
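A sketch of this first-in first-out allocation, assuming 483 data bytes per sector and no sector sharing (the function and variable names are illustrative, not a required design):

```python
import math

DATA_BYTES_PER_SECTOR = 483      # data area of each 512-byte sector (this design)
SECTORS_PER_TRACK = 180

def layout(files):
    """files: list of (name, length_in_bytes) in write order.  Returns a dict
    name -> list of (track, sector) pairs, allocated FIFO: each file starts on
    a fresh sector, and sectors fill a track before moving to the next."""
    placement, next_sector = {}, 0          # absolute sector counter
    for name, length in files:
        n = math.ceil(length / DATA_BYTES_PER_SECTOR)
        placement[name] = [(s // SECTORS_PER_TRACK, s % SECTORS_PER_TRACK)
                           for s in range(next_sector, next_sector + n)]
        next_sector += n                    # next file starts on a fresh sector
    return placement
```

A 483-byte file occupies exactly one sector; a 484-byte file spills onto a second; the 181st sector of a long file lands on track 1, sector 0.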
References
• Tanenbaum, A., S. 2001. Modern Operating Systems. Prentice Hall.
The Buddy System (Knuth, 1973; Knowlton, 1965) is a memory allocation technique that works on the basis of using binary numbers, as these are faster and easier for computers to manipulate. Lists are maintained which store the free memory blocks of sizes 1, 2, 4, 8, …, n, where n is the size of the memory (in bytes). This means that for a one megabyte memory we require 21 lists. If we assume we have one megabyte of memory and it is all unused, then there will be one entry in the 1M list, and all other lists will be empty.
Now assume that a 70K process (process A) needs to be swapped into memory. As lists are only held
as powers of two we have to allocate the next highest memory that is a power of two; in this case 128K.
The 128K list is currently empty. In fact, every list is empty except for the 1M list. Therefore, we split
the 1M block into two 512K blocks. One of the 512K blocks is then split into 256K blocks and one of
the 256K blocks is split into two 128K blocks, one of which is allocated to the 70K process, with 58K wasted.
Next, a process (process B) requiring 35K might be swapped in. This will require a 64K block as it will
not fit into a 32K block. There are no entries in the 64K list so the next size list is considered (128K).
This has an entry so two buddies are created and the process is allocated to one of those blocks. If we
now request an 80K process (process C), this will have to occupy a 128K block, which will come from
the 256K list.
What happens if process A ends and releases its memory? In fact, the block of memory will simply be
added to the 128K list. If another process, D, now requests 60K of memory it will find an entry in the
64K list, so can be allocated there.
Now process B terminates and releases its memory. This will simply place its block in the 64K list. If
process D terminates we can start merging blocks. This is a fast process as we only have to check
adjacent lists and check for adjoining addresses. Finally, process C terminates and we can merge all the
way back to a single list entry in the 1M list.
The reason the buddy system is fast is that when a block of size 2^k bytes is returned, only the 2^k list has to be searched to see if a merge is possible. The problem with the buddy system is that it is inefficient in terms of memory usage. All memory requests have to be rounded up to a power of two. We saw above how an 80K process has to be allocated to a 128K memory block. The extra 48K is wasted. This type of wastage is known as internal fragmentation, as the wasted memory is internal to the allocated segments. This is the opposite of external fragmentation, where the wasted memory appears between allocated segments. For the interested student, (Peterson, 1977) and (Kaufman, 1984) have modified the buddy system to get around some of these problems. Linux adopts the buddy system, but with modifications to avoid internal fragmentation (Tanenbaum, 2001, p722).
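A minimal sketch of a buddy allocator along these lines (sizes in kilobytes, so the 1M example above becomes 1024; the class and method names are our own). The key property is that a block's buddy address differs from its own in exactly one bit:

```python
class BuddyAllocator:
    """Minimal buddy system sketch: one free list per power-of-two size."""

    def __init__(self, size):
        self.size = size
        self.free = {size: [0]}          # block size -> list of start addresses

    def _round_up(self, n):
        k = 1
        while k < n:
            k *= 2
        return k

    def alloc(self, n):
        want = self._round_up(n)         # e.g. a 70K request becomes 128K
        size = want
        while size <= self.size and not self.free.get(size):
            size *= 2                    # smallest non-empty larger list
        if size > self.size:
            return None                  # out of memory
        addr = self.free[size].pop()
        while size > want:               # split into buddies down to the right size
            size //= 2
            self.free.setdefault(size, []).append(addr + size)
        return addr

    def free_block(self, addr, n):
        size = self._round_up(n)
        while size < self.size:
            buddy = addr ^ size          # buddy address differs in exactly one bit
            if buddy in self.free.get(size, []):
                self.free[size].remove(buddy)
                addr = min(addr, buddy)  # merge, then try the next size up
                size *= 2
            else:
                break
        self.free.setdefault(size, []).append(addr)
```

Walking through the text's example (in K): allocating 70K, 35K and 80K yields addresses 0, 128 and 256, exactly the splits described above, and freeing everything merges all the way back to a single 1024K entry.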
Virtual Memory
Introduction
The swapping methods we have looked at above are needed so that we can allocate memory to
processes when they need it. But what happens when we do not have enough memory? In the past, a
system called overlays was used. This was a system that was the responsibility of the programmer. The
program would be split into logical sections (called overlays) and only one overlay would be loaded
into memory at a time. This meant that more programs could be running than would be the case if the
complete program had to be in memory. The downside of this approach is that the programmer had to
take responsibility for splitting the program into logical sections. This was time consuming, boring and
open to error.
It is no surprise that somebody eventually devised a method that allowed the computer to take over this responsibility. Fotheringham (1961) is credited with coming up with the method that is now known as virtual memory. The idea behind virtual memory is that the computer is able to run programs
even if the amount of physical memory is not sufficient to allow the program and all its data to reside in
memory at the same time. At the most basic level we can run a 500K program on a 256K machine. But
we can also use virtual memory in a multiprogramming environment. We can run twelve programs in a
machine that could, without virtual memory, only run four.
Paging
In a computer system that does not support virtual memory, when a program generates a memory
address it is placed directly on the memory bus which causes the requested memory location to be
accessed. On a computer that supports virtual memory, the address generated by a program goes via a
memory management unit (MMU). This unit maps virtual addresses to physical addresses.
This diagram shows how virtual memory operates. The computer in this example can generate 16-bit addresses, that is, addresses between 0 and 64K (0-65535). The problem is the computer only has 32K of physical memory, so although we can write programs that can access 64K of memory, we do not have the physical memory to support that. We obviously cannot fit 64K into the physical memory
available so we have to store some of it on disc. The virtual memory is divided into pages. The
physical memory is divided into page frames. The virtual pages and the page frames are the same size (4K in the diagram above). Therefore, we have sixteen virtual pages and eight physical page frames. Transfers between disc and memory are done in pages.
Now let us consider what happens when a program generates a request to access a memory location.
Assume a program tries to access address 8192. This address is sent to the MMU. The MMU
recognises that this address falls in virtual page 2 (assume pages start at zero). The MMU looks at its
page mapping and sees that page 2 maps to physical page 6. The MMU translates 8192 to the relevant
address in physical page 6 (this being 24576). This address is output by the MMU and the memory
board simply sees a request for address 24576. It does not know that the MMU has intervened. The
memory board simply sees a request for a particular location, which it honours.
If a virtual memory address is not on a page boundary (as in the above example) then the MMU also
has to calculate an offset (in fact, there is always an offset – in the above example it was zero).
Question 1. As an exercise, and using the diagram above, work out the physical page and physical
address that are generated by the MMU for each of the following addresses. The answers are at the end
of this handout (question 1).
So far, all we have managed to do is map sixteen virtual pages onto eight physical pages. We have not
really achieved anything yet as, in effect, we have eight virtual pages which do not map to a physical
page. In the diagram above, we represented these pages with an ‘X’. In reality, each virtual page will
have a present/absent bit which indicates if the virtual page is mapped to a physical page.
We need to look at what happens if the program tries to use an unmapped page. For example, the
program tries to access address 24576 (i.e. 24K). The MMU will notice that the page is unmapped
(using the present/absent bit) and will cause a trap to the operating system. This trap is called a page
fault. The operating system would decide to evict one of the currently mapped pages and use that for
the page that has just been referenced. The sequence of events would go like this.
• The program tries to access a memory location in a (virtual) page that is not currently mapped.
• The MMU causes a trap to the operating system. This results in a page fault.
• A little-used virtual page is chosen (how this choice is made we will look at later) and its contents are written to disc.
• The page that has just been referenced is copied (from disc) into the page frame that has just been freed.
• The page table is updated.
• The trapped instruction is restarted.
In the example we have just given (trying to access address 24576) the following would happen.
• The MMU would cause a trap to the operating system as the virtual page is not mapped to a
physical location.
• A virtual page that is mapped is elected for eviction (we’ll assume that virtual page 11 is
nominated).
• Virtual page 11 is marked as unmapped (i.e. the present/absent bit is changed).
• Physical page 7 is written to disc (we’ll assume for now that this needs to be done). That is the
physical page that virtual page 11 maps onto.
• Virtual page 6 is loaded to physical address 28672 (28K).
• The entry for virtual page 6 is changed so that the present/absent bit is changed. Also the ‘X’ is
replaced by a ‘7’ so that it points to the correct physical page.
• When the trapped instruction is re-executed it will now work correctly.
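The eviction sequence above can be sketched as follows, with a hypothetical page table holding only the two mappings the example needs (virtual page 2 → frame 6, virtual page 11 → frame 7) and 4K pages:

```python
PAGE_SIZE = 4096

# Hypothetical page table for the worked example: virtual page -> physical
# frame.  A missing entry means the present/absent bit is clear (the 'X').
page_table = {2: 6, 11: 7}

def access(vaddr, choose_victim):
    """Translate vaddr; on a page fault, evict the page chosen by choose_victim."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in page_table:        # present/absent bit clear: page fault
        victim = choose_victim()       # e.g. virtual page 11 in the example
        frame = page_table.pop(victim) # victim marked unmapped; its frame
                                       # would be written to disc if dirty
        page_table[vpage] = frame      # referenced page loaded into that frame
    return page_table[vpage] * PAGE_SIZE + offset
```

Accessing 8192 hits virtual page 2 and comes out as 24576; accessing 24576 (virtual page 6) faults, evicts page 11, and comes out as 28672 - matching the steps above.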
You might like to work through this to see how the mapping is changed. It is interesting to look at how the MMU works, and in particular to consider why we have chosen a page size that is a power of 2. Take a look at this diagram.
The incoming address (20818) consists of 16 bits. The top four bits are masked off and used as an index into the virtual page table (in this case they index entry 5, 0101 in binary), where we find that this page is mapped to physical page 011 (3 in decimal). These three bits make up the top three bits of the physical page address. The other part of the incoming address is copied directly to the outgoing address.
Thus, the page table (courtesy of the MMU) has mapped virtual address 20818 to physical address
12626. If you look at the diagram that shows how virtual memory operates you will be able to follow
this conversion. The only other point we should, perhaps, consider is why the bottom twelve bits of the incoming address can be copied directly to the output address. See if you can work it out before looking at the next sentence. Well, the answer is that twelve bits can represent 4096 values (i.e. 2^12 = 4096). This is the size (in bytes) of our pages. Therefore, these twelve bits of the address (whether incoming virtual or outgoing physical) represent the offset within the page. In the example we looked at
the top four bits of the virtual address represent the virtual page and the top three bits of the physical
address represent the physical page. But, in both cases, the bottom twelve bits represent the offset
within the page. Therefore, the offset address can simply be copied.
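The masking and copying described above amounts to a couple of bit operations. A sketch, assuming 4K pages and the mapping from the diagram (virtual page 5 → physical page 3):

```python
PAGE_BITS = 12                       # 4K pages: 2**12 = 4096

# Mapping taken from the worked example: virtual page 5 -> physical page 3.
page_table = {5: 3}

def translate(vaddr):
    vpage = vaddr >> PAGE_BITS               # top four bits of a 16-bit address
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # bottom twelve bits, copied through
    return (page_table[vpage] << PAGE_BITS) | offset
```

Running this on 20818 gives 12626, the conversion worked through in the text.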
Page Tables
The way we have described how virtual addresses map to physical addresses is how it works but we
still have a couple of problems to consider.
In this course we are not going to look in detail at how these problems are overcome but the interested
student might like to look at (Tanenbaum, 1992, p93-107; 2001, p202-214), which covers it in some detail.
This diagram (Fig 4-13 on p210 of Tanenbaum 2001) shows typical entries in a page table (although
the exact entries are operating system dependent). The various components are described below.
Answers
Question 1
References
• Kaufman, A. 1984. Tailored-List and Recombination-Delaying Buddy Systems. ACM Transactions on Programming Languages and Systems, Vol. 6, pp 118-125.
• Knowlton, K.C. 1965. A Fast Storage Allocator. Communications of the ACM, Vol. 8, pp 623-625.
• Knuth, D.E. 1973. The Art of Computer Programming, Volume 1: Fundamental Algorithms, 2nd ed. Reading, MA: Addison-Wesley.
The Not-Recently-Used (NRU) algorithm removes a page at random from the lowest-numbered class that has entries in it. Therefore, pages which have not been referenced or modified are removed in preference to those that have not been referenced but have been modified (which is not as impossible as it sounds, due to the fact that the reference bit is periodically reset). Although not an optimal algorithm, NRU often provides adequate performance and is easy to understand and implement.
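As a sketch, grouping pages into classes and picking at random from the lowest non-empty class might look like the following (assuming the usual class numbering, class = 2×R + M, so class 0 is "not referenced, not modified" and class 3 is "referenced and modified"):

```python
import random

def nru_choose(pages):
    """pages: dict page -> (R, M) bits.  Evict a random page from the
    lowest-numbered non-empty class, where class = 2*R + M."""
    by_class = {}
    for page, (r, m) in pages.items():
        by_class.setdefault(2 * r + m, []).append(page)
    lowest = min(by_class)               # lowest-numbered non-empty class
    return random.choice(by_class[lowest])
```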
In the worst case, the second chance (SC) algorithm operates in the same way as FIFO. Take the situation where the linked list consists of pages which all have their reference bit set. The first page, call it a, is inspected and placed at the end of the list, after having its R bit cleared. The other pages all receive the same treatment. Eventually page a reaches the head of the list and is evicted, as its reference bit is now clear. Therefore, even when all pages in the list have their reference bit set, the algorithm will always terminate.
Whilst the LRU (Least Recently Used) algorithm can be implemented (unlike the optimal algorithm), it is not cheap. Ideally, we need to maintain a linked list of pages, sorted in the order in which they have been used. Maintaining such a list is prohibitively expensive (even in hardware), as deleting and moving list elements is a time-consuming process. Sorting a list is also expensive.
However, there are ways that LRU can be implemented in hardware. One way is as follows. The hardware
is equipped with a counter (typically 64 bits). After each instruction the counter is incremented. In addition,
each page table entry has a field large enough to accommodate the counter. Every time the page is
referenced the value from the counter is copied to the page table field. When a page fault occurs the
operating system inspects all the page table entries and selects the page with the lowest counter. This is the
page that is evicted as it has not been referenced for the longest time.
Another hardware implementation of the LRU algorithm is given below. If we have n page frames, a matrix of n x n bits, initially all zero, is maintained. When a page frame, k, is referenced, all the bits of row k are set to one and then all the bits of column k are set to zero. At any time, the row with the lowest binary value is the least recently used frame (where row number = page frame number). The row with the next lowest value is the next least recently used; and so on.
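The matrix update rule is short enough to sketch directly (here for a 4×4 matrix of bits, matching the example below):

```python
N = 4

def reference(matrix, k):
    """Referencing frame k: set row k to all ones, then column k to all zeros."""
    matrix[k] = [1] * N
    for row in matrix:
        row[k] = 0

def lru_frame(matrix):
    """The row with the lowest binary value is the least recently used frame."""
    value = lambda row: int("".join(map(str, row)), 2)
    return min(range(N), key=lambda k: value(matrix[k]))
```

After referencing frames 0, 1, 2, 3 in order, row 0 has the lowest value, so frame 0 is the LRU candidate; referencing 0 again makes frame 1 the candidate.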
Suppose the page frames are referenced in the order 0, 1, 2, 3, 2, 1, 0, 3, 2, 3. This leads to the algorithm operating as follows (the matrix states after the first five references are shown below) - it is worth working through it and calculating the binary value of each row to see which page frame would be evicted.

Ref:     0        1        2        3        2
Row 0  0 1 1 1  0 0 1 1  0 0 0 1  0 0 0 0  0 0 0 0
Row 1  0 0 0 0  1 0 1 1  1 0 0 1  1 0 0 0  1 0 0 0
Row 2  0 0 0 0  0 0 0 0  1 1 0 1  1 1 0 0  1 1 0 1
Row 3  0 0 0 0  0 0 0 0  0 0 0 0  1 1 1 0  1 1 0 0
LRU in Software
One of the main drawbacks with implementing LRU (Least Recently Used) in hardware is that if the
hardware does not provide these facilities then the operating system designers, obviously, cannot make use
of them. Instead we need to implement a similar algorithm in software. One method (called the Not
Frequently Used – NFU algorithm) associates a counter with each page. This counter is initially zero but at
each clock interrupt the operating system scans all the pages and adds the R bit for the page to its counter.
As this is either zero or one, the counter either gets incremented or it does not. When a page fault occurs the
page with the lowest counter is selected for replacement.
The main problem with NFU is that it never forgets anything. For example, if a multi-pass compiler is
running, at the end of the first pass the pages may not be needed anymore. However, as they will have high
counts they will not be replaced but pages from the second pass, which still have low counts, will be
replaced. In fact, the situation could be even worse. If the first pass made a lot of memory references but
the second pass does not make as many, then the pages from the first pass will always have higher counts
than the second pass and will therefore remain in memory.
To alleviate this problem we can make a modification to NFU (Not Frequently Used) so that it closely
simulates LRU (Least Recently Used). The modification is in two parts and implements a system of aging.
1. The counters are shifted right one bit before the R bit is added.
2. The R bit is added to the leftmost bit rather than the rightmost bit.
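The two modifications amount to one line of bit manipulation per page. A sketch with 8-bit counters:

```python
COUNTER_BITS = 8

def age(counters, r_bits):
    """One clock tick of the aging algorithm: shift each counter right one
    bit, then add the page's R bit as the new leftmost bit."""
    for page, r in enumerate(r_bits):
        counters[page] = (counters[page] >> 1) | (r << (COUNTER_BITS - 1))
```

Feeding in the five rows of R bits from the table below reproduces the counter values shown in the clock tick 4 column.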
Look at the diagram below. This represents a page table with six entries. Working from left to right, we are showing the state of each of the pages (only the counter entries) at each of five clock ticks. Consider the (a) column. After clock tick zero, the R flags for the six pages are set to 1, 0, 1, 0, 1 and 1. This indicates that pages 0, 2, 4 and 5 were referenced. This results in the counters being set as shown. We assume they all started at zero, so that the shift right, in effect, did nothing and the reference bit was added to the leftmost bit. If you look at the (b) clock tick you should be able to follow the algorithm, and similarly for the (c) to (e) ticks.
Clock tick:      0            1            2            3            4
R bits (0-5):    101011       110010       110101       100010       011000

Page 0           10000000     11000000     11100000     11110000     01111000
Page 1           00000000     10000000     11000000     01100000     10110000
Page 2           10000000     01000000     00100000     00010000     10001000
Page 3           00000000     00000000     10000000     01000000     00100000
Page 4           10000000     11000000     01100000     10110000     01011000
Page 5           10000000     01000000     10100000     01010000     00101000

Referenced:      (a) 0,2,4,5  (b) 0,1,4    (c) 0,1,3,5  (d) 0,4      (e) 1,2
When a page fault occurs, the page whose counter has the lowest value is removed. It is obvious that a page that has not been referenced for, say, four clock ticks will have four zeroes in the leftmost positions, and so will have a lower value than a page that has not been referenced for three clock ticks.
This aging algorithm differs from true LRU (Least Recently Used) in two ways. Using the matrix LRU implementation, we update the matrix after each instruction. This, in effect, gives us more detailed information. If you look at pages 3 and 5 in the above diagram, after clock tick 4, both pages have not been referenced for the last two clock ticks. But we do not know how early (or late) within a clock tick the pages were referenced. All we are able to do is evict page 3, as it has the lower counter value.
The reason that page faults decrease (and then stabilise) is that processes normally exhibit locality of
reference. This means that at a particular execution phase of the process it only uses a small fraction of the
pages available to the entire process. The set of pages that is currently being used is called its working set
(Denning, 1968a; Denning 1980). If the entire working set is in memory then no page faults will occur.
Only when the process moves onto the next phase of execution (e.g. the next phase of a compiler) will page
faults begin to occur as pages not part of the existing working set are brought into memory. If the memory
of the computer is not large enough to hold the entire working set, then pages will constantly be copied out
to disc and subsequently retrieved. This drastically slows a process down, as a disc access is far slower than executing an instruction. A process which causes page faults every few instructions is said to be thrashing (Denning 1968b).
In a system that allows many processes to run at the same time (or at least gives that illusion), it is common to move all the pages for a process to disc (i.e. swap it out). When the process is restarted we have to decide what to do. Do we simply allow demand paging, so that as the process raises page faults, its pages are gradually brought into memory? Or do we move all its working set into memory so that it can continue with minimal page faults? It will come as no surprise that the second option is to be preferred. We would like to avoid a process raising page faults every time it is restarted. In order to do this, the paging system has to keep track of the process's working set so that it can be loaded into memory before it is restarted. This approach is called the working set model (Denning, 1970). Its aim, as we have stated, is to avoid page faults being raised. This method is also known as prepaging.
A problem arises when we try to implement the working set model, as we need to know which pages make up the working set. One solution is to use the aging algorithm described above. Any page that contains a 1 in the n high-order bits of its counter is deemed to be a member of the working set. The value of n has to be found experimentally, although performance is not that sensitive to its exact value.
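This membership test is a single shift, assuming the 8-bit aging counters used above:

```python
COUNTER_BITS = 8

def in_working_set(counter, n):
    """A page is in the working set if any of the n high-order bits of its
    aging counter is 1, i.e. it was referenced in the last n clock ticks."""
    return (counter >> (COUNTER_BITS - n)) != 0
```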
Paging Daemons
If a page fault occurs it is better if there are plenty of free pages for the page to be copied to. If we have the
situation where every page is being used we have to find a page to evict and we may have to write the page
to disc before evicting it. Many systems have a background process called a paging daemon. This process
sleeps most of the time but runs at periodic intervals. Its task is to inspect the state of the page frames and,
if too few pages are free, it selects pages to evict using the page replacement algorithm that is being used. A
further performance improvement can be achieved by remembering which page frame a page has been
evicted from. If the page frame has not been overwritten when the evicted page is needed again then the
page frame is still valid and the data does not have to be copied from disc again.
Segmentation
This subject may be covered towards the end of the course, when more time is available. In the meantime
you should read Tanenbaum, 1992, p128-141 or Tanenbaum 2001, p249-262.
References
• Denning, P.J. 1968a. The Working Set Model for Program Behaviour. Communications of the ACM, Vol. 11, pp 323-333.
• Denning, P.J. 1980. Working Sets Past and Present. IEEE Trans. on Software Engineering, Vol. SE-6, pp 64-84.
• Denning, P.J. 1968b. Thrashing: Its Causes and Prevention. Proceedings AFIPS National Computer Conference, pp 915-922.
• Denning, P.J. 1970. Virtual Memory. Computing Surveys, Vol. 2, pp 153-189.
• Smith, A.J. 1978. Bibliography on Paging and Related Topics. Operating Systems Review, Vol. 12, pp
39-56
• Tanenbaum, A., S. 1992. Modern Operating Systems. Prentice Hall.
Belady’s Anomaly states that having more pages in memory is not necessarily the best course of action. Belady demonstrated that, in a particular example with FIFO, there are more page faults with four page frames in memory than with three. Consider the following sequence of page references, which is assumed to occur on a system, using FIFO, with no pages loaded initially:

0 1 2 3 0 1 4 0 4 1 2 3 4
If we have 3 frames this generates 9 page faults:
(The table shows the frame contents after each request, most recently loaded page on top.)

Req:    0  1  2  3  0  1  4  0  4  1  2  3  4
Frames: 0  1  2  3  0  1  4  4  4  4  2  3  3
        -  0  1  2  3  0  1  1  1  1  4  2  2
        -  -  0  1  2  3  0  0  0  0  1  4  4
Fault?  Y  Y  Y  Y  Y  Y  Y           Y  Y
Belady’s Anomaly surprised many and caused a lot of research into page modelling including stack
algorithms, the distance string, predicting page fault rates and page sizes.
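The two FIFO runs can be checked with a short simulation (a sketch, using a plain queue for the resident pages):

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults for FIFO replacement with the given number of frames."""
    queue, faults = deque(), 0
    for page in refs:
        if page not in queue:
            faults += 1
            if len(queue) == frames:
                queue.popleft()          # evict the page resident longest
            queue.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 4, 1, 2, 3, 4]
```

With this reference string the simulation reports 9 faults with three frames but 10 with four - Belady's Anomaly.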
Segmentation
Our virtual memory has been 1-dimensional so far. Address 0 to Max. For some problems, 2 or
more separate virtual address spaces are handy e.g. a compiler with many tables that build up during
compiling such as source text for printing/debugging, symbol table for variables, table of constants,
parse tree and a stack for procedure calls. The first four of those tables grow continuously, but the stack may grow and shrink. In 1-dimensional memory, contiguous chunks of memory have to be set aside for each. Suppose we have an extraordinary number of variables - the allocated space for the symbol table may fill up. Then what? Do we halt compilation, or steal space from other tables that have room to spare?
The solution is to use segmentation - i.e. to set aside completely independent address spaces of
virtual memory e.g. one virtual memory space for each compile table. Each segment of virtual
memory may be a different size, or even change during execution. We could even compile different
segments separately.
As we are now dealing effectively with 2 dimensional virtual memory, we now need 2 addresses to
access it, a segment number and an address within the segment. As most of virtual memory does
not exist physically in RAM, it can be made so large that there is little chance of an individual
segment filling up. A segment is a logical entity. It contains one type of object only and not mixtures,
e.g. not both a stack & symbol table. Processes can now share data and procedures more easily
e.g. a shared graphics library can be in a segment of its own.
Page 1
G53OPS Peter Siepmann (pxs02u)
Dr. Cook 17/5/2005
G53OPS – Revision Notes: Lectures 11-18
adapted from the notes by Tony Cook
Paging vs Segmentation
INPUT/OUTPUT
Management of I/O devices is one of the principal tasks of the operating system. This includes handling message passing, interrupts and errors, as well as providing a simple interface and, to some degree, device independence. In general, users do not care about the electronics/interface, programmers want an interface between the hardware and user levels, while electrical engineers are more concerned with the component parts.
Controllers
It is not possible to simply connect I/O devices directly to the system bus for several reasons. There
are many different types of device, each with a different method of operation, e.g. monitors, disk
drives, keyboards. It is impracticable for a CPU to be aware of the operation of every type of device,
particularly as new devices may be designed after the CPU has been produced. The data transfer
rate of most peripherals is much slower than that of the CPU. The CPU cannot communicate directly
with such devices without slowing the whole system down. Peripherals will often use different data
word sizes and formats than the CPU.
I/O units have their electronic and mechanical components separated in a modular design. The electronic component is the “device controller” or “adapter”, e.g. a printed circuit board placed into an expansion slot. These are designed independently for many devices, and there are a number of standardised interfaces (ANSI, ISO, IEEE, etc). Some controllers can handle many devices.
We need a uniform approach to I/O as seen from the user and from the operating system point of
view to handle:
• Units of transfer: i.e. data blocks vs character streams
• Data coding conventions, parity (to check for errors) etc
• Error states and the reporting of errors
Programmed I/O
The simplest strategy for handling communication between the CPU and an I/O module is
programmed I/O. Using this strategy, the CPU is responsible for all communication with I/O modules,
by executing instructions which control the attached devices, or transfer data. For example, if the
CPU wanted to send data to a device using programmed I/O, it would first issue an instruction to the
appropriate I/O module to tell it to expect data. The CPU must then wait until the module responds
before sending the data. If the module is slower than the CPU, then the CPU may also have to wait
until the transfer is complete. This can be very inefficient. Another problem exists if the CPU must
read data from a device such as a keyboard. Every so often the CPU must issue an instruction to
the appropriate I/O module to see if any keys have been pressed. This is also extremely inefficient.
Consequently this strategy is only used in very small microprocessor controlled devices.
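The busy-wait pattern described above can be sketched as follows, with a toy device model (entirely hypothetical - here it only becomes ready every third poll) standing in for the I/O module:

```python
class SlowDevice:
    """Toy I/O module: reports ready only on every third status poll."""
    def __init__(self):
        self.polls, self.received = 0, []
    def ready(self):
        self.polls += 1
        return self.polls % 3 == 0
    def write(self, byte):
        self.received.append(byte)

def programmed_io_write(device, data):
    """Programmed I/O: the CPU does everything itself - busy-wait until the
    device is ready, transfer one byte, repeat.  All the polling is wasted
    CPU time, which is why this is so inefficient."""
    for byte in data:
        while not device.ready():    # polling loop: the source of inefficiency
            pass
        device.write(byte)
```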
How does the CPU talk to the control registers and the device data buffers?
First method: I/O port
Each control register has an I/O port number (an 8- or 16-bit integer). Most early computers (i.e. mainframes) used this approach. The address spaces for memory and I/O were separate.
Second Method: Memory Mapped I/O
Map all device control registers into unique reserved locations, usually at the top of the memory address space.
Third Method: Hybrid I/O port and Memory Mapped I/O
If the CPU wants to read a word, it puts the address on the bus address line and asserts a read
signal on bus control line. If the address is for memory space, the memory responds, or if the
address is for I/O space, the I/O driver responds. Note that Pentiums have three “external buses”:
memory, PCI (e.g. SCSI, USB), and ISA (e.g. modem, printer).
The situation is somewhat complicated by the fact that most computer systems will have several
peripherals connected to them. This means the computer must be able to detect which device an
interrupt comes from, and to decide which interrupt to handle if several occur simultaneously. This
decision is usually based on interrupt priority. Some devices will require response from the CPU
more quickly than others, for example, an interrupt from a disk drive must be handled more quickly
than an interrupt from a keyboard.
Many systems use multiple interrupt lines. This allows a quick way to assign priorities to different
devices, as the interrupt lines can have different priorities. However, it is likely that there will be more
devices than interrupt lines, so some other method must be used to determine which device an
interrupt comes from. Most systems use a system of vectored interrupts. When the CPU
acknowledges an interrupt, the relevant device places a word of data (a vector) on the data bus. The
vector identifies the device which requires attention, and is used by the CPU to look up the address
of the appropriate interrupt handling routine.
On receipt of an interrupt, if no other interrupts are pending then the interrupt controller processes
the interrupt immediately, otherwise it continues to assert the interrupt until the CPU can respond.
The CPU knows which devices have sent an interrupt because the controller puts a number on the
address lines. The number on the address line is treated as an index in the “interrupt vector” table to
fetch a new program counter. The program counter points to the start of the appropriate interrupt
service procedure. Traps and interrupts share the same interrupt vector. The interrupt vector can be
in hardware or in memory. The interrupt service procedure acknowledges the I/O device, after a
delay to avoid race conditions.
The interrupt information e.g. program counter, register contents etc, could be saved in internal
registers, but to avoid a second interrupt overwriting, long delays are needed prior to enabling other
interrupts. Most CPUs save interrupt information in a stack.
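The vector-table lookup can be sketched like this (Python, with invented device names and vector numbers; real hardware fetches a new program counter from the table rather than calling a Python function):

```python
# Toy vectored-interrupt dispatch: the vector placed on the bus indexes an
# interrupt vector table to fetch the handler (standing in for a new PC).

handled = []

def disk_handler():
    handled.append("disk")

def keyboard_handler():
    handled.append("keyboard")

# The "interrupt vector table": index = vector number, entry = service routine.
interrupt_vector_table = {
    3: disk_handler,      # assumed vector for the disk controller
    7: keyboard_handler,  # assumed vector for the keyboard
}

def cpu_acknowledge(vector):
    """On acknowledge, use the vector as an index to find the handler."""
    handler = interrupt_vector_table[vector]
    handler()  # jump to the interrupt service procedure

# A disk interrupt (vector 3) is serviced before a keyboard one (vector 7),
# assuming lower vector numbers mean higher priority.
for vector in sorted([7, 3]):
    cpu_acknowledge(vector)
print(handled)  # ['disk', 'keyboard']
```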
A ‘precise interrupt’ refers to when the machine gets left in a well defined state after an interrupt:
PC saved in a known place
all preceding instructions have been fully run
no subsequent instructions have been executed
execution state of the current instruction is known
An ‘imprecise interrupt’ refers to when the above 4 conditions do not hold, which can lead to fatal
programming errors.
The CPU and the DMA controller cannot use the system bus at the same time, so some way must be
found to share the bus between them. One of two methods is normally used.
Burst mode
The DMA controller transfers blocks of data by halting the CPU and controlling the system bus for
the duration of the transfer. The transfer will only be as fast as the slowest link in the I/O
module/bus/memory chain, as data does not pass through the CPU, but the CPU must still be
halted while the transfer takes place.
Cycle stealing
The DMA controller transfers data one word at a time, by using the bus during a part of an
instruction cycle when the CPU is not using it, or by pausing the CPU for a single clock cycle on
each instruction. This may slow the CPU down slightly overall, but will still be very efficient.
Error Handling
Errors should be handled as close to the hardware as possible, e.g. on a read error on a hard disk, the
drive controller should repeat the read (possibly at different speeds) to see if it was a fluke.
Synchronous (blocking) vs Asynchronous (interrupt-driven) I/O
Most physical I/O is asynchronous.
Buffering - where to put the data?
Devices are either sharable (e.g. floppy disk drives) or non-sharable (e.g. CD writers).
Now consider a printer that works from interrupts, printing characters as and when they arrive.
Assume the printer can handle 100 characters per second, i.e. 1 every 10ms. In a programmed I/O system,
after every character written to the printer data register the CPU waits in an idle loop for 10ms. Interrupts
solve this problem, allowing context switches and enabling the CPU to do other tasks whilst waiting.
Using DMA, we let the DMA controller feed characters to the printer one at a time. The CPU does not
have to be involved, other than setting up the DMA. Now only one interrupt per buffer is needed rather
than one per character. The only disadvantage is that the DMA controller is usually slower than the CPU.
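A quick back-of-envelope comparison for this printer (the 1000-character buffer is an assumed example size, not from the notes):

```python
# Compare the three strategies for the 100 chars/sec printer above.

chars_per_sec = 100
time_per_char_ms = 1000 / chars_per_sec   # 10 ms per character
buffer_size = 1000                        # assumed 1000-character buffer

# Programmed I/O: the CPU busy-waits ~10 ms for every character.
busy_wait_ms = buffer_size * time_per_char_ms

# Interrupt-driven I/O: one interrupt per character; DMA: one per buffer.
interrupts_interrupt_io = buffer_size
interrupts_dma = 1

print(busy_wait_ms / 1000)  # seconds the CPU would spend idling: 10.0
```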
Device Drivers
Device Controllers have registers for commands or status or both. The number of registers varies
with device and we therefore need device specific code - “device driver”. This is usually delivered by
the manufacturer. We can have a driver for closely related devices e.g. SCSI disk controller for HD
and CDROM. Current operating systems expect device drivers to run in the kernel, which needs a well
defined model of what a driver does and how it interacts with the rest of the operating system.
Drivers are normally positioned below the rest of the operating system, with standard interfaces defined for
block device drivers and character device drivers. Drivers must accept requests from device
independent software. Typical device driver procedures include:
check input parameter validity
check to see if the device is in use
issue commands to control the device through its device registers
did the device accept the commands?
the driver will either have to wait or the results will come back immediately
some drivers may need to be re-entrant
Some I/O software is device independent, other I/O software is device specific. Device independent
software provides uniform interfacing for device drivers in terms of buffering, error reporting,
allocating and releasing dedicated devices and providing a device independent block size.
Buffering
unbuffered input - not efficient and might miss data
buffering in user space - what if buffer paged out when a character arrives?
buffering in the kernel followed by copy to user space, but what happens to characters arriving
when buffer being transferred?
double buffering in the kernel
Too much buffering is not too healthy because it degrades performance as the following steps must
be performed sequentially:
a user system call writes to the network, and the data is copied to a kernel buffer
the invoked driver copies the data to the network controller
the data is copied onto the network (by the controller, independently of the CPU)
the bits are transmitted, then arrive and are placed into a kernel buffer
the data is copied from kernel space to user space and into the receiving process
Disks
larger available capacity
lower price per bit
permanent
Disk tracks have between 8 and 32 sectors, each with an equal number of bytes (0.25K, 0.5K or 1K), in the
middle of the disk as well as on the border. The controller can run a number of seeks (moving the disk heads to
the correct tracks and waiting for the correct sector to come around) on different disks at the same
time, but cannot read/write on more than one drive at a time.
An example 512 byte sector layout:
Byte 0: Sector number (integer, 0-179)
Byte 1: Track number (0-79)
Bytes 2-7: Filename (“00”-“49”)
Bytes 8-11: File length (in bytes)
Bytes 12-15: Number of sectors the file spans
Bytes 16-495: Data from the file (a 480 byte section of the file)
Bytes 496-511: Error check (16 bytes available)
A disk sector would typically consist of i) preamble with cylinder, sector number, etc., ii) data, iii) error
correction code. Controller produces a block of bytes and performs error correction if necessary and
then copies block into memory.
Access time =
seek time (time needed to move the arm to the cylinder - dominant)
+ rotational delay (time before the sector appears under the head)
+ transfer time (time to transfer the data)
Dominance of seek time leaves room for optimization. Error checking is done by the controllers.
Seek time (in ms) to get the arm over the track is difficult to determine exactly, but can be estimated as Ts = m*n + s
Ts = estimated seek time (ms)
m = a constant that depends on the drive
n = number of tracks crossed
s = startup time
Example
Read a file of size 256 sectors with
Ts = 20 ms
512 bytes/sector, 32 sectors/track
Suppose the file is stored as compactly as possible: all sectors on 8 consecutive tracks of 32 sectors
each (sequential storage). With a rotational delay of 8.3 ms (half of a 16.7 ms revolution), the first
track takes 20 + 8.3 + 16.7 = 45 ms; each of the remaining 7 tracks needs no seek and takes
8.3 + 16.7 = 25 ms, so the total is 45 + 7*25 = 220 ms = 0.22 s.
In case the access is not sequential but at random for the sectors, we get:
Time per sector = 20 + 8.3 + 0.5 = 28.8 ms
Total time for 256 sectors = 256 * 28.8 ms = 7.37 s
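This arithmetic can be checked with a short script (the sequential model assumes one initial seek, after which each track costs one rotational delay plus one full 16.7 ms revolution):

```python
# Disk-access arithmetic for the 256-sector file example. Figures from the
# text: seek 20 ms, rotational delay 8.3 ms (half of a 16.7 ms revolution),
# 32 sectors/track, so transfer time per sector ~= 16.7 / 32 ~= 0.5 ms.

seek_ms = 20.0
revolution_ms = 16.7
rot_delay_ms = 8.3                     # half a revolution, as in the text
transfer_ms = 0.5                      # time for one sector to pass the head
sectors, sectors_per_track = 256, 32
tracks = sectors // sectors_per_track  # 8 tracks

# Random access: every sector pays seek + rotational delay + transfer.
per_sector_ms = seek_ms + rot_delay_ms + transfer_ms        # 28.8 ms
random_total_s = sectors * per_sector_ms / 1000

# Sequential access: seek once, then read whole tracks back to back.
first_track_ms = seek_ms + rot_delay_ms + revolution_ms     # 45 ms
other_track_ms = rot_delay_ms + revolution_ms               # 25 ms
sequential_total_s = (first_track_ms + (tracks - 1) * other_track_ms) / 1000

print(round(random_total_s, 2), round(sequential_total_s, 2))  # 7.37 0.22
```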
It is important to obtain an optimal sequence for the reading of the sectors.
Optimisation
Heavy loaded disk allows for a strategy to minimize the arm movement. The situation is dynamic in
that the disk driver keeps a table of requested sectors per cylinder, e.g. while a request for track 11 is
being handled, requests for 1, 36, 16, 34, 9 and 12 arrive. Which one is to be handled after the
current request? There are 4 main disk optimisation algorithms:
FCFS – “First Come First Serve”
SSF – “Shortest Seek First”
“Elevator” or “SCAN”
“Circular” or “CSCAN”
First Come First Serve (FCFS) the total number of track crossings is:
|11-1|+|1-36|+|36-16|+|16-34|+|34- 9|+|9-12| = 111
Shortest seek time first (SSTF) (similar to process scheduling shortest job first) we gain 50%:
|11-12|+|12-9|+|9-16|+|16-1|+|1-34|+|34-36| = 61
Problem: starvation, arm stays in the middle of the disk in case of heavy load, edge cylinders are
poorly served, the strategy is unfair.
Lift algorithm, Elevator or SCAN: keep moving in the same direction until no requests ahead then
change direction:
|11-12|+|12-16|+|16-34|+|34-36|+|36-9|+|9-1|=60
Upper limit: 2 * number of tracks
Smaller variance is reached by moving the arm in one direction, always returning to the lowest
number at the end of the road: Circular Scan (CSCAN):
|11-12|+|12-16|+|16-34|+|34-36|+|36-1|+|1-9|=68
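The four strategies can be sketched and checked against the example figures (a minimal Python version; no tie-breaking subtleties arise with these requests):

```python
# Disk-arm scheduling: head at track 11 (moving upwards), pending requests
# for tracks 1, 36, 16, 34, 9 and 12, as in the worked example.

def fcfs(start, requests):
    """First Come First Serve: visit requests in arrival order."""
    total, pos = 0, start
    for r in requests:
        total += abs(pos - r)
        pos = r
    return total

def sstf(start, requests):
    """Shortest Seek (Time) First: always go to the nearest pending track."""
    pending, total, pos = list(requests), 0, start
    while pending:
        nearest = min(pending, key=lambda r: abs(pos - r))
        total += abs(pos - nearest)
        pos = nearest
        pending.remove(nearest)
    return total

def scan(start, requests):
    """Elevator: keep moving up until no requests ahead, then reverse."""
    up = sorted(r for r in requests if r >= start)
    down = sorted((r for r in requests if r < start), reverse=True)
    return fcfs(start, up + down)

def cscan(start, requests):
    """Circular scan: go up, then return to the lowest pending request."""
    up = sorted(r for r in requests if r >= start)
    low = sorted(r for r in requests if r < start)
    return fcfs(start, up + low)

reqs = [1, 36, 16, 34, 9, 12]
print(fcfs(11, reqs), sstf(11, reqs), scan(11, reqs), cscan(11, reqs))
# 111 61 60 68
```

The outputs reproduce the track-crossing totals computed above: 111, 61, 60 and 68.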
Disk Formatting
One thing we have not yet mentioned: if a file is
spread over consecutive tracks, it takes time to move the disk
arm by one track, and the disk sectors rotate during this time, so
we introduce a sector “skew”: sector zero is offset on each
track relative to the previous track.
Moreover, reading sectors consecutively requires a certain amount of speed from the hard disk
controller. The platters never stop spinning, and as soon as the controller is done reading all of, for
example, sector 1, it has little time before the start of sector 2 is under the head. Many older
controllers used with early hard disks did not have sufficient processing capacity to be able to do this.
They would not be ready to read the second sector of the track until after the start of the second
physical sector had already spun past the head, at which point it would be too late. If the controller is
slow in this manner, and no compensation is made in the controller, the controller must wait for
almost an entire revolution of the platters before the start of sector 2 comes around and it can read it.
Hence, the notion of interleaving sectors was introduced:
[Figure: sector numbering around the track with no interleaving, single interleaving and double interleaving.]
Possible Errors
Programming errors (non existing sector)
User program needs debugging
Volatile checksum error (dust particle)
Controller tries again
Permanent checksum error (bad block)
Block will be marked “bad” and replaced by a spare block (this may interfere with the
optimisation algorithm)
Seek error (the arm moved to the wrong track)
Mechanical problem; perform a RECALIBRATE or ask for maintenance
Controller error
Controller is a parallel system, can get confused, driver may perform a reset
Disk storage density is pushed to the limits by manufacturers, e.g. 5000 bits per mm. Defects WILL
always be present (hopefully few!). If a small defect, the ECC (Error Correction Code) can cope. If a
large defect, the ECC cannot cope, so the sector is remapped. If an error is found after writing, try
rereading the sector again and again (this sometimes works) – it might just be a dust speck.
Caches
Because waiting for sectors to spin around to the correct place takes time, why not cache a track?
Moreover, if you have enough memory, why not cache more than one track?
Driver caching of tracks:
Reading a whole track does not take long: the arm need not be moved, and the driver has to
wait for the sector to come around anyway
Disadvantage: driver has to copy the data using the CPU, while the controller may use DMA
RAID 0
All data (user and system) are distributed over the disks so that there is a good chance for
parallelism. Disk is logically a set of strips (blocks, sectors, etc). Strips numbered/assigned
consecutively to disks.
RAID 1
Data is duplicated (mirrored) on a second set of disks; a read can be satisfied by either copy, and a
failed drive is recovered simply by copying from its mirror.
RAID 2
No longer strips, but works on words or even on a byte basis. Synchronized disks, each I/O
operation is performed in a parallel way. Each byte split into 4 bit nibbles, then add a 3 bit Hamming
code to each one to form a 7 bit word which allows for correction of a single bit error (Bits 1,2,4 for
parity). The controller can correct without additional delay, giving very high data rates. It is still
expensive, as typically a very large number of disks is used – it is only worthwhile where frequent
errors can be expected.
RAID 3
Level 2 needs log2(number of disks) parity disks. Level 3 needs only one, for one parity bit. In case
one disk crashes, the data can still be reconstructed even on line (“reduced mode”).
RAID 2/3 achieve high data transfer rates, but perform only one I/O at a time, so response times
in transaction oriented environments are not so good.
RAID 4
Larger strips and one parity disk. Blocks are kept on one disk, allowing for parallel access by
multiple I/O requests. Writing penalty: when a block is written, the parity disk must be adjusted.
Parity disk may be a bottleneck. Good response times, less good transfer rates.
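Both the writing penalty and the recovery follow from XOR parity: new_parity = old_parity XOR old_data XOR new_data, so a single-block write costs two reads and two writes. A small sketch (Python, with made-up two-byte "disks"):

```python
# RAID 4 style XOR parity: the small-write update and the rebuild of a
# lost disk. The disk contents are invented for illustration.

def parity(blocks):
    """Parity block = XOR of the corresponding bytes of every data block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

disks = [b"\x0f\x0f", b"\xf0\xf0", b"\x33\x33"]    # three data disks
p = parity(disks)

# Small write: replace the block on disk 1 without touching disks 0 and 2;
# the parity disk is adjusted using old parity, old data and new data only.
old, new = disks[1], b"\xaa\xbb"
p = bytes(po ^ o ^ n for po, o, n in zip(p, old, new))
disks[1] = new
assert p == parity(disks)                          # parity still consistent

# Recovery: any lost disk is the XOR of the parity and the surviving disks.
lost = disks[2]
rebuilt = bytes(a ^ b ^ c for a, b, c in zip(p, disks[0], disks[1]))
assert rebuilt == lost
print("parity update and rebuild OK")
```

Every write touches the parity disk, which is why it can become a bottleneck.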
CD media
CDs have one spiral track, with 0.78 micron pits burnt with a laser. The pits are 1/4 wavelength deep. A
“1” is represented as a transition from pit to surface or vice versa; “0”s are pit floors or plateaus. A
single track is 5.6 km long, making 22,188 turns around the disk, or 500 turns per mm! Rotation speed
is 9 revs/sec on the inside and 3 revs/sec on the outside.
CDROMS have improved error checking - each byte Hamming encoded in 14 bits with two bits left
over. The 14 to 8 mapping for reading is done in hardware by lookup tables.
42 symbols form a 588 bit frame with 24 data bytes (192 bits). 98 frames form a CDROM sector.
Each sector starts with a preamble, then a 3 byte sector number for seeking purposes on the “spiral”.
The last byte is a mode. Seek is done by approximately calculating where on the spiral to go.
Clocks
Hardware
- Type 1: 50 Hz clocks (1 interrupt (clock tick) per voltage cycle)
o Simple, cheap, not very accurate, not very functional
- Type 2: programmable clocks, driven by a crystal oscillator, with two modes:
One-shot mode: the clock counts down from a register value once and waits for software to start it again.
Square-wave mode: the counter is automatically reloaded (generates periodic clock ticks)
A 1000 MHz clock with a 16 bit register can fix time intervals between 1 nanosecond and 65,535
nanoseconds (about 65.5 microseconds).
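A minimal check, in Python, of the interval range for a 1000 MHz clock with a 16-bit register (one tick lasts 1 ns, and the down-counter can hold at most 65,535):

```python
# Interval range of a programmable one-shot timer.

crystal_hz = 1_000_000_000          # 1000 MHz
tick_ns = 1e9 / crystal_hz          # 1 ns per tick
max_count = 2**16 - 1               # largest value in a 16-bit register

shortest_ns = 1 * tick_ns
longest_ns = max_count * tick_ns
print(shortest_ns, longest_ns)      # 1.0 65535.0
```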
Software
Software clocks are used for:
Administration of process time slices: each running process has a “time left” counter which is
decremented at each clock interrupt. At zero, call the scheduler.
Administration of CPU usage/accounting: a counter starts when the process starts and is part of the
“process environment”. It is stopped while handling an interrupt.
Watchdog timers
Profiling (program performance analysis, etc.)
A second clock is available for timer interrupts. It can cause interrupts at whatever rate a program
needs (specified by applications). No problems if interrupt frequency is low.
Soft timers avoid interrupts as the kernel checks for soft timer expiration before it exits to user mode.
Terminals
Serial RS232 terminals historically used for hardcopy printer terminals (teletypes), glass tty (glass
teletypes), mainframe intelligent terminals, etc.
Memory-mapped interfaces: graphical user interfaces which use a keyboard / mouse / bitmapped
display etc.
Network computers (X windows, SLIM networks)
Character Input
Keyboard driver collects keyboard input and passes it to user programs (when they need it)
Raw mode or “Non-Canonical”: Driver passes characters unchanged to software. Buffering is
limited to speed differences and application receives characters immediately e.g. EMACS editor.
Cooked mode or “Canonical”: Driver buffers one line until it is finished and handles corrections
made by the user while typing a line.
Often applications have the choice. Nowadays, window driven applications use raw mode at the
lowest level and perform buffering at the window level.
Keyboard driver transforms the key number into an ASCII character according to a table. Echoing is
(was) done by the OS, or the shell. May be confusing for the user i.e. program may be writing to the
screen (sometimes delayed) whilst the user is still typing.
Handling of tabs, backspaces, etc. was a typical problem with terminals. One problem survives: the
end-of-line character. Logically (from the typist’s viewpoint) one needs a CR to bring the cursor back
to the beginning of the line and a LF to go to the next one. These two characters are hidden behind
the ENTER key. The OS can decide how to represent end of line: in *nix it is the line feed only; in
DOS, carriage return plus line feed. LF is ASCII 10; CR is ASCII 13, which is displayed as “^M”.
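The two conventions, as raw bytes (Python):

```python
# End-of-line conventions: *nix uses LF alone, DOS/Windows uses CR + LF.

assert ord("\n") == 10      # LF (line feed)
assert ord("\r") == 13      # CR (carriage return, displayed as ^M)

unix_line = "hello\n"       # *nix: LF only
dos_line = "hello\r\n"      # DOS: CR followed by LF

print(unix_line.encode(), dos_line.encode())
# b'hello\n' b'hello\r\n'
```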
Serial (RS-232) and memory mapped approaches to terminals differ. Serial terminals have an output
buffer to which characters are sent until it is full or until a line ends. Once full, the real output is
initiated and the driver sleeps until interrupted. Memory mapped terminals can be accessed through
normal memory addressing procedures. Some characters receive a special treatment. The driver is
doing more screen manipulation. Special functions such as scrolling and animation may be done
through special registers (e.g. register with the position of the top line).
Mouse - feeds back 3 bytes of information up to 40 times per second, containing i) the change in X to
within 0.1 mm, ii) the change in Y to within 0.1 mm, iii) the button status.
Network Terminals
One can run X-windows on top of UNIX or other OS. X is just a windowing system, but for a
complete GUI, other layers are needed on top:
On starting an X program it opens a connection to one or more X servers (workstations - but could
be on the same computer). Four types of messages are sent over the connections:
a) Drawing commands (software to workstation) – typically one-way, no reply needed
b) Replies from the workstation to program queries
c) Events from the keyboard, mouse etc.
Each message is 32 bytes: byte 1 describes the event, the next 31 bytes carry additional information
Only messages that the program needs to know about are sent
Events are queued
d) Error messages
Power Management
There are two approaches to save power. Either turn off unused processes, especially I/O or, when
permitted, degrade performance if this uses less power. Powering down/dimming the screen is one
way to save power when not used much (time since last used thresholds, dimming by levels). Could
also fade down screen sectors not in use and/or reposition windows so they occupy fewer screen
sectors.
- CPU Power Reduction: power scales with the square of the voltage, so cutting the voltage in half
cuts the clock speed in half but cuts the power by a factor of four.
- Hard Disk Power Reduction: if the disk is not in use, spin it down, but many seconds are needed
to spin it up again. This can be worked around with caching, if the needed block is in RAM.
- Memory power down
o flush cache & switch off - reload from memory when needed
o dump memory to disk and hibernate
- Telling the programs to use less energy
o may mean poorer user experience
o e.g., change from colour output to black and white, less resolution or detail in an
image, drop frames/quality in multi-media
- Thermal Issues
o overheating - switch on fan
o for a laptop reduce screen backlighting, slow CPU etc
- Batteries
o smart batteries - voltages, current, drain rate etc sent to OS
FILES
Files allow data to be stored permanently and passed between processes. They let us store large
volumes of data and allow more than one process to access the data at the same time.
Different operating systems have different file naming conventions. MS-DOS only allows an eight
character filename (and a three character extension). This limitation also applies to Windows 3.1.
Most UNIX systems allow file names up to 255 characters in length. Modern Windows allows up to
255 characters too, subject to a maximum of 260 characters if one includes the pathname. There
are restrictions as to the characters that can be used in filenames e.g. ? and * are forbidden. Some
operating systems distinguish between upper and lower case characters. To MS-DOS, the filenames
ABC, abc, and AbC all represent the same file; UNIX sees these as three different files.
Filenames are made up of two parts separated by a full stop. The part of the filename up to the full
stop is the actual filename. The part following the full stop is often called a file extension. In MS-
DOS the extension is limited to three characters. UNIX and Windows 95/NT allow longer extensions.
They are used to tell the operating system what type of data the file contains. It associates the file
with a certain application. Using tools provided with the operating system the user is able to change
the file associations. UNIX allows a file to have more than one extension associated with it.
Files are stored as a sequence of bytes. It is up to the program that accesses the file to interpret the
byte sequence.
Files Structures
Four types of files:
- Byte Sequence
- Record Sequence (fixed record)
- Record Sequence (variable record size)
- Tree Structures
Some files are not ASCII, but binary. Some of these may be executable. In UNIX, an executable file
consists of five parts, a header (comprising a special number to identify file as an executable, the
sizes of the sections outlined below and an execution address), text, data, relocation bytes and a
symbol table.
Directories
Allow like files to be grouped together and allow operations to be performed on a group of files which
have something in common. For example, copy the files or set one of their attributes. They allow
files to have the same filename (as long as they are in different directories). This allows more
flexibility in naming files. A typical directory contains a number of entries, one per file. All the
data (filename, attributes and disc addresses) can be stored within the directory. Alternatively, just
the filename can be stored in the directory together with a pointer to a data structure which contains
the other details.
The simplest scheme is a single “root directory”, used on early computers. The entries (the letters in
the original figure) refer to file owners, not directory names, so owners could access or overwrite each
other’s filenames. A two-level directory gives each user (A, B, C) a directory of their own but prevents
them from creating sub-directories. Therefore, a hierarchical directory system is now used by most
systems.
Disks are often divided up into partitions with an independent file system on each, e.g. the C: and D:
drives on Windows. Sector 0 of a disk is the Master Boot Record (MBR), which is used at boot-up. A
partition table stores the start and end addresses of each disk partition.
Contiguous allocation
Allocate N contiguous blocks to a file. If a file was 100K in size and the block was 1K then 100
contiguous blocks would be required. This is very simple to implement as keeping track of the blocks
allocated to a file is reduced to storing the first block that the file occupies and its length. The
performance of such an implementation is good as the file can be read as a contiguous file. The read
write heads have to move very little, if at all. You will never find a filing system that performs as well.
However, the operating system does not know, in advance, how much space the file will occupy.
This leads to fragmentation (not a problem for CD-ROMs, which are written in one go). One could run
a defragmentation process periodically, but this is expensive.
Blocks of a file represented using linked lists. All that needs to be held is the address of the first
block that the file occupies. Each block contains data and a pointer to the next block. Using this
scheme, every block can be used, unlike a scheme that insists that every file is contiguous. No
space is lost due to external fragmentation (although there is internal fragmentation within the file,
which can lead to performance issues). The directory entry only has to store the first block number
and the rest of the file can be found from there. The size of the file does not have to be known
beforehand (unlike a contiguous file allocation scheme). When more space is required for a file, any
block can be allocated (e.g. the first block on the free block list).
However, random access is very slow (as it needs many disk reads to access a random point in the
file). Space is lost within each block due to the pointer. This does not allow the number of bytes to
be a power of two. This is not fatal, but does have an impact on performance. Reliability could be a
problem. It only needs one corrupt block pointer and the whole system might become corrupted (e.g.
writing over a block that belongs to another file).
If we instead keep the pointers in an index in memory, no space is wasted in the blocks, and random
access becomes practical because the chain can be followed in memory. The main disadvantage is
that the entire table must be in memory all the time. For a large disc with, say, 500,000 1K blocks
(500MB) the table will have 500,000 entries.
Using an Index
We artificially divide the disk space into blocks where a block size is deemed to be of the “order of”
the median size of files. In the computer’s memory we have an index of pointers where each
element in the index refers to a physical disk block. Say file B starts at disk block 11 and is more than
1 block in size. Entry 11 of the in-memory index (the start block for file B) might point to disk block 2
next; we go to entry 2 of the index and find another pointer, to 14; we go to entry 14 and get pointed
to disk block 8; at entry 8 we find the next pointer is 0. The “0” pointer indicates we have all the
pointers needed. This table is called a File Allocation Table (FAT).
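The chain walk above can be sketched as follows (Python; the FAT contents reproduce only the example chain from the text):

```python
# FAT walk: fat[i] holds the number of the block that follows block i in
# the file; 0 marks end-of-file, as in the example (11 -> 2 -> 14 -> 8).

fat = {11: 2, 2: 14, 14: 8, 8: 0}

def blocks_of(fat, start):
    """Follow the chain from the file's first block until the 0 marker."""
    chain, block = [], start
    while block != 0:
        chain.append(block)
        block = fat[block]
    return chain

print(blocks_of(fat, 11))   # [11, 2, 14, 8]
```

Note that the chain is followed entirely in memory; the disk blocks themselves are only read when needed.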
I–Nodes
The third method of accessing file blocks utilizes a data structure called an i-node (index node). All
the attributes for the file are stored in an i-node entry, which is loaded into memory when the file is
opened. The i-node also contains a number of direct pointers to disc blocks. The advantage over
linked files is that if “k” files are permitted to be open at once and each i-node uses “n” bytes, then
only “nk” bytes of memory are needed. In other words, where a table holding linked lists is proportional
to the disk size it is representing, a table holding i-nodes is proportional to the number of files that may
be open at once. Now, if
we have say twelve blocks indexed in an i-node, what happens if we want to address more blocks,
i.e. the file grows in size? There are three additional indirect pointers. These pointers point to further
data structures which eventually lead to a disk block address. The first of these pointers is a single
level of indirection, the next pointer is a double indirect pointer, and the third pointer is a triple indirect
pointer.
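To see how far the indirect pointers reach, assume (these parameters are illustrative, not from the notes) twelve direct pointers, 1K blocks and 4-byte block addresses, so 256 pointers fit in one block:

```python
# Maximum file size reachable through direct + single/double/triple
# indirect pointers, under the assumed parameters above.

block_size = 1024
pointer_size = 4
ptrs_per_block = block_size // pointer_size     # 256 pointers per block
direct = 12

single = ptrs_per_block                         # 256 blocks
double = ptrs_per_block ** 2                    # 65,536 blocks
triple = ptrs_per_block ** 3                    # 16,777,216 blocks

max_blocks = direct + single + double + triple
print(max_blocks, max_blocks * block_size // 2**30)  # 16843020 16
```

With these (assumed) figures the scheme addresses about 16 GiB, even though the i-node itself holds only fifteen pointers.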
Implementing Directories
The ASCII path name is used to locate the correct directory entry. The directory entry contains all
the information needed. For example, for a contiguous allocation scheme, the directory entry will
contain the first disc block. The same is true for linked list allocations. For an i-node implementation
the directory entry contains the i-node number.
A directory entry maps an ASCII filename to the disc blocks. The directory entry may also contain
the attributes of the file (I-node) or maybe a pointer to a data structure.
A UNIX system directory entry just contains an i-node number and a filename. Unlike MS-DOS, all its
attributes are stored in the i-node so there is no need to hold this information in the directory entry.
All the i-nodes have a fixed location on the disc so locating an i-node is a very simple (and fast)
function.
How does UNIX locate a file when given an absolute path name? Assume the path name is
/user/gk/ops/notes. The procedure operates as follows:
- The system locates the root directory i-node. As we said above, this is easy as the entry is on
a fixed place on the disc
- Next it looks up the first path entry (user) in the root directory, to find the i-node number of the
file /user
- Now it has the i-node number for /user it can access the i-node data to locate the next i-node
number (i.e. for /gk)
- This process is repeated until the actual file has been located.
- Accessing a relative path name is identical except that the search is started from the current
working directory.
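The lookup procedure can be sketched with a toy i-node table (Python; the i-node numbers and directory contents are invented, with i-node 2 playing the fixed, well-known root):

```python
# Toy version of the UNIX path walk for /user/gk/ops/notes. A "directory
# i-node" here maps names to i-node numbers; i-node 2 is the root.

inodes = {
    2: {"user": 10},             # root directory (fixed location)
    10: {"gk": 11},              # /user
    11: {"ops": 12},             # /user/gk
    12: {"notes": 13},           # /user/gk/ops
    13: "contents of the file",  # /user/gk/ops/notes (a plain file)
}

def namei(path, cwd_inode=2):
    """Resolve a path to an i-node number, starting from the root for an
    absolute path or from the current working directory for a relative one."""
    inode = 2 if path.startswith("/") else cwd_inode
    for component in path.strip("/").split("/"):
        inode = inodes[inode][component]    # look the name up in the directory
    return inode

print(namei("/user/gk/ops/notes"))          # 13
print(namei("ops/notes", cwd_inode=11))     # relative lookup, also 13
```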
[Figure: hard links - the situation prior to linking, after the link is created, and after the original owner removes the file.]
Consider a file owned by C to which B has made a hard link. When C removes the file, the OS cannot
really delete it, because B still has a link to it; C still owns it and it still shows up in C's disk usage. B
has a link to the file but does not own it. Only when B also deletes the file will the count go to zero
and the file be deleted. The second option is to use symbolic linking, e.g. in UNIX
ln –s original_file symbolic_copy
If the original file is deleted (or renamed), the symbolic link remains but can no longer access the
original file (it dangles). If a file with the original name is brought back into the same directory,
everything is OK again.
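The reference-count behaviour of hard links described above can be observed directly. This sketch assumes a POSIX-style system, where Python's `os.link` creates a hard link and `st_nlink` reports the link count:

```python
import os
import tempfile

def hard_link_demo():
    """Show that a file's data survives until its link count drops to zero."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original_file")
        with open(original, "w") as f:
            f.write("shared data")

        copy = os.path.join(d, "hard_copy")
        os.link(original, copy)                   # second directory entry, same i-node
        count_after_link = os.stat(copy).st_nlink

        os.remove(original)                       # the original owner removes their name
        count_after_remove = os.stat(copy).st_nlink
        with open(copy) as f:
            data = f.read()                       # the data is still reachable via the link
    return count_after_link, count_after_remove, data

print(hard_link_demo())    # -> (2, 1, 'shared data')
```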
Whatever block size we choose, every file must occupy at least one block. If we pick a very large
block size, small files waste most of their block, since a block can hold data from only one file. If
the block size is smaller than most files, then files are split over several blocks and we spend more
time seeking. There is a compromise between block size, fast access and wasted space. The usual
compromise is a block size of 512 bytes, 1 KB or 2 KB.
Some of the free blocks can be used to hold disc block numbers that are free. But once you do this
they are no longer free! The blocks that contain the free block numbers are linked together so we
end up with a linked list of free blocks.
We can calculate the maximum number of blocks we need to hold a complete free list (i.e. an empty
disc) using the following reasoning: Assume that we need a 16-bit number to store a block number
(that is block numbers can be in the range 0 to 65535) and that we are using a 1K block size. A
block can hold 512 block addresses. That is, 1024*8 / 16. Assume that one of the addresses is used
as a pointer to the next block that contains free block numbers, leaving 511 usable addresses per
block. For a 20 MB disc (20480 blocks) we need, at most, 41 blocks to hold all the free block
numbers: 20480 / 511 ≈ 40.1, rounded up to 41.
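The arithmetic above can be checked with a short calculation, using the figures the text assumes (16-bit block numbers, 1 KB blocks, a 20 MB disc):

```python
import math

block_size_bytes = 1024
address_bits = 16

# How many 16-bit block numbers fit in one 1 KB block: 1024 * 8 / 16.
addresses_per_block = block_size_bytes * 8 // address_bits   # 512
# One slot in each block points to the next block in the free list.
usable_per_block = addresses_per_block - 1                   # 511

total_blocks = 20 * 1024                                     # 20 MB disc / 1 KB blocks = 20480
blocks_for_free_list = math.ceil(total_blocks / usable_per_block)
print(blocks_for_free_list)                                  # -> 41
```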
An alternative option is to use a bit map to keep track of the free blocks. That is, there is a bit for
each block on the disc. If the bit is 1 then the block is free. If the bit is zero, the block is in use. To
put it another way, a disc with n blocks requires a bit map with n entries.
Consider a 20MB disc with 1K blocks, then we can calculate the number of blocks needed to hold the
disc map. A 20 MB disc has 20480 (20 * 1024) blocks. We need 20480 bits for the map, or 2560
bytes. A block can store 1024 bytes so we need 2.5 blocks to hold a complete bit map of the disc.
This would obviously be rounded up to 3.
Generally a bit map requires fewer blocks than a linked list. Only when the disc is nearly full (i.e.
few free blocks remain) does the linked-list implementation require fewer blocks.
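The two schemes can be compared with the same figures (20 MB disc, 1 KB blocks). The crossover claim follows because the free list shrinks as the disc fills, while the bit map stays a fixed size:

```python
import math

block_size_bytes = 1024
total_blocks = 20 * 1024                    # 20 MB disc with 1 KB blocks

# Bit map: one bit per block, regardless of how full the disc is.
bitmap_bytes = total_blocks // 8            # 2560 bytes
bitmap_blocks = math.ceil(bitmap_bytes / block_size_bytes)
print(bitmap_blocks)                        # -> 3 (2.5 rounded up)

# Linked free list: 511 free-block numbers per list block, so its size
# depends on how many blocks are currently free.
def free_list_blocks(free_blocks):
    return math.ceil(free_blocks / 511) if free_blocks else 0

print(free_list_blocks(total_blocks))       # empty disc  -> 41 blocks
print(free_list_blocks(1000))               # nearly full -> 2 blocks, beating the bit map
```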
However, suppose only a small amount of memory is set aside for tracking free blocks, so that the
operating system can hold just one block of it in memory at a time, and the disc is nearly full. With
a bit map scheme there is then a good chance that the in-memory part of the map shows every block
in use, so a disc access is needed to fetch the next part of the bit map. With a linked-list scheme,
once a block containing pointers to free blocks has been brought into memory, 511 blocks can be
allocated before another disc access is needed.
Disk Quotas
Quotas are needed to prevent users unfairly over-using limited disc space on a multi-user system.
When a user opens a file, the open-file table entry carries a pointer to the user's record in a quota
table. Every time the file is changed, the user's quota record is updated. Soft quota limits can be
exceeded, with warnings (e.g. at login); hard quota limits cannot!
Backing Up
A “physical dump” copies the disc block by block, starting at block 0. A “logical” or “incremental”
dump backs up only the changes made since the last backup; the logical approach makes it messy to
recover a file from more than one backup ago. The logical dumping algorithm uses bit maps of
i-node numbers and proceeds in stages:
- a) every modified file, and every directory (modified or not), is initially marked
- b) the directory tree is walked recursively, unmarking any directory with no modified directories
or files in it or below it
- c) the i-nodes still marked after (b) are scanned in numerical order and the marked directories are dumped
- d) the i-nodes are scanned again in numerical order and the marked files are dumped
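The marking phases of the logical dump can be sketched on a toy file system; the dict-based tree below is purely illustrative:

```python
# Toy file system keyed by i-node number (invented for illustration).
fs = {
    1: {"type": "dir",  "children": [2, 3], "modified": False},
    2: {"type": "dir",  "children": [4],    "modified": False},
    3: {"type": "file", "modified": False},
    4: {"type": "file", "modified": True},
}

# Phase (a): mark every modified file and, initially, every directory.
marked = {i for i, n in fs.items() if n["type"] == "dir" or n["modified"]}

def has_modified_below(inode):
    """True if this node, or anything beneath it, is a modified file."""
    node = fs[inode]
    if node["type"] == "file":
        return node["modified"]
    return any(has_modified_below(c) for c in node["children"])

# Phase (b): unmark directories with nothing modified in or below them.
for i, n in fs.items():
    if n["type"] == "dir" and not has_modified_below(i):
        marked.discard(i)

# Phases (c) and (d): dump marked directories, then marked files, in i-node order.
dump_order = ([i for i in sorted(marked) if fs[i]["type"] == "dir"] +
              [i for i in sorted(marked) if fs[i]["type"] == "file"])
print(dump_order)   # -> [1, 2, 4]: both ancestor dirs, then the one modified file
```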
Consistency Check
Possible file system recovery states after a crash include a) consistent, b) missing block (harmless),
c) duplicate block in free list (rebuild free block list), d) duplicate data block (not a happy situation -
which block is garbled?) Utility programs to check for inconsistencies include UNIX’s fsck and
Windows’ scandisk.
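The core of such a consistency check is two counters per block, one for occurrences in files and one for occurrences in the free list. This sketch uses toy data, chosen to trigger the inconsistent states listed above:

```python
from collections import Counter

TOTAL_BLOCKS = 8
# Toy data: block 3 appears in two files; block 5 is listed free twice;
# blocks 6 and 7 appear nowhere at all.
blocks_in_files = [1, 2, 3, 3]
free_list = [0, 4, 5, 5]

used = Counter(blocks_in_files)
free = Counter(free_list)

for b in range(TOTAL_BLOCKS):
    if used[b] + free[b] == 0:
        print(b, "missing block (harmless): add it to the free list")
    elif free[b] > 1:
        print(b, "duplicate in free list: rebuild the free list")
    elif used[b] > 1:
        print(b, "duplicate data block: which copy is garbled?")
```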
Another method to improve performance is the “Block Read Ahead” method. This involves simply
reading ahead in the hope that the user or software will request the next block in sequence. It works
well for sequential files but not for random-access files. Yet another way to improve file-access
performance is to keep the blocks belonging to a file close together: if i-nodes are all placed at the
start of the disc, a long seek is needed to get from an i-node to its associated data blocks, but if
the disc is divided into cylinder groups, each with its own i-nodes, the associated data blocks are
nearby and seek time is reduced.
In Windows 98, long filenames are accepted. For backwards compatibility, the 10 previously
reserved bytes of the MS-DOS directory entry are now used to support filenames longer than 8+3
characters. The solution is to give each file two names. When a file is created whose name breaks
the 8+3 rules, Win98 invents an 8+3 base name: the first six letters (only) are converted to
uppercase and “~1” is appended, or “~2”, “~3”, etc. if that base name is already taken. The long
filename is stored in directory entries preceding the MS-DOS filename.