
Operating Systems Handouts

Introduction to Operating Systems (I)


About These Handouts
These notes are based upon (Tanenbaum, 2001, 2nd Ed.; 1992 1st Ed.) and where applicable will
point you to the appropriate sections in that book in case you want to read about the subject in a
little more detail. The main topics discussed in the handout are as follows:

Introduction
What is an Operating System?
History of Operating Systems
References

What is an Operating System?


Introduction
This section is based on (Tanenbaum, 2001, p1-6; 1992, p1-5)

Before we look at what an operating system does, we ought to know what an operating
system is.

If we just build a computer, using its basic physical components, then we end up with a
lot of assembled metal, plastic and silicon. In this state the computer is useless but could still go
on display as a piece of modern art! To turn it into one of the most useful tools known to man we
need software. We need applications that allow us to write letters, write software, perform
numerical modeling, calculate cash flow forecasts and so on.

But, if all we have are just the applications, then each programmer has to deal with the
complexities of the hardware. If a program requires data from a disc, the programmer would need
to know how every type of disc worked and then be able to program at a low level in order to
extract the data. In addition, the programmer would have to deal with all the error conditions that
could arise. For example, it is a lot easier for a programmer to say READ NEXT RECORD than
have to worry about: spinning the motor up, moving the read/write heads, waiting for the correct
sector to come around and then reading the data, and eventually de-spinning the motor when it
has not been used for a while (N.B. this is a very simplified view of what really happens).

It was clear, from an early stage in the development of computers, that there needed to be
a “layer of software” sitting between the hardware and the applications, to hide such complexities
from the user, and to hide the ‘breakable’ parts of the computer from human error or stupidity. An
example of stupidity, or human error, might be to send instructions to a motor to spin up and then
de-spin a disk drive several times per second. This would not be a good thing to let users
knowingly, or unknowingly, do, as the motor might burn out.


All this led to the following “layered” view of the computer:

(Figure: layers of a computer system, from top to bottom)

    Applications:      word processor, spreadsheet, accounting application
    System programs:   compilers, editors, command interpreter; operating system
    Hardware:          machine language; microprogramming; physical devices

1) The bottom layer of the hardware consists of the integrated circuits, the cathode ray tubes, wires and
everything else that electrical engineers use to build physical devices such as CD-ROM drives, keyboards,
etc.

2) The next layer of hardware, microprogramming, is actually software providing basic operations that
allow communication with the physical devices. This software is normally held in Read Only Memory (ROM)
and hence is also known as firmware, because it cannot easily be changed.

3) The machine language layer defines the instruction set that is available to the computer. Again, this
layer is really software but it is considered as hardware as the machine language will form part of the
package supplied by the hardware manufacturer.

4) The operating system is a layer between this hardware and the software that we use. It allows
programmers to access the hardware in a user-friendly way. Furthermore, it provides a layer of abstraction
from the hardware, e.g. we can issue a print command without worrying about how the printer is
physically connected to the computer. In fact, we can even have different operating systems running on the
same hardware (e.g. DOS, Windows and UNIX) so that we may utilize the hardware using an operating
system that suits us.

5) On top of the operating system sits the system software. This is the software that allows us to start doing
practical things with the computer, but does not directly allow us to use the computer for anything that is
useful in the real world (this is a broad statement which we could argue about, but system software really
only allows us to use the computer more effectively).
It is important to realize that system software is not part of the operating system. However, much
system software is supplied by the computer manufacturer that provided the operating system. But system
software can also be written by programmers, and many large companies have teams of programmers (called
“system programmers”) whose main aim is to make the operating system easier to use for other people
(typically programmers) in the organization.
The main difference between the operating system and system software is that the operating
system runs in kernel (or supervisor) mode, whereas system software and applications run in user mode. This
means that the operating system stops user programs directly accessing the hardware.

Hence, for safety, you cannot, for example, write your own disc interrupt handler to replace the one in the
operating system. However, you can write your own (say) command shell and replace the one supplied with the computer.

6) Finally, at the top level we have the application programs that, at last, allow us to do something really
useful e.g. word processors.

Two views of an operating system


Above, we viewed the operating system in the context of how it fits within the overall structure
of the computer. Now we are going to look at another two views of an operating system.

(1) One view considers the operating system as a resource manager. In this view the operating
system is seen as a way of providing the users of the computer with the resources they need at any given
time. Some of these resource requests may not be able to be met (memory, CPU usage etc.) but, as we shall
see later in the course, the operating system is able to deal with scheduling problems such as these.
Other resources have a layer of abstraction placed between them and the physical resource. An example of
this is a printer. If your program wants to write to a printer, in this day and age, it is unlikely that the
program will be directly connected to a physical printer. The operating system will step in, take the
print requests and spool the data to disc. It will then schedule the prints, making the best use of the printer
possible. During all of this it will appear to the user and the program as if their print requests are going to a
physical printer.

(2) Another view of an operating system sees it as a way of not having to deal with the complexity
of the hardware. In (Tanenbaum, 1992) the example is given of a floppy disc controller (using an NEC
PD765 controller chip). This chip has sixteen commands which allow the programmer to read and write
data, move the disc heads, format tracks etc. Just carrying out a simple READ or WRITE command
requires thirteen parameters, which are packed into nine bytes. The operation, when complete, returns 23
status and error fields packed into seven bytes. Now if you think that is complicated, then consider
concerning ourselves with: 1) whether the floppy disc is spinning, 2) what type of recording method we
should use, 3) the fact that hard discs are just as complicated but work differently. It is all these
complexities that the operating system conveniently hides from us simple minded users. So in this view of
the machine, the operating system can be seen as an extended machine or a virtual machine.

History of Operating Systems


In this section we take a brief look at the history of operating systems (Tanenbaum, 2001, p6-18; 1992, p5-
12) – which is almost the same as looking at the history of computers. If you are particularly interested in
the history of computers you might like to read (Levy, 1994). Although the title of the book suggests
activities of an illegal nature, “hacking” in fact used to refer to people who had intimate knowledge of
computers and who were addicted to using them and extending their knowledge.

You are probably aware that Charles Babbage is credited with designing the first digital computer, which
he called the Analytical Engine. It is unfortunate that he never managed to build the computer as, being of
a mechanical design, the technology of the day could not produce the components to the needed precision.
Of course, Babbage’s machine did not have an operating system, but it would have been incredibly useful all
the same for its era, for example for generating nautical navigation tables.

First Generation (1945-1955)


Like many developments, the first digital computer was developed due to the motivation of war. During the
second world war many people were developing automatic calculating machines. For example
• By 1941 a German engineer, Konrad Zuse, developed the “Z3” computer that helped in the design of
airplanes and missiles.

• In 1943, the British had built a code breaking computer called Colossus which decoded German
messages (in fact, Colossus only had a limited effect on the development of computers as it was not a
general purpose computer – it could just break codes – and its existence was kept secret until long after
the war ended).
• By 1944, Howard H. Aiken, working with IBM, had built a large electro-mechanical calculator that created
ballistic charts for the US Navy. This computer contained about 500 miles of wiring and was about
half as long as a football field. Called the Harvard IBM Automatic Sequence Controlled Calculator
(Mark I, for short), it took between three and five seconds to do a calculation and was inflexible in that the
sequence of calculations could not change. But it could carry out basic arithmetic as well as more
complex equations.
• ENIAC (Electronic Numerical Integrator and Computer) was developed by John Presper Eckert and
John Mauchly. It consisted of 18,000 vacuum tubes, 70,000 soldered resistors and five million soldered
joints. It consumed so much electricity (160 kW) that the lights in an entire section of Philadelphia
dimmed whilst it was running. ENIAC was a general purpose computer that ran about 1,000 times faster than the
Mark I.
• In 1945 John von Neumann designed the Electronic Discrete Variable Automatic Computer (EDVAC),
which had a memory that held a program as well as data. In addition, the CPU allowed all computer
functions to be coordinated through a single source. The UNIVAC I (Universal Automatic Computer),
built by Remington Rand in 1951, was one of the first commercial computers to make use of these
advances.

These first generation computers filled entire rooms with thousands of vacuum tubes. Like the Analytical
Engine they did not have an operating system; they did not even have programming languages, and
programmers had to physically wire the computer to carry out their intended instructions. Programmers
also had to book time on the computer, as each programmer had to have dedicated use of the machine.

Second Generation (1955-1965)


Vacuum tubes proved very unreliable and a programmer, wishing to run a program, could quite easily
spend all his or her time searching for and replacing tubes that had blown. The mid fifties saw the
development of the transistor which, as well as being smaller than the vacuum tube, was much more reliable.
It now became feasible to manufacture computers that could be sold to customers willing to part with their
money. Of course, the only people who could afford computers were large organisations, who also needed large
air conditioned rooms in which to place them.
Now, instead of programmers booking time on the machine, the computers were under the control of
computer operators. Programs were submitted on punched cards that were placed onto a magnetic tape.
This tape was given to the operators who ran the job through the computer and delivered the output to the
expectant programmer.

As computers were so expensive methods were developed that allowed the computer to be as productive as
possible. One method of doing this (which is still in use today) is the concept of batch jobs. Instead of
submitting one job at a time, many jobs were placed onto a single tape and these were processed one after
another by the computer. The ability to do this can be seen as the first real operating system (although, as
we said above, depending on your view of an operating system, much of the complexity of the hardware
had been abstracted away by this time).

Third Generation (1965-1980)


The third generation of computers is characterised by the use of Integrated Circuits as a replacement for
transistors. This allowed computer manufacturers to build systems that users could upgrade as necessary.
IBM, at this time introduced its System/360 range and ICL introduced its 1900 range (this would later be
updated to the 2900 range, the 3900 range and the SX range, which is still in use today).

Up until this time, computers were single tasking. The third generation saw the start of multiprogramming.
That is, the computer could give the illusion of running more than one task at a time. Being able to do this
allowed the CPU to be used much more effectively. When one job had to wait for an I/O request, another
program could use the CPU.
The concept of multiprogramming led to a need for a more complex operating system. One was now
needed that could schedule tasks and deal with all the problems that this brings (which we will be looking
at in some detail later in the course).
In implementing multiprogramming, the system was confined by the amount of physical memory that was
available (unlike today where we have the concept of virtual memory).

Another feature of third generation machines was that they implemented spooling. This allowed reading of
punch cards onto disc as soon as they were brought into the computer room. This eliminated the need to
store the jobs on tape, with all the problems this brings.
Similarly, the output from jobs could also be stored to disc, thus allowing programs that produced output to
run at the speed of the disc, and not the printer.

Compared to first and second generation machines, third generation machines were far superior,
but they did have a downside. Up until this point programmers were used to giving their job to an operator
(in the case of second generation machines) and watching it run (often through the computer room door,
which the operator kept closed but which allowed the programmers to press their noses up against the glass).
The turnaround of jobs was fairly fast.

This changed. With the introduction of batch processing the turnaround could be hours if not days. This
problem led to the concept of time sharing. This allowed programmers to access the computer from a
terminal and work in an interactive manner.

Obviously, with the advent of multiprogramming, spooling and time sharing, operating systems had to
become a lot more complex in order to deal with all these issues.

Fourth Generation (1980-present)


The late seventies saw the development of Large Scale Integration (LSI). This led directly to the
development of the personal computer (PC). These computers were (originally) designed to be single user,
highly interactive and provide graphics capability. One of the requirements for the original PC produced by
IBM was an operating system and, in what is probably regarded as the deal of the century, Bill Gates
supplied MS-DOS on which he made his fortune. In addition, mainly on non-Intel processors, the UNIX
operating system was being used.

It is still (largely) true today that there are “mainframe” operating systems (such as VME which runs on
ICL mainframes) and “PC” operating systems (such as MS-Windows and UNIX), although the distinctions
are starting to blur. For example, you can run a version of UNIX on ICL’s mainframes and, similarly, ICL
were planning to make a version of VME that could be run on a PC.

Fifth Generation (Sometime in the future)


If you look through the descriptions of the computer generations you will notice that each have
been influenced by new hardware that was developed (vacuum tubes, transistors, integrated circuits and
LSI). The fifth generation of computers may be the first that breaks with this tradition and the advances in
software will be as important as advances in hardware. One view of what will define a fifth generation
computer is one that is able to interact with humans in a way that is natural to us. No longer will we use
mice and keyboards but we will be able to talk to computers in the same way that we communicate with
each other. In addition, we will be able to talk in any language and the computer will have the ability to
convert to any other language. Computers will also be able to reason in a way that imitates humans.

Just being able to accept (and understand!) the spoken word and carry out reasoning on that data requires
many things to come together before we have a fifth generation computer. For example, advances need to
be made in AI (Artificial Intelligence) so that the computer can mimic human reasoning. It is also likely
that computers will need to be more powerful. Maybe parallel processing will be required. Maybe a
computer based on a non-silicon substance may be needed to fulfill that requirement (as silicon has a
theoretical limit as to how fast it can go). This is one view of what will make a fifth generation computer.
At the moment, as we do not have any, it is difficult to provide a reliable definition.

Another View
The view of how computers have developed, with regard to where the generation gaps lie, differs slightly
depending on who you ask. Ask somebody else and they might agree with the slightly amended
model below. Most commentators agree on what constitutes the first generation and that those machines are
characterised by having been developed during the war and using vacuum tubes. Similarly, most people agree
that the transistor heralded the second generation. The third generation came about because of the
development of the IC and operating systems that allowed multiprogramming. But, in the model above, we
stated that the third generation ran from 1965 to 1980. Some people would argue that the fourth generation
actually started in 1971 with the introduction of LSI, then VLSI (Very Large Scale Integration) and then
ULSI (Ultra Large Scale Integration). Really, all we are arguing about is when the PC revolution started.
Was it in the early 70’s when LSI first became available? Or was it in 1981, when the IBM PC was
launched?

Case Study
To show, via an example, how an operating system developed, we give a brief history of ICL’s mainframe
operating systems.

One of ICL’s first operating systems was known as manual exec (exec being short for executive). It ran on ICL's 1900
mainframe computers, provided a level of abstraction above the hardware and also allowed multi-
programming. However, it was very much a manual operating system. The operators had to load and run
each program, using commands such as these.

LO#RA15#REP3
GO#RA15 21

The first instruction told the computer to load the program called RA15 from a program library called
REP3. This loaded the program from disc into memory. The “GO 21” instruction told the program to start
running, using entry point 21. This (typically) told the program to read one or more punched cards from the card
reader, which held information to control the program.

The important point is that the computer operator had control over every program in the computer. It had to
be manually loaded into memory, initiated and finally deleted from the memory of the computer (which
was typically 32K). In between, any prompts had to be dealt with. This might mean allowing the computer
to use tape decks, allowing the program to print special stationery or dealing with events that were unusual.

ICL then brought out an operating system they called GEORGE (GEneral ORGanisational Environment).
The first version was called George 1 (G1); G2 and G2+ quickly followed. The idea behind G1/G2/G2+ was
that it ran on top of the existing executive, so it was not an operating system as such (in the same way that
Windows 3.1 is not a true operating system, as it is only a GUI that runs on top of DOS).
What G2+ (we’ll ignore the previous versions for now) allowed you to do was submit jobs to the machine
and then G2+ would schedule those jobs and process them accordingly. Some of the features of G2+
included:

• It allowed you to batch many programs into a single job. For example, you could run a program that
extracted data from a masterfile, run a sort and then run a print program to print the results. Under
manual exec you would need to run each program manually. It was not unusual to have a typical job
process twenty or thirty separate programs.

• You could write parameterised macros (or JCL – Job Control Language) so that you could automate
tasks.
For example, you could capture prompts that would normally be sent to the operator and have the
macro answer those prompts.
• You could provide parameters at the time you submitted the job so that the jobs could run without user
intervention.
• You could submit many jobs at the same time so that G2+ would run them one after another.
• You could adjust the scheduling algorithm (via the operator's console) so that an important job could be
run next – rather than waiting for all the jobs in the input queue to complete.
• You could inform G2+ of the requirements of each job so that it would not run (say) two jobs which
both required four tape decks when the computer only had six tape decks.

Under G2+, the operators still looked after individual jobs (albeit, they now consisted of several programs).
When ICL released George 3 (G3) and later G4, all this changed. The operators no longer looked after
individual jobs. Instead they looked after the system as a whole. Jobs could now be submitted via
interactive terminals. Whereas the operators used to submit the jobs, this role was now typically carried out
by a dedicated scheduling team who would set up the workload that had to be run over night, and would set
up dependencies between the jobs. In addition, development staff would be able to issue their own batch
jobs and also run jobs in an interactive environment. If there were any problems with any of the jobs, the
output would either go to the development staff or to the technical support staff where the problem would
be resolved and the job resubmitted.

Operators, under this type of operating system, were in some people's opinion little more than “tape
monkeys”, although the amount of technical knowledge held by the operators varied greatly from site to
site.

In addition to G3 being an operating system in its own right, G3 also had the following features:
• To use the machine you had to run each job under a user (account). This is a widely used concept today but was not a
requirement of G2+.
• The Job Control Language (JCL) was much more extensive than that of G2+.
• It allowed interactive sessions.
• It had a concept of filestore. When you created a file you had no idea where it was stored. G3 simply
placed it in filestore. This was a vast amount of disc space used to store files. In fact the filestore was
virtual in that some of it was on tape. What files were placed on tape was controlled by G3. For
example, you could set the parameters so that files over a certain size or files that had not been used for
a certain length of time were more likely to be placed onto tape. If your job requested a file that was in
filestore but had been copied to tape the operator would be asked to load that tape. The operator had no
idea what file was being requested or who it was for (although they could find out). G3 simply asked
for a TSN (Tape Serial Number) to be loaded.
• The operators ran the system, rather than individual jobs.

After G3/G4, ICL released their VME (Virtual Machine Environment) operating system. This is still the
operating system used on ICL mainframes today. VME, as its name suggests, creates virtual machines that
jobs run in. If you log onto (or run a job on) VME, a virtual machine will be created for your session.
In addition, VME is written to cater for the many different workloads that mainframes have to perform. It
supports databases (using ICL’s DBMS – Database Management System – and, more recently, relational
databases such as Ingres and Oracle) and TP (Transaction Processing) systems, as well as batch and interactive
working. The job control language, which under VME is called SCL (System Control Language), is a lot
more sophisticated and you can often carry out tasks without having to use another language for operations
such as file I/O.


There is still the concept of filestore but, due to the (relatively) low cost of disc space and the problems
associated with having to wait for tapes, all filestore is now on disc. In addition, the amount of filestore
available to users or groups of users is under the control of the operating system (and thus the technical
support teams). Like G3, the operators control the entire system and are not normally concerned with
individual jobs. In fact, there is a move towards “lights-out” working. This removes the need for
operators entirely and, if there are any problems, VME will telephone a pager.

References
• Levy, S. 1994. Hackers.
• Tanenbaum, A., S. 2001. Modern Operating Systems. Prentice Hall.

Operating Systems

Introduction (II)
Handout Introduction
These notes are largely based on (Tanenbaum, 1st Ed.1992 or 2nd Ed. 2001). Where applicable, the notes
will point you to the relevant part of that book(s) in case you want to read about the subject in a little more
detail. Actually you will find it a fairly smart thing to read chapter 1 out of Tanenbaum, and to think about
starting to read chapter 2. Reading about the same subject but using the lectures, these handouts, and the
book will hopefully clear up any misunderstandings and prepare you well for what is to come!

The main topics discussed in the handout are as follows

Review of last lecture: Introduction (I)

Introduction (II): Operating System Concepts continued
    Operating System Structure
    Monolithic Operating Systems
    Layered Operating Systems
    Virtual Machines
    Client Server Model
Processes (I) – Introduction
    Process States
    Process Control Blocks (PCB)
    Race Conditions
    Critical Sections
References

Introduction to Operating Systems (II)


Summary of last lecture
What is an operating system? – 3 (perhaps 4?) views…
1) Abstraction – a system able to hide low-level complex hardware, I/O routines etc. from the high
level users and processes, e.g. so that when using a word processor the user need not be concerned
with how bytes get written to disk.
2) Virtualization – the sharing of computer resources/hardware/files between users or programs.
3) Resource Management – how to run and share processes with maximum efficiency, but not
forgetting individual processes of users who might need special attention from time to time.
4) A safety net to prevent high level users or processes doing dangerous things to the hardware or
hogging CPU time

History of Computing/Operating Systems…


1) First Generation (1945-55) – Valves (Vacuum tubes) – no real operating system & programming
in the wiring!
2) Second Generation(1955-65) – Transistors – batch jobs & punch cards
3) Third Generation (1965-1980) – IC’s – multi-user operating systems – time sharing - spooling
4) Fourth Generation (1980/70’s?) onwards – large scale integration – the “PC”
5) Fifth Generation ??? – major developments to come e.g. software like on the HAL9000 computer
in the 2001 movie?

Case Study – ICL Main Frame Operating Systems …
1) Manual-Exec on ICL 1900 mainframe computer
2) GEORGE (GEneral ORGanisational Environment). George 1 (G1), G2 and G2+ ran on top of the
existing manual exec system & could batch many programs into a single job.
3) George 3 (G3) and later G4 - operators no longer looked after individual jobs but after system as a
whole + jobs could be submitted via interactive terminals
4) G3 had a concept of a “filestore” - a vast amount of disc space used to store files. The filestore
was virtual as sometimes it could be from tape archives
5) The operators ran the system, rather than individual jobs.
6) After G3/G4, ICL released their VME (Virtual Machine Environment) operating system. Creates
virtual machines that jobs run in.

Introduction to Operating Systems (II)


Continuing with Operating Systems (this lecture)
Operating System Concepts
This section is based on (Tanenbaum, 1992, p12-18; 2001, p34-42). As you might expect we use an
operating system (in terms of using the OS within a program) via a well defined interface. That is, we make
calls to the operating system so that it will perform some task for us. These “entries” into the OS are called
system calls. Through these system calls we can manipulate various types of objects. During this course we
will be looking at some of these objects in detail (e.g. processes) but we briefly introduce them here.

Processes
One of the key tasks for an operating system is to run processes (see Tanenbaum, 2001, p34-36). For now,
we can consider a process as a running program with all the other information that is needed to control its
execution (e.g. program counter, stack, registers, file pointers etc.).
If the CPU is running one process and, for whatever reason, another process now needs to run, the
operating system must save the details of the currently running process and resume the new process from
exactly the point at which it was previously stopped.
All the data for a process is normally held in a process table. This is a data structure that contains a list of
active processes which the operating system uses to decide which process to run next and to restore a
process to the state it was in before it was stopped from running.

A process may create a child process. For example, if we issue a command from the command shell (one
process), this will create another process to execute the command. This can lead to a tree like structure of
processes. When that command finishes the child process will issue a system call to destroy itself.
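As a rough illustration of this, the sketch below (assuming a UNIX-like system and the POSIX fork, execlp and waitpid calls; the command "ls" is just an example) shows how a shell-like parent process creates a child, lets it execute a command, and waits for it to finish:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Sketch: the parent (a shell) creates a child process; the child
   replaces itself with the command; the parent waits for the child
   to terminate (the child "destroys itself" by exiting).            */
int main(void) {
    pid_t pid = fork();                 /* create a child process      */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {                     /* child: run the command      */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");               /* only reached if exec fails  */
        exit(1);
    }
    waitpid(pid, NULL, 0);              /* parent: wait for the child  */
    return 0;
}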

One of the main tasks of an operating system is to schedule all the processes which are currently competing
for the CPU.

Processes may also communicate with other processes. This might sound simple but, as we shall see, later
in the course, this leads to all sorts of complications which the operating system must handle.

Files
Another broad class of system calls relates to the file system (see Tanenbaum, 2001, p38-41). We said above that
one of the tasks of an operating system is to hide the complexities of the hardware. In order to do this
system calls must be provided to (for example) create files, delete files, move files, rename files, copy files,
open files, close files, read files, write files etc. etc.

As an example of this “abstraction”, consider that some operating systems provide you with different types
of file. You may be able to open a file in text mode or in binary mode. The way you open the file
determines how the data is delivered to your program. In fact, the underlying mechanisms which position
the read/write head, access the data and return it to the operating system are the same no matter what type
of file you are reading. It is only when the data is delivered to your program that it gets interpreted in the
way the program is expecting.
To take this one stage further, many operating systems provide special files called standard input and
standard output. These default to data typed at the terminal and data sent to the terminal. We consider
them as files but there is obviously a lot more going on in the operating system that allows us to visualise a
terminal as a file.

There is even the concept of a process being an input file for another process, which acts as if it is writing
to a file. This is normally presented to us as a pipe. For example, in MS-DOS if you type

DIR | SORT

It pipes the output from the DIR command to the SORT command, so that the display comes out sorted (in
fact, it sorts the lines in alphabetical order – which might not be what you expect).

Similar to pipes is redirection. This allows the output from a program to be redirected to a file. For
example

DIR > dir.txt

Will redirect the output of the DIR command from the standard output (the screen) to the file called dir.txt.
(If you have never used MS-DOS (or UNIX) you might like to experiment with redirection – also try “>>”
and “<”).

The whole point of mentioning standard input, standard output, pipes and redirection is to demonstrate that
the operating system is hiding a lot of complexity from us as well as providing us with many features which
we might find useful.
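As a small illustration of this point (a sketch; the program name "count" is made up), the program below simply writes to standard output and knows nothing about where that output ends up – whether it goes to the terminal, a file or another process is decided by the shell and the operating system:

#include <stdio.h>

/* This program only ever writes to standard output.  Run it as:
      count             output appears on the terminal
      count > out.txt   output is redirected to a file
      count | sort      output is piped into another process          */
int main(void) {
    for (int i = 1; i <= 5; i++)
        printf("line %d\n", i);
    return 0;
}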

Many file systems also support the concept of directory (or folder) hierarchies. That is, there is a top level
view of the file system and by creating folders you can build a tree (conceptually) which represents your
view of the data. Therefore, the file system must provide system calls to maintain these directory structures.

When discussing processes we said that it is possible to build a tree of processes, in the same way we can
build a directory hierarchy. But there are many differences between these trees.
• Process trees are normally short lived. Directory trees can last for years.
• Files (assuming access rights are granted) can be maintained by any user of the system. A process can
only be controlled by its parent.
• Directories (and files) can have access rights associated with them. Processes have no such
information.
• Directories (and files) can be accessed in a number of ways (e.g. relative or absolute pathnames).
Process trees have no such concept.

System Calls
Access to the operating system is done through system calls (See Tanenbaum, 2001, p44-48). Each system
call has a procedure associated with it so that calls to the operating system can be done in a familiar way.
When you call one of these procedures it places the parameters into registers and informs the operating
system that there is work to be done. Notifying the operating system is done via a TRAP instruction
(sometimes known as a kernel call or a supervisor call). This instruction switches the CPU from user
mode to kernel (or supervisor) mode. In user mode, for safety, certain instructions are unavailable to the
programs. In kernel mode all the CPU instructions are available. The operating system now carries out the
work, which includes validating the parameters. Eventually, the operating system will have completed the
work and will return a result in the same way as a user-written function in a high level language.

An example of an operating system call (via a procedure) is

count = read(file, buffer, nbytes);

You can see that it is just the same as calling a user written function in a language such
as C.
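For instance, a fuller sketch (assuming a UNIX-like system, where open, read and close are the library wrappers around the corresponding system calls; the file name is just an example) might look like this:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: open a file and read up to 128 bytes from it.  The call to
   read() looks like an ordinary C function call, but inside the
   library wrapper a TRAP is executed and the kernel does the work.   */
int main(void) {
    char buffer[128];
    int file = open("data.txt", O_RDONLY);
    if (file < 0) {
        perror("open");
        return 1;
    }
    ssize_t count = read(file, buffer, sizeof buffer);
    printf("read %zd bytes\n", count);
    close(file);
    return 0;
}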

The Shell
The operating system is the mechanism that carries out the system calls requested by the various parts of
the system. Tools such as compilers, editors etc. are not part of the operating system. Similarly, the shell is not part
of the operating system. The shell is the part of (for example) UNIX and MS-DOS where you can type
commands to the operating system and receive a response. You may also hear the shell called the
Command Line Interpreter (CLI) or the “C” prompt. However, it is worth mentioning the shell as it makes
heavy use of operating system features and is a good way to experiment.

We have already seen one example of a command line (DIR > dir.txt). A more complicated
command (in UNIX this time) could be: cat file1 file2 file3 | sort > /dev/lp &

This command concatenates three files and pipes them to the sort program. It then redirects the sorted file
to a line printer. The ampersand “&” at the end of the command instructs UNIX to issue the command as a
background job. This results in the command prompt being returned immediately, whilst another process
carries out the requested work. You can appreciate, by looking at the above command that there will be a
series of system calls to the operating system in order to satisfy the whole request.

Operating System Structure


This section is based on (Tanenbaum, 1992, p18-24; 2001, p56-63).

Monolithic Systems
One possible approach is to have no structure to the operating system at all, i.e. it is simply a collection of
procedures. Each procedure has a well defined interface and any procedure is able to call any other
procedure. The operating system is constructed by compiling all the procedures into one huge monolithic
system. There is no concept of encapsulation, data hiding or structure amongst the procedures. However,
in practice the procedures naturally fall into a structure whereby some procedures are high level
procedures that call on other utility procedures (see diagram). The main procedure is called by the user
programs; it calls service procedures which, in turn, call utility procedures:

(Diagram: a main procedure at the top calls service procedures, which in turn call utility procedures at the bottom.)

Layered Systems

In 1968 E. W. Dijkstra (Holland) and his students built the “THE” operating system that was structured into
layers. It can be viewed as a generalisation of the model shown above, but this model had six layers.
(Tanenbaum 2001, p57)

Layer 0 was responsible for the multiprogramming aspects of the operating system. It decided which
	process was allocated to the CPU, dealt with interrupts and performed the context switches
	when a process change was required.
Layer 1 was concerned with allocating memory to processes.
Layer 2 dealt with inter-process communication and communication between the operating system and
	the console.
Layer 3 managed all I/O between the devices attached to the computer. This included buffering
	information from the various devices.
Layer 4 was where the user programs were stored.
Layer 5 was the overall control of the system (called the system operator).

As you move through this hierarchy (from 0 to 5) you do not need to worry about the aspects you have “left
behind.” For example, high level user programs (level 4) do not have to worry about where they are stored
in memory or whether they are currently allocated to the processor, as these concerns are handled by the lower layers (0 and 1).

Virtual Machines
Virtual machines mean different things to different people (Tanenbaum 2001, p59). For example, if you run
an MS-DOS prompt from within Windows 95/98/NT you are running what Microsoft calls a virtual machine.
It is given this name as the MS-DOS program is fooled into thinking that it is running on a machine that it
has sole use of.

ICL’s mainframe operating system is called VME (Virtual Machine Environment). The idea is that when
you log onto the machine a VM (Virtual Machine) is built and it looks as if you have the computer all to
yourself (in an abstract sense – nobody really expects to have an entire mainframe to themselves).

Both of these (Windows 95/98/NT and VME) are fairly recent developments but one of the first operating
systems (VM/370) was able to provide a virtual machine to each user. In addition, each user was able to run
different operating systems if they so desired. This is a major achievement, if you think about it, as
different operating systems will access the hardware in different ways (to name just one problem). The way
the system operated was that the bare hardware was “protected” by VM/370 (called a virtual machine
monitor). This provided access to the hardware when needed by the various processes running on the
computer. In addition, VM/370 created virtual machines when a user required one. But, instead of simply
providing an extension of the hardware that abstracted away the complexities of the hardware, VM/370
provided an exact copy of the hardware, which included I/O, interrupts and user/kernel mode. Any
instructions to the hardware were trapped by VM/370, which carried out the instructions on the physical
hardware and returned the results to the calling process. The diagram below shows a model of the VM/370
computer. Note that CMS (Conversational Monitor System) is just one of the many operating
systems that can be run on a virtual 370 – CMS is a single user OS intended for interactive time
sharing.
(Diagram: several virtual 370s, each running its own operating system such as CMS, sit on top of VM/370;
I/O and other hardware instructions issued by the virtual machines cause a TRAP into VM/370, which
carries them out on the bare 370 hardware.)



Client-Server Model
One of the recent advances in computing is the idea of a client/server model (Tanenbaum, 2001, p61). A
server provides services to any client that requests it. This model is heavily used in distributed systems
where a central computer acts as a server to many other computers. The server may be something as simple
as a print server, which handles print requests from clients. Or, it could be relatively complex, and the
server could provide access to a database which only it is allowed to access directly.

Operating systems can be designed along similar lines. Take, for example, the part of the operating system
that deals with file management. This could be written as a server so that any process which requires access
to any part of the filing system asks the file management server to carry out the request and present the
calling client with the results. Similarly, there could be servers which deal with memory management,
process scheduling etc.

The benefits of this approach include

• It can result in a minimal kernel. This results in easier maintenance as not so many processes are
running in kernel mode. All the kernel does is provide the communication between the clients and the
servers.
• As each server is managing one part of the operating system, the procedures can be better structured
and more easily maintained.
• If a server crashes it is less likely to bring the entire machine down as it will not be running in kernel
mode. Only the service that has crashed will be affected.

The client-server model can be represented as follows

(Diagram: client processes and server processes – e.g. a file server, a memory server and so on – all run in
user mode; the kernel runs in kernel mode and passes messages from clients to the servers.)

Processes
This section is based on (Tanenbaum, 1992, p27-29; 2001, p34-44).

The concept of a process is fundamental to an operating system, and a process can be viewed as an abstraction of
a program. To be more precise, a program is an algorithm expressed in some suitable notation, and a process
is an execution of that algorithm together with its associated input, output and state.

Computers nowadays can do many things at the same time. They can be writing to a printer, reading from a
disc and scanning an image. The computer (more strictly the operating system) is also responsible for
running many processes, usually, on the same CPU. In fact, only one process can be run at a time so the
operating system has to share the CPU between the processes that are available to be run, whilst giving the
illusion that the computer is doing many things at the same time. This approach can be directly contrasted
with the first computers, which could only run one program at a time. Although this may seem archaic
now, it does have its advantages: most of the issues raised in this part of the course simply do not arise.

Therefore, the main point of this part of the course is to consider how an operating system deals with
processes when we allow many to run in pseudoparallelism.

With a single CPU the computer can only execute a single process at any given moment
in time, and it is important to realise that this is the case. It is also important to realise that one process can have
an effect on another process which is not currently running, as we shall see later.

Process States
This section is based on (Tanenbaum, 1992, p29-31; 2001, p77-79)

A process may be in one of three states

Running. Only one process can be running at any one time (assuming a single processor machine). A
running process is the process that is actually using the CPU at that time.
Ready. A process that is ready is runnable but cannot get access to the CPU due to another process
using it.
Blocked. A blocked process is unable to run until some external event has taken place. For example, it
may be waiting for data to be retrieved from a disc.

A state transition diagram can be used to represent the various states and the transition between those
states.

(Diagram: the three states Running, Ready and Blocked, with arrows for the transitions
running → blocked, running → ready, ready → running and blocked → ready.)

You can see from this that a running process can either be blocked (i.e. it needs to wait for an external
event) or it can go to a ready state (for example, the scheduler allows another process to use the CPU). A
ready process can only move to a running state whilst a blocked process can only move to a ready state. It
should be apparent that the job of the scheduler is concerned with deciding which one of the processes in a
ready state should be allowed to move to a running state (and thus use the CPU).
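To make the legal transitions explicit, here is a minimal sketch (my own illustration, not code from the book) that encodes the three states and the four arrows of the diagram:

/* Illustration only: the three process states and the four legal
   transitions from the state diagram above.                          */
typedef enum { RUNNING, READY, BLOCKED } proc_state;

int legal_transition(proc_state from, proc_state to) {
    return (from == RUNNING && to == BLOCKED)   /* waits for an event    */
        || (from == RUNNING && to == READY)     /* scheduler preempts it */
        || (from == READY   && to == RUNNING)   /* scheduler picks it    */
        || (from == BLOCKED && to == READY);    /* awaited event occurs  */
}
/* Note there is no BLOCKED -> RUNNING arrow: a blocked process must
   become ready first and then be chosen by the scheduler.            */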

Process Control Blocks (PCB)


This section is based on (Tanenbaum, 1992, p31-32; 2001, p79-80).

Assuming we have a computer with one processor, it is obvious that the computer can only execute one
process at a time. However, when a process moves from a running state to a ready or blocked state, certain
information must be stored so that the process can remember where it had got to when it moves back to a running state.
For example, it needs to know which instruction it was about to execute, which record it was about to read
from its input file and the values of various variables that it was using as working storage. Each process has
an entry in the process table, known as its process control block. This holds all the information that a process needs in order to
restart from where it left off when it moves from a ready state to a running state. Different systems will
hold different information in the process control block. This table shows a typical set of data:

Process management fields:
	Registers
	Program counter
	Program status word
	Stack pointer
	Process state
	Time when process started
	CPU time used
	Time of next alarm
	Process id

You will notice that, as well as information to ensure the process can start again (e.g. Program Counter),
the process control block also holds accounting information such as the time the process started and how
much CPU time has been used. You should note that this is only a sample of the information held. There
will be other information, not least of all concerned with the files being used and the memory the process is
using.
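As a hedged sketch (the field names and types are illustrative, not taken from any particular system), the process-management part of a PCB might be declared along these lines:

/* Illustrative process control block holding roughly the
   process-management fields listed in the table above.  A real PCB
   also records memory-management and file information.               */
struct pcb {
    unsigned long registers[16];     /* saved general-purpose registers */
    unsigned long program_counter;   /* next instruction to execute     */
    unsigned long status_word;       /* program status word             */
    unsigned long stack_pointer;     /* saved stack pointer             */
    int           state;             /* running, ready or blocked       */
    long          start_time;        /* time when the process started   */
    long          cpu_time_used;     /* accounting: CPU time consumed   */
    long          next_alarm;        /* time of the next alarm, if any  */
    int           pid;               /* process id                      */
};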

Race Conditions
This section is based on (Tanenbaum, 1992, p33-34; 2001, p100-101)

It is sometimes necessary for two processes to communicate with one another. This can either be done via
shared memory or via a file on disc. It does not really matter. We are not discussing the situation where a
process can write some data to a file that is read by another process at a later time e.g. days, weeks or
months. We are talking about two processes that need to communicate at the time they are running. Take,
as an example, one type of process (i.e. there could be more than one process of this type running) that
checks a counter when it starts running. If the counter is at a certain value, say x, then the process
terminates as only x copies of the process are allowed to run at any one time. This is how it works….

• The process starts
• The counter, i, is read from the shared memory
• If i = x the process terminates, else i = i + 1
• i is written back to the shared memory

Now this sounds okay, but consider the following scenario….


• Process 1, P1, starts
• P1 reads the counter, i1, from the shared memory. Assume i1 = 3 (that is three processes of this type are
already running)
• P1 gets interrupted and is placed in a ready state
• Process 2, P2, starts
• P2 reads the counter, i2, from the shared memory; i2 = 3
• Assume i2 < x so i2 = i2 +1 (i.e. 4)
• i2 is written back to shared memory
• P2 is moved to a ready state and P1 goes into a running state
• Assume i1 < x so i1 = i1 +1 (i.e. 4)

• i1 is written back to the shared memory

We now have the situation where we have five processes running but the counter is only set to four. This
problem is known as a race condition.
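The same lost update can be written down in a few lines of C. This is only a sketch (the shared-memory mechanism and the variable names are illustrative); the point is that the read, the test/increment and the write back are three separate steps that can be interleaved exactly as in the scenario above:

/* Shared between all processes of this type (e.g. in shared memory). */
int i;                          /* copies already running              */
int x = 5;                      /* at most x copies may run            */

/* Executed by each process as it starts.  Because the three steps are
   not atomic, two processes can both read the same value of i, both
   decide to continue, and both write back i + 1: the race condition. */
int try_to_start(void) {
    int copy = i;               /* step 1: read the counter            */
    if (copy == x)
        return 0;               /* too many copies: terminate          */
    copy = copy + 1;            /* step 2: increment                   */
    i = copy;                   /* step 3: write back                  */
    return 1;                   /* this copy is allowed to run         */
}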

Critical Sections
This section is based on (Tanenbaum, 1992, p34-35; 2001, p102-103)

One way to avoid race conditions is not to allow two processes to be in their critical sections at the same
time (by critical section we mean the part of the process that accesses a shared variable). That is, we need a
mechanism of mutual exclusion: some way of ensuring that while one process is using the shared
variable, no other process can access that variable. In fact, to provide a good solution to the
problem of race conditions we need four conditions to hold.

1. No two processes may be simultaneously inside their critical sections.


2. No assumptions may be made about the speed or the number of processors.
3. No process running outside its critical section may block other processes
4. No process should have to wait forever to enter its critical section.

As we shall see in the next lecture, it is difficult to devise a method that meets all these conditions.

References
• Courtois, P. J., Heymans, F. and Parnas, D. L. 1971. Concurrent Control with “Readers” and “Writers”.
Communications of the ACM, Vol. 14, No. 10, pp. 667-668.
• Dijkstra, E. W. 1965. Co-operating Sequential Processes. In Programming Languages, Genuys, F. (ed.),
London: Academic Press.
• Peterson, G. L. 1981. Myths about the Mutual Exclusion Problem. Information Processing Letters, Vol.
12, No. 3.
• Silberschatz, A. et al. 2003/1994. Operating System Concepts. Addison-Wesley Publishing Company.
• Tanenbaum, A. S. 2001/1992. Modern Operating Systems. Prentice Hall.


OPS Processes
Implementing Mutual Exclusion with Busy Waiting
This section is based on (Tanenbaum, 1992, p35-39; 2001, p103-108).

Disabling Interrupts (Tanenbaum 2001, p103)


Perhaps the most obvious way of achieving mutual exclusion is to allow a process to disable interrupts
before it enters its critical section and then enable interrupts after it leaves its critical section. By
disabling interrupts the CPU will be unable to switch processes. This guarantees that the process can
use the shared variable without another process accessing it. But disabling interrupts is a major
undertaking. At best, the computer will not be able to service interrupts for, maybe, a long time (who
knows what a process is doing in its critical section?). At worst, the process may never re-enable
interrupts, thus (effectively) crashing the computer. Although disabling interrupts might seem a good
solution, its disadvantages far outweigh the advantages.

Lock Variables (Tanenbaum 2001, p104)


Another method, which is obviously flawed, is to assign a lock variable. This is set to (say) 1 when a
process is in its critical section and reset to zero when a process exits its critical section. It does not
take a great leap of intuition to realise that this simply moves the problem from the shared variable to
the lock variable, as the sketch below shows.
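(A sketch of the idea; variable and function names are illustrative. Between testing the lock and setting it the process can be interrupted, so two processes can both see the lock as free and both enter their critical sections.)

int lock = 0;                   /* 0 = free, 1 = a process is inside    */

void enter_critical(void) {
    while (lock == 1)           /* wait until the lock looks free       */
        ;                       /* busy wait                            */
    /* An interrupt here lets another process run the same test, also
       see lock == 0, and enter as well: the race has simply moved to
       the lock variable itself.                                        */
    lock = 1;
}

void leave_critical(void) {
    lock = 0;
}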

Strict Alternation (Tanenbaum 2001, p104-105)


Process 0:
    while (TRUE) {
        while (turn != 0);      // wait
        critical_section();
        turn = 1;
        noncritical_section();
    }
Process 1:
    while (TRUE) {
        while (turn != 1);      // wait
        critical_section();
        turn = 0;
        noncritical_section();
    }

These code fragments offer a solution to the mutual exclusion problem. Assume the variable turn is
initially set to zero. Process 0 is allowed to run. It finds that turn is zero and is allowed to enter its
critical region. If process 1 tries to run, it will also find that turn is zero and will have to wait (the while
statement) until turn becomes equal to 1. When process 0 exits its critical region it sets turn to 1, which
allows process 1 to enter its critical region. If process 0 tries to enter its critical region again it will be
blocked as turn is no longer zero. However, there is one major flaw in this approach. Consider this
sequence of events....

• Process 0 runs, enters its critical section and exits; setting turn to 1. Process 0 is now in its non-
critical section. Assume this non-critical procedure takes a long time.
• Process 1, which is a much faster process, now runs and once it has left its critical section turn is
set to zero.
• Process 1 executes its non-critical section very quickly and returns to the top of the procedure.
• The situation is now that process 0 is in its non-critical section and process 1 is waiting for turn to
be set to 1. In fact, there is no reason why process 1 should not be able to enter its critical region, as process 0
is not in its critical region.

What we can see here is violation of one of the conditions that we listed last lecture i.e. a process, not
in its critical section, is blocking another process. If you work through a few iterations of this solution
you will see that the processes must enter their critical sections in turn; thus this solution is called strict
alternation.

Peterson’s Solution (Tanenbaum, 2001, p105-106)


A solution to the mutual exclusion problem that does not require strict alternation, but still uses the idea
of lock (and warning) variables together with the concept of taking turns is described in (Dijkstra,

1965). In fact the original idea came from a Dutch mathematician (T. Dekker). This was the first time
the mutual exclusion problem had been solved using a software solution. (Peterson, 1981), came up
with a much simpler solution.

The solution consists of two procedures, shown here in a C style syntax - the “//” of course marking the
start of comments

#define FALSE 0
#define TRUE  1
#define No_Of_Processes 2                     // Number of processes

int turn;                                     // Whose turn is it?
int interested[No_Of_Processes];              // All values initially FALSE

void enter_region(int process) {

    int other;                                // number of the other process

    other = 1 - process;                      // the opposite process

    interested[process] = TRUE;               // this process is interested
    turn = process;                           // set flag
    while(turn == process && interested[other] == TRUE); // wait
}

void leave_region(int process) {

    interested[process] = FALSE;              // process leaves critical region
}

A process that is about to enter its critical region has to call enter_region. When it has finished with its critical region it must call leave_region. Initially, neither process is in its critical region and all the elements of the array interested (both of them in the above example) are set to false.
Assume that process 0 calls enter_region. The variable other is set to one (the other process number)
and it indicates its interest by setting the relevant element of interested. Next it sets the turn variable,
before coming across the while loop. In this instance, the process will be allowed to enter its critical
region, as process 1 is not interested in running.

Now process 1 could call enter_region. It will be forced to wait as the other process (0) is still
interested. Process 1 will only be allowed to continue when interested[0] is set to false which can only
come about from process 0 calling leave_region.

If we ever arrive at the situation where both processes call enter region at the same time, one of the
processes will set the turn variable, but it will be immediately overwritten. Assume that process 0 sets
turn to zero and then process 1 immediately sets it to 1. Under these conditions process 0 will be
allowed to enter its critical region and process 1 will be forced to wait.
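As a hedged illustration (not from the handout), this is how the two routines above might wrap a critical region. The names process_0_body and noncritical_section, and the shared variable count, are invented for the example:

extern int count;                       // an assumed shared variable
void noncritical_section(void);         // assumed non-critical work

void process_0_body(void) {             // the body run by process number 0
    while (TRUE) {
        enter_region(0);                // busy wait until it is safe to enter
        count = count + 1;              // the critical section
        leave_region(0);                // allow the other process in
        noncritical_section();          // work that does not touch shared data
    }
}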


Test and Set Lock (TSL) - Tanenbaum (2001, p107-108)


If we are given assistance by the instruction set of the processor we can implement a solution to the
mutual exclusion problem. The assembly (machine code) instruction we require is called test and set
lock (TSL). This instruction reads the contents of a memory location, stores it in a register and then
stores a non-zero value at the address. This operation is guaranteed to be indivisible. That is, no other
process can access that memory location until the TSL instruction has finished. This assembly (like)
code shows how we can make use of the TSL instruction to solve the mutual exclusion problem.

enter_region:
tsl register, flag ; copy flag to register and set flag to 1
cmp register, #0 ;was flag zero?
jnz enter_region ;if flag was non zero, lock was set , so loop
ret ;return (and enter critical region)

leave_region:
mov flag, #0 ; store zero in flag
ret ;return

Assume, again, two processes. Process 0 calls enter_region. The tsl instruction copies the flag to a register and sets the flag itself to a non-zero value. The copy in the register is now compared to zero (cmp - compare) and if found to be non-zero (jnz – jump if non-zero) the routine loops back to the top. Only when process 1 has set
the flag to zero (or under initial conditions), by calling leave_region, will process 0 be allowed to
continue.
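For comparison, here is a C-level sketch of the same spin lock idea, using GCC's __sync_lock_test_and_set builtin as a stand-in for the TSL instruction. This assumes a GCC-compatible compiler and is not the handout's code:

static volatile int flag = 0;                    // 0 = free, 1 = locked

void enter_region(void) {
    while (__sync_lock_test_and_set(&flag, 1) != 0)
        ;                                        // flag was already 1, keep looping
}

void leave_region(void) {
    __sync_lock_release(&flag);                  // atomically store 0 back into flag
}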

Comments
Of all the solutions we have looked at, both Peterson’s and the TSL solutions solve the mutual
exclusion problem. However, both of these solutions have the problem of busy waiting. That is, if the process is not allowed to enter its critical section it sits in a tight loop waiting for a condition to be met. This is obviously wasteful in terms of CPU usage. It can also have less obvious disadvantages.
Suppose we have two processes, one of high priority, h, and one of low priority, l. The scheduler is set
so that whenever h is in ready state it must be run. If l is in its critical section when h becomes ready to
run, l will be placed in a ready state so that h can be run. However, if h tries to enter its critical section
then it will be blocked by l, which will never be given the opportunity of running and leaving its
critical section. Meantime, h will simply sit in a loop forever. This is sometimes called the priority
inversion problem.

Sleep and Wakeup


This section is based on (Tanenbaum, 1992, p39–41; 2001 p108)

In this section, instead of a process busy waiting, we will look at procedures that send the
process to sleep. In reality, it is placed in a blocked state. The important point is that it is not using the
CPU by sitting in a tight loop. To implement a sleep and wakeup system we need access to two system
calls (SLEEP and WAKEUP). These can be implemented in a number of ways. One method is for
SLEEP to simply block the calling process and for WAKEUP to have one parameter; that is the process
it has to wakeup. An alternative is for both calls to have one parameter, this being a memory address
which is used to match the SLEEP and WAKEUP calls.

The Producer-Consumer Problem


This problem is outlined later in this handout. You should read this now. To implement a solution to
the problem using SLEEP/WAKEUP we need to maintain a variable, count, that keeps track of the
number of items in the buffer. The producer will check count against n (maximum items in the buffer).
If count = n then the producer sends itself to sleep. Otherwise it adds the item to the buffer and increments count. Similarly, when the consumer retrieves an item from the buffer, it first checks if count is
zero. If it is it sends itself to sleep. Otherwise it removes an item from the buffer and decrements count.

The calls to WAKEUP occur under the following conditions.

• Once the producer has added an item to the buffer, and incremented count, it checks to see if count
= 1 (i.e. the buffer was empty before). If it is, it wakes up the consumer.
• Once the consumer has removed an item from the buffer, it decrements count. Now it checks count
to see if it equals n-1 (i.e. the buffer was full). If it does it wakes up the producer.

Here is the producer and consumer code.

int BUFFER_SIZE = 100;


int count = 0;

void producer(void) {
int item;
while(TRUE) {
produce_item(&item); // generate next item
if(count == BUFFER_SIZE) sleep (); // if buffer full, sleep
enter_item(item); // put item in buffer
count=count+1; // increment count
if(count == 1) wakeup(consumer); // was buffer empty?
}
}

void consumer(void) {
int item;
while(TRUE) {
if(count == 0) sleep (); // if buffer is empty, sleep
remove_item(&item); // remove item from buffer
count=count-1; // decrement count
if(count == BUFFER_SIZE - 1) wakeup(producer); // was buffer full?
consume_item(&item); // print item
}
}

This seems logically correct but we have the problem of race conditions with count. The following
situation could arise....

• The buffer is empty and the consumer has just read count to see if it is equal to zero.
• At this very instant the scheduler stops running the consumer and starts running the producer,
before it can be put to sleep.
• The producer places an item in the buffer and increments count.
• The producer checks to see if count is equal to one. Finding that it is, it assumes that it was
previously zero which implies that the consumer is sleeping – so it sends a wakeup.
• In fact, the consumer is not asleep so the call to wakeup is lost.
• The consumer now runs – continuing from where it left off – it checks the value of count. Finding
that it is zero it goes to sleep. As the wakeup call has already been issued the consumer will sleep
forever.
• Eventually the buffer will become full and the producer will send itself to sleep.
• Both producer and consumer will sleep forever.

One solution is to have a wakeup waiting bit that is turned on when a wakeup is sent to a process that
is already awake. If a process goes to sleep, it first checks the wakeup bit. If set the bit will be turned
off, but the process will not go to sleep. Whilst seeming a workable solution it suffers from the
drawback that you need an ever increasing number of wakeup bits to cater for larger numbers of processes.
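A minimal sketch of the wakeup waiting bit idea (the helper process_is_asleep() is an invented name; sleep() and wakeup() are the calls described above, here assumed to take the target process as a parameter):

int wakeup_waiting = 0;                     // one such bit per process

int  process_is_asleep(int process);        // assumed helper
void wakeup(int process);                   // as described above
void sleep(void);                           // as described above

void send_wakeup(int process) {             // used in place of a bare wakeup()
    if (process_is_asleep(process))
        wakeup(process);                    // normal case: really wake it up
    else
        wakeup_waiting = 1;                 // remember the otherwise-lost wakeup
}

void go_to_sleep(void) {                    // used in place of a bare sleep()
    if (wakeup_waiting)
        wakeup_waiting = 0;                 // consume the saved wakeup, stay awake
    else
        sleep();                            // otherwise really go to sleep
}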

Inter-Process Communication Problems


In this section we describe a few classic inter process communication problems. If you have the time
and the inclination you might like to try and write a program which solves these problems.

The Producer-Consumer Problem (Tanenbaum 2001, p109)
Assume there is a producer (which produces goods) and a consumer (which consumes goods). The
producer, produces goods and places them in a fixed size buffer. The consumer takes the goods from
the buffer. The buffer has a finite capacity so that if it is full, the producer must stop producing.
Similarly, if the buffer is empty, the consumer must stop consuming. This problem is also referred to as
the bounded buffer problem. The type of situations we must cater for are when the buffer is full, so the
producer cannot place new items into it. Another potential problem is when the buffer is empty, so the
consumer cannot take from the buffer.

The Dining Philosophers Problem (Tanenbaum 2001, p124-128)


This problem was posed by (Dijkstra, 1965). Five philosophers are seated around a circular table. In front of each of them is a bowl of food. In between each pair of bowls there is a fork. That is, there are five forks in all. Philosophers spend their time either eating or thinking. When they are thinking they are not using a fork. When they want to eat they need to use two forks: the fork to their left and the fork to their right. Once they have acquired one fork they must acquire the other one; they may acquire the forks in either order. Once a philosopher has two forks they can eat. When finished eating they return both forks to the table. The question is: can a program be written, for each philosopher, that never gets stuck, that is, where no philosopher is left waiting for a fork forever?

The Readers Writers Problem (Tanenbaum 2001, p128-129)


This problem was devised by (Courtois et al., 1971). It models a number of processes requiring access
to a database. Any number of processes may read the database but only one can write to the database.
The problem is to write a program that ensures this happens.

The Sleeping Barber Problem (Tanenbaum 2001, p129-131)


A barber shop consists of a waiting room with n chairs. There is another room that contains the barber's chair. The following situations can happen.
• If there are no customers the barber goes to sleep.
• If a customer enters and the barber is asleep (indicating there are no other customers waiting) he
wakes the barber and has a haircut.
• If a customer enters and the barber is busy and there are spare chairs in the waiting room, the
customer sits in one of the chairs.
• If a customer enters and all the chairs in the waiting room are occupied, the customer leaves.

The problem is to program the barber and the customers without getting into race conditions.

References
• Courtois P.J., Heymans F. and Parnas D.L. 1971. Concurrent Control with Readers and Writers. Communications of the ACM, Vol. 14, No. 10, pp. 667-668
• Dijkstra E.W. 1965. Co-operating Sequential Processes. In Programming Languages, Genuys, F. (ed), London: Academic Press
• Peterson G.L. 1981. Myths about the Mutual Exclusion Problem. Information Processing Letters, Vol. 12, No. 3
• Silberschatz A. and Galvin P. 1994. Operating System Concepts (4th Ed). Addison-Wesley Publishing Company
• Tanenbaum A.S. 1992. Modern Operating Systems. Prentice Hall.


Processes - Lecture 4 - Feb 07


Handout Introduction
These notes are largely based on (Tanenbaum, 1992/2001). Where applicable, the notes will point you
to the relevant part of that book in case you want to read about the subject in a little more detail.

The main topics discussed in the handout are as follows

Introduction .....................................................................................................................................................1
The Semaphore and the Mutex........................................................................................................................2
Process Scheduling..........................................................................................................................................3
References .......................................................................................................................................................6

Introduction
Of all the solutions we have looked at previously, both Peterson's and the TSL solutions solve the mutual exclusion problem. However, both of these solutions have the problem of busy waiting. That is, if the process is not allowed to enter its critical section it sits in a tight loop waiting for a condition to be met. This is obviously wasteful in terms of CPU usage. Also in the case of two processes where one (h)
is of high priority, and the other (l) of low priority we can have a priority inversion problem. This
happens when the scheduler is set so that whenever h is in ready state it must be run. If l is in its critical
section when h becomes ready to run, l will be placed in a ready state so that h can be run. However, if
h tries to enter its critical section then it will be blocked by l, which will never be given the opportunity
of running and leaving its critical section. Meantime, h will simply sit in a loop forever. We also
discussed Sleep and Wakeup where instead of a process doing a busy waiting, processes can be sent to
sleep. In reality this really means a blocked state, but with the difference that the sleeping process is not
tying up the CPU in a wait loop. This was implemented using two system calls: SLEEP and WAKEUP
i.e. one method is for SLEEP to block the calling process and for WAKEUP to have one parameter;
that is the process it has to wakeup. An alternative is for both calls to have one parameter, this being a
memory address which is used to match the SLEEP and WAKEUP calls. SLEEP/WAKEUP was described in the Producer-Consumer problem and the solution involved maintaining a variable, count, that keeps track of the number of items in the buffer. The producer will check count against n (the maximum number of items in the buffer). If count = n then the producer sends itself to sleep. Otherwise it adds the item to the buffer and increments count. Similarly, when the consumer retrieves an item from the buffer, it first checks if count is zero. If it is, it sends itself to sleep. Otherwise it removes an item from the buffer and decrements count. The calls to WAKEUP occur under the following conditions.
• Once the producer has added an item to the buffer, and incremented count, it checks to see if count
= 1 (i.e. the buffer was empty before). If it is, it wakes up the consumer.
• Once the consumer has removed an item from the buffer, it decrements count. Now it checks count
to see if it equals n-1 (i.e. the buffer was full). If it does it wakes up the producer.

Here is the producer and consumer code.......

int BUFFER_SIZE = 100;


int count = 0;

void producer(void) {
int item;
while(TRUE) {
produce_item(&item); // generate next item
if(count == BUFFER_SIZE) sleep (); // if buffer full,sleep
enter_item(item); // put item in buffer
count=count+1; // increment count
if(count == 1) wakeup(consumer); // was buffer empty?
}
}

void consumer(void) {
int item;
while(TRUE) {
if(count == 0) sleep (); //if buffer empty,sleep
remove_item(&item); //remove item from buff
count=count-1; //decrement count
if(count == BUFFER_SIZE - 1) wakeup(producer); //was buffer full?
consume_item(&item); //print item
}
}

This seems logically correct but we have the problem of race conditions with count. The following
situation could arise.
• The buffer is empty and the consumer has just read count to see if it is equal to zero.
• All of a sudden the scheduler stops running the consumer and starts running the producer.
• The producer places an item in the buffer and increments count.
• The producer checks to see if count is equal to one. Finding that it is, it assumes that it was
previously zero which implies that the consumer is sleeping – so it sends a wakeup.
• In fact, the consumer is not asleep so the call to wakeup is lost.
• The consumer eventually gets switched back in and now runs – continuing from where it left off –
it checks the value of count. Finding that it is zero it goes to sleep. As the wakeup call has already
been issued the consumer will sleep forever.
• Eventually the buffer will become full and the producer will send itself to sleep.
• Both producer and consumer will sleep forever.

One solution is to have a wakeup waiting bit that is turned on when a wakeup is sent to a process that
is already awake. If a process goes to sleep, it first checks the wakeup bit. If set the bit will be turned
off, but the process will not go to sleep. Whilst seeming a workable solution it suffers from the
drawback that you need an ever increasing number of wakeup bits to cope with a larger number of processes.

The Semaphore and the Mutex (Tanenbaum 2001, p110-114; Bic and Shaw 2003, p59-64; Silberschatz 2003, p201-207)
In (Dijkstra, 1965) the suggestion was made that an integer variable be used that recorded how many
wakeups had been saved. Dijkstra called this variable a semaphore. If it was equal to zero it indicated
that no wakeups were saved. A positive value shows that one or more wakeups are pending. Now the
sleep operation (which Dijkstra called DOWN) checks the semaphore to see if it is greater than zero. If
it is, it decrements the value (using up a stored wakeup) and continues. If the semaphore is zero the
process sleeps. The wakeup operation (which Dijkstra called UP) increments the value of the
semaphore. If one or more processes were sleeping on that semaphore then one of the processes is
chosen and allowed to complete its DOWN. Checking and updating the semaphore must be done as an
atomic action to avoid race conditions. Here is an example of a series of Down and Up’s. We are
assuming we have a semaphore called mutex (for mutual exclusion). It is initially set to 1. The
subscript figure, in this example, represents the process, p, that is issuing the Down.

Down1(mutex)   // p1 enters its critical section (mutex = 0)
Down2(mutex)   // p2 sleeps (mutex = 0)
Down3(mutex)   // p3 sleeps (mutex = 0)
Down4(mutex)   // p4 sleeps (mutex = 0)
Up(mutex)      // p1 leaves; mutex = 1 and p3 is chosen
Down3(mutex)   // p3 completes its down (mutex = 0)
Up(mutex)      // p3 leaves; mutex = 1 and p2 is chosen
Down2(mutex)   // p2 completes its down (mutex = 0)
Up(mutex)      // p2 leaves; mutex = 1 and p4 is chosen
Down4(mutex)   // p4 completes its down (mutex = 0)
Up(mutex)      // p4 leaves; mutex = 1, no processes are waiting

From this example, you can see that we can use semaphores to ensure that only one process is in its
critical section at any one time, i.e. the principle of mutual exclusion. We can also use semaphores to

synchronise processes. For example, the produce and consume functions in the producer-consumer
problem. Take a look at this program fragment.

#define BUFFER_SIZE 100                  // number of slots in the buffer

typedef int semaphore;

semaphore mutex = 1;                     // controls access to the critical region
semaphore empty = BUFFER_SIZE;           // counts the empty buffer slots
semaphore full = 0;                      // counts the full buffer slots

void producer(void) {
int item;
while(TRUE) {
produce_item(&item); // generate next item
down(&empty); // decrement empty count
down(&mutex); // enter critical region
enter_item(item); // put item in buffer
up(&mutex); // leave critical region
up(&full); // increment count of full slots
}
}

void consumer(void) {
int item;
while(TRUE) {
down(&full); // decrement full count
down(&mutex); // enter critical region
remove_item(&item); // remove item from buffer
up(&mutex); // leave critical region
up(&empty); // increment count of empty slots
consume_item(&item); // print item
}
}

The mutex semaphore (given the above example) should be self-explanatory. The empty and full semaphores provide a method of synchronising the adding and removing of items to the buffer. Each time an item is to be removed from the buffer a down is done on full. This decrements the semaphore and, should full already be zero (the buffer is empty), the consumer will sleep until the producer adds another item. The consumer also does an up on empty. This is so that, should the producer try to add an item to a full buffer, it will sleep (via the down on empty) until the consumer has removed an item.
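To make the down and up operations above more concrete, here is a minimal sketch of how they might be implemented. The helpers block_on(), someone_sleeping_on() and unblock_one() are invented names; a real kernel would also have to make each whole operation atomic, for example by briefly disabling interrupts or by using TSL:

struct sem {
    int value;                          // number of saved wakeups
    /* a queue of sleeping processes would also live here */
};

void block_on(struct sem *s);           // assumed: put caller to sleep on s
int  someone_sleeping_on(struct sem *s);// assumed: is anyone asleep on s?
void unblock_one(struct sem *s);        // assumed: wake one sleeper on s

void down(struct sem *s) {
    if (s->value > 0)
        s->value = s->value - 1;        // use up a stored wakeup and continue
    else
        block_on(s);                    // no wakeups saved: go to sleep
}

void up(struct sem *s) {
    if (someone_sleeping_on(s))
        unblock_one(s);                 // let one sleeper complete its down
    else
        s->value = s->value + 1;        // nobody waiting: save the wakeup
}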

Process Scheduling
Scheduling Objectives (Tanenbaum, 1992 p61-70; 2001 p132-140)
If we assume we only have one processor and there are two or more processes in a ready state, how do
we decide which process to schedule next? Or more precisely, which scheduling algorithm does the
scheduler use in deciding which process should be moved to a running state? These questions are the
subject of this section. In trying to schedule processes, the scheduler tries to meet typically five
objectives:

1. Fairness: Ensure each process gets a fair share of the CPU
2. Efficiency: Ensure the CPU is busy 100% of the time. In practice, a measure of between 40% (for a lightly loaded system) and 90% (for a heavily loaded system) is acceptable
3. Response Times: Ensure interactive users get good response times
4. Turnaround: Ensure batch jobs are processed in an acceptable time
5. Throughput: Ensure a maximum number of jobs are processed

Of course, the scheduler cannot meet all of these objectives to an optimum level. For example, in trying
to give interactive users good response times, the batch jobs may have to suffer. Many large
companies, that use mainframes, address these types of problems by taking many of the scheduling
decisions themselves. For example, a company the last year’s lecturer (Dr Graham Kendall) used to
work for did not allow batch work during the day. Instead they gave the TP (Transaction Processing)
system all the available resources so that the response times for the users (many of which were dealing
with the public in an interactive way) was as fast as possible. The batch work was run overnight when
the interactive workload was much less, typically only operations staff and technical support personnel.
However, these types of problems are likely to increase as the world becomes “smaller.” If a company operates a mainframe that is accessible from all over the world then the concepts of night and day no longer hold; there may be a requirement for TP access 24 hours a day, and the batch work somehow has to be fitted in around this workload.

Preemptive Scheduling
A simple scheduling algorithm would allow the currently active process to run until it has completed.
This would have several advantages
1. We would no longer have to concern ourselves with race conditions as we could be sure that one
process could not interrupt another and update a shared variable.
2. Scheduling the next process to run would simply be a case of taking the highest priority job (or
using some other algorithm, such as FIFO algorithm).

Note : We could define completed as when a process decides to give up the CPU. The process may not
have completed but it would only give up control when safe (e.g. not during the update of a shared
variable).

But the disadvantages of this approach far outweigh the advantages.


• A rogue process may never relinquish control, effectively bringing the computer to a standstill.
• Processes may hold the CPU too long, not allowing other applications to run.

Therefore, it is usual for the scheduler to have the ability to decide which process can use the CPU and,
once it has had its slice of time then it is placed into a ready state and the next process allowed to run.
This type of scheduling is called preemptive scheduling. The disadvantage of this method is that we
need to cater for race conditions as well as having the responsibility of scheduling the processes.

Typical Process Activity


There is probably no such thing as an average process but studies have been done on typical processes.
It has been found that processes come in two varieties. Processes which are I/O bound, and only require
the CPU in short bursts. Then there are processes that require the CPU for long bursts. Of course, a process can combine these two attributes. At the start of its processing it might be I/O bound so that it only requires short bursts of the CPU. Later in its processing it might become heavily CPU bound so that (if it were allowed) it would like the CPU for long periods. An important aspect when scheduling is to consider the CPU bursts that are required, that is, how long the process needs the CPU before it will either
finish or move to a blocked state. From a scheduling point of view we need not concern ourselves with
processes that are waiting for I/O. As far as the scheduler is concerned, this is a good thing, as it is one
less process to worry about. However, the scheduler needs to be concerned about the burst time. If a
process has a long burst time it may need to have access to the CPU on a number of occasions in order
to complete its burst sequence. The problem is that the scheduler cannot know what burst time a
process has before it schedules the process. Therefore, when we look at the scheduling algorithms
below, we can only look at the effect of the burst time and the effect it has on the average running time
of the processes (it is usual to measure the effect of a scheduling policy using average figures).

First Come – First Served Scheduling (FCFS)


An obvious scheduling algorithm is to execute the processes in the order they arrive and to execute
them to completion. In fact, this simply implements a non-preemptive scheduling algorithm. It is an
easy algorithm to implement. When a process becomes ready it is added to the tail of the ready queue. This
is achieved by adding the Process Control Block (PCB) to the queue. When the CPU becomes free the
process at the head of the queue is removed, moved to a running state and allowed to use the CPU until
it is completed. The problem with FCFS is that the average waiting time can be long. Consider the
following processes

Process Burst Time


P1 27
P2 9
P3 2

P1 will start immediately, with a waiting time of 0 milliseconds (ms). P2 will have to wait 27ms. P3
will have to wait 36ms before starting. This gives us an average waiting time of 21ms (i.e. (0 + 27 +
36) /3 ).

Now consider if the processes had arrived in the order P2, P3, P1. The average waiting time would now be 6.67ms (i.e. (0 + 9 + 11) / 3). This is obviously a big saving, and all due to the order in which the jobs arrived. It can be shown that FCFS is not generally minimal with regard to average waiting time, and this figure varies depending on the process burst times. The FCFS algorithm can also have other undesirable effects. A CPU bound job may make the I/O bound jobs (once they have finished their I/O) wait for the processor. At this point the I/O devices are sitting idle. When the CPU bound job finally does some I/O, the mainly I/O bound processes use the CPU quickly and now the CPU sits idle waiting for the mainly CPU bound job to complete its I/O. Although this is a simplistic example, you can appreciate that FCFS can lead to the I/O devices and the CPU both being idle for long periods.

Shortest Job First (SJF)


Using the SJF algorithm, each process is tagged with the length of its next CPU burst. The processes
are then scheduled by selecting the shortest job first. Consider these processes, P1..P4. Assume they
arrived in the order P1..P4.

Process Burst Time Wait Time


P1 12 0
P2 19 12
P3 4 31
P4 7 35

If we schedule the processes in the order they arrive then the average wait time is 19.5 (78/4). If we run
the processes using the burst time as a priority then the wait times will be 0, 4, 11 and 23; giving an
average wait time of 9.50. In fact, the SJF algorithm is provably optimal with regard to the average
waiting time. And, intuitively, this is the case as shorter jobs add less to the average time, thus giving a
shorter average. The problem is we do not know the burst time of a process before it starts. For some
systems (notably batch systems) we can make fairly accurate estimates but for interactive processes it
is not so easy. One approach is to try and estimate the length of the next CPU burst, based on the
process's previous activity. To do this we can use the following formula

Tn+1 = a*tn + (1 - a)*Tn

Where:
a is a weighting factor, 0 <= a <= 1
Tn stores the past history (the previous estimate)
tn contains the most recent information (the length of the most recent burst)

What this formula allows us to do is weight both the history of the burst times and the most recent burst
time. The weight is controlled by a. If a = 0 then Tn+1 = Tn and recent history (the most recent burst
time) has no effect. If a = 1 then the history has no effect and the guess is equal to the most recent burst
time.

A value of 0.5 for a is often used so that equal weight is given to recent and past history.

This formula has been reproduced on a spreadsheet (which I hope to make available on the WWW site
associated with this course) so that you can experiment with the various values.
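As an alternative to the spreadsheet, here is a small, self-contained C program that applies the formula. The burst values and the initial guess of 10ms are arbitrary example data, not figures from the handout:

#include <stdio.h>

int main(void) {
    double a = 0.5;                               /* weighting factor */
    double T = 10.0;                              /* initial estimate (ms) */
    double bursts[] = {6.0, 4.0, 6.0, 13.0, 13.0, 13.0};
    int n = sizeof(bursts) / sizeof(bursts[0]);

    for (int i = 0; i < n; i++) {
        printf("estimate = %5.2f ms, actual burst = %5.2f ms\n", T, bursts[i]);
        T = a * bursts[i] + (1.0 - a) * T;        /* Tn+1 = a*tn + (1 - a)*Tn */
    }
    return 0;
}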

Priority Scheduling
Shortest Job First is just a special case of priority scheduling. Of course, we can use a number of
different measures as priority. Another example of setting priorities based on the resources they have
previously used is as follows.

Assume processes are allowed 100ms before the scheduler preempts them. If a process only used, say, 2ms,
then it is likely to be a job that is I/O bound and it is in our interest to allow this job to run as soon as it
has completed I/O – in the hope that it will go away and do some more I/O; thus making effective use
of the processor as well as the I/O devices. If a job used all its 100ms we might want to give this job a
lower priority, in the belief that we can get smaller jobs completed first before we allow the longer jobs
to run.

One method of calculating priorities based on this reasoning is to use the formula: priority = 1 / (n / p)

Where:
n is the length of the last CPU burst for that process
p is the CPU time allowed for each process before it is preempted (100ms in our example)

Plugging in some real figures we can assign priorities as follows

Last CPU Burst (ms)    Processing Time Slice (ms)    Priority Assigned
100                    100                           1
50                     100                           2
25                     100                           4
5                      100                           20
2                      100                           50
1                      100                           100

Another way of assigning priorities is to set them externally. During the day interactive jobs may be
given a high priority and batch jobs are only allowed to run when there are no interactive jobs. Another
alternative is to allow users who pay more for their computer time to be given higher priority for their
jobs.

One of the problems with priority scheduling is that some processes may never run. There may always
be higher priority jobs that get assigned the CPU. This is known as indefinite blocking or starvation.
One solution to this problem is called aging. This means that the priority of a job is gradually increased until even the lowest priority job will eventually become the highest priority job in the system. This
could be done, for example, by increasing the priority of a job after it has been in the system for a
certain length of time.
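A hedged sketch of one way aging might be implemented (the PCB fields and the threshold value below are invented for illustration, not taken from any particular system):

#define AGING_THRESHOLD 100            /* ticks a job may wait before promotion */

struct pcb {
    int priority;                      /* larger value = more urgent (assumed) */
    int waited_ticks;                  /* time spent waiting in the ready queue */
};

void age_ready_processes(struct pcb table[], int n) {
    for (int i = 0; i < n; i++) {
        if (table[i].waited_ticks > AGING_THRESHOLD) {
            table[i].priority++;       /* gradually promote starved jobs */
            table[i].waited_ticks = 0; /* start counting again */
        }
    }
}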

References
• Bic L.F. and Shaw A.C. 2003. Operating System Principles (1st Ed). Prentice Hall
• Dijkstra E.W. 1965. Co-operating Sequential Processes. In Programming Languages, Genuys, F. (ed), London: Academic Press
• Silberschatz A., et al. 2003. Operating System Concepts (6th Ed). Addison-Wesley Publishing Company
• Tanenbaum A.S. 2001. Modern Operating Systems. Prentice Hall.

G53OPS – Revision Notes: Lecture 5
Peter Siepmann (pxs02u), adapted from the notes by Tony Cook

Multilevel Queue Scheduling

There are two typical kinds of process in a system: interactive jobs, which tend to be shorter, and batch jobs,
which tend to be longer. We can set up different queues to cater for different process types. Each
queue may have its own scheduling algorithm – the background queue will typically use the FCFS
algorithm while the interactive queue may use the RR algorithm. The scheduler has to decide which
queue to run. Either higher priority queues can be processed until they are empty before the lower
priority queues are executed or each queue can be given a certain amount of the CPU. There could
be other queues in addition to the two mentioned, such as a high priority system queue.

Multilevel Queue Scheduling assigns a process to a queue and it remains in that queue. It may be
advantageous to move processes between queues (multilevel feedback queue scheduling). If we
consider processes with different CPU burst characteristics, a process which uses too much of the
CPU will be moved to a lower priority queue. We would leave I/O bound and (fast) interactive
processes in the higher priority queues.

Example
Assume three queues (Q0, Q1 and Q2)
 Scheduler executes Q0 and only considers Q1 and Q2 when Q0 is empty
 A Q1 process is preempted if a Q0 process arrives
 New jobs are placed in Q0
 Q0 runs with a quantum of 8ms
 If a process is preempted it is placed at the end of the Q1 queue
 Q1 has a time quantum of 16ms associated with it
 Any processes preempted in Q1 are moved to Q2, which is FCFS
Observations:
 Any jobs that require less than 8ms of the CPU are serviced very quickly
 Any processes that require between 8ms and 24ms are also serviced fairly quickly
 Any jobs that need more than 24ms are executed with any spare CPU capacity once Q0 and Q1
processes have been serviced
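A minimal sketch of the demotion rule in the example above (queue indices and quantum values as in the example; the function name is invented):

enum { Q0, Q1, Q2 };                     /* Q2 is FCFS, so no quantum */
int quantum_ms[] = {8, 16, 0};

int queue_after_preemption(int current_queue, int used_whole_quantum) {
    if (used_whole_quantum && current_queue < Q2)
        return current_queue + 1;        /* demote CPU-hungry processes */
    return current_queue;                /* I/O bound jobs keep their level */
}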

The scheduler can be defined by a number of parameters:


 The number of queues
 The scheduling algorithm for each queue
 The method used to demote processes to lower priority queues
 The method used to promote processes to a higher priority queue (some form of aging)
 The method used to determine which queue a process will enter

Up to now, we have assumed that the processes are all available in memory so that the context
switching is fast. However, if the computer is low on memory then some processes may be swapped
out to disk. Context switching takes longer in this case so it is sensible to schedule only those
processes in memory. This is the responsibility of a top level scheduler.

A second scheduler is invoked periodically to remove processes from memory to disk and vice versa.
Parameters to decide which processes to move include
 How long has it been since a process has been swapped in or out?
 How much CPU time has the process recently had?
 How big is the process (on the basis that small ones do not get in the way)?
 What is the priority of the process?

Linux Process Scheduling

Linux caters for real time scheduling and non real time scheduling with three queue scheduling
systems:
 FIFO for real time threads
 Round Robin for real time threads
 Other for non real time threads

FIFO will not interrupt any other FIFO queued process except i) if a higher priority FIFO thread is
ready, ii) the current FIFO thread gets blocked waiting e.g. for I/O or iii) the current FIFO thread
voluntarily yields its CPU. If the executing FIFO thread is interrupted it gets put in a queue associated with its priority. If a FIFO thread becomes ready and it has a higher priority than the currently executing FIFO thread, the executing thread is kicked out in favour of the higher priority FIFO thread. If there is more than one candidate of the same priority waiting to kick out a lower priority thread then the one that has waited the longest is chosen.

The Round Robin system is similar to FIFO except that a quantum is involved now. When a thread
gets kicked out of the CPU due to using up its quantum, it is put to the back of the queue and
another real time process of greater than or equal to the former’s priority is selected for execution.

To cope with the increasing number of processes and processors, the 2.6 kernel developed the O(1)
scheduler for non-real time processes. It is based on the premise that the time to select and assign
a process to the CPU is a constant, i.e. it is independent of processes or CPUs. The kernel
maintains two scheduling data structures for each CPU and separate queues for each process
priority level. All non real time tasks are assigned a priority of 100 to 139 with 120 being the default.
Typical quantum values are 10 to 200ms. I/O bound tasks are given higher priorities, and higher priorities are given larger quantum values.


UNIX Process Scheduling

UNIX SVR4 scheduling involves a preemptive static priority scheduler with 160 priority levels divided
into three classes:
 Real time processes given highest preference (priority 159-100)
 Kernel mode processes given medium preference (99-60)
 User mode (time shared) processes given the lowest preference (59-0)

With real time processes, preemption points can be used. These are safe points in between processing steps at which the kernel may be interrupted (within a step the kernel must not be interrupted). These safe points are where the kernel data structures are consistent (fully updated) or safely locked with a semaphore. In UNIX SVR4, a dispatch queue is available
with each priority level. Processes in a priority level are executed using Round Robin Scheduling but
real time processes are given very high priorities. In the time-share queues process priority is variable – it is lowered each time a quantum is used up and raised if the process blocks on an event or resource. Typical quanta range from 100ms for the priority 0 queue to 10ms for the priority 59 queue.

Windows Process Scheduling

Windows is designed for single use/interactive environment or as a server. There are two bands of
priorities, real time and variable. There are sixteen levels of real time priority, in the range 31-16, which stay fixed. A round robin scheduling system is used. These have priority over all variable band processes.

There are also sixteen levels of variable priority (15-0). Processes may jump up or down in priority, but never exceed 15. FIFO is used. A thread's base priority can be up to 2 levels above or below the process level, but interrupted threads have their priorities changed: they are lowered if the quantum is used up (i.e. processor bound threads) or raised if interrupted by an I/O event (i.e. I/O bound threads), so interactive threads tend to have higher priorities.

On an n processor system the n-1 highest priority threads are always executed – one per processor.
The lower priority threads are run on the remaining single processor.


Processes - Lecture 6 - Feb 14


Handout Introduction

The main topics discussed in the handout are as follows

Calculation of Average Process Wait Times...................................................................................................1


Evaluation of Scheduling Algorithms: Simulation..........................................................................................3
Evaluation of Scheduling Algorithms: Implementation ..................................................................................3
Process Threads...............................................................................................................................................3
Deadlocks........................................................................................................................................................5
References .......................................................................................................................................................7

Calculation of Average Process Wait Times


Consider 5 processes which all arrive at time zero – assume (for this example) a quantum of 8ms:

Process Burst Time


P1 9
P2 33
P3 2
P4 5
P5 14

Which of the following algorithms will perform best on this workload?


1) First Come First Served (FCFS)
2) Non Preemptive Shortest Job First (SJF)
3) Round Robin (RR)

First Come First Serve


Process Burst Time (ms) Start & End Times (ms) Wait time (ms)
P1 9 0-9 0
P2 33 9-42 9
P3 2 42-44 42
P4 5 44-49 44
P5 14 49-63 49

Therefore, the average waiting time is ((P1+P2+P3+P4+P5) / 5) or….

((0 + 9 + 42 + 44 + 49) / 5) = 28.80 milliseconds


Shortest Job First (non preempted)


Process    Burst Time (ms)    Start and End Times (ms)    Wait Time (ms)
P3 2 0-2 0
P4 5 2-7 2
P1 9 7-16 7
P5 14 16-30 16
P2 33 30-63 30

The average waiting time is ((P3+P4+P1+P5+P2) / 5) or….

((0 + 2 + 7 + 16 + 30) / 5) = 11 milliseconds

Round Robin Scheduling


Process  CPU Time to Completion (ms)  Start and End Times (ms)  CPU Time Used (ms)  Remaining CPU Time (ms)  Wait Time (ms)
P1       9                            0-8                       8                   1                        0
P2       33                           8-16                      8                   25                       8
P3       2                            16-18                     2                   Completed                16
P4       5                            18-23                     5                   Completed                18
P5       14                           23-31                     8                   6                        23
P1       (9-8)=1                      31-32                     1                   Completed                0+(31-8)=23
P2       (33-8)=25                    32-40                     8                   17                       8+(32-16)=24
P5       6                            40-46                     6                   Completed                23+(40-31)=32
P2       17                           46-54                     8                   9                        8+(32-16)+(46-40)=30
P2       9                            54-62                     8                   1                        8+(32-16)+(46-40)+(54-54)=30
P2       1                            62-63                     1                   Completed                8+(32-16)+(46-40)+(54-54)+(62-62)=30
The cumulative waiting time for each process is as follows…

P1 : 0 + 23 = 23
P2 : 8 + 16 + 6 = 30
P3 : 16
P4 : 18
P5 : 23 + 9 = 32

Therefore, the average waiting time is ((23 + 30 + 16 + 18 + 32) / 5) = 23.80 milliseconds
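As an illustrative aside (not part of the handout), this small C program reproduces the FCFS and SJF averages above for the same five burst times, assuming, as in the example, that all processes arrive at time zero:

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;     /* ascending burst time */
}

static double avg_wait(const int burst[], int n) {
    double wait = 0.0, start = 0.0;
    for (int i = 0; i < n; i++) {
        wait += start;                            /* this process waits until 'start' */
        start += burst[i];
    }
    return wait / n;
}

int main(void) {
    int fcfs[] = {9, 33, 2, 5, 14};               /* arrival order from the example */
    int sjf[]  = {9, 33, 2, 5, 14};
    qsort(sjf, 5, sizeof(int), cmp);              /* shortest job first */
    printf("FCFS average wait = %.2f ms\n", avg_wait(fcfs, 5));   /* 28.80 */
    printf("SJF  average wait = %.2f ms\n", avg_wait(sjf, 5));    /* 11.00 */
    return 0;
}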


Evaluating Process Scheduling Algorithms: Simulations


Rather than using queuing models we simulate a computer. A variable, representing a clock, is incremented. At each increment the state of the simulation is updated. Statistics are gathered at each
clock tick so that the system performance can be analysed. The data to drive the simulation can be
generated in the same way as the queuing model, although this leads to similar problems. Alternatively,
we can use trace data. This is data collected from real processes on real machines and is fed into the
simulation. This can often provide good results and good comparisons over a range of scheduling
algorithms. However, simulations can take a long time to run, can take a long time to implement and
the trace data may be difficult to collect and require large amounts of storage.

Evaluating Process Scheduling Algorithms: Implementation


The best way to compare algorithms is to implement them on real machines. This will give the best
results but does have a number of disadvantages.
• It is expensive as the algorithm has to be written and then implemented on real hardware.
• If typical workloads are to be monitored, the scheduling algorithm must be used in a live situation.
Users may not be happy with an environment that is constantly changing.
• If we find a scheduling algorithm that performs well there is no guarantee that this state will
continue if the workload or environment changes.

Threads
This section is based on (Tanenbaum, 1992 p507-523; 2001 p81-100). So far we have only considered processes. In this section we take a brief look at threads, which are sometimes called lightweight
processes. One definition of a process is that it has an address space and a single thread of execution or
as Tanenbaum (2001) says about process models, is that they are: “based on two independent concepts:
resource grouping and execution”. Sometimes it would be beneficial if two (or more) processes could
share the same address space (i.e. same variables and registers etc) and run parts of the process in
parallel. This is what threads do.

Firstly, let us consider why we might need to use threads. Assume we have a server application
running. Its purpose is to accept messages and then act upon those messages. Consider the situation
where the server receives a message and, in processing that message, it has to issue an I/O request.
Whilst waiting for the I/O request to be satisfied it goes to a blocked state. If new messages are
received, whilst the process is blocked, they cannot be processed until the process has finished
processing the last request.
One way we could achieve our objective is to have two processes running. One process deals with
incoming messages and another process deals with the requests that are raised. However, this approach
gives us two problems
1. We still have the problem, in that either of the processes could become blocked (although there are ways around this by issuing child processes)
2. The two processes will have to update shared variables. This is far easier if they share the same address space.

The answer to these type of problems is to use threads. Threads are like mini-processes that operate
within a single process. Each thread has its own program counter and stack so that it knows where it is.
Apart from this they can be considered the same as processes, with the exception that they share the
same address space. This means that all threads from the same process have access to the same global
variables and the same files.

These tables show you the various items that a process has, compared to the items that each thread has.

Per Thread Items:
• Program Counter
• Stack
• Register Set
• Child Threads
• State

Per Process Items:
• Address Space
• Global Variables
• Open Files
• Child Processes
• Timers
• Signals
• Semaphores
• Accounting Information

If you have a multi-processor machine, then different threads from the same process can run in parallel.

Threads are not as independent as processes because:

1) in a given process they share the same global variables
2) they can potentially overwrite or wipe out each other's stacks, hence there is no inter-thread protection - but this is perhaps not needed anyway, because the threads in a process are created by a single user and hence do not belong to other processes or users

Thread states are like process states in that they can be: running, blocked, ready or terminated

Typical Thread library calls are:

thread_create - start a thread in a named procedure
thread_exit - self terminating call
thread_wait - make the thread wait
thread_yield - voluntarily donate CPU usage to another thread

Each thread in a process has its own stack - this contains one frame for each procedure called (but not yet returned from). A frame contains the procedure's local variables & return address.
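The generic calls above correspond roughly to the POSIX threads (pthreads) API. As a hedged illustration (not from the handout), here is a minimal pthreads sketch in which pthread_join plays the role of thread_wait; compile with -pthread:

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;                        /* the equivalent of thread_exit */
}

int main(void) {
    pthread_t t[2];
    int id[2] = {0, 1};
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &id[i]);   /* thread_create */
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);       /* wait for each thread to finish */
    return 0;
}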

Why do we really want threads?


1) ability of parallel entities, within a process, to all share common variables and data
2) with no resources attached, threads are easier to create and destroy than processes. In fact thread creation is ~100x faster than process creation
3) performance gain when there is a lot of computing and a lot of I/O, as the tasks can overlap. A purely CPU bound application gains no advantage from threads though
4) threads very effective on multiple CPU systems for concurrent processing
5) See p85-87 in Tanenbaum (2001) for usefulness of threads in a word processor
6) p87-90 for use of threads on a web server

Implementing threads in user space


1) If we put threads entirely in user space, then the kernel can simply carry on thinking it is serving single threaded processes.
2) Threads can be implemented on an OS that does not support threads
3) Each process needs its very own thread table, which can be thought of as an analog of the kernel's process table. A thread table contains: a) thread program counter, b) thread stack pointer, c) thread registers, d) thread state i.e. ready, blocked etc
4) Unlike processes though, when thread_yield is called it saves the thread information in the thread table itself and can instruct the thread scheduler to run another thread. Only local procedures are used, so this is more efficient/faster than a kernel call. Threads also permit each process to have its own scheduler

Implementing threads in the kernel


Implementation in user space has some disadvantages:
1) how to implement blocking system calls, e.g. what if a thread reads from the keyboard before any keys are pressed - this will block the other threads
2) if a thread causes a page fault (which we will learn about later) then the kernel (which knows nothing about threads in a user space implementation) will block the whole process until the I/O is complete - so other potentially runnable threads will be unable to run!
3) a rogue thread may block other threads from running in a process i.e. there are no clock interrupts or RR turn taking for threads within a process
4) for CPU bound processes, there is not much advantage in thread usage

Kernel implementation
1) a kernel has table of all threads on system - there are no thread tables in processes now
2) a thread table is a subset of the process table in the kernel
3) to create or destroy a thread a kernel (not library) call is made
4) when a thread blocks then the kernel can choose to run a thread from the same process or a thread
from another process
5) it is costly to create and destroy threads in the kernel, therefore thread re-cycling is used by some
systems i.e. re-using data structures
6) kernels are better at coping with page faults i.e. the kernel just switches in any other runnable thread
7) the disadvantage of implementing threads in the kernel is that creating/destroying a thread requires a system call, which is a substantial exercise, so if there are a lot of thread creations or terminations this becomes expensive.

Implementing hybrid threads


Various ways have been investigated and you are advised to read p94 of Tanenbaum (2001) to look
into the advantages of using threads in hybrid user and kernel space. Suffice to say that each kernel
level thread has a subset of user level threads that utilize it.

What is a deadlock?
Tanenbaum (2001, p163) describes a deadlock as follows: "a set of processes is deadlocked if each process in the set is waiting for an event that only another process in the set can cause", for example:

1) process A makes a request to use the scanner


2) process A given the scanner
3) process B requests the CD writer
4) process B given the CD writer
5) now A requests the CD writer
6) A is denied permission until B releases the CD writer
7) Alas now B asks for the scanner
8) result a DEADLOCK!

Typically deadlocks occur across a network or with interfaced and shared machines and devices. They
can occur in hardware, or software, or both. Now the typical sequence of events to acquire a device /
software / data are...

1) Request the resource (if it is not available the process must wait - either by blocking or by having an error code issued)
2) Use the resource
3) Release the resource

N.B. semaphores and mutexes can help

Coffman’s conditions for a Deadlock...


1) “Mutual exclusion” - resources are either each assigned to one process or are available
2) “Hold and wait” - processes presently holding resources granted earlier can request new resources
3) “No preemption condition” - resources previously given cannot be forcibly removed from a
process. They can only be released by the process holding them
4) “Circular wait condition” - A circular chain of two or more processes, each waiting for a resource
held by the next member in the chain

How to deal with deadlocks?


1) Ignore the problem - “Ostrich Algorithm”
This would be an engineer’s approach - they ask “how often will this happen” and if the answer is not
very frequently i.e. once per five years and it is not a life critical system then they will say “it’s just one
of those things, computers do this from time to time; the solution is just to reboot". Another wait-and-see example: if the system's process table is full and a fork fails, wait a random time interval and try again
2) Deadlock detection and recovery - wait for a deadlock to occur - detect it - do something about it
3) Deadlock avoidance - make a graph of resources and specify safe and unsafe regions of graph i.e.
Banker’s algorithms for single and multiple resources
4) Deadlock prevention - undo one of Coffman's conditions i.e. 1) Mutual exclusion, 2) Hold and wait, 3) No preemption, 4) Circular wait

Deadlock detection and recovery


1) Example - “one resource of each type” e.g. a system has one scanner, one CD writer, one plotter etc
now construct a resource graph. If a cycle is found we have a deadlock - if no cycle, no deadlock!

Tanenbaum (2001) fig 3-5

Circles are processes, squares are resources, so initially A is holding resource R, but at the same time A
is requesting resource S. For each node N in graph perform 5 steps...

1) create an empty list and initialize all arcs as unmarked


2) add current node to list and see if it appears > once; if so finish - it’s a loop!
3) from given node see if there are any outgoing branches not checked and if so goto (4) else jump to
step (5)
4) pick any unchecked outgoing branch at random, mark it as checked and follow it to the new current node and goto (2)
5) at a dead-end, so remove it and backtrack to the previous node and goto (2). If on going to (2) we
find out this is the initial node then the graph does not contain any loops - so terminate the search
as it is deadlock free
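A hedged C sketch of the same idea, written as a recursive depth-first search rather than the explicit list-and-backtrack wording above (the adjacency-matrix representation and MAX_NODES are assumptions for the example; the worked example that follows uses the original list method):

#define MAX_NODES 16

int n_nodes;                            /* processes + resources in the graph */
int edge[MAX_NODES][MAX_NODES];         /* edge[i][j] = 1: an arc from i to j */
int on_path[MAX_NODES];                 /* nodes on the current search path   */

int cycle_from(int node) {
    if (on_path[node])
        return 1;                       /* node met twice on the path: a cycle */
    on_path[node] = 1;
    for (int next = 0; next < n_nodes; next++)
        if (edge[node][next] && cycle_from(next))
            return 1;
    on_path[node] = 0;                  /* dead end: backtrack */
    return 0;
}

int graph_is_deadlocked(void) {
    for (int start = 0; start < n_nodes; start++)
        if (cycle_from(start))
            return 1;                   /* a cycle means a deadlock */
    return 0;                           /* no cycle anywhere: deadlock free */
}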

So let us start to make a list...


1) [R, A]
2) [R, A, S] - since S has no outgoing branch (a dead end), backtrack to A, then to R. Re-start with an empty list [] and move on to try node B
3) [B]
4) [B, T] ..... [B, T, E] ..... [B, T, E, V, G, U, D]. From D we can go to S or T; S has no outgoing branches so it is a dead end, so backtrack to D and then try T
5) [B, T, E, V, G, U, D, T] - T has appeared twice in the list, so we have found a cycle: a DEADLOCK

Now we have found a deadlock what do we do about it?

1) Recovery through “Preemption” - take a resource away temporarily from its owner and give it to another - perhaps through manual intervention
2) Recovery through “Rollback” - checkpoint processes periodically to record their states, memory images etc. - on detection of deadlock, just roll back, restart and allocate the resources differently. The disadvantage of this is we lose everything that has happened since the rollback unless it can be recomputed
3) Recovery by Killing Processes - this is crude but effective, i.e. kill a process in the cycle, or even kill a process not in the cycle so as to free up its resources

Deadlock Avoidance - Resource Trajectories

Tanenbaum (2001) fig 3-8

Consider two processes (A, B) and two resources: printer and plotter. The horizontal axis is a potential
sequence of instructions executed by process A. The vertical axis is potential sequence of instructions
executed by process B. Process A needs a printer from I1 to I3, and the plotter from I2 to I4. Process B
needs the plotter from I5 to I7, and the printer from I6 to I8. If the system enters the box bounded by
I1,I3 & I6,I8 it will eventually deadlock at the double hatched section when the plotter is also requested. So the system must pre-emptively decide on the allocation of resources at point t.

References
• Tanenbaum, A., S. 1992/2001. Modern Operating Systems. Prentice Hall.


Deadlocks II & Memory Management I -


Lecture 7 - Feb 17
Handout Introduction
These notes are largely based on (Tanenbaum, 1992/2001). Where applicable, the notes will point you
to the relevant part of that book in case you want to read about the subject in a little more detail.

The main topics discussed in the handout are as follows

Handout Introduction ......................................................................................................................................1


Deadlocks........................................................................................................................................................1
Introduction to Memory Management.............................................................................................................2
Swapping.........................................................................................................................................................6
References .......................................................................................................................................................7

Deadlocks (II)
Deadlock Avoidance - Resource Trajectories

Tanenbaum (2001) Fig 3-8

Consider two processes (A, B) and two resources (printer and plotter). The horizontal axis is the
sequence of instructions that could potentially be executed by process A. The vertical axis is the potential
sequence of instructions that could be executed by process B. So take for example:
1) process A needs a printer from I1 to I3, and the plotter from I2 to I4
2) process B needs the plotter from I5 to I7, and the printer from I6 to I8

So if the system enters the box bounded by I1,I3 & I6,I8 it will eventually deadlock at the double
hatched section when a plotter is also requested. So the system must make a pre-emptive decision about
resource allocation at point t.


Safe States

Tanenbaum (2001) fig 3-9

Take 3 processes A, B, C - we can consider how many resources they actually have, and the maximum
that they may need. The maximum number of resources available is 10 in this example. Now
consider...

a) (3+2+2)=7 resources allocated, so 10-7= 3 are free


b) say B ran exclusively taking 2 more resources, so free=1
c) B finishes, releases resources and A & C remain, & free=5
d) say scheduler selects C next and it takes 5 resources & free=0
e) C completes, releasing its resources, so free=7 and A can run to completion

Unsafe States

Tanenbaum (2001) fig 3-10

Unsafe States are not strictly deadlocks, but are to be avoided! Take 3 processes A, B, C - we can
consider how many resources they actually have, and the maximum that they may need. The maximum
number of resources available in this example is 10...

a) (3+2+2)=7 resources allocated, so 10-7= 3 are free


b) say A ran exclusively taking 1 extra resource, so free=2
c) say B ran next and takes the two remaining resources, free=0
d) B completes and frees up its resources, free=4
neither A nor C is now guaranteed to complete, as there may not be enough free resources to satisfy their maximum needs!
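
Whether a state like (a) is safe can be checked mechanically: keep looking for a process whose worst-case outstanding need fits in the free pool, let it run to completion and release everything, and repeat. Below is a minimal sketch in Python; the holdings and the total of 10 are those of the examples above, while the maximum needs (9, 4 and 7) are assumptions made here for illustration, since the handout only quotes the holdings and free counts.

def is_safe(holding, max_need, total):
    """holding/max_need: dicts of currently held and maximum resources per
    process. Returns True if some order lets every process finish."""
    free = total - sum(holding.values())
    remaining = dict(holding)
    while remaining:
        runnable = [p for p in remaining
                    if max_need[p] - remaining[p] <= free]
        if not runnable:
            return False                  # no process can be guaranteed to finish
        p = runnable[0]
        free += remaining.pop(p)          # p runs to completion and releases everything
    return True

holding  = {'A': 3, 'B': 2, 'C': 2}       # state (a) above
max_need = {'A': 9, 'B': 4, 'C': 7}       # assumed maxima, for illustration only
print(is_safe(holding, max_need, total=10))                   # True  - safe
print(is_safe({'A': 4, 'B': 2, 'C': 2}, max_need, total=10))  # False - unsafe

The first call reports the safe state of Fig 3-9; the second, where A has grabbed one extra resource, reports the unsafe state of Fig 3-10.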

Banker’s algorithm for a single resource


Banks and Building Societies are very tight over who they loan money to for say a house mortgage.
You do not just have to prove that you can pay them the money back, but you need to be able to show
that you have made a budget and are aware of certain actions that might take you into dangerous areas
where you are less able to pay their money back. In the Banker’s algorithm:
1) checks are done to requests of resources, so as to ensure that they do not permit entry into unsafe
states
2) one does not need to forbid (permanently) a request that at a given time might lead to an unsafe
state, just delay it until its outcome is deemed to be safe
3) the algorithm always checks to see if it has enough resources to satisfy customer demand
See Tanenbaum (2001) p178-179

Banker’s algorithm for multiple resources


See Tanenbaum (2001) p171-173 & p179-180
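
The same check generalises to vectors of resources: a request is only granted if, after a trial allocation, the system can still find an order in which every process's remaining need can be met from the free vector. A small sketch of the safety check in Python (the matrix values below are invented purely for illustration):

def safe_state(allocation, need, available):
    """allocation/need: per-process resource vectors; available: free vector.
    Returns True if some completion order exists (the Banker's safety check)."""
    free = list(available)
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i in range(len(allocation)):
            if not finished[i] and all(n <= f for n, f in zip(need[i], free)):
                # process i can run to completion and release its allocation
                free = [f + a for f, a in zip(free, allocation[i])]
                finished[i] = True
                progress = True
    return all(finished)

# Purely illustrative state: 3 processes, 2 resource types
allocation = [[2, 1], [1, 0], [0, 1]]
need       = [[1, 0], [1, 2], [2, 1]]      # outstanding need (maximum minus allocated)
available  = [1, 1]
print(safe_state(allocation, need, available))   # True - a safe state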


Deadlock Prevention (see Tanenbaum (2001) p181-183)


1) Attacking Mutual Exclusion - avoid two or more devices accessing same resource using spooling
2) Attacking Hold and Wait - request all resources initially
3) Attacking No Preemption - i.e. taking resources away from processes; not ideal e.g. if a printer is in
the middle of printing
4) Attacking Circular Waiting - introduce global numbering of resources e.g. scanner=1, plotter=2,
printer=3 etc. Hence a process may only request resources one after the other in numerical order.
As a result of this a resource allocation graph can never loop

Introduction to Memory Management


This section is based on (Tanenbaum, 1992 p74-81; 2001, p189-199). One of the main tasks of an
operating system is to manage the computer's memory. This includes many responsibilities, for
example:
• Being aware of what parts of the memory are in use and which parts are not.
• Allocating memory to processes when they request it and de-allocating memory when a process
releases its memory.
• Moving data from memory to disc, when the physical capacity becomes full, and vice versa.
• Managing hierarchical memory e.g. volatile cache → RAM → disk storage

In this handout we consider some ways in which these functions are achieved.

Monoprogramming
If we only allow a single process in memory at a time we can make life simple for ourselves. That is,
the processor does not permit multi-programming. Using this model we do not have to worry about
swapping processes out to disc when we run out of memory. Nor do we have to worry about keeping
processes separate in memory. All we have to do is load a process into memory, execute it and then
unload it before loading the next process.

However, even this simple scheme has its problems.


• We have not yet considered the data that the program will operate upon.
• We are also assuming that a process is self contained in that it has everything within it that allows
it to function. This assumes that it has a driver for each device it needs to communicate with. This
is both wasteful and unrealistic. We are also forgetting about the operating system routines. The
OS can be considered another process and so we have two processes running which means we
have left behind our ideal where we can consider that the memory is only being used by one
process.

But, even if a monoprogramming model did not have these memory problems we would still be faced
with other problems. In this day and age monoprogramming is unacceptable as multi-programming is not
only expected by the users of a computer but it also allows us to make more effective use of the CPU.
For example, we can allow a process to use the CPU whilst another process carries out I/O or we can
allow two people to run interactive jobs and both receive reasonable response times.

We could allow only a single process in memory at one instance in time and still allow multi-
programming. This means that a process, when in a running state, is loaded in memory. When a context
switch occurs the process is copied from memory to disc and then another process is loaded into
memory. This method allows us to have a relatively simple memory module in the operating system
but still allows multi-programming.

The drawback with the method is the amount of time a context switch takes. If we assume that a
quantum is 100ms and a context switch takes 200ms then the CPU spends a disproportionate amount of
time switching processes. We could increase the amount of time for a quantum but then the interactive
users will receive poor response times as processes will have to wait longer to run.

Modelling Multiprogramming
We assume (and have stated) that multiprogramming can improve the utilisation of the CPU, and
intuitively, this is the case. If we have five processes that use the processor twenty percent of the time

(spending eighty percent doing I/O) then we should be able to achieve one hundred percent CPU
utilisation. Of course, in reality, this will not happen as there may be times when all five processes are
waiting for I/O. However, it seems reasonable that we will achieve better than twenty percent
utilisation that we would achieve with monoprogramming. But, can we model this?

We can build a model from a probabilistic viewpoint. Assume that a process spends a fraction p of its
time waiting for I/O. With n processes in memory the probability that all n processes are waiting for
I/O (meaning the CPU is idle) is p^n. The CPU utilisation is then given by....

CPU Utilisation = 1 - p^n

The following graph shows this formula being used (the spreadsheet that produced this graph is
available from the web site for this course).

[Figure: CPU utilisation (%) plotted against the degree of multiprogramming (0 to 10 processes), with one curve each for 20%, 50%, 80% and 90% I/O wait times]

You can see that with an I/O wait time of 20%, almost 100% CPU utilisation can be achieved with four
processes. If the I/O wait time is 90% then with ten processes, we only achieve just above 60%
utilisation.
The important point is that, as we introduce more processes the CPU utilisation rises.

The model is a little contrived as it assumes that all the processes are independent in that processes
could be running at the same time. This (on a single processor machine) is obviously not possible.
More complex models could be built using queuing theory but we can still use this simplistic model to
make approximate predictions.

Assume a computer with one megabyte of memory. The operating system takes up 200K, leaving room
for four 200K processes. If we have an I/O wait time of 80% then we will achieve just under 60% CPU
utilisation. If we add another megabyte, it allows us to run another five processes (nine in all). We can
now achieve about 86% CPU utilisation. You might now consider adding another megabyte of
memory, allowing fourteen processes to run. If we extend the above graph, we will find that the CPU
utilisation will increase to about 96%. Adding the second megabyte allowed us to go from 59% to 86%.
The third megabyte only took us from 86% to 96% (14 processes + 1 O.S.). It is a commercial decision
if the expense of the third megabyte is worth it.
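
The figures quoted above drop straight out of the 1 - p^n formula; a quick check in Python, just to reproduce the arithmetic of the example:

def utilisation(p, n):
    """CPU utilisation with n processes, each waiting for I/O a fraction p of the time."""
    return 1 - p ** n

for processes in (4, 9, 14):        # 1, 2 and 3 megabytes of memory in the example
    print(processes, round(utilisation(0.80, processes), 3))
# prints 0.59, 0.866 and 0.956 - roughly the 59%, 86% and 96% quoted above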


Multiprogramming with Fixed Partitions


If we accept that multiprogramming is a good idea, we next need to decide how to organise the
available memory in order to make effective use of the resource. One method is to divide the memory
into fixed sized partitions. These partitions can be of different sizes but once a partition has taken on a
certain size then it remains at that size. There is no provision for changing its size. The IBM OS/360
was set up in this way. The computer operator defined the sizes of the partitions in the morning (or
when the machine was booted) and these partitions remained in effect until the computer was reloaded.
This was called MFT (Multiprogramming with Fixed number of Tasks – or OS/MFT)

Tanenbaum (2001) Fig 4-2

The diagram above shows how this scheme might work. The memory is divided into four partitions
(we’ll ignore the operating system). When a job arrives it is placed in the input queue for the smallest
partition that will accommodate it. There are a few drawbacks to this scheme.
1. As the partition sizes are fixed, any space not used by a particular job is lost.
2. It may not be easy to state how big a partition a particular job needs.
3. It is possible that a job placed in a queue may be prevented from running by other jobs waiting
for (and using) that partition.

To cater for the last problem we could have a single input queue where all jobs are held. When a
partition becomes free we search the queue looking for the first job that fits into the partition. An
alternative search strategy is to search the entire input queue looking for the largest job that fits into the
partition. This has the advantage that we do not waste a large partition on a small job but has the
disadvantage that smaller jobs are discriminated against. Smaller jobs are typically interactive jobs
which we normally want to service first. To ensure small jobs do get run we could have at least one
small partition or ensure that small jobs only get skipped a certain number of times. Using fixed
partitions is easy to understand and implement, although there are a number of drawbacks which we
have outlined above.

Relocation and Protection


As soon as we introduce multiprogramming we have two problems that we need to address.
Relocation : When a program is run it does not know in advance what location it will be loaded at.
Therefore, the program cannot simply generate static addresses (e.g. from jump
instructions). Instead, they must be made relative to where the program has been
loaded.
Protection : Once you can have two programs in memory at the same time there is a danger that
one program can write to the address space of another program. This is obviously
dangerous and should be avoided.

In order to cater for relocation we could make the loader modify all the relevant addresses as the binary
file is loaded. The OS/360 worked in this way but the scheme suffers from the following problems:
• The program cannot be moved after it has been loaded, without going through the same process.

• Using this scheme does not help the protection problem as the program can still generate illegal
addresses (maybe by using absolute addressing).
• The program needs to have some sort of map that tells the loader which addresses need to be
modified.

A solution, which solves both the relocation and protection problem is to equip the machine with two
registers called the base and limit registers. The base register stores the start address of the partition and
the limit register holds the length of the partition. Any address that is generated by the program has the
base register added to it. In addition, all addresses are checked to ensure they are within the range of
the partition. An additional benefit of this scheme is that if a program is moved within memory, only its
base register needs to be amended. This is obviously a lot quicker than having to modify every address
reference within the program. The IBM PC uses a scheme similar to this, although it does not have a
limit register.
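
To make the base/limit idea concrete, the check and relocation applied to every program-generated address can be written out as below (a sketch only - real hardware does this on every memory reference, and the register values here are invented for the example):

def translate(address, base, limit):
    """Relocate a program-generated address and enforce protection."""
    if not (0 <= address < limit):
        raise MemoryError("address outside this partition")   # protection violation
    return base + address                                     # relocation

base, limit = 300 * 1024, 120 * 1024   # a partition loaded at 300K, 120K long
print(translate(50 * 1024, base, limit))          # 358400 - a legal access
try:
    translate(200 * 1024, base, limit)            # beyond the end of the partition
except MemoryError as err:
    print("trapped:", err)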

Swapping
This section is based on (Tanenbaum, 1992, p81-88; 2001,p196-199). Using fixed partitions is a simple
method but it becomes ineffective when we have more processes than we can fit into memory at one
time. For example, in a timesharing situation where many people want to access the computer, more
processes will need to be run than can be fitted into memory at the same time. The answer is to hold
some of the processes on disc and swap processes between disc and main memory as necessary. In this
section we look at how we can manage this swapping procedure.

Multiprogramming with Variable Partitions


Just because we are swapping processes between memory and disc does not stop us using fixed
partition sizes. However, the reason we are having to swap processes out to disc is because memory is a
scarce resource and, as we have discussed, fixed partitions can be wasteful of memory. Therefore it
would be a good idea to look for an alternative that makes better use of the scarce resource. It seems an
obvious development to move towards variable partition sizes. That is partitions that can change size
as the need arises. Variable partitions can be summed up as follows...

• The number of partitions varies.


• The sizes of the partitions vary.
• The starting addresses of the partitions vary.

These make for a much more effective memory management system but it makes the process of
maintaining the memory much more difficult. For example, as memory is allocated and deallocated
holes will appear in the memory; it will become fragmented. Eventually, there will be holes that are too
small to have a process allocated to them. We could simply shuffle all the memory being used downwards
(called memory compaction), thus closing up all the holes. But this could take a long time and, for this
reason, it is not usually done - e.g. compacting 256MB might take ~3 seconds.

Another problem is if processes are allowed to grow in size once they are running. That is, they are
allowed to dynamically request more memory (e.g. the new statement in C++). What happens if a
process requests extra memory such that increasing its partition size is impossible without it having to
overwrite another partition's memory? We obviously cannot do that so do we wait until memory is
available so that the process is able to grow into it, do we terminate the process or do we move the
process to a hole in memory that is large enough to accommodate the growing process? The only
realistic option is the last one; although it is obviously wasteful to have to copy a process from one part
of memory to another - maybe by swapping it out first.


Tanenbaum (2001) Fig 4-6

None of the solutions are ideal so it would seem a good idea to allocate more memory than is initially
required. This means that a process has somewhere to grow before it runs out of memory (see left hand
figure below). Most processes will be able to have two growing data segments, data created on the
stack and data created on the heap. Instead of having the two data segments grow upwards in memory a
neat arrangement has one data area growing downwards and the other data segment growing upwards.
This means that a data area is not restricted to just its own space and if a process creates more memory
in the heap then it is able to use space that may have been allocated to the stack (see the right hand
figure below). On Friday we are going to look at three ways in which the operating system can keep
track of the memory usage. That is, which memory is free and which memory is being used.

References
• Tanenbaum, A., S. 1992. Modern Operating Systems. Prentice Hall.


Memory Management II -


Lecture 8 - Feb 21
Handout Introduction
These notes are largely based on (Tanenbaum, 1992/2001). Where applicable, the notes will point you
to the relevant part of that book in case you want to read about the subject in a little more detail.

The main topics discussed in the handout are as follows

Handout Introduction ......................................................................................................................................1


Swapping.........................................................................................................................................................1
Memory Usage with Bit Maps ........................................................................................................................2
Memory Usage with Linked Lists ...................................................................................................................3
First, Best and Worst Fits ................................................................................................................................4
Disk I/O for Course Work ...............................................................................................................................4
References .......................................................................................................................................................4

Memory Management (II)

Swapping
This section is based on (Tanenbaum, 1992, p81-88; 2001,p196-199). Using fixed partitions is a simple
method but it becomes ineffective when we have more processes than we can fit into memory at one
time. For example, in a timesharing situation where many people want to access the computer, more
processes will need to be run than can be fitted into memory at the same time. The answer is to hold
some of the processes on disc and swap processes between disc and main memory as necessary. In this
section we look at how we can manage this swapping procedure.

Multiprogramming with Variable Partitions


Just because we are swapping processes between memory and disc does not stop us using fixed
partition sizes. However, the reason we are having to swap processes out to disc is because memory is a
scarce resource and, as we have discussed, fixed partitions can be wasteful of memory. Therefore it
would be a good idea to look for an alternative that makes better use of the scarce resource. It seems an
obvious development to move towards variable partition sizes. That is partitions that can change size
as the need arises. Variable partitions can be summed up as follows...

• The number of partitions varies.


• The sizes of the partitions vary.
• The starting addresses of the partitions vary.

These make for a much more effective memory management system but it makes the process of
maintaining the memory much more difficult. For example, as memory is allocated and deallocated
holes will appear in the memory; it will become fragmented. Eventually, there will be holes that are too
small to have a process allocated to them. We could simply shuffle all the memory being used downwards
(called memory compaction), thus closing up all the holes. But this could take a long time and, for this
reason, it is not usually done frequently - e.g. compacting 256MB might take ~3 seconds.

Another problem is if processes are allowed to grow in size once they are running. That is, they are
allowed to dynamically request more memory (e.g. the new statement in C++). What happens if a
process requests extra memory such that increasing its partition size is impossible without it having to
overwrite another partition's memory? We obviously cannot do that so do we wait until memory is
available so that the process is able to grow into it, do we terminate the process or do we move the
process to a hole in memory that is large enough to accommodate the growing process? The only
realistic option is the last one; although it is obviously wasteful to have to copy a process from one part
of memory to another - maybe by swapping it out first.


Tanenbaum (2001) Fig 4-6

None of the solutions are ideal so it would seem a good idea to allocate more memory than is initially
required. This means that a process has somewhere to grow before it runs out of memory (see left hand
figure below). Most processes will be able to have two growing data segments, data created on the
stack and data created on the heap. Instead of having the two data segments grow upwards in memory a
neat arrangement has one data area growing downwards and the other data segment growing upwards.
This means that a data area is not restricted to just its own space and if a process creates more memory
in the heap then it is able to use space that may have been allocated to the stack (see the right hand
figure below).

Memory Usage with Bit Maps


Under this scheme the memory is divided into allocation units and each allocation unit has a
corresponding bit in a bit map. If the bit is zero, the memory is free. If the bit in the bit map is one, then
the memory is currently being used. This scheme is illustrated in Fig 4-7 on p199 of
Tanenbaum (2001).

The main decision with this scheme is the size of the allocation unit. The smaller the allocation unit, the
larger the bit map has to be. But, if we choose a larger allocation unit, we could waste memory as we
may not use all the space allocated in each allocation unit. The other problem with a bit map memory
scheme is when we need to allocate memory to a process. Assume the allocation size is 4 bytes. If a
process requests 256 bytes of memory, we must search the bit map for 64 consecutive zeroes. This is a
slow operation and for this reason bit maps are not often used.
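
The search for a run of free allocation units is the slow part. A minimal sketch in Python (the initial bit map below is invented for the example; a real 256-byte request with 4-byte units would need a run of 64 zeroes, not the 5 used here to keep the example small):

def allocate(bitmap, units_needed):
    """bitmap: list of 0/1 values, one per allocation unit. Finds the first
    run of free units long enough, marks it used, and returns its index
    (or None if no run is long enough)."""
    run_start, run_len = 0, 0
    for i, bit in enumerate(bitmap):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == units_needed:
                for j in range(run_start, run_start + units_needed):
                    bitmap[j] = 1          # mark the units as in use
                return run_start
        else:
            run_len = 0
    return None

bitmap = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]   # an invented memory state
print(allocate(bitmap, 5))    # 6 - the first run of five free units starts there
print(bitmap)                 # those five units are now marked as used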

Memory Usage with Linked Lists
Free and allocated memory can be represented as a linked list. The memory shown above as a bit map
can be represented as a linked list as follows.

[P,0,1] -> [H,1,3] -> [P,4,3] -> [H,7,1] -> [P,8,1]

Each entry in the list holds the following data


• P or H : for Process or Hole
• Starting segment address
• The length of the memory segment
• The next pointer is not shown but assumed to be present

In the list above, processes follow holes and vice versa (with the exception of the start and the end of
the list). Also see Fig 4-7 on p199 of Tanenbaum (2001). But, it does not have to be this way. It is
possible that two processes can be next to each other and we need to keep them as separate elements in
the list so that if one process ends we only return the memory for that process. Consecutive holes, on
the other hand, can always be merged into a single list entry. This leads to the following observations
when a process terminates and returns its memory.

A terminating process can have four combinations of neighbours (we’ll ignore the start and the end of
the list to simplify the discussion). If X is the terminating process then the four combinations are...

Before X terminates                      After X terminates


1) process | X | process        ->       process | hole | process
2) process | X | hole           ->       process | one larger hole
3) hole    | X | process        ->       one larger hole | process
4) hole    | X | hole           ->       one single hole

• In the first option we simply have to replace the P by an H, other than that the list remains the
same.
• In the second option we merge two list entries into one and make the list one entry shorter.
• Option three is effectively the same as option 2.
• For the last option we merge three entries into one and the list becomes two entries shorter.

In order to implement this scheme it is normally better to have a doubly linked list so that we have
access to the previous entry. When we need to allocate memory, storing the list in segment address
order allows us to implement various strategies.

First Fit : This algorithm searches along the list looking for the first segment that is large
enough to accommodate the process. The segment is then split into a hole and a
process. This method is fast as the first available hole that is large enough to
accommodate the process is used.
Best Fit : Best fit searches the entire list and uses the smallest hole that is large enough to
accommodate the process. The idea is that it is better not to split up a larger hole that
might be needed later. Best fit is slower than first fit as it must search the entire list
every time. It has also been shown that best fit performs worse than first fit as it tends
to leave lots of small gaps.
Worst Fit : As best fit leaves many small, useless holes it might be a good idea to always use the
largest hole available. The idea is that splitting a large hole into two will leave a large
enough hole to be useful. It has been shown that this algorithm is not very good
either.
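
As an illustration of the first two strategies, here is a minimal sketch in Python over a list of [start, length] holes (the hole sizes are invented for the example; worst fit would simply pick the largest candidate instead):

def first_fit(holes, size):
    """holes: list of [start, length] entries in address order. Returns the
    start address allocated, shrinking the chosen hole, or None."""
    for hole in holes:
        if hole[1] >= size:
            start = hole[0]
            hole[0] += size          # what is left of the hole stays on the list
            hole[1] -= size
            return start
    return None

def best_fit(holes, size):
    """Same interface, but chooses the smallest hole that is big enough."""
    candidates = [h for h in holes if h[1] >= size]
    if not candidates:
        return None
    hole = min(candidates, key=lambda h: h[1])
    start = hole[0]
    hole[0] += size
    hole[1] -= size
    return start

holes = [[1, 3], [7, 1], [12, 6]]    # invented (start, length) holes
print(first_fit(holes, 2))           # 1  - the first hole big enough, which shrinks to [3, 1]
print(best_fit(holes, 4))            # 12 - the smallest hole that can take 4 units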

These three algorithms can all be speeded up if we maintain two lists; one for processes and one for
holes. This allows the allocation of memory to a process to be speeded up as we only have to search the
hole list. The downside is that list maintenance is complicated. If we allocate a hole to a process we
have to move the list entry from one list to another. However, maintaining two lists allows us to
introduce another optimisation. If we hold the hole list in size order (rather than segment address order)

we can make the best fit algorithm stop as soon as it finds a hole that is large enough. In fact, first fit
and best fit effectively become the same algorithm.

The Quick Fit algorithm takes a different approach to those we have considered so far. Separate lists
are maintained for some of the common memory sizes that are requested. For example, we could have
a list for holes of 4K, a list for holes of size 8K etc. One list can be kept for large holes or holes which
do not fit into any of the other lists. Quick fit allows a hole of the right size to be found very quickly,
but it suffers in that there is even more list maintenance.

Quick Intro to Disk I/O to aid with the Assessed Course Work
Fig 1-8 on p25 in Tanenbaum (2001) shows a typical layout of a computer's hard disk. This is made up
of several disks, or platters (often writable on both surfaces), and almost (but not quite) touching each surface
is a read/write disk head e.g. a small magnetic coil - electric currents are generated in this by the
magnetic bits on the disk as they sweep past underneath. The disk heads can move in and out on an arm
controlled by a digital stepping motor. Now imagine that each disk is divided up into tracks (concentric
circles), and each track has N sectors of say 512 bytes (see Fig 5-25 on p 316 of Tanenbaum (2001)).
This is a rough description of how a computer's hard drive is organised, i.e. disks, tracks and sectors. You will also
hear the term "cylinder" a lot. This is because the disk heads can read the same track on many disks
simultaneously - a kind of parallelism. It is also common, because the circumference of a circle
changes with radius (2.Pi.R) that the number of sectors varies from the inside of a disk to the outside.
This is to avoid the magnetic storage density having to increase per sector as we move towards the
center of the disk. Anyway the good news for our course work is that we have a very simplified disk
system with NO cylinders, just one disk and one surface, and always the same number of sectors per
track.

So onto the coursework and question 1....

A company is planning to build a small, non-standard single surface disk drive storage device
with characteristics outlined below. You must build a simplified software model of the disk drive
to demonstrate example disk file writes/reads, bit error checking, and disk arm scheduling. You
may write your program in C, C++ or Java. Your code must contain comments and be well laid
out. The code must also be designed so that the three experiments detailed below can be
performed.

OK so this is non-standard, and we shall discuss differences between the course work example and
real-world examples later. However to help you with the course work you should be reading
Tanenbaum (2001) p315-327 as we will not be covering I/O for a few weeks. By software model this
could be taken as say an array of memory where each byte represents one byte on the proposed disk
drive. The three languages have been picked because you should know at least one of them. Comments
within software are very useful for people reading your code, and when you come back to it after a
while. Marks will be lost for unintelligible, poorly commented code.

Experiment 1
Write code to create 50 imaginary files, of lengths given in the table below, on an imaginary disk
drive. Assume 512 bytes per sector, 180 sectors per track, 80 tracks, a disk head seek time
between adjacent tracks of 1 ms, and a rotational speed of 1200 revolutions per minute (or 20 per
sec). Note buffering is not allowed. Data from each sector can be read and completely transferred
in 0.1 microseconds. Also assume that the disk head starts initially
at track zero, sector zero and that each sector is divided up as follows:

Byte 0 byte representing sector No.


Byte 1 byte representing track No.
Bytes 2-6 file name (up to 6 characters)
Bytes 7-10 integer representing file length
Bytes 10-13 integer representing number of sequential sectors holding file
Bytes 14-496 containing the file data

Bytes 496-511 ECC (16 byte) check for subsequent bit errors (see experiment 2) – you may devise
your own error check scheme or implement one from a concept in a book (but do not copy code).
You may use all (or some) of the 16 bytes available for this task.

Calculate the time needed to write each file in sequence – assume first in first out and no cylinder
or track skew between tracks. Assume that the disk head starts on track 0, sector 0. For the
contents of the file (bytes 14-496) just enter the numerical value of the file name.

OK so for 50 imaginary files use the table in the course work to look up the file name, and then its
length to reserve the correct number of bytes in the array that you are using to represent the model of
the disk drive. You can fill the bytes for these imaginary files with whatever you like e.g. the file’s
numerical name for each byte.

We are adopting 512 bytes per sector, however as you can see from Fig 5-24 on p315 of Tanenbaum
(2001), not all of a disk sector is allocated to data! So if you write or read data, and it is > 483 bytes,
then it will end up strewn across several adjacent sectors. If it is really big, then it may be strewn across
adjacent tracks as well. Once again the 483 is specific to this design and can vary between disk drives -
as can the other allocated bytes above.

Now there are, on this proposed disk drive, 180 sectors per track. In reality 8 to 32 sectors per track are
commonly used - but please use 180 sectors per track in our example.

Assume it takes the disk head, on its digital stepping motor arm, 1 ms to move past one track, 2 ms to
move past two tracks, 3 ms to move past 3 tracks etc. In reality it needs time to start up and accelerate,
and time to stop, but we shall ignore this and stick with 1 ms per track. The 0.1 microsecond to read
and write the data in this example is very fast, and in this time the disk has had little time to rotate past a
sector. Which sector the disk head is looking at will become important when the disk head is moving
between tracks - this takes time, during which the disk will have rotated by one or more sectors, so the
head may have to wait for the correct sequential sector to come into view again.
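
As a back-of-the-envelope illustration of how seek time and rotational position interact, the sketch below estimates the time for the head to reach a given track and sector under the figures above (this is only an illustrative calculation in Python, not coursework code - remember the coursework itself must be written in C, C++ or Java - and it ignores transfer time and any start/stop effects):

def time_to_reach(cur_track, cur_sector, dest_track, dest_sector,
                  ms_per_track=1.0, rpm=1200, sectors_per_track=180):
    """Rough time (ms) to seek to dest_track and then wait for dest_sector
    to rotate under the head; transfer time is ignored."""
    ms_per_rev = 60_000 / rpm                        # 50 ms per revolution at 1200 rpm
    ms_per_sector = ms_per_rev / sectors_per_track   # ~0.28 ms per sector
    seek = abs(dest_track - cur_track) * ms_per_track
    # while the head is seeking, the disk keeps rotating underneath it
    sector_now = (cur_sector + seek / ms_per_sector) % sectors_per_track
    rotational_wait = ((dest_sector - sector_now) % sectors_per_track) * ms_per_sector
    return seek + rotational_wait

# e.g. from track 0, sector 0 to track 10, sector 0: 10 ms of seek, during
# which 36 sectors pass by, so another 144 sectors (40 ms) of rotation are needed
print(round(time_to_reach(0, 0, 10, 0), 1))    # 50.0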

The ECC is an error check to see if any bytes in a sector have been corrupted or misread - it is up to you
to design some form of error check, utilizing all or some of the allocated bytes. Do not spend too much
time on this; it is important that you show some understanding of the problem though. Errors on disks
can occur for several reasons e.g. dust particles, magnetic drop outs, electrical interference during read
or write, cosmic ray damage. So it is important to know when this has occurred and where (or at least in
which sector). As will be discussed in Experiment 3, the word "Skew" is used. This is where
numerically identical sectors are offset from one another between adjacent tracks so as to allow for the
time lag and rotational offset when the disk head jumps between tracks.

“First in-First” out just means that you write one file after the other in numerical order of sectors, and
then, once these are full on a given track, onto the next track, starting at sector zero again.

Finally assume that you do not store more than one file per sector i.e. never attempt to store file 01 and
02 in the same sector. If file 01 does not completely fill its last sector, just leave the remainder of the
sector's contents as zeros.

References
• Tanenbaum, A., S. 2001. Modern Operating Systems. Prentice Hall.


Memory Management III -


Lecture 9 - Feb 24
Handout Introduction
These notes are largely based on (Tanenbaum, 1992/2001). Where applicable, the notes will point you
to the relevant part of that book in case you want to read about the subject in a little more detail.

The main topics discussed in the handout are as follows

Handout Introduction ......................................................................................................................................1


Memory Usage with the Buddy System..........................................................................................................1
Virtual Memory: Introduction .........................................................................................................................2
Virtual Memory: Paging..................................................................................................................................5
Virtual Memory: Page Tables .........................................................................................................................5
References .......................................................................................................................................................6

Memory Management (III)


Memory Usage with the Buddy System
If we keep a list of holes sorted by their size, we can make allocation to processes very fast as we only
need to search down the list until we find a hole that is big enough. The problem is that when a process
ends the maintenance of the lists is complicated. In particular, merging adjacent holes is difficult as the
entire list has to be searched in order to find its neighbours.

The Buddy System (Knuth, 1973; Knowlton, 1965) is a memory allocation technique that works on the
basis of using binary numbers as these are faster and easier for computers to manipulate. Lists are
maintained which store lists of free memory blocks of sizes 1, 2, 4, 8,…, n, where n is the size of the
memory (in bytes). This means that for a one megabyte memory we require 21 lists. If we assume we
have one megabyte of memory and it is all unused then there will be one entry in the 1M list; and all
other lists will be empty.

Now assume that a 70K process (process A) needs to be swapped into memory. As lists are only held
as powers of two we have to allocate the next highest memory that is a power of two; in this case 128K.
The 128K list is currently empty. In fact, every list is empty except for the 1M list. Therefore, we split
the 1M block into two 512K blocks. One of the 512K blocks is then split into 256K blocks and one of
the 256K blocks is split into two 128K blocks; one of which is allocated to the 70K process with 58K

unused. This situation is shown in the diagram above. At this stage we have three lists with one entry
(128K, 256K and 512K) and the 1M list is empty (as are all the other lists). The reason the scheme is
known as the buddy system is because each time a block is split, the two halves are known as each other's buddies.

Next, a process (process B) requiring 35K might be swapped in. This will require a 64K block as it will
not fit into a 32K block. There are no entries in the 64K list so the next size list is considered (128K).
This has an entry so two buddies are created and the process is allocated to one of those blocks. If we
now request an 80K process (process C), this will have to occupy a 128K block, which will come from
the 256K list.

What happens if process A ends and releases its memory? In fact, the block of memory will simply be
added to the 128K list. If another process, D, now requests 60K of memory it will find an entry in the
64K list, so can be allocated there.

Now process B terminates and releases its memory. This will simply place its block in the 64K list. If
process D terminates we can start merging blocks. This is a fast process as we only have to check
adjacent lists and check for adjoining addresses. Finally, process C terminates and we can merge all the
way back to a single list entry in the 1M list.

The reason the buddy system is fast is because when a block of size 2^k bytes is returned only the 2^k list
has to be searched to see if a merge is possible. The problem with the buddy system is that it is
inefficient in terms of memory usage. All memory requests have to be rounded up to a power of two.
We saw above how an 80K process has to be allocated to a 128K memory block. The extra 48K is
wasted. This type of wastage is known as internal fragmentation, as the wasted memory is internal to
the allocated segments. This is the opposite of external fragmentation, where the wasted memory
appears between allocated segments. For the interested student (Peterson, 1977) and (Kaufman, 1984)
appears between allocated segments. For the interested student (Peterson, 1977) and (Kaufman, 1984)
have modified the buddy system to get around some of these problems. Linux adopts the buddy system
but with modifications to avoid internal fragmentation (Tanenbaum, 2001, p722)
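
The allocation side of the buddy system - rounding up to a power of two and splitting larger blocks - can be sketched as below in Python (merging on release is left out of this sketch; the free lists are keyed by block size, and the 1MB starting state is the one used in the example above):

def buddy_alloc(free_lists, request, total):
    """free_lists maps block size -> list of free start addresses.
    Round the request up to a power of two and split larger blocks as
    necessary; returns (start, block_size) or None."""
    size = 1
    while size < request:                  # round up to the next power of two
        size *= 2
    candidate = size
    while candidate <= total and not free_lists.get(candidate):
        candidate *= 2                     # smallest free block that is big enough
    if candidate > total:
        return None                        # nothing large enough is free
    while candidate > size:                # split, putting both buddies on the list
        start = free_lists[candidate].pop()
        candidate //= 2
        free_lists.setdefault(candidate, []).extend([start + candidate, start])
    return free_lists[size].pop(), size

K = 1024
free_lists = {1024 * K: [0]}               # one free 1MB block, as in the example
print(buddy_alloc(free_lists, 70 * K, 1024 * K))   # (0, 131072) - a 128K block at address 0
print({s // K: addrs for s, addrs in free_lists.items() if addrs})
# one free block each of 512K, 256K and 128K remains, matching the text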

Virtual Memory
Introduction
The swapping methods we have looked at above are needed so that we can allocate memory to
processes when they need it. But what happens when we do not have enough memory? In the past, a
system called overlays was used. This was a system that was the responsibility of the programmer. The
program would be split into logical sections (called overlays) and only one overlay would be loaded
into memory at a time. This meant that more programs could be running than would be the case if the
complete program had to be in memory. The downside of this approach is that the programmer had to
take responsibility for splitting the program into logical sections. This was time consuming, boring and
open to error.

It is no surprise that somebody eventually devised a method that allowed the computer to take over the
responsibility. It is (Fotheringham, 1961) who is credited with coming up with the method that is now
known as virtual memory. The idea behind virtual memory is that the computer is able to run programs
even if the amount of physical memory is not sufficient to allow the program and all its data to reside in
memory at the same time. At the most basic level we can run a 500K program on a 256K machine. But
we can also use virtual memory in a multiprogramming environment. We can run twelve programs in a
machine that could, without virtual memory, only run four.

Paging
In a computer system that does not support virtual memory, when a program generates a memory
address it is placed directly on the memory bus which causes the requested memory location to be
accessed. On a computer that supports virtual memory, the address generated by a program goes via a
memory management unit (MMU). This unit maps virtual addresses to physical addresses.


This diagram shows how virtual memory operates. The computer in this example can generate 16-bit
addresses. That is addresses between 0 and 64K (0-65535). The problem is the computer only has 32K
of physical memory so although we can write programs that can access 64K of memory, we do not
have the physical memory to support that. We obviously cannot fit 64K into the physical memory
available so we have to store some of it on disc. The virtual memory is divided into pages. The
physical memory is divided into page frames. The size of the virtual pages and the page frames are the
same size (4K in the diagram above). Therefore, we have sixteen virtual pages and eight physical
pages. Transfers between disc and memory are done in pages.

Now let us consider what happens when a program generates a request to access a memory location.
Assume a program tries to access address 8192. This address is sent to the MMU. The MMU
recognises that this address falls in virtual page 2 (assume pages start at zero). The MMU looks at its
page mapping and sees that page 2 maps to physical page 6. The MMU translates 8192 to the relevant
address in physical page 6 (this being 24576). This address is output by the MMU and the memory
board simply sees a request for address 24576. It does not know that the MMU has intervened. The
memory board simply sees a request for a particular location, which it honours.

If a virtual memory address is not on a page boundary (as in the above example) then the MMU also
has to calculate an offset (in fact, there is always an offset – in the above example it was zero).

Question 1. As an exercise, and using the diagram above, work out the physical page and physical
address that are generated by the MMU for each of the following addresses. The answers are at the end
of this handout (question 1).


Virtual Address Physical Page ?? Physical Address ???


0
45060
16384
21503
24576

So far, all we have managed to do is map sixteen virtual pages onto eight physical pages. We have not
really achieved anything yet as, in effect, we have eight virtual pages which do not map to a physical
page. In the diagram above, we represented these pages with an ‘X’. In reality, each virtual page will
have a present/absent bit which indicates if the virtual page is mapped to a physical page.

We need to look at what happens if the program tries to use an unmapped page. For example, the
program tries to access address 24576 (i.e. 24K). The MMU will notice that the page is unmapped
(using the present/absent bit) and will cause a trap to the operating system. This trap is called a page
fault. The operating system would decide to evict one of the currently mapped pages and use that for
the page that has just been referenced. The sequence of events would go like this.

• The program tries to access a memory location in a (virtual) page that is not currently mapped.
• The MMU causes a trap to the operating system. This results in a page fault.
• A little used virtual page is chosen (how this choice is made we will look at later) and the contents
of the page frame it occupies are written to disc.
• The page that has just been referenced is copied (from disc) into the page frame that has just been
freed.
• The page table is updated.
• The trapped instruction is restarted.

In the example we have just given (trying to access address 24576) the following would happen.
• The MMU would cause a trap to the operating system as the virtual page is not mapped to a
physical location.
• A virtual page that is mapped is elected for eviction (we’ll assume that virtual page 11 is
nominated).
• Virtual page 11 is marked as unmapped (i.e. the present/absent bit is changed).
• Physical page 7 is written to disc (we’ll assume for now that this needs to be done). That is the
physical page that virtual page 11 maps onto.
• Virtual page 6 is loaded to physical address 28672 (28K).
• The entry for virtual page 6 is changed so that the present/absent bit is changed. Also the ‘X’ is
replaced by a ‘7’ so that it points to the correct physical page.
• When the trapped instruction is re-executed it will now work correctly.

You might like to work through this to see how the mapping is changed. It is interesting to look at how
the MMU works. In particular, to consider why we have chosen to use a page size that is a power of 2.
Take a look at this diagram.


The incoming address (20818) consists of 16 bits. The top four bits are masked off and make an entry
into the virtual page table (in this case it provides an index to entry 5 (0101 in binary)) and finds that this
page is mapped to physical page 011 (3 in decimal). These three bits make up the top three bits of the
physical page address. The other part of the incoming address is copied directly to the outgoing
address.
Thus, the page table (courtesy of the MMU) has mapped virtual address 20818 to physical address
12626. If you look at the diagram that shows how virtual memory operates you will be able to follow
this conversion. The only other point we should, perhaps, consider, is why the first twelve bits of the
incoming address can be copied directly to the output address. See if you can work it out before
looking at the next sentence. Well the answer is that twelve bits can represent 4096 addresses (i.e. 2^12 = 4096).
This is the size (in bytes) of our pages. Therefore, these twelve bits of the address (whether
incoming virtual or outgoing physical) represent the offset within the page. In the example we looked at
the top four bits of the virtual address represent the virtual page and the top three bits of the physical
address represent the physical page. But, in both cases, the bottom twelve bits represent the offset
within the page. Therefore, the offset address can simply be copied.
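
The masking and shifting just described can be written out directly. A small sketch in Python, assuming the 16-bit addresses and 4K pages of the example, with a page table reduced to just the mappings used in the text (virtual pages 0, 2, 4, 5 and 11):

PAGE_SIZE = 4096                  # 2**12, so the offset is the bottom 12 bits

page_table = {0: 2, 2: 6, 4: 4, 5: 3, 11: 7}    # virtual page -> page frame
                                                # (only the mappings used in the text)

def mmu(virtual_address):
    """Translate a 16-bit virtual address, or raise a page fault if the
    virtual page is not currently mapped."""
    vpage  = virtual_address // PAGE_SIZE       # the top 4 bits of the address
    offset = virtual_address % PAGE_SIZE        # the bottom 12 bits, copied unchanged
    if vpage not in page_table:
        raise LookupError("page fault on virtual page %d" % vpage)
    return page_table[vpage] * PAGE_SIZE + offset

print(mmu(8192))     # 24576 - virtual page 2, frame 6
print(mmu(20818))    # 12626 - virtual page 5, frame 3, offset 338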

Page Tables
The way we have described how virtual addresses map to physical addresses is how it works but we
still have a couple of problems to consider.

The page table can be large


Assume a computer uses virtual addresses of 32 bits (very common) and the page size is 4K. This
results in over 1 million pages (2^32 / 4096 = 1,048,576).

The mapping between logical and physical addresses must be fast


A typical instruction requires two memory accesses (to fetch the instruction and to fetch the data
that the instruction will operate upon). Therefore, it requires two page table accesses in order to find
the right physical location. Assume an instruction takes 10ms. The mapping of the addresses must
be done in a fraction of this time in order for this part of the system not to become a bottleneck.

In this course we are not going to look in detail at how these problems are overcome but the interested
student might like to look at (Tanenbaum, 1992, p93-107; 2001, p202-214) which covers it in some

detail. However, we need to look at, in more detail, the structure of a page table so that we can use it in
future material.

This diagram (Fig 4-13 on p210 of Tanenbaum 2001) shows typical entries in a page table (although
the exact entries are operating system dependent). The various components are described below.

Page Frame Number :


This is the number of the physical page frame that this virtual page maps to. As this is
the whole point of the mapping, this can be considered the most important
part of the page table entry.
Present/Absent Bit : This indicates if the mapping is valid. A value of 1 indicates the physical
page, to which this virtual page relates is in memory. A value of zero
indicates the mapping is not valid and a page fault will occur if the page
is accessed.
Protection : The protection bit could simply be a single bit which is set to 0 if the
page can be read and written and 1 if the page can only be read. If three
bits are allowed then each bit can be used to represent read, write and
execute (R,W,E).
Modified : This bit is updated if the data in the page has been modified. This bit is
used when the data in the page is evicted. If the modified bit is set, the
data in the page frame needs to be written back to disc. If the modified
bit is not set, then the data can simply be evicted, in the knowledge that
the data on disc is already up to date.
Referenced : This bit is updated if the page has been referenced. This bit can be used
when deciding which page should be evicted (we will be looking at its
use later).
Caching Disabled : This bit allows caching to be disabled for the page. This is useful if
a memory address maps onto a device register rather than onto ordinary memory. In this case, the register
could be changed by the device and it is important that the register itself is accessed, rather than using a
cached value which may not be up to date.
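
To show how such fields might be packed into a single word, here is a purely illustrative layout in Python (real page table entry layouts are hardware- and operating-system specific, and the bit positions below are invented for the example):

PRESENT, REFERENCED, MODIFIED, READ_ONLY, CACHE_DISABLED = (
    1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4)     # flag bits (invented positions)

def make_entry(frame, flags):
    """Pack a page frame number and the flag bits into one word; the frame
    number occupies the bits above the five flags."""
    return (frame << 5) | flags

def frame_of(entry):
    return entry >> 5

entry = make_entry(6, PRESENT | REFERENCED)
print(frame_of(entry), bool(entry & PRESENT), bool(entry & MODIFIED))
# 6 True False - maps to frame 6, present, not yet modified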

Answers
Question 1

Virtual Address Physical Page Physical Address


0 2 8192 + 0 = 8192
45060 7 28672 + 4 = 28676
16384 4 16384 + 0 = 16384
21503 3 12288 + 1023 = 13311
24576 ?? ??

References
• Kaufman, A. 1984. Tailored-List and Recombination-Delaying Buddy Systems. ACM Trans. On
Programming Languages and Systems, Vol. 6, pp 118-125
• Knowlton, K.C. 1965. A Fast Storage Allocator. Communications of the ACM, vol 8, pp 623-625.
• Knuth, D.E. 1973. The Art of Computer Programming, Volume 1 : Fundamental Algorithms, 2nd
ed, Reading, MA, Addison-Wesley

• Peterson, J.L., Norman, T.A. 1977. Buddy Systems. Communications of the ACM, Vol. 20, pp
421-431
• Tanenbaum, A., S. 1992/2001. Modern Operating Systems. Prentice Hall.


OPS Memory Management


Handout Introduction
These notes are largely based on (Tanenbaum, 1992/2001). Where applicable, the notes will point you to
the relevant part of that book in case you want to read about the subject in a little more detail.

The main topics discussed in the handout are as follows

Handout Introduction ......................................................................................................................................1


Page Replacement Algorithms..........................................................................................................................2
Design Issues for Paging .................................................................................................................................6
Segmentation...................................................................................................................................................6
References.......................................................................................................................................................6

Page Replacement Algorithms


Introduction
In the discussion above we said that when a page fault occurs we evict a page that is currently mapped and
replace it with the page that we are currently trying to access. In doing this, we need to write the page we
are evicting to disc, if it has been modified. However, we have not said how we decide which page to evict.
One of the easiest methods would be to choose a mapped page at random but this is likely to lead to
degraded system performance. This is because the page chosen has a reasonable chance of being a page
that will need to be used again in the near future. Therefore, it will be evicted (and may have to be written
to disc) and then brought back into memory; maybe on the next instruction. There has been a lot of work
done in this area and in this section we describe some of the algorithms. The interested student might like
to look at (Smith, 1978) which lists over 300 papers on this topic.

The Optimal Page Replacement Algorithm


You might recall, when we were discussing process scheduling, that there is an optimal scheme where we
always schedule the shortest job first. This leads to the minimum average waiting time. The problem is we
cannot identify the burst time of a process before it is run. Therefore, it may be optimal, but we cannot
implement it. This page replacement algorithm is similar in that it leads to an optimal solution, but we
cannot implement it. The basis of the algorithm is that we evict the page that we will not use for the longest
period of time. For example, if we only have two pages to choose from, one of which will be used on the
next instruction and the other will not be used for another one hundred instructions. It makes sense to evict
the page that will be used for one hundred instructions. The problem, of course, is that we cannot look into
the future and decide which page to evict. But, if we cannot implement the algorithm then why bother
discussing it? In fact, we can implement it, but only after running the program to see which pages we
should evict and at what point. Once we know that we can run the optimal page replacement algorithm. We
can then use this as a measure to see how other algorithms perform against this ideal.

The Not-Recently-Used Page Replacement Algorithm


In order to implement this algorithm we make use of the referenced and modified bits that were mentioned
above. These bits are often implemented in hardware (as they are updated so much) but they can be
simulated in software as follows. When a process starts, all its page entries are marked as not in memory
(i.e. the present/absent bit). When a page is referenced a page fault will occur. The R (reference) bit is set
and the page table entry modified to point to the correct page. In addition the page is set to read only. If the
page is later written to, the M (modified) bit is set and the page is changed so that it is read/write. Updating
the flags in this way allows a simple paging algorithm to be built.

When a process is started up all R and M bits are cleared (set to zero). Periodically (e.g. on each clock
interrupt) the R bit is cleared. This allows us to recognise which pages have been recently referenced.
When a page fault occurs (so that a page needs to be evicted), the pages are inspected and divided into four
categories based on their R and M bits.

Class 0 : Not Referenced, Not Modified


Class 1 : Not Referenced, Modified
Class 2 : Referenced, Not Modified
Class 3 : Referenced, Modified

The Not-Recently-Used (NRU) algorithm removes a page at random from the lowest numbered class that
has entries in it. Therefore, pages which have not been referenced or modified are removed in preference to
those that have not been referenced but have been modified (which is not as impossible as it sounds due to
the fact that the reference bit is periodically reset). Although not an optimal algorithm, NRU often provides
adequate performance and is easy to understand and implement.
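
In code, NRU amounts to computing the class number 2R + M for every candidate page and evicting a page chosen at random from the lowest non-empty class. A minimal sketch in Python (the page list is invented for illustration):

import random

def nru_victim(pages):
    """pages: list of (name, referenced, modified) tuples. Picks a page at
    random from the lowest non-empty class, where class = 2*R + M."""
    lowest = min(2 * r + m for _, r, m in pages)
    candidates = [name for name, r, m in pages if 2 * r + m == lowest]
    return random.choice(candidates)

pages = [('A', 1, 1), ('B', 0, 1), ('C', 1, 0), ('D', 0, 1)]   # invented R/M states
print(nru_victim(pages))    # B or D - both are class 1 (not referenced, modified)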

The First-In, First-Out (FIFO) Page Replacement Algorithm


This algorithm simply maintains the pages as a linked list, with new pages being added to the end of the
list. When a page fault occurs, the page at the head of the list (the oldest page) is evicted. Whilst simple to
understand and implement, FIFO does not lead to good performance, as a heavily used page is just as likely
to be evicted as a lightly used page.

The Second Chance Page Replacement Algorithm


The second chance (SC) algorithm is a modification of the FIFO algorithm. When a page fault occurs the
page at the front of the linked list is inspected. If it has not been referenced (i.e. its reference bit is clear), it
is evicted. If its reference bit is set, then it is placed at the end of the linked list and its reference bit is cleared.
The next page is then inspected. In this way, a page that has been referenced will be given a second chance.

In the worst case, SC operates in the same way as FIFO. Take the situation where the linked list consists of
pages which all have their reference bit set. The first page, call it a, is inspected and placed at the end of the
list, after having its R bit cleared. The other pages all receive the same treatment. Eventually page a reaches
the head of the list and is evicted as its reference bit is now clear. Therefore, even when all pages in a list
have their reference bit set, the algorithm will always terminate.

The Clock Page Replacement Algorithm

Tanenbaum (2001, p219, Fig 4-17)

The clock page (CP) algorithm differs from SC only in its implementation. Whilst SC is a reasonable
algorithm it suffers in the amount of time it has to devote to the maintenance of the linked list. It is more
efficient to hold the pages in a circular list and move the pointer rather than move the pages from the head
of the list to the end of the list. It is called the clock page algorithm as it can be visualised as a clock face
with the hand (pointer) pointing at the pages, which represent the numbers on the clock.
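
A minimal sketch of the clock algorithm in Python is given below, assuming the R bits are set elsewhere whenever a page is referenced; the Clock class and its method names are illustrative only.

# Frames are held in a fixed circular buffer and only the hand moves.
class Clock:
    def __init__(self, num_frames):
        self.frames = [None] * num_frames    # resident page numbers (None = free)
        self.r_bits = [0] * num_frames
        self.hand = 0

    def insert(self, page):
        """Called on a page fault: find a frame for 'page'."""
        while True:
            if self.frames[self.hand] is None or self.r_bits[self.hand] == 0:
                self.frames[self.hand] = page      # use free frame or evict
                self.r_bits[self.hand] = 1
                self.hand = (self.hand + 1) % len(self.frames)
                return
            # The page gets its second chance: clear R and advance the hand.
            self.r_bits[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.frames)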

The Least Recently Used (LRU) Page Replacement Algorithm


Although we cannot implement an optimal algorithm by evicting the page that will not be used for the
longest time in the future, we can approximate the algorithm by keeping track of when a page was last used. If
a page has recently been used then it is likely that it will be used again in the near future. Conversely, a
page that has not been used for some time is unlikely to be used again in the near future. Therefore, if we
evict the page that has not been used for the longest amount of time we can implement a least recently used
(LRU) algorithm.

Whilst this algorithm can be implemented (unlike the optimal algorithm) it is not cheap. Ideally, we need to
maintain a linked list of pages which are sorted in the order in which they have been used. To maintain
such a list is prohibitively expensive (even in hardware) as deleting and moving list elements is a time
consuming process. Sorting a list is also expensive.

However, there are ways that LRU can be implemented in hardware. One way is as follows. The hardware
is equipped with a counter (typically 64 bits). After each instruction the counter is incremented. In addition,
each page table entry has a field large enough to accommodate the counter. Every time the page is
referenced the value from the counter is copied to the page table field. When a page fault occurs the
operating system inspects all the page table entries and selects the page with the lowest counter. This is the
page that is evicted as it has not been referenced for the longest time.

Another hardware implementation of the LRU algorithm is given below. If we have n page frames, a
matrix of n x n bits, initially all zero, is maintained. When page frame k is referenced, all the bits of row k
are set to one and then all the bits of column k are set to zero. At any instant the row with the lowest
binary value belongs to the least recently used page frame (where row number = page frame number), the
row with the next lowest value belongs to the next least recently used frame, and so on.

If we have four page frames and access them as follows

0 1 2 3 2 1 0 3 2 3

it leads to the algorithm operating as follows - it might be worth working through it and calculating the
binary value of each row to see which page frame would be evicted.

             (a) page 0    (b) page 1    (c) page 2    (d) page 3    (e) page 2
Frame 0      0 1 1 1       0 0 1 1       0 0 0 1       0 0 0 0       0 0 0 0
Frame 1      0 0 0 0       1 0 1 1       1 0 0 1       1 0 0 0       1 0 0 0
Frame 2      0 0 0 0       0 0 0 0       1 1 0 1       1 1 0 0       1 1 0 1
Frame 3      0 0 0 0       0 0 0 0       0 0 0 0       1 1 1 0       1 1 0 0

             (f) page 1    (g) page 0    (h) page 3    (i) page 2    (j) page 3
Frame 0      0 0 0 0       0 1 1 1       0 1 1 0       0 1 0 0       0 1 0 0
Frame 1      1 0 1 1       0 0 1 1       0 0 1 0       0 0 0 0       0 0 0 0
Frame 2      1 0 0 1       0 0 0 1       0 0 0 0       1 1 0 1       1 1 0 0
Frame 3      1 0 0 0       0 0 0 0       1 1 1 0       1 1 0 0       1 1 1 0

(Each 4 x 4 matrix shows the state after the reference named above it; within a matrix the four columns correspond to frames 0 to 3.)
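
The matrix method can be simulated in a few lines of Python; the sketch below is illustrative only (MatrixLRU is a made-up name) and simply replays the reference sequence used in the diagram above.

# An n x n bit matrix, held here as lists of 0/1.
class MatrixLRU:
    def __init__(self, n):
        self.n = n
        self.matrix = [[0] * n for _ in range(n)]

    def reference(self, k):
        self.matrix[k] = [1] * self.n        # set row k to all ones...
        for row in self.matrix:
            row[k] = 0                       # ...then clear column k

    def least_recently_used(self):
        # The row with the lowest binary value belongs to the LRU frame.
        value = lambda row: int("".join(map(str, row)), 2)
        return min(range(self.n), key=lambda k: value(self.matrix[k]))

lru = MatrixLRU(4)
for frame in [0, 1, 2, 3, 2, 1, 0, 3, 2, 3]:
    lru.reference(frame)
print(lru.least_recently_used())             # frame 1 for this sequence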

LRU in Software
One of the main drawbacks with implementing LRU (Least Recently Used) in hardware is that if the
hardware does not provide these facilities then the operating system designers, obviously, cannot make use
of them. Instead we need to implement a similar algorithm in software. One method (called the Not
Frequently Used – NFU algorithm) associates a counter with each page. This counter is initially zero but at
each clock interrupt the operating system scans all the pages and adds the R bit for the page to its counter.
As this is either zero or one, the counter either gets incremented or it does not. When a page fault occurs the
page with the lowest counter is selected for replacement.

The main problem with NFU is that it never forgets anything. For example, if a multi-pass compiler is
running, at the end of the first pass the pages may not be needed anymore. However, as they will have high
counts they will not be replaced but pages from the second pass, which still have low counts, will be
replaced. In fact, the situation could be even worse. If the first pass made a lot of memory references but
the second pass does not make as many, then the pages from the first pass will always have higher counts
than the second pass and will therefore remain in memory.

To alleviate this problem we can make a modification to NFU (Not Frequently Used) so that it closely
simulates LRU (Least Recently Used). The modification is in two parts and implements a system of aging.

1. The counters are shifted right one bit before the R bit is added.
2. The R bit is added to the leftmost bit rather than the rightmost bit.

Look at the diagram below. This represents a page table with six entries. Working from left to right we are
showing the state of each of the pages (only the counter entries) at each of the five clock ticks. Consider the
(a) column. After clock tick zero the R flags for the six pages are 1, 0, 1, 0, 1 and 1, indicating that pages 0,
2, 4 and 5 were referenced. This results in the counters being set as shown. We assume they all started at
zero, so the shift right, in effect, did nothing and the reference bit was simply added at the leftmost bit. If
you look at the (b) clock tick you should be able to follow the algorithm, and similarly for the (c) to (e) ticks.

              (a) Tick 0   (b) Tick 1   (c) Tick 2   (d) Tick 3   (e) Tick 4
R bits 0-5    101011       110010       110101       100010       011000

Page 0        10000000     11000000     11100000     11110000     01111000
Page 1        00000000     10000000     11000000     01100000     10110000
Page 2        10000000     01000000     00100000     00010000     10001000
Page 3        00000000     00000000     10000000     01000000     00100000
Page 4        10000000     11000000     01100000     10110000     01011000
Page 5        10000000     01000000     10100000     01010000     00101000

Pages referenced: (a) 0,2,4   (b) 0,1,4   (c) 0,1,3,5   (d) 0,4   (e) 1,2

When a page fault occurs, the page whose counter has the lowest value is removed. Clearly, a page that has
not been referenced for, say, four clock ticks will have four zeroes in the leftmost positions and so will have
a lower value than a page that has not been referenced for three clock ticks.
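
The aging scheme itself is only a few lines; the following Python sketch (the function name and loop are illustrative) replays the five clock ticks from the diagram and ends up selecting page 3, as discussed in the next paragraph.

# 8-bit software counters, one per page, updated from the R bits each tick.
def age(counters, r_bits):
    """Shift each counter right and insert the page's R bit as the new MSB."""
    for page, r in enumerate(r_bits):
        counters[page] = (counters[page] >> 1) | (r << 7)
    return counters

counters = [0] * 6
for r_bits in ["101011", "110010", "110101", "100010", "011000"]:
    counters = age(counters, [int(b) for b in r_bits])

print([format(c, "08b") for c in counters])   # final counters (the (e) column above)
print(counters.index(min(counters)))          # page 3 has the lowest counter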

The NFU (Not Frequently Used) algorithm, even with aging, differs from LRU (Least Recently Used) in two
ways. Firstly, the matrix LRU implementation updates the matrix after every instruction, which, in effect,
gives us more detailed information. If you look at pages 3 and 5 in the diagram above, after clock tick 4
neither page has been referenced for the last two clock ticks: both were last referenced during clock tick 2,
but we do not know how early (or late) within that tick each reference occurred. All we can do is evict page
3, because before clock tick 2 it had not been referenced at all, whereas page 5 had also been referenced at
clock tick 0 and therefore has the higher counter. Secondly, the aging counters have a finite size. If two
counters are both zero, all we can do is choose one of the pages at random: assuming 8-bit counters, we
cannot tell whether a page was last accessed 9 clock ticks ago or 1000 clock ticks ago. In practice, 8 bits is
commonly used; if a clock tick is around 20ms, a page with a zero counter has not been referenced for at
least 160ms, and in that case the distinction probably does not matter.

Design Issues for Paging


In this section we are going to look at some other aspects of designing a paging system so that we can
achieve good performance.

Working Set Model


The most obvious way to implement a paging system is to start a process with none of its pages in memory.
When the process starts to execute it will try to get its first instruction, which will cause a page fault. Other
page faults will quickly follow as the process tries to access the stack, global variables and executes other
instructions. After a period of time the process should start to find that most of its pages are in memory and
not so many page faults occur. This type of paging is known as demand paging as pages are brought into
memory on demand.

The reason that page faults decrease (and then stabilise) is because processes normally exhibit a locality of
reference. This means that at a particular execution phase of the process it only uses a small fraction of the
pages available to the entire process. The set of pages that is currently being used is called its working set
(Denning, 1968a; Denning 1980). If the entire working set is in memory then no page faults will occur.
Only when the process moves onto the next phase of execution (e.g. the next phase of a compiler) will page
faults begin to occur as pages not part of the existing working set are brought into memory. If the memory
of the computer is not large enough to hold the entire working set, then pages will constantly be copied out
to disc and subsequently retrieved. This drastically slows a process down, as a disc access takes far longer
than executing an instruction. A process which causes page faults every few instructions is said to be
thrashing (Denning, 1968b).

In a system that allows many processes to run at the same time (or at least give that illusion) it is common
to move all the pages for a process to disc (i.e. swap it out). When the process is restarted we have to decide
what to do. Do we simply allow demand paging, so that as the process raises page faults its pages are
gradually brought back into memory? Or do we move its entire working set into memory so that it can
continue with minimal page faults? It will come as no surprise that the second option is to be preferred: we
would like to avoid a process raising a burst of page faults every time it is restarted. In order to do this the
paging system has to keep track of the process's working set so that it can be loaded into memory before
the process is restarted. This approach is called the working set model (Denning, 1970). Its aim, as we have stated, is to avoid page
faults being raised. This method is also known as prepaging.

A problem arises when we try to implement the working set model as we need to know which pages make
up the working set. One solution is to use the aging algorithm described above. Any page that contains a 1
in its n high order bits (i.e. has been referenced within the last n clock ticks) is deemed to be a member of
the working set. The value of n has to be found experimentally, although performance is not very sensitive
to its exact value.
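
A sketch of this test, under the assumption of 8-bit aging counters (the function name and the value n = 2 are illustrative), might look as follows; it uses the final counters from the aging example earlier.

# A page belongs to the working set if any of the top n counter bits is set,
# i.e. if it has been referenced within the last n clock ticks.
def working_set(counters, n, counter_bits=8):
    mask = ((1 << n) - 1) << (counter_bits - n)
    return [page for page, c in enumerate(counters) if c & mask]

print(working_set([0b01111000, 0b10110000, 0b10001000,
                   0b00100000, 0b01011000, 0b00101000], 2))   # pages 0, 1, 2, 4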

Paging Daemons
If a page fault occurs it is better if there are plenty of free pages for the page to be copied to. If we have the
situation where every page is being used we have to find a page to evict and we may have to write the page
to disc before evicting it. Many systems have a background process called a paging daemon. This process
sleeps most of the time but runs at periodic intervals. Its task is to inspect the state of the page frames and,
if too few pages are free, it selects pages to evict using the page replacement algorithm that is being used. A
further performance improvement can be achieved by remembering which page frame a page has been
evicted from. If the page frame has not been overwritten by the time the evicted page is needed again, then
the page frame is still valid and the data does not have to be copied from disc again. In addition, even when
the paging daemon does not evict pages it can ensure they are clean; that is, that they have already been
written to disc, so that if they are later evicted the frames can simply be overwritten without first having to
write their contents to disc.

Segmentation
This subject may be covered towards the end of the course if more time is available. In the meantime
you should read Tanenbaum, 1992, pp 128-141 or Tanenbaum, 2001, pp 249-262.

References
• Denning, P.J. 1968a. The Working Set Model for Program Behaviour. Communications of the ACM, Vol. 11, pp 323-333.
• Denning, P.J. 1968b. Thrashing: Its Causes and Prevention. Proceedings AFIPS National Computer Conference, pp 915-922.
• Denning, P.J. 1970. Virtual Memory. Computing Surveys, Vol. 2, pp 153-189.
• Denning, P.J. 1980. Working Sets Past and Present. IEEE Transactions on Software Engineering, Vol. SE-6, pp 64-84.
• Smith, A.J. 1978. Bibliography on Paging and Related Topics. Operating Systems Review, Vol. 12, pp 39-56.
• Tanenbaum, A.S. 1992. Modern Operating Systems. Prentice Hall.
• Tanenbaum, A.S. 2001. Modern Operating Systems, 2nd Edition. Prentice Hall.

G53OPS Peter Siepmann (pxs02u)
Dr. Cook 17/5/2005
G53OPS – Revision Notes: Lectures 11-18
adapted from the notes by Tony Cook

Modeling Page Replacement

Belady’s Anomaly states that giving a process more page frames in memory is not necessarily the best
course of action. Belady demonstrated that in a particular example using FIFO there are more page faults
with four frames in memory than with three. Consider the following sequence of page references, which is
assumed to occur on a system that uses FIFO and has no pages loaded initially.
0 1 2 3 0 1 4 0 4 1 2 3 4

If we have 3 frames this generates 9 page faults (the table shows the frame contents after each request, youngest at the top):

Req:    0  1  2  3  0  1  4  0  4  1  2  3  4
Frames: 0  1  2  3  0  1  4  4  4  4  2  3  3
        -  0  1  2  3  0  1  1  1  1  4  2  2
        -  -  0  1  2  3  0  0  0  0  1  4  4
Fault?  Y  Y  Y  Y  Y  Y  Y  .  .  .  Y  Y  .

If we have 4 frames this generates 10 page faults.

Belady’s Anomaly surprised many and caused a lot of research into page modelling including stack
algorithms, the distance string, predicting page fault rates and page sizes.
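
The anomaly is easy to reproduce with a short FIFO simulation; the Python sketch below (function and variable names are illustrative) counts the faults for the reference string above with three and then four frames.

from collections import deque

def fifo_faults(reference_string, num_frames):
    frames = deque()                 # head of the queue is the oldest page
    faults = 0
    for page in reference_string:
        if page in frames:
            continue                 # hit
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()         # evict the oldest page
        frames.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 4, 1, 2, 3, 4]
print(fifo_faults(refs, 3))          # 9 faults
print(fifo_faults(refs, 4))          # 10 faults: more frames, more faults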

Segmentation

Our virtual memory has been 1-dimensional so far. Address 0 to Max. For some problems, 2 or
more separate virtual address spaces are handy e.g. a compiler with many tables that build up during
compiling such as source text for printing/debugging, symbol table for variables, table of constants,
parse tree and a stack for procedure calls. The first four of those tables grow continuously during
compilation, whereas the stack grows and shrinks. In 1-dimensional memory, contiguous chunks of
memory have to be set aside for each table. Suppose we have an extraordinary number of variables: the
space allocated for the symbol table may fill up. Then what? Do we halt compilation, or steal space from
tables that have room to spare?

The solution is to use segmentation - i.e. to set aside completely independent address spaces of
virtual memory e.g. one virtual memory space for each compile table. Each segment of virtual
memory may be a different size, or even change during execution. We could even compile different
segments separately.

As we are now dealing effectively with 2 dimensional virtual memory, we now need 2 addresses to
access it, a segment number and an address within the segment. As most of virtual memory does
not exist physically in RAM, it can be made so large that there is little chance of an individual
segment filling up. A segment is a logical entity. It contains one type of object only and not mixtures,
e.g. not both a stack & symbol table. Processes can now share data and procedures more easily
e.g. a shared graphics library can be in a segment of its own.


The diagram below illustrates checker-boarding (external fragmentation) and compaction:

Paging vs Segmentation


INPUT/OUTPUT

Management of I/O devices is one of the principal tasks for the operating system. This includes
handling message passing, interrupts and errors as well as providing a simple interface and, to some
degree, device independence. In general, users do not care about the electronics/interface,
programmers want an interface between the hardware and user levels, while electrical engineers are
more concerned with the component parts.

Classification of I/O Devices


 Machine readable or Block devices
Information is stored in fixed-size blocks, with each block having its own address. Each block can be read
from/written to independently, e.g., disks, tapes, sensors
 User readable or Character devices
Accepts a stream of characters with no attention paid to block structure. It is not addressable and
has no seek ability, e.g., printers, graphical terminals, screen, keyboard, mouse
 Other
Communications e.g. modems
Clocks etc

Data Transfer Rates


Rates range from 10 bytes/sec (keyboard) to 20 Gbytes/sec (Sun Gigaplane XB backplane)
 Telephone/cable modem: ~100 Kbytes/sec
 Digital camcorder: 4 Mbytes/sec
 Firewire: 50 Mbytes/sec
 Ethernet: from 1.25 Mbytes/sec (classic) to Fast Ethernet at 12.5 Mbytes/sec to Gigabit Ethernet
at 125 Mbytes/sec
 Disks: typically 6 Mbytes/sec (40x CDROM) to 80 Mbytes/sec (SCSI Ultra 2 disk)

Controllers
It is not possible to simply connect I/O devices directly to the system bus for several reasons. There
are many different types of device, each with a different method of operation, e.g. monitors, disk
drives, keyboards. It is impracticable for a CPU to be aware of the operation of every type of device,
particularly as new devices may be designed after the CPU has been produced. The data transfer
rate of most peripherals is much slower than that of the CPU. The CPU cannot communicate directly
with such devices without slowing the whole system down. Peripherals will often use different data
word sizes and formats than the CPU.

I/O units have electronic and mechanical components separated in a modular design. The electronic
component is the “device controller” or “adapter”, e.g. a printed circuit board placed into an expansion
slot. These are designed independently for many different devices. There are a number of standardised
interfaces (ANSI, ISO, IEEE, etc). Some controllers can handle several devices.


Example: Video Card


Reads bytes from own memory and generates signals to steer the CRT or LCD. Programming the
electron rays/pixels is clearly not feasible for normal programmers so the controller must offer an
abstract interface. Each controller has a few registers that are used to talk to the CPU and these
may be part of the normal address space of the computer (memory mapped I/O). The OS
communicates with the hardware controllers by writing to registers that control the hardware and
reading the registers gives the device status. Many devices use data buffers for the CPU to access.

We need a uniform approach to I/O as seen from the user and from the operating system point of
view to handle:
 Units of transfers: i.e. data blocks vs character streams
 Data coding conventions, parity (to check for errors) etc
 Error states and reporting of errors

The I/O function can be approached in three ways:


 “Programmed I/O”, where continuous attention of the processor is required - uses an I/O port (8 or
16 bit integer) number
 “Direct memory access”, the DMA module governs the exchange of data between the I/O unit and
the main memory
 “Interrupt driven I/O”: processor launches I/O and can continue until interrupted

Programmed I/O
The simplest strategy for handling communication between the CPU and an I/O module is
programmed I/O. Using this strategy, the CPU is responsible for all communication with I/O modules,
by executing instructions which control the attached devices, or transfer data. For example, if the
CPU wanted to send data to a device using programmed I/O, it would first issue an instruction to the
appropriate I/O module to tell it to expect data. The CPU must then wait until the module responds
before sending the data. If the module is slower than the CPU, then the CPU may also have to wait
until the transfer is complete. This can be very inefficient. Another problem exists if the CPU must
read data from a device such as a keyboard. Every so often the CPU must issue an instruction to
the appropriate I/O module to see if any keys have been pressed. This is also extremely inefficient.
Consequently this strategy is only used in very small microprocessor controlled devices.

How does the CPU talk to the control registers and the device data buffers?
 First method: I/O port
Each control register is assigned an I/O port number (an 8 or 16 bit integer). Most early computers (e.g.
mainframes) used this approach; the address spaces for memory and I/O were separate.
 Memory Mapped I/O
Map all device control registers into unique reserved locations, usually at the top of the memory address space
 Third Method: Hybrid I/O port and Memory Mapped I/O


If the CPU wants to read a word, it puts the address on the bus address line and asserts a read
signal on bus control line. If the address is for memory space, the memory responds, or if the
address is for I/O space, the I/O driver responds. Note that Pentiums have three “external buses”:
memory, PCI (e.g. SCSI, USB), and ISA (e.g. modem, printer).

Interrupt Controlled I/O


A more common strategy is to use interrupt driven I/O. This strategy allows the CPU to carry on with
its other operations until the module is ready to transfer data. When the CPU wants to communicate
with a device, it issues an instruction to the appropriate I/O module, and then continues with other
operations. When the device is ready, it will interrupt the CPU. The CPU can then carry out the data
transfer as before. This also removes the need for the CPU to continually poll input devices to see if
it must read any data. When an input device has data, then the appropriate I/O module can interrupt
the CPU to request a data transfer.

The situation is somewhat complicated by the fact that most computer systems will have several
peripherals connected to them. This means the computer must be able to detect which device an
interrupt comes from, and to decide which interrupt to handle if several occur simultaneously. This
decision is usually based on interrupt priority. Some devices will require response from the CPU
more quickly than others, for example, an interrupt from a disk drive must be handled more quickly
than an interrupt from a keyboard.

Many systems use multiple interrupt lines. This allows a quick way to assign priorities to different
devices, as the interrupt lines can have different priorities. However, it is likely that there will be more
devices than interrupt lines, so some other method must be used to determine which device an
interrupt comes from. Most systems use a system of vectored interrupts. When the CPU
acknowledges an interrupt, the relevant device places a word of data (a vector) on the data bus. The
vector identifies the device which requires attention, and is used by the CPU to look up the address
of the appropriate interrupt handling routine.

On receipt of an interrupt, if no other interrupts are pending then the interrupt controller processes
the interrupt immediately, otherwise it continues to assert the interrupt until the CPU can respond.
The CPU knows which device has sent an interrupt because the controller puts a number on the
address lines. The number on the address lines is treated as an index into the “interrupt vector” table to
fetch a new program counter. The program counter points to the start of the appropriate interrupt
service procedure. Traps and interrupts share the same interrupt vector. The interrupt vector can be
in hardware or in memory. The interrupt service procedure acknowledges the I/O device, after a
delay to avoid race conditions.

The interrupt information, e.g. program counter, register contents etc., could be saved in internal
registers, but then, to avoid a second interrupt overwriting it, long delays would be needed before other
interrupts could be re-enabled. Most CPUs therefore save interrupt information on a stack.


A ‘precise interrupt’ refers to when the machine gets left in a well defined state after an interrupt:
 PC saved in a known place
 all preceding instructions have been fully executed
 no subsequent instructions have been executed
 execution state of current instruction is known

An ‘imprecise interrupt’ refers to when the above 4 conditions do not hold, which makes recovering
from the interrupt much harder for the operating system.

Direct Memory Access


Although interrupt driven I/O is much more efficient than program controlled I/O, all data is still
transferred through the CPU. This will be inefficient if large quantities of data are being transferred
between the peripheral and memory. The transfer will be slower than necessary, and the CPU will be
unable to perform any other actions while it is taking place. Many systems therefore use an
additional strategy, known as direct memory access (DMA). DMA uses an additional piece of
hardware - a DMA controller. The DMA controller can take over the system bus and transfer data
between an I/O module and main memory without the intervention of the CPU. Whenever the CPU
wants to transfer data, it tells the DMA controller the direction of the transfer, the I/O module
involved, the location of the data in memory, and the size of the block of data to be transferred. It can
then continue with other instructions and the DMA controller will interrupt it when the transfer is
complete.

The CPU and the DMA controller cannot use the system bus at the same time, so some way must be
found to share the bus between them. One of two methods is normally used.
 Burst mode
The DMA controller transfers blocks of data by halting the CPU and controlling the system bus for
the duration of the transfer. The transfer will only be as quick as the slowest link in the I/O
module/bus/memory chain, as data does not pass through the CPU, but the CPU must still be
halted while the transfer takes place
 Cycle stealing
The DMA controller transfers data one word at a time, by using the bus during a part of an
instruction cycle when the CPU is not using it, or by pausing the CPU for a single clock cycle on
each instruction. This may slow the CPU down slightly overall, but will still be very efficient.

Goals of I/O Software


 Device Independence
Software can access any device without having to define that device in advance, e.g. reading a
file from floppy, hard disk or CD-ROM should make no difference
 Uniform Naming
Files and devices can be addressed the same way by a path


 Error Handling
Should be handled as close to the hardware as possible, e.g. read error on hard disk, drive
controller should repeat reads (or at different speeds) to see if it was a fluke
 Synchronous (blocking) vs asynchronous (interrupt-driven) transfers
Most physical I/O is asynchronous
 Buffering - where to put the data?
 Sharable devices (e.g. disk drives) and non-sharable devices (e.g. CD writers)

Example - how to print “ABCDEFGH” to a printer?


Using programmed I/O (a minimal sketch of this polling loop follows the list):
 assemble the string in a buffer; the user process acquires the printer (if the printer is not available,
either fail or block until it is ready)
 copy the buffer to kernel space, check and/or wait until the printer is available, then copy the first
character to the printer’s data register
 wait until the printer acknowledges it is ready, then send the next character
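
A purely illustrative Python sketch of this polling loop is shown below; the Printer class is made up and merely stands in for the device's status and data registers.

class Printer:
    def __init__(self):
        self.ready = True                  # status register: ready for a character

    def write_char(self, ch):
        self.ready = False                 # device becomes busy...
        print(ch, end="")                  # ...the character is "printed" here...
        self.ready = True                  # ...and ready is raised again (a real
                                           # device would do this some time later)

def programmed_io_print(printer, text):
    buffer = list(text)                    # string assembled in a buffer
    for ch in buffer:
        while not printer.ready:           # CPU busy-waits on the status register
            pass
        printer.write_char(ch)             # copy next character to the data register

programmed_io_print(Printer(), "ABCDEFGH")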

Now consider a printer that works from interrupts, printing characters as and when they arrive.
Assume the printer can handle 100 characters per second, i.e. 1 character per 10ms. In a programmed I/O
system, after every character is written to the printer's data register the CPU will wait in an idle loop for
10ms. Interrupts solve this problem, allowing context switches and enabling the CPU to do other tasks whilst waiting.

Using DMA, we let the DMA controller feed characters from the buffer to the printer one at a time. The
CPU does not have to be involved, other than setting up the DMA transfer. Now only one interrupt per
buffer is needed rather than one per character. The only disadvantage is that the DMA controller is usually much slower than the CPU.

I/O Software Layers


I/O software is typically organised into four layers: user-level I/O software, device-independent operating
system software, device drivers, and interrupt handlers. I/O interrupts should be hidden as low down in the
operating system as possible. A driver starting an I/O operation blocks until it is complete; the interrupt
procedure is then activated and handles the interrupt request.


Device Drivers

Device Controllers have registers for commands or status or both. The number of registers varies
with device and we therefore need device specific code - “device driver”. This is usually delivered by
the manufacturer. We can have a driver for closely related devices e.g. SCSI disk controller for HD
and CDROM. Current Operating Systems expect device drivers to run in the kernel. It needs a well
defined model of what a driver does and how it interacts with the rest of the operating system.
Drivers are normally positioned below the rest of the operating system, with standard interfaces being
defined for block device drivers and character device drivers. Drivers must accept requests from the device
independent software. Typical device driver procedures include:
 check input parameter validity
 check to see if the device is in use
 issue commands to control the device through its device registers
 check whether the device accepted the commands
 the driver will either have to wait or the results will come back immediately
 some drivers may need to be re-entrant

Some I/O software is device independent, other I/O software is device specific. Device independent
software provides uniform interfacing for device drivers in terms of buffering, error reporting,
allocating and releasing dedicated devices and providing a device independent block size.

Buffering
 unbuffered input - not efficient and might miss data
 buffering in user space - what if buffer paged out when a character arrives?
 buffering in the kernel followed by copy to user space, but what happens to characters arriving
when buffer being transferred?
 double buffering in the kernel
Too much buffering is not too healthy because it degrades performance as the following steps must
be performed sequentially:
 user system call to write to network, data copied to kernel buffer
 the invoked driver copies data to network controller
 data copied to network (controller independent of cycle stealing)
 bits transmitted, then arrive and are placed onto kernel buffer
 copied from kernel space to user space and into receiving process

Disks
Compared with main memory, disks offer:
 larger available capacity
 lower price per bit
 permanent storage (data survives power-off)


Structure of a disk drive

A “cylinder” refers to the same track on many disks.

Each track has between 8 and 32 sectors, each holding an equal number of bytes (e.g. 0.25K, 0.5K or 1K),
on the innermost tracks as well as the outermost. The controller can run a number of seeks (moving the
disk heads to the correct tracks and waiting for the correct sector to come around) on different disks at the
same time, but cannot read/write on more than one drive at a time.

Example sector layout (512 bytes):
Byte 0: Sector number (0-179)
Byte 1: Track number (0-79)
Bytes 2-7: Filename (“00”-“49”)
Bytes 8-11: File length (in bytes)
Bytes 12-15: Number of sectors the file spans (integer)
Bytes 16-495: Data from the file (a 480-byte section of the file)
Bytes 496-511: Error check (16 bytes available)

A disk sector would typically consist of i) preamble with cylinder, sector number, etc., ii) data, iii) error
correction code. Controller produces a block of bytes and performs error correction if necessary and
then copies block into memory.


Algorithms for the Disk Arms

Access time = seek time + rotational delay + transfer time, where:
 seek time = time needed to move the arm to the correct cylinder (dominant)
 rotational delay = time before the required sector appears under the head
 transfer time = time to transfer the data
Dominance of seek time leaves room for optimization. Error checking done by controllers.

Seek time (in ms) to get the arm over the track is difficult to determine exactly, but can be estimated as Ts = m*n + s
 Ts = estimated seek time (ms)
 m = constant (time per track crossed, which depends on the drive)
 n = number of tracks crossed
 s = startup time

Rotational Latency and Transfer Time


3600 rpm (= 16.7ms/rotation)
Tt = b / (r * N)
 Tt = transfer time
 b = number of bytes to be transferred
 N = number of bytes per track
 r = rotation speed in rotations per second
The average rotational latency (half a rotation) is 8.3 ms

Example
Read a file of size 256 sectors with
Ts = 20 ms
512 bytes/sector, 32 sectors/track
Suppose the file is stored as compact as possible: all sectors on 8 consecutive tracks of 32 sectors
each (sequential storage)

The first track takes 20 + 8.3 + 16.7 = 45 ms
The remaining tracks do not need seek time
Per track we need 8.3 + 16.7 = 25 ms
Total time = 45 + 7*25 = 220 ms = 0.22s

In case the access is not sequential but at random for the sectors, we get:
Time per sector = 20 + 8.3 + 0.5 = 28.8ms
Total time 256 sectors = 256*28.8 = 7.37s
It is important to obtain an optimal sequence for the reading of the sectors.
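
The arithmetic above can be checked with a few lines of Python; the constants below are simply the rounded values used in the example (20 ms seek, 16.7 ms per rotation, 8.3 ms average latency, 0.5 ms per sector).

seek = 20.0                    # Ts, ms
rotation = 16.7                # one revolution at 3600 rpm, ms
latency = 8.3                  # average rotational delay, ms
sector_time = 0.5              # one sector = 1/32 of a revolution, ms

# Sequential storage: 256 sectors on 8 consecutive tracks of 32 sectors.
sequential = (seek + latency + rotation) + 7 * (latency + rotation)
print("sequential:", round(sequential), "ms")           # 220 ms

# Random order: every sector pays its own seek and rotational delay.
random_order = 256 * (seek + latency + sector_time)
print("random:", round(random_order / 1000, 2), "s")     # 7.37 s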


Optimisation
A heavily loaded disk allows for a strategy that minimises arm movement. The situation is dynamic in
that the disk driver keeps a table of requested sectors per cylinder, e.g. while a request for track 11 is
being handled, requests for tracks 1, 36, 16, 34, 9 and 12 arrive. Which one is to be handled after the
current request? There are 4 main disk optimisation algorithms (a sketch comparing all four follows below):
 FCFS – “First Come First Served”
 SSTF – “Shortest Seek Time First”
 “Elevator” or “SCAN”
 “Circular” or “CSCAN”

With First Come First Served (FCFS) the total number of track crossings is:
|11-1|+|1-36|+|36-16|+|16-34|+|34-9|+|9-12| = 111

With Shortest Seek Time First (SSTF) (similar to shortest job first in process scheduling) we gain almost 50%:
|11-12|+|12-9|+|9-16|+|16-1|+|1-34|+|34-36| = 61
Problem: starvation, arm stays in the middle of the disk in case of heavy load, edge cylinders are
poorly served, the strategy is unfair.

Lift algorithm, Elevator or SCAN: keep moving in the same direction until no requests ahead then
change direction:
|11-12|+|12-16|+|16-34|+|34-36|+|36-9|+|9-1|=60
Upper limit: 2 * number of tracks

Smaller variance is reached by moving the arm in one direction, always returning to the lowest
number at the end of the road: Circular Scan (CSCAN):
|11-12|+|12-16|+|16-34|+|34-36|+|36-1|+|1-9|=68
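
The four strategies can be compared with a small Python sketch; the movement helper and the hand-worked SSTF order below are illustrative, using the request queue from the example (head at track 11, requests 1, 36, 16, 34, 9, 12).

def movement(order, start=11):
    total, pos = 0, start
    for track in order:
        total += abs(track - pos)
        pos = track
    return total

requests = [1, 36, 16, 34, 9, 12]
start = 11

fcfs = requests                                    # in arrival order
sstf = [12, 9, 16, 1, 34, 36]                      # nearest pending request each time
scan = sorted(r for r in requests if r >= start) + \
       sorted((r for r in requests if r < start), reverse=True)   # up, then back down
cscan = sorted(r for r in requests if r >= start) + \
        sorted(r for r in requests if r < start)   # up, then wrap to the lowest

for name, order in [("FCFS", fcfs), ("SSTF", sstf), ("SCAN", scan), ("CSCAN", cscan)]:
    print(name, movement(order))                   # 111, 61, 60, 68 respectively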

Disk Formatting
One thing we have forgotten to mention is the fact that if a file is spread over consecutive tracks, it takes
time to move the disk arm from one track to the next, and the disk keeps rotating during this time. We
therefore introduce a sector “skew”, so that sector zero on each track is offset from sector zero on the
previous track.

Moreover, reading sectors consecutively requires a certain amount of speed from the hard disk
controller. The platters never stop spinning, and as soon as the controller is done reading all of, for
example, sector 1, it has little time before the start of sector 2 is under the head. Many older
controllers used with early hard disks did not have sufficient processing capacity to be able to do this.
They would not be ready to read the second sector of the track until after the start of the second
physical sector had already spun past the head, at which point it would be too late. If the controller is
slow in this manner, and no compensation is made in the controller, the controller must wait for
almost an entire revolution of the platters before the start of sector 2 comes around and it can read it.
Hence, the notion of interleaving sectors was introduced:

Sector numbering around the track (clockwise from the top):
No interleaving: 0 1 2 3 4 5 6 7
Single interleaving: 0 4 1 5 2 6 3 7
Double interleaving: 0 3 6 1 4 7 2 5

Possible Errors
 Programming error (request for a non-existent sector): the user program needs debugging
 Volatile (transient) checksum error (e.g. a dust particle): the controller simply tries again
 Permanent checksum error (bad block): the block is marked “bad” and replaced by a spare block (this
may interfere with the optimisation algorithm)
 Seek error (the arm moves to the wrong cylinder): a mechanical problem; perform a RECALIBRATE or
ask for maintenance
 Controller error: the controller is itself a parallel system and can get confused; the driver may perform a reset

Error Handling - a disk track with a bad sector


 Substituting a spare for the bad sector - practical, but must keep track of bad sectors and
replacements in a table
 Shifting all the sectors along to bypass the bad one - ideal, but not always practical as it takes a while to
shift all the sectors along

Disk storage density is pushed to the limits by manufacturers, e.g. 5000 bits per mm. Defects WILL
always be present (hopefully few!). If a small defect, the ECC (Error Correction Code) can cope. If a
large defect, the ECC cannot cope, so the sector is remapped. If after writing an error is found, try
rereading sector again and again (sometimes works) – might be dust speck?

Caches
Because waiting for sectors to spin around to the correct place takes time, why not cache a track?
Moreover, if you have enough memory, why not cache more than one track?
Driver caching of tracks:
 Reading one whole track does not take much extra time: the arm does not need to be moved and the
driver has to wait for the sector to come around anyway
 Disadvantage: driver has to copy the data using the CPU, while the controller may use DMA


Controller may build an internal cache:


 It is transparent for the driver
 Data transfer uses DMA
 In this case the driver does not need to do any caching

RAID (Redundant Array of Independent (or Inexpensive) Disks)


In this arrangement, a set of disks is treated as one logical disk. Data are distributed in “strips” over
the logical disk with “N sectors” per “strip”. Redundant capacity is used for parity, allowing for data
repair. Six levels of RAID (0-5) have been accepted by industry with other kinds proposed in
literature. Levels 2 and 4 are not commercially available, but are included for clarity. Windows supports
RAID in hardware (at the controller level) and in software (non-contiguous disk space is allocated and
combined into logical partitions by the fault-tolerant software disk driver, FTDISK); the software support
covers RAID 1 and RAID 5.

RAID 0
All data (user and system) are distributed over the disks so that there is a good chance for
parallelism. Disk is logically a set of strips (blocks, sectors, etc). Strips numbered/assigned
consecutively to disks.

Performance depends highly on the pattern of requests


Advantages:
- high data transfer rates are reached if
o the entire data path is fast (internal controllers, I/O bus of the host system, I/O adapters and
host memory buses)
o application generates efficient usage of the disk array by requests that span many
consecutive strips
- If response time is important (transactions) more I/O requests can be handled in parallel
Disadvantages:
- No better than single disk if accessing many files of sizes smaller than strip size
- Fatal disk failure (loss of data) more likely as we have more disks


RAID 1

Does not use parity, it mirrors the data to obtain reliability.


Advantages:
- Reading request can be served by any of the two disks containing the requested data
(minimum search time)
- Writing request can be performed in parallel to the two disks
- Recovery from error is easy, just copy the data from the correct disk
Disadvantages:
 Price for disks is doubled
 Will only be used for system critical data that must be available at all times
RAID 1 can reach high transfer rates and fast response times (~2*RAID 0) if most requests are
reading. If most requests are writing, RAID 1 is not much faster than RAID 0.

RAID 2

No longer strips, but works on words or even on a byte basis. Synchronized disks, each I/O
operation is performed in a parallel way. Each byte split into 4 bit nibbles, then add a 3 bit Hamming
code to each one to form a 7 bit word which allows for correction of a single bit error (Bits 1,2,4 for
parity). The Controller can correct without additional delay giving very high data rates. It is still
expensive as typically a very large number of disks used – only used in case many frequent errors
can be expected.


RAID 3

Level 2 needs log2(number of disks) parity disks. Level 3 needs only one, for one parity bit. In case
one disk crashes, the data can still be reconstructed even on line (“reduced mode”).

RAID 2/3 have high data transfer times, but perform only one I/O at the time so that response times
in transaction oriented environments are not so good.

RAID 4

Larger strips and one parity disk. Blocks are kept on one disk, allowing for parallel access by
multiple I/O requests. Writing penalty: when a block is written, the parity disk must be adjusted.
Parity disk may be a bottleneck. Good response times, less good transfer rates.

RAID 5 (Block Level Distributed Parity)

Distribution of the parity strip to avoid the bottle neck.


Can use round robin, i.e., parity disk = (−⌊block number / 4⌋) mod 5 for a five-disk array


CD media

CDs have one spiral track, with 0.78 micron pits burnt with a laser. The pits are 1/4 of a wavelength deep.
A “1” is represented as a transition from pit to surface or vice versa; “0”s are the flat stretches (pit floors or
plateaus). A single track is 5.6 km long, making 22,188 turns of the spiral around the disk, or about 500
turns per mm! The rotation speed is 9 revs/sec at the inside and 3 revs/sec at the outside.

Audio – ‘red book’


Software CDROM – ‘yellow book’

CDROMS have improved error checking - each byte Hamming encoded in 14 bits with two bits left
over. The 14 to 8 mapping for reading is done in hardware by lookup tables.

42 symbols form a 588 bit frame with 24 data bytes (192 bits). 98 frames form a CDROM sector.
Each sector starts with a preamble, then a 3 byte sector number for seeking purposes on the “spiral”.
The last byte is a mode. Seek is done by approximately calculating where on the spiral to go.

CD-R and CD-RW


Silver CD-Rs use dye layers to simulate the pits. CD-RW discs use a metallic alloy with two stable
states, crystalline and amorphous; the writing laser has three power levels.

Clocks
Hardware
- Type 1: 50 Hz clocks (1 interrupt (clock tic) per voltage cycle)
o Simple, cheap, not very accurate, not very functional

- Type 2: High precision clocks (5-100 MHz, or higher)


o Contain a quartz crystal oscillator
o Steers a counter counting down
o Generates an interrupt when counter reaches 0
o Counter is eventually reloaded from a programmable register
o One chip normally implements multiple clocks

One shot mode: clock counts down from register value once and waits for software to start it again.
Square wave mode: counter automatically reloaded (generates clock ticks)

A 1000 MHz clock with a 16-bit register can fix time intervals between 1 nanosecond and 65,535
nanoseconds (about 65.5 microseconds).


Three ways to maintain the time of day with Type 1 clocks:


- Use a register to count ticks from an epoch, e.g. since 1/1/1970 (UNIX) or since 1/1/1980 (the
epoch used by Microsoft). A 32-bit register storing 60 Hz ticks overflows after about 2.3 years.
64 bits is more expensive, but lasts practically forever (9742M years).
- Increment a seconds counter whenever a faster tick counter reaches a set limit. A 32-bit “seconds”
counter lasts about 138 years before overflowing.
- Count ticks since boot-up; a 32-bit counter of 60 Hz ticks will overflow after about 2.3 years of
continuous operation.

Software
 Administration of process time slices: each running process has “time left” counter which is
decremented at each interrupt. At zero, call scheduler.
 Administration of CPU usage/accounting: counter starts when process starts and is part of the
“process environment”. It is stopped while handling an interrupt.
 Watchdog timers
 Profiling (program performance analysis, etc.)

A second clock is available for timer interrupts. It can cause interrupts at whatever rate a program
needs (specified by applications). No problems if interrupt frequency is low.

Example: Gigabit Ethernet


This high performance network is optimal at a rate of one packet per 12 micro seconds. To achieve
this, on completion of packet transmission cause an interrupt OR set second timer to interrupt every
12 micro seconds. Overhead occurs: an interrupt on a 300MHz Pentium II takes 4.5 microseconds.

Soft timers avoid interrupts as the kernel checks for soft timer expiration before it exits to user mode.

Terminals
 Serial RS232 terminals historically used for hardcopy printer terminals (teletypes), glass tty (glass
teletypes), mainframe intelligent terminals, etc.
 Memory-mapped interfaces / Graphical User Interfaces which use a keyboard / mouse / display
etc. Bitmapped.
 Network computers (X windows, SLIM networks)


RS-232 Terminal Hardware


An RS-232 terminal/modem communicates with computer (50-56,000 bit/sec). Out of the 9 or 25 pin
connector, 3 pins are used for transmit, receive, ground, the others are for control functions (mostly
not used). The interface is a UART (Universal Asynchronous Receiver Transmitter) on RS232 cards.
In an RS-232 serial line, bits must go out in series, one bit at a time:
- Driver writes characters to interface card, which transforms them into bit sequence
- Prefix with a start bit, then after character bits, a parity bit, and 1-2 stop bits
- Serial lines - Windows uses COM1 and COM2 ports, UNIX uses /dev/tty1 /dev/tty2
- Computer and terminal are completely independent – slow interrupt wakes driver

Character Input
Keyboard driver collects keyboard input and passes it to user programs (when they need it)
 Raw mode or “Non-Canonical”: Driver passes characters unchanged to software. Buffering is
limited to speed differences and application receives characters immediately e.g. EMACS editor.
 Cooked mode or “Canonical”: Driver buffers one line until it is finished and handles corrections
made by the user while typing a line.
Often applications have the choice. Nowadays, window driven applications use raw mode at the
lowest level and perform buffering at the window level.

Keyboard driver transforms the key number into an ASCII character according to a table. Echoing is
(was) done by the OS, or the shell. May be confusing for the user i.e. program may be writing to the
screen (sometimes delayed) whilst the user is still typing.

Handling of tabs, backspaces, etc. were typical problems with terminals. One problem survived, the
end-of-line character. Logically (from the typist’s viewpoint) one needs a CR to bring the cursor back
to the beginning of the line and a LF to go to the next one. These two characters are hidden behind
the ENTER key. The OS can decide how to represent end of line. In *nix, it is the line feed only, in
DOS, carriage return and line feed. LF is ASCII 10, CR is ASCII 13, which produces “^M”.

Serial (RS-232) and memory mapped approaches to terminals differ. Serial terminals have an output
buffer to which characters are sent until it is full or until a line ends. Once full, the real output is
initiated and the driver sleeps until interrupted. Memory mapped terminals can be accessed through
normal memory addressing procedures. Some characters receive a special treatment. The driver is
doing more screen manipulation. Special functions such as scrolling and animation may be done
through special registers (e.g. register with the position of the top line).

Mouse - feeds 3 byte info back up to 40x per sec containing i) change in X to within 0.1 mm, ii)
change in Y to within 0.1 mm, iii) button status.


Network Terminals
One can run X-windows on top of UNIX or other OS. X is just a windowing system, but for a
complete GUI, other layers are needed on top:

On starting an X program it opens a connection to one or more X servers (workstations - but could
be on same computer). Four types of messages sent over the connections:
a) Drawing commands (software to workstation) – typically oneway and no reply needed
b) Replies from workstation to program queries
c) Events from keyboard, mouse etc
Each message 32 bytes. Byte 1 describes event, next 31 bytes are additional information
Only messages that the program needs to know about are sent.
Events are queued
d) Error messages


Power Management

There are two approaches to save power. Either turn off unused processes, especially I/O or, when
permitted, degrade performance if this uses less power. Powering down/dimming the screen is one
way to save power when not used much (time since last used thresholds, dimming by levels). Could
also fade down screen sectors not in use and/or reposition windows so they occupy fewer screen
sectors.
- CPU power reduction: cutting the voltage by two cuts the clock speed by about two but cuts the
power consumption by about four (power scales with the square of the voltage).
- Hard Disk Power Reduction: if not used, spin down, but many sec needed to spin up again.
Could get around this with caching if needed block in RAM?
- Memory power down
o flush cache & switch off - reload from memory when needed
o dump memory to disk and hibernate
- Telling the programs to use less energy
o may mean poorer user experience
o e.g., change from colour output to black and white, less resolution or detail in an
image, drop frames/quality in multi-media
- Thermal Issues
o overheating - switch on fan
o for a laptop reduce screen backlighting, slow CPU etc
- Batteries
o smart batteries - voltages, current, drain rate etc sent to OS

Uninterruptible Power Supply (UPS)


For PCs and mainframe computers, if the power goes off this can be serious. Data held in volatile memory
(e.g. RAM buffers, registers, cache) is affected badly, in that files being written at that precise moment (e.g.
a block that is not written fully to disk) are corrupted and lost for good, unless an older version exists from
periodic backups. Non-volatile media, e.g. hard disks and CD-ROMs, are only affected if they were being
written to during the power cut. In the 1960s it was not so serious, as machines used magnetic storage
known as “core store” that retained its contents without power.
- Backup battery is kept charged all the time
- Monitors incoming mains voltage & generates an interrupt if it falls below a threshold
- Backup battery switched into computer and mains switched out – can keep system running
for hours until power is restored or a few seconds to allow for a correct shut down
- The interrupt issued on detecting a power drop is of the highest priority and is called a non-
maskable interrupt (NMI) – it cannot be disabled.


FILES

Files allow data to be stored so that it survives between (and beyond) the processes that use it. They allow
us to store large volumes of data and allow more than one process to access the data at the same time.

Different operating systems have different file naming conventions. MS-DOS only allows an eight
character filename (and a three character extension). This limitation also applies to Windows 3.1.
Most UNIX systems allow file names up to 255 characters in length. Modern Windows allows up to
255 characters too, subject to a maximum of 260 characters if one includes the pathname. There
are restrictions as to the characters that can be used in filenames e.g. ? and * are forbidden. Some
operating systems distinguish between upper and lower case characters. To MS-DOS, the filenames
ABC, abc and AbC all represent the same file, whereas UNIX sees these as three different files.

Filenames are made up of two parts separated by a full stop. The part of the filename up to the full
stop is the actual filename. The part following the full stop is often called a file extension. In MS-
DOS the extension is limited to three characters. UNIX and Windows 95/NT allow longer extensions.
They are used to tell the operating system what type of data the file contains. It associates the file
with a certain application. Using tools provided with the operating system the user is able to change
the file associations. UNIX allows a file to have more than one extension associated with it.

Files are stored as a sequence of bytes. It is up to the program that accesses the file to interpret the
byte sequence.

Files Structures
Four types of files:
- Byte Sequence
- Record Sequence (fixed record)
- Record Sequence (variable record size)
- Tree Structures

Some files are not ASCII, but binary. Some of these may be executable. In UNIX, an executable file
consists of five parts, a header (comprising a special number to identify file as an executable, the
sizes of the sections outlined below and an execution address), text, data, relocation bytes and a
symbol table.


Directories
Allow like files to be grouped together and allow operations to be performed on a group of files which
have something in common. For example, copy the files or set one of their attributes. They allow
files to have the same filename (as long as they are in different directories). This allows more
flexibility in naming files. A typical directory entry contains a number of entries; one per file. All the
data (filename, attributes and disc addresses) can be stored within the directory. Alternatively, just
the filename can be stored in the directory together with a pointer to a data structure which contains
the other details.

Single-Level Directory Systems:

 “root directory”
 simple
 used on early computers
 letters above refer to file owners not directory names - owners could access/overwrite each
other’s filenames

Two-Level Directory Systems

 Users given their own directories


 User login procedure needed now
 Users access their own files
 Can read other users files
 Executable-only directory useful

A two-level directory prevents users (A, B, C) from grouping their own files into sub-directories of their
choosing. Therefore, a hierarchical directory system is now used by most systems.

File System Layouts

Disks are often divided up into partitions, with an independent file system on each partition, e.g. the C: and
D: drives on Windows. Sector 0 of a disk is the Master Boot Record (MBR), which is used at boot-up. A
partition table stores the start and end addresses of each disk partition.

Contiguous allocation

Allocate N contiguous blocks to a file: if a file is 100K in size and the block size is 1K then 100
contiguous blocks are required. This is very simple to implement, as keeping track of the blocks
allocated to a file reduces to storing the first block that the file occupies and the file's length. The
performance of such an implementation is good, as the file can be read sequentially: the read/write
heads have to move very little, if at all, and no other allocation scheme performs better for sequential
access. However, the operating system does not know, in advance, how much space a file will eventually
occupy, and as files are created and deleted the free space becomes fragmented (this is not a problem on
write-once media such as CD-ROMs, where every file size is known before it is written). One could run a
defragmentation process periodically, but this is expensive.
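A minimal sketch (illustrative, not from the notes) of why contiguous allocation is so simple: the directory
entry only needs a start block and a length, and any byte offset maps straight to a physical block.

# Illustrative sketch of contiguous allocation: (start block, length) is all that is stored.
BLOCK_SIZE = 1024  # 1K blocks, as in the example above

class ContigFile:
    def __init__(self, start_block, length_bytes):
        self.start_block = start_block      # first block the file occupies
        self.length = length_bytes          # file length in bytes

    def block_for_offset(self, offset):
        """Map a byte offset within the file to a physical disk block number."""
        if not 0 <= offset < self.length:
            raise ValueError("offset outside file")
        return self.start_block + offset // BLOCK_SIZE

# A 100K file starting at block 270 occupies blocks 270..369.
f = ContigFile(start_block=270, length_bytes=100 * 1024)
print(f.block_for_offset(0))        # 270
print(f.block_for_offset(50_000))   # 318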

Linked List Allocation

The blocks of a file are linked together: all that needs to be held in the directory entry is the number of
the first block the file occupies, and each block contains data plus a pointer to the next block. Using this
scheme every block can be used, unlike a scheme that insists that every file is contiguous, and no
space is lost to external fragmentation (although there is internal fragmentation, i.e. wasted space, in the
last block of each file). The rest of the file can be found by following the pointers from the first block.
The size of the file does not have to be known beforehand (unlike a contiguous allocation scheme), and
when more space is required for a file any free block can be allocated (e.g. the first block on the free
block list).

However, random access is very slow, since reaching a random point in the file means following the chain,
which takes many disk reads. Some space in every block is taken up by the pointer, so the number of data
bytes in a block is no longer a power of two; this is not fatal, but it does hurt performance. Reliability can
also be a problem: one corrupt pointer is enough to lose the rest of the chain, and a bad pointer might even
cause writes over a block that belongs to another file.

If we instead keep all the pointers in a table (an index) in memory, no space is wasted in the data blocks
and random access becomes fast, because the chain can be followed in memory. The main disadvantage
is that the entire table must be in memory all the time: for a large disc with, say, 500,000 1K blocks
(roughly 500MB) the table has 500,000 entries.

Using an Index
The disk is divided into blocks, where the block size is chosen to be of the order of the median file size. In
memory we keep a table with one entry per physical disk block; each entry holds the number of the next
block in the chain. Say file B starts at disk block 11 and is more than one block long. The directory entry
for B stores 11; table entry 11 might contain 2, entry 2 might contain 14, entry 14 might contain 8, and
entry 8 contains 0, a special marker indicating that we have reached the end of the chain. Such a table is
called a File Allocation Table (FAT).
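A small sketch of following a FAT chain in memory (illustrative only; the block numbers are taken from the
example above):

# Illustrative FAT chain for file B: blocks 11 -> 2 -> 14 -> 8; 0 marks the end of the chain.
END_OF_CHAIN = 0

fat = {11: 2, 2: 14, 14: 8, 8: END_OF_CHAIN}   # next-block table held in memory

def blocks_of(first_block, fat):
    """Return the list of disk blocks a file occupies, in order."""
    blocks = []
    block = first_block
    while block != END_OF_CHAIN:
        blocks.append(block)
        block = fat[block]
    return blocks

print(blocks_of(11, fat))   # [11, 2, 14, 8]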

I–Nodes

The third method of keeping track of file blocks uses a data structure called an i-node (index node). All
the attributes of a file are stored in its i-node, which is loaded into memory when the file is
opened. The i-node also contains a number of direct pointers to disc blocks. The advantage over the
FAT scheme is that only the i-nodes of open files need be in memory: if k files may be open at once and
each i-node occupies n bytes, then only kn bytes of memory are needed. In other words, where the FAT
grows in proportion to the size of the disk it represents, the memory needed for i-nodes is proportional to
the number of files that may be open at once. Now, if say twelve blocks can be indexed directly in an
i-node, what happens if we want to address more blocks, i.e. the file grows? There are three additional
indirect pointers. These point to further data structures which eventually lead to disk block addresses:
the first is a single indirect pointer (it points to a block full of pointers to data blocks), the second is a
double indirect pointer and the third is a triple indirect pointer.
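An illustrative calculation of how far the indirect pointers extend a file (the parameters below — 12 direct
pointers, 1K blocks and 4-byte block numbers — are assumptions for illustration, not figures fixed by the
notes):

# How many blocks can be addressed through direct + single/double/triple indirect pointers?
BLOCK_SIZE = 1024          # bytes per block (assumed)
POINTER_SIZE = 4           # bytes per block number (assumed)
DIRECT = 12                # direct pointers in the i-node (assumed)

per_block = BLOCK_SIZE // POINTER_SIZE          # pointers per indirect block = 256

single = per_block                              # 256 blocks
double = per_block ** 2                         # 65,536 blocks
triple = per_block ** 3                         # 16,777,216 blocks

max_blocks = DIRECT + single + double + triple
print(max_blocks, "blocks =", max_blocks * BLOCK_SIZE // 2**20, "MB (roughly)")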

Implementing Directories
The ASCII path name is used to locate the correct directory entry, and the directory entry contains whatever
is needed to find the file's blocks. For a contiguous allocation scheme, the directory entry contains the first
disc block (and the length); the same is true for a linked-list (FAT) allocation, where only the first block
number is needed. For an i-node implementation the directory entry contains the i-node number.

There are two ways of handling long file names in a directory:
- in-line, variable-length entries, where each filename is terminated with a special character;
  compacting the directory after entries are removed is feasible
- fixed-length entries with the filenames themselves stored in a heap at the end of the directory,
  each entry holding a pointer into the heap

A directory entry maps an ASCII filename to the disc blocks. The directory entry may also contain
the attributes of the file, or a pointer to a data structure (such as an i-node) that holds them.

A UNIX directory entry just contains an i-node number and a filename. Unlike MS-DOS, all of a file's
attributes are stored in the i-node, so there is no need to hold this information in the directory entry.
The i-nodes are kept at a fixed location on the disc, so locating an i-node from its number is a very simple
(and fast) operation.

How does UNIX locate a file when given an absolute path name? Assume the path name is
/user/gk/ops/notes. The procedure operates as follows (a small code sketch is given after the list):
- The system locates the i-node of the root directory. As noted above, this is easy because the i-nodes
  are at a fixed place on the disc.
- Next it looks up the first path component (user) in the root directory, to find the i-node number of
  /user.
- With the i-node number for /user it can read that directory to find the i-node number of the next
  component (gk), and so on.
- This process is repeated until the i-node of the actual file has been located.
- Looking up a relative path name is identical except that the search starts from the current
  working directory instead of the root.
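A minimal sketch of this lookup loop (the data structures are illustrative, not real UNIX code; a real system
reads the directories and i-nodes from the disc):

# Illustrative path lookup: each directory is modelled as a dict mapping a name to
# an i-node number, and inode_table maps i-node numbers to directory contents.
ROOT_INODE = 2   # assumed fixed i-node number of the root directory

inode_table = {
    2:  {"user": 10},             # /            -> contains "user"
    10: {"gk": 17},               # /user        -> contains "gk"
    17: {"ops": 23},              # /user/gk     -> contains "ops"
    23: {"notes": 31},            # /user/gk/ops -> contains "notes"
    31: "contents of the file",   # the file itself
}

def lookup(path, cwd_inode=ROOT_INODE):
    """Resolve an absolute or relative path name to an i-node number."""
    inode = ROOT_INODE if path.startswith("/") else cwd_inode
    for component in path.strip("/").split("/"):
        directory = inode_table[inode]      # read the directory's contents
        inode = directory[component]        # find the next i-node number
    return inode

print(lookup("/user/gk/ops/notes"))   # 31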

A file system containing a shared file forms a directed acyclic graph (DAG) rather than a simple tree. It is
not a good idea to make a copy of a file that is being shared: if it is a joint editing project, which copy is
the correct one? There are two better options. The first is a (hard) link, so that two directory entries refer
to the same underlying file (the original diagram showed the situation prior to linking, after the link is
created, and after the original owner removes the file).

If the original owner (C) then removes the file, the file is not really deleted: the other user (B) still has a
link to it, so the i-node and data remain, and the file still counts against C's disk usage even though C no
longer has a directory entry for it. B has a link to the file but does not own it. Only when B also deletes
the file does the link count reach zero and the file actually get deleted. The second option is to use
symbolic linking, e.g. in UNIX
ln -s original_file symbolic_copy

If the original file is deleted (or renamed), the symbolic link remains but is broken: it can no longer reach
the original file. If a file with the original name is later recreated in the same directory, the link works
again.

Disk Space Management

Whatever block size we choose, every file must occupy at least one block. If we pick a very large block
size then small files waste most of their block; if the block size is smaller than most files then files are
split over many blocks and we spend more time seeking. There is therefore a compromise between block
size, fast access and wasted space. The usual compromise is to use a block size of 512 bytes, 1K bytes
or 2K bytes.

Some of the free blocks can be used to hold the block numbers of the other free blocks (of course, once a
block is used this way it is no longer free itself). The blocks that contain the free block numbers are linked
together, so we end up with a linked list of free blocks.

We can calculate the maximum number of blocks needed to hold a complete free list (i.e. for an empty
disc) with the following reasoning. Assume that a 16-bit number is used to store a block number (so block
numbers are in the range 0 to 65535) and that we are using a 1K block size. A block can then hold 512
block numbers, that is 1024 * 8 / 16. One of those entries is used as a pointer to the next block of the
free list, leaving 511 free block numbers per block. For a 20MB disc (20480 blocks) we therefore need at
most ceil(20480 / 511) = 41 blocks to hold all the free block numbers.

An alternative is to use a bit map to keep track of the free blocks: there is one bit for each block on the
disc. If the bit is 1 the block is free; if the bit is 0 the block is in use. In other words, a disc with n blocks
requires a bit map with n bits.

Consider again a 20MB disc with 1K blocks. A 20MB disc has 20480 (20 * 1024) blocks, so we need
20480 bits for the map, or 2560 bytes. A block can store 1024 bytes, so we need 2.5 blocks, rounded up
to 3, to hold a complete bit map of the disc.
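A quick sketch reproducing both calculations (same parameters as in the text):

# Free-space bookkeeping overhead for a 20MB disc with 1K blocks.
import math

BLOCK_SIZE = 1024                         # bytes per block
DISK_BLOCKS = 20 * 1024                   # 20MB disc = 20480 blocks
BLOCK_NUMBER_BITS = 16                    # 16-bit block numbers

# Linked free list: 512 entries per block, one of them used as a pointer to the next block.
entries_per_block = BLOCK_SIZE * 8 // BLOCK_NUMBER_BITS          # 512
free_numbers_per_block = entries_per_block - 1                   # 511
free_list_blocks = math.ceil(DISK_BLOCKS / free_numbers_per_block)

# Bit map: one bit per disk block.
bitmap_bytes = DISK_BLOCKS // 8                                  # 2560 bytes
bitmap_blocks = math.ceil(bitmap_bytes / BLOCK_SIZE)

print(free_list_blocks)   # 41 blocks, for a completely empty disc
print(bitmap_blocks)      # 3 blocks for the bit map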

Generally a bit map requires fewer blocks than a linked free list; only when the disk is nearly full does
the linked list implementation require fewer blocks.

However, suppose only a small amount of memory is set aside for tracking free blocks, so that the
operating system can hold only one block of the free list or bit map in memory at a time. If the disc is
nearly full then, with a bit map scheme, there is a good chance that the part of the map currently in
memory shows every block as in use, so another disc access is needed to fetch the next part of the bit
map. With a linked list scheme, once a block containing the numbers of free blocks has been brought into
memory, 511 blocks can be allocated before another disc access is needed.

Disk Quotas
Quotas are needed to stop users unfairly consuming limited disk space on a multi-user system. When a
user opens a file, the open-file table entry contains a pointer to the owner's record in a quota table, and
every time the file grows the owner's quota record is updated. Soft quota limits can be exceeded, producing
warnings (e.g. at login); hard quota limits cannot be exceeded.

Backing Up
A “physical dump” copies the disk block by block, starting from block 0. A “logical” or “incremental” dump
backs up only the files and directories changed since the last backup; logical dumps make it messier to
recover a file that was last backed up more than one dump ago. The logical dumping algorithm uses bit
maps indexed by i-node number, and the dump is done in stages:
(a) all files and directories (modified or not) are initially marked
(b) the directory tree is walked recursively, unmarking any directory with no modified directories or
    files in it or below it
(c) the i-nodes are scanned in numerical order and the directories still marked after (b) are dumped
(d) the i-nodes are scanned again in numerical order and the modified files are dumped

Consistency Check
Possible file-system states after a crash include: a) consistent; b) a missing block (harmless, just wasted
space); c) a block appearing twice in the free list (rebuild the free list); d) the same data block belonging
to two files (not a happy situation: which file's data is garbled?). Utility programs that check for such
inconsistencies include UNIX’s fsck and Windows’ scandisk.

File System Performance

How can we improve performance given fast memory but slow disk access? A block (or buffer) cache
helps to reduce disk accesses by keeping recently used disk blocks in memory. A typical algorithm checks
the cache first before going to disk: if the block is there it is used directly, if not it is read from disk and
stored in the cache. The cache is normally managed on a least recently used (LRU) basis, the least
recently used block being removed when a new block has to be brought in. A modified LRU policy is used
to help recovery from crashes: a block is only left sitting in the cache if a) it is likely to be needed again
soon and b) it is not essential to file-system consistency; blocks that are essential for consistency (such
as i-node blocks) are written back to disk promptly.
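A minimal sketch of a plain LRU block cache (illustrative only; the modified-LRU refinements described
above are omitted):

# Illustrative LRU block cache built on an ordered dictionary.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk   # callback: block number -> block contents
        self.cache = OrderedDict()             # block number -> block contents

    def get(self, block_no):
        if block_no in self.cache:
            self.cache.move_to_end(block_no)   # cache hit: mark as most recently used
            return self.cache[block_no]
        data = self.read_from_disk(block_no)   # cache miss: go to the disk
        self.cache[block_no] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used block
        return data

# Usage: a fake "disk" that returns a placeholder for any block.
cache = BlockCache(capacity=2, read_from_disk=lambda n: f"<block {n}>")
cache.get(5); cache.get(7); cache.get(5); cache.get(9)   # block 7 is evicted
print(list(cache.cache))   # [5, 9]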

Another method of improving performance is “block read ahead”: the file system simply reads the next
block in advance, in the hope that the user or program will request it next. This works well for files read
sequentially, but not for randomly accessed files. Yet another way to improve file access performance is
to keep the blocks related to a file close together on the disk. If all the i-nodes are placed at the start of
the disk there is a long seek from an i-node to its associated data blocks; if instead the disk is divided
into cylinder groups, each with its own i-nodes and data blocks kept nearby, seek times are reduced.

Log-Structured File Systems (LFS)

With faster CPUs and larger memories, disk caches can also be larger, so an increasing fraction of read
requests can be satisfied from the cache and most disk accesses become writes. Most writes are very
small and waste most of their time seeking. The LFS strategy is to structure the entire disk as a log: all
writes are initially buffered in memory and, periodically, the buffered writes are appended to the end of
the on-disk log. Segments of about 1MB are typically written at a time so that the full disk bandwidth is
used, and at the start of each segment is a summary of its contents. Because i-nodes are now scattered
all over the log, an i-node map is needed to locate them: when a file is opened we use the map to find its
i-node, and the i-node then gives the locations of the blocks. The log has to be compacted from time to
time by a cleaner thread, which reclaims the space taken by data that has since been superseded.

CD-ROM File System

This file system is simple because the medium is write-once, so no tracking of free blocks is needed.
Extensions to the basic format include Rock Ridge for UNIX (rwx attributes, symbolic links, alternative
names, directories more than 8 levels deep, etc.) and Joliet for Windows (long file names of up to 128
bytes, Unicode characters, directories more than 8 levels deep, etc.).

MS-DOS File System

MS-DOS uses 8+3 filenames with a hierarchical directory structure, and is a single-user operating system.
File attributes such as read-only and hidden are supported, so hidden files are permitted. File dates and
times are stored with a resolution of about 2 seconds. Under MS-DOS a directory entry is 32 bytes long,
split between the filename, extension, attributes, some reserved bytes, the time, the date, the starting
block number and the file size.

In Windows 98, long filenames are accepted. To allow backwards compatibility, the previously reserved
bytes of the directory entry are pressed into use and each file is given two names. When a file is created
whose name breaks the 8+3 rules, Windows 98 invents an 8+3 base name: for example, the first six
letters (only) are converted to upper case and “~1” is appended to form the base name; if that name is
already taken, “~2” is used, and so on. The long filename is stored in extra directory entries preceding
the MS-DOS (8+3) entry.
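A simplified sketch of the base-name generation just described (the real Windows 98 rules handle more
cases, such as collisions with names already on disk and a wider set of invalid characters; this is only an
illustration):

# Simplified sketch of generating an 8+3 style "base name" from a long filename.
def base_name(long_name, existing, max_tries=9):
    """Return a short name like LONGFI~1.TXT that is not already in `existing`."""
    stem, _, ext = long_name.rpartition(".")
    if not stem:                      # no dot in the name at all
        stem, ext = long_name, ""
    prefix = "".join(c for c in stem if c.isalnum())[:6].upper()
    ext = ext[:3].upper()
    for i in range(1, max_tries + 1):
        candidate = f"{prefix}~{i}" + (f".{ext}" if ext else "")
        if candidate not in existing:
            return candidate
    raise RuntimeError("no free short name")

print(base_name("My long document name.text", existing=set()))   # MYLONG~1.TEX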

_________________________________
