
SYSTEM SOFTWARE AND LANGUAGES INTRODUCTION TO COMPUTER SOFTWARE

A computer contains two basic parts: (i) hardware and (ii) software. In the first two units we covered hardware issues in some detail. In this unit, and in the rest of the units of this block, we will discuss topics related to software. Without software a computer is just metal; with software, a computer can store and retrieve information, solve different types of problems, and provide a friendly environment for software development. The process of software development is called programming. To do programming, one should have knowledge of (i) a particular programming language and (ii) a set of procedures (an algorithm) to solve a problem or develop software.

The development of an algorithm is basic to computer programming and is an important part of computer science studies. Developing a computer program is a detailed process which requires serious thought, careful planning and accuracy. It is a challenging and exacting task, drawing on the creativity of the programmer. Once an algorithm is obtained, the next step towards a solution using a computer is to program the algorithm using mathematical and data processing techniques. Programming languages constitute the vehicle for this stage of problem solving. The development of programming languages is one of the finest intellectual achievements in computer science. It has been said that "to understand a computer, it is necessary to understand a programming language. Understanding them does not really mean only being able to use them. A lot of people can use them without really fully understanding them."

An operating system is system software, which may be viewed as an organized collection of software consisting of procedures for operating a computer and providing an environment for the execution of programs. It acts as an interface between the users and the hardware of a computer system. There are many important reasons for studying operating systems. Some of them are:

- The user interacts with the computer through the operating system in order to accomplish a task, since it is the primary interface with a computer.
- It helps users to understand the inner functions of a computer very closely.
- Many concepts and techniques found in operating systems have general applicability in other applications.

In this unit we discuss concepts relating to programming languages; in the next unit we deal with operating system concepts.

INTRODUCTION TO SYSTEM SOFTWARE


Computer software consists of sets of instructions that mould the raw arithmetic and logical capabilities of the hardware into useful work. In order to communicate with each other, we use natural languages like Hindi, English, Bengali, Tamil, Marathi, Gujarati, etc. In the same way, programming languages of one type or another are used to communicate instructions and commands to a computer for solving problems. Learning a programming language requires learning the symbols, words and rules of the language.

Program and Programming: A computer can neither think nor make any judgement on its own. It is also impossible for any computer to independently analyse given data and follow its own method of solution; it needs a program to tell it what to do. A program is a set of instructions, arranged in a sequence, that guides the computer to solve a problem. The process of writing a program is called programming. Programming is a critical step in data processing: if the system is not correctly programmed, it delivers results that cannot be used. There are two ways in which we can acquire a program. One is to purchase an existing program, normally referred to as packaged software; the other is to prepare a new program from scratch, in which case it is called customised software. Computer software can be broadly classified into two categories: system software and application software. Today there are many languages available for developing software. These languages are designed keeping in mind specific areas of application; thus some languages may be good for writing system programs/software, while others suit application software.

Since a computer can be used for writing various types of application/system software, there are different programming languages.

i) System Programming Languages: System programs are designed to make the computer easier to use. An example of system software is an operating system, which consists of many other programs for controlling input/output devices, memory, the processor, etc. To write an operating system, the programmer needs instructions to control the computer's circuitry (the hardware), for example, instructions that move data from one storage location to a register of the processor. C and C++ are widely used to develop system software.

ii) Application Programming Languages: Application programs are designed for specific applications, such as payroll processing or inventory control. To write programs for payroll processing or other applications, the programmer does not need to control the basic circuitry of a computer. Instead, the programmer needs instructions that make it easy to input data, produce output, do calculations, and store and retrieve data. Programming languages that are suitable for such application programs support these instructions, but not necessarily the types of instructions needed for the development of system programs. There are two main categories of application programs: business programs and scientific application programs. Most programming languages are designed to be good for one category but not necessarily for the other, although there are some general-purpose languages that support both. Business applications are characterised by the processing of large inputs and outputs and by high-volume data storage and retrieval, but call for simple calculations. Languages suitable for business program development must therefore support high-volume input, output and storage, but need not support complex calculations. On the other hand, programming languages designed for writing scientific programs contain very powerful instructions for calculations but rather poor instructions for input and output. Among traditionally used programming languages, COBOL (COmmon Business Oriented Language) is more suitable for business applications, whereas FORTRAN (FORmula TRANslation) is more suitable for scientific applications. Before we discuss more about languages, let us briefly look at the two categories of software: system and application software.

SYSTEM SOFTWARE

Language Translator


A language translator is system software that translates a computer program written by a user into a machine-understandable form.

Operating System
An operating system (OS) is the most important piece of system software and is essential for operating a computer system. An operating system manages a computer's resources effectively, takes care of scheduling multiple jobs for execution, and manages the flow of data and instructions between the input/output units and the main memory. Advances in the field of computer hardware have also helped in the development of more efficient operating systems.

Utilities
Utility programs are those which are very often requested by many application programs. A few examples are:

SORT/MERGE utilities, which are used for sorting large volumes of data, merging them into a single sorted list, formatting, etc.

APPLICATION SOFTWARE
Application software is written to enable the computer to solve a specific data processing task. A number of powerful application software packages, which do not require significant programming knowledge, have been developed. These are easy to learn and use as compared to programming languages. Although these packages can perform many general and special functions, there are applications where the packages are not adequate; in such cases an application program is written to meet the exact requirements. A user application program may be written using one of these packages or a programming language. The most important categories of software packages available are:

- Database management software
- Spreadsheet software
- Word processing software
- Desktop publishing (DTP) and presentation software
- Graphics software
- Data communication software
- Statistical and operational research software

Data Base Management Software


Database software is very useful for creating and maintaining databases, querying them, and generating reports. Many of today's database management systems are relational database management systems (RDBMSs). Many RDBMS packages provide smart assistants for the creation of simple databases for invoices, orders and contact lists. Many database management systems are available in the market these days, and you can select one based on your needs. For example, if you have only a few small databases, then a package like dBase or FoxPro may be good. If you require some additional features and have a moderate workload, then Lotus Approach or Microsoft Access is all right. However, if you have high-end database requirements which call for a multi-user environment, data security, access rights and a very good user interface, then you should go for a professional RDBMS package like Ingres, Oracle or Integra.

Accounting Package
Accounting packages are among the most important packages for an office. Some of the features you may look for in an accounting package are:

- a tax planner facility
- facilities for producing charts and graphs
- finding accounts payable
- simple inventory control
- payroll functions
- on-line connection to stock quotes
- easy creation of invoices

One good package in this connection is Quicken for Windows.

Communication Package
Communication software includes software for fax. The fax software market is growing; one important fax package is Delrina's WinFax PRO 4.0. Features such as remote retrieval and a fax mailbox should be looked for in fax software: these features ensure that you receive your fax messages irrespective of your location. Another important feature is fax broadcast, which allows you to send out huge numbers of faxes without tying up your fax machine all day. If you constantly have to transfer files between a notebook computer and a desktop computer, you need a software program that coordinates and updates documents. One such program is LapLink for Windows. This software offers very convenient features: for example, simply dragging and dropping a file initiates a file transfer. The software works whether you are connected by a serial cable, a Novell network or a modem.

Desktop Publishing Packages


Desktop publishing packages are very popular in the Indian context. Newer publishing packages also provide certain built-in formats, such as brochures, newsletters and flyers, which can be used directly. Already-created text can very easily be placed into these packages, as can graphics. Many DTP packages for English and for languages other than English are available; Microsoft Publisher, PageMaker and Corel Ventura are a few popular names. Desktop publishing packages, in general, are better supported on Apple Macintosh computers.

CATEGORIES OF LANGUAGES

We can choose any language for writing a program according to the need. But a computer executes programs only after they are represented internally in binary form (as sequences of 1s and 0s). Programs written in any other language must be translated into this binary representation before the computer can execute them. Programs written for a computer may be in one of the following categories of languages.

MACHINE LANGUAGE
This is a sequence of instructions written in the form of binary numbers, consisting of 1s and 0s, to which the computer responds directly. Machine language was initially referred to as code, although now the term code is used more broadly to refer to any program text. An instruction prepared in any machine language has at least two parts. The first part is the command or operation code, which tells the computer what function is to be performed; all computers have an operation code for each of their functions. The second part of the instruction is the operand, which tells the computer where to find or store the data that has to be manipulated. Just as hardware is classified into generations based on technology, computer languages also have a generation classification based on the level of interaction with the machine. Machine language is considered to be the first generation language.

Advantage of Machine Language


It is fast in execution, since the computer can start executing it directly, without any translation.

Disadvantage of Machine Language


It is difficult to understand and develop a program using machine language. Anybody going through such a program in order to check it will have a difficult task understanding what will be achieved when it is executed. Nevertheless, the computer hardware recognises only this type of instruction code. The following program is an example of a machine language program for adding two numbers:

0011 1110 0000 0111              Load A register with value 7
0000 0110 0000 1010              Load B register with 10
1000 0000                        A = A + B
0011 0010 0110 0100 0000 0000    Store the result into the memory location whose address is 100 (decimal)
0111 0110                        Halt processing

ASSEMBLY LANGUAGE
Assembly language unlocks the secrets of your computer's hardware and software. It teaches you how the computer's hardware and operating system work together and how application programs communicate with the operating system. Assembly language, unlike high level languages, is machine dependent: each microprocessor has its own set of instructions that it can support. When we employ symbols (letters, digits or special characters) for the operation part, the address part and the other parts of the instruction code, this representation is called an assembly language program. Assembly language is considered to be the second generation language. Machine and assembly languages are referred to as low level languages, since the coding for a problem is at the individual instruction level. Each machine has its own assembly language, which depends upon the internal architecture of the processor. An assembler is a translator which takes its input in the form of an assembly language program and produces machine language code as its output. The following program is an example of an assembly language program for adding two numbers and storing the result in a memory location.

LD A, 7        Load register A with 7
LD B, 10       Load register B with 10
ADD A, B       A = A + B
LD (100), A    Save the result in location 100
HALT           Halt processing

From this program it is clear that the usage of mnemonics (in our example LD, ADD and HALT are the mnemonics) has improved the readability of our program significantly. A machine cannot execute an assembly language program directly, as it is not in binary form. An assembler is needed in order to translate an assembly language program into the object code executable by the machine. This is illustrated in Figure 1.

Figure 1: Assembler

Advantage of Assembly Language


Writing a program in assembly language is more convenient than writing it in machine language. Instead of binary sequences, as in machine language, the program is written in the form of symbolic instructions, which gives it a little more readability.

Disadvantages of Assembly Language


An assembly language program is specific to a particular machine architecture. Assembly languages are designed for a specific make and model of microprocessor, which means that an assembly language program written for one processor will not work on an architecturally different processor. That is why assembly language programs are not portable. An assembly language program is also not as fast as a machine language program, since it first has to be translated into machine (binary) language code.

VARIABLES, CONSTANTS, DATA TYPE, ARRAY AND EXPRESSIONS


These are the smallest components of a programming language.

Variable

Figure 2: Memory Organization

The first thing we must learn is how to use the internal memory of a computer in writing a program. Memory may be pictured as a series of separate memory cells, as shown in Figure 2. Computer memory is divided into several locations; each location has its own address, and each storage location holds a piece of information. In order to store or retrieve information from a memory location, we must give that particular location a name. Now study the following definition.

Variable: a character or group of characters assigned by the programmer to a single memory location and used in the program as the name of that memory location, in order to access the value stored in it. For example, in the expression A = 5, A is the name of a memory location, i.e. a variable, where 5 is stored.

Constant
A constant has a fixed value, in the sense that two cannot be equal to four. A string constant is simply a sequence of characters, such as "computer", which is a string of 8 characters. A numeric constant can be an integer, representing a whole quantity, or a number with a decimal point, representing a number with a fractional part. The constant is probably the most familiar concept to us, since we have used constants in everything that has to do with numbers. Numeric constants can be added, subtracted, multiplied and divided, and can also be compared to say whether two of them are equal, or one is less than or greater than the other. As string constants are sequences of characters, a related string constant may be obtained from a given one by chopping off some characters from the beginning or end or both, or by appending another string constant at the beginning or end. For example, from 'Gone with the wind' we can get 'one with ', 'Gone with wind', and so on. String constants can also be compared in a lexicographic (dictionary) sense to say whether two of them are equal, not equal, or one is less than or greater than the other.
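These string operations can be tried out directly; here is a small Python sketch (the choice of Python is ours, purely for illustration, since this section is language-neutral):

s = "Gone with the wind"
print(s[1:10])             # chop from both ends: 'one with '
print(s[:10] + s[14:])     # drop 'the ': 'Gone with wind'
print("gone" == "Gone")    # False - comparison is by character codes
print("apple" < "banana")  # True - lexicographic (dictionary) order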

Data type
In computer programming, the term data refers to anything and everything processed by the computer. There are different types of data processed by the computer: numbers are one type of data and words are another. In addition, the operations that are performed on data differ from one type of data to another; for example, multiplication applies to numbers but not to words or sentences. A data type defines a set of related values (integers, numbers with a fractional part, characters) and a set of specific operations that can be performed on those values. In BASIC, the statement LET A = 15 denotes that A is of numeric data type because it contains a number, while in the statement LET A$ = "BOMBAY", A$ is a variable of character data type. The data type also determines how many contiguous memory cells should be allocated to a particular variable.
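A short Python sketch of the same idea, namely that the data type determines which operations are legal (again, the language is only for illustration):

a = 15              # numeric data, as in LET A = 15
a_str = "BOMBAY"    # character data, as in LET A$ = "BOMBAY"
print(a * 3)        # 45 - multiplication applies to numbers
print(a_str + "!")  # 'BOMBAY!' - on strings, + means concatenation
# print(a + a_str) would raise a TypeError: this operation is not
# defined between the two data types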

Array
In programming we deal with large amounts of related data. If we represent each data element by a separate variable, the variables quickly multiply. For example, if we have to analyse the sales performance of a particular company for the last 10 years, we can take ten different variables (names), each one representing the sales of a particular year. If we analyse sales information for more than 10 years, the number of variables increases accordingly. It is very difficult to manage a large number of variables in a program. To deal with such situations, an array is used. An array is a collection of data items of the same type (either string or numeric), all of which are referenced by the same name. For example, five years of sales information of a company can be referred to by the same array name, A.

A(1) 50,000

A(2) 1,00,000

A(3) 5,00,000

A(4) 8,00,000

A(5) 9,00,000

A(1) specifies the sales information of the first year, A(2) the sales information of the second year, and A(5) the sales information of the fifth year.
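In Python, the same five-year sales series can be held in a list, the usual array-like type; note that Python indexes from 0, so A[0] plays the role of A(1) (an illustrative sketch):

A = [50000, 100000, 500000, 800000, 900000]  # sales for years 1 to 5
print(A[0])         # sales of the first year, A(1) in the text
print(A[4])         # sales of the fifth year, A(5) in the text
print(sum(A) // 5)  # a single name, A, gives access to the whole series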

Expression
We know that we can express intended arithmetic operations using expressions such as X + Y + Z, and so on. Several simple expressions can even be nested together using parentheses to form complex expressions. Every computer language specifies the order in which the various arithmetic operators in a given expression are evaluated. An expression may contain operators such as:

Parentheses                 ( )
Exponentiation              ^
Negation                    -
Multiplication, division    *, /
Addition, subtraction       +, -

The operators are evaluated in the order given above. For example, the expression 2 + 8 * (4 - 6/3) is evaluated as follows:

2 + 8 * (4 - 6/3)    the sub-expression (4 - 6/3) is taken up first
2 + 8 * (4 - 2)      within it, the division 6/3 has higher priority than the subtraction 4 - 6
2 + 8 * 2            the subtraction (4 - 2) is performed next; (4 - 6/3) is now complete
18                   8 * 2 is evaluated first, then its result is added to 2, that is, 16 + 2 = 18

It is useful to remember the order of priority of the various operators, but it is safer to simplify expressions and enclose them in parentheses to avoid unpleasant surprises.
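The same evaluation order can be checked in Python (Python writes exponentiation as ** rather than ^, but the relative priority of parentheses, multiplication/division and addition/subtraction is the same):

print(2 + 8 * (4 - 6 / 3))  # 18.0 - matches the step-by-step evaluation above
print(2 + 8 * 4 - 6 / 3)    # 32.0 - without parentheses the answer changes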

So far we have focused on arithmetic expressions, but the expression is a very general concept. We mentioned earlier that, apart from arithmetic operations, we can compare numbers or strings; we do this by using relational operators in expressions. The following is a list of relational operators:

=     equal to
<>    not equal to
<     less than
>     greater than
<=    less than or equal to
>=    greater than or equal to

These operators have the same level of priority among themselves, but a lower priority than the arithmetic operators mentioned earlier. A relational expression results in one of the truth values, either TRUE or FALSE. When a relational expression such as (3 > 5) is evaluated to be FALSE, such languages assign it the value 0 (false), whereas (5 < 7) will be evaluated to be TRUE and assigned the value 1. Note that a relational expression can compare only two values, separated by the appropriate relational operator. If we want to express an idea such as whether the number 7 lies between the two numbers 4 and 10, we may be tempted to write the relational expression 4 <= 7 <= 10. Such a reasonable expectation is a bit too complex for many computer languages; in such cases we express the idea in terms of simple relational expressions, such as (4 <= 7) AND (7 <= 10), which means that 7 is between 4 and 10. To combine several relational expressions for expressing complex conditions, we use logical operators such as AND and OR. Among the other logical operators, NOT simply negates a truth value, in the sense that NOT TRUE is FALSE and NOT FALSE is TRUE. The logical operators have lower priority than the relational operators. Among themselves they have the following order of priority during the evaluation of logical expressions:

Operator   Meaning
NOT        Negates a truth value. NOT (2 > 5) is TRUE.
AND        TRUE if both adjoining expressions are TRUE, otherwise FALSE. For example, (4 < 7) AND (7 < 10) is TRUE, whereas (4 > 7) AND (7 < 10) is FALSE.
OR         FALSE if both adjoining expressions are FALSE, otherwise TRUE. For example, (4 < 2) OR (7 < 2) is FALSE, whereas (4 > 2) OR (7 > 2) is TRUE.
XOR        TRUE only if one of the adjoining expressions is TRUE and the other is FALSE. XOR has the same priority as OR. (4 < 7) XOR (7 < 10) is FALSE.
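A Python sketch of relational and logical expressions (Python spells the logical operators and, or and not, and has no XOR keyword; != between two truth values behaves like XOR):

print(3 > 5)                # False - a relational expression yields a truth value
print(4 <= 7 and 7 <= 10)   # True  - '7 lies between 4 and 10' as two simple comparisons
print((4 < 2) or (7 < 2))   # False - both adjoining expressions are FALSE
print((4 < 7) != (7 < 10))  # False - TRUE XOR TRUE, as in the table above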

ASSEMBLY LANGUAGE FUNDAMENTALS


The best way to learn to write assembly language programs is first to study a simple program written in assembly language. In this section we shall do just that.

A Sample Program
;ABSTRACT   : This program adds two 8-bit words in the memory
;             locations called NUM1 and NUM2. The result is stored
;             in the memory location called RESULT. If there was a
;             carry from the addition, it will be stored as 0000 0001
;             in the location called CARRY.
;ALGORITHM  : get NUM1
;             add NUM2
;             put the sum into memory at RESULT
;             rotate the carry into the LSB of a byte
;             mask off the upper seven bits
;             store the result in the CARRY location
;PORTS      : None used
;PROCEDURES : None used
;REGISTERS  : Uses CS, DS, AX

DATA SEGMENT
  NUM1   DB 15h            ; First number stored here
  NUM2   DB 20h            ; Second number stored here
  RESULT DB ?              ; Put sum here
  CARRY  DB ?              ; Put any carry here
DATA ENDS

CODE SEGMENT
  ASSUME CS:CODE, DS:DATA
START:
  MOV AX, DATA             ; Initialize data segment
  MOV DS, AX               ; register
  MOV AL, NUM1             ; Get the first number
  ADD AL, NUM2             ; Add it to 2nd number
  MOV RESULT, AL           ; Store the result
  RCL AL, 01               ; Rotate carry into LSB
  AND AL, 00000001B        ; Mask out all but LSB
  MOV CARRY, AL            ; Store the carry result
  MOV AX, 4C00h            ; Return to DOS
  INT 21h
CODE ENDS
  END START

The program contains certain additional mnemonics in addition to the instructions you have studied so far. These are called assembler directives or pseudo-operations; they are directions to the assembler. Their meaning is valid only at assembly time, and no code is generated for them.

SEGMENT and ENDS Directive

The SEGMENT and ENDS directives are used to identify a group of data items or a group of instructions, called a segment. These directives are used in the same way as parentheses are used in algebra, to group like items together. A group of data statements or instructions placed between SEGMENT and ENDS directives is said to constitute a logical segment, and the segment is given a name. In our example, CODE and DATA are the names given to the code and data segments respectively. Each segment must have a unique name; there can be no blanks within the segment name, and the length of the segment name can be up to 31 characters. The name of a mnemonic or any other reserved word is not allowed as a segment name or label.

Data Definition Directives


In assembly language, we define storage for variables using data definition directives. Data definition directives create storage at assembly time, and can even initialize a variable to a starting value. The directives are summarized in the following table:

Directive   Description         Number of bytes   Attribute
DB          Define byte         1                 byte
DW          Define word         2                 word
DD          Define doubleword   4                 doubleword
DQ          Define quadword     8                 quadword
DT          Define 10 bytes     10                ten bytes

As we see from the table above, the variable being defined is given an attribute. The attribute refers to the basic unit of storage used when the variable was defined. These variables can be given names as follows:

CHAR_VAR  DB 'A'        ; CHAR_VAR = 41h
WORD_VAR  DW 01234h     ; a hex number should begin with zero
LIST      DB 1,2,3,4    ; list of 4 bytes initialized with the numbers 1,2,3,4
NUM       DW 4200
DEN       DB 20

The DUP directive is used to duplicate the basic data definition 'n' times. Example:

ARRAY DB 10 DUP (0)

defines an array, ARRAY, of 10 data bytes, each byte initialized to 0. The initial value can be anything acceptable to the basic data type.

The EQU directive is used to give a name to a constant. Example:

CONS EQU 20

defines a constant named CONS with the value 20. Now, wherever in your program you want to use 20, you can use the name instead. The advantage of this is that if you want to change the value of CONS to, say, 10 at some instant of time, then instead of making changes everywhere in the program, you just have to change the EQU definition and assemble the program again; the change takes effect automatically at all places. The types of numbers used in data statements can be octal, binary, hexadecimal, decimal and ASCII. The following are examples of each type:

TEMP_MAX    DB  01101100B   ; Binary
OLD_VAL     DW  7341O       ; Octal
DECIMAL     DB  49          ; Decimal
HEX_VAL     DW  03B2Ah      ; Hex
ASCII_VAL   DB  'EXAMPLE'   ; ASCII

The ASSUME Directive


The 8086 has four types of segments, discussed in the previous unit. In a program there can be more than one code segment, data segment or extra segment defined; however, only one of each type can be active at a time. The ASSUME directive is used to tell the assembler which segment is to be used as the active segment of each type at any instant, and with respect to which it has to calculate the offsets of the variables and instructions. It is usually placed immediately after the SEGMENT directive in the code segment, but you can have as many additional ASSUMEs as you like. Each time an ASSUME is encountered, the assembler starts to calculate offsets with respect to that segment. In the example above, CODE and DATA are the two segments defined, one each for code and data.

Initializing Segment Registers


ASSUME is only a directive, used to calculate the offsets of variables, instructions or stack elements with respect to a specific segment of the corresponding type. It does not initialize the segment registers. Initialization of the segment registers has to be done explicitly using MOV instructions, as follows:

MOV AX, DATA
MOV DS, AX

The above statements initialize the data segment register. A segment register cannot be loaded directly with a memory variable; therefore, the segment name is first moved into a general purpose register, which is then moved into the segment register. All segment registers can be initialized in the same manner. The code segment register is initialized automatically by the loader.

END Directive
The END directive tells the assembler to stop reading and assembling the program at that point. Any statement after END will be ignored by the assembler. There can be only one END in a program, and it is the last statement of the program.

THE ASSEMBLY LANGUAGE PROGRAMS


Assembly language programs can be written in two ways: one in which all code and data are written as part of one segment, called COM programs, and the other in which there is more than one segment, called EXE programs. We shall study each of them in brief, looking at their advantages and disadvantages.

COM Programs
A COM (command) program is simply a binary image of a machine language program. It is loaded in memory at the lowest available segment address. The program code begins at offset 100h; the first 256 bytes of the segment are occupied by the program segment prefix (PSP). All segment registers are set to the base segment address of the program. A COM program keeps its code, data and stack within the same segment; thus, its total size must not exceed 64K bytes. A sample COM program is shown below. The program's only segment (CSEG) must be declared explicitly using segment directives.

;TITLE ADD TWO NUMBERS AND STORE THE CARRY IN A THIRD VARIABLE
CSEG SEGMENT
     ASSUME CS:CSEG, DS:CSEG, SS:CSEG
     ORG 100h
START:
     MOV AX, CSEG        ; Initialize data segment
     MOV DS, AX          ; register
     MOV AL, NUM1        ; Get the first number
     ADD AL, NUM2        ; Add it to 2nd number
     MOV RESULT, AL      ; Store the result
     RCL AL, 01          ; Rotate carry into LSB
     AND AL, 00000001B   ; Mask out all but LSB
     MOV CARRY, AL       ; Store the carry result
     MOV AX, 4C00h       ; Return to DOS
     INT 21h
NUM1   DB 15h            ; First number stored here
NUM2   DB 20h            ; Second number stored here
RESULT DB ?              ; Put sum here
CARRY  DB ?              ; Put any carry here
CSEG ENDS
     END START

The ORG directive sets the location counter to offset 100h before any instruction is generated. A COM program takes up less space on disk than an EXE program; in spite of this, it is allocated all available RAM when loaded. COM programs require at least one full segment, because they automatically place their stack at the end of the segment.

EXE Programs
An EXE program is stored on disk with the extension EXE. EXE programs are longer than COM programs, because with each EXE program is associated an EXE header, followed by a load module containing the program itself. The EXE header is of a fixed 256 bytes and contains information which is used by DOS to correctly calculate the addresses of segments and other components; we will not go into its details. The load module consists of separate segments, which may be thought of as reserved areas for instructions, variables and the stack. An EXE program may contain up to 64K segments, although at most only four segments may be active at any time. The segments may be of variable size, the maximum being 64K bytes. The advantages of EXE programs are:

- EXE programs are better suited to debugging.
- EXE-format assembler programs are more easily converted into subroutines for high-level languages.
- The third advantage has to do with memory management. EXE programs are more easily relocatable, because there is no ORG statement forcing the program to be loaded from a specific address. Also, to fully use a multitasking operating system, programs must be able to share computer memory and resources; an EXE program is easily able to do this.

ASSEMBLER / MACRO PROCESSOR

INTRODUCTION


Computers have changed a lot since the days when people used to communicate with them by on and off switches denoting primitive instructions. With present day computers, interaction has become much more user-friendly because of advances in hardware and software tools. One category of software which assists in the mechanics of software development is system software: assemblers, linkers/loaders, compilers and operating systems all belong to the realm of system software. Earlier we discussed several components of programming languages, the basic definitions of assemblers, compilers and interpreters, and the differences among them. In this unit our focus will be on the implementation and use of assemblers. We will also broadly cover the use of macro processors, loaders and linkers.

ASSEMBLER

Assembler Implementation


An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader (Figure 1).

Fig. 1: Assembler

For example, externally defined symbols (library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program. Here we will discuss the different approaches to the design of an assembler and its related programs.

Assembler and its related programs: An assembler-language program contains three kinds of entities. Absolute entities include operation codes, numeric and string constants, and fixed addresses; the values of absolute entities are independent of which storage locations the resulting machine code will eventually occupy. Relative entities include the addresses of instructions and of working storage; these are fixed only with respect to each other, and are normally stated relative to the address of the beginning of the module. An externally defined entity is used within a module but not defined within it, and whether it is absolute or relative is not necessarily known at the time the module is translated. The object program includes identification of which addresses are relative, which symbols are defined externally, and which internally defined symbols are expected to be referenced externally (in the modules in which the latter are used, they are considered to be externally defined). These external references are resolved for two or more object programs by a linker. The linker accepts the several object programs as input and produces a single program ready for loading, hence termed a load program. This module is free of external references and consists essentially of machine-language code accompanied by a specification of which addresses are relative. When the actual main storage locations to be occupied by the program become known, a relocating loader reads the program into storage and adjusts the relative addresses to refer to those actual locations. The output from the loader is a machine-language program ready for execution. The overall process is depicted in Figure 3. If only a single source-language module containing no external references is translated, it can be loaded directly without intervention by the linker. In some programming systems the format of the linker's output is sufficiently compatible with that of its input to permit the linking of a previously produced load module with some new object modules. The functions of linking and loading are sometimes both effected by a single program, called a linking loader. Despite the convenience of combining the linking and loading functions, it is important to realize that they are distinct functions, each of which can be performed independently of the other.

Fig. 3 : Program Translation

LOAD AND GO ASSEMBLER


The simplest assembler is the load-and-go assembler. It accepts as input a program whose instructions are essentially in one-to-one correspondence with those of machine language, but with symbolic names used for operators and operands. It produces machine language as output, which is loaded directly into main memory and executed. The translation is usually performed in a single pass over the input program text. The resulting machine language program occupies storage locations which are fixed at the time of translation and cannot be changed subsequently. The program can call library subroutines, provided that they occupy locations other than those required by the program. No provision is made for combining separate subprograms translated in this manner. The load-and-go assembler therefore forgoes the advantages of modular program development, foremost among which are (1) the ability to design, code and test different program components in parallel, and (2) the fact that a change in one module does not require scanning the rest of the program again. Most assemblers are therefore designed to satisfy the desire to create programs in modules. These module assemblers generally perform a two-pass translation. During the first pass, the assembler examines the assembler-language program and collects the symbolic names into a table. During the second pass, the assembler generates code which is not quite machine language but in a similar form, sometimes called "relocatable code" and here called object code. A program module in object-code form is typically called an object module.

ONE-PASS MODULE ASSEMBLER


The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, and so on. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes; different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
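One common technique a one-pass assembler can use is backpatching: emit a placeholder word for a forward reference, remember where the placeholder was emitted, and patch it when the label's definition is finally reached. A minimal Python sketch of the idea follows; the data structures and names here are our own, invented for illustration:

symbols = {}   # label -> address, once known
fixups = {}    # label -> list of code positions awaiting that address
code = []      # the machine code being emitted, one word per entry

def emit_ref(label):
    """Emit an address word for a label, backpatching later if needed."""
    if label in symbols:
        code.append(symbols[label])
    else:
        fixups.setdefault(label, []).append(len(code))
        code.append(None)                 # placeholder word

def define_label(label):
    """Record a label's address and patch every earlier reference to it."""
    symbols[label] = len(code)
    for pos in fixups.pop(label, []):
        code[pos] = symbols[label]

emit_ref("LOOP")      # forward reference: LOOP is not yet defined
define_label("LOOP")  # the definition arrives; the placeholder is patched
print(code)           # [1] - the reference now holds LOOP's address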

TWO PASS ASSEMBLER


Most assemblers are designed in two passes (stages); they are therefore called two-pass assemblers. The pass-wise grouping of tasks in a two-pass assembler is given below:

Pass I

- Separate the symbols, mnemonic op-codes and operand fields.
- Determine the storage requirement for every assembly language statement and update the location counter.
- Build the symbol table (the table that is used to store each label and its corresponding value).

Pass II
Generate object code.
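To make this pass-wise grouping concrete, here is a minimal two-pass assembler sketch in Python, written for a small fragment of the hypothetical assembler language of the next section (the opcode and length values are taken from its instruction set; the tuple form of the source text and the treatment of SPACE as a one-word reservation are simplifying assumptions of this sketch):

# (machine code, length in words) for a few operations
OPS = {"COPY": (13, 3), "READ": (12, 2), "WRITE": (8, 2),
       "LOAD": (3, 2), "ADD": (2, 2), "STORE": (7, 2), "STOP": (11, 1)}

# each statement: (label, operation, operands)
source = [
    ("",    "READ",  ["NUM"]),
    ("",    "LOAD",  ["NUM"]),
    ("",    "ADD",   ["NUM"]),
    ("",    "STORE", ["NUM"]),
    ("",    "WRITE", ["NUM"]),
    ("",    "STOP",  []),
    ("NUM", "SPACE", []),
]

# Pass I: build the symbol table while advancing the location counter
symtab, loc = {}, 0
for label, op, operands in source:
    if label:
        symtab[label] = loc
    loc += OPS[op][1] if op in OPS else 1  # SPACE reserves one word

# Pass II: substitute machine operation codes and symbol addresses
objcode = []
for label, op, operands in source:
    if op in OPS:
        objcode.append([OPS[op][0]] + [symtab[o] for o in operands])
    else:
        objcode.append(["XX"])             # reserved storage, no fixed value

print(symtab)   # {'NUM': 11}
print(objcode)  # [[12, 11], [3, 11], [2, 11], [7, 11], [8, 11], [11], ['XX']]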

FUNCTION
The program of Figure 4, although written in a hypothetical assembler language, contains the basic elements which need to be translated into machine language. (It is not essential for students to understand the meaning of each statement of the program.) For ease of reference, each instruction is identified by a line number, which is not part of the program. Each instruction in our language contains either an operation specification (lines 1-15) or a storage specification (lines 16-21). An operation specification is a symbolic operation code, which may be preceded by a label and must be followed by 0, 1 or 2 operand specifications, as appropriate to the operation. A storage specification is a symbolic instruction to the assembler; in our assembler language, it must be preceded by a label and must be followed, if appropriate, by a constant (CONST). Labels and operand specifications are symbolic addresses; every operand specification must appear somewhere in the program as a label.

Line   Label    Operation   Operand 1   Operand 2
1               COPY        ZERO        OLDER
2               COPY        ONE         OLD
3               READ        LIMIT
4               WRITE       OLD
5      FRONT    LOAD        OLDER
6               ADD         OLD
7               STORE       NEW
8               SUB         LIMIT
9               JMPPOS      FINAL
10              WRITE       NEW
11              COPY        OLD         OLDER
12              COPY        NEW         OLD
13              JMP         FRONT
14     FINAL    WRITE       LIMIT
15              STOP
16     ZERO     CONST       0
17     ONE      CONST       1
18     OLDER    SPACE
19     OLD      SPACE
20     NEW      SPACE
21     LIMIT    SPACE

Figure 4: Sample Assembler-Language Program

Symbolic    Machine   Length   No of      Action
Operation   Code               Operands
ADD         02        2        1          ACC ← ACC + OPD1
JMP         00        2        1          Jump to OPD1
JMPNEG      05        2        1          Jump to OPD1 if ACC < 0
JMPPOS      01        2        1          Jump to OPD1 if ACC > 0
JMPZERO     04        2        1          Jump to OPD1 if ACC = 0
COPY        13        3        2          OPD2 ← OPD1
DIVIDE      10        2        1          ACC ← ACC / OPD1
LOAD        03        2        1          ACC ← OPD1
MULT        14        2        1          ACC ← ACC × OPD1
READ        12        2        1          OPD1 ← input stream
STOP        11        1        0          Stop execution
STORE       07        2        1          OPD1 ← ACC
SUB         06        2        1          ACC ← ACC - OPD1
WRITE       08        2        1          Output stream ← OPD1

Figure 5: Instruction Set

Our hypothetical machine has a single accumulator and a main storage of unspecified size. Its 14 instructions are listed in Figure 5. The first column shows the symbolic operation code and the second gives the machine-language equivalent (in decimal). The fourth column specifies the number of operands, and the last column describes the action which ensues when the instruction is executed. In that column, "ACC", "OPD1" and "OPD2" refer to the contents of the accumulator, of the first operand location, and of the second operand location, respectively. The length of each instruction in words is 1 greater than the number of its operands; thus, if the machine has 12-bit words, an ADD instruction is 2 words, or 24 bits, long. The table's third column, which is redundant, gives the instruction length. If our hypothetical computer had a fixed instruction length, the third and fourth columns could both be omitted. The storage specification SPACE reserves one word of storage which presumably will eventually hold a number; there is no operand. The storage specification CONST also reserves a word of storage; it has an operand, which is the value of a number to be placed in that word by the assembler. The instructions of the program are presented in four fields, and might indeed be constrained to such a format on the input medium. The label, if present, occupies the first field. The second field contains the symbolic operation code or storage specification, which will henceforth be referred to simply as the operation. The third and fourth fields hold the operand specifications, or simply operands, if present. Although it is not at all important to our discussion to understand what the example program does, the foregoing specifications of the machine and of its assembler language reveal the algorithm: the program simply computes the so-called Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, ...). The same program was also written in the BASIC programming language in Unit 1 of Course 2. Now that we have seen the elements of an assembler-language program, we can ask what functions the assembler must perform in translating it. Here is the list:

- Replace symbolic addresses by numeric addresses.
- Replace symbolic operation codes by machine operation codes.
- Reserve storage for instructions and data.
- Translate constants into machine representation.

The assignment of numeric addresses can be performed without prior knowledge of what actual locations will eventually be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the program. We shall assume that our assembler normally assigns addresses starting at 0. In translating line 1 of our example program, the resulting machine instruction will therefore be assigned address 0 and occupy 3 words, because COPY instructions are 3 words long. Hence the instruction corresponding to line 2 will be assigned address 3, the READ instruction will be assigned address 6, the WRITE instruction of line 4 will be assigned address 8, and so on to the end of the program. But what addresses will be assigned to the operands named ZERO and OLDER? These addresses must be inserted in the machine-language representation of the first instruction.

IMPLEMENTATION
The assembler uses a counter to keep track of machine-language addresses. Because these addresses will ultimately specify locations in main storage, the counter is called the location counter. Before assembly, the location counter is initialized to zero. After each source line has been examined on the first pass, the location counter is incremented by the length of the machine-language code which will ultimately be generated to correspond to that source line. When the assembler first encounters line 1 of the example program, it cannot replace the symbols ZERO and OLDER by addresses, because those symbols are forward references to source-language lines not yet reached by the assembler. The most straightforward way to cope with the problem of forward references is to examine the entire program text once before attempting to complete the translation. During that examination, the assembler determines the address which corresponds to each symbol, and places both the symbols and their addresses in a symbol table. This is possible because each symbol used in an operand field must also appear as a label, and the address corresponding to a label is just the address assigned to the line it labels. Building the symbol table requires one pass over the source text. During a second pass, the assembler uses the addresses collected in the symbol table to perform the translation: as each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code.

Two of the most common logical errors in assembler-language programming involve improper use of symbols. If a symbol appears in the operand field of some instruction but nowhere in a label field, it is undefined. If a symbol appears in the label fields of more than one instruction, it is multiply defined. In building the symbol table on the first pass, the assembler must examine the label field of each instruction to permit it to associate the location counter value with each symbol; multiply-defined symbols will therefore be found on this pass. Undefined symbols, on the other hand, will not be found on the first pass unless the assembler also examines operand fields for symbols. Although this examination is not required for the construction of the symbol table, normal practice is to perform it anyhow, because of its value in the early detection of program errors. There are many ways to organize a symbol table; the organisation of a symbol table will not be discussed in this unit.

The state of processing after line 3 is shown in Figure 6. During processing of line 1, the symbols ZERO and OLDER were encountered and entered into the first two positions of the symbol table. The operation COPY was identified, and the instruction-length information from Figure 5 was used to advance the location counter from 0 to 3. During processing of line 2, two more symbols were encountered and entered in the symbol table, and the location counter was advanced from 3 to 6. Line 3 yielded the fifth symbol, LIMIT, and caused the location counter to be incremented from 6 to 8. At this point the symbol table holds five symbols, none of which yet has an address. The location counter holds the address 8, and processing is ready to continue from line 4. Neither the line numbers nor the addresses shown in part (a) of the figure are actually part of the source-language program: the addresses record the history of incrementation of the location counter, and the line numbers permit easy reference.
Clearly, the assembler needs not only a location counter, but also a line counter to keep track of which source line is being processed.

Line   Address   Label   Operation   Operand 1   Operand 2
1      0                 COPY        ZERO        OLDER
2      3                 COPY        ONE         OLD
3      6                 READ        LIMIT

(a) Source text scanned

Symbol   Address
ZERO     --
OLDER    --
ONE      --
OLD      --
LIMIT    --

(b) Symbol table

Counters: Location counter: 8   Line counter: 4

Figure 6: First Pass After Scanning Line 3


During processing of line 4, the symbol OLD is encountered for the second time. Because it is already in the symbol table, it is not entered again. During processing of line 5, the symbol FRONT is encountered in the label field; it is entered into the symbol table, and the current location counter value, 10, is entered with it as its address. Figure 7 displays the state of the translation after all 21 lines have been processed.

Line   Address   Label   Operation   Operand 1   Operand 2
1      0                 COPY        ZERO        OLDER
2      3                 COPY        ONE         OLD
3      6                 READ        LIMIT
4      8                 WRITE       OLD
5      10        FRONT   LOAD        OLDER
6      12                ADD         OLD
7      14                STORE       NEW
8      16                SUB         LIMIT
9      18                JMPPOS      FINAL
10     20                WRITE       NEW
11     22                COPY        OLD         OLDER
12     25                COPY        NEW         OLD
13     28                JMP         FRONT
14     30        FINAL   WRITE       LIMIT
15     32                STOP
16     33        ZERO    CONST       0
17     34        ONE     CONST       1
18     35        OLDER   SPACE
19     36        OLD     SPACE
20     37        NEW     SPACE
21     38        LIMIT   SPACE

(a) Source text scanned

Symbol   Address
ZERO     33
OLDER    35
ONE      34
OLD      36
LIMIT    38
FRONT    10
NEW      37
FINAL    30

(b) Symbol table

Counters: Location counter: 39   Line counter: 22

Figure 7: First Pass After Scanning the Entire Program

The XX in Figure 8 can be thought of as a specification to the loader, which will eventually process the object code, that the content of the location corresponding to address 35 does not need to have any specific value loaded; the loader can then just skip over that location. Some assemblers do specify a particular value for reserved storage locations, often zero. There is no logical requirement to do so, however, and the user unfamiliar with his assembler is ill-advised to count on a particular value.

Address   Length   Machine Code
00        3        13 33 35
03        3        13 34 36
06        2        12 38
08        2        08 36
10        2        03 35
12        2        02 36
14        2        07 37
16        2        06 38
18        2        01 30
20        2        08 37
22        3        13 36 35
25        3        13 37 36
28        2        00 10
30        2        08 38
32        1        11
33        1        00
34        1        01
35        1        XX
36        1        XX
37        1        XX
38        1        XX

Figure 8: Object Code Generated on the Second Pass

The specifications CONST and SPACE do not correspond to machine instructions. They are really instructions to the assembler program, and because of this we shall refer to them as assembler instructions. Another common designation for them is pseudo-instructions; neither term is really satisfactory. Of the two types of assembler instructions in our example program, one results in the generation of machine code and the other in the reservation of storage. Later we shall see assembler instructions which result in neither of these actions. The assembler must be able to recognise assembler instructions when it encounters them in the operation field. One organization is to use a separate table, which is usually searched before the operation-code table; another is to include both machine operations and assembler instructions in the same table, with a field in each table entry identifying its type to the assembler.

A few variations on the foregoing process can be considered. Some of the translation can actually be performed during the first pass. Operation fields must be examined during the first pass to determine their effect on the location counter; the second-pass table lookup to determine the machine operation code can then be obviated, at the cost of producing an intermediate text which holds the machine operation code and instruction length in addition to the source text.

Another translation which can be performed during the first pass is that of constants, e.g. from source-language decimal to machine-language binary. The translation of symbolic addresses which refer backward in the text, rather than forward, could also be performed on the first pass, but it is more convenient to wait for the second pass and treat all symbolic addresses uniformly. A minor variation is to assemble addresses relative to a starting address other than 0; the location counter is merely initialized to the desired address. If, for example, the value 200 is chosen, the symbol table would appear as in Figure 9, and the object code corresponding to line 1 would be: 200 3 13 233 235.

Symbol   Address
ZERO     233
OLDER    235
ONE      234
OLD      236
LIMIT    238
FRONT    210
NEW      237
FINAL    230

Figure 9: Symbol Table with Starting Location 200

If it were known at assembly time that the program is to reside at location 200 for execution, then full object code with address and length need not be generated; the machine code alone would suffice. In this event, the result of translation would be the following 39-word sequence:

13 233 235 13 234 236 12 238 08 236 03 235 02 236 07 237 06 238 01 230 08 237 13 236 235 13 237 236 00 210 08 238 11 00 01 XX XX XX XX
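The rebasing itself is only an addition of the chosen starting address to every address that was assigned relative to 0, as a quick Python check against Figure 9 shows:

relative = {"ZERO": 33, "OLDER": 35, "ONE": 34, "OLD": 36,
            "LIMIT": 38, "FRONT": 10, "NEW": 37, "FINAL": 30}
base = 200
rebased = {sym: base + addr for sym, addr in relative.items()}
print(rebased["ZERO"], rebased["FRONT"])  # 233 210, as in Figure 9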

MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statement or block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In such situations the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro-processing assembler substitutes the entire block.

Macro Definition and Usage


Let us consider an example to highlight the salient aspects of a macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.

Example:

MACRO
INCRMT  &A, &B
LOAD    &A
ADD     &B
STORE   &A
ENDM

This is the macro definition. A call such as INCRMT X,Y in the program is replaced, on macro expansion, by:

LOAD    X
ADD     Y
STORE   X

Figure 10: A macro definition and its expansion

A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus, a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit defines a new operation as a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence. The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. The appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro; this is known as macro expansion. INCRMT X,Y is shown to lead to the insertion of the assembly statements LOAD X, ADD Y, STORE X in its place. All macro calls in a program are expanded in this fashion.

DEFINING A MACRO
Let us take another look at the macro definition unit appearing in Figure 10. The macro header statement (MACRO) indicates the existence of a macro definition unit. The absence of a header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro are to be written. The prototype is followed by the so-called model statements; these are the assembly statements which will replace the macro call as a result of macro expansion.

Positional Parameters
The prototype statement indicates how the operands in a macro call are to be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character '&'; these are known as formal parameters. A macro call is written using parameter names which do not start with the special character '&'; these are known as actual parameters. The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In Figure 10, this correspondence is determined by the relative positions of these parameters in their respective lists: the first actual parameter in the list is paired with the first formal parameter, and so on. Considering the prototype and macro call statements once again:

INCRMT &A, &B   ... prototype
INCRMT X, Y     ... macro call

we see that X is paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:

LOAD X
ADD Y
STORE X

Schematics for Macro-Expansion

Above we touched upon the fundamental aspects of macro expansion. From the discussion, it appears that the process of macro expansion is similar to language translation. The source program containing macro definitions and calls is translated into an assembly language program without any macro definitions or calls. This program form can now be handed over to a conventional assembler to obtain the target language form of the program. In such a schematic (Figure 11), the process of macro expansion is completely segregated from the process of assembly. The translator which performs macro expansion in this manner is called a macro pre-processor. The advantage of this scheme is that any existing conventional assembler can be enhanced in this manner to incorporate macro processing. This reduces the programming cost involved in making a macro facility available to programmers using a computer system. The disadvantage is that this scheme is probably not very efficient, because of the time spent in generating assembly language statements and processing them again for the purpose of translation to the target language.

Fig. 11 : A pre-processor based scheme for macro assembly

ISSUES RELATED TO THE DESIGN OF A MACRO PRE-PROCESSOR


As against this schematic of prefixing a conventional assembler with a macro pre-processor, it is possible to design a macro assembler which not only processes macro definitions and macro calls for the purpose of expansion, but also assembles the expanded statements along with the original assembly statements. The macro assembler requires fewer passes over the program than the pre-processor scheme, which holds out a promise for better efficiency. But for the sake of simplicity, in this section we will discuss the issues related to the implementation of a macro pre-processor rather than an actual implementation.

Issues related to the Design of a Macro Pre-Processor


Our discussion regarding the definition and use of macros in an assembly program has brought out, to some extent, the working principles of a macro pre-processor. To summarise, we should be able to differentiate between macro names and invalid operation code mnemonics. On recognizing a call on a macro, we should be able to access the text of its definition so that we can expand the call. For generating a statement during expansion, we need a simple scheme for substituting the appearance of a formal parameter with its value. Correspondence between a formal parameter and its value will have to be established for this purpose. It is desirable that, instead of performing this action for every appearance of a formal parameter, the correspondence between formal parameters and their values be established once and for all, at the start of macro expansion. Considerations of positional and keyword correspondence would thus get localized to the start of macro expansion only. This would have the further advantage that no distinction would need to be made between keyword and positional parameters during macro expansion.

Step 1: Scan all macro definitions one by one. For each macro defined:
    enter its name in the Macro Name Table (MNT);
    store the entire macro definition in the Macro Definition Table (MDT);
    add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT.

Step 2: Examine all statements in the assembly source program to detect macro calls. For each macro call:
    locate the macro in the MNT;
    obtain information from the MNT regarding the position of the macro definition in the MDT;
    process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters);
    expand the macro call by following the procedure given in Step 3.

Step 3: Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order based on certain expansion-time relations between values of formal parameters and expansion-time variables.

In order to have a complete working scheme within the above framework, we need to finalise the following details:
    a method of establishing correspondence between a formal parameter and its value;
    a method of sequencing through the statements comprising a macro definition in expansion-time order;
    a method of expanding a model statement;
    allocation of storage for expansion-time variables and access to their values during expansion.

A data-structure sketch for the two tables of Step 1 is given below.
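As a rough sketch in C (all sizes and field names here are illustrative assumptions, not a prescribed layout), the MNT and MDT might be declared as follows:

    #define MAX_MACROS    100
    #define MAX_MDT_LINES 1000
    #define LINE_LEN      81

    /* Macro Name Table: one entry per defined macro */
    struct mnt_entry {
        char name[16];      /* macro name, e.g. "INCRMT"          */
        int  mdt_index;     /* where its definition starts in MDT */
    };

    /* Macro Definition Table: stores the model statements */
    char mdt[MAX_MDT_LINES][LINE_LEN];

    struct mnt_entry mnt[MAX_MACROS];
    int mnt_count = 0;      /* number of macros defined so far */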

COMPILER / LINKER / LOADER

LOADERS

INTRODUCTION


The purpose of this section is to discuss the various functions of a loader. The loader is a program which accepts an object code and prepares it for execution. An object code produced by an assembler/compiler cannot be executed without modification. As many as four more functions must be performed first. These functions, performed by a loader, are:

    Allocation of space in main memory for the programs.
    Linking of programs with each other, e.g. with library programs.
    Adjustment of all address-dependent locations, such as address constants, to correspond to the allocated space; this is also called relocation.
    Physically loading the machine instructions and data into memory.

Figure 1 shows the functions of a loader.

Fig. 1: Functions of a loader

Let us examine the need for some of these functions of the loader.

Linking

The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a 'stand-alone' nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory. For example, consider a program written in a high level language like C. Such a program may contain calls on certain Input/Output functions like printf( ), scanf( ) etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
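To make the idea concrete, here is a contrived two-file C example (the names area.c and main.c are our own); the reference to area() in main.c is an external reference that the linker must resolve:

    /* file: area.c -- compiled separately into area.o */
    double area(double r)
    {
        return 3.14159 * r * r;
    }

    /* file: main.c -- compiled into main.o; the address of
       area() is unknown here and is filled in by the linker */
    #include <stdio.h>
    double area(double r);             /* external reference */

    int main(void)
    {
        printf("%f\n", area(2.0));     /* resolved at link time */
        return 0;
    }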

RELOCATION
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150. If we were to load these programs at their translated addresses, a lot of storage lying between them may go waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus, A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage locations. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste. It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to adjustment of address fields and not to movement of a program. The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments from a single source file). The part of a loader which performs relocation is called a relocating loader.
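A minimal sketch of the adjustment step, assuming (for illustration only) that the program image is an array of words and that the translator has supplied a table of the word positions holding relative addresses:

    /* Add the load-time offset to every address field that the
       translator marked as relative -- a simplified model of the
       core of a relocating loader. */
    void relocate(int code[], const int reloc_table[],
                  int n_entries, int load_offset)
    {
        for (int i = 0; i < n_entries; i++)
            code[reloc_table[i]] += load_offset;
    }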

LOADER SCHEMES
There are several schemes for accomplishing the four loading functions. These schemes are: (i) Absolute loader, (ii) Relocating loader, (iii) Direct linking loader, (iv) Dynamic loading and (v) Dynamic linking.

Absolute Loader : The task of an absolute loader is virtually trivial. The loader simply
accepts the machine language code produced by the assembler and places it into main memory at the location specified by the assembler.

Relocating Loader: To avoid the possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced. The output of a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader: It is a general relocatable loader, and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs. The other two loader schemes will be discussed in the next section.

Dynamic Loading and Linking: There are numerous variations on the previously presented loader schemes. One disadvantage of the direct-linking loader, as presented, is that it is necessary to allocate, relocate, link and load all of the subroutines each time in order to execute a program. Since there may be tens and often hundreds of subroutines involved, especially when we include utility routines such as SQRT etc., this loading process can be extremely time-consuming. Furthermore, even though the loader program may be smaller than the assembler, it does absorb a considerable amount of space. These problems can be solved by dividing the loading process into two separate programs: a binder and a module loader. A binder is a program that performs the same functions as the direct-linking loader in binding subroutines together, but rather than placing the relocated and linked text directly into memory, it outputs the text as a file. This output file is in a format ready to be loaded and is typically called a load module. The module loader merely has to physically load the module into main memory. The binder essentially performs the functions of allocation, relocation and linking; the module loader merely performs the function of loading. There are two major classes of binders. The simplest type produces a load module that looks very much like a single absolute loader file. This means that the specific memory allocation of the program is performed at the time that the subroutines are bound together. A more sophisticated binder, called a linkage editor, can keep track of the relocation information so that the resulting load module can be further relocated and thereby loaded anywhere in memory. In this case the module loader must perform additional allocation and relocation as well as loading, but it does not have to worry about the complex problems of linking. In both cases, a program that is to be used repeatedly need only be bound once and then can be loaded whenever required. The first binder is relatively simple and fast. The second one (the linkage editor binder) is somewhat more complex but allows a more flexible allocation and loading scheme.

Dynamic Loading
In each of the previous loader schemes we have assumed that all of the subroutines needed are loaded into main memory at the same time. If the total amount of memory required by all these subroutines exceeds the amount available, as is common with large programs on small computers, there is trouble! There are several hardware techniques, such as paging and segmentation, that attempt to solve this problem. Usually the subroutines of a program are needed at different times: for example, pass 1 and pass 2 of an assembler are mutually exclusive (pass 1 and pass 2 need not simultaneously occupy memory). By explicitly recognizing which subroutines call other subroutines, it is possible to produce an overlay structure that identifies mutually exclusive subroutines. Figure 2 illustrates a program consisting of five subprograms (A, B, C, D and E) that together require 100K bytes of memory. The arrows indicate that subprogram A only calls B, D and E; subprogram B only calls C and E; subprogram D only calls E; and subprograms C and E do not call any other routines. Figure 2(a) highlights the interdependencies between the procedures. Note that procedures B and D are never in use at the same time; neither are C and E. If we load only those procedures that are actually to be used at any particular time, the amount of memory needed is equal to the longest path of the overlay structure. This happens to be 70K for the example in Figure 2(b): procedures A, B and C. Figure 2(c) illustrates a storage assignment for each procedure consistent with the overlay structure. In order for the overlay structure to work, it is necessary for the module loader to load the various procedures as they are needed. We will not go into the specific details, but there are many binders capable of processing and allocating an overlay structure. The portion of the loader that actually intercepts the calls and loads the necessary procedure is called the overlay supervisor or simply the flipper. This overall scheme is called dynamic loading or load-on-call.

Figure 2 (a)

Figure 2 (b)

Figure 2 (c)

Figure 2 (d)

Fig. 2 : Dynamic Loading


DYNAMIC LINKING
The major disadvantage of all of the previous loading schemes is that if a subroutine is referenced but never executed (e.g. if the programmer had placed a call statement in his program but this statement was never executed because a condition was not satisfied), the loader would still incur the overhead of linking the subroutine. Furthermore, all of these schemes require the programmer to explicitly name all procedures that might be called. A very general type of loading scheme is called dynamic linking. This is a mechanism by which the loading and linking of external references are postponed until execution time. The loader loads only the main program. If the main program should execute a transfer instruction to an external address, or should reference an external variable (that is, a variable that has not been defined in this procedure segment), the loader is called. Only then is the segment containing the external reference loaded. An advantage here is that no overhead is incurred unless the procedure to be called or referenced is actually used. A further advantage is that the system can be dynamically reconfigured. The major drawback of this type of loading scheme is the considerable overhead and complexity incurred, due to the fact that we have postponed most of the binding process until execution time. Now we will discuss the implementation of the simplest type of loader scheme, which is called an absolute loader.
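On modern UNIX-like systems this idea is exposed through the dlopen() interface. A brief illustration in C (the library name libm.so.6 is an assumption that holds on common Linux systems; the program is typically linked with -ldl):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* nothing is loaded or linked until this call executes */
        void *lib = dlopen("libm.so.6", RTLD_LAZY);
        if (lib == NULL) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        /* resolve the external reference only when needed */
        double (*cosine)(double) = (double (*)(double)) dlsym(lib, "cos");
        if (cosine != NULL)
            printf("%f\n", cosine(0.0));   /* prints 1.000000 */

        dlclose(lib);
        return 0;
    }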

Implementation of an Absolute Loader


Absolute loaders are simple to implement but they do have disadvantages. First, the programmer must specify to the assembler the address in memory where the program is to be loaded. Further, if there are multiple functions to be called within a program, the programmer must remember the address of each and use that absolute address explicitly in his other functions to perform the linking of functions. Figure 3 illustrates the operation of an absolute loader. The programmer must be careful not to assign two subroutines or functions to the same or overlapping addresses.

Figure 3 : Absolute Loader

The program first.c is assigned to locations 100-300 and the sqrt function is assigned locations 400-450. If changes were made to first.c that increased its length to more than 300 bytes, the end of first.c (at 100+300 = 400) would overlap the start of sqrt (at 400). It would then be necessary to assign sqrt to a new address. Furthermore, it would also be necessary to modify all other functions that referred to sqrt. In situations where dozens of subroutines are being used, this manual shuffling can get very complex, tedious and wasteful of time and memory. The four loader functions are accomplished as follows in an absolute loading scheme: allocation and linking are performed by the programmer, relocation is performed by the assembler, and loading is accomplished by the loader.

COMPILER
The study of compiler design forms a central theme in the field of computer science. An understanding of the techniques used by high level language compilers can give the programmer a set of skills applicable in many aspects of software design; one does not have to be a compiler writer to make use of them. We have already seen the assembler, which translates an assembly language program into machine language; here we will look at another type of translator called a compiler. Compiler writing is not confined to one discipline only but rather spans several other disciplines: programming languages, computer architecture, theory of programming languages, algorithms, etc. Today a few basic compiler writing techniques can be used to construct translators for a wide variety of languages. This unit is intended as an introduction to the basic essential features of compiler design.

WHAT IS A COMPILER?
A compiler is software (a program) that reads a program written in a source language and translates it into an equivalent program in another language, the target language (see Figure 4). An important aspect of the compilation process is to produce diagnostics (error messages) for the source program. These error messages are mainly due to grammatical mistakes made by the programmer. A familiarity with the material covered in this unit will be a great help in understanding the inner functioning of a compiler.

Fig. 4 : A Compiler

There are thousands of source languages, ranging from C and PASCAL to specialized languages that have arisen in virtually every area of computer application. Target languages also number in the thousands. A target language may be another programming language, the machine language or an assembly language. Compilers are classified as single-pass, multi-pass, debugging or optimizing, depending on how they have been constructed or on what functions they are supposed to perform. Earlier (in the 1950's) compilers were considered very difficult programs to write. The first FORTRAN compiler, for example, took 18 staff-years to implement. But now several new techniques and tools have been developed for handling many of the important tasks that occur during the compilation process. Good implementation languages, programming environments (editors, debuggers, etc.) and software tools have also been developed. With these developments, the compiler writing exercise has become easier.

Approaches To Compiler Development


There are several approaches to compiler development. Here we will look at some of them.

Assembly Language Coding


Early compilers were mostly coded in assembly language. The main consideration was to increase efficiency. This approach worked very well for small High Level Languages (HLLs). As languages and their compilers became larger, lots of bugs started surfacing which were difficult to remove. The major difficulty with assembly language implementation was poor software maintenance. Around this time, it was realised that coding compilers in a high level language would overcome this disadvantage of poor maintenance. Many compilers were therefore coded in FORTRAN, the only widely available HLL at that time. For example, the FORTRAN H compiler for the IBM/360 was coded in FORTRAN. Later many system programming languages were developed to ensure the efficiency of compilers written in an HLL. Assembly language is still being used, but the trend is towards compiler implementation through an HLL.

Cross-Compiler
A cross-compiler is a compiler which runs on one machine and generates code for another machine. The only difference between a cross-compiler and a normal compiler is in terms of the code generated by it. For example, consider the problem of implementing a Pascal compiler on a new piece of hardware (a computer called X) on which assembly language is the only programming language already available. Under these circumstances, the obvious approach is to write the Pascal compiler in assembler. Hence, the compiler in this case is a program that takes Pascal source as input, produces machine code for the target machine as output, and is written in the assembly language of the target machine. The languages characterizing this compiler can be represented as:

Figure 5 shows that Pascal source is translated by a program written in X assembly language (the compiler), running on machine X, into X's object code. This code can then be run on the target machine. This notation is essentially equivalent to the T-diagram, which for this compiler is shown in Figure 5.

Fig. 5 : T-diagram

The language accepted as input by the compiler is stated on the left, the language output by the compiler is shown on the right, and the language in which the compiler is written is shown at the bottom. The advantage of this particular notation is that several T-diagrams can be meshed together to represent more complex compiler implementation methods. This compiler implementation involves a great deal of work, since a large assembly language program has to be written for X. It is to be noticed in this case that the compiler is very machine specific; that is, not only does it run on X but it also produces machine code suitable for running on X. Furthermore, only one computer is involved in the entire implementation process. The use of a high-level language for coding the compiler can offer great savings in implementation effort. If the language in which the compiler is being written is already available on the computer in use, then the process is simple. For example, Pascal might already be available on machine X, thus permitting the coding of, say, a Modula-2 compiler in Pascal. Such a compiler can be represented as:

If the language in which the compiler is being written is not available on the machine, then all is not lost, since it may be possible to make use of an implementation of that language on another machine. For example, a Modula-2 compiler could be implemented in Pascal on machine Y, producing object code for machine X:

The object code for X generated on machine Y would of course have to be transferred to X for its execution. This process of generating code on one machine for execution on another is called cross-compilation. At first sight, the introduction of a second computer to the compiler implementation plan seems to offer a somewhat inconvenient solution. Each time a compilation is required, it has to be done on machine Y and the object code transferred, perhaps via a slow or laborious mechanism, to machine X for execution. Furthermore, both computers have to be running and inter-linked somehow for this approach to work.

BOOTSTRAPPING
Bootstrapping is a concept of developing a compiler for a language by using subsets (small parts) of the same language. Suppose that a Modula-2 compiler is required for machine X, but that the compiler itself is to be coded in Modula-2. Coding a compiler in the language it is to compile is nothing special and, as will be seen, it has a great deal in its favour. Suppose further that Modula-2 is already available on machine Y. In this case, the compiler can be run on machine Y, producing object code for machine X. This is the same situation as before, except that the compiler is coded in Modula-2 rather than Pascal. The special feature of this approach appears in the next step. The compiler, running on Y, is nothing more than a large program written in Modula-2. Its function is to translate an input file of Modula-2 statements into a functionally equivalent sequence of statements in X's machine code. Therefore, the source statements of this Modula-2 compiler can be passed into itself running on Y to produce a file containing X's machine code. This file is of course a Modula-2 compiler, which is capable of being run on X. By making the compiler compile itself, a version of the compiler that runs on X has been created.

Once this machine code has been transferred to X, a self-sufficient Modula-2 compiler is available on X; hence there is no further use for machine Y for supporting Modula-2 compilation. This implementation plan is very attractive. Machine Y is only required for compiler development, and once this development has reached the stage at which the compiler can (correctly) compile itself, machine Y is no longer required. Consequently, the original compiler implemented on Y need not be of the highest quality; for example, optimization can be completely disregarded. Further development (and obviously conventional use) of the compiler can then continue at leisure on machine X. This approach to compiler implementation is called bootstrapping. Many languages, including C, Pascal, FORTRAN and LISP, have been implemented in this way. Pascal was first implemented by writing a compiler in Pascal itself. This was done through several bootstrapping processes. The compiler was then translated "by hand" into an available low level language.

Compiler Designing Phases


The compiler, being a complex program, is developed through several phases. Each phase transforms the source program from one representation to another. The tasks of a compiler can be divided very broadly into two sub-tasks:

    The analysis of the source program
    The synthesis of the object program

In a typical compiler, the analysis task consists of three phases:

    Lexical analysis
    Syntax analysis
    Semantic analysis

The synthesis task is usually considered as a code generation phase, but it can be divided into other distinct phases like intermediate code generation and code optimization. These four phases functioning in sequence are shown in Figure 6. Code optimization is beyond the scope of this unit. The nature of the interface between these four phases depends on the compiler. It is perfectly possible for the four phases to exist as four separate programs.

Fig. 6 Compiler Design Phases

Lexical Analysis
Lexical analysis is the first phase of a compiler. Lexical analysis, also called scanning, scans a source program from left to right, character by character, and groups the characters into tokens having a collective meaning. It performs two important tasks. First, it scans the source program character by character from left to right and groups the characters into tokens (or syntactic elements). Each token or basic syntactic element represents a logically cohesive sequence of characters, such as an identifier (also called a variable), a keyword (if, then, else, etc.) or a multi-character operator such as <=. The output of this phase goes to the next phase, i.e., syntax analysis or parsing. The interaction between the two phases is shown in Figure 7.

Fig. 7 : Interaction between the first two phases

The second task performed during lexical analysis is to make an entry of each token into a symbol table if it is not already there. Some other tasks performed during lexical analysis are:

    to remove all comments, tabs, blank spaces and machine characters;
    to produce error messages (also called diagnostics) for errors occurring in the source program.

Let us consider the following Pascal-like statement:

    For i = 1 To 50 do sum = sum + x[i];   (* sum of numbers stored in array x *)

After going through the statement, the lexical analyser transforms it into the sequence of tokens:

    For i := 1 To 50 do sum := sum + x[i];

Tokens are based on certain grammatical structures. Regular expressions are important notations for specifying these tokens. A regular expression consists of symbols (in the alphabet of the language that is being defined) and a set of operators that allow: concatenation (combination of strings), repetition, and alternation.

Examples of Regular Expressions

    ab       denotes the set containing the string ab
    a | b    denotes either a or b
    a*       denotes (empty, a, aa, aaa, ...)
    ab*      denotes (a, ab, abb, abbb, ...)
    [a-zA-Z][a-zA-Z0-9]*  gives a definition of a variable, meaning that a variable starts with an alphabetic character followed by alphabetic or digit characters.

Writing a lexical analyser completely from scratch is a fairly challenging task. Several tools have been built for constructing lexical analysers from special purpose notations based on regular expressions. Perhaps the most famous of these tools is Lex, one of the many utilities available with the Unix operating system. Lex requires that the syntax of each lexical token be defined in terms of a regular expression. Associated with each regular expression is a fragment of code that defines the action to be taken when that expression is recognized.
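As a hand-written counterpart to what Lex generates, a scanner fragment for the identifier pattern [a-zA-Z][a-zA-Z0-9]* might look like this in C (a sketch only; real scanners also handle keywords, line numbers and so on):

    #include <ctype.h>

    /* Recognize an identifier -- the pattern [a-zA-Z][a-zA-Z0-9]* --
       starting at position pos of src; copy it into lexeme and
       return its length (0 means no identifier starts here). */
    int scan_identifier(const char *src, int pos, char *lexeme)
    {
        int len = 0;
        if (isalpha((unsigned char)src[pos]))
            while (isalnum((unsigned char)src[pos]))
                lexeme[len++] = src[pos++];
        lexeme[len] = '\0';
        return len;
    }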

The Symbol Table


An essential function of a compiler is to record the identifiers used in the source program, along with information about their attributes: type (numeric or character), scope (where in the program the identifier is valid) and, in the case of procedure or function names, such things as the number and types of the arguments, the mechanism of passing each argument and the type of result returned. A symbol table is a set of locations containing a record for each identifier, with fields for the attributes of the identifier. A symbol table allows us to find the record for each identifier (variable) and to store or retrieve data from that record quickly.

For example, take a declaration written in C such as int x, y, z; The lexical analyser, after going through this declaration, will enter x, y and z into the symbol table. This is shown in the figure given below.

Fig. 8 : Symbol Table

The first column of this table contains the entries for the variables and the second contains the addresses of the memory locations where the values of these variables will be stored. The remaining phases enter further information about identifiers into the symbol table and then use this information in various ways.
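One possible in-memory representation, sketched in C (the fields and the fixed table size are illustrative assumptions; real compilers record many more attributes and usually hash):

    #include <string.h>

    /* One symbol-table record per identifier */
    struct symbol {
        char name[32];   /* identifier, e.g. "x"     */
        int  type;       /* e.g. 0 = int, 1 = real   */
        int  address;    /* assigned memory location */
    };

    static struct symbol symtab[256];   /* fixed size for the sketch */
    static int sym_count = 0;

    /* Linear search; production compilers use hashing instead. */
    int lookup(const char *name)
    {
        for (int i = 0; i < sym_count; i++)
            if (strcmp(symtab[i].name, name) == 0)
                return i;
        return -1;   /* not found */
    }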

SYNTAX ANALYSIS
Every language, whether it is a programming language or a natural language, follows certain grammatical rules that define the syntactic structures of the language. In the C language, for example, a program is made out of a main function consisting of blocks, a block out of statements, a statement out of expressions, an expression out of tokens and so on. The syntax of programming language constructs can be described by Backus-Naur Form (BNF) notations. These types of notations are also called context-free grammars. Well formed grammars offer significant advantages to the compiler designer:

    A grammar gives a precise, yet easy to understand, syntactic specification of a programming language.
    Tools for designing a parser, which determines if a source program is syntactically correct, can be developed from certain classes of grammars.
    A well designed grammar imparts a structure to a programming language that is useful for the translation of the source program into correct object code.

Syntax analysis is the second phase of the compilation process. This process is also called parsing. It performs the following operations:

    Obtains a group of tokens from the lexical analyser.
    Determines whether a string of tokens can be generated by a grammar of the language, i.e. it checks whether the expression is syntactically correct or not.
    Reports syntax errors, if any.

The output of parsing is a representation of the syntactic structure of a statement in the form of a parse tree (syntax tree). The process of parsing is shown in Figure 9.

Figure 9 : Process of Parsing

For example, the statement X = Y+Z could be represented by the syntax tree shown in Figure 10.

Fig. 10 : Parse tree

The parse tree of the statement in this form means that first Y and Z will be added, and then the result will be assigned to X.
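A parse tree such as this is commonly represented with a linked node structure; one possible C declaration (the field names are our own, for illustration):

    /* A node of the tree for statements such as X = Y + Z */
    struct node {
        char op;              /* '=', '+', ... ; '\0' for a leaf   */
        char name[32];        /* identifier name if this is a leaf */
        struct node *left;    /* left subtree, e.g. X              */
        struct node *right;   /* right subtree, e.g. Y + Z         */
    };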

CONTEXT FREE GRAMMARS


Each programming language has got its own syntax (grammar). In this section we will discuss context-free grammars for specifying the syntax of a language. A grammar naturally describes the hierarchical structure of many programming language constructs. For example, an if-else statement in C has the form:

    if (expression) statement else statement

Suppose we take the variables expr and stmt to denote expressions and statements respectively; then the if-else statement can be written as

    stmt --> if (expr) stmt else stmt

Such a rule is called a production. In a production, lexical elements like the keywords if and else and the parentheses are called tokens (also called terminal symbols). Variables like expr and stmt represent sequences of tokens and are called non-terminals. A context-free grammar has four components:

    A set of terminal symbols, like the keywords of a programming language.
    A set of non-terminal symbols.

    A set of productions (rules), where each production consists of a non-terminal (called the left side of the production), an arrow, and a sequence of tokens and/or non-terminals (called the right side of the production).
    A designation of one of the non-terminals as the start symbol.

Example: the use of expressions is common in a programming language. An expression consists of digits and arithmetical operators +, -, *, etc., e.g. 3-2+1, 4+3, 1. Since arithmetical operators must appear between two digits, we define an expression as a list of digits separated by arithmetical operators. The following grammar describes the syntax of arithmetical expressions. The productions are:

    list  --> list + digit               (1)
    list  --> list - digit               (2)
    list  --> list * digit               (3)
    list  --> digit                      (4)
    digit --> 0|1|2|3|4|5|6|7|8|9        (5)

The first three productions, with the non-terminal symbol list on the left side, can also be written as list --> list + digit | list - digit | list * digit. In these productions + - * and 0 1 2 3 4 5 6 7 8 9 are all tokens of the grammar; list and digit are non-terminal symbols, and list is the start symbol of the grammar.

Fig. 11 : Parse tree for the expression 5 * 8 - 6 + 2

This grammar is able to generate any arithmetic expression of this type. A grammar derives strings (expressions) by beginning with the start symbol and repeatedly replacing a non-terminal symbol by the right side of a production for that non-terminal. In this example, list will be replaced with another list, which will be replaced further by some other list or a digit.

Example: Suppose we have the expression 5*8-6+2. Let us verify whether this expression can be derived from this grammar and construct a parse tree (Figure 11).

    5 is a list by production (4), since 5 is a digit by production (5).
    5*8 is a list by production (3).
    5*8-6 is a list by production (2).
    5*8-6+2 is a list by production (1).

A parse tree graphically displays how the start symbol of the grammar derives the expression 5*8-6+2. A parse tree is a tree with the following properties:

    The root is labelled by the start symbol (list).
    Each leaf is labelled by a token or terminal symbol.

    Each interior node is labelled by a non-terminal (list, digit).

The syntax of a language defines the set of valid programs, but a programmer or compiler writer must have more information before the language can be used or a compiler developed. The semantic rules provide this information and specify the meaning or actions of all valid programs allowed by the syntax rules.
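To see how a parser can check strings against this grammar, here is a minimal recursive-descent recognizer in C. It assumes the left-recursive list productions are rewritten as iteration, list --> digit { (+|-|*) digit }, which accepts the same strings (a sketch, not a full parser):

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    static const char *p;    /* current position in the input */

    static void digit(void)
    {
        if (!isdigit((unsigned char)*p)) {
            fprintf(stderr, "syntax error at '%c'\n", *p);
            exit(1);
        }
        p++;                 /* consume the digit token */
    }

    static void list(void)
    {
        digit();
        while (*p == '+' || *p == '-' || *p == '*') {
            p++;             /* consume the operator token */
            digit();
        }
    }

    int main(void)
    {
        p = "5*8-6+2";
        list();
        if (*p == '\0')
            printf("valid expression\n");
        return 0;
    }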

Semantic Analysis
The role of the semantic analyser is to derive methods by which the structures constructed by the syntax analyser may be evaluated or analysed. The semantic analysis phase checks the source program for semantic errors and gathers data type information for the subsequent code-generation phase. An important component of semantic analysis is type checking. Here the compiler checks that each operator has operands that are permitted by the source language specification. For example, the definitions of many programming languages require a compiler to report an error every time a real number is used to index an array, as in a[5.6]; here 5.6 is a real value, not an integer. To illustrate some of the actions of a semantic analyser, consider the expression a+b-c*d in a language such as Pascal, where a, b and c have data type integer and d has type real. The syntax analyser produces a parse tree of the form shown in Figure 12(a). One of the tasks of the semantic analyser is to perform type checking within this expression. By consulting the symbol table, the data types of all the variables can be inserted into the tree, as shown in Figure 12(b); the analyser then performs semantic type conversion and labels each node accordingly.

Fig. 12 : Semantic analysis of an arithmetic expression

The semantic analyser can determine the types of the intermediate results and thus propagate the type attributes through the tree, checking for compatibility as it goes. In our example, the semantic analyser first considers the result of c and d. According to the Pascal semantic rule integer * real --> real, the * node can be labelled as real. This is shown in Figure 12(c). Compilers vary widely in the role taken by the semantic analyser. In some simpler compilers there is no easily identifiable semantic analysis phase; the syntax analyser itself does semantic analysis and intermediate code generation directly. In other compilers, syntax analysis, semantic analysis and code generation are separate phases. In the next section we will discuss the code generation phase.
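The mixed-mode propagation rule used above can be stated in a few lines of C (a sketch assuming, for simplicity, a language with only two numeric types):

    enum type { T_INT, T_REAL };

    /* Pascal-like rule: integer op real --> real */
    enum type result_type(enum type a, enum type b)
    {
        return (a == T_REAL || b == T_REAL) ? T_REAL : T_INT;
    }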

Code Generation and Optimization


The final phase of the compiler is the code generator. The code generator takes as input an intermediate representation (in the form of a parse tree) of the source program and produces as output an equivalent target program (Figure 13).

Figure 13 : Code generation phase

The target program may take on a variety of forms: absolute machine language, relocatable machine language or assembly language. Producing an absolute machine language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed. Producing a relocatable machine language program (object module) as output allows sub-programs to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader. The process of linking and loading in producing relocatable object code might be a little time consuming, but it provides flexibility in being able to compile subroutines separately and to call other previously compiled programs from the object module. If the target machine does not handle relocation automatically, the compiler must provide relocation information to the loader to link the separately compiled program segments. Producing an assembly-language program as output makes the process of code generation somewhat simpler: we can generate symbolic instructions and use the macro facilities of the assembler to help generate code. Some issues in the design of code generation:

    A thorough knowledge of the target machine's architecture as well as its instruction set is required to write a good code generator.
    The code generator is concerned with the choice of machine instructions, allocation of machine registers, addressing and interfacing with the operating system. The concepts of registers and addressing schemes have been discussed in Block 1 of Course 1.
    To produce faster and more compact code, the code generator should include some form of code optimization. This may exploit techniques such as the use of special purpose machine instructions or addressing modes, register optimization etc. This code optimization may incorporate both machine-dependent and machine-independent techniques.

SOFTWARE TOOLS
Writing a compiler is not a simple project, and anything that makes the task simpler is worth exploring. At a very early stage in the history of compiler development it was recognized that some aspects of compiler design could be automated. Consequently a great deal of effort has been directed towards the development of software tools to aid the production of a compiler. The two best known software tools for compiler construction are Lex (a lexical analyser generator) and Yacc (a parser generator), both of which are available under the UNIX operating system. Their continuing popularity is partly due to their widespread availability, but also because they are powerful and easy to use, with a wide range of applicability. This section describes these two software tools.

OPERATING SYSTEM

INTRODUCTION TO OPERATING SYSTEM


An Operating System is system software which may be viewed as an organized collection of software consisting of procedures for operating a computer and providing an environment for execution of programs. It acts as an interface between users and the hardware of a computer system.

There are many important reasons for studying operating systems. Some of them are:

    The user interacts with the computer through the operating system in order to accomplish his task, since it is his primary interface with the computer.
    It helps users to understand the inner functions of a computer very closely.
    Many concepts and techniques found in operating systems have general applicability in other applications.

In the previous block we mainly covered different types of software: compilers, assemblers, software utilities like Lex and Yacc, and GUIs (MS-WINDOWS, X-WINDOW etc.). All this software has been developed under a particular operating system environment. The introductory concepts and principles of an operating system will be the main issues for discussion in this unit. The evolution of operating systems, and the types and models of operating systems, will also be broadly covered here.

WHAT IS AN OPERATING SYSTEM ?


An operating system is an essential component of a computer system. The primary objectives of an operating system are to make the computer system convenient to use and to utilize the computer hardware in an efficient manner. An operating system is a large collection of software which manages the resources of the computer system, such as memory, processor, file system and input/output devices. It keeps track of the status of each resource and decides who will have control over computer resources, for how long and when. The positioning of the operating system in the overall computer system is shown in Figure 1.

Figure 1 - Components of a computer system

From the diagram, it is clear that the operating system directly controls computer hardware resources. Other programs rely on facilities provided by the operating system to gain access to computer system resources. There are two ways one can interact with the operating system:

    By means of Operating System Calls in a program
    Directly by means of Operating System Commands

System Call
System calls provide the interface between a running program and the operating system. A user program receives operating system services through the set of system calls. Earlier these calls were available as assembly language instructions, but nowadays these features are supported through high-level languages like C, Pascal etc., which have replaced assembly language for system programming. The use of system calls in C or Pascal programs very much resembles pre-defined function or subroutine calls.

As an example of how system calls are used, let us consider a simple program to copy data from one file to another. In an interactive system, the following system calls will be generated by the operating system:

    Prompt messages for inputting two file names and reading them from the terminal.
    Open the source and destination files.
    Prompt error messages in case the source file cannot be opened because it is protected against access, or the destination file cannot be created because there is already a file with that name.
    Read the source file.
    Write into the destination file.
    Display status information regarding various Read/Write error conditions. For example, the program may find that the end of the file has been reached, or that there was a hardware failure. The write operation may encounter various errors, depending upon the output device (no more disk space, physical end of tape, printer out of paper and so on).
    Close both files after the entire file is copied.

As we can observe, a user program makes heavy use of the operating system. All interaction between the program and its environment must occur as the result of requests from the program to the operating system. On a UNIX-like system, these steps map onto system calls such as open(), read(), write() and close(), as the sketch below shows.
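A compressed sketch in C of the heart of such a copy program (error handling is abbreviated, and the file names are assumed to come from the prompts described above):

    #include <fcntl.h>
    #include <unistd.h>

    /* Copy src to dst using UNIX system calls. */
    int copy_file(const char *src, const char *dst)
    {
        char    buf[4096];
        ssize_t n;
        int in  = open(src, O_RDONLY);
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (in < 0 || out < 0)
            return -1;                   /* open failed */
        while ((n = read(in, buf, sizeof buf)) > 0)
            write(out, buf, (size_t)n);  /* may fail: disk full etc. */
        close(in);
        close(out);
        return 0;
    }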

Operating System Commands


Apart from system calls, users may interact with the operating system directly by means of operating system commands. For example, if you want to list the files or sub-directories in MS-DOS, you invoke the dir command. In either case, the operating system acts as an interface between users and the hardware of the computer system. The fundamental goal of computer systems is to solve user problems, and computer hardware is designed towards this goal. Since the bare hardware alone is not very easy to use, programs (software) are developed. These programs require certain common operations, such as controlling peripheral devices. The common functions of controlling and allocating resources are then brought together into one piece of software: the operating system. To see what operating systems are and what operating systems do, let us consider how they have evolved over the years. By tracing that evolution, we can identify the common elements of operating systems and examine how and why they have developed as they have.

EVOLUTION OF OPERATING SYSTEMS


An operating system may process its tasks serially (sequentially) or concurrently (several tasks simultaneously). It means that the resources of the computer system may be dedicated to a single program until its completion, or they may be allocated among several programs in different stages of execution. The feature of an operating system that allows multiple programs to execute in an interleaved fashion or in different time cycles is called multiprogramming. In this section, we will try to trace the evolution of the operating system. In particular, we will describe serial processing, batch processing and multiprogramming.

SERIAL PROCESSING
Programming in 1's and 0's (machine language) was quite common for early computer systems. Instructions and data used to be fed into the computer by means of console switches, or perhaps through a hexadecimal keyboard. Programs used to be started by loading the program counter register with the address of the first instruction of a program, and results used to be examined from the contents of various registers and memory locations of the machine. Therefore, programming in this style caused low utilisation of both users and machine. The advent of Input/Output devices, such as punched cards and paper tape, and of language translators (compilers/assemblers) brought a significant step up in computer system utilization. Programs started being coded in programming languages: a program is first changed into object code (binary code) by the translator and then automatically loaded into memory by a program called a loader.

After transferring control to the loaded program, the execution of the program begins and its results get displayed or printed. Once in memory, the program may be re-run with a different set of input data. The process of development and preparation of a program in such an environment is slow and cumbersome due to serial processing and numerous manual operations. In a typical sequence, first the editor is called to create the source code of the user program written in a programming language, then the translator is called to convert the source code into binary code, and finally the loader is called to load the executable program into main memory for execution. If syntax errors are detected, the whole process must be restarted from the beginning. The next development was the replacement of card decks with standard input/output devices and some useful library programs, which were further linked with the user program through a piece of system software called a linker. While there was a definite improvement over the machine language approach, the serial mode of operation was obviously not very efficient. This results in low utilization of resources.

BATCH PROCESSING
Utilisation of computer resources and improvement in programmer's productivity was still a major problem. During the time that tapes were being mounted or the programmer was operating the console, the CPU sat idle. The next logical step in the evolution of operating systems was to automate the sequencing of operations involved in program execution and in the mechanical aspects of program development. Jobs with similar requirements were batched together and run through the computer as a group. For example, suppose the operator received one FORTRAN program, one COBOL program and another FORTRAN program. If he runs them in that order, he would have to set up the FORTRAN program environment (loading the FORTRAN compiler tapes), then set up the COBOL environment, and finally the FORTRAN environment again. If he runs the two FORTRAN programs as a batch, however, he need set up only once for FORTRAN, thus saving operator's time. Batching similar jobs improved utilisation of system resources quite a bit. But there were still problems. For example, when a job stopped, the operator would have to notice that fact by observing the console, determine why the program stopped, then load the card reader or paper tape reader with the next job and restart the computer. During this transition from one job to the next, the CPU sat idle. To overcome this idle time, a small program called a resident monitor was created, which is always resident in memory. It automatically sequences one job to another. The resident monitor acts according to directives given by the programmer through control cards, which contain information like markings of a job's beginning and ending, commands for loading and executing programs, etc. These commands belong to a job control language, and they are included with the user program and data. Here is an example of job control language commands:

    $COB     Execute the COBOL compiler
    $JOB     First card of a job
    $END     Last card of a job
    $LOAD    Load program into memory
    $RUN     Execute the user program

Figure 2 shows a sample card deck set up for a simple batch system.

Figure 2 : Card deck of a COBOL program for a simple batch system

With the sequencing of program execution mostly automated by batch operating systems, the speed discrepancy between the fast CPU and comparatively slow input/output devices, such as card readers and printers, emerged as a major performance bottleneck. Even a slow CPU works in the microsecond range, executing millions of instructions per second. A fast card reader, on the other hand, might read only 1200 cards per minute. Thus, the difference in speed between the CPU and its input/output devices may be three orders of magnitude or more. The relative slowness of input/output devices can mean that the CPU is often waiting for input/output. As an example, an assembler or compiler may be able to process 300 or more cards per second, while a fast card reader may be able to read only 1200 cards per minute. This means that assembling or compiling a 1200 card program would require only 4 seconds of CPU time but 60 seconds to read. Thus, the CPU is idle for 56 out of 60 seconds, or 93.3 per cent of the time; the resulting CPU utilisation is only 6.7 per cent. The situation is similar for output operations. The problem is that while an input/output operation is occurring, the CPU is idle, waiting for the input/output to complete; while the CPU is executing, input/output devices are idle. Over the years, of course, improvements in technology resulted in faster input/output devices. But CPU speed increased even faster. Therefore, the need was to increase throughput and resource utilisation by overlapping input/output and processing operations. Channels, peripheral controllers and later dedicated input/output processors brought a major improvement in this direction. DMA (Direct Memory Access), a chip which directly transfers an entire block of data between its own buffer and main memory without intervention by the CPU, was a major development. While the CPU is executing, DMA can transfer data between high speed input/output devices and main memory; the CPU needs to be interrupted only once per block. Apart from DMA, there are two other approaches to improving system performance by overlapping input, output and processing. These are buffering and spooling. Buffering is a method of overlapping the input, output and processing of a single job. The idea is quite simple. After data has been read and the CPU is about to start operating on it, the input device is instructed to begin the next input immediately. The CPU and input device are then both busy. With luck, by the time the CPU is ready for the next data item, the input device will have finished reading it. The CPU can then begin processing the newly read data, while the input device starts to read the following data. Similarly, this can be done for output. In this case, the CPU creates data that is put into a buffer until an output device can accept it.

If the CPU is, on the average, much faster than an input device, buffering will be of little use. If the CPU is always faster, then it always finds an empty buffer and has to wait for the input device. For output, the CPU can proceed at full speed until, eventually, all system buffers are full. Then the CPU must wait for the output device. This situation occurs with input/output bound jobs, where the amount of input/output relative to computation is very high. Since the CPU is faster than the input/output device, the speed of execution is controlled by the input/output device, not by the speed of the CPU. A more sophisticated form of input/output buffering, called SPOOLING (simultaneous peripheral operation on-line), essentially uses the disk as a very large buffer (Figure 3) for reading input and for storing output files.

Figure 3 : Spooling

Buffering overlaps the input, output and processing of a single job, whereas spooling allows the CPU to overlap the input of one job with the computation and output of other jobs. Therefore this approach is better than buffering. Even in a simple system, the spooler may be reading the input of one job while printing the output of a different job.

MULTIPROGRAMMING
Buffering and spooling improve system performance by overlapping the input, output and computation of a single job, but both of them have their limitations. A single user cannot always keep the CPU or I/O devices busy at all times. Multiprogramming offers a more efficient approach to increasing system performance. In order to increase resource utilisation, systems supporting the multiprogramming approach allow more than one job (program) to utilize CPU time at any moment. The more programs competing for system resources, the better the resource utilisation will be. The idea is implemented as follows. The main memory of the system contains more than one program (Figure 4).

Figure 4 : Memory layout in a multiprogramming environment

The operating system picks one of the programs and starts executing it. During execution, program 1 may need some I/O operation to complete. In a sequential execution environment (Figure 5a), the CPU would sit idle. In a multiprogramming system (Figure 5b), the operating system will simply switch over to the next program (program 2).

Figure 5 : Multiprogramming

When that program needs to wait for some I/O operation, the CPU switches over to program 3, and so on. If there is no other new program left in main memory, the CPU will pass control back to the previous programs. Multiprogramming has traditionally been employed to increase the resource utilisation of a computer system and to support multiple simultaneously interactive users (terminals). Compared to an operating system which supports only sequential execution, a multiprogramming system requires some form of CPU and memory management strategies, which will be discussed in the next section.

OPERATING SYSTEM STRUCTURE


Since the operating system is a very large and complex software system supporting a large number of functions, it should be developed as a collection of several smaller modules with carefully defined inputs, outputs and functions, rather than as a single piece of software. In this section, we will examine different operating system structures.

Layered Structure Approach


The operating system architecture based on the layered approach consists of a number of layers (levels), each built on top of lower layers. The bottom layer is the hardware; the highest layer is the user interface. The first system constructed in this way was the THE system, built by E.W. Dijkstra (1968) and his students. The THE system was a simple batch operating system for a machine which had 32K of 27-bit words. The system had 6 layers (Figure 6):

    5   User programs
    4   Buffering for I/O devices
    3   Device drivers
    2   Memory manager
    1   CPU scheduling
    0   Hardware

Figure 6: The layered structure of the THE operating system

As shown in figure 6, layer 0 dealt with the hardware; layer 1 handled the allocation of jobs to the processor. The next layer implemented memory management; the memory management scheme was virtual memory (to be discussed in Unit 3). Level 3 contained the device driver for the operator's console. By placing it, as well as the I/O buffering at level 4, above memory management, the device buffers could be placed in virtual memory. The I/O buffering was also above the operator's console, so that I/O error conditions could be output to the operator's console.

The main advantage of the layered approach is modularity, which helps in debugging and verifying the system easily. The layers are designed in such a way that each layer uses the operations and services only of the layer below it. A higher layer need not know how these operations are implemented, only what these operations do. Hence each layer hides implementation details from higher-level layers, and any layer can be debugged without any concern about the rest of the layers. The major difficulty with the layered approach is the definition of the layers, i.e. how to differentiate one level from another. Since a layer can use the services only of the layers below it, the layering must be designed carefully. For example, the device driver for secondary memory must be at a lower level than the memory management routines, since memory management requires the ability to use the backing store.

KERNEL APPROACH
The kernel is that part of the operating system which directly interfaces with the hardware. Its main functions are: (i) to provide a mechanism for the creation and deletion of processes; (ii) to provide processor scheduling, memory management and I/O management; (iii) to provide a mechanism for synchronisation of processes, so that processes can coordinate their actions; and (iv) to provide a mechanism for interprocess communication. The UNIX operating system is based on the kernel approach (figure 7). It consists of two separable parts: (i) the kernel and (ii) the system programs. As shown in figure 7, the kernel sits between the system programs and the hardware. The kernel supports the file system, processor scheduling, memory management and other operating system functions through system calls. The UNIX operating system supports a large number of system calls for process management and other operating system functions. Through these system calls, a program utilises the services of the operating system (kernel), as the small example after figure 7 illustrates.

Figure 7: UNIX operating system structure
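As a concrete illustration of a program obtaining kernel services through system calls, the short example below uses the standard UNIX calls open, read, write and close to copy a file to the terminal. The file name is arbitrary and error handling is abbreviated; this is a sketch, not production code.

    /* Copying a file to standard output using UNIX system calls.
       Each of open, read, write and close is a request to the kernel. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        ssize_t n;
        int fd = open("data.txt", O_RDONLY);     /* ask the kernel to open a file */
        if (fd < 0)
            return 1;                            /* kernel refused the request */
        while ((n = read(fd, buf, sizeof buf)) > 0)  /* kernel reads from disk */
            write(1, buf, (size_t)n);                /* kernel writes to the terminal */
        close(fd);                               /* release the kernel resource */
        return 0;
    }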

VIRTUAL MACHINE
It is a concept which creates the illusion of a real machine. It is created by a virtual machine operating system that makes a single real machine appear to be several real machines. The situation is analogous to the communication lines of a telephone company, which enable separate and isolated conversations over the same wire(s). The following figure illustrates this concept.

Figure 8: Creation of several virtual machines by a single physical machine

From the user's viewpoint, a virtual machine can be made to appear very similar to an existing real machine, or it can be entirely different. An important aspect of this technique is that each user can run the operating system of his own choice. This fact is depicted by OS1 (Operating System 1), OS2, OS3 etc. in figure 8. To understand this concept, let us try to understand the difference between a conventional multiprogramming system (figure 9) and virtual machine multiprogramming (figure 10). In conventional multiprogramming, processes are allocated a portion of the real machine's resources: the same machine's resources are distributed among several processes.

Figure 9: Conventional multiprogramming

Figure 10: Virtual machine multiprogramming

In a virtual machine multiprogramming system, a single real machine gives the illusion of several virtual machines, each having its own virtual processor, storage and I/O devices, possibly with much larger capacities. This is possible through process scheduling and the virtual memory organisation technique. Process scheduling can be used to share the CPU and make it appear that each user has his own processor. The virtual memory organisation technique can create the illusion of a very large memory for program execution. The virtual machine has many uses and advantages:

Concurrent running of dissimilar operating systems by different users.
Elimination of certain conversion problems.
Software development: programs can be developed and debugged for machine configurations that are different from those of the host (for example, the virtual machine operating system VM/370 can produce virtual 370s that differ from the real 370, such as having a larger main memory).
Security and privacy: the high degree of separation between independent virtual machines aids in ensuring privacy and security.

The most widely used operating system in this category is VM/370. It manages an IBM/370 computer and creates the illusion that each of several users has a complete System/370 (including a wide range of I/O devices); different operating systems can run at once, each on its own virtual machine. It is also possible, through software, to share files existing on a physical disk and to share information through virtual communication software. The virtual machines are created by sharing the resources of the physical computer. CPU scheduling can be used to share the CPU and make it appear that users have their own processors. Users are thus given their own virtual machines, on which they can run any software desired. The virtual machine software is concerned with multiprogramming multiple virtual machines onto a physical machine, and need not consider any other support software for the user. The heart of the system, known as the virtual machine monitor, runs on the bare hardware and does the multiprogramming, providing not one, but several virtual machines to the next layer up, as shown in figure 8. However, unlike all other operating systems, these virtual machines are not extended machines, with files and other nice features. Instead, they are exact copies of the bare hardware, including kernel/user mode, I/O, interrupts, and everything else the real machine has. Because each virtual machine is identical to the true hardware, each one can run any operating system that will run directly on the hardware.

In fact, different virtual machines can, and usually do, run different operating systems. Some run one of the descendants of OS/360 for batch processing, while others run a simple, single-user, interactive system called CMS (Conversational Monitor System) for time-sharing users. When a CMS program executes a system call, the call is trapped to the operating system in its own virtual machine, not to VM/370, just as it would be if it were running on a real machine instead of a virtual one. CMS then issues the normal hardware I/O instructions for reading its virtual disk, or whatever is needed to carry out the call. These I/O instructions are trapped by VM/370, which then performs them as part of its simulation of the real hardware; a toy sketch of this trap-and-emulate idea is given below. By completely separating the functions of multiprogramming and of providing an extended machine, each of the pieces can be much simpler and more flexible.

The virtual machine concept has several advantages. Notice that there is complete protection: each virtual machine is completely isolated from all other virtual machines, so there is no problem with protection. On the other hand, there is no sharing. To provide sharing, two approaches have been implemented. First, it is possible to share a minidisk. This scheme is modelled after a physical shared disk, but implemented in software; with this technique, files can be shared. Second, it is possible to define a network of virtual machines, each of which can send information over the virtual communications network. Again, the network is modelled after physical communication networks, but implemented in software.

Such a virtual machine system is a perfect vehicle for operating systems research and development. Normally, changing an operating system is a difficult process. Since operating systems are large and complex programs, it is difficult to be sure that a change at one point does not cause obscure bugs in some other part. This situation can be particularly dangerous because of the power of the operating system: since the operating system executes in monitor mode, a wrong change in a pointer could cause an error that would destroy the entire file system. Thus it is necessary to test all changes to the operating system carefully. But the operating system runs on and controls the entire machine; therefore, the current system must be stopped and taken out of use while changes are made and tested. This is commonly called system development time. Since it makes the system unavailable to users, system development time is often scheduled late at night or on weekends. A virtual machine system can eliminate much of this problem: system programmers are given their own virtual machines, and system development is done on a virtual machine instead of on the physical machine. Normal system operation seldom needs to be disrupted for system development.
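The trap-and-emulate idea can be sketched in miniature as follows. The instruction set, its encoding and the monitor interface are entirely invented for this sketch; a real virtual machine monitor relies on hardware trapping of privileged instructions rather than a software dispatch loop.

    /* A toy sketch of trap-and-emulate: the monitor runs guest
       "instructions"; a privileged I/O instruction traps to the
       monitor, which simulates it on the guest's behalf. */
    #include <stdio.h>

    enum op { OP_ADD, OP_IO, OP_HALT };    /* invented instruction set */

    struct instr { enum op op; int arg; };

    /* The monitor's simulation of a privileged I/O instruction. */
    static void monitor_emulate_io(int device)
    {
        printf("VM monitor: simulating I/O on device %d\n", device);
    }

    static void run_guest(const struct instr *prog)
    {
        int acc = 0;
        for (int pc = 0; prog[pc].op != OP_HALT; pc++) {
            switch (prog[pc].op) {
            case OP_ADD:                 /* harmless: executes directly */
                acc += prog[pc].arg;
                break;
            case OP_IO:                  /* privileged: traps to the monitor */
                monitor_emulate_io(prog[pc].arg);
                break;
            default:
                break;
            }
        }
        printf("guest halted, acc = %d\n", acc);
    }

    int main(void)
    {
        struct instr guest[] = {
            { OP_ADD, 5 }, { OP_IO, 1 }, { OP_ADD, 2 }, { OP_HALT, 0 }
        };
        run_guest(guest);
        return 0;
    }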

CLIENT-SERVER MODEL
VM/370 gains much in simplicity by moving a large part of the traditional operating system code (implementing the extended machine) into a higher layer. VM/370 is still a complex program, because simulating a number of virtual 370s is not that simple (especially if you want to do it efficiently). A trend in modern operating systems is to take this idea of moving code up into higher layers even further, and remove as much as possible from the operating system, leaving a minimal kernel. The usual approach is to implement most of the operating system functions in user processes. To request a service, such as reading a block of a file, a user process (now known as the client process) sends the request to a server process, which then does the work and sends back the answer. In this model, shown in figure 11, all the kernel does is handle the communication between clients and servers. The operating system is split up into parts, each of which handles only one facet of the system, such as file service, process service, terminal service or memory service; in this way, each part becomes small and manageable. Furthermore, because all the servers run as user-mode processes, and not in kernel mode, they do not have direct access to the hardware. As a consequence, if a bug in the file server is triggered, the file service may crash, but this will not usually bring the whole machine down.

Figure 11: Client-server model

Another advantage of the client-server model is its adaptability to use in distributed systems (figure 12). If a client communicates with a server by sending it messages, the client need not know whether the message is handled locally on its own machine, or whether it was sent across a network to a server on a remote machine. As far as the client is concerned, the same thing happens in both cases: a request was sent and a reply came back.

Figure 12: The client-server model in a distributed system

The picture painted above, of a kernel that handles only the transport of messages from clients to servers and back, is not completely realistic. Some operating system functions (such as loading commands into the physical I/O device registers) are difficult, if not impossible, to do from user-space programs. There are two ways of dealing with this problem. One way is to have some critical server processes (e.g. I/O device drivers) actually run in kernel mode, with complete access to all the hardware, but still communicate with other processes using the normal message mechanism. The other way is to build a minimal amount of mechanism into the kernel, but leave the policy decisions up to servers in user space. For example, the kernel might recognise that a message sent to a certain special address means: take the contents of that message and load it into the I/O device registers for some disk, to start a disk read. In this example, the kernel would not even inspect the bytes in the message to see whether they were valid or meaningful; it would just blindly copy them into the disk's device registers. (Obviously, some scheme for limiting such messages to authorised processes only must be used.) The split between mechanism and policy is an important concept; it occurs again and again in operating systems in various contexts. A minimal sketch of the message flow in this model follows.
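The following minimal sketch shows the shape of the message flow: a client builds a request, the "kernel" merely routes it to the right server, and the reply travels back. The message format, the server numbering and the function names are assumptions made for illustration; in a real microkernel the client and servers would be separate processes.

    /* A minimal sketch of client-server message passing. All the
       "kernel" does here is hand a request to the right server and
       return the reply. */
    #include <stdio.h>

    struct message {
        int dest;              /* which server the message is for */
        char body[64];
    };

    #define FILE_SERVER 1      /* invented server number */

    /* The file server: handles one request and fills in a reply. */
    static void file_server(const struct message *req, struct message *rep)
    {
        rep->dest = 0;         /* route the reply back to the client */
        snprintf(rep->body, sizeof rep->body, "contents of %s", req->body);
    }

    /* The kernel only transports messages between clients and servers. */
    static void kernel_send(const struct message *req, struct message *rep)
    {
        if (req->dest == FILE_SERVER)
            file_server(req, rep);
    }

    int main(void)
    {
        struct message req = { FILE_SERVER, "readme.txt" }, rep;
        kernel_send(&req, &rep);               /* client sends a request ... */
        printf("client got: %s\n", rep.body);  /* ... and gets a reply back */
        return 0;
    }

Note that the client would behave identically whether file_server ran on the same machine or across a network, which is the adaptability to distributed systems described above.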

FUTURE OPERATING SYSTEM TRENDS


A number of clear trends are emerging which point the way future operating systems will be designed. These are:

Multiprocessing will be much more common, because of the development of VLSI chips and the decline in the cost of hardware.
Microcode will have integrated into it many operating system functions currently performed by software; therefore, execution will be faster.
The trend is towards distributed control among localised processors in place of centralised systems; therefore, the use of distributed operating systems will increase.
Concurrency is becoming an important feature of programming languages, and hardware and operating systems are being designed to execute concurrent programs more efficiently.
Developments in software engineering will result in operating systems that are more easily maintainable, more reliable and simpler.
A lot of development is taking place in computer networking, and data transmission rates are increasing; therefore, the use of network operating systems will also increase.
Virtually all operating systems (especially desktop operating systems) will support multimedia (video and graphics) applications in the near future.

SUMMARY
An operating system is an essential component of system software, consisting of procedures for managing computer resources. Initially, computers were operated from the front console. System software such as assemblers, loaders and compilers greatly improved software development, but also required substantial setup time. To reduce the setup time, an operator was hired and similar jobs were batched together. Batch systems allowed automatic job sequencing by a resident monitor and greatly improved the overall utilisation of systems. The computer no longer had to wait for human operation, but CPU utilisation was still low because of the slow speed of the I/O devices compared to the CPU. A new concept, buffering, was developed to improve system performance by overlapping the input, output and computation of a single job. Spooling was another new concept for improving CPU utilisation, by overlapping the input of one job with the computation and output of other jobs. Operating systems are now almost always written in a higher-level language (C, Pascal etc.); UNIX was the first operating system developed in the C language. This feature improves their implementation, maintenance and portability. An operating system provides a number of services. At the lowest level, there are system calls, which allow a running program to make a request of the operating system directly. At a higher level, there is a command interpreter, which provides a mechanism for a user to issue a request without writing a program.

INTRODUCTION TO MEMORY MANAGEMENT


The utilisation and performance of the CPU can be greatly improved by sharing it among several processes. This is achieved by keeping all the processes in primary memory; therefore, the sharing of memory is essential to improving the performance of the CPU. The organisation and management of main memory has been one of the most important factors influencing operating system design. Memory management is primarily concerned with the allocation of main memory, which is of limited capacity, to requesting processes. No process can ever run before a certain amount of memory is allocated to it. The overall resource utilisation and other performance criteria of a computer system are largely affected by the performance of the memory management module. Two important features of the memory management function are protection and sharing. In order to protect processes from one another, their address spaces must be separated by the memory management scheme: a process should never be able, erroneously or maliciously, to access and destroy the contents of another process's address space. Apart from this, the memory management scheme must support sharing of common code or data structures, such as the symbol table in compilers or assemblers.

SINGLE PROCESS MONITOR


This is the simplest memory management approach. The memory is divided into two contiguous sections: one for the operating system program (also called the monitor), and the second for the user program.

Figure 13: Memory layout for a single process monitor

In this type of approach, the operating system only keeps track of the first and the last locations available for allocation to user programs. In order to provide a contiguous area of free storage for the user program, the operating system is loaded at one extreme end, either at the bottom or at the top. The important factor affecting this decision is the location of the interrupt vector: since the interrupt vector is often in low memory, the operating system program (monitor) is kept in low memory. A new program (user process) is loaded only when the operating system passes control to it. After receiving control, it runs until its completion, or until termination due to I/O or some error. When this program is completed or terminated, the operating system may load another program for execution. This type of memory management scheme was commonly used in single process operating systems such as CP/M.

Two important issues, protection and sharing of code, must be addressed while designing any memory management scheme. Sharing of code and data in a single process environment does not make much sense, because only one process (program) resides in memory at a time. Protection between user processes is likewise hardly supported by a single process monitor, for the same reason. However, protection of the operating system program from the user code is a must, otherwise the system may crash. This protection is supported through a hardware mechanism such as a dedicated register, as follows. The operating system code usually resides in the low memory area. A register, called the fence register, is set to the highest address occupied by the operating system code. A memory address generated by the user program to access a certain memory location is first compared with the fence register's contents: if the generated address is below the fence, it is trapped and permission is denied (see the sketch below). Since modification of the fence register is considered a privileged operation, only the operating system is allowed to change it.

This is a simple memory management approach but, due to its lack of support for multiprogramming, it results in lower utilisation of the CPU and of memory capacity. Since only one program resides in memory at a time and it may not occupy the whole memory, memory is under-utilised, and the CPU sits idle during the periods when the running program is waiting for I/O.
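A minimal sketch of the fence-register check described above might look as follows; the fence value and the function interface are invented for illustration, and in real hardware this comparison happens on every memory reference, not in software.

    /* Every address generated by a user program is compared against
       the fence; addresses below it belong to the monitor and are
       trapped. Values are illustrative. */
    #include <stdio.h>

    static unsigned fence = 0x4000;    /* highest address used by the monitor */

    static int check_access(unsigned addr)
    {
        if (addr < fence) {
            printf("trap: address %#x is inside the monitor\n", addr);
            return -1;                 /* access denied */
        }
        return 0;                      /* access permitted */
    }

    int main(void)
    {
        check_access(0x1000);          /* user touching the monitor: trapped */
        check_access(0x8000);          /* ordinary user address: allowed */
        return 0;
    }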

MULTIPROGRAMMING WITH FIXED PARTITION


In a multiprogramming environment, several programs reside in primary memory at a time and the CPU passes its control rapidly between these programs. One way to support multiprogramming is to divide the main memory into several partitions, each of which is allocated to a single process. Depending upon how and when partitions are created, there may be two types of memory partitioning: (1) static and (2) dynamic. Static partitioning implies that the division of memory into a number of partitions of given sizes is made in the beginning (during the system generation process) and remains fixed thereafter. In dynamic partitioning, the size and the number of partitions are decided at run time by the operating system. In this section we take up static partitioning; multiprogramming with dynamic (variable) partitioning will be discussed in the next section. The memory management schemes presented here are based on contiguous allocation. The basic approach is to divide memory into several fixed-size partitions, where each partition accommodates only one program for execution. The number of programs residing in memory (i.e. the degree of multiprogramming) is thus bounded by the number of partitions. When a program terminates, its partition becomes free for another program waiting in a queue. An example of partitioned memory is shown in figure 14.

Figure 14: Fixed size partitions

As shown, the memory is partitioned into 6 regions. The first region (lower area) is reserved for the operating system. The remaining five regions are for user programs. Three partitions are occupied by programs P1, P2 and P3; only the first and the last are free and available for allocation. Once partitions are defined, the operating system keeps track of the status (allocated or free) of each memory partition. This is done through a data structure called the partition description table (figure 15).

Figure 15: Partition description table

The two most common strategies for allocating free partitions to ready processes are: (i) first-fit and (ii) best-fit. The approach followed in first-fit is to allocate the first free partition large enough to accommodate the process. The best-fit approach, on the other hand, allocates the smallest free partition that meets the requirement of the process. Both strategies require scanning the partition description table to find free partitions. However, first-fit terminates after finding the first such partition, whereas best-fit continues searching for the nearest exact size. As a result, first-fit executes faster, whereas best-fit achieves higher utilisation of memory by finding the smallest adequate free partition. A trade-off must therefore be made between the execution speed of first-fit and the memory utilisation of best-fit. To explain these two strategies, let us take an example. A new process P4 of size 80K is ready to be allocated into the memory whose partition layout is given in figure 14. Using the first-fit strategy, P4 will get the first free partition, leaving 120K of unused memory within it. Best-fit will continue searching for the best possible partition and allocate the last partition to the process, leaving just 20K of unused memory. A minimal sketch of both strategies is given below.
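The following sketch implements both strategies over a small partition description table. The table layout and function names are assumptions; the partition sizes follow the example just given (free partitions of 200K and 100K, process P4 of 80K).

    /* First-fit and best-fit over a partition description table. */
    #include <stdio.h>

    struct partition {
        int base, size;        /* in K words */
        int free;              /* 1 if the partition is unallocated */
    };

    static int first_fit(const struct partition *t, int n, int req)
    {
        for (int i = 0; i < n; i++)
            if (t[i].free && t[i].size >= req)
                return i;      /* stop at the first large-enough partition */
        return -1;             /* no free partition fits */
    }

    static int best_fit(const struct partition *t, int n, int req)
    {
        int best = -1;
        for (int i = 0; i < n; i++)    /* scan the whole table */
            if (t[i].free && t[i].size >= req &&
                (best < 0 || t[i].size < t[best].size))
                best = i;      /* remember the smallest adequate partition */
        return best;
    }

    int main(void)
    {
        struct partition table[] = {
            { 100, 200, 1 },   /* free, 200K */
            { 300, 150, 0 },   /* occupied */
            { 450, 100, 1 },   /* free, 100K */
        };
        int req = 80;          /* process P4 needs 80K */

        printf("first-fit picks partition %d\n", first_fit(table, 3, req));
        printf("best-fit picks partition %d\n", best_fit(table, 3, req));
        return 0;
    }

Run on this table, first-fit picks the 200K partition (leaving 120K unused) and best-fit picks the 100K partition (leaving 20K unused), matching the example in the text.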

Whenever a new process is ready to be loaded into memory and no partition is free, swapping of processes between main memory and secondary storage is done. Swapping improves CPU utilisation by replacing suspended processes residing in main memory with ready-to-execute processes from secondary storage. When the scheduler admits a new process (of high priority) for which no partition is free, the memory manager is invoked to make a partition free to accommodate it. The memory manager performs this task by swapping out low priority processes, suspended for a comparatively long time, in order to load and execute the higher priority process. When the higher priority process terminates, the lower priority process can be swapped back and continued. Swapping requires a secondary storage device, such as a fast disk, to store the processes suspended from main memory. One problem with swapping is that it takes a long time to transfer a process to or from secondary storage. For example, to get an idea of the total swap time, assume that the user program is 100K words and the secondary storage device is a fixed-head disk with an average latency of 8 msec and a transfer rate of 250,000 words/second. A transfer of 100K words to or from memory then takes:

8 msec + (100,000 words / 250,000 words per sec)
= 8 msec + 2/5 sec
= 8 msec + 400 msec
= 408 msec (approximately)

Since we must both swap out and swap in, the total swap time is about 408 + 408 = 816 msec. This overhead must be considered when deciding whether to swap a process in order to make room for another.

One important issue concerning swapping is whether the process removed temporarily from a partition should be brought back to the same partition or to any partition of adequate size. This depends upon the partitioning policy. Binding a process to a specific partition (static partitioning) eliminates the overhead of run-time partition allocation, at the expense of lower utilisation of primary memory. On the other hand, systems where processes are not permanently bound to a specific partition (dynamic partitioning) are much more flexible and utilise memory more efficiently. The only drawback of the dynamic partitioning approach is the run-time overhead of partition allocation whenever a new process is swapped in.

As said earlier, whether a process is loaded back into the same partition from which it was swapped out, or into a different partition, depends upon the relocation policy. The term relocation usually refers to the ability to load and execute a given program in an arbitrary memory partition, as opposed to a fixed set of memory locations specified at program translation time. Depending upon when and how the address translation from virtual address to actual address (also called physical address) of primary memory takes place, program relocation may be regarded as static relocation or dynamic relocation. There is a difference between a virtual and a physical address: a virtual address refers to information within a program's address space, while a physical address specifies the actual memory location where the program and data are stored during execution. If the relocation is performed before or during the loading of a program into memory, by a relocating linker or a relocating loader, the approach is called static relocation. Static relocation is practically restricted to supporting only static binding of processes to partitions. Dynamic relocation refers to the run-time mapping of virtual addresses into physical addresses, with the support of some hardware mechanism such as base and limit registers. Relocation of memory references at run time is illustrated in the following figure.

Figure 16: Dynamic relocation

When a process is scheduled, the base register is loaded with the starting address of its partition. Every memory address generated by the program automatically has the base register contents added to it before being sent to main memory.

Thus, if the base register contains 100000 (100K), a MOVE R1, 200, which is supposed to load the contents of virtual address 200 (relative to the program beginning) into a register, is effectively turned into a MOVE R1, 100000 + 200, without the instruction itself being modified. The hardware protects the base register to prevent user programs from modifying it. An additional advantage of using a base register for relocation is that a program can be moved anywhere in memory after it has started execution.

Protection and sharing: Multiprogramming introduces one essential problem of protection. Not only must the operating system be protected from user programs/processes, but each user process must also be protected from other processes maliciously accessing its area. In systems that use a base register for relocation, a common approach is to use a limit (bound) register for protection. The primary function of the limit register is to detect attempts to access memory locations beyond the boundary assigned by the operating system. When a process is scheduled, the limit register is loaded with the highest virtual address in the program. As illustrated in figure 17, each memory access of the running program is first compared with the contents of the limit register; if it exceeds the limit, no permission is given to the user process. In this way, any attempt to access a memory location beyond the boundary is trapped. A small sketch of this base-limit translation follows.
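The sketch below puts the base and limit registers together: the virtual address is first checked against the limit, then the base is added to form the physical address. The register values follow the MOVE R1, 200 example above; the limit value and the function interface are invented for illustration, since real hardware performs this on every reference.

    /* Dynamic relocation with base and limit registers. */
    #include <stdio.h>

    static unsigned base_reg  = 100000;  /* start of the partition (100K) */
    static unsigned limit_reg = 50000;   /* highest virtual address (assumed) */

    static int translate(unsigned vaddr, unsigned *paddr)
    {
        if (vaddr > limit_reg) {
            printf("trap: virtual address %u beyond limit\n", vaddr);
            return -1;                   /* protection violation */
        }
        *paddr = base_reg + vaddr;       /* relocation at run time */
        return 0;
    }

    int main(void)
    {
        unsigned p;
        if (translate(200, &p) == 0)     /* the MOVE R1, 200 reference */
            printf("virtual 200 -> physical %u\n", p);
        translate(60000, &p);            /* beyond the limit: trapped */
        return 0;
    }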

Figure 17: Protection through limit and base registers

In addition to protection, a good memory management mechanism must also provide for controlled sharing of data and code between cooperating processes. One traditional approach to sharing is to place data and code in a dedicated common partition. However, any attempt by a participating process to access memory outside of its own partition is normally regarded as a protection violation. In systems with protection keys, this obstacle may be circumvented by changing the keys of all shared blocks upon every process switch, in order to grant access rights to the currently running process. Fixed partitioning imposes several restrictions:

No single program/process may exceed the size of the largest partition in a given system.
It does not support a system having dynamically growing data structures such as stacks, queues, heaps etc.
It limits the degree of multiprogramming, which in turn may reduce the effectiveness of short-term scheduling.
