
and code generation.

Earlier, the approach taken to design a compiler was affected by the resources available, the programmer's experience in designing it, and the difficulty of processing the source language. For simple languages, the compiler used to be a single piece of program. However, as source languages became larger and more complex, and high-quality output became crucial, the design was split into a number of independent phases. In today's software development scenario, even the smallest compilers have two or three phases, commonly regarded as the front end, the middle end and the back end. The front end is where syntactic parsing and the translation of the source code into an intermediate form take place. The middle end performs optimizations. The back end takes the analysis, optimization and transformation process a bit further, and finally code is generated for a particular processor and operating system.
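The following minimal C sketch illustrates this three-phase structure. The type and function names used here (Ir, front_end, middle_end, back_end) are illustrative assumptions only and do not correspond to the interface of any real compiler.

    /* A minimal sketch of the three-phase compiler structure described above.
       The type and function names (Ir, front_end, middle_end, back_end) are
       illustrative assumptions, not the interface of any real compiler. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char text[256];   /* stand-in for a real intermediate representation */
    } Ir;

    /* Front end: parse the source and translate it into the intermediate form. */
    static Ir front_end(const char *source) {
        Ir ir;
        snprintf(ir.text, sizeof ir.text, "IR(%s)", source);
        return ir;
    }

    /* Middle end: perform target-independent optimizations on the IR. */
    static Ir middle_end(Ir ir) {
        /* A real middle end rewrites the IR; here we merely tag it. */
        strncat(ir.text, " [optimized]", sizeof ir.text - strlen(ir.text) - 1);
        return ir;
    }

    /* Back end: lower the IR to code for one particular processor and OS. */
    static void back_end(Ir ir, const char *target) {
        printf("code for %s generated from %s\n", target, ir.text);
    }

    int main(void) {
        Ir ir = front_end("x = a + b * c;");
        ir = middle_end(ir);
        back_end(ir, "x86-64/linux");
        return 0;
    }

Splitting the work this way is what makes the front end reusable across target machines and the back end reusable across source languages.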

Programming Languages
A computer language is the means by which instructions and data are transmitted to
computers. In other words, computer languages are the interface between a
computer and a human being. Technically, a programming language is a machine-
readable artificial language designed to express computations that can be performed
by a machine, particularly a computer. Programming languages can be used to create
programs that specify the behavior of a machine, to express algorithms precisely, or as
a mode of human communication.
Many programming languages have some form of written specification of their syntax
and semantics, since computers require precisely defined instructions.

Classification of Computer Languages


Broadly speaking, computer languages can be classified as follows:
 Low Level Language or Machine Language or Binary Language (LLL): The
language that a microprocessor can read and interpret is called low level
language or machine language. It is also called binary language because low-
level language code consists of binary bits, i.e. 0 and 1. These 0s and 1s represent the state of a pulse in the circuit: the presence of a pulse (or current) is represented by "1" and its absence by "0". Machine code or machine language is a system of instructions and
data executed directly by a computer's central processing unit. Machine code
may be regarded as a primitive (and cumbersome) programming language or
as the lowest-level representation of a compiled and/or assembled computer
program. Programs written in interpreted languages, however, are not represented by machine code, although their interpreter (which may be seen as a processor executing the higher-level program) often is. Machine code is sometimes
called native code when referring to platform-dependent parts of language
features or libraries. Machine code should not be confused with so called
"bytecode", which is executed by an interpreter. Every processor or processor
family has its own machine code instruction set. Instructions are patterns of
bits that by physical design correspond to different commands to the
machine. The instruction set is thus specific to a class of processors using
(much) the same architecture. Successor or derivative processor designs often
include all the instructions of a predecessor and may add additional
instructions. Occasionally a successor design will discontinue or alter the
meaning of some instruction code (typically because it is needed for new
purposes), affecting code compatibility to some extent; even nearly
completely compatible processors may show slightly different behavior for
some instructions but this is seldom a problem. Systems may also differ in
other details, such as memory arrangement, operating systems, or peripheral
devices; because a program normally relies on such factors, different systems
will typically not run the same machine code, even when the same type of
processor is used. A machine code instruction set may have all instructions of
the same length, or it may have variable-length instructions. How the patterns
are organized varies strongly with the particular architecture and often also with the type of instruction. Most instructions have one or more opcode fields, which specify the basic instruction type (such as arithmetic, logical, jump, etc.) and the actual operation (such as add or compare), and other fields that
may give the type of the operand(s), the addressing mode(s), the addressing
offset(s) or index, or the actual value itself (such constant operands contained
in an instruction are called immediates). A computer program is a sequence
of instructions that are executed by a CPU. While simple processors execute
instructions one after the other, superscalar processors are capable of
executing several instructions at once. A small sketch showing how the fields of such an instruction word might be decoded is given after this list.
 Assembly Language: It is a programming language that uses symbolic
codes called MNEMONICS for developing the program. But, before
execution, the program written in assembly language is required to
be translated into machine language; an assembler does the task of translating assembly code into machine code. It is the oldest non-machine language, allowing for a more human-readable method of writing programs than writing in binary bit patterns (or even hexadecimal patterns). Assembly language is not a single language, but rather a
group of languages. Each processor family (and sometimes
individual processors within a processor family) has its own
assembly language. In contrast to high-level languages, data
structures and program structures in assembly language are created
by directly implementing them on the underlying hardware.
Assembly languages also include directives to the assembler,
directives to the linker, directives for organizing data space, and
macros. Macros can be used to combine several assembly language
instructions into a high level language-like construct (as well as other
purposes). There are cases where a symbolic instruction is translated
into more than one machine instruction. But in general, symbolic
assembly language instructions correspond to individual executable
machine instructions. Assembly language is much harder to program in than high-level languages. The programmer must pay attention to far more detail and must have an intimate knowledge of the processor in use. But high-quality, hand-crafted assembly language programs can run much faster and use much less memory and other resources than a similar program written in a high-level language. Speed increases of 2 to 20 times are fairly common, and increases of hundreds of times are occasionally possible. Assembly language
programming also gives direct access to key machine features
essential for implementing certain kinds of low-level routines, such as
an operating system kernel or microkernel, device drivers, and
machine control. Assembly language is very flexible and powerful;
anything that the hardware of the computer is capable of doing can
be done in assembly.
 High Level Language (HLL): In computing, a high-level programming
language is a programming language with strong abstraction from
the details of the computer. In comparison to low-level programming
languages, it may use natural language elements, be easier to use, or
more portable across platforms. Such languages hide the details of
CPU operations such as memory access models and management of
scope. It is a problem-oriented language that is portable across
several platforms. It is easily understandable by human beings
because high-level language code consists of English-like statements, e.g. COBOL, PASCAL, BASIC, FORTRAN, C, C++, Java, etc. This greater abstraction and hiding of details is generally
intended to make the language user-friendly, as it includes concepts
from the problem domain instead of those of the machine used. A
high level language isolates the execution semantics of a computer
architecture from the specification of the program, making the
process of developing a program simpler and more understandable
with respect to a low-level language. The term "high-level language"
does not imply that the language is superior to low-level
programming languages – in fact, in terms of the depth of knowledge
of how computers work required to productively program in a given
language, the inverse may be true. Rather, "high-level language"
refers to the higher level of abstraction from machine language.
Rather than dealing with registers, memory addresses and calling
stacks, high-level languages deal with usability, threads, locks,
objects, variables, arrays and complex arithmetic or boolean
expressions. In addition, they have no opcodes that can directly
compile the language into machine code, unlike low-level assembly
language. Other features such as string handling routines, object-
oriented language features and file input/output may also be present.
High-level languages are abstract. Typically, a single high-level instruction is translated into several (sometimes dozens or, in rare cases, even hundreds of) executable machine language instructions.
Some early high-level languages had a close correspondence between
high-level instructions and machine language instructions. For
example, most of the early COBOL instructions translated into a very
obvious and small set of machine instructions. The trend over time
has been for high-level languages to increase in abstraction. Modern
object oriented programming languages are highly abstract. High-
level programming languages are much easier for less skilled
programmers to work in and for semi-technical managers to
supervise. And high-level languages allow faster development times
than work in assembly language, even with highly skilled
programmers. Development speed-ups of 10 to 100 times are fairly common. Programs written in high-level languages
(especially object oriented programming languages) are much easier
and less expensive to maintain than similar programs written in
assembly language.
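To make the idea of opcode and operand fields concrete, the following C sketch decodes a hypothetical 32-bit instruction word with an 8-bit opcode, two 4-bit register fields and a 16-bit immediate. The layout and the opcode values are invented for illustration and do not belong to any real processor's instruction set.

    /* Sketch: decoding a hypothetical 32-bit machine instruction word.
       Assumed layout (not a real instruction set):
         bits 31..24  opcode
         bits 23..20  destination register
         bits 19..16  source register
         bits 15..0   immediate constant (an "immediate" operand)      */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_ADD = 0x01, OP_CMP = 0x02, OP_JMP = 0x03 };  /* invented opcodes */

    static void decode(uint32_t word) {
        unsigned opcode = (word >> 24) & 0xFFu;
        unsigned rd     = (word >> 20) & 0xFu;
        unsigned rs     = (word >> 16) & 0xFu;
        unsigned imm    =  word        & 0xFFFFu;

        switch (opcode) {
        case OP_ADD: printf("add  r%u, r%u, #%u\n", rd, rs, imm); break;
        case OP_CMP: printf("cmp  r%u, r%u\n", rd, rs);           break;
        case OP_JMP: printf("jmp  #%u\n", imm);                   break;
        default:     printf("unknown opcode 0x%02X\n", opcode);   break;
        }
    }

    int main(void) {
        decode(0x01320005u);   /* add r3, r2, #5 in the assumed format */
        decode(0x02410000u);   /* cmp r4, r1 */
        return 0;
    }

A real disassembler, or the decode stage of a CPU, performs the same kind of field extraction, but against the documented encoding of its own instruction set.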

Generations of Programming Languages


There are five generations of programming languages:
 1GL or First Generation Languages: A first-generation programming language refers to machine language. Machine language code runs fast and efficiently, as it is directly executed by the CPU. These languages require no translator or assembler.
 2GL or Second Generation Languages: It refers to Assembly language, which
is platform specific (depends on the microprocessor). Assembly language
code can be easily read and written by human beings. However, it needs to be
translated from assembly language to machine language using an assembler before its final execution.
 3GL or Third Generation Languages: It refers to a high level structured
programming language. The term structured refers to procedural languages, which means such languages require the specification of the processing steps needed to accomplish a desired task, e.g. C, COBOL, PASCAL, BASIC, etc.
 4GL or Fourth Generation Languages: It refers to a non-procedural language. Non-programming professionals use these languages because prior knowledge of a programming language is not required. These languages include certain commands that can be used directly to interact with the computer, e.g. SQL, NATURAL, Informix-4GL, etc.
 5GL or Fifth Generation Languages: It is a programming language that uses
constraints (conditions) rather than algorithms. These languages are designed to make the computer solve the problem for you, and they are mainly used in artificial intelligence research, e.g. PROLOG, MERCURY, etc.

Features of Programming Language Design


 Ease of use
 Simplicity
 Code Reusability
 Automatic Memory Management
 The less typing, the better: syntactic sugar
 Community acceptance (extensions & libraries)
 Speed
 Defining Tokens & Syntax
 Visualize the dynamic process of program runtime by examining the static
program code
 Code optimization

Programming Language Processors


There are following language processors:
 Editor: A program that allows text to be entered and changed. These are used
for developing programs in a high-level language, e.g. Notepad, EditPlus, etc.
 Translator: A program that accepts text expressed in one language and
generates semantically equivalent text expressed in another language.
In other words, a translator is a special program that is used to convert one
format into another format. Examples: query interpreter, text formatter, infix-to-postfix converter, etc.
 Compiler: It is a translator program which reads the whole code written in a high-level language at once and then converts it into machine language; it plays no role in execution. It requires a large memory space because the compiler is copied to main memory whenever compilation is required. Overall, compilation takes less time than interpretation, and a compiled program runs much faster; thus, it is better than an interpreter in terms of execution speed. After making changes to a program, it must be compiled again. Examples of compiler-based languages are C, C++, etc. In the past, compilers were divided into many
passes to save space. When each pass is finished, the compiler can free the
space needed during that pass. Many modern compilers share a common 'two
stage' design. The first stage, the 'compiler front end' translates the source
language into an intermediate representation. The second stage, the 'compiler
back end' works with the internal representation to produce code in the
output language. Modern compilers are often portable and allow multiple dialects of a language to be compiled.
 Interpreter: It is also a translator program, which reads the code written in a high-level language line by line, converts it into machine language, and executes it. It requires comparatively less memory space for its storage. The interpretation method is much more time consuming, and interpreted programs run at a slower rate. However, an interpreter gives a fast response to changes in a source program. This is especially important when prototyping and testing code, where an edit-interpret-debug cycle can often be much shorter than an edit-compile-run-debug cycle. Thus, it is better than a compiler in terms of debugging. Examples of interpreter-based languages are BASIC, LISP, SQL, etc. It
may be possible to execute the same source code either directly by an
interpreter or by compiling it and then executing the machine code produced.
However, some languages may require both a compiler and an interpreter to produce the final output of the program; Java is an example of such a language.
Interpreting code is slower than running the compiled code because the
interpreter must analyze each statement in the program each time it is
executed and then perform the desired action whereas the compiled code just
performs the action. This run-time analysis is known as "interpretive
overhead". Access to variables is also slower in an interpreter because the
mapping of identifiers to storage locations must be done repeatedly at run-
time rather than at compile time. A toy sketch of such an interpreter loop is given at the end of this list.
 Assemblers: An assembler is also a translator program, used to convert the code written in assembly language into machine language; that is, it is a computer program for translating assembly language into object code. A cross
assembler produces code for one processor, but runs on another. As well as
translating assembly instruction mnemonics into opcodes, assemblers provide
the ability to use symbolic names for memory locations and macro facilities
for performing textual substitution. Assemblers are far simpler to write than
compilers for high-level languages, and have been available since the 1950s.
Modern assemblers, especially for RISC-based architectures such as MIPS, Sun SPARC and HP PA-RISC, optimize instruction scheduling to exploit the
CPU pipeline efficiently. Most assemblers are 'macro assemblers', which allow
complex macro constructs. High-level assemblers provide high-level-
language abstractions such as advanced control structures, high-level
procedure/function declarations and invocations, and high-level abstract
data types including structures/records, unions, classes, and sets.
 Linker: A program that combines two or more object modules into a single
object module or into an executable file. In IBM mainframe environments
such as OS/360, this program is known as a linkage editor. On Unix variants
the term loader is often used as a synonym for linker. Link editors are
commonly known as linkers. The compiler automatically invokes the linker as
the last step in compiling a program. The linker inserts code (or maps in
shared libraries) to resolve program library references, and/or combines
object modules into an executable image suitable for loading into memory.
On Unix-like systems, the linker is typically invoked with the ld command.
Dynamic linking is accomplished by placing the name of a sharable library in
the executable image. Actual linking with the library routines does not occur
until the image is run, when both the executable and the library are placed in
memory. An advantage of dynamic linking is that multiple programs can
share a single copy of the library. Static linking is the result of the linker copying all library routines used in the program into the executable image at link time.
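As noted in the interpreter entry above, an interpreter must re-analyse each statement every time it runs. The following toy C sketch makes that interpretive overhead visible: parsing happens inside the execution loop, and identifiers are mapped to storage at run time. The command names and the overall design are invented purely for illustration.

    /* Sketch: a toy line-by-line interpreter for commands of the form
           set <var> <number>      and      print <var>
       where <var> is a single letter a..z.  The point is that each line is
       re-parsed inside the execution loop ("interpretive overhead"), and
       identifiers are mapped to storage at run time, not at compile time. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *program[] = {        /* the "source program" */
            "set a 2",
            "set b 40",
            "print a",
            "print b",
        };
        int vars[26] = {0};              /* storage for variables a..z */
        size_t n = sizeof program / sizeof program[0];

        for (size_t i = 0; i < n; i++) {
            char cmd[8], name;
            int value;

            /* Analysis happens here, every time the line is executed. */
            if (sscanf(program[i], "%7s %c %d", cmd, &name, &value) >= 2) {
                if (strcmp(cmd, "set") == 0)
                    vars[name - 'a'] = value;          /* identifier -> storage */
                else if (strcmp(cmd, "print") == 0)
                    printf("%c = %d\n", name, vars[name - 'a']);
                else
                    printf("error: unknown command '%s'\n", cmd);
            }
        }
        return 0;
    }

A compiler, in contrast, performs this kind of analysis once, ahead of time, and emits code that only carries out the actions, which is one reason compiled programs usually run faster.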
