Model Set - I
1. (a) Why does pipelining improve performance? (Year-2008)
Solution:
Pipelining improves performance because it overlaps the execution of several instructions: while one instruction is completing, the next is already executing and a third is being fetched. The latency of any single instruction is unchanged, but instruction throughput rises towards one instruction per clock cycle.
RISC (Reduced Instruction Set Computer) characteristics:
(i) Relatively few instructions.
(ii) Relatively few addressing modes.
(iii) Memory access limited to load and store instructions.
(iv) All operations done within the registers of the CPU.
(v) Fixed-length, easily decoded instruction format.
(vi) Single-cycle instruction execution.
(vii) Hardware rather than microprogrammed control.
CISC (Complex Instruction Set Computer)
A computer with a large number of instructions is classified as a complex instruction set computer.
Characteristics of CISC:
(i) A large number of instructions, typically from 100 to 250.
(ii) Some instructions that perform specialized tasks and are used infrequently.
(iii) A large variety of addressing modes, typically from 5 to 20 different modes.
(iv) Variable-length instruction format.
(v) Instructions that manipulate operands in memory.
We are assuming the L2 access time is the same for an L2 hit or an L2 miss. We are also assuming that the access to main memory does not begin until the L2 miss has occurred.
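With those assumptions, the average memory access time (AMAT) for a two-level cache takes the usual form (the symbols here are generic, not taken from the original question):

AMAT = HitTime_L1 + MissRate_L1 × (Time_L2 + MissRate_L2 × MemTime)

i.e. every access pays the L1 hit time, L1 misses additionally pay the L2 time, and only L2 misses go on to main memory.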
(d) Write at least four differences between a multiprocessor and multicomputer system. (Year - 2008)
Solution:
Multiprocessor:
1. A multiprocessor is a system with more than one CPU, or one CPU with more than one core. Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.
2. A multiprocessor system is simply a computer that has more than one CPU on its motherboard. If the
operating system is built to take advantage of this, it can run different processes (or different threads
belonging to the same process) on different CPUs.
3. There are many variations on this basic theme, and the definition of multiprocessing can vary with
context, mostly as a function of how CPUs are defined (multiple cores on one die, multiple chips in one
package, multiple packages in one system unit, etc.).
4. Multiprocessing sometimes refers to the execution of multiple concurrent software processes in a system
as opposed to a single process at any one instant. However, the terms multitasking or multiprogramming
are more appropriate to describe this concept, which is implemented mostly in software, whereas
multiprocessing is more appropriate to describe the use of multiple hardware CPUs.
Multicomputer:
1. A multicomputer is a system built from more than one computer: a network of computers, or a single machine made up of several computers, working along the lines of parallel computing.
2. A multicomputer may be considered to be either a loosely coupled NUMA computer or a tightly coupled
cluster. Multicomputers are commonly used when strong computer power is required in an environment
with restricted physical space or electrical power.
3. Distributed computing deals with hardware and software systems containing more than one processing
element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly
controlled regime.
4. In distributed computing a program is split up into parts that run simultaneously on multiple computers
communicating over a network. Distributed computing is a form of parallel computing, but parallel
computing is most commonly used to describe program parts running simultaneously on multiple
processors in the same computer.
(e) Discuss anti-dependence / name-dependence vs. true dependence. (Year - 2006)
Solution:
An anti-dependence (write-after-read) occurs when an instruction requires a value that is later updated. In the following example, instruction 3 anti-depends on instruction 2: the ordering of these instructions cannot be changed, nor can they be executed in parallel, as this would affect the final value of A.
1. B = 3
2. A = B + 1
3. B = 7
Anti-dependence and output dependence are name dependences: they arise only because two instructions reuse the same register or memory name, and they can be removed by renaming. A true (data) dependence, by contrast, is read-after-write: an instruction actually needs the value produced by an earlier one, so it cannot be removed by renaming.
Solution:
          1    2      3    4    5
Load      IF   RF/ID  EX   MEM  WB
The load writes the register file in its 5th stage, while an R-type instruction uses the register file's write port during the 4th stage:
          1    2      3    4
R-type    IF   RF/ID  EX   WB
Example:
Consider a load followed immediately by an R-type instruction when the processor has only a single register-file write port: both instructions would try to write their results in the same cycle.
Solution:
There are, therefore, two main components to the locality of reference:
Temporal: there is a tendency for a program to reference, in the near future, memory items that it has referenced in the recent past. Examples: loops, temporary variables, arrays, stacks, ...
Spatial: there is a tendency for a program to make references to a portion of memory in the neighbourhood of the last memory reference.
(i) Write down methods for improving cache performance. (Year - 2008)
Solutions:
Methods for improving cache performance (see the sketch below for the effect of a 2nd-level cache):
increase cache size
increase block size
increase associativity
add a 2nd-level cache
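As a rough illustration of the last method, a minimal Python sketch (every number below is assumed for illustration, not taken from any question) of how a 2nd-level cache lowers average memory access time:

# Average memory access time (AMAT) in cycles; all parameters are assumed.
l1_hit = 1          # L1 hit time
l1_miss = 0.05      # L1 miss rate
mem = 100           # main memory access time

amat_l1_only = l1_hit + l1_miss * mem                        # 6.0 cycles

l2_time = 10        # L2 access time (same for hit or miss, as assumed earlier)
l2_miss = 0.20      # L2 local miss rate
amat_with_l2 = l1_hit + l1_miss * (l2_time + l2_miss * mem)  # 2.5 cycles

print(amat_l1_only, amat_with_l2)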
(j) Identify the kind of hazard that occurs while executing the following instructions in the pipeline. Draw the path to avoid the hazard.
Here a load instruction is followed immediately by an R-type instruction and the processor has only a single register-file write port, so this is a structural hazard.
Solutions:
1. Delay the instruction until the functional unit is ready.
Hardware inserts a pipeline stall, or bubble, that delays execution of all instructions that follow (previous instructions continue).
This increases CPI above the ideal value of 1.
2. Build more sophisticated functional units so that all combinations of instructions can be
accommodated.
Example: Allow two simultaneous writes to the register file.
Write Back Stall Solution:
Delay the R-type register write by one cycle:
          1    2      3    4    5
R-type    IF   RF/ID  EX   MEM  WB
Advantage:
1. Different sets of addresses are assigned to different memory modules. For example, in a two-module
memory system, the even addresses may be in one module and the odd addresses in the other.
2. A modular memory is useful in systems with pipeline and vector processing. A vector processor that uses
an n-way interleaved memory can fetch n operands from n different modules.
3. In this way the effective memory cycle time can be reduced by a factor close to the number of modules.
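A minimal sketch (Python; the mapping shown is the usual low-order interleaving and the function name is my own) of how even and odd addresses land in different modules:

# Low-order interleaving: module = address mod n, word = address div n.
# With n_modules = 2, even addresses go to module 0 and odd ones to module 1.
def interleave(addr, n_modules=2):
    return addr % n_modules, addr // n_modules

for addr in range(8):
    module, word = interleave(addr)
    print(f"address {addr} -> module {module}, word {word}")

Because consecutive addresses rotate through the modules, n consecutive operands can be fetched from n different modules at once.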
(b) Identify the data hazards while executing the following instructions in the DLX pipeline. Draw the
forwarding path to avoid the hazard. (Year-2008)
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
Solution:
Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing the instructions; they are caused by several types of dependences. Here every instruction after the ADD reads R1, which the ADD writes.
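A pipeline chart makes the hazards visible (stage names as used elsewhere in these notes; the timing is a sketch of the standard five-stage DLX pipeline):

            1    2      3      4      5      6      7      8      9
ADD R1,..   IF   RF/ID  EX     MEM    WB
SUB ..,R1        IF     RF/ID  EX     MEM    WB
AND ..,R1               IF     RF/ID  EX     MEM    WB
OR  ..,R1                      IF     RF/ID  EX     MEM    WB
XOR ..,R1                             IF     RF/ID  EX     MEM    WB

ADD does not write R1 until cycle 5, but SUB needs it in EX in cycle 4 and AND needs it in cycle 5, so both are read-after-write hazards on R1. OR reads R1 in the same cycle that ADD writes it, which works if the register file writes in the first half of the cycle and reads in the second half; XOR reads R1 after it has been written and has no hazard.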
Data Hazard Solution:
Forward the ALU result from the EX/MEM and MEM/WB pipeline registers back to the ALU inputs of the following instructions: SUB receives R1 from the EX/MEM register and AND from the MEM/WB register, while OR is satisfied by the split-cycle register file write and read.
Solution:
Flynn's taxonomy:
                    Single Instruction    Multiple Instruction
Single Data         SISD                  MISD
Multiple Data       SIMD                  MIMD
Classifications
The four classifications defined by Flynn are based upon the number of concurrent instruction (or control)
streams and data streams available in the architecture:
(i) Single Instruction, Single Data stream (SISD)
A sequential computer which exploits no parallelism in either the instruction or data streams.
Examples of SISD architecture are the traditional uniprocessor machines like a PC or old mainframes.
The aim of this example is to improve the throughput performance of a server system with
respect to a particular benchmark, e.g., SPECweb. The parallelism takes the form of:
multiple processors
multiple disc drives
The general idea is to spread the overall workload amongst the available processors and disc
drives.
Scalability is viewed as a valuable asset for server applications.
Ideally, the overall improvement in performance over a single processor would be a factor of N,
where N is the number of processors.
Processor level Parallelism
Advantage can be taken of the fact that not all instructions in a program rely on the results of their
predecessors.
Thus sequences of instructions can be executed with varying degrees of overlap, which is a form
of parallelism.
This is the basis of instruction pipelining, which we study later in the module.
Examples (a sketch of the second follows this list):
set-associative caches use multiple banks of memory that may be searched in parallel to
find a desired item
modern ALUs use carry-lookahead, which uses parallelism to speed up the process of
computing sums from linear to logarithmic in the number of bits per operand
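As a rough sketch of that second example (Python, 4-bit width; the variable names are my own), carry-lookahead computes generate and propagate signals for all bit positions in parallel, and each carry then depends only on those signals and the carry-in:

# Carry-lookahead sketch: g[i] = a_i AND b_i (generate), p[i] = a_i OR b_i
# (propagate). Expanding c[i+1] = g[i] | (p[i] & c[i]) shows each carry is a
# function of g, p and c[0] only, so hardware can form all carries in parallel;
# the Python loop below simply evaluates that recurrence sequentially.
def carry_lookahead_add(a, b, width=4):
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]
    c = [0]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))
    s = [((a >> i) ^ (b >> i) ^ c[i]) & 1 for i in range(width)]
    return sum(bit << i for i, bit in enumerate(s)) | (c[width] << width)

print(carry_lookahead_add(5, 3))   # 8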
Solution:
The performance gain that can be made by improving some portion of the operation of a
computer can be calculated using Amdahl's Law.
Amdahl's Law states that the improvement gained from using some faster mode of
execution is limited by the fraction of the time the faster mode can be used; there is therefore
little point in improving rare tasks.
Amdahl's Law defines the overall speedup that can be gained for a task by using the new feature
designed to speed up the execution of the task:

Overall speedup = (execution time for the task without the new feature) / (execution time for the task with the new feature)
Amdahl's Law:

ExTime_new = ExTime_old × [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Best you could ever hope to do:

Speedup_maximum = 1 / (1 − Fraction_enhanced)
For example, with Fraction_enhanced F = 0.4 and Speedup_enhanced S = 10:

Speedup_overall = 1 / ((1 − F) + F/S) = 1 / ((1 − 0.4) + 0.4/10) = 1 / 0.64 ≈ 1.56
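The same calculation as a short Python sketch (the function name is my own):

# Amdahl's Law: overall speedup when fraction f of the work is enhanced by s.
def amdahl_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

print(amdahl_speedup(0.4, 10))   # 1.5625, i.e. ~1.56 as above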
6. (a) Explicit parallelism vs. implicit parallelism. (Year - 2006)
Solution:
Explicit parallelism:
1. Explicit parallel programming gives the programmer absolute control over the parallel execution.
2. In some instances, explicit parallelism may be avoided with the use of an optimizing compiler that
automatically extracts the parallelism inherent to computations (see implicit parallelism).
3. In computer programming, explicit parallelism is the representation of concurrent computations by
means of primitives in the form of special-purpose directives or function calls.
4. Most parallel primitives are related to process synchronization, communication or task
partitioning. As they seldom contribute to actually carrying out the intended computation of the
program, their computational cost is often counted as parallelization overhead.
Advantage
A skilled parallel programmer takes advantage of explicit parallelism to produce very efficient
code.
Disadvantage
However, programming with explicit parallelism is often difficult, especially for non-computing
specialists, because of the extra work involved in planning the task division and synchronization of
concurrent processes.
Programming with explicit parallelism:
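A minimal sketch of what this looks like in practice (Python with the standard multiprocessing module; the task and the four-way split are my own example): the programmer explicitly partitions the data and explicitly creates the workers.

# Explicit parallelism: the programmer divides the work and spawns the workers.
from multiprocessing import Pool

def partial_sum(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]    # explicit task partitioning
    with Pool(processes=4) as pool:            # explicit worker creation
        total = sum(pool.map(partial_sum, chunks))
    print(total)

Contrast this with the implicit-parallelism example further below, where a single sin(numbers) call leaves the division of work entirely to the language implementation.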
Implicit parallelism
1. In computer science, implicit parallelism is a characteristic of a programming language that allows a
compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by
some of the language's constructs.
2. A pure implicitly parallel language does not need special directives, operators or functions to enable
parallel execution.
Programming languages with implicit parallelism include LabVIEW and MATLAB M-code.
Example:
If a particular problem involves performing the same operation on a group of numbers (such as taking the sine
or logarithm of each in turn), a language that provides implicit parallelism might allow the programmer to
write the instruction thus:
numbers = [0 1 2 3 4 5 6 7];
result = sin(numbers);
The compiler or interpreter can calculate the sine of each element independently, spreading the effort across
multiple processors if available.
Advantages
Implicit parallelism generally facilitates the design of parallel programs and therefore results in a substantial
improvement of programmer productivity.
Disadvantages
1. Languages with implicit parallelism reduce the control that the programmer has over the parallel
execution of the program.
2. Experiments with implicit parallelism have shown that it can make debugging difficult.
(b) Write a short note on Instruction-level parallelism.
Solutions:
1. Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can
be performed simultaneously. Consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are
completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated
simultaneously. If each operation takes one unit of time, the three instructions complete in two units of
time, giving an ILP of 3/2. Micro-architectural techniques used to exploit ILP include the following (a
renaming sketch follows the list):
Instruction pipelining where the execution of multiple instructions can be partially overlapped.
Superscalar execution in which multiple execution units are used to execute multiple instructions in parallel.
Out-of-order execution where instructions execute in any order that does not violate data dependencies.
Register renaming which refers to a technique used to avoid unnecessary serialization of program
operations imposed by the reuse of registers by those operations, used to enable out-of-order execution.
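As a small illustration of the last technique (register names follow the DLX style used in these notes; the instruction pair is my own example), renaming removes an anti-dependence so the hardware may reorder the instructions:

SUB R4, R1, R5    ; reads R1
ADD R1, R2, R3    ; writes R1: anti-dependence (WAR) on R1 forbids reordering

After the hardware renames ADD's destination to a free register, say R8 (later readers of R1 are redirected to R8):

SUB R4, R1, R5
ADD R8, R2, R3    ; no dependence remains; the two may execute in any order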
7. (a) How do tightly coupled systems differ from loosely coupled ones? (Year-2008)
Solution:
Shared Memory Systems
Tightly Coupled Systems
Uniform and Non-Uniform Memory Access
Tightly Coupled Systems
Multiple CPUs share memory.
Each CPU has full access to all shared memory through a common bus.
Communication between nodes occurs via shared memory.
Performance is limited by the bandwidth of the memory bus.
Performance:
Performance is potentially limited in a tightly coupled system by a number of factors. These include
various system components such as the memory bandwidth, CPU to CPU communication bandwidth,
the memory available on the system, the I/O bandwidth, and the bandwidth of the common bus.
Uniform and Non-Uniform Memory Access
Shared memory systems can be more loosely coupled on the memory side: with uniform memory access (UMA) every CPU sees the same latency to all of memory, while with non-uniform memory access (NUMA) memory attached to a remote node is slower to reach than local memory.
Advantages
Memory access is cheaper than inter-node communication. This means that internal
synchronization is faster than using the distributed lock manager.
Shared memory systems are easier to administer than a cluster.
Shared Disk Systems
Shared disk systems are typically loosely coupled.
Loosely Coupled Systems
4) Amdahl's Law:

ExTime_new = ExTime_old × [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Best you could ever hope to do:

Speedup_maximum = 1 / (1 − Fraction_enhanced)
For example, with Fraction_enhanced = 0.9 and Speedup_enhanced = 10:

Speedup_overall = 1 / ((1 − 0.9) + 0.9/10) = 1 / 0.19 ≈ 5.3
5) Processor performance equation:
CPU time = Instruction count × CPI (clock cycles per instruction) × Clock cycle time
For example, a program of 10^9 instructions with CPI = 2 on a 1 GHz clock (1 ns cycle time) takes 10^9 × 2 × 1 ns = 2 seconds.
Note: A hypercube is an arrangement of processors such that each processor is connected to log2 n other
processors, where n is the number of processors in the hypercube; log2 n is said to be the "dimension" of
the hypercube. For example, in the 8-processor hypercube shown in the figure, dimension = 3; each
processor is connected to three other processors.
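Since node numbers in a hypercube differ from their neighbours in exactly one bit, the neighbourhood can be computed by flipping each address bit in turn; a minimal Python sketch (the helper name is my own):

# Neighbours of a node in a d-dimensional hypercube: flip each of the d bits.
def hypercube_neighbours(node, dim):
    return [node ^ (1 << bit) for bit in range(dim)]

# 8-processor hypercube (dim = 3): node 0 connects to nodes 1, 2 and 4.
print(hypercube_neighbours(0, 3))   # [1, 2, 4]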
A massively parallel system may have as many as several thousand nodes. Each node may have its own
Oracle instance, with all the standard facilities of an instance.
An MPP has access to a huge amount of real memory for all database operations (such as sorts or the
buffer cache), since each node has its own associated memory. Because it avoids disk I/O, this advantage
is significant in long-running queries and sorts. This is not possible on 32-bit machines, which have a
2 GB addressing limit; the total amount of memory on an MPP system may well be over 2 GB.
As with loosely coupled systems, cache consistency on MPPs must still be maintained across all nodes
in the system. Thus, the overhead for cache management is still present.
Advantages
(b) Identify the data hazards while executing the following instructions in the DLX pipeline.
Draw the forwarding path to avoid the hazard.
LW R1, 0(R2)
SUB R4, R1, R6
AND R6, R1, R7
OR R8, R1, R9
Solution:
This is a load-use data hazard: SUB needs R1 in the cycle immediately after LW's MEM stage, so forwarding alone cannot supply the value in time. A hardware stall is used: a pipeline interlock checks for the hazard and stops instruction issue for one cycle, after which the value is forwarded (see the chart below).
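For illustration, a sketch of the interlock in a pipeline chart (stage names as in the earlier answers; one bubble is inserted between LW and SUB):

             1    2      3      4        5      6      7
LW  R1,0(R2) IF   RF/ID  EX     MEM      WB
SUB ..,R1         IF     RF/ID  (stall)  EX     MEM    WB
AND ..,R1                IF     (stall)  RF/ID  EX     MEM ...

LW's data is not available until the end of its MEM stage (cycle 4), so even with forwarding SUB cannot start EX in cycle 4; the interlock delays SUB by one cycle, and the MEM/WB register then forwards R1 to SUB's EX in cycle 5. The later instructions are covered by forwarding and the register file without further stalls.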