
Computer Architecture & Organization - II

Model Set - I
1. (a) Why does pipelining improve performance? (Year-2008)

Solution:

1. Pipelining is an implementation technique in which multiple instructions are overlapped in
execution. Today's processors are fast largely because of pipelining.
2. A pipeline is like an assembly line: each step completes one piece of the whole job. An assembly
line does not reduce the time it takes to complete an individual job; it increases the number of jobs
being built simultaneously, and hence the rate at which jobs are completed.
3. A pipe stage (or pipe segment) is one step of the pipeline; each stage performs a small piece of an
instruction.
4. Therefore, pipelining improves instruction throughput rather than individual instruction
execution time. The throughput of the instruction pipeline is determined by how often an
instruction exits the pipeline.
5. The goal of designers is to balance the length of each stage; otherwise there will be idle time
during a stage. If the stages are perfectly balanced, then the time between instructions on the
pipelined machine is: Time between instructions (non-pipelined) / Number of pipe stages.
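To make the balanced-stage relation concrete, here is a minimal sketch in Python (the five stage latencies are made-up illustration values, not part of the original answer):

# Minimal sketch: ideal pipelined vs. non-pipelined instruction timing.
# Stage latencies (in ns) are hypothetical and perfectly balanced.
stage_times = [200, 200, 200, 200, 200]

non_pipelined_time = sum(stage_times)        # one instruction from start to finish
pipelined_time_between = max(stage_times)    # ideally, one instruction completes per stage time

print("Non-pipelined time between instructions:", non_pipelined_time, "ns")
print("Pipelined time between instructions:", pipelined_time_between, "ns")
print("Ideal speedup ~ number of pipe stages:", non_pipelined_time / pipelined_time_between)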

(b) Differentiate between RISC and CISC machines. (Year – 2008)


Solution:

RISC (Reduced Instruction Set Computer)

A computer with relatively few instructions with simple constructs, so that they can be executed much
faster within the CPU without having to use memory as often, is classified as a reduced instruction set
computer, or RISC.

RISC characteristics
(i) Relatively few instructions.
(ii) Relatively few addressing modes.
(iii) Memory access limited to load and store instructions.
(iv) All operations done within the registers of the CPU.
(v) Fixed-length, easily decoded instruction format.
(vi) Single-cycle instruction execution.
(vii) Hardwired rather than microprogrammed control.
CISC (Complex Instruction Set Computer)
A computer with a large number of instructions is classified as a complex instruction set computer.
Characteristics of CISC:
(i) A large number of instructions, typically from 100 to 250.
(ii) Some instructions that perform specialized tasks and are used infrequently.
(iii) A large variety of addressing modes, typically from 5 to 20 different modes.
(iv) Variable-length instruction formats.
(v) Instructions that manipulate operands in memory.

(c) Why is the performance of a parallel computer improved by using a two-level cache
memory? (Year - 2008)
Solution:
Modern high-end PCs and workstations all have at least two levels of caches: a very fast, and hence not
very big, first-level (L1) cache together with a larger but slower L2 cache. Some recent microprocessors
have three levels.
When a miss occurs in L1, L2 is examined, and only if a miss occurs there is main memory referenced.
So the average miss penalty for an L1 miss is
(L2 hit rate)*(L2 time) + (L2 miss rate)*(L2 time + memory time)

We are assuming L2 time is the same for an L2 hit or L2 miss. We are also assuming that the access
doesn't begin to go to memory until the L2 miss has occurred.
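A small numeric sketch of this formula (the hit rates and latencies below are invented illustration values):

# Sketch: average L1 miss penalty and overall average access time with two cache levels.
l1_time, l2_time, mem_time = 1, 10, 100      # access times in cycles (hypothetical)
l1_miss_rate = 0.05                          # fraction of accesses that miss in L1
l2_miss_rate = 0.20                          # fraction of L1 misses that also miss in L2
l2_hit_rate = 1.0 - l2_miss_rate

# Average penalty paid whenever L1 misses (matches the formula above).
l1_miss_penalty = l2_hit_rate * l2_time + l2_miss_rate * (l2_time + mem_time)

# Average memory access time seen by the processor.
amat = l1_time + l1_miss_rate * l1_miss_penalty
print(f"L1 miss penalty = {l1_miss_penalty:.1f} cycles, average access time = {amat:.2f} cycles")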

(d) Write at least four differences between a multiprocessor and multicomputer system. (Year - 2008)
Solution:
Multiprocessor: -
1. A multiprocessor has more than one CPU, or one CPU with more than one core, in a single system.
Multiprocessing is the use of two or more central processing units (CPUs) within a single
computer system. The term also refers to the ability of a system to support more than one processor
and/or the ability to allocate tasks between them.
2. A multiprocessor system is simply a computer that has more than one CPU on its motherboard. If the
operating system is built to take advantage of this, it can run different processes (or different threads
belonging to the same process) on different CPUs.
3. There are many variations on this basic theme, and the definition of multiprocessing can vary with
context, mostly as a function of how CPUs are defined (multiple cores on one die, multiple chips in one
package, multiple packages in one system unit, etc.).
4. Multiprocessing sometimes refers to the execution of multiple concurrent software processes in a system
as opposed to a single process at any one instant. However, the terms multitasking or multiprogramming
are more appropriate to describe this concept, which is implemented mostly in software, whereas
multiprocessing is more appropriate to describe the use of multiple hardware CPUs.
Multicomputer: -
1. A multicomputer consists of more than one computer, i.e., a network of computers: a machine made up of
several computers, similar in spirit to parallel computing.
2. A multicomputer may be considered to be either a loosely coupled NUMA computer or a tightly coupled
cluster. Multicomputers are commonly used when strong computer power is required in an environment
with restricted physical space or electrical power.
3. Distributed computing deals with hardware and software systems containing more than one processing
element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly
controlled regime.
4. In distributed computing a program is split up into parts that run simultaneously on multiple computers
communicating over a network. Distributed computing is a form of parallel computing, but parallel
computing is most commonly used to describe program parts running simultaneously on multiple
processors in the same computer.
(e) Discuss anti-dependence / name dependence vs. true dependence. (Year - 2006)
Solution:
Anti-dependency occurs when an instruction requires a value that is later updated. In the following
example, instruction 3 anti-depends on instruction 2 - the ordering of these instructions cannot be

changed, nor can they be executed in parallel (possibly changing the instruction ordering), as this would
affect the final value of A.
1. B=3
2. A=B+1
3. B=7
Anti-dependency is an example of a name dependency. That is, renaming of variables could
remove the dependency, as in the next example:
1. B=3
N. B2 = B
2. A = B2 + 1
3. B=7
A new variable, B2, has been declared as a copy of B in a new instruction, instruction N. The anti-
dependency between 2 and 3 has been removed, meaning that these instructions may now be executed in
parallel.
True dependence.
However, the modification has introduced a new dependency: instruction 2 is now truly dependent on
instruction N, which is truly dependent upon instruction 1. As true dependencies, these new dependencies
are impossible to safely remove.
(f) What do you mean by cache coherence? (Year - 2006)
Solution:
In a shared memory multiprocessor with a separate cache memory for each processor, it is
possible to have many copies of any one instruction operand: one copy in the main memory and
one in each cache memory. When one copy of an operand is changed, the other copies of the
operand must be changed also. Cache coherence is the discipline that ensures that changes in the
values of shared operands are propagated throughout the system in a timely fashion.
There are three distinct levels of cache coherence:
1. Every write operation appears to occur instantaneously.
2. All processes see exactly the same sequence of changes of values for each separate operand.
3. Different processes may see an operand assume different sequences of values. (This is
considered noncoherent behavior.)
In both level 2 behavior and level 3 behavior, a program can observe stale data. Recently,
computer designers have come to realize that the programming discipline required to deal with
level 2 behavior is sufficient to deal also with level 3 behavior. Therefore, at some point only
level 1 and level 3 behavior will be seen in machines.
(g) Explain what a structural hazard is, with a suitable example.

Solution:

A structural hazard occurs when a combination of instructions cannot be accommodated because of
resource conflicts.

Structural hazards often arise when some functional unit is not fully pipelined. A load uses the register
file's write port during its 5th stage.

1 2 3 4 5
Load IF RF/ID EX MEM WB

An R-type instruction uses the register file's write port during its 4th stage.
1 2 3 4
R-type IF RF/ID EX WB
Example:
Consider a load followed immediately by an R-type instruction on a processor that has only a single
register-file write port: both instructions attempt to write the register file in the same cycle.

(h) What do you mean by Locality of Reference?

Solution:
Locality of reference is the tendency of a program to reference a relatively small portion of its address
space at any instant of time. There are two main components to the locality of reference:
 Temporal: there is a tendency for a program to reference, in the near future, memory items that
it has referenced in the recent past. Examples: loops, temporary variables, arrays, stacks, …
 Spatial: there is a tendency for a program to make references to a portion of memory in the
neighbourhood of the last memory reference.
(i) Write down methods for improving cache performance. (Year - 2008)
Solutions:
Methods for improving performance
 increase cache size
 increase block size
 increase associativity
 add a 2nd level cache
(j) Identify the kind of hazard that occurs while executing the following instructions in the
pipeline. Draw the path to avoid the hazard.

Here a load instruction is followed immediately by an R-type instruction, and the processor has only a
single register-file write port.
Solutions:
1. Delay the instruction until the functional unit is ready.
 Hardware inserts a pipeline stall, or a bubble, that delays execution of all instructions that
follow (previous instructions continue).
 This increases CPI above the ideal value of 1.
2. Build more sophisticated functional units so that all combinations of instructions can be
accommodated.
 Example: Allow two simultaneous writes to the register file.
Write Back Stall Solution:
Delay R-Type register write by one cycle.
1 2 3 4 5
R-type IF RF/ID EX MEM WB

2. (a) What do you mean by interleaved memory organization? (Year-2008)


Solution:
1. Pipeline and vector processors often require simultaneous access to memory from two or more sources.
For example, an instruction pipeline may require the fetching of an instruction and an operand at the same
time, and an arithmetic pipeline usually requires two or more operands to enter the pipeline at the same time.
2. Instead of using two memory buses for simultaneous access, the memory can be partitioned into a number
of modules connected to common memory address and data buses. Each memory array has its own
address register AR and data register DR.
3. The modular system permits one module to initiate a memory access while other modules are in the
process of reading or writing a word and each module can honor a memory request independent of the
state of the other modules.

Advantage:
1. Different sets of addresses are assigned to different memory modules. For example, in a two-module
memory system, the even addresses may be in one module and the odd addresses in the other.
2. A modular memory is useful in systems with pipeline and vector processing. A vector processor that uses
an n-way interleaved memory can fetch n operands from n different modules.
3. In this way the effective memory cycle time can be reduced by a factor close to the number of modules,
as the sketch below illustrates.
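A minimal sketch of low-order interleaving (the module count and addresses are arbitrary illustration values):

# Sketch: low-order interleaved memory - consecutive addresses map to different modules.
NUM_MODULES = 4   # hypothetical number of memory modules

def module_and_offset(address):
    """Return (module index, word offset within that module) for a given address."""
    return address % NUM_MODULES, address // NUM_MODULES

for addr in range(8):
    mod, off = module_and_offset(addr)
    print(f"address {addr} -> module {mod}, offset {off}")
# Consecutive addresses fall in different modules, so n modules can service up to
# n accesses concurrently (for example, a vector fetch of n operands).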

(b) Explain in detail cache coherence mechanisms.


Solutions:
Cache coherence mechanisms
1. Directory-based coherence: In a directory-based system, the data being shared is placed in a
common directory that maintains the coherence between caches. The directory acts as a filter
through which the processor must ask permission to load an entry from the primary memory to
its cache. When an entry is changed the directory either updates or invalidates the other caches
with that entry.
2. Snooping is the process where the individual caches monitor address lines for accesses to
memory locations that they have cached. When a write operation is observed to a location that a
cache has a copy of, the cache controller invalidates its own copy of the snooped memory
location.
3. Snarfing is where a cache controller watches both address and data in an attempt to update its
own copy of a memory location when a second master modifies a location in main memory.
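A very simplified sketch of the write-invalidate idea behind snooping (this is an assumption-laden illustration, not a full protocol; real protocols such as MSI/MESI track more cache-line states):

# Sketch: write-invalidate snooping on a shared bus (greatly simplified, write-through).
class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}                            # address -> cached value (valid copies only)

    def read(self, addr, memory):
        if addr not in self.lines:                 # miss: fetch from main memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value, memory, bus):
        memory[addr] = value                       # write-through, to keep the sketch simple
        self.lines[addr] = value
        bus.broadcast_invalidate(addr, source=self)

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)                 # drop our now-stale copy

class Bus:
    def __init__(self, caches):
        self.caches = caches
    def broadcast_invalidate(self, addr, source):
        for cache in self.caches:
            if cache is not source:
                cache.snoop_invalidate(addr)

memory = {0x10: 1}
c0, c1 = SnoopingCache("P0"), SnoopingCache("P1")
bus = Bus([c0, c1])
c1.read(0x10, memory)               # P1 caches the old value
c0.write(0x10, 99, memory, bus)     # P0 writes; P1's copy is invalidated by snooping
print(c1.read(0x10, memory))        # 99: P1 re-fetches the new value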
3. (a) List down various Pipeline Hazards. (Year - 2008)
Solution:
 structural hazards: attempt to use the same resource two different ways at the same time
 E.g., two instructions try to read the same memory at the same time
 data hazards: attempt to use item before it is ready
 instruction depends on result of prior instruction still in the pipeline
add r1, r2, r3
sub r4, r2, r1
 control hazards: attempt to make a decision before the condition is evaluated
 branch instructions
beq r1, loop
add r1, r2, r3
 Can always resolve hazards by waiting
• pipeline control must detect the hazard
• take action (or delay action) to resolve hazards

(b) Identify the data hazards while executing the following instructions in the DLX pipeline. Draw the
forwarding path to avoid the hazard. (Year-2008)
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
Solution:
All of the later instructions read R1, which is produced by the ADD, so SUB, AND, OR and XOR each have a
RAW (read-after-write) dependence on the ADD: SUB and AND would read R1 before the ADD has written it
back, and OR reads R1 in the same cycle in which the ADD writes it. Data hazards occur when the pipeline
changes the order of read/write access to operands so that the order differs from the order seen by
sequentially executing instructions; they are caused by these kinds of dependences.
Data Hazard Solution:

(i) Stalls: Delay next instruction until ready

(ii) Have the register file write in the first half of the cycle and read in the second half; or "forward" the data
from the EX/MEM and MEM/WB pipeline registers to the unit that needs it.
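A small sketch (with my own hypothetical encoding of the instructions) that detects which later instructions need the ADD's result and which forwarding path would serve each one:

# Sketch: detect RAW dependences on the ADD's destination register and note
# which forwarding path (or register-file trick) can supply the value.
instrs = [
    ("ADD", "R1", ["R2", "R3"]),
    ("SUB", "R4", ["R1", "R5"]),
    ("AND", "R6", ["R1", "R7"]),
    ("OR",  "R8", ["R1", "R9"]),
    ("XOR", "R10", ["R1", "R11"]),
]

producer_op, producer_dst, _ = instrs[0]
for distance, (op, dst, srcs) in enumerate(instrs[1:], start=1):
    if producer_dst in srcs:
        if distance == 1:
            path = "forward from the EX/MEM latch to EX"
        elif distance == 2:
            path = "forward from the MEM/WB latch to EX"
        elif distance == 3:
            path = "register file writes in first half, reads in second half of the cycle"
        else:
            path = "no special action needed; R1 is already written back"
        print(f"{op}: RAW dependence on {producer_dst} -> {path}")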

4. (a) Describe the Flynn’s classification of computer architecture. (Year - 2006)

Solution:
Flynn's taxonomy
                 Single Instruction    Multiple Instruction
Single Data      SISD                  MISD
Multiple Data    SIMD                  MIMD
Classifications
The four classifications defined by Flynn are based upon the number of concurrent instruction (or control)
and data streams available in the architecture:
(i) Single Instruction, Single Data stream (SISD)
A sequential computer which exploits no parallelism in either the instruction or data streams.
Examples of SISD architecture are the traditional uniprocessor machines like a PC or old mainframes.

[Figure: (i) SISD and (ii) SIMD organizations]


(ii) Single Instruction, Multiple Data streams (SIMD)
A computer which exploits multiple data streams against a single instruction stream to perform
operations which may be naturally parallelized. For example, an array processor or GPU.

[Figure: (iii) MISD and (iv) MIMD organizations]
(iii) Multiple Instruction, Single Data stream (MISD)
Multiple instructions operate on a single data stream. Uncommon architecture which is
generally used for fault tolerance. Heterogeneous systems operate on the same data stream
and must agree on the result. Examples include the Space Shuttle flight control computer.
(iv) Multiple Instruction, Multiple Data streams (MIMD)
Multiple autonomous processors simultaneously executing different instructions on different
data. Distributed systems are generally recognized to be MIMD architectures; either
exploiting a single shared memory space or a distributed memory space.

(b) Discuss various Levels of Parallelism. (Year-2006)


Solution:
Exploiting Parallelism
 Taking advantage of parallelism is another very important method for improving the
performance of a computer system. We consider three examples that demonstrate the advantages
of parallelism at three different levels
 System Level
 Processor Level
 Detailed digital design level
System level Parallelism

 The aim of this example is to improve the throughput performance of a server system with
respect to a particular benchmark, e.g., SPEC Web. The parallelism takes the form of
 multiple processors
 multiple disc drives
 The general idea is to spread the overall workload amongst the available processors and disc
drives
 Scalability is viewed as a valuable asset for server applications
 Ideally, the overall improvement in performance over a single processor would be a factor of N,
where N is the number of processors.
Processor level Parallelism
 Advantage can be taken of the fact that not all instructions in a program rely on the result of their
predecessors
 Thus sequences of instructions can be executed with varying degrees of overlap, which is a form
of parallelism
 This is the basis of instruction pipelining which we study later in the module

 Instruction pipelining has the effect of improving performance by decreasing the CPI of a
processor
Detailed Digital Design Level Parallelism

 Examples:
 set associative caches use multiple banks of memory that may be searched in parallel to
find a desired item
 modern ALUs use carry-lookahead, which uses parallelism to speed up the process of
computing sums from linear to logarithmic in the number of bits per operand

5. (a) Explain "Pipelining & Pipeline Taxonomies". (Year-2006)


Solution:
• There are two main ways to increase the performance of a processor through high-level system
architecture
 Increasing the memory access speed
 Increasing the number of supported concurrent operations
 Pipelining
 Parallelism
• Pipelining is the process by which instructions are parallelized over several overlapping stages of
execution, in order to maximize data path efficiency
• Pipelining is analogous to many everyday scenarios
 Car manufacturing process
 Batch laundry jobs
 Basically, any assembly-line operation applies
• Two important concepts:
 New inputs are accepted at one end before previously accepted inputs appear as outputs
at the other end;
 The number of operations performed per second is increased, even though the elapsed
time needed to perform any one operation remains the same
Pipeline Taxonomies
• There are two types of pipelines used in computer systems
 Arithmetic pipelines
 Used to pipeline data intensive functionalities
 Instruction pipelines
 Used to pipeline the basic instruction fetch and execute sequence
• Other classifications include
 Linear vs. nonlinear pipelines
 Presence (or lack) of feed forward and feedback paths between stages
 Static vs. dynamic pipelines
 Dynamic pipelines are multifunctional, taking on a different form depending on the
function being executed
 Scalar vs. vector pipelines
 Vector pipelines specifically target computations using vector data
(b) Consider an improvement to a processor that makes the original processor run 10 times
faster, but is only usable for 40% of the time. What is the overall speedup gained by
incorporating this improvement using Amdahl’s Law?

Solution:
 The performance gain that can be made by improving some portion of the operation of a
computer can be calculated using Amdahl’s Law
 Amdahl’s Law states that the improvement gained from using some faster mode of
execution is limited by the fraction of the time the faster mode can be used
 Little point in improving rare tasks
 Amdahl’s Law defines the overall speedup that can be gained for a task by using the new feature
designed to speed up the execution of the task
Overall speedup = (execution time for the task without the new feature)
                  / (execution time for the task with the new feature)
Amdahl's Law:
ExTime_new = ExTime_old × [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
Best you could ever hope to do:
Speedup_maximum = 1 / (1 − Fraction_enhanced)
Here:
 F (Fraction_enhanced) = 0.4
 S (Speedup_enhanced) = 10
Overall speedup = 1 / [ (1 − F) + F / S ] = 1 / [ (1 − 0.4) + 0.4 / 10 ] = 1 / 0.64 ≈ 1.56
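The same calculation as a tiny helper function (the 0.4 and 10 come from the question; the function name is only illustrative):

# Sketch: Amdahl's Law speedup for an enhancement usable only part of the time.
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

print(round(amdahl_speedup(0.4, 10), 2))   # 1.56
# Upper bound even with an infinitely fast enhancement:
print(round(1.0 / (1.0 - 0.4), 2))         # 1.67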
6. (a) Explicit parallelism vs. implicit parallelism. (Year - 2006)
Solution:
Explicit parallelism:
1. Explicit parallel programming gives the programmer absolute control over the parallel execution.
2. In some instances, explicit parallelism may be avoided with the use of an optimizing compiler that
automatically extracts the parallelism inherent to computations (see implicit parallelism).
3. In computer programming, explicit parallelism is the representation of concurrent computations by
means of primitives in the form of special-purpose directives or function calls.
4. Most parallel primitives are related to process synchronization, communication, or task
partitioning. As they seldom contribute to actually carrying out the intended computation of the
program, their computational cost is often considered parallelization overhead.
Advantage
A skilled parallel programmer takes advantage of explicit parallelism to produce very efficient
code.
Disadvantage
However, programming with explicit parallelism is often difficult, especially for non-computing
specialists, because of the extra work involved in planning the task division and synchronization of
concurrent processes.
Programming with explicit parallelism:

 Message Passing Interface


 Parallel Virtual Machine
 Ease programming language
 Ada programming language
 Java programming language
 JavaSpaces

Implicit parallelism
1. In computer science, implicit parallelism is a characteristic of a programming language that allows a
compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by
some of the language's constructs.
2. A pure implicitly parallel language does not need special directives, operators or functions to enable
parallel execution.
Programming languages with implicit parallelism include LabVIEW and MATLAB M-code.
Example:
If a particular problem involves performing the same operation on a group of numbers (such as taking the sine
or logarithm of each in turn), a language that provides implicit parallelism might allow the programmer to
write the instruction thus:
numbers = [0 1 2 3 4 5 6 7];
result = sin(numbers);
The compiler or interpreter can calculate the sine of each element independently, spreading the effort across
multiple processors if available.
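For comparison, a roughly equivalent sketch in Python using NumPy: the element-wise operation is stated once and the library decides how to apply it, which is the same implicit style (whether NumPy actually uses multiple processors is outside the programmer's control and depends on the build):

# Sketch: the same implicitly data-parallel style expressed with NumPy.
import numpy as np

numbers = np.array([0, 1, 2, 3, 4, 5, 6, 7])
result = np.sin(numbers)   # applied element-wise; no explicit loop or threads in user code
print(result)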
Advantages
Implicit parallelism generally facilitates the design of parallel programs and therefore results in a substantial
improvement of programmer productivity.

Disadvantages
1. Languages with implicit parallelism reduce the control that the programmer has over the parallel
execution of the program.
2. Experiments with implicit parallelism showed that implicit parallelism made debugging difficult.
(b) Write a short note on Instruction-level parallelism.
Solutions:
1. Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can
be performed simultaneously. Consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are
completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated

simultaneously. If we assume that each operation can be completed in one unit of time, then these three
instructions can be completed in a total of two units of time, giving an ILP of 3/2 (a small sketch of this
calculation follows at the end of this answer).
2. A goal of compiler and processor designers is to identify and take advantage of as much ILP as
possible.
3. ILP allows the compiler and the processor to overlap the execution of multiple instructions or
even to change the order in which instructions are executed.
4. How much ILP exists in programs is very application specific. In certain fields, such as graphics
and scientific computing, the amount can be very large.
5. Micro-architectural techniques that are used to exploit ILP include:

 Instruction pipelining where the execution of multiple instructions can be partially overlapped.
 Superscalar execution in which multiple execution units are used to execute multiple instructions in parallel.
 Out-of-order execution where instructions execute in any order that does not violate data dependencies.
 Register renaming which refers to a technique used to avoid unnecessary serialization of program
operations imposed by the reuse of registers by those operations, used to enable out-of-order execution.
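As promised above, a minimal sketch that schedules the three operations of the example by dependence level and computes the ILP (the dependence encoding is mine, not part of the original answer):

# Sketch: ILP = (number of operations) / (length of the critical-path schedule).
# Each operation lists the operations whose results it needs.
deps = {
    "e = a + b": [],
    "f = c + d": [],
    "g = e * f": ["e = a + b", "f = c + d"],
}

level = {}
for op in deps:                      # listed in program order, so producers come first
    level[op] = 1 + max((level[d] for d in deps[op]), default=0)

cycles = max(level.values())         # 2: operations 1 and 2 in parallel, then operation 3
print(f"ILP = {len(deps)}/{cycles} = {len(deps) / cycles}")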

7. (a) How do tightly coupled system differs from loosely coupled ones ? (Year-2008)
Solution:
Shared Memory Systems
 Tightly Coupled Systems
 Uniform and Non-Uniform Memory Access
Tightly Coupled Systems
 Multiple CPUs share memory.
 Each CPU has full access to all shared memory through a common bus.
 Communication between nodes occurs via shared memory.
 Performance is limited by the bandwidth of the memory bus.

Performance:
Performance is potentially limited in a tightly coupled system by a number of factors. These include
various system components such as the memory bandwidth, CPU to CPU communication bandwidth,
the memory available on the system, the I/O bandwidth, and the bandwidth of the common bus.
Uniform and Non-Uniform Memory Access
Shared memory systems may provide uniform memory access (UMA), where every CPU sees the same
latency to all of memory, or non-uniform memory access (NUMA), where the latency depends on which
memory module a CPU accesses.
Advantages
 Memory access is cheaper than inter-node communication. This means that internal
synchronization is faster than using the distributed lock manager.
 Shared memory systems are easier to administer than a cluster.
Shared Disk Systems
Shared disk systems are typically loosely coupled.
Loosely Coupled Systems

 Each node consists of one or more CPUs and associated memory.


 Memory is not shared between nodes.
 Communication occurs over a common high-speed bus.
 Each node has access to the same disks and other resources.
Advantages
 Shared disk systems permit high availability. All data is accessible even if one node dies.
 These systems have the concept of one database, which is an advantage over shared nothing
systems.
 Shared disk systems provide for incremental growth.
Disadvantages
 Inter-node synchronization is required, involving DLM overhead and greater dependency on
high-speed interconnect.
 If the workload is not partitioned well, there may be high synchronization overhead.
 There is operating system overhead of running shared disk software.
(b) What do you understand by quantitative principle of computer design?(year-2008)
Solution:
Quantitative Principles of Design
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. Amdahl’s Law
5. The Processor Performance Equation
1) Taking Advantage of Parallelism
• Increasing throughput of server computer via multiple processors or multiple disks
• Detailed HW design:
– Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic
in the number of bits per operand
– Multiple memory banks searched in parallel in set-associative caches
• Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence.
– Not every instruction depends on immediate predecessor ⇒ executing instructions
completely/partially in parallel possible
– Classic 5-stage pipeline:
1) Instruction Fetch (Ifetch),
2) Register Read (Reg),
3) Execute (ALU),
4) Data Memory Access (Dmem),
5) Register Write (Reg)

2) The Principle of Locality


• The Principle of Locality:
– Program access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
– Temporal Locality (Locality in Time): If an item is referenced, it will tend to be
referenced again soon (e.g., loops, reuse)
– Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are
close by tend to be referenced soon.

3) Focus on the Common Case

• Common sense guides computer design


– Since it is engineering, common sense is valuable
• In making a design trade-off, favor the frequent case over the infrequent case.
– Ex: The instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first.
• The frequent case is often simpler and can be done faster than the infrequent case
– Ex: Overflow is rare when adding two numbers, so improve performance by optimizing the
more common case of no overflow.
• What the frequent case is, and how much performance is improved by making it faster, is quantified
by Amdahl's Law

4) Amdahl’s Law

ExTime_new = ExTime_old × [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
Best you could ever hope to do:
Speedup_maximum = 1 / (1 − Fraction_enhanced)
For Example:
• Fraction = 0.9, Speedup = 10
Speedup_overall = 1 / [ (1 − 0.9) + 0.9 / 10 ] = 1 / 0.19 ≈ 5.3
5) Processor performance equation

CPU time = Seconds / Program
         = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)

 The execution time of a program can be refined into three components


 number of instructions
 number of clock cycles per instruction
 duration of clock cycle
 It is relatively straightforward to count the number of instructions executed, and the number of
processor clock cycles for a program
 We can then calculate the average number of clock cycles per instruction (CPI)
CPI = Number of clock cycles / Number of instructions
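A small numeric sketch of the performance equation (the instruction count, cycle count and clock rate below are invented illustration values):

# Sketch: CPU time = instruction count × CPI × clock cycle time.
instruction_count = 2_000_000      # hypothetical dynamic instruction count
total_cycles = 3_000_000           # hypothetical cycles measured for the program
clock_rate_hz = 1_000_000_000      # 1 GHz, i.e. a cycle time of 1 ns

cpi = total_cycles / instruction_count             # average clock cycles per instruction
cpu_time = instruction_count * cpi * (1 / clock_rate_hz)
print(f"CPI = {cpi}, CPU time = {cpu_time * 1e3:.1f} ms")   # CPI = 1.5, CPU time = 3.0 ms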

8. (a) Write a short note on Massively Parallel Systems (MPP).

Solution:
Massively parallel (MPP) systems have the following characteristics:

 From only a few nodes, up to thousands of nodes are supported.


 The cost per processor may be extremely low because each node is an inexpensive processor.
 Each node has associated non-shared memory.
 Each node has its own devices, but in case of failure other nodes can access the devices of the
failed node (on most systems).
 Nodes are organized in a grid, mesh, or hypercube arrangement.
 Oracle instances can potentially reside on any or all nodes.

System: A Hypercube Example

Note: A hypercube is an arrangement of processors such that each processor is connected to log2 n other
processors, where n is the number of processors in the hypercube. log2 n is said to be the "dimension" of
the hypercube. For example, in an 8-processor hypercube, the dimension is 3; each processor is connected
to three other processors (see the sketch below).
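A tiny sketch of the hypercube connectivity rule (the node numbering and dimension are illustrative): each node is linked to exactly those nodes whose binary addresses differ from its own in one bit.

# Sketch: neighbours of each node in a d-dimensional hypercube (2**d nodes).
DIMENSION = 3                        # 8-processor hypercube, as in the example above

def neighbours(node, dim=DIMENSION):
    """Nodes whose binary address differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dim)]

for node in range(2 ** DIMENSION):
    print(f"node {node:03b} <-> {[format(n, '03b') for n in neighbours(node)]}")
# Every node has exactly DIMENSION = log2(n) neighbours.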

A massively parallel system may have as many as several thousand nodes. Each node may have its own
Oracle instance, with all the standard facilities of an instance

An MPP has access to a huge amount of real memory for all database operations (such as sorts or the
buffer cache), since each node has its own associated memory. This advantage is significant in
long-running queries and sorts, because it helps avoid disk I/O. It is not achievable on 32-bit machines,
which have a 2 GB addressing limit, whereas the total amount of memory on an MPP system may well be
over 2 GB.

As with loosely coupled systems, cache consistency on MPPs must still be maintained across all nodes
in the system. Thus, the overhead for cache management is still present.

Advantages

 Shared nothing systems provide for incremental growth.


 System growth is practically unlimited.
 MPPs are good for read-only databases and decision support applications.
 Failure is local: if one node fails, the others stay up.
Disadvantages
 More coordination is required.
 A process can only work on the node that owns the desired disk.
 If one node dies, processes cannot access its data.
 Physically separate databases which are logically one database can be extremely complex and
time-consuming to administer.
 Adding nodes means reconfiguring and laying out data on disks.
 If there is a heavy workload of updates or inserts, as in an online transaction processing system,
it may be worthwhile to consider data-dependent routing to alleviate contention.

(b) Identify the data hazards while executing the following instruction in DLX pipeline.
Draw the forwarding path to avoid the hazard.
LW R1, 0(R2)
SUB R4, R1, R6
AND R6, R1, R7
OR R8, R1, R9

Solution:

The SUB, AND and OR instructions all read R1, which is loaded by the LW, so each has a RAW
dependence on the load. The loaded data is available only at the end of the MEM stage, so even with
forwarding the load–use hazard between LW and SUB cannot be hidden completely: a pipeline interlock
(hardware stall) detects the hazard and holds SUB for one cycle, after which the value is forwarded from
the MEM/WB pipeline register to the EX stage. The later instructions obtain R1 through forwarding or
from the register file once it has been written back (write in the first half of the cycle, read in the second
half).
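A minimal sketch (with my own hypothetical encoding) that flags the load–use case needing a stall versus the cases forwarding or register-file timing alone can cover:

# Sketch: classify hazards on the loaded register for the sequence above.
load_dest = "R1"
following = [("SUB", ["R1", "R6"]), ("AND", ["R1", "R7"]), ("OR", ["R1", "R9"])]

for distance, (op, srcs) in enumerate(following, start=1):
    if load_dest in srcs:
        if distance == 1:
            action = "load-use hazard: one stall cycle, then forward from MEM/WB to EX"
        elif distance == 2:
            action = "forward from MEM/WB to EX"
        else:
            action = "register file written in first half of the cycle, read in second half"
        print(f"{op}: uses {load_dest} -> {action}")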
