
NOORUL ISLAM COLLEGE OF ENGINEERING, KUMARACOIL

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


2 & 16 Marks Question Answers
CS64- Advanced Computer Architecture
S6 BE(CSE)-AU
Prepared by
R.Suji Pramila
Lecturer/CSE
NIU
UNIT I
1. What is Instruction Level parallelism?
The technique which is used to overlap the execution of instructions and improve
performance is called ILP.
2. What are the approaches to exploit ILP?
The two separable approaches to exploit ILP are,
Dynamic or hardware intensive approach
Static or Compiler intensive approach
3. What is pipelining?
Pipelining is an implementation technique whereby multiple instructions are overlapped
in execution when they are independent of one another.
4. Write down the formula to calculate the pipeline CPI?
The value of the CPI (Cycles per Instruction) for the pipelined processor is the sum of the
base (ideal) CPI and all contributions from stalls.
Pipeline CPI = Ideal pipeline CPI + structural stalls + Data hazard stalls + control stalls.
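The pipeline CPI formula can be tried out with a short Python sketch. The function name and the stall values below are illustrative assumptions, not measured data; each stall term is taken as average stall cycles per instruction.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI = ideal CPI + structural + data hazard + control stalls.

    Every stall argument is an average number of stall cycles per instruction.
    """
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


# Hypothetical workload: ideal CPI of 1.0 plus assumed per-instruction stalls.
cpi = pipeline_cpi(ideal_cpi=1.0, structural_stalls=0.05,
                   data_hazard_stalls=0.20, control_stalls=0.15)
print(f"Pipeline CPI = {cpi:.2f}")
```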
5. What is loop level parallelism?
Loop level parallelism is a way to increase the amount of parallelism available among
instructions by exploiting parallelism among the iterations of a loop.
6. Give the methods to enhance performance of ILP?
To obtain substantial performance enhancements, the ILP across multiple basic blocks
are exploited using
loop level parallelism
vector instructions
7. List out the types of dependences.
There are three different types of dependences
Data dependences
Name dependences
Control dependences
8. What is Data hazard?
A hazard is created whenever there is dependence between instructions and they are close
enough that the overlap caused by pipelining, or other reordering of instructions, would change
the order of access to the operand involved in the dependence.
9. Give the classification of Data hazards
Data Hazards are classified into three types depending on the order of read and write
accesses in the instructions
RAW (Read After Write)
WAW (Write After Write)
WAR (Write After Read)
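The three hazard classes can be illustrated with a small Python sketch that compares the registers read and written by two instructions in program order. The dictionary representation and the function name are assumptions made for illustration; the sketch only reports the potential hazards, since whether a stall actually occurs depends on the pipeline.

```python
def classify_hazards(first, second):
    """Return the potential hazard types between two instructions.

    `first` precedes `second` in program order. Each instruction is a dict
    with 'reads' and 'writes' sets of register names (an assumed encoding).
    """
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")   # second reads what first writes
    if first["writes"] & second["writes"]:
        hazards.add("WAW")   # both write the same register
    if first["reads"] & second["writes"]:
        hazards.add("WAR")   # second writes what first reads
    return hazards


add = {"reads": {"R2", "R3"}, "writes": {"R1"}}   # ADD R1, R2, R3
sub = {"reads": {"R1", "R5"}, "writes": {"R4"}}   # SUB R4, R1, R5
mul = {"reads": {"R6", "R7"}, "writes": {"R2"}}   # MUL R2, R6, R7

print(classify_hazards(add, sub))   # RAW through R1
print(classify_hazards(add, mul))   # WAR through R2
```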
10. List out the constraints imposed by control dependences?
The two constraints imposed by control dependencies are
An instruction that is control dependent on branch cannot be moved before the branch
so that its execution is no longer controlled by the branch.
An instruction that is not control dependent on a branch cannot be moved after the
branch so that its execution is controlled by the branch.
11. What are the properties used for preserving control dependence?
Control dependence is preserved by two properties in a simple pipeline.
Instructions execute in program order
Detection of control or branch hazards
12. Define Dynamic Scheduling?
Dynamic scheduling is a technique in which the hardware rearranges the instruction
execution to reduce the stalls while maintaining data flow and exception behavior.
13. List the advantages of dynamic scheduling?
It handles dependences that are unknown at compile time.
It simplifies the compiler.
Uses speculation techniques to improve the performance.
14. What is scoreboarding?
Scoreboarding is a technique that allows instructions to execute out of order when sufficient
resources are available and there are no data dependences. An instruction that would cause a
WAW or WAR hazard is stalled until that hazard is cleared.
15. What are the advantages of Tomasulo's approach?
Distribution of the hazard detection logic
Elimination of WAR and WAW hazard
16. What are the types of branch prediction?
There are two types of branch prediction. They are,
Dynamic branch prediction
Static branch prediction
17. Define Amdahl's Law.
Amdahl's Law states that the performance improvement gained by improving some portion of
a computer is limited by the fraction of time that the improved portion is used.
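Amdahl's Law is usually written as Speedup = 1 / ((1 - F) + F / S), where F is the fraction of execution time that benefits from the enhancement and S is the speedup of that fraction. A minimal Python sketch follows; the function name and the sample numbers are illustrative assumptions.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is made `speedup_enhanced` times faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the time is sped up by a factor of 10.
overall = amdahl_speedup(0.4, 10)
print(f"Overall speedup = {overall:.4f}")
```

Even with a tenfold improvement, the overall gain stays well below 10 because 60% of the time is unaffected.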
18. What are the things present in Dynamic branch prediction?
It uses two things they are,
Branch prediction buffer
Branch history table
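A branch prediction buffer can be sketched as a small table of 2-bit saturating counters indexed by the low-order bits of the branch address. The table size, the indexing by `pc >> 2` (assuming 4-byte instructions), and the class name are assumptions for illustration only.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters (a sketch)."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        self.counters = [0] * entries

    def _index(self, pc):
        # Assumed indexing: drop the byte offset, keep the low-order bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = TwoBitPredictor()
outcomes = [True] * 9 + [False]   # a loop branch: taken 9 times, then exits
correct = 0
for taken in outcomes:
    if predictor.predict(0x40) == taken:
        correct += 1
    predictor.update(0x40, taken)
print(f"{correct} of {len(outcomes)} predictions correct")
```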
19. Define Correlating branch prediction?
Branch prediction that uses the behavior of other branches to make a prediction is called
correlating branch prediction.
20. What are the basic ideas of pipeline scheduling?
The basic ideas of pipeline scheduling are,
To keep pipeline full: Find sequence of unrelated instructions that can be
overlapped in the pipeline.
To avoid pipeline stall: Separate dependent instructions by a distance in clock
cycles equal to the pipeline latency of that source instruction.
21. What are the four fields involved in ROB?
ROB contains four fields,
Instruction type
Destination field
Value field
Ready field
22. What is reservation station?
In Tomasulo's scheme, register renaming is provided by reservation stations. The basic
idea is that the reservation station fetches and buffers an operand as soon as it is available,
eliminating the need to get the operand from a register.
23. What is ROB?
ROB stands for reorder buffer. It supplies operands in the interval between completion of
instruction execution and instruction commit. ROB is similar to the store buffer in Tomasulo's
algorithm.
24. What is imprecise exception?
An exception is imprecise if the processor state when an exception is raised does not look
exactly as if the instructions were executed sequentially in strict program order.
25. What are the two possibilities of imprecise exceptions?
The pipeline may have already completed instructions that are later in program order
than the instruction causing the exception.
The pipeline may have not yet completed some instructions that are earlier in program
order than the instruction causing the exception.
26. What are the two main features preserved by maintaining both data and control dependence?
Exception behavior
Data flow
27. What are the types of name dependence?
Anti dependence
Output dependence
28. What is anti dependence?
An anti dependence between instruction i and instruction j occurs when instruction j
writes a register or memory location that instruction i reads. The original ordering must be
preserved to ensure that i reads the correct value.
29. What is output dependence?
An output dependence occurs when instruction i and instruction j write the same register
or memory location. The ordering between the instructions must be preserved to ensure that the
value finally written corresponds to instruction j.
30. What is register renaming?
Renaming of register operand is called register renaming. It can be either done statically
by the compiler or dynamically by the hardware.
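Register renaming can be sketched in Python as giving every destination a fresh name and redirecting later source operands to the newest name. The tuple encoding `(dest, src1, src2)`, the `P<n>` physical names and the function name are assumptions for illustration; real hardware uses reservation stations or a physical register file.

```python
def rename(instructions):
    """Rename destinations so that WAR and WAW name dependences disappear.

    `instructions` is a list of (dest, src1, src2) architectural register
    names in program order (an assumed encoding).
    """
    mapping = {}      # architectural register -> latest renamed register
    next_phys = 0
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the most recent name; unmapped ones keep their name.
        s1 = mapping.get(src1, src1)
        s2 = mapping.get(src2, src2)
        new_dest = f"P{next_phys}"
        next_phys += 1
        mapping[dest] = new_dest
        renamed.append((new_dest, s1, s2))
    return renamed


code = [
    ("F0", "F2", "F4"),    # F0 = F2 op F4
    ("F6", "F0", "F8"),    # F6 = F0 op F8    (RAW on F0)
    ("F8", "F10", "F14"),  # F8 = F10 op F14  (WAR on F8)
    ("F6", "F10", "F8"),   # F6 = F10 op F8   (WAW on F6, RAW on F8)
]
for instruction in rename(code):
    print(instruction)
```

After renaming, only the true (RAW) dependences remain: the second instruction still reads the first one's result, and the fourth still reads the third one's result.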
UNIT 2
1. Define VLIW.
VLIW is a technique for ILP by executing instructions without dependencies in parallel.
The compiler analyzes the program and detects operations to be executed in parallel; such
operations are packed into one large instruction.
2. List out the advantages of VLIW processor.
Simple hardware
Number of functional units can be increased without needing additional
sophisticated hardware to detect parallelism like in superscalars.
Good compilers can detect parallelism based on global analysis of the whole
program.
3. Define EPIC
Epic is Explicit Parallel Instruction Computing
It is an architecture framework proposed by HP.
It is based on VLIW and was designed to overcome the key limitations of VLIW
while simultaneously giving more flexibility to compiler writers.
4. What is loop level analysis?
Loop level analysis involves determining what dependences exist among the operands in a
loop across the iterations of that loop, in particular whether data accesses in later iterations
are data dependent on data values produced in earlier iterations.
5. What are the types of Data dependencies in loops?
Loop Carried dependencies
Not loop carried dependence
6. What is loop carried dependence?
Data dependence between different loop iterations (data produced in earlier iterations
used in a later one) is called a loop carried dependence.
7. What are the tasks in finding the dependence in a program?
There are 3 tasks. They are
Have good scheduling of code
Determine which loop might contain parallelism
Eliminate name dependence
8. Define dependence analysis algorithm.
A dependence analysis algorithm is an algorithm used by the compiler to detect
dependences, based on the assumptions that
Array indices are affine
There exist GCD of the two affine indices
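A common form of this check is the GCD test: if an array element is written at index a*i + b and read at index c*i + d, a loop carried dependence can exist only if GCD(c, a) divides (d - b). A minimal Python sketch, with a hypothetical function name; the test is conservative, so a True result means a dependence is possible, not certain.

```python
from math import gcd


def gcd_test_may_depend(a, b, c, d):
    """GCD test for a write to x[a*i + b] and a read of x[c*i + d].

    Returns False only when a dependence is impossible.
    """
    g = gcd(a, c)
    if g == 0:
        # Both indices are constants: they collide only if they are equal.
        return b == d
    return (d - b) % g == 0


# x[2*i + 3] = x[2*i] * 5.0  ->  a=2, b=3, c=2, d=0
print(gcd_test_may_depend(2, 3, 2, 0))
# x[i + 1] = x[i] + 1.0      ->  a=1, b=1, c=1, d=0
print(gcd_test_may_depend(1, 1, 1, 0))
```

In the first loop GCD(2, 2) = 2 does not divide -3, so no dependence is possible; in the second the test cannot rule one out.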
9. What is copy propagation?
Copy propagation is an optimization, used together with algebraic simplification of
expressions, which eliminates operations that copy values.
10. What is tree-height reduction technique?
Tree-height reduction is optimization which reduces the height of the tree structure
representing a computation, making it wider but shorter.
11. What are the components of software pipeline loop?
A software pipeline loop consists of a loop body, start-up code and clean-up
code.
Start-up code executes the code left out from the first original loop iterations.
Clean-up code executes the instructions left over from the last original iterations.
12. What is trace scheduling?
Trace scheduling is a way to organize the process of global code motion. It simplifies
instruction scheduling by incurring the cost of possible code motion on the less critical
(less frequently executed) paths.
13. List out steps used for trace scheduling.
Trace selection
Trace compaction
14. Define Inter-procedural analysis.
When a procedure has pointer parameters, analyzing it requires looking across the
boundaries of that particular procedure. Such analysis is called interprocedural analysis.
15. What is software pipelining?
It is a technique for reorganizing loop such that each iteration in the code is made from
instructions chosen from different iterations of original loop.
16. Define critical path.
Critical path is defined as the longest sequence of dependent instructions in a program.
17. Define IA-64 processor.
The IA-64 is a RISC-Style, register-register instruction set with the features designed to
support compiler based exploitation of ILP.
18. What is CFM and what is its use?
CFM stands for Current Frame Pointer
CFM pointer points to the set of registers to be used by a given procedure.
19. What are the parts of CFM pointer?
There are two parts. They are
Local area - Used for local storage
Output area - Used to pass values to any called procedure.
20. What is Itanium processor?
Itanium processor is an implementation of the Intel IA-64 architecture. It is capable of
issuing up to 6 instructions per clock cycle, including up to 3 branches and 2 memory references.
21. What are the parts of 10 stage pipeline in Itanium processor?
Front end
Instruction delivery(EXP, REN)
Operand delivery(WLD, REG)
Execution(EXE, DET, WRB)
22. What are the limitations of ILP?
Limitations on hardware model
Limitations on window size and maximum issue count
Effects of finite registers
Effects of imperfect alias analysis
23. List the two techniques for eliminating dependent computations
Software pipelining
Trace scheduling
24. Define Trace selection and Trace compaction
Trace Selection
Trace selection tries to find a likely sequence of basic blocks whose operations will be
put together into a smaller number of instructions. This sequence is called a trace.
Trace Compaction
Trace compaction tries to squeeze the trace into a small number
of wide instructions. Trace compaction is code scheduling hence it attempts to move
operations as early as it can in a sequence packing the operations into as few wide
instructions as possible.
25. Define Superblocks.
Superblocks are formed by a process similar to that used for traces, but are a form of
extended basic blocks, which are restricted to a single entry point but allow multiple exits.
26. Use of conditional or predicated instructions.
Conditional or predicated instructions are used to eliminate branches, converting a control
dependence into a data dependence and potentially improving performance.
27. Define Instruction Group
Instruction group is a sequence of consecutive instructions with no register data dependencies
among them. All the instructions in a group could be executed in parallel if sufficient
hardware resources existed and if any dependences through memory were preserved.
28. Use of template field in bundle.
The 5 bit template field within each bundle describes both the presence of any stops
associated with the bundle and the execution unit type required by each instruction within
the bundle.
29. List the two types of speculation supported by IA 64 processor.
Control Speculation
Memory reference speculation
30. Define Advance loads.
Memory reference support in the IA 64 uses a concept called advanced loads. Advance load
is a load that has been speculatively moved above store instructions on which it is potentially
dependent. To speculatively perform a load the ld.a instruction is used.
31. Define ALAT
Executing an advanced load instruction creates an entry in a special table called the ALAT. It
stores both the register destination of the load and the address of the accessed memory
location. When a store is executed, an associative look up against the active ALAT
entries is performed. If there is an ALAT entry with the same memory address
as the store, the ALAT entry is marked as invalid.
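The ALAT behaviour described above can be modelled with a small Python sketch. The class and method names are hypothetical, invalidation is modelled by deleting the entry, and the `check` method stands in for the later check that decides whether the speculative load must be redone; this is a simplified model, not the actual hardware organization.

```python
class ALAT:
    """Toy model of the advanced load address table."""

    def __init__(self):
        self.entries = {}   # destination register -> loaded memory address

    def advanced_load(self, dest_reg, address):
        # An advanced load (ld.a) records its destination and address.
        self.entries[dest_reg] = address

    def store(self, address):
        # A store looks up all active entries and invalidates any match.
        matching = [reg for reg, addr in self.entries.items() if addr == address]
        for reg in matching:
            del self.entries[reg]

    def check(self, dest_reg):
        # True if the entry survived, so the speculative load is still valid.
        return dest_reg in self.entries


alat = ALAT()
alat.advanced_load("r1", 0x100)
alat.store(0x200)              # different address: entry stays valid
valid_after_unrelated_store = alat.check("r1")
alat.store(0x100)              # same address: entry is invalidated
valid_after_conflicting_store = alat.check("r1")
print(valid_after_unrelated_store, valid_after_conflicting_store)
```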
32. What are the functional units in Itanium Processor?
There are nine functional units in the Itanium processor.
Two I units
Two M units
Three B units
Two F units
All the functional units are pipelined.
33. Define Scoreboard
In the Itanium processor, the 10 stage pipeline is divided into 4 parts. In the operand delivery
part, a scoreboard is used to detect when individual instructions can proceed, so that a stall of
one instruction in a bundle need not cause the entire bundle to stall.
34. Define Book Keeping Code
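A trace may have entries and exits in its middle. When trace scheduling moves code across
such an entry or exit point, compensation code must be added on the off-trace path to keep
the program correct. This code is known as bookkeeping code.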
Unit-3
1. Define cache coherence problem?
Cache coherence problem describes how two different processors can have two different
values for the same memory location.
2. What are the two aspects of cache coherence problem?
i. Coherence- It determines what value can be returned by a particular read
operation.
ii. Consistency- It determines when the value may be returned by the read
operation.
3. What are the two types of cache coherence protocol?
i. Directory based protocol.
ii. Snooping protocol.
4. Define Directory based protocol.
In a directory based protocol, the sharing status of a block of physical memory is kept in
just one location, called the directory. The directory is consulted to find which caches hold
copies of the block.
5. Name the different types of snooping protocol.
i. invalidate protocol
ii. update/write broadcast protocol.
6. Difference between write Update and invalidate protocol.
Write update:
i. Multiple write broadcast is present
ii. Here they consider separate word for each cache block
iii. Access time is less
Invalidate:
i. Only one invalidation is present
ii. Invalidation is performed for entire cache block
iii. Access time is high
7. What are the different types of access in distributed shared memory architecture?
i. Local:
If the processor refers to its local memory then it is called local access.
ii. Remote:
If the processor refers to another processor's memory then it is called remote access.
8. What are the disadvantages of remote access?
Compiler mechanism for cache coherence is very limited
Without the cache coherence property the multiprocessor system loses the
advantage of fetching and using multiple words
Prefetch is useful only when the multiprocessor fetches multiple words
9. What are the states available in directory based protocol?
i. Shared:- One or more processors can have copies of the same data block.
ii. Uncached :- No processor has the copy of data block.
iii. Exclusive:- Exactly one processor has the copy of data block.
10. What are the nodes available in distributed system?
i. Local Node
ii. Home Node
iii. Remote Node
11. Define Synchronization.
Synchronization mechanisms are built with user level software routines,
which depend on hardware supplied synchronization instructions.
12. Name the basic hardware primitives.
i. Atomic Exchange
ii. Test and set
iii. Fetch and Increment
13. Define spinlock.
It is a lock that a processor continuously tries to acquire, spinning around a loop until it
succeeds.
It is mainly used when the programmer expects the lock to be held for a short period of time.
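The spinning behaviour can be sketched in Python. Python exposes no hardware atomic exchange, so this sketch uses a non-blocking `threading.Lock.acquire(blocking=False)` call as a stand-in for an atomic test-and-set; the `SpinLock` class and the worker function are hypothetical names for illustration.

```python
import threading


class SpinLock:
    """Spin lock sketch: retry a test-and-set style operation until it succeeds."""

    def __init__(self):
        # The inner lock only plays the role of the atomic lock variable.
        self._flag = threading.Lock()

    def acquire(self):
        # acquire(blocking=False) returns at once: True if the flag was free
        # and is now taken, False if another thread holds it.
        while not self._flag.acquire(blocking=False):
            pass   # spin

    def release(self):
        self._flag.release()


counter = 0
lock = SpinLock()


def worker(iterations):
    global counter
    for _ in range(iterations):
        lock.acquire()
        counter += 1      # critical section, kept short
        lock.release()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```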
14. What are the mechanisms to implement locks?
There are two methods to implement the locks.
i. Implementing lock without using cache coherence
ii. Implementing lock using cache coherence.
15. What are the advantages of using spin lock?
There are two advantages of using spin lock
i. They have low overhead
ii. Performance is high
16. Name the synchronization mechanisms for large scale multiprocessor.
i. Exponential back off
ii. queuing locks
iii. combining tree
17. What are the two primitives used for implementing synchronization?
Lock Based Implementation
Barrier based Implementation
18. Define sequential consistency.
It requires that the result of any execution be the same as if the memory accesses executed
by each processor were kept in order and the accesses among different processors were
arbitrarily interleaved.
It reduces the amount of incorrect execution
19. Define multithreading.
Multithreading is the execution of multiple threads that share a common memory or a
common processor, with their execution done in an overlapping fashion.
20. What are the types of multi threading?
i. Fine grained multithreading:- It has the ability to switch threads for each
instruction
ii. coarse grained multithreading:- It has the ability to switch the threads only for
costly stalls.
Unit-4
1. Define cache.
Cache is the name given to the first level of the memory hierarchy encountered once
the address leaves the CPU.
Eg: file caches, name caches.
2. What are the factors on which the time for a cache miss depends?
The time required for the cache miss depends on both
Latency
Bandwidth
3. What is the principle of locality?
Programs access a relatively small portion of the address space at any instant of
time. This property is called the principle of locality.
4. What are pages?
The address space is usually broken into fixed-size blocks, called pages. Each
page resides either in main memory or on disk.
5. What is called memory stall cycles?
The number of cycles during which the CPU is stalled waiting for a memory
access is called memory stall cycles.
6. Write down the formula for calculating average memory access time?
Average memory access time=Hit time+Miss rate*Miss penalty.
Here hit time is the time to hit in the cache. The formula can help us decide
between split caches and a unified cache.
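The average memory access time formula can be evaluated with a short Python sketch. The function name and the cache parameters are illustrative assumptions; hit time and miss penalty are in clock cycles.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (times in clock cycles)."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
amat = average_memory_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
print(f"AMAT = {amat:.2f} cycles")
```

To compare a split and a unified cache, the same function can be applied to each design with its own hit time and miss rate.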
7. What are the techniques to reduce the miss rate?
Larger block size
Larger caches
Higher associativity
Way prediction and pseudo associative caches
Compiler optimizations.
8. What are the techniques to reduce hit time?
Small and simple cache: direct mapped
Avoid address translation during indexing of the cache
Pipelined cache access
Trace cache
9. List out the types of storage devices.
Magnetic storages : disk, floppy, tape
Optical storages : compact disks (CD), digital video/versatile
disks (DVD)
Electrical storage : flash memory
10. What is sequence recorded?
The sequence recorded on the magnetic media is a sector number, a gap, the
information for that sector including error correction code, a gap, the sector number of
the next sector and so on.
11. What is termed as cylinder?
The term cylinder is used to refer to all the tracks under the arms at a given point
on all surfaces.
12. List the components to a disk access.
There are three mechanical components to a disk access:
Rotation latency
Transfer time
Seek time
13. What is average seek time?
Average seek time is the sum of the time for all possible seeks divided by the
number of possible seeks. Average seek times are advertised to be 5 ms to 12 ms.
14. What is transfer time
Transfer time is the time it takes to transfer a block of bits, typically a sector,
under the read-write head. This time is a function of the block size, disk size, rotation
speed, recording density of the track, and speed of the electronics connecting the disk to
computer.
15. Write the formula to calculate the CPU execution time.
CPU execution time=(CPU clock cycles+ memory stall cycles)*clock cycle time.
16. Write the formula to calculate the CPU time.
CPU time=(CPU execution clock cycles+ memory stall clock cycles)* clock cycle
time.
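The CPU time formula can be checked with a short Python sketch. The function name and the cycle counts are illustrative assumptions, not measurements.

```python
def cpu_time(cpu_clock_cycles, memory_stall_cycles, clock_cycle_time):
    """CPU time = (CPU clock cycles + memory stall cycles) * clock cycle time."""
    return (cpu_clock_cycles + memory_stall_cycles) * clock_cycle_time


# Hypothetical program: 1e9 execution cycles, 2e8 memory stall cycles,
# and a 0.5 ns clock cycle (a 2 GHz clock).
seconds = cpu_time(1_000_000_000, 200_000_000, 0.5e-9)
print(f"CPU time = {seconds:.3f} s")
```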
17. Define miss penalty for an out of order execution processor.
For an out of order execution processor, miss penalty is defined as follows.
Memory stall cycles/Instruction = (Misses/Instruction) * (Total miss latency -
Overlapped miss latency)
18. What are the techniques available to reduce cache penalty or miss rate via parallelism?
The three techniques that overlap the execution of instructions are
1.Non blocking caches to reduce stalls on cache miss- to match the out of
order processors
2.Hardware prefetching of instructions and data
3.Compiler- controlled prefetching.
19. How are the conflict misses divided?
The four divisions of conflict misses are,
Eight way
Four way
Two way
One way
20. List the advantage of memory hierarchy?
Memory hierarchy takes advantage of
a.locality
b.cost/performance of memory technologies
22. What is the goal of memory hierarchy?
The goal is to provide a memory system with
*cost almost as low as the cheapest level of memory
*speed almost as fast as the fastest level
23. Define cache hit ?
When the CPU finds a requested data item in the cache, it is called a cache hit.
*Hit Rate: the fraction of cache accesses found in the cache
*Hit Time: time to access the upper level, which consists of RAM access
time + time to determine hit/miss
24. Define cache miss?
When the CPU does not find a data item it needs in the cache, a cache miss occurs.
*Miss Rate = 1 - (Hit Rate)
*Miss Penalty: time to replace a block in the cache + time to deliver the block to the
processor
25. What does Latency and Bandwidth determine?
-Latency determines the time to retrieve the first word of the block
-Bandwidth determines the time to retrieve the rest of this block
26. What are the types of locality?
*Temporal locality(Locality in time)
*Spatial locality(Locality in space)
27. How does page fault occur?
When the CPU references an item within a page that is not present in the cache or main
memory, a page fault occurs, and the entire page is moved from the disk to main memory.
28. What is called the miss penalty?
The number of memory stall cycles depends on both the number of misses and the
cost per miss, which is called the miss penalty
29. What is Average memory access time?
Average memory access time is a better measure of memory hierarchy performance than
miss rate for processors with in-order execution.
30. What are the categories of cache miss(3cs of cache miss)
*compulsory
*capacity
*conflict
31. What are the techniques to reduce miss penalty?
*multi-level caches
*critical word first and early restart
*giving priority to read misses over writes
*Merging write buffer
*victim caches
UNIT-5
1) What is the function of Power Processing Unit?
*a full set of 64-bit PowerPC registers.
*32 128-bit vector multimedia registers.
*a 32 KB L1 data cache.
*a 32 KB L1 instruction cache.
2) List out the disadvantages of Heterogeneous multi-core processors?
*Developer productivity.
*Portability.
*Manageability.
3) Define Software Multithreading
Software multithreading is a piece of software that is aware of more than one
core/processor and can use these to be able to simultaneously complete multiple tasks.
4) Define Hardware Multithreading
Hardware multithreading is a form of multithreading that allows multiple threads to share
the functional units of a single processor in an overlapping fashion.
5) Difference between Software and Hardware Multithreading
*Multithreading(Computer Architecture), multithreading in hardware.
*Thread(Computer Science), multithreading in software.
6) List some advantages of Software Multithreading.
*Increased responsiveness and worker productivity.
-Increased application responsiveness when different tasks run in parallel.
*Improved performance in parallel environments.
-When running computations on multiple processors.
*More computations per cubic foot of data center.
-Web based applications are often multi-threaded in nature.
7) List out the two approaches of Hardware Multithreading.
The two main approaches in Hardware multithreading are
*Fine-grain Multithreading.
*Coarse-grain Multithreading.
8) Define Simultaneous Multithreading(SMT)
SMT is a variation on multithreading that uses the resources of a multiple issue,
dynamically scheduled processor to exploit TLP at the same time it exploits ILP, i.e., it converts
thread-level parallelism into more ILP.
9) Give the features exploited by SMT.
It exploits the following features of modern processors
*Multiple Functional Units.
-Modern Processors typically have more functional units available than a
single thread can utilize.
*Register Renaming and Dynamic Scheduling.
-Multiple instructions from independent threads can co-exist and co-
execute.
10) What are the Design challenges of SMT?
The Design Challenges of SMT processor includes the following-
*Larger register files needed to hold multiple contexts.
*Not affecting clock cycle time.
*Instruction issue-more candidate instructions need to be considered.
*Instruction completion-choosing which instructions to commit may be challenging.
*Ensuring that cache and TLB conflicts generated by SMT do not degrade performance.
11) Compare the SMT processor with the base Superscalar Processor
The SMT processor are compared to the base superscalar processor in several key
measures
*Utilization of functional units.
*Utilization of Fetch units.
*Accuracy of branch predictors.
*Hit rates of primary caches.
*Hit rates of secondary caches.
12) List the factors that limits the issue slot usage
The issue slot usage is limited by the following factors.
*Imbalances in resource needs.
*Resource availability over multiple threads.
*Number of active threads considered.
*Finite Limitations of buffer.
*Ability to fetch enough instruction from multiple threads.
13) Define Multi-core microprocessor
A multi-core microprocessor is one that combines two or more separate processors in one
package.
14) What is Heterogeneous Multi-core processor?
Heterogeneous Multi-core processor is a processor in which multiple cores of different
types are implemented in one CPU.
15) List out the advantages of Heterogeneous Multi-core processors.
*Massive parallelism.
*Specialization of Hardware for tools.
16) List out the Disadvantages of Heterogeneous Multi-core processors.
*Developer productivity.
*Portability.
*Manageability.
17) What is IBM cell processor?
The IBM Cell processor is a heterogeneous multi-core processor composed of a
control-intensive processor core and compute-intensive SIMD processor cores, each with its
own distinguishing features.
18) List the components of IBM cell architecture
*Power Processing Elements(PPE).
*Synergistic Processor Elements(SPE).
*I/O controller.
*Element Interconnect Bus(EIB).
19) What are the components of PPE?
The PPE is made up of two main units:
1.Power Processor Unit(PPU)
2.Power Processor Storage Subsystem(PPSS)
20) What is Memory Flow Controller(MFC)?
The Memory Flow Controller is the interface between the Synergistic Processor Unit (SPU)
and the rest of the Cell chip. Specifically, the MFC interfaces the SPU with the EIB.
16 MARKS
1. Explain the concepts and challenges of Instruction-Level Parallelism.
Define Instruction-Level Parallelism
Data dependences and hazards
o Data dependences
o Name dependences
o Data hazards
Control Dependences
2. Explain dynamic scheduling using Tomasulo's approach.
Explain the 3 steps:
o Issue
o Execute
o Write result
Explain the 7 fields of reservation station
Figure: The basic structure of a MIPS floating-point unit using Tomasulo's
algorithm.
3. Explain the techniques for reducing branch costs with dynamic hardware prediction.
Define basic branch prediction and branch-prediction buffers.
Figure: The states in a 2-bit prediction scheme
Correlating branch predictors
Tournament predictors: Adaptively combining local and global predictors
Figure: state transition diagram for tournament predictors with 4 states.
4. Explain in detail about hardware-based speculation.
Define hardware speculation, instruction commit, reorder buffer.
Four steps involved in instruction execution.
o Issue
o Execute
o Write result
o Commit
Figure: The basic structure of a MIPS FP unit using Tomasulo's algorithm and
extended to handle speculation
Multiple issue with speculation
5. Explain in detail about basic compiler techniques for exposing ILP.
Basic pipeline scheduling and loop unrolling.
Example codes
Using loop unrolling and pipeline scheduling with static multiple issue
6. Explain in detail about static multiple issue using VLIW approach.
Define VLIW.
The basic VLIW approach:
o Explain about the registers used.
o Functional units used.
o Complex global scheduling.
Example code
Technical and logistical problems.
7. Explain in detail about advanced compiler support for exposing and exploiting ILP.
Detecting and Enhancing loop-level parallelism.
o Finding dependences
o Eliminating dependent computations.
Software pipelining: Symbolic loop unrolling
o Example code fragment.
Global code scheduling
o Trace scheduling: focusing on the critical path.
o Super blocks
o Example code fragment.
8. Explain in detail about hardware support for exposing more parallelism at compile time.
Conditional or Predicated instructions
o Example codes
Compiler speculation with hardware support.
o Hardware support for preserving exception behavior
o Hardware support for memory reference speculation
o Example codes.
9. Explain in detail about the Intel IA-64 instruction set architecture.
The IA-64 register model
Instruction format and support for explicit parallelism.
Instruction set basics
Predication and Speculation support
The Itanium processor
o Functional units and instruction issue
o Itanium performance
10. Explain the limitations of ILP.
Hardware model
Limitations of the window size and maximum issue count.
Effects of realistic branch and jump prediction
Effects of finite register.
Effects of imperfect alias analysis.
11. Explain in detail about symmetric shared memory architecture.
Define multiprocessor cache coherence.
Basic schemes for enforcing coherence.
o Define directory based
o Define snooping
Snooping protocols.
Basic implementation techniques.
An example protocol.
12. Explain the performance of symmetric shared-memory multiprocessors.
Define true sharing and false sharing.
Performance measurements of the commercial workload.
Performance of the multiprogramming and OS workload.
Performance for the scientific/technical workload.
13. Explain in detail about synchronization.
Basic hardware primitives.
o Define atomic exchange.
o Define test and set, fetch-and-increment, load linked and store conditional
instructions.
Implementing locks using coherence.
Synchronization performance challenges.
o Barrier synchronization
o Code for simple and sense reversing barrier.
Synchronization mechanisms for larger-scale multiprocessors.
o Software implementations
o Hardware primitives
14. Explain the models of memory consistency.
Sequential consistency.
Relaxed consistency models.
o W->R ordering
o W->W ordering
o R->W and R->R ordering
15. Explain the performance of symmetric shared-memory and distributed shared-memory
multiprocessors.
Symmetric shared-memory multiprocessors:
Define true sharing and false sharing.
Performance measurements of the commercial workload.
Performance of the multiprogramming and OS workload.
Performance for the scientific/technical workload.
Distributed shared-memory multiprocessor:
Miss rate
Memory access cost unit
16. Explain in detail about reducing cache miss penalty.
First miss penalty reduction technique: multilevel caches.
Second miss penalty reduction technique: critical word first and early restart.
Third miss penalty reduction technique: giving priority to read misses over writes.
Fourth miss penalty reduction technique: merging write buffer
Fifth miss penalty reduction technique: victim caches
17. Explain in detail about reducing miss rate.
First miss rate reduction technique: Larger block size.
Second miss rate reduction technique: Larger caches.
Third miss rate reduction technique: Higher associativity.
Fourth miss rate reduction technique: Way prediction and Pseudoassociative
caches
Fifth miss rate reduction technique: Compiler optimizations.
o Loop interchange
o Blocking
18. Explain in detail about memory technology.
DRAM technology.
SRAM technology.
Embedded processor memory technology: ROM and Flash
Improving memory performance in a standard DRAM chip
Improving memory performance via a new DRAM interface: RAMBUS
Comparing RAMBUS and DDR SDRAM.
19. Explain the types of storage devices.
Magnetic Disks
The future of magnetic disks.
Optical disks
Magnetic tapes
Automated tape libraries
Flash memory
20. Explain in detail about Buses-Connecting I/O devices to CPU/Memory
Bus design decisions
Bus standards
Interfacing storage devices to the CPU- Figure: A typical interface of I/O devices
and an I/O bus to the CPU-memory bus.
Delegating I/O responsibility from the CPU
21. Explain in detail about SMT.
Converting thread-level parallelism to instruction-level parallelism
Design challenges in SMT processors
Potential performance advantages from SMT
22. Explain about CMP architecture.
Define CMP
Architecture
Explanation
23. Explain detail about software and hardware multithreading
o Software multithreading
o Hardware multithreading
o Explanation
24. Explain about heterogeneous multi core processor.
o Define multi core processor.
o heterogeneous multi core processor
o Diagram
25. Explain about IBM cell processor
Define cell processor
Architecture
Explanation