Sei sulla pagina 1di 14

Name:

SID (last 6):

Email:

CMSC 411 Computer Architecture


Sample Final
Draft of Answers
Fall, 2019
Michelle Carter McElvany Hugue, PhD

rtFp

Instructions:
• The Goal: Show me what you know.
• The Method: For each section, read the directions carefully.
• The Plan: If you don’t know what to do, skip to another problem.
• The Warning: There is about a test-and-a-half of work here–don’t be intimidated.
• The Requirement: Explain your answer clearly and with appropriate assumptions.
• The Reward: Generous partial credit for correct and clear partial answers.
• The Penalty: No partial credit for content-free answer. Nothing relevant or correct, no credit. So don’t waste
the effort.
• The Expected Outcome: You never have to take me again, unless you really want to do so.

1
Scratch Paper–Just in Case

2
Problem 1: Now, It’s Marsupials? (30 pts)
Welcome to the land of uncertainty. Circle either True or False as appropriate for the statement below. If
you feel the need to write an explanation, don’t–it won’t be graded; so, just circle False instead.

True False [1.1 To be meaningful, benchmarks must specify a suite of programs to be run; a set of
compiler parameter requirements; and a set of hardware specifications for which expected results are
available from a trusted source.

True False [1.2 Decreasing the average IC for the CPU will never decrease the execution times of
programs run on that CPU.

True False 1.3 In the context of conditional branches, the address of the “taken” instruction is in the
PC (program counter) after the branch instruction is fetched.

True False [1.4 Executing the MIPS command LH R0, 8(R2) will never cause a byte alignment error,
regardless of the value in register R2.

True False [1.5 The floating point pipe-like-pipe (FPLP) increased the amount of ILP available by
using pipelined functional units where possible.

True False [1.6 In the superscalar Tomasulo algorithm, instructions start executing in compiler order.

True False [1.7 In Tomasulo’s algorithm, instructions leave the reservation stations and enter the
functional units in the order that they were issued.

True False [1.8 In a superscalar machine (also called multiple-issue), any two instructions may be
issued during the same clock cycle.

True False [1.9 Tomasulo’s algorithm prevents all data hazards using register renaming.

True False [1.10 Old sequential machines, often described as SISD, were not vulnerable to hazards.

True False [1.11 Superscalar architectures may issue more than one instruction per clock cycle, but
need not.

True False [1.12 Branch prediction using only prediction bits does not require compiler assistance.

True False [1.13 Current machines called GP GPU’s (General Purpose Graphic Processing Units) can
support SIMD applications that don’t require graphical outputs.

True False [1.14 The MIPS64 architecture supports only register, immediate, and displacement ad-
dressing modes.

True False [1.15 The Branch Target and Branch Prediction Buffers are static methods of handling
control hazards.

3
Problems 2 and 3 are essentially the practice material for week fourteen. Check the work-
sheet and quiz solutions linked to the Week 14 page for more detailed answers.

Problem 2: Dingos Around? Really Talk Strangely (42pts)


The correct answer(s) to the 3 point questions in this problem come from the set of terms below. Your job
is to select all of the following scalar pipelined architectures that satisfy the statements below, and circle
the letter(s) corresponding to your answer in the space provided. That is, for full credit, you must capture
all possible correct answers. Assume that the letters F and T refer to scalar implementations which employ
static methods of dealing with control hazards. Zero points for a blank answer.

Letter Potential Answer


F Floating Point Pipe-like Pipe w/forwarding, bypassing, and load interlocks
T Tomasulo’s Algorithm
N None (None of the above)

F T N (2.1) Eliminates all stalls due to control hazards.

F T N (2.2) Start of next instruction stalled until WAR hazards have cleared.

F T N (2.3) All instructions stall in execute until WAW hazards clear.

F T N (2.4) Out-of-Order instruction completion can never occur.

F T N (2.5) The functional units receive all values directly from registers.

F T N (2.6) Removes hazards associated with anti-dependences by register renaming.

F T N (2.7) Supports out-of-order instruction issue.

F T N (2.8) Suffers from structural hazards associated with the Common Data Bus.

F T N (2.9) Exploits instruction level parallelism.

F T N (2.10) Forwarding, bypassing, and load interlocks are used to avoid WAR hazards.

F T N (2.11) Stalls, waiting for correct operands, to fix RAW hazards.

F T N (2.12) Has a pipelined floating point divide unit.


F T N (2.13) Uses static instruction scheduling.

F T N (2.14) Issues at least one instruction every clock cycle.

4
Problem 3: Under? New Option! (52 points)
3.1–3.8 Suppose that we have implemented a (0,3) GPR machine, using a Superscalar Speculative Tomasulo
(SST) architecture that executes the MIPS64 ISA.
General Instructions:

• Read each statement below carefully.


• Find the corresponding entry at the bottom of this page.
• Circle T or F, as appropriate.
• Circle the letters of the relevant attribute of the SST SUperscalar, SPeculative, or TOmasulo

For example, the correct response to the statement Uses register renaming to prevent data dependences
from becoming hazards would be to circle F and TO, since register renaming is not associated with
either SUperscalar or SPeculative.
If you feel the need to write an explanation don’t–it won’t be graded; so, just circle F instead–and at
least guess which attribute, SU, SP, or T rules the day. Each correctly circled item is worth two (2)
points.

3.1 Out-of-order issue in the SST increases the amount of ILP available to be exploited.
3.2 The SST reorder buffer supports Out-of-order instruction completion.
3.3 The SST uses register renaming to prevent WAR hazards.
3.4 The Ideal CPI of the SST is greater than or equal to one because several instructions can complete
simultaneously.
3.5 Compiler support for the SST always identifies safe instructions and schedules them to run im-
mediately after a branch is encountered in the MIPS64 code intended for the SST.
3.6 The SST can have at most one iteration of a loop in flight simultaneously.
3.7 In the SST, instructions are eligible to start running as soon as their operands are ready.
3.8 Committed instructions in the SST exit the reorder buffer in compiler order.

3.1 T F and SU SP TO
3.2 T F and SU SP TO
3.3 T F and SU SP TO
3.4 T F and SU SP TO
3.5 T F and SU SP TO
3.6 T F and SU SP TO
3.7 T F and SU SP TO
3.8 T F and SU SP TO

5
3.9–3.18 Who am I? Identify which of the methods below (for minimizing the impact of control
hazards) is introducing itself in the sentences that follow. Each correct response is worth two
points. If more than one answer is correct, you need only indicate one of them for full credit
of 2pts.
Potential Answers:
D (branch Delay slots.)
F (branch Folding)
P (branch Prediction buffer)
T (branch Target buffer without prediction)
None (None of the above)

3.9 I am a static method to decrease the stalls caused by control hazards.

3.10 I am a dynamic method that strives to decrease the impact of structural hazards
on execution time.

3.11 I contain the next instruction to be executed after some known unconditional
branch instructions.

3.12 I am modified only after I know that the branch was predicted taken.

3.13 I require compiler support to decrease all stalls or no-ops caused by control
hazards.

3.14 I contain the “predicted next PC ” for some taken branches in a program.

3.15 I always contain the next PC value after the branch condition has been evalu-
ated.

3.16 I contain the predicted PC for jumps.

3.17 I may consult a 2-bit saturating predictor.

3.18 I can remove all stalls associated with incorrect branch prediction.

6
Problem 4: Racing Injures Startled Koalas (60pts)
Fill in the short answer that best matches the following phrase, where your answers come from the following
set of terms. Not all need to be used, and terms may be reused, as well. Each correct answer is worth up to
3 points. Zero credit for a blank answer.

anti-dependence associativity branch target buffer


clock rate conditional branch conflict miss
control-Z data dependence data hazard
dirty bits dynamic schedule effective address
exception forwarding immediate AND
index bits reservation station sign extend
speculation speedup structural hazard
superscalar Tomasulo unroll loop
unsigned binary valid bits VLIW

4.1 Amdahl’s Law:

4.2 Cache lines per set

4.3 Ideal CPI less than 1

4.4 CPU Performance × instruction count × clocks per instruction

4.5 Potential RAW hazard

4.6 Out-of-Order Issue

4.7 Zero Extend

4.8 Instructions start executing out of compiler order

4.9 Common Data Bus

4.10 Dynamic branch prediction

7
anti-dependence associativity branch target buffer
clock rate conditional branch conflict miss
control-Z data dependence data hazard
dirty bits dynamic schedule effective address
exception forwarding immediate AND
index bits reservation station sign extend
speculation speedup structural hazard
superscalar Tomasulo unroll loop
unsigned binary valid bits VLIW

4.11 Cache has been modified

4.12 Register Renaming

4.13 Reorder Buffer

4.14 A recoverable exception

4.15 BNEZ soccer, R6

4.16 Location of data at execution time

4.17 Most Significant Bit of Immediate Operand

4.18 Direct Mapped

4.19 Predicted PC

4.20 Compiler optimization

8
Problem 5: Pademelon Offers Lorikeet Opals! (30pts)
Welcome to the land of uncertainty, where a ’null’ response means ’no opinion–no points’. In this problem,
You are to assess the veracity (that’s truthfulness) of the statements that follow. Your answer must come
from the selections below. Blank answers are worth zero (0) points. Correct answer are worth two(2) points.

Letter Potential Answer


C Can be True
A Always True
N Never True
O Other (Give your own answer)

Hint: Mark the ones that you can tell are absolutely false ( N ) or absolutely true (A). Then, go back and
look at the others more carefully.

C A N 5.1 Tomasulo’s algorithm uses register renaming to avoid WAR hazards in


the MIPS64 SST.
C A N 5.2 More than one instruction will be started during a single clock cycle in a
superscalar architecture.
C A N 5.3 MIPS64 immediate logic instructions zero-extend the immediate operands
from 16 bits to 64 bits.
C A N 5.4 Compulsory misses occur only when the machine is first booted up.
C A N 5.5 Pure RISC architectures such as MIPS64 allow ALU instructions to get
one operand directly from memory.
C A N 5.6 When a cache miss occurs, the memory manager must retrieve the desired
block from RAM.
C A N 5.7 Increasing the clock rate of a desktop computer will improve the rate of
program execution on the machine.
C A N 5.8 The execution time of a computer program is independent of the average
number of clock cycles per instruction.
C A N 5.9 A write miss does not require allocation of a block into cache when write
through is used.
C A N 5.10 It is safe to permit memory reads and writes to occur in an order different
than that given in the compiler.
C A N 5.11 The block offset in a memory address represents the location of the rele-
vant information in the block.
C A N 5.12 The number of index bits required by a fully associative cache having 128
lines is 7.
C A N 5.13 The average miss penalty for a given layer of memory is more than the
actual miss penalty experienced for a given program.
C A N 5.14 A conflict miss occurs when all entries in at least one set of a 4-way set
associative cache are occupied by valid blocks.
C A N 5.15 An L1 cache block is larger than a block in main memory.

9
Problem 6: Pull Over! Kangaroo Enters Race (24pts)
Once again, you have the privilege of being the memory management system. Use the information below
regarding the cache organization and the breakdown of the address of a byte in memory to determine if the
indicated memory requests result in a hit or a miss.

• Memory is byte addressable.

• Memory accesses are to 1-byte words (not 4-byte words).

• Logical addresses are 13 bits wide.

• The cache is 2-way set associative, with a 4 byte line size and 16 total lines.

Based on this information, we have the following:


The block offset: CO is 2 bits
The cache index: CI is 3 bits
The cache tag: CT is 8 bits
In the following tables, all numbers are given in hexadecimal. The contents of the cache are as
follows:
2-way Set Associative Cache
Index Tag Valid Byte 0 Byte 1 Byte 2 Byte 3 Tag Valid Byte 0 Byte 1 Byte 2 Byte 3
0 09 1 86 30 3F 10 00 0 99 04 03 48
1 45 1 60 4F E0 23 42 1 00 BC 0B 37
2 EB 0 2F 81 FD 09 0B 0 8F E2 05 BD
3 06 0 3D 94 9B F7 32 1 12 08 7B AD
4 C7 1 06 78 07 C5 05 1 40 67 C2 3B
5 71 1 0B DE 18 4B 6E 0 B0 39 D3 F7
6 91 1 A0 B7 26 2D F0 0 0C 71 40 10
7 16 1 B1 0A 32 0F DE 1 12 C0 88 37

10
Part 1 (10pt)
For the given address, indicate the cache entry accessed and the cache byte value returned in hex. Indicate
whether a cache miss occurs.
If there is a cache miss, enter “-” for “Cache Byte returned”.
Address: 0x 1D33

Address format (one bit per box)


12 11 10 9 8 7 6 5 4 3 2 1 0

Memory reference

Parameter Value
Byte offset 0x
Cache Index 0x
Cache Tag 0x
Cache Hit? (Y/N)
Cache Byte returned 0x

Part 2 (10pt)
For the given address, indicate the cache entry accessed and the cache byte value returned in hex. Indicate
whether a cache miss occurs.
If there is a cache miss, enter “-” for “Cache Byte returned”.
Adress: 0x 0845

Address format (one bit per box)


12 11 10 9 8 7 6 5 4 3 2 1 0

Memory reference

Parameter Value
Byte offset 0x
Cache Index 0x
Cache Tag 0x
Cache Hit? (Y/N)
Cache Byte returned 0x

11
Problem 7: Brisbane Onions, Cabbage, Carrots–Excellent! (60pts)
The L3 cache for an Intel Itanium machine with byte-addressable memory is 32M bytes, with a block size
of 4K bytes. Answer the following 4-point questions about potential cache organizations for this machine,
assuming that the address has 64 bits.

7.1 Diagram the address of a typical memory access, assuming that the organization of the cache is 4-way
set associative. Label all parts of the address clearly with name and size.

7.2 Suppose that your colleague modifies the cache organization so that you have 12 index bits. What is the
organization of this new cache, assuming that no other cache parameters were altered? Be as specific
as possible, and justify your work for full credit.

7.3 Another colleague proposes a different modification to the 32MB cache. She raises the block size to 8K
bytes, she makes the cache fully associative and writeback. Draw a diagram of a typical cache slot
or cache line associated with this configuration, being sure to label each field clearly with name and
size.

12
7.4 Write an expression for the number of bits of overhead required to access information in the 32MB
cache, as a function of t, the number of tag bits for a machine with 64 address bits. For this problem
only, assume that the block size is 8K and that space is present for a dirty bit. To assist the grader,
make sure you label all fields in the cache line that are considered part of the overhead. Note: since t
is represented by a variable, you don’t have to know anything about the associativity of the cache to
answer this question.

Erase the assumptions for the first four questions from your mind for the remaining questions about
Memory Matters

7.5 Write an expression for memory access time, given that the hit time (HT) is 2 clock cycles, the miss
rate is 5% and the miss penalty is 500 clock cycles.

13
7.6 Suppose that we consider a slightly different memory model. The miss penalty for reads is 500 clock
cycles; however, the miss penalty for writes is now 750 clock cycles. Separate memories are used for
instructions and data, with the following values for the appropriate parameters.

• The miss rate for instructions is 7%,


• The miss rate for data is 13%.
• Loads transfer data from memory to registers, and make up 30% of the the instructions.
• Stores transfer data from registers to memory, and comprise 15% of the instructions.

Write an expression for the memory access time when the cache is not perfect.

Other Practice Materials

The topics that follow will be covered at a level significantly less than they were in the worksheets,
quizzes and exam in the first portion of the course. It might be better to use the detailed answer
keys to review the material instead of rolling back to the original information in the videos and books.
Don’t forget to visit the speedup related word problems, the floating point conversion problems (you
will be given the floating point formats, the so-called fixed point representations when the binary point
is always to the right of the least significant bit. You should also expect to write or analyze some MIPS
code.

14

Potrebbero piacerti anche