
Subject: Computer Organization and Architecture

Assignment
1. In a k-way set associative cache, the cache is divided into v sets, each of which consists of k
lines. The lines of a set are placed in sequence one after another. The lines in set s are sequenced
before the lines in set (s+1). The main memory blocks are numbered 0 onwards. The main memory
block numbered j must be mapped to any one of the cache lines from

A (j mod v) * k to (j mod v) * k + (k-1)
B (j mod v) to (j mod v) + (k-1)
C (j mod k) to (j mod k) + (v-1)
D (j mod k) * v to (j mod k) * v + (v-1)

Question 1 Explanation:
The cache has v sets, so main memory block j maps to set (j mod v). Because the k lines of each set
are stored contiguously and set s precedes set (s+1), set (j mod v) occupies cache lines
(j mod v) * k to (j mod v) * k + (k-1). The associativity k does not change which set a block maps
to; it only means the block may occupy any of the k lines of that set, which reduces the chance of
replacement. Hence option A.
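
The mapping above can be checked with a small sketch. The function name `candidate_lines` and the sample values are illustrative assumptions, not part of the question.

```python
def candidate_lines(j, v, k):
    """Return the cache line numbers that main memory block j may occupy.

    Assumes v sets of k lines each, with the lines of set s stored
    contiguously before the lines of set s + 1.
    """
    set_index = j % v           # block j maps to set (j mod v)
    first_line = set_index * k  # first line of that set
    return range(first_line, first_line + k)


# Hypothetical example: 4 sets, 2-way associative, block 13.
lines = list(candidate_lines(13, v=4, k=2))
print(lines)
```

With these sample values, block 13 maps to set 1, whose lines are 2 and 3.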

2. Consider the following sequence of micro-operations.

MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR

Which one of the following is a possible operation performed by this sequence?

A Instruction fetch
B Operand fetch
C Conditional branch
D Initiation of interrupt service

Question 2 Explanation:
MBR - Memory Buffer Register (holds the data being transferred to or from memory).
MAR - Memory Address Register (holds the address of the memory location to be accessed).
PC - Program Counter (holds the address of the next instruction to be executed).

The 1st micro-operation copies the value of PC into MBR. The 2nd places an address X into MAR.
The 3rd places an address Y into PC. The 4th writes the value of MBR (the old PC value) to memory
at the address held in MAR.

From the 1st and 4th micro-operations we can see that the control flow is not sequential and that
the old PC value is saved in memory, so that control can later return to the point where execution
was left. This behavior is seen in interrupt handling. Here X can be the address of the memory
location where the return address is saved, and Y can be the beginning address of the interrupt
service routine.

In a conditional branch (option C), only PC is updated with the target address and there is no need
to store the old PC value in memory. In instruction fetch and operand fetch (options A and B), the
PC value is not stored anywhere else. Hence option D.
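
The four micro-operations can be traced with a small simulation. This is only a sketch: the function name `initiate_interrupt`, the dictionary used as memory, and the addresses are hypothetical values chosen for illustration.

```python
def initiate_interrupt(pc, save_address, routine_address, memory):
    """Apply the four micro-operations and return the new PC."""
    mbr = pc              # MBR <- PC      (old PC goes to the buffer register)
    mar = save_address    # MAR <- X       (where the old PC will be saved)
    pc = routine_address  # PC  <- Y       (start of the service routine)
    memory[mar] = mbr     # Memory <- MBR  (old PC written to address X)
    return pc


memory = {}
new_pc = initiate_interrupt(pc=0x104, save_address=0x7F0,
                            routine_address=0x200, memory=memory)
print(hex(new_pc), hex(memory[0x7F0]))
```

After the sequence, PC holds Y and the old PC sits in memory at X, which is what allows the interrupted program to resume later.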
Question 3
Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction
(FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand
(WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively.
There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A
program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor.
Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during
the execution of this program, the time (in ns) needed to complete the program is

A 132
B 165
C 176
D 328

Explanation:
The clock period is set by the slowest stage plus the buffer delay: 10 ns (FO) + 1 ns = 11 ns.
The pipeline has to be stalled until the EI stage of I4 completes, because EI determines whether
the branch is taken. After that, I4 (WO) and I9 (FI) can proceed in parallel, followed by the
remaining instructions.
Until I4 (EI) completes: 7 cycles * (10 + 1) ns = 77 ns.
From I4 (WO) / I9 (FI) to I12 (WO): 8 cycles * (10 + 1) ns = 88 ns.
Total = 77 + 88 = 165 ns. Hence option B.
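
The cycle counts can be reproduced with the usual pipeline formula (stages + instructions - 1). The sketch below assumes, as the explanation does, that the branch is resolved at the end of EI and that I9 is fetched in the cycle in which I4 is in WO; the helper name `pipeline_cycles` is an illustrative choice.

```python
stage_delays_ns = [5, 7, 10, 8, 6]   # FI, DI, FO, EI, WO
buffer_delay_ns = 1

# Clock period: slowest stage plus one buffer delay.
cycle_ns = max(stage_delays_ns) + buffer_delay_ns


def pipeline_cycles(n_instructions, n_stages):
    """Cycles for n instructions to pass through n stages without stalls."""
    return n_stages + n_instructions - 1


# I1..I4 flow until I4 finishes EI, the 4th stage.
cycles_until_branch_resolved = pipeline_cycles(4, 4)
# I9..I12 then pass through all 5 stages; I4's WO overlaps I9's FI.
cycles_after_branch = pipeline_cycles(4, 5)

total_ns = (cycles_until_branch_resolved + cycles_after_branch) * cycle_ns
print(cycles_until_branch_resolved, cycles_after_branch, total_ns)
```

This gives 7 and 8 cycles at 11 ns each, matching the 165 ns total.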

4. A RAM chip has a capacity of 1024 words of 8 bits each (1K × 8). The number of 2 × 4 decoders
with enable line needed to construct a 16K × 16 RAM from 1K × 8 RAM is

A 4
B 5
C 6
D 7

Explanation:
Available RAM chip = 1K × 8 (1024 words of 8 bits each)

RAM to construct = 16K × 16

Number of chips required = (16K × 16) / (1K × 8) = 16 × 2 = 32
[16 rows of chips, each row having 2 chips side by side]

To select one row out of the 16 rows, we need a 4 × 16 decoder.

Only 2 × 4 decoders with enable are available. A 4 × 16 decoder can be built from four 2 × 4
decoders that produce the 16 outputs, plus one 2 × 4 decoder that drives their enable lines.

Hence 4 + 1 = 5 decoders are required (option B).
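
The chip and decoder counts can be checked with a short sketch. It assumes the decoders are arranged as a tree in which each 2 × 4 decoder enables up to four decoders of the next level; the function name `decoders_needed` is hypothetical.

```python
def decoders_needed(select_lines, fan_out=4):
    """Count 2x4 decoders in a tree that produces `select_lines` outputs."""
    count = 0
    outputs = select_lines
    while outputs > 1:
        outputs = -(-outputs // fan_out)  # ceiling division: decoders at this level
        count += outputs
    return count


chip_words, chip_width = 1024, 8
target_words, target_width = 16 * 1024, 16

rows = target_words // chip_words          # rows of chips to select between
chips_per_row = target_width // chip_width
total_chips = rows * chips_per_row

print(rows, chips_per_row, total_chips, decoders_needed(rows))
```

For 16 rows the loop counts four decoders at the output level and one at the root, giving 5.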

5. The following code segment is executed on a processor which allows only register operands in its
instructions. Each instruction can have at most two source operands and one destination operand.
Assume that all variables are dead after this code segment.
c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
    y = a * a;
}
else {
    d = d * d;
    e = e * e;
}

Suppose the instruction set architecture of the processor has only two registers. The only allowed
compiler optimization is code motion, which moves statements from one place to another while
preserving correctness. What is the minimum number of spills to memory in the compiled code?

A 0
B 1
C 2
D 3
Explanation:
R1   R2   Statement / note
a    b    c = a + b
a    c    x = c * c
a    x    c has to be stored in memory, as we do not yet know whether x > a
y    x    y = a * a
Choosing the best case of x > a, the minimum number of spills = 1.

Question 6
Consider the same data as above question. What is the minimum number of registers needed in the
instruction set architecture of the processor to compile this code segment without any spill to
memory? Do not apply any optimization other than optimizing register allocation.

A 3
B 4
C 5
D 6

Explanation: Note that code motion is not allowed for this problem.
So we analyze the code line by line and determine how many registers are required to execute the
code segment.
Assume the registers are numbered R1, R2, R3 and R4. The analysis is shown in the table below.
From this analysis we can conclude that a minimum of 4 registers is needed to execute the code
segment.
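
One way to cross-check the count of 4 is a backward liveness pass over the code segment. This is a sketch under stated assumptions, not the analysis from the original table: it assumes a destination may reuse the register of a source that dies at that instruction, and it takes the largest number of simultaneously live variables as the register requirement. The block layout and the name `live_sets` are illustrative.

```python
def live_sets(block, live_out):
    """Walk a basic block backwards; return the live set at every program point."""
    live = set(live_out)
    points = [set(live)]
    for dest, uses in reversed(block):
        if dest is not None:
            live.discard(dest)   # the definition kills the variable
        live |= set(uses)        # the operands must be live before the statement
        points.append(set(live))
    return points                # points[-1] is the live-in set of the block


# Each statement is (destination, operands); the branch has no destination.
entry_block = [("c", ("a", "b")), ("d", ("c", "a")), ("e", ("c", "a")),
               ("x", ("c", "c")), (None, ("x", "a"))]
then_block = [("y", ("a", "a"))]
else_block = [("d", ("d", "d")), ("e", ("e", "e"))]

# All variables are dead after the segment, so both branches end with nothing live.
then_points = live_sets(then_block, set())
else_points = live_sets(else_block, set())
entry_points = live_sets(entry_block, then_points[-1] | else_points[-1])

max_live = max(len(p) for p in entry_points + then_points + else_points)
print(max_live)
```

The peak occurs just before x = c * c, where a, c, d and e are all live at once.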

Question 7
The amount of ROM needed to implement a 4 bit multiplier is

A 64 bits
B 128 bits
C 1 Kbits
D 2 Kbits

Explanation: For a 4 bit multiplier, there are 2^4 * 2^4 combinations, i.e., 2^8 combinations.

Also, the output of a 4 bit multiplier is 8 bits.
Thus, the amount of ROM needed = 2^8 * 8 = 2^11 = 2048 bits = 2 Kbits.
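
The same arithmetic as a short sketch, generalized to n-bit operands. The function name `multiplier_rom_bits` is an illustrative assumption.

```python
def multiplier_rom_bits(operand_bits):
    """ROM size in bits for a lookup-table multiplier of two n-bit operands."""
    address_lines = 2 * operand_bits   # both operands together form the address
    output_bits = 2 * operand_bits     # an n-bit by n-bit product needs 2n bits
    return (2 ** address_lines) * output_bits


rom_bits = multiplier_rom_bits(4)
print(rom_bits, rom_bits // 1024)
```

For 4-bit operands this gives 256 words of 8 bits, i.e. 2048 bits or 2 Kbits.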

Question 8
Register renaming is done in pipelined processors

A as an alternative to register allocation at compile time


B for efficient access to function parameters and local variables
C to handle certain kinds of hazards
D as part of address translation
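Explanation: Register renaming is done to avoid data hazards, specifically the WAR (write after
read) and WAW (write after write) hazards that arise from reusing register names. Hence option C.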

Question 9
A computer has a 256 KByte, 4-way set associative, write back data cache with block size of 32
Bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag directory entry
contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit. The number
of bits in the tag field of an address is

A 11
B 14
C 16
D 27
Explanation: A set-associative scheme is a hybrid between a fully associative cache, and direct
mapped cache. It’s considered a reasonable compromise between the complex hardware needed for
fully associative caches (which requires parallel searches of all slots), and the simplistic direct-
mapped scheme, which may cause collisions of addresses to the same slot (similar to collisions in a
hash table).

Number of blocks = Cache-Size/Block-Size = 256 KB / 32 Bytes = 2^13

Number of Sets = 2^13 / 4 = 2^11

Tag + Set offset + Byte offset = 32

Tag + 11 + 5 = 32
Tag = 16.
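
The address breakdown can be reproduced with a short sketch. It assumes that cache size, block size and number of sets are all powers of two; the function name `tag_bits` is illustrative.

```python
def tag_bits(cache_bytes, ways, block_bytes, address_bits):
    """Tag width for a set-associative cache with power-of-two parameters."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    index_bits = num_sets.bit_length() - 1       # log2 of the number of sets
    return address_bits - index_bits - offset_bits


tag = tag_bits(cache_bytes=256 * 1024, ways=4, block_bytes=32, address_bits=32)
print(tag)
```

With 2^11 sets and 32-byte blocks, 11 + 5 bits go to the set and byte offsets, leaving 16 tag bits.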

Question 10

Consider the data given in previous question. The size of the cache tag directory is

A 160 Kbits
B 136 bits
C 40 Kbits
D 32 bits

Answer: (A)

Explanation: Each tag directory entry holds
16 tag bits
2 valid bits
1 modified bit
1 replacement bit
Total bits per entry = 20
Size of tag directory = 20 × no. of blocks = 20 × 2^13 bits
= 160 Kbits.
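
A short sketch of the same calculation, taking the 16-bit tag from the previous question as given. The variable names are illustrative.

```python
cache_bytes = 256 * 1024
block_bytes = 32
num_blocks = cache_bytes // block_bytes   # one directory entry per cache block

tag_width = 16        # tag width from the previous question
valid_bits = 2
modified_bits = 1
replacement_bits = 1
entry_bits = tag_width + valid_bits + modified_bits + replacement_bits

directory_bits = entry_bits * num_blocks
print(entry_bits, directory_bits // 1024)
```

Each of the 2^13 entries is 20 bits wide, so the directory occupies 160 Kbits.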
