
CSE 320 Final Exam Practice Questions

Single-Cycle Datapath / Multi-Cycle Datapath: Adding Instructions
Modify the datapath and control signals to perform the new instructions in the corresponding datapath.
Use the minimal amount of additional hardware and clock cycles/control states.
Remember:

- When adding new instructions, don't break the operation of the standard ones.
- Avoid adding ALUs, adders, RegFiles, or memories to the datapath.
- You can add MUXes, logic gates, etc., but try to do so minimally (these cost in terms of area, cycle time, etc.).

a. Load Word Register (uses R instruction format)
   lwr Rt, Rd(Rs)             # Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]
b. Add 3 Operands (new instruction format: opcode(6), rs(5), rt(5), rd(5), rx(5), (6 bits not used))
   add3 Rd, Rs, Rt, Rx        # Reg[Rd] = Reg[Rs] + Reg[Rt] + Reg[Rx]
c. Add to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   addm Rd, Rt, Offset(Rs)    # Reg[Rd] = Reg[Rt] + Mem[sign-extended offset + Reg[Rs]]
d. Branch on Less Than or Equal (uses I instruction format)
   blez Rs, label             # if Reg[Rs] <= 0, PC = PC + 4 + (sign-extended offset << 2)
e. Branch Equal to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   beqm Rd, Rt, Offset(Rs)    # if Reg[Rt] = Mem[Offset + Reg[Rs]], PC = PC + 4 + Reg[Rd]
f. Branch Equal to 0 to Immediate (uses R instruction format)
   beqzi (Rs), Label          # if Mem[Reg[Rs]] = 0, then PC = PC + (sign-extended offset)
   (NOTE: This is not PC + 4, and not shifted by 2)
g. Store Word and Increment
   swinc Rt, offset(Rs)       # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] + 4
h. Store Word and Decrement
   swdec Rt, offset(Rs)       # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] - 4
What if you were to add (g) and (h) simultaneously to the datapaths?
Datapath Timing
1. Calculate the delay in the modified datapaths when performing the instructions above. Assume the following delays:

- Memory: 200 ps
- Register File Access (Read/Write): 50 ps
- ALU and adders: 100 ps
- Logic Gates and Multiplexors: 1 ps
- All other times are negligible

2. Calculate the minimal clock cycle time if all of the new instructions were added in the Single-Cycle and Multi-Cycle cases.
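One way to organize the timing calculation is to list the units an instruction passes through and add up their delays. The sketch below does this for a standard lw only; the path it uses (instruction fetch, register read, ALU, data memory, register write) and the decision to leave out the 1 ps MUX delays are assumptions made for illustration, and the names `DELAY_PS`, `path_delay`, and `LW_PATH` are hypothetical. The new instructions need their own unit lists.

```python
# Component delays in picoseconds, taken from the list in the question.
DELAY_PS = {"memory": 200, "regfile": 50, "alu": 100, "mux": 1}


def path_delay(units):
    """Add up the delays of the units an instruction passes through, in order."""
    return sum(DELAY_PS[unit] for unit in units)


# Assumed path for a standard lw (MUX delays left out for simplicity):
# instruction fetch -> register read -> ALU -> data memory -> register write
LW_PATH = ["memory", "regfile", "alu", "memory", "regfile"]

if __name__ == "__main__":
    print(path_delay(LW_PATH), "ps")  # 600 ps
```

In the single-cycle case the clock period has to cover the slowest such path; in the multi-cycle case it has to cover the slowest single step.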

Other Datapath Questions
Given MIPS code, can you determine...

- What is happening at clock cycle X in the Single-Cycle Datapath? Or in which cycle is operation X happening?
- What is happening at clock cycle X in the Multi-Cycle Datapath? Or in which cycle is operation X happening?
- How many cycles will it take to execute the code?
- Can you identify the signals (control and values) in the datapath for a given clock cycle?
- And other questions of this nature.

Short Answer Misc Questions
1. What is the primary advantage of fixed-size opcodes?

Instruction decode is faster and more efficient. Control does not need to determine the
length/position of the opcode in the instruction.

2. Will a speedup of 20 on 50% of a program result in an overall speedup of at least 2 times? Explain your answer.
The new overall speedup is calculated according to Amdahl's Law. For an overall speedup of 2, the new execution time must be 50% or less of the old execution time.
No. The new overall execution time = 50% + 50%/20 = 52.5% of the old, so the overall speedup is 1/0.525, about 1.90, which is less than 2.
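The same Amdahl's Law calculation can be written as a small function. This is a minimal sketch; the name `amdahl_speedup` and its parameter names are chosen here for illustration.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is sped up by `factor`."""
    new_time = (1 - fraction) + fraction / factor  # relative to an old time of 1
    return 1 / new_time


speedup = amdahl_speedup(0.50, 20)
print(f"{speedup:.3f}")  # 1.905
```

Even with an unlimited speedup on that half of the program, the overall speedup approaches 2 but never reaches it, because the other half still takes 50% of the original time.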

3. What are the 5 components of a modern computer system? (Hint: Two of them can be combined and called the processor.)
Datapath + control (= processor), memory, input, output
4. What is a stored-program computer?
A computer where the instructions of the program are stored in memory; the CPU is assigned the task of fetching the instructions from memory, decoding them, and executing them.

5. True or False:
- Program execution time increases when the instruction count (IC) increases. TRUE
- In a load/store architecture, the only instructions that access memory are load and store types. TRUE
- More powerful instructions lead to higher performance, since the total number of instructions executed is smaller for a given task with more powerful instructions. FALSE
- An add operation has 3 operands (2 inputs and 1 output); therefore add instructions must be 3-address instructions. FALSE
6. In a system executing jobs, when is throughput = 1/latency?
Throughput of a machine is the number of instructions which are executed per second. Latency is the length of time per execution of an instruction.

Throughput = 1/latency when a system is executing one task at a time, e.g. in a single-cycle or multi-cycle datapath.

Pipelining increases throughput since multiple instructions are executing simultaneously. Therefore the average time between instruction completions is shorter than the latency of a single instruction, and throughput is greater than 1/latency.

7. What are the advantages and disadvantages of write-through and write-back cache modifications in shared-memory systems?
Write-through will slow the system down, taking more time for each write. However, with a write-back cache, there may be data contention, since multiple references could be referencing the data while it is dirty.

Pipelining

1. What are the main benefits and disadvantages of pipelining?
2. Name the types of pipelining hazards. Define how and when they can occur in systems (in general). Define how/when they occur in MIPS. Give a MIPS datapath or code example of each type.
3. Many processors have 5- or 6-stage pipelines. A typical value for the CPI (cycles per instruction) in such processors is in the range of 1.0 to 1.5. Does it mean that the latency of execution of most instructions is 1 or 2 clock cycles? Why, or why not?
4. Why do conditional branches impact the performance of a pipelined implementation?
5. Briefly describe 2 solutions to reduce the performance impact of conditional branch instructions in a pipelined implementation.
6. Given sequences of instructions, determine the forwarding paths and required stalls.

Performance

1. The computer spends 82% of the time computing and 18% waiting for the disk. The instruction mix and the average cycles per instruction (CPI) for each type is:

Type     Instruction %   CPI
int      40%             1
FP       30%             5
Other    30%             2

a. Consider 3 modifications to the computer. Compute the speedup for each.
i. The processor is replaced with a new one that reduces the total computation time by 35%.
Speedup = 1 / ((1 - 0.82) + 0.82 * 0.65) = 1.40

ii. The disk is replaced with a solid-state device that reduces the disk waiting time by 85%.
Speedup = 1 / ((1 - 0.18) + 0.18 * 0.15) = 1.18

iii. The processor is replaced with a new one that has improved floating-point performance. The average floating-point CPI is reduced to 3; all other aspects are unchanged.

Average CPI (old) = 0.40 * 1 + 0.30 * 5 + 0.30 * 2 = 2.5


Average CPI (enhanced) = 0.40 * 1 + 0.30 * 3 + 0.30 * 2 = 1.9
Speedup (computation) = 2.5 / 1.9 = 1.315
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.315) = 1.244

b. Which modification gave the best speedup?

Modification (i) provides the best speedup.
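The three speedups can be cross-checked with a short script. This is a sketch under the figures used in the worked answers (CPI of 1, 5, and 2 for int, FP, and Other); the names `overall_speedup`, `average_cpi`, and `MIX` are illustrative. The script keeps full precision, so modification (iii) comes out at about 1.245 rather than the 1.244 obtained with the rounded intermediate value 1.315.

```python
def overall_speedup(fraction, factor):
    """Amdahl's Law: `fraction` of the total time is sped up by `factor`."""
    return 1 / ((1 - fraction) + fraction / factor)


# Instruction mix from the question.
MIX = {"int": 0.40, "FP": 0.30, "Other": 0.30}


def average_cpi(cpi):
    """Weighted average CPI for the instruction mix above."""
    return sum(MIX[kind] * cpi[kind] for kind in MIX)


old_cpi = average_cpi({"int": 1, "FP": 5, "Other": 2})  # about 2.5
new_cpi = average_cpi({"int": 1, "FP": 3, "Other": 2})  # about 1.9

speedups = {
    "i": overall_speedup(0.82, 1 / 0.65),          # compute time drops to 65%
    "ii": overall_speedup(0.18, 1 / 0.15),         # disk wait drops to 15%
    "iii": overall_speedup(0.82, old_cpi / new_cpi),
}
best = max(speedups, key=speedups.get)

for name, value in speedups.items():
    print(name, f"{value:.3f}")
print("best:", best)
```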

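c. For the two modifications in part (a) that did not result in the best speedup, is it possible for them to achieve the speedup achieved by the best modification, (i)? Show your work and explain your answer.
i. Modification (ii) with an infinitely fast disk: Speedup = 1 / (1 - 0.18) = 1.22, which is still slower than (i), so the disk upgrade can never match it.
ii. Modification (iii) if FP takes only 1 clock cycle:
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 1 + 0.30 * 2 = 1.3
Speedup (computation) = 2.5 / 1.3 = 1.92
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.92) = 1.647
This exceeds 1.40, so a sufficiently improved floating-point unit can match modification (i).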

2. You have two RiSC-16 processors, X and Z, with the following characteristics. They are both multi-cycle processors, in which an instruction executes in a variable number of processor cycles. X and Z execute variations on the same instruction set (RiSC) in the following way:
a. Processor X implements the base instruction set, including LUI. Processor X implements multiplication in software, meaning there is no MULT instruction.
b. Processor Z eliminates the LUI instruction in favor of a MULT instruction, getting LUI functionality from LW.
c. Processor Z's MULT instruction uses the ALU over and over again in a loop, performing shifts and conditional adds, and requires 80 processor cycles per multiply.
d. Executing one MULT instruction on Processor Z eliminates on average 30 instructions that would be executed on Processor X when multiplication is implemented in software. However, Processor Z then needs additional ALU functionality, which increases the ALU's critical path from 10 ns to 12 ns.
Also, assume the following:
- Cache read/write: 10 ns
- Register file read/write: 8 ns
- ALU operation: 10 ns for processor X, 12 ns for processor Z

Assume the following distribution of instruction types (assume that LUI requires 3 cycles):

              MULT   LUI   LW    SW    R-Type   BEQ
Processor X   0%     5%    20%   10%   45%      20%
Processor Z   5%     0%    25%   10%   40%      20%

For example, processor Z executes 5 MULT instructions out of every 100. For each of those MULT instructions, processor X executes an additional 30 instructions.
a. Compare the execution times of the two processors.
Execution time = T * I * C = Cycle time * Instruction count * Average CPI
Exec time (X) = Tx * Ix * Cx
Exec time (Z) = Tz * Iz * Cz
Tx = 10 ns; Tz = 12 ns

If Iz = 100, Ix = 95 + (5 * 30) = 245
CPI for each instruction type:
MULT = 80 cycles, LUI = 3, LW = 5, SW = 4, R-Type = 4, BEQ = 3

Therefore:
Average CPI for X = Cx = (0.05 * 3) + (0.2 * 5) + (0.1 * 4) + (0.45 * 4) + (0.2 * 3) = 3.95
Average CPI for Z = Cz = (0.05 * 80) + (0.25 * 5) + (0.1 * 4) + (0.4 * 4) + (0.2 * 3) = 7.85

Comparing execution times:
Exec time (X) = 10 ns/cycle * 3.95 cycles/instr * 245 instr = 9677.5 ns
Exec time (Z) = 12 ns/cycle * 7.85 cycles/instr * 100 instr = 9420 ns

Processor Z with the multiply instruction is about 1.03 times faster than processor X for this instruction mix.
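The comparison above can be reproduced in a few lines. This is a sketch using the figures from the question; the dictionary and function names (`CPI`, `MIX_X`, `MIX_Z`, `average_cpi`, `exec_time_ns`) are chosen here for illustration.

```python
# Cycles per instruction type, as stated in the answer.
CPI = {"MULT": 80, "LUI": 3, "LW": 5, "SW": 4, "RType": 4, "BEQ": 3}

# Instruction mixes for the two processors, from the distribution table.
MIX_X = {"MULT": 0.00, "LUI": 0.05, "LW": 0.20, "SW": 0.10, "RType": 0.45, "BEQ": 0.20}
MIX_Z = {"MULT": 0.05, "LUI": 0.00, "LW": 0.25, "SW": 0.10, "RType": 0.40, "BEQ": 0.20}


def average_cpi(mix):
    """Weighted average CPI for an instruction mix."""
    return sum(mix[kind] * CPI[kind] for kind in mix)


def exec_time_ns(cycle_ns, instruction_count, cpi):
    """Execution time = cycle time * instruction count * average CPI."""
    return cycle_ns * instruction_count * cpi


count_z = 100
count_x = 95 + 5 * 30  # each of Z's 5 MULTs costs X 30 instructions

time_x = exec_time_ns(10, count_x, average_cpi(MIX_X))
time_z = exec_time_ns(12, count_z, average_cpi(MIX_Z))

print(f"X: {time_x:.1f} ns, Z: {time_z:.1f} ns, ratio: {time_x / time_z:.2f}")
```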
b. At what clock speed for processor Z are the two designs equal in performance?

Equating the two execution times and solving for Tz:
10 ns * 3.95 CPI * 245 instructions = Tz * 7.85 CPI * 100 instructions
Tz = (10 ns * 3.95 * 245) / (7.85 * 100)
Tz = 12.33 ns

For smaller Tz (faster clock), processor Z has better performance; for larger Tz (slower clock), processor X has better performance.
c. (more difficult) Assuming the original ALU latency for processor Z (12 ns), how fast would your software-emulated multiply have to be (on average) for processor X to be just as fast as processor Z? In other words, how many instructions would processor X execute in place of 1 MULT?
Remember that Ix was defined to be 95 + (# of multiplications) * (cost of each).
We first need to find the instruction count of processor X necessary for equal performance.
10 ns * 3.95 * Ix = 12 ns * 7.85 * 100
which implies Ix = 238.5
Total number of multiply-emulation instructions is (238.5 - 95) = 143.5
Therefore number of instructions per multiply = 143.5 / 5 = 28.7, i.e. roughly 28 instructions
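Both break-even questions come down to solving the execution-time equation for a single unknown. The sketch below does this with the numbers from the worked answer; the function names are hypothetical.

```python
def break_even_cycle_time_ns(time_x_ns, cpi_z, count_z):
    """Cycle time for Z at which Z's execution time equals X's."""
    return time_x_ns / (cpi_z * count_z)


def break_even_instructions_per_mult(time_z_ns, cycle_x_ns, cpi_x, base_count, mults):
    """Instructions X may spend per emulated multiply to match Z's time."""
    count_x = time_z_ns / (cycle_x_ns * cpi_x)  # total instructions X can afford
    return (count_x - base_count) / mults


time_x = 10 * 3.95 * 245  # ns, part (a)
time_z = 12 * 7.85 * 100  # ns, part (a)

tz = break_even_cycle_time_ns(time_x, 7.85, 100)
per_mult = break_even_instructions_per_mult(time_z, 10, 3.95, 95, 5)

print(f"Tz = {tz:.2f} ns")                             # Tz = 12.33 ns
print(f"instructions per multiply = {per_mult:.1f}")   # 28.7
```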

3. Two important parameters control the performance of a processor: cycle time and cycles per instruction. There is an enduring trade-off between these two parameters in the design process of microprocessors. While some designers prefer to increase the processor frequency at the expense of large CPI, other designers follow a different school of thought in which reducing the CPI comes at the expense of lower processor frequency. Consider the following machines, and compare their performance using the following instruction mix: 25% loads, 13% stores, 47% ALU instructions, and 15% branches/jumps. Assume the unmodified multi-cycle datapath and finite state machine.
M1: The multi-cycle datapath is designed with a 1 GHz clock.
M2: A machine like M1 except that register updates are done in the same clock cycle as a memory read or ALU operation. Thus in the finite state machine, states 6 and 7 and states 3 and 4 are combined. This machine has a 3.2 GHz clock, since the register update increases the length of the critical path.
M3: A machine like M2 except that effective address calculations are done in the same clock cycle as a memory access. Thus states 2, 3, and 4 can be combined, as can 2 and 5, as well as 6 and 7. This machine has a 2.8 GHz clock because of the long cycle created by combining address calculation and memory access.
Find out which of the machines is fastest. Are there instruction mixes that would make another machine faster, and if so, what are they?
In the original multi-cycle datapath the CPI for each instruction is as follows:

- Loads: 5 cycles
- Stores: 4 cycles
- ALU: 4 cycles
- Branch/Jumps: 3 cycles

Performance of M1:

Average CPI = .25*5 + .13*4 + .47*4 + .15*3 = 4.1

Execution time = (CPI * #instructions) / clock rate = 4.1 I / 1 GHz = 4.1 I * 10^-9 seconds

Performance of M2:

- Loads shorten to 4 cycles
- ALUs shorten to 3 cycles

Average CPI = .25*4 + .13*4 + .47*3 + .15*3 = 3.38

Execution time = (CPI * #instructions) / clock rate = 3.38 I / 3.2 GHz = 1.06 I * 10^-9 seconds

Performance of M3:

- Loads shorten to 3 cycles
- Stores shorten to 3 cycles
- ALUs shorten to 3 cycles

Average CPI = .25*3 + .13*3 + .47*3 + .15*3 = 3

Execution time = (CPI * #instructions) / clock rate = 3 I / 2.8 GHz = 1.07 I * 10^-9 seconds

M2 is fastest.


M1 can never be faster than M2. Even if all the instructions are branch instructions, the CPI will be 3 in all 3 cases, and the clock rate is faster on the other 2 processors.

M3 can be faster than M2, for example if all instructions are loads (or all are stores).

Ex (all loads):
M2: Average CPI = 1*4 + 0*4 + 0*3 + 0*3 = 4

M3: Average CPI = 1*3 + 0*3 + 0*3 + 0*3 = 3

M2 execution time = (CPI * #instructions) / clock rate = 4 I / 3.2 GHz = 1.25 I * 10^-9 seconds

M3 execution time = (CPI * #instructions) / clock rate = 3 I / 2.8 GHz = 1.07 I * 10^-9 seconds
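The comparison between the three machines can be tabulated for any instruction mix. The following is a sketch using the cycle counts and clock rates from the answer; `CYCLES`, `CLOCK_GHZ`, `time_per_instruction_ns`, and `fastest` are names introduced here for illustration.

```python
# Cycles per instruction class on each machine, as listed in the answer.
CYCLES = {
    "M1": {"load": 5, "store": 4, "alu": 4, "branch": 3},
    "M2": {"load": 4, "store": 4, "alu": 3, "branch": 3},
    "M3": {"load": 3, "store": 3, "alu": 3, "branch": 3},
}
CLOCK_GHZ = {"M1": 1.0, "M2": 3.2, "M3": 2.8}


def time_per_instruction_ns(machine, mix):
    """Average CPI divided by the clock rate in GHz gives ns per instruction."""
    cpi = sum(mix[kind] * CYCLES[machine][kind] for kind in mix)
    return cpi / CLOCK_GHZ[machine]


def fastest(mix):
    """Machine with the smallest average time per instruction for this mix."""
    return min(CYCLES, key=lambda machine: time_per_instruction_ns(machine, mix))


GIVEN_MIX = {"load": 0.25, "store": 0.13, "alu": 0.47, "branch": 0.15}
ALL_LOADS = {"load": 1.0, "store": 0.0, "alu": 0.0, "branch": 0.0}

print(fastest(GIVEN_MIX))  # M2
print(fastest(ALL_LOADS))  # M3
```

Passing other mixes to `fastest` shows where the crossover between M2 and M3 lies: M3 gains on loads and stores, where it saves a cycle relative to M2.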

Review: Adder and ALU Creation and Building Larger ALUs from Units
Consider the 4-bit ALU below, which can perform the following 5 operations: add, sub, AND, OR and negate B.
Inputs are A = {A3,A2,A1,A0}, B = {B3,B2,B1,B0}, and Cin. Outputs are result R = {R3,R2,R1,R0} and Cout. Numbers are in 2's complement form. Fill in the table below: for each operation, what should the values of the control signals be? Indicate don't-cares where appropriate.

Operation   m1   m0   Cin   BINV   Az
Add         0    0    0     0      1
Sub         0    0    1     1      1
OR          1    0    X     0      1
AND         0    1    X     0      1
Negate B    0    0    0     0      1

Digital Logic
1. Using Boolean algebra, prove the following (a prime, as in d', denotes the complement):

a. bd' + cd = ((b'd') + (c'd))'
bd' + cd = (b'd')'(c'd)'                        De Morgan's
bd' + cd = (b + d)(c + d')                      De Morgan's
bd' + cd = bc + cd + bd' + dd'                  Distributive
bd' + cd = bc + cd + bd' + 0                    Complementary
bd' + cd = bc(d + d') + cd + bd'                Null
bd' + cd = bcd + bcd' + cd + bd'                Commutative
bd' + cd = cd(1 + b) + bd'(c + 1)               Distributive
bd' + cd = bd' + cd                             Null

b. abc + bcd + a'bd = abc + a'bd
abc + bcd(a + a') + a'bd = abc + a'bd           Null
abc + abcd + a'bcd + a'bd = abc + a'bd          Distributive
abc(1 + d) + a'bd(c + 1) = abc + a'bd           Commutative
abc + a'bd = abc + a'bd                         Null

c. a + a'(ab + b'c')' = a + b + c
a + a'((ab)'(b'c')') = a + b + c                De Morgan's
a + a'((a' + b')(b + c)) = a + b + c            De Morgan's
a + a'(a'b + b'b + a'c + b'c) = a + b + c       Distributive
a + a'(a'b + 0 + a'c + b'c) = a + b + c         Complementary
a + a'(a'b + a'c + b'c) = a + b + c
a + a'a'b + a'a'c + a'b'c = a + b + c           Distributive
a + a'b + a'c + a'b'c = a + b + c               Idempotence (twice)
a + a'(b + c + b'c) = a + b + c
a + b + c + b'c = a + b + c                     No Name
a + b + c(1 + b') = a + b + c
a + b + c = a + b + c                           Null

2. Consider the following function (a prime denotes the complement): z(x3,x2,x1,x0) = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0
a. How many literals does z contain?
11
b. Is z minimal? If not, find the minimal expression using Boolean algebra.
No.

x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0
x3'x2(1 + x0) + x3x1x0 + x3x2x0'
x3'x2(1 + x0') + x3x1x0 + x3x2x0'
x3'x2 + x3'x2x0' + x3x1x0 + x3x2x0'
x3'x2 + x3x1x0 + x2x0'

c. Find the equivalent sum of minterms (SOP) for z (using m notation)

z(x3,x2,x1,x0) = Σm(4,5,6,7,11,12,14,15)

d. Find the equivalent product of maxterms (POS) for z (using M notation)

z(x3,x2,x1,x0) = ΠM(0,1,2,3,8,9,10,13)
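The minterm and maxterm lists can be checked by brute force over all 16 input combinations. The sketch below assumes the reading z = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0, with complements placed so that the expression agrees with the listed minterms, and the minimal form x3'x2 + x3x1x0 + x2x0'. The function names are illustrative.

```python
from itertools import product


def z_original(x3, x2, x1, x0):
    """z = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0 (assumed complement placement)."""
    return bool(
        ((not x3) and x2)
        or (x3 and x1 and x0)
        or (x3 and x2 and not x0)
        or ((not x3) and x2 and x0)
    )


def z_minimal(x3, x2, x1, x0):
    """Simplified form: x3'x2 + x3x1x0 + x2x0'."""
    return bool(((not x3) and x2) or (x3 and x1 and x0) or (x2 and not x0))


def minterms(f):
    """Row numbers (x3 is the most significant bit) where f evaluates to 1."""
    return [i for i, bits in enumerate(product([0, 1], repeat=4)) if f(*bits)]


def maxterms(f):
    """Row numbers where f evaluates to 0."""
    return [i for i, bits in enumerate(product([0, 1], repeat=4)) if not f(*bits)]


print(minterms(z_original))  # [4, 5, 6, 7, 11, 12, 14, 15]
print(maxterms(z_original))  # [0, 1, 2, 3, 8, 9, 10, 13]
print(minterms(z_original) == minterms(z_minimal))  # True
```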

3. You were recently hired as an engineer in a company that designs alarm systems custom-made to meet the customer's specifications. You are asked to design a system that uses the inputs of three sensors A, B, and C. The alarm should go off (be activated) when any of the following criteria is met:

- When A is off, or
- When B is on and C is off, or
- When both A and C are on.

a. Write the truth table for the function

A B C | Alarm
0 0 0 | 1
0 0 1 | 1
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1

b. Write the Boolean expression in Product of Sums (POS) form.
A' + B + C
c. Draw/implement the function using 2-selector MUX gates.

d. Draw/implement the function using 2-level NAND-NAND gates.

The straightforward solution using the minterms/SOP expression:

A better solution using De Morgan's law: A' + B + C = ((A' + B + C)')' = (A B' C')'
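The three criteria can be evaluated directly to confirm that the function is 0 for exactly one input combination, which is why a single sum term (or a single NAND) is enough. A minimal sketch, with function names chosen here for illustration and the complements read as A' + B + C:

```python
from itertools import product


def alarm(a, b, c):
    """Alarm criteria as stated: A off, or (B on and C off), or (A and C on)."""
    return bool((not a) or (b and not c) or (a and c))


def alarm_pos(a, b, c):
    """Single-maxterm POS form: A' + B + C."""
    return bool((not a) or b or c)


def alarm_nand(a, b, c):
    """De Morgan form: (A B' C')', i.e. one NAND of A, B', C'."""
    return not (a and (not b) and (not c))


for a, b, c in product([0, 1], repeat=3):
    print(a, b, c, int(alarm(a, b, c)))
```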

4. Find the minimal 2-level implementation, using NOR-NOR gates, of a system with two 2-bit inputs (A = {a1,a0} & B = {b1,b0}) which outputs the following: if A+B is even, then the output is their product; if A+B is odd, then the output is their sum.

a1 a0 b1 b0 | A+B (decimal) | Z (decimal) | Z3 Z2 Z1 Z0
0  0  0  0  | 0             | 0           | 0  0  0  0
0  0  0  1  | 1             | 1           | 0  0  0  1
0  0  1  0  | 2             | 0           | 0  0  0  0
0  0  1  1  | 3             | 3           | 0  0  1  1
0  1  0  0  | 1             | 1           | 0  0  0  1
0  1  0  1  | 2             | 1           | 0  0  0  1
0  1  1  0  | 3             | 3           | 0  0  1  1
0  1  1  1  | 4             | 3           | 0  0  1  1
1  0  0  0  | 2             | 0           | 0  0  0  0
1  0  0  1  | 3             | 3           | 0  0  1  1
1  0  1  0  | 4             | 4           | 0  1  0  0
1  0  1  1  | 5             | 5           | 0  1  0  1
1  1  0  0  | 3             | 3           | 0  0  1  1
1  1  0  1  | 4             | 3           | 0  0  1  1
1  1  1  0  | 5             | 5           | 0  1  0  1
1  1  1  1  | 6             | 9           | 1  0  0  1

Z3 = a1 a0 b1 b0
Z2 = a1 a0' b1 + a1 a0 b1 b0' = a1 a0' b1 + a1 b1 b0'
Z1 = a1' b1 b0 + a1' a0 b1 + a1 a0 b1' + a1 b1' b0

Z0 = a0 + b0
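Because the specification is purely arithmetic, the truth table can be generated directly from the rule "product if A+B is even, sum if A+B is odd" and used to check each output column. This is a sketch; the helper names `z_value`, `truth_table`, and `minterms` are hypothetical.

```python
def z_value(a, b):
    """Product if A+B is even, sum if A+B is odd (A and B are 0..3)."""
    total = a + b
    return a * b if total % 2 == 0 else total


def truth_table():
    """Rows of (a1, a0, b1, b0, A+B, Z, 'Z3Z2Z1Z0') in counting order."""
    rows = []
    for a in range(4):
        for b in range(4):
            z = z_value(a, b)
            rows.append((a >> 1, a & 1, b >> 1, b & 1, a + b, z, format(z, "04b")))
    return rows


def minterms(bit):
    """Row numbers 0..15 where output bit Z<bit> is 1."""
    return [i for i, row in enumerate(truth_table()) if (row[5] >> bit) & 1]


for row in truth_table():
    print(*row)
for bit in (3, 2, 1, 0):
    print(f"Z{bit}:", minterms(bit))
```

The minterm list for each output bit is the starting point for a Karnaugh map; grouping the zeros instead gives the product-of-sums form that maps onto NOR-NOR logic.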

5. Implement the functionality of a 2-input decoder using minimal AND, OR and NOT gates.
Decoders take a binary number and map this value to an output line. 2 input values means 4 different values (4 outputs).

S1 S0 | F3 F2 F1 F0
0  0  | 0  0  0  1
0  1  | 0  0  1  0
1  0  | 0  1  0  0
1  1  | 1  0  0  0
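Each output in the truth table is 1 for exactly one input row, so each output is a single AND of the inputs or their complements: F0 = S1'S0', F1 = S1'S0, F2 = S1S0', F3 = S1S0. The sketch below models that gate structure (two NOT gates and four AND gates, no OR gates needed); the function name `decoder_2to4` is chosen here for illustration.

```python
def decoder_2to4(s1, s0):
    """Return (F3, F2, F1, F0) for select inputs S1, S0 (each 0 or 1)."""
    n1, n0 = 1 - s1, 1 - s0  # the two NOT gates
    f0 = n1 & n0             # S1' S0'
    f1 = n1 & s0             # S1' S0
    f2 = s1 & n0             # S1  S0'
    f3 = s1 & s0             # S1  S0
    return f3, f2, f1, f0


for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, decoder_2to4(s1, s0))
```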

6. Implement a 7-segment controller (truth table, Boolean expressions, gate logic). Practice with any combination of logic units.

[Figure: 7-segment controller block with inputs q0, q1, q2, q3 and outputs A, B, C, D, E, F, G]

7. Implement the 7-segment controller using a 4-selector DEMUX and OR gates.
