Cse320 Final Exam Practice Solutions

CSE320 Final Exam Practice Questions
Single Cycle Datapath / Multi Cycle Datapath: Adding instructions
Modify the datapath and control signals to perform the new instructions in the corresponding datapath.
Use the minimal amount of additional hardware and clock cycles/control states.
Remember:
When adding new instructions, don't break the operation of the standard ones.
Avoid adding ALUs, adders, RegFiles, or memories to the datapath.
You can add MUXes, logic gates, etc., but try to do so minimally (these cost in terms of area, cycle
time, etc.).
a. Load Word Register (uses R instruction format)
lwr Rt, Rd(Rs)  # Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]
b. Add 3 operands (new instruction format: opcode(6), rs(5), rt(5), rd(5), rx(5), (6 bits not used))
add3 Rd, Rs, Rt, Rx  # Reg[Rd] = Reg[Rs] + Reg[Rt] + Reg[Rx]
c. Add to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
addm Rd, Rt, Offset(Rs)  # Reg[Rd] = Reg[Rt] + Mem[sign-extended offset + Reg[Rs]]
d. Branch on Less Than or Equal to Zero (uses I instruction format)
blez Rs, label  # if Reg[Rs] <= 0, PC = PC + 4 + (sign-extended offset << 2)
e. Branch Equal to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
beqm Rd, Rt, Offset(Rs)  # if Reg[Rt] = Mem[Offset + Reg[Rs]], PC = PC + 4 + Reg[Rd]
f. Branch Equal to 0 to Immediate (uses R instruction format)
beqzi (Rs), Label  # if Mem[Reg[Rs]] = 0, then PC = PC + (sign-extended offset)
(NOTE: This is not PC + 4, and not shifted by 2)
g. Store Word and Increment
swinc Rt, offset(Rs)  # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] + 4
h. Store Word and Decrement
swdec Rt, offset(Rs)  # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] - 4
What if you were to add (g) and (h) simultaneously to the datapaths?
Datapath Timing
1. Calculate the delay in the modified datapaths when performing the instructions above. Assume the
following delays:
Memory: 200 ps
Register File Access (Read/Write): 50 ps
ALU and adders: 100 ps
Logic Gates and Multiplexors: 1 ps
All other times are negligible.
2. Calculate the minimal clock cycle time if all of the new instructions were added in the Single and
Multi Cycle cases.
Other Datapath Questions
Given MIPS code, can you determine...
What is happening at clock cycle X in the Single Cycle Datapath? Or in what cycle is operation X
happening?
What is happening at clock cycle X in the Multi Cycle Datapath? Or in what cycle is operation X
happening?
How many cycles will it take to execute the code?
Can you identify the signals (control and values) in the datapath for a given clock cycle?
And other questions of this nature.
Short Answer Misc Questions
1. What is the primary advantage of fixed-size opcodes?
Instruction decode is faster and more efficient. Control does not need to determine the
length/position of the opcode in the instruction.
2. Will a speedup of 20 on 50% of a program result in an overall speedup of at least 2 times?
Explain your answer.
The new overall speedup is calculated according to Amdahl's Law. For an overall speedup of 2,
the new execution time must be 50% or less of the old execution time.
No. The new overall execution time = 50% + 50%/20 = 52.5% of the old, an overall speedup of 1/0.525 ≈ 1.9.
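The Amdahl's Law arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `amdahl_speedup` and its parameter names are illustrative choices, not part of the course material.

```python
def amdahl_speedup(fraction_enhanced: float, enhancement_speedup: float) -> float:
    """Overall speedup when a fraction of the run time is sped up by a given factor."""
    # New time as a fraction of the old time: untouched part + shortened part.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / enhancement_speedup
    return 1.0 / new_time


# Question 2: a speedup of 20 applied to 50% of the program.
overall = amdahl_speedup(0.50, 20.0)
print(f"new time fraction = {1.0 / overall:.3f}")
print(f"overall speedup   = {overall:.3f}")
print("at least 2x?", overall >= 2.0)
```

Even an infinitely large speedup on that 50% would only bring the overall speedup up to 2, because the untouched half still takes 50% of the original time.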
3. What are the 5 components of a modern computer system? (Hint: Two of them can be combined
and called the processor.)
Datapath + control = processor, memory, input, output
4. What is a stored-program computer?
A computer where the instructions of the program are stored in memory; the CPU is assigned the
task of fetching the instructions from memory, decoding them, and executing them.
5. True or False:
Program execution time increases when the instruction count (IC) increases. TRUE
In a load/store architecture, the only instructions that access memory are load and store
types. TRUE
More powerful instructions lead to higher performance since the total number of
instructions executed is smaller for a given task with more powerful instructions. FALSE
An add operation has 3 operands (2 inputs and 1 output), therefore add instructions
must be 3-address instructions. FALSE
6. In a system executing jobs, when is throughput = 1/latency?
Throughput of a machine is the number of instructions executed per second. Latency
is the length of time taken to execute one instruction.
Throughput = 1/latency when a system is executing one task at a time, e.g. in a single-cycle or
multi-cycle datapath.
Pipelining increases throughput since multiple instructions are executing simultaneously.
Therefore the average time between instruction completions is shorter than the latency of a
single instruction, and throughput exceeds 1/latency.
7. What are the advantages and disadvantages of write-through and write-back cache
modifications in shared memory systems?
Write-through will slow the system down, taking more time for each write. However, with a
write-back cache, there may be data contention, since multiple references could be
referencing the data while it is dirty.
Pipelining
1. What are the main benefits and disadvantages of pipelining?
2. Name the types of pipelining hazards. Define how and when they can occur in systems (in
general). Define how/when they occur in MIPS. Give a MIPS datapath or code example of each
type.
3. Many processors have 5- or 6-stage pipelines. A typical value for the CPI (cycles per instruction) in
such processors is in the range of 1.0 to 1.5. Does it mean that the latency of execution of most
instructions is 1 or 2 clock cycles? Why, or why not?
4. Why do conditional branches impact the performance of a pipelined implementation?
5. Briefly describe 2 solutions to reduce the performance impact of conditional branch instructions
in a pipelined implementation.
6. Given sequences of instructions, determine the forwarding paths and required stalls.
Performance
1. The computer spends 82% of the time computing and 18% waiting for the disk. The instruction
mix and the average cycles per instruction (CPI) for each type is:
Type    Instruction %   CPI
int     40%             1
FP      30%             5
Other   30%             2
a. Consider 3 modifications to the computer. Compute the speedup for each.
i. The processor is replaced with a new one that reduces the total computation time
by 35%.
Speedup = 1 / ((1 - 0.82) + 0.82 * 0.65) = 1.40
ii. The disk is replaced with a solid state device that reduces the disk waiting time by
85%.
Speedup = 1 / ((1 - 0.18) + 0.18 * 0.15) = 1.18
iii. The processor is replaced with a new one that has improved floating point
performance. The average floating point CPI is reduced to 3; all other aspects are
unchanged.
Average CPI (old) = 0.40 * 1 + 0.30 * 5 + 0.30 * 2 = 2.5
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 3 + 0.30 * 2 = 1.9
Speedup (computation) = 2.5 / 1.9 = 1.316
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.316) = 1.245
b. Which modification gave the best speedup?
Modification (i) provides the best speedup.
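The three speedups can be recomputed with the short sketch below. The helper names (`average_cpi`, `overall_speedup`) and the dictionary layout are illustrative assumptions; the numbers come from the problem statement. The script keeps unrounded intermediate values, so the last printed digit may differ slightly from a hand calculation that rounds along the way.

```python
COMPUTE_FRACTION = 0.82  # share of time spent computing
DISK_FRACTION = 0.18     # share of time spent waiting for the disk

# Instruction mix: type -> (fraction of instructions, CPI)
MIX = {"int": (0.40, 1), "FP": (0.30, 5), "Other": (0.30, 2)}


def average_cpi(mix):
    """Weighted average CPI over the instruction mix."""
    return sum(fraction * cpi for fraction, cpi in mix.values())


def overall_speedup(fraction, local_speedup):
    """Amdahl's Law: only `fraction` of the time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# (i) computation time reduced by 35%, i.e. it becomes 0.65 of its old value
speedup_i = overall_speedup(COMPUTE_FRACTION, 1 / 0.65)
# (ii) disk waiting time reduced by 85%, i.e. it becomes 0.15 of its old value
speedup_ii = overall_speedup(DISK_FRACTION, 1 / 0.15)
# (iii) floating point CPI reduced from 5 to 3
faster_fp = dict(MIX, FP=(0.30, 3))
speedup_iii = overall_speedup(COMPUTE_FRACTION, average_cpi(MIX) / average_cpi(faster_fp))

for label, value in (("i", speedup_i), ("ii", speedup_ii), ("iii", speedup_iii)):
    print(f"modification ({label}): speedup = {value:.3f}")
```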
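c. For the two modifications in part (a) that did not result in the best speedup, is it possible for
them to achieve the speedup achieved by the best modification from part (b)? Show your work and
explain your answer.
i. An infinitely fast disk: Speedup = 1 / (1 - 0.18) = 1.22, which is still slower than modification (i).
ii. If FP takes only 1 clock cycle:
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 1 + 0.30 * 2 = 1.3
Speedup (computation) = 2.5 / 1.3 = 1.92
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.92) = 1.647
This exceeds 1.40, so a sufficiently improved floating point unit could match modification (i),
whereas a faster disk cannot.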
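As a rough cross-check of part (c), the sketch below computes the two limiting cases and also solves for the floating point CPI at which the improved-FP processor would exactly match the 35% computation-time reduction. The break-even CPI is a derived illustration rather than part of the original solution, and all names are hypothetical.

```python
import math

COMPUTE_FRACTION = 0.82
DISK_FRACTION = 0.18
OLD_CPI = 0.40 * 1 + 0.30 * 5 + 0.30 * 2  # 2.5, from the instruction mix


def overall_speedup(fraction, local_speedup):
    """Amdahl's Law for a speedup applied to `fraction` of the run time."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def cpi_with_fp(fp_cpi):
    """Average CPI when only the floating point CPI changes."""
    return 0.40 * 1 + 0.30 * fp_cpi + 0.30 * 2


# Speedup to beat: computation time reduced to 0.65 of its old value.
target = overall_speedup(COMPUTE_FRACTION, 1 / 0.65)
# Upper bound for the disk upgrade: the disk wait disappears entirely.
disk_limit = overall_speedup(DISK_FRACTION, math.inf)
# Floating point instructions take a single cycle.
fp_one_cycle = overall_speedup(COMPUTE_FRACTION, OLD_CPI / cpi_with_fp(1))
# FP CPI that makes the new average CPI equal to 0.65 * OLD_CPI.
break_even_fp_cpi = (0.65 * OLD_CPI - 0.40 * 1 - 0.30 * 2) / 0.30

print(f"target speedup       = {target:.3f}")
print(f"infinitely fast disk = {disk_limit:.3f}")
print(f"FP CPI of 1          = {fp_one_cycle:.3f}")
print(f"break-even FP CPI    = {break_even_fp_cpi:.3f}")
```

Under these assumptions the floating point CPI would need to drop to roughly 2.08 or lower to reach the same overall speedup.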
2. You have two RiSC-16 processors X and Z, with the following characteristics. They are both
multicycle processors, in which an instruction executes in a variable number of processor
cycles. X and Z execute variations on the same instruction set (RiSC) in the following way:
a. Processor X implements the base instruction set, including LUI. Processor X implements
multiplication in software, meaning there is no MULT instruction.
b. Processor Z eliminates the LUI instruction in favor of a MULT instruction, getting LUI
functionality from LW.
c. Processor Z's MULT instruction uses the ALU over and over again in a loop, performing
shifts and conditional adds, and requires 80 processor cycles per multiply.
d. Executing one MULT instruction on Processor Z eliminates on average 30 instructions
that would be executed on Processor X when implemented in software. However,
Processor Z then needs additional ALU functionality, which increases the ALU's critical
path from 10 ns to 12 ns.
Also, assume the following:
- Cache read/write: 10 ns
- Register file read/write: 8 ns
- ALU operation: 10 ns for processor X, 12 ns for processor Z
Assume the following distribution of instruction types (assume that LUI requires 3 cycles):
              MULT   LUI   LW    SW    RType   BEQ
Processor X   0%     5%    20%   10%   45%     20%
Processor Z   5%     0%    25%   10%   40%     20%
For example, processor Z executes 5 MULT instructions out of every 100; for each of those MULT
instructions, processor X executes an additional 30 instructions.
a. Compare the execution times of the two processors.
Execution time = T * I * C = Cycle time * Instruction Count * Average CPI
Exec time(x) = Tx * Ix * Cx
Exec time(z) = Tz * Iz * Cz
Tx = 10 ns; Tz = 12 ns
If Iz = 100, Ix = 95 + (5 * 30) = 245
CPI for each instruction type:
MULT = 80 cycles, LUI = 3, LW = 5, SW = 4, RTYPE = 4, BEQ = 3
Therefore:
Average CPI for X = Cx = (0.05 * 3) + (0.2 * 5) + (0.1 * 4) + (0.45 * 4) + (0.2 * 3) = 3.95
Average CPI for Z = Cz = (0.05 * 80) + (0.25 * 5) + (0.1 * 4) + (0.4 * 4) + (0.2 * 3) = 7.85
Comparing execution times:
Exec time(x) = 10 ns/c * 3.95 c/i * 245 i = 9677.5 ns
Exec time(z) = 12 ns/c * 7.85 c/i * 100 i = 9420 ns
Processor Z with the multiply instruction is about 1.03 times faster than processor X for this
instruction mix.
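The comparison in part (a) can be reproduced with the sketch below. The dictionary names and the `average_cpi` helper are illustrative; the cycle counts, mixes, cycle times, and instruction counts are the ones stated in the solution.

```python
# Cycles per instruction type, as listed in the solution.
CPI = {"MULT": 80, "LUI": 3, "LW": 5, "SW": 4, "RTYPE": 4, "BEQ": 3}

# Instruction mixes for the two processors.
MIX_X = {"MULT": 0.00, "LUI": 0.05, "LW": 0.20, "SW": 0.10, "RTYPE": 0.45, "BEQ": 0.20}
MIX_Z = {"MULT": 0.05, "LUI": 0.00, "LW": 0.25, "SW": 0.10, "RTYPE": 0.40, "BEQ": 0.20}


def average_cpi(mix):
    """Weighted average CPI for one processor's instruction mix."""
    return sum(fraction * CPI[kind] for kind, fraction in mix.items())


cpi_x = average_cpi(MIX_X)
cpi_z = average_cpi(MIX_Z)

# Z runs 100 instructions; X runs the 95 non-MULT ones plus 30 per multiply.
instructions_z = 100
instructions_x = 95 + 5 * 30

exec_x_ns = 10 * cpi_x * instructions_x  # 10 ns cycle time on X
exec_z_ns = 12 * cpi_z * instructions_z  # 12 ns cycle time on Z

print(f"CPI X = {cpi_x:.2f}, CPI Z = {cpi_z:.2f}")
print(f"exec X = {exec_x_ns:.1f} ns, exec Z = {exec_z_ns:.1f} ns")
print(f"Z is {exec_x_ns / exec_z_ns:.2f} times faster than X")
```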
b. At what clock speed for processor Z are the two designs equal in performance?
Equating the two execution times and solving for Tz:
10 ns * 3.95 cpi * 245 instructions = Tz * 7.85 cpi * 100 instructions
Tz = (10 ns * 3.95 * 245) / (7.85 * 100)
Tz = 12.33 ns
For smaller Tz (faster clock), processor Z has better performance; for larger Tz (slower clock),
processor X has better performance.
c. (more difficult) Assuming the original ALU latency for processor Z (12 ns), how fast would
your software-emulated multiply have to be (on average) for processor X to be just as fast as
processor Z? In other words, how many instructions would processor X execute in place of 1
MULT?
Remember that Ix was defined to be 95 + (# of multiplications) * (cost of each).
We first need to find the instruction count of processor X necessary for equal performance.
10 ns * 3.95 * Ix = 12 ns * 7.85 * 100
which implies Ix = 238.5
Total number of multiply-emulation instructions is (238.5 - 95) = 143.5
Therefore the number of instructions per multiply = 143.5 / 5 = 28.7, i.e. about 28 instructions.
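Both break-even points follow from setting the two execution times equal. A small sketch, with the average CPIs taken from part (a) and all variable names chosen only for illustration:

```python
T_X_NS, T_Z_NS = 10.0, 12.0  # cycle times
CPI_X, CPI_Z = 3.95, 7.85    # average CPIs from part (a)
I_X, I_Z = 245, 100          # instruction counts from part (a)
BASE_X = 95                  # X's instructions that are not multiply emulation
MULTS = 5                    # multiplies per 100 Z instructions

# Part (b): cycle time of Z at which both processors take the same time.
break_even_tz_ns = (T_X_NS * CPI_X * I_X) / (CPI_Z * I_Z)

# Part (c): instruction count of X at which both take the same time with Tz = 12 ns,
# then the number of emulation instructions that leaves per multiply.
break_even_ix = (T_Z_NS * CPI_Z * I_Z) / (T_X_NS * CPI_X)
instructions_per_multiply = (break_even_ix - BASE_X) / MULTS

print(f"break-even Tz = {break_even_tz_ns:.2f} ns")
print(f"break-even Ix = {break_even_ix:.1f}")
print(f"instructions per multiply = {instructions_per_multiply:.1f}")
```

Note that this keeps X's average CPI fixed at 3.95, as the hand solution does; a shorter emulation routine would also shift X's instruction mix slightly.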
3. Two important parameters control the performance of a processor: cycle time and cycles per
instruction. There is an enduring trade-off between these two parameters in the design process
of microprocessors. While some designers prefer to increase the processor frequency at the
expense of large CPI, other designers follow a different school of thought in which reducing the
CPI comes at the expense of lower processor frequency. Consider the following machines, and
compare their performance using the following instruction mix: 25% loads, 13% stores, 47% ALU
instructions, and 15% branches/jumps. Assume the unmodified multicycle datapath and finite
state machine.
M1: The multicycle datapath is designed with a 1 GHz clock.
M2: A machine like M1 except that register updates are done in the same clock cycle as a
memory read or ALU operation. Thus in the finite state machine, states 6 and 7 and states 3
and 4 are combined. This machine has a 3.2 GHz clock, since the register update increases
the length of the critical path.
M3: A machine like M2 except that effective address calculations are done in the same clock
cycle as a memory access. Thus states 2, 3, and 4 can be combined, as can 2 and 5, as well as
6 and 7. This machine has a 2.8 GHz clock because of the long cycle created by combining
address calculation and memory access.
Find out which of the machines is fastest. Are there instruction mixes that would make another
machine faster, and if so, what are they?
In the original multicycle datapath the CPI for each instruction is as follows:
Loads: 5 cycles
Stores: 4 cycles
ALU: 4 cycles
Branch/Jumps: 3 cycles
Performance M1:
Average CPI = .25 * 5 + .13 * 4 + .47 * 4 + .15 * 3 = 4.1
Execution time = (CPI * #instructions) / clock rate = 4.1 I / 1 GHz = 4.1 I × 10^-9 seconds
Performance M2:
Loads shorten to 4 cycles
ALUs shorten to 3 cycles
Average CPI = .25 * 4 + .13 * 4 + .47 * 3 + .15 * 3 = 3.38
Execution time = (CPI * #instructions) / clock rate = 3.38 I / 3.2 GHz = 1.06 I × 10^-9 seconds
Performance M3:
Loads shorten to 3 cycles
Stores shorten to 3 cycles
ALUs shorten to 3 cycles
Average CPI = .25 * 3 + .13 * 3 + .47 * 3 + .15 * 3 = 3
Execution time = (CPI * #instructions) / clock rate = 3 I / 2.8 GHz = 1.07 I × 10^-9 seconds
M2 is fastest.

M1 can never be faster than M2: even if all the instructions are branch instructions, the CPI will
be 3 in all 3 cases, and the clock rate is faster on the other 2 processors.
M3 can be faster than M2 if all instructions are loads or all are stores.
Ex (all loads):
M2: Average CPI = 1 * 4 + 0 * 4 + 0 * 3 + 0 * 3 = 4
M3: Average CPI = 1 * 3 + 0 * 3 + 0 * 3 + 0 * 3 = 3
M2 execution time = (CPI * #instructions) / clock rate = 4 I / 3.2 GHz = 1.25 I × 10^-9 seconds
M3 execution time = (CPI * #instructions) / clock rate = 3 I / 2.8 GHz = 1.07 I × 10^-9 seconds
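The machine comparison can be explored for other instruction mixes with the sketch below. The table layout and function names are illustrative assumptions; clock rates and cycle counts are the ones used in the solution. The last lines derive the load+store share at which M3 overtakes M2, which is an extrapolation from the same numbers rather than part of the original answer.

```python
# Clock rate in Hz and cycles per instruction class for each machine.
MACHINES = {
    "M1": (1.0e9, {"load": 5, "store": 4, "alu": 4, "branch": 3}),
    "M2": (3.2e9, {"load": 4, "store": 4, "alu": 3, "branch": 3}),
    "M3": (2.8e9, {"load": 3, "store": 3, "alu": 3, "branch": 3}),
}

GIVEN_MIX = {"load": 0.25, "store": 0.13, "alu": 0.47, "branch": 0.15}


def time_per_instruction_ns(machine, mix):
    """Average time per instruction in nanoseconds: CPI / clock rate."""
    clock_hz, cycles = MACHINES[machine]
    cpi = sum(mix[kind] * cycles[kind] for kind in mix)
    return cpi / clock_hz * 1e9


def fastest(mix):
    """Name of the machine with the lowest time per instruction for this mix."""
    return min(MACHINES, key=lambda name: time_per_instruction_ns(name, mix))


for name in MACHINES:
    print(name, f"{time_per_instruction_ns(name, GIVEN_MIX):.3f} ns per instruction")
print("fastest for the given mix:", fastest(GIVEN_MIX))

all_loads = {"load": 1.0, "store": 0.0, "alu": 0.0, "branch": 0.0}
print("fastest for all loads:", fastest(all_loads))

# M2's CPI is 3 + m and M3's CPI is 3, where m is the load+store fraction.
# M3 wins when 3 / 2.8 < (3 + m) / 3.2, i.e. m > 3 * 3.2 / 2.8 - 3.
break_even_memory_fraction = 3 * 3.2 / 2.8 - 3
print(f"M3 beats M2 when loads + stores exceed {break_even_memory_fraction:.1%}")
```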
Review Adder and ALU Creation and Building larger ALUs from units
Consider the 4-bit ALU below which can perform the following 5 operations: add, sub, AND, OR and
negate B.
Inputs are A = {A3,A2,A1,A0}, B = {B3,B2,B1,B0}, and Cin. Outputs are result R = {R3,R2,R1,R0} and Cout. Numbers
are in 2's complement form. Fill in the table below, for each operation, what the values of the control
signals should be. Indicate don't cares where appropriate.
Operation   m1   M0   Cin   BINV   Az
Add         0    0    0     0      1
Sub         0    0    1     1      1
OR          1    0    X     0      1
AND         0    1    X     0      1
Negate B    0    0    0     0      1
Digital Logic
1. Using Boolean algebra, prove the following:
a. bd + cd = ((bd) + (cd))
bd + cd = (bd)(cd)
bd + cd = (b + d)(c + d)
bd + cd = bc + cd + bd + dd
bd + cd = bc + cd + bd + 0
bd + cd = bc(d + d) + cd + bd
bd + cd = bcd + bcd + cd + bd
bd + cd = cd(1 + b) + bd(c + 1)
bd + cd = bd + cd
Laws applied, in order: DeMorgan's, DeMorgan's, Distributive, Complementary, Null, Commutative, Distributive, Null
b. abc + bcd + abd = abc + abd
abc + bcd(a + a) + abd = abc + abd
abc + abcd + abcd + abd = abc + abd
abc(1 + d) + abd(c + 1) = abc + abd
abc + abd = abc + abd
Laws applied, in order: Null, Distributive, Commutative, Null
c. a + a(ab + bc) = a + b + c
a + a((ab)(bc)) = a + b + c
a + a((a + b)(b + c)) = a + b + c
a + a(ab + bb + ac + bc) = a + b + c
a + a(ab + 0 + ac + bc) = a + b + c
a + a(ab + ac + bc) = a + b + c
a + aab + aac + abc = a + b + c
a + ab + ac + abc = a + b + c
a + a(b + c + bc) = a + b + c
a + b + c + bc = a + b + c
a + b + c(1 + b) = a + b + c
a + b + c = a + b + c
Laws applied, in order: DeMorgan's, DeMorgan's, Distributive, Complementary, Distributive, Idempotence, Idempotence, No Name, Null
2. Consider the following function (' denotes complement): z(x3,x2,x1,x0) = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0'
a. How many literals does z contain?
11
b. Is z minimal? If not, find the minimal expression using Boolean algebra.
No.
x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0'
x3'x2(1 + x0') + x3x1x0 + x3x2x0'
x3'x2 + x3'x2x0' + x3x1x0 + x3x2x0'
x3'x2 + x3x1x0 + x2x0'
c. Find the equivalent sum of minterms (SOP) for z (using m notation).
z(x3,x2,x1,x0) = Σm(4,5,6,7,11,12,14,15)
d. Find the equivalent product of maxterms (POS) for z (using M notation).
z(x3,x2,x1,x0) = ΠM(0,1,2,3,8,9,10,13)
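The minterm and maxterm lists can be checked by brute force over all 16 input combinations. The sketch below assumes the complements in z are placed as x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0' (the placement that reproduces the minterm list above) and that the simplified form is x3'x2 + x3x1x0 + x2x0'; the function names are illustrative.

```python
from itertools import product


def z_original(x3, x2, x1, x0):
    """z = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0' (assumed complement placement)."""
    return bool((not x3 and x2) or (x3 and x1 and x0)
                or (x3 and x2 and not x0) or (not x3 and x2 and not x0))


def z_minimal(x3, x2, x1, x0):
    """Simplified form: x3'x2 + x3x1x0 + x2x0'."""
    return bool((not x3 and x2) or (x3 and x1 and x0) or (x2 and not x0))


# product() yields (x3, x2, x1, x0) in counting order, so the index is the minterm number.
assignments = list(product((0, 1), repeat=4))
minterms = [i for i, bits in enumerate(assignments) if z_original(*bits)]
maxterms = [i for i, bits in enumerate(assignments) if not z_original(*bits)]
forms_agree = all(z_original(*bits) == z_minimal(*bits) for bits in assignments)

print("minterms:", minterms)
print("maxterms:", maxterms)
print("simplified form equivalent:", forms_agree)
```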
3. You were recently hired as an engineer in a company that designs alarm systems custom-made
to meet the customers' specifications. You are asked to design a system that uses the inputs of
three sensors A, B, and C. The alarm should go off (be activated) when the following criteria are
met:
When A is off, or
When B is on and C is off, or
When both A and C are on.
a. Write the truth table for the function.
A B C | Alarm
0 0 0 | 1
0 0 1 | 1
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
b. Write the Boolean expression in Product of Sums (POS) form.
A' + B + C (' denotes complement)
c. Draw/Implement the function using 2-selector MUX gates.
d. Draw/Implement the function using 2-level NAND-NAND gates.
The straightforward solution using minterms/SOP expression:
A better solution using DeMorgan's law: A' + B + C = ((A' + B + C)')' = (A B' C')'
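A quick exhaustive check confirms that the three stated alarm criteria collapse to the single sum term A' + B + C, and that the NAND form (A B' C')' is equivalent. This is a sketch with illustrative function names.

```python
from itertools import product


def alarm_spec(a, b, c):
    """Alarm as specified: A off, or (B on and C off), or (A and C on)."""
    return bool((not a) or (b and not c) or (a and c))


def alarm_pos(a, b, c):
    """Product-of-sums form with a single maxterm: A' + B + C."""
    return bool((not a) or b or c)


def alarm_nand(a, b, c):
    """DeMorgan form: (A B' C')', a single NAND of A, B', C'."""
    return not (a and (not b) and (not c))


rows = [(a, b, c, int(alarm_spec(a, b, c))) for a, b, c in product((0, 1), repeat=3)]
for row in rows:
    print(*row)

all_equal = all(
    alarm_spec(a, b, c) == alarm_pos(a, b, c) == alarm_nand(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print("all three forms agree:", all_equal)
```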
4. Find the minimal 2-level implementation using NOR-NOR gates of a system with two 2-bit
inputs (A = {a1,a0} & B = {b1,b0}) which outputs the following. If A+B is even, then the output is
their product. If A+B is odd, then the output is their sum.
a1 a0 b1 b0 | A+B (decimal) | Z (decimal) | Z3 Z2 Z1 Z0
0  0  0  0  | 0             | 0           | 0  0  0  0
0  0  0  1  | 1             | 1           | 0  0  0  1
0  0  1  0  | 2             | 0           | 0  0  0  0
0  0  1  1  | 3             | 3           | 0  0  1  1
0  1  0  0  | 1             | 1           | 0  0  0  1
0  1  0  1  | 2             | 1           | 0  0  0  1
0  1  1  0  | 3             | 3           | 0  0  1  1
0  1  1  1  | 4             | 3           | 0  0  1  1
1  0  0  0  | 2             | 0           | 0  0  0  0
1  0  0  1  | 3             | 3           | 0  0  1  1
1  0  1  0  | 4             | 4           | 0  1  0  0
1  0  1  1  | 5             | 5           | 0  1  0  1
1  1  0  0  | 3             | 3           | 0  0  1  1
1  1  0  1  | 4             | 3           | 0  0  1  1
1  1  1  0  | 5             | 5           | 0  1  0  1
1  1  1  1  | 6             | 9           | 1  0  0  1
With ' denoting complement:
Z3 = a1a0b1b0
Z2 = a1a0'b1 + a1a0b1b0' = a1a0'b1 + a1b1b0'
Z1 = a1'b1b0 + a1'a0b1 + a1a0b1' + a1b1'b0
Z0 = a0 + b0
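The truth table for this problem can be generated directly from the stated rule (product when A+B is even, sum when A+B is odd). The sketch below also checks one set of output-bit equations, written with explicit complements, against that rule; the equations in the code are an assumption verified by the script itself, and all names are illustrative.

```python
def rule(a, b):
    """Output as specified: product if A+B is even, sum if A+B is odd."""
    return a * b if (a + b) % 2 == 0 else a + b


def bits_from_equations(a1, a0, b1, b0):
    """Candidate output equations, with n(x) standing for the complement of x."""
    n = lambda x: 1 - x
    z3 = a1 & a0 & b1 & b0
    z2 = (a1 & n(a0) & b1) | (a1 & b1 & n(b0))
    z1 = (n(a1) & b1 & b0) | (n(a1) & a0 & b1) | (a1 & a0 & n(b1)) | (a1 & n(b1) & b0)
    z0 = a0 | b0
    return z3, z2, z1, z0


table = []
equations_match = True
for a in range(4):
    for b in range(4):
        a1, a0, b1, b0 = a >> 1, a & 1, b >> 1, b & 1
        z = rule(a, b)
        z3, z2, z1, z0 = bits_from_equations(a1, a0, b1, b0)
        # Recombine the four bits and compare with the value given by the rule.
        equations_match &= (8 * z3 + 4 * z2 + 2 * z1 + z0 == z)
        table.append((a1, a0, b1, b0, a + b, z, format(z, "04b")))

for row in table:
    print(*row)
print("equations match the rule:", equations_match)
```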
5. Implement the functionality of a 2-input decoder using minimal AND, OR and NOT gates.
Decoders take a binary number and map this value to an output line. 2 input values means 4
different values (4 outputs).
S1 S0 | F3 F2 F1 F0
0  0  | 0  0  0  1
0  1  | 0  0  1  0
1  0  | 0  1  0  0
1  1  | 1  0  0  0
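One way to read gate equations off the decoder truth table is that each output is the AND of the two select inputs, each either true or complemented. The sketch below models that with bitwise operations; the function name `decode2` is an illustrative choice, and the gate-level drawing is left to the reader.

```python
def decode2(s1, s0):
    """2-to-4 decoder: returns (F3, F2, F1, F0) with exactly one output high."""
    not_s1, not_s0 = 1 - s1, 1 - s0  # two NOT gates
    f0 = not_s1 & not_s0             # F0 = S1' S0'
    f1 = not_s1 & s0                 # F1 = S1' S0
    f2 = s1 & not_s0                 # F2 = S1  S0'
    f3 = s1 & s0                     # F3 = S1  S0
    return (f3, f2, f1, f0)


decoder_table = {(s1, s0): decode2(s1, s0) for s1 in (0, 1) for s0 in (0, 1)}
for (s1, s0), outputs in decoder_table.items():
    print(s1, s0, "|", *outputs)
```

This version uses two NOT gates and four 2-input AND gates; no OR gates are needed because every output is a single minterm.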
6. Implement a 7-segment controller (truth table, Boolean expressions, gate logic). Practice with
any combination of logic units.
[Figure: 7-segment controller block with inputs q0, q1, q2, q3 and outputs A, B, C, D, E, F, G]
7. Implement the 7-segment controller using a 4-selector DEMUX and OR gates.
