Co - Svew

1/6/2011
Introduction to Computer
Definition of a computer:
A computer is an electronic machine that
takes data and instructions as input,
operates on the stored data or on the received data (processing), and
displays the processed data as output.
COMPUTER ORGANIZATION SVEW
Modern computers vary in cost, speed, size and capabilities. Based on these, we have several types of computers, as listed below:
Digital Computer
Personal Computer
Portable Notebook Computer
Workstations
Mainframes
Supercomputer
But still, most of the components are the same.
What are those components?
Functional Units
Memory
Central Processing Unit
Input Unit
Output Unit
Figure: Basic functional units of a computer (Input, Output, Memory, Arithmetic and logic, Control; labelled groups: I/O and Processor).
Understanding Components: Memory
Memory is an essential component of a computer.
The memory of the computer is an ordered sequence of storage locations called memory cells.
Each cell is identified by a unique address, which is used to store or access the information.
A memory cell can contain any type of data or instructions.
The size of the memory cell varies from computer to computer.
It is expressed in bytes.
Some computers use 1 byte for one memory cell, some use 2 bytes, and others use 4, 8, 16 or even more bytes.
A byte is composed of even smaller units of storage called bits, where a bit is a binary digit that can be 0 or 1. One byte is 8 bits.
Understanding Components: Memory (Contd.)
Storage and Retrieval of Information in Memory
The computer can store a value to memory or retrieve a value from memory.
To store a value, the computer sets each bit of the memory cell to either 0 or 1. For example, to store A in a memory cell the bits are set as 01000001 if one byte of memory is allocated to the memory cell. (The ASCII (American Standard Code for Information Interchange) code for the character A is 65, which is 01000001 in binary.)
To retrieve a value, the computer copies the pattern of 0s and 1s stored in the memory cell to another storage area for processing.
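The store and retrieve steps can be tried out with a minimal Python sketch. The function names `store_char` and `retrieve_char` are made up for this illustration, and a one-byte memory cell is assumed.

```python
def store_char(ch: str) -> str:
    """Return the 8-bit pattern a one-byte memory cell would hold for ch."""
    code = ord(ch)               # ASCII code, e.g. 65 for 'A'
    return format(code, "08b")   # zero-padded to 8 bits

def retrieve_char(bits: str) -> str:
    """Copy the bit pattern back out and interpret it as a character."""
    return chr(int(bits, 2))

print(store_char("A"))             # 01000001
print(retrieve_char("01000001"))   # A
```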
Memory (Contd.): Main Memory
Main memory stores programs, data and results. There are two types of memory, namely RAM (Random Access Memory) and ROM (Read Only Memory).
RAM:
This memory is used by the processor for temporary storage of programs, data and results.
It is volatile in nature, which means the program or data will be lost if power is switched off.
ROM:
This memory holds predefined instructions supplied by the manufacturer, which are usually executed during system start-up.
It is non-volatile in nature, which means the instructions are not lost if power is switched off. It is read-only memory.
Understanding Components: Memory (Contd.)
Secondary Storage Devices
Computers need additional storage devices other than the main memory for two reasons:
1. The program, data or instructions have to be stored in a permanent area so that they are retained and available when required, and
2. Secondary storage can hold more information than main memory.
Some of the frequently used secondary storage devices are hard disk, magnetic tape, floppy disk, zip disk, CD (Compact Disc) and DVD (Digital Video Disc).
The storage capacity of these disks varies from MB (megabytes) to GB (gigabytes).
Memory (Contd.)
Table showing storage capacities in terms of bytes
Term       Abbreviation   Equivalent to
Byte       B              8 bits
Kilobyte   KB             2^10 bytes
Megabyte   MB             2^20 bytes
Gigabyte   GB             2^30 bytes
Terabyte   TB             2^40 bytes
Memory (Contd.)
Table showing the storage capacity of secondary storage devices
Secondary Storage Device   Storage Capacity
Hard disk drive            60 GB or more
DVD                        4.7 GB or more
CD                         650 MB to 700 MB
Zip drive                  250 MB or more
Floppy disk drive          1.44 MB
Understanding Components: CPU
It is the heart of the computer system.
It interprets the program in the main memory and accordingly generates control signals to the other components.
Within the CPU (processor) there is one unit, the ALU, which performs arithmetic and logical operations when required.
Within the CPU (processor) there is another unit, the CU, which controls and coordinates the operations between the components. It generates signals for I/O transfer.
The CPU also uses high-speed storage locations called registers for temporary computations; the registers are located inside the CPU.
Understanding Components: Input and Output Devices
These devices are used to communicate with the computer: they allow the programmer or user to enter data for computation and to observe the results after computation.
The keyboard is the standard input device, used to key in data, instructions or programs. The mouse is another input device, used to move the mouse pointer to a location on the screen and to click graphical icons that have defined functions. Other input devices are scanners and touch pads.
The monitor is the standard output device, used to display the processed information on the screen. Other output devices are printers and speakers.
Basic Operational Concepts
Review
Activity in a computer is governed by instructions.
To perform a task, an appropriate program consisting of a list of instructions is stored in the memory.
Individual instructions are brought from the memory into the processor.
Data to be used as operands are also stored in the memory.
A Typical Instruction
Add LOCA, R0
Add the operand at memory location LOCA to the operand in register R0 in the processor.
Place the sum into register R0.
The original contents of LOCA are preserved.
The original contents of R0 are overwritten.
Sequence: the instruction is fetched from the memory into the processor -> the operand at LOCA is fetched and added to the contents of R0 -> the resulting sum is stored in register R0.
Separate Memory Access and ALU Operation
Load LOCA, R1
Add R1, R0
Whose contents will be overwritten?
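One way to check the answer is a small Python sketch that executes both instruction sequences on a toy machine. The operand values (7 in LOCA, 5 in R0) and the `run` helper are assumptions made for this illustration, not part of the slides.

```python
def run(program, memory, registers):
    """Execute a list of (opcode, source, destination) tuples."""
    for op, src, dst in program:
        # A source operand is either a memory location or a register.
        value = memory[src] if src in memory else registers[src]
        if op == "Load":
            registers[dst] = value                    # destination is overwritten
        elif op == "Add":
            registers[dst] = registers[dst] + value   # sum replaces destination
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers

memory = {"LOCA": 7}   # hypothetical operand value
one_step = run([("Add", "LOCA", "R0")], memory, {"R0": 5, "R1": 0})
two_step = run([("Load", "LOCA", "R1"), ("Add", "R1", "R0")], memory, {"R0": 5, "R1": 0})
print(one_step)   # {'R0': 12, 'R1': 0}
print(two_step)   # {'R0': 12, 'R1': 7}
```

In the two-instruction version both R1 and R0 are overwritten, while LOCA keeps its original contents in both versions.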
Transfer of instructions and data between memory and processor is done by sending the address of the memory location to be accessed to the memory unit and by issuing appropriate control signals.
The diagram below shows the connection between memory and processor.
Connection Between the Processor and the Memory
Figure 1.2. Connections between the processor and the memory. (The memory connects to the processor through MAR and MDR; the processor also contains Control, PC, IR, the ALU, and n general-purpose registers R0 to Rn-1.)
Registers
Instruction register (IR)
Program counter (PC)
General-purpose registers (R0 to Rn-1)
Memory address register (MAR)
Memory data register (MDR)
Instruction register: holds the instruction that is currently being executed.
Program counter: holds the address of the next instruction to be fetched.
Memory Address Register (MAR): holds the address of the memory location to be accessed, either for reading or for writing.
Memory Data Register (MDR): holds the data read from memory or the data to be written into memory.
General-Purpose Registers: there are n general-purpose registers, R0 to Rn-1, that are used by the processor for intermediate calculations.
Typical Operating Steps
Programs are placed in the memory through input devices.
PC is set to point to the first instruction.
The contents of PC are transferred to MAR.
A Read signal is sent to the memory.
The first instruction is read out and loaded into MDR.
The contents of MDR are transferred to IR.
The instruction in the IR is decoded and executed.
Typical Operating Steps (Contd.)
Get operands for the ALU
  From a general-purpose register
  From memory (address to MAR, Read, MDR to ALU)
Perform the operation in the ALU
Store the result back
  To a general-purpose register
  To memory (address to MAR, result to MDR, Write)
During the execution, PC is incremented to point to the next instruction.
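The fetch part of these steps can be sketched in Python as a sequence of register transfers. The dictionary-based `state` and the two-instruction program are hypothetical stand-ins chosen for illustration.

```python
def fetch(memory, state):
    """One instruction fetch following the operating steps."""
    state["MAR"] = state["PC"]            # contents of PC go to MAR
    state["MDR"] = memory[state["MAR"]]   # Read: the addressed word arrives in MDR
    state["IR"] = state["MDR"]            # contents of MDR go to IR
    state["PC"] += 1                      # PC now points to the next instruction
    return state["IR"]

memory = {0: "Load LOCA,R1", 1: "Add R1,R0"}   # hypothetical program
state = {"PC": 0, "MAR": None, "MDR": None, "IR": None}
first = fetch(memory, state)
second = fetch(memory, state)
print(first, "|", second, "| PC =", state["PC"])
```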
BUS STRUCTURES
A group of lines that serves as a connecting path for several devices is called a bus.
The lines are grouped as:
Data lines
Address lines
Control lines
A common way of connecting the functional units is a single-bus structure, which is shown below:
Figure 1.3. Single-bus structure (Input, Output, Memory and Processor attached to one bus).
Different devices have different transfer and operating speeds.
If the speed of the bus is bounded by the slowest device connected to it, the efficiency will be very low.
How to solve this?
A common approach is to use buffers (buffer registers).
SYSTEM SOFTWARE
System software is a collection of programs that are executed when needed to perform functions such as:
Receiving and interpreting user commands.
Entering and editing application programs.
Managing the storage and retrieval of files on secondary storage devices.
Running application programs such as MS Word and MS Excel.
Controlling I/O units to receive input information and to produce output results.
Translating programs from source form to object form.
Linking and running user-written application programs.
Multiprocessors
Large computer systems that contain more than one processor are called multiprocessors.
These systems can process different applications in parallel.
These systems can also split a large task into several small subtasks and process them in parallel.
All processors in the system have access to the memory, so these systems are also called shared-memory multiprocessors.
Multicomputers
A group of complete computers that are interconnected is called a multicomputer.
Each computer processes data within its own system only.
If a task needs data from another system, the data is obtained through communication channels by sending messages.
Hence these systems are called message-passing multicomputers.
Performance
The most important measure of a computer is how quickly it can execute programs.
Three factors affect performance:
Hardware design
Instruction set
Compiler
The processor and the cache memory can be fabricated on a single chip, so that the basic steps of instruction processing are faster than in the conventional approach.
Figure 1.5. The processor cache. (Main memory connects over the bus to the cache memory and the processor.)
Processor Clock
Processor circuits are controlled by a timing signal called the clock.
The clock defines regular time intervals called clock cycles.
The execution of each machine instruction is divided into several steps, each of which requires one clock cycle.
Clock rate: the number of cycles per second, measured in Hertz (Hz).
A million cycles per second is called a Megahertz (MHz).
A trillion cycles per second is called a Terahertz (THz).
Basic Performance Equation
T: processor time required to execute a program that has been prepared in a high-level language
N: number of actual machine language instructions needed to complete the execution (note: an instruction inside a loop is counted each time it is executed)
S: average number of basic steps needed to execute one machine instruction. Each step completes in one clock cycle
R: clock rate
Note: these are not independent of each other
$T = \frac{N \times S}{R}$
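A quick numeric check of the basic performance equation, as a minimal Python sketch. The workload figures (10 million instructions, 4 steps each, 500 MHz clock) are invented purely for illustration.

```python
def execution_time(n: int, s: float, r: float) -> float:
    """Basic performance equation T = (N * S) / R, in seconds."""
    return n * s / r

# Hypothetical workload: 10 million instructions, 4 steps each, 500 MHz clock.
t = execution_time(n=10_000_000, s=4, r=500e6)
print(t)   # 0.08

# Halving S (for example through pipelining) halves T for the same N and R.
print(execution_time(n=10_000_000, s=2, r=500e6))   # 0.04
```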

How to improve T?
The objective of the designer is to minimize T, that is, to reduce N and S and to increase R.
Designers have to think about what changes should be made in the hardware design to achieve this objective.
How to minimize S?
Pipeline and Superscalar Operation
An improvement in performance can be achieved by overlapping the execution of successive instructions, which is called pipelining. This reduces the effective value of S.
In superscalar operation, multiple instruction pipelines are implemented in the processor. This reduces S even further than pipelining alone.
Clock Rate (R)
Ways to increase the clock rate:
Improve the integrated-circuit (IC) technology to make the circuits faster
Reduce the amount of processing done in one basic step (however, this may increase the number of basic steps needed)
Increases in R that are entirely caused by improvements in IC technology affect all aspects of the processor's operation equally, except the time to access the main memory.
CISC and RISC
It is much easier to implement efficient pipelining in processors with simple instruction sets.
Reduced Instruction Set Computers (RISC)
Complex Instruction Set Computers (CISC)
Compiler
A compiler translates a high-level language program into a sequence of machine instructions.
To reduce N, we need a suitable machine instruction set and a compiler that makes good use of it.
Goal: reduce N x S
Note: A compiler may not be designed for a specific processor; however, a high-quality compiler is usually designed for, and with, a specific processor.
Performance Measurement
T is difficult to compute.
Measure computer performance using benchmark programs.
The Standard Performance Evaluation Corporation (SPEC) selects and publishes representative application programs for different application domains, together with test results for many commercially available computers.
Compile and run (no simulation)
Reference computer
$\text{SPEC rating} = \frac{\text{Running time on the reference computer}}{\text{Running time on the computer under test}}$
$\text{SPEC rating} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n}$
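The overall SPEC rating is the geometric mean of the per-program ratings. A minimal Python sketch, with invented running times used only as sample inputs:

```python
import math

def spec_rating(reference_time: float, test_time: float) -> float:
    """Rating for one benchmark: reference running time / running time under test."""
    return reference_time / test_time

def overall_spec_rating(ratings):
    """Geometric mean of the n per-program ratings."""
    return math.prod(ratings) ** (1.0 / len(ratings))

# Hypothetical running times in seconds for two benchmark programs.
ratings = [spec_rating(100, 50), spec_rating(80, 10)]   # 2.0 and 8.0
print(overall_spec_rating(ratings))                     # 4.0
```

A rating above 1 means the computer under test ran the program faster than the reference computer.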

Multiprocessors and Multicomputers
Multiprocessor computer
Executes a number of different application tasks in parallel
Executes subtasks of a single large task in parallel
All processors have access to all of the memory: shared-memory multiprocessor
Cost: processors, memory units, complex interconnection networks
Multicomputers
Each computer has access only to its own memory
Computers exchange messages via a communication network: message-passing multicomputers
Data Representation
Data is defined as a collection of facts and figures.
Information is defined as processed data.
Information in computers is usually stored in memory or processor registers.
Processor registers contain either data or control information,
where control information consists of a sequence of bits that send signals for the manipulation of data.
Different types of data reside in computer registers in binary form:
Numbers
Letters
Symbols
Fixed-Point Representation
Introduction:
The conventional way of representing positive and negative numbers is to precede the number with a sign.
Computer hardware, however, can store only 0s and 1s, so the sign must also be represented by a bit.
How are these numbers represented in the registers?
Assume that 8 bits are allocated for a register; then the Most Significant Bit is used to represent the sign.
If it is 0 (zero), it represents a positive number.
If it is 1, it represents a negative number.
In addition to the sign, a number may have a decimal point.
The position of the decimal (binary) point is needed to represent fractions, integers and mixed integer-fraction numbers.
There are two ways of representing this position:
fixed-point representation
floating-point representation
Fixed-Point Representation:
The binary point is always fixed in one position in the register.
It can be either at the extreme left of the register, to represent the stored number as a fraction,
or at the extreme right of the register, to represent the stored number as an integer.
The point itself is not stored; the number is simply assumed to be either a fraction or an integer.
Floating-Point Representation:
It uses a second register to store the number that designates the position of the decimal point in the first register.
Integer Representation
A positive integer is represented with the sign bit (MSB) as 0 and the remaining bits holding the magnitude in binary form.
A negative integer can be represented in one of the following ways:
Signed magnitude
Signed ones' complement
Signed two's complement
Signed Magnitude:
The negative number consists of the magnitude and a negative sign (sign bit 1).
Signed Ones' Complement:
All the bits that represent the positive integer are complemented to represent the negative integer.
Signed Two's Complement:
The bits that represent the positive integer are complemented and then binary 1 is added at the LSB to represent the negative integer.
For example:
Integer Value   Method                    Bit Representation
+10                                       00001010
-10             Signed magnitude          10001010
-10             Signed ones' complement   11110101
-10             Signed two's complement   11110110
Note:
Signed magnitude is used for ordinary arithmetic.
Signed ones' complement was used for arithmetic operations in older systems. At present this approach is mostly used for logical operations.
Signed two's complement is used for representing negative numbers and for arithmetic operations.
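The three encodings in the table can be reproduced with a short Python sketch. The function names are made up for this illustration, and an 8-bit register is assumed as in the table.

```python
def signed_magnitude(value: int, bits: int = 8) -> str:
    sign = "0" if value >= 0 else "1"
    return sign + format(abs(value), f"0{bits - 1}b")

def ones_complement(value: int, bits: int = 8) -> str:
    mask = 2 ** bits - 1
    if value >= 0:
        return format(value, f"0{bits}b")
    return format(~abs(value) & mask, f"0{bits}b")   # flip every bit of +|value|

def twos_complement(value: int, bits: int = 8) -> str:
    mask = 2 ** bits - 1
    # Masking a negative Python int yields its two's complement bit pattern.
    return format(value & mask, f"0{bits}b")

for encode in (signed_magnitude, ones_complement, twos_complement):
    print(encode.__name__, encode(-10))
# signed_magnitude 10001010
# ones_complement 11110101
# twos_complement 11110110
```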
ARITHMETIC ADDITION: SIGNED MAGNITUDE
[1] Compare their signs.
[2] If the two signs are the same, ADD the two magnitudes. Look out for an overflow.
[3] If they are not the same, compare the relative magnitudes of the numbers and then SUBTRACT the smaller from the larger (so a subtractor is needed in order to add).
[4] Determine the sign of the result.
Examples with 6 = 0110 and 9 = 1001 (magnitude result -> sign and magnitude):
6 + 9:        0110 + 1001 = 1111 -> 01111 (+15)
-6 + 9:       1001 - 0110 = 0011 -> 00011 (+3)
6 + (-9):     1001 - 0110 = 0011 -> 10011 (-3)
-6 + (-9):    0110 + 1001 = 1111 -> 11111 (-15)
Overflow: 9 + 9 or (-9) + (-9)
              1001 + 1001 = (1)0010 (overflow)
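The four steps can be written as a small Python sketch. The function name and the `(sign, magnitude, overflow)` return convention are assumptions made for this example, and a result of zero is given a positive sign here.

```python
def add_signed_magnitude(a_sign, a_mag, b_sign, b_mag, bits=4):
    """Signs are 0 (+) or 1 (-); magnitudes are unsigned ints that fit in `bits`.

    Returns (sign, magnitude, overflow).
    """
    limit = 2 ** bits
    if a_sign == b_sign:                       # same signs: add the magnitudes
        total = a_mag + b_mag
        return a_sign, total % limit, total >= limit
    if a_mag == b_mag:                         # opposite signs, equal size: +0
        return 0, 0, False
    if a_mag > b_mag:                          # subtract the smaller from the larger
        return a_sign, a_mag - b_mag, False
    return b_sign, b_mag - a_mag, False

print(add_signed_magnitude(0, 6, 0, 9))   # (0, 15, False)
print(add_signed_magnitude(1, 6, 0, 9))   # (0, 3, False)
print(add_signed_magnitude(0, 9, 0, 9))   # (0, 2, True)
```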
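ARITHMETIC ADDITION: SIGNED 1's COMPLEMENT
Add the two numbers, including their sign bits.
If there is a carry out of the most significant (sign) bit, the result is incremented by 1 and the carry is discarded (end-around carry).
Examples (5-bit numbers, sign bit first):
6 + (-9):     0 0110 + 1 0110 = 1 1100 (-3)
-6 + 9:       1 1001 + 0 1001 = (1) 0 0010; end-around carry: 0 0010 + 1 = 0 0011 (+3)
-9 + (-9):    1 0110 + 1 0110 = (1) 0 1100; end-around carry: 0 1100 + 1 = 0 1101 (overflow)
9 + 9:        0 1001 + 0 1001 = 1 0010 (overflow)
Not overflow: $c_{n-1} \oplus c_n = 0$
Overflow: $c_{n-1} \oplus c_n = 1$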
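The end-around carry rule can be checked with a minimal Python sketch that works directly on 5-bit patterns. The function name is invented, and overflow is detected here by comparing operand and result signs, which is an equivalent test chosen for simplicity.

```python
def add_ones_complement(x: int, y: int, bits: int = 5):
    """Add two ones'-complement bit patterns; return (sum_pattern, overflow)."""
    mask = 2 ** bits - 1
    sign = 2 ** (bits - 1)
    total = x + y
    if total > mask:                   # carry out of the sign bit
        total = (total & mask) + 1     # discard it and add 1 (end-around carry)
    total &= mask
    # Overflow: both operands share a sign and the result has the other sign.
    overflow = (x & sign) == (y & sign) and (total & sign) != (x & sign)
    return total, overflow

result, overflow = add_ones_complement(0b11001, 0b01001)   # -6 + 9
print(format(result, "05b"), overflow)                     # 00011 False
```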
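ARITHMETIC ADDITION: SIGNED 2's COMPLEMENT
Add the two numbers, including their sign bits, and discard any carry out of the leftmost (sign) bit. Look out for an overflow.
Examples (5-bit numbers, sign bit first):
6 + 9:        0 0110 + 0 1001 = 0 1111 (+15)
-6 + 9:       1 1010 + 0 1001 = 0 0011 (+3)
6 + (-9):     0 0110 + 1 0111 = 1 1101 (-3)
-9 + (-9):    1 0111 + 1 0111 = (1) 0 1110 (overflow; the true sum is -18)
9 + 9:        0 1001 + 0 1001 = 1 0010 (overflow; the true sum is +18)
Overflow occurs when the 2 operands have the same sign and the sign of the result changes:
$x_{n-1}\, y_{n-1}\, \overline{s_{n-1}} + \overline{x_{n-1}}\, \overline{y_{n-1}}\, s_{n-1} = c_{n-1} \oplus c_n$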
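A minimal Python sketch of two's complement addition that computes the two carries explicitly and flags overflow when they differ. The function name and the 5-bit default width are assumptions taken from the worked examples.

```python
def add_twos_complement(x: int, y: int, bits: int = 5):
    """Add two two's-complement bit patterns; return (sum_pattern, overflow)."""
    mask = 2 ** bits - 1
    low = 2 ** (bits - 1) - 1                              # mask for the non-sign bits
    c_into_sign = ((x & low) + (y & low)) >> (bits - 1)    # carry into the sign bit
    c_out = (x + y) >> bits                                # carry out of the sign bit
    return (x + y) & mask, bool(c_into_sign ^ c_out)

result, overflow = add_twos_complement(0b11010, 0b01001)   # -6 + 9
print(format(result, "05b"), overflow)                     # 00011 False
result, overflow = add_twos_complement(0b01001, 0b01001)   # 9 + 9
print(format(result, "05b"), overflow)                     # 10010 True
```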
ARITHMETIC SUBTRACTION
Arithmetic subtraction in 2's complement
Take the 2's complement of the subtrahend (including the sign bit) and add it to the minuend (including the sign bit).
$(\pm A) - (-B) = (\pm A) + B$
$(\pm A) - (+B) = (\pm A) + (-B)$
Overflow
When two unsigned numbers are added, overflow is detected from the carry out of the most significant bit.
When two signed numbers are added, overflow can occur only if the two numbers are both positive or both negative; adding numbers with opposite signs never overflows.
Note: In the second case, the sign bit is always treated as part of the number.
Decimal Fixed-Point Representation
To represent a decimal digit in the registers, we require a four-bit decimal code, which uses four flip-flops.
Advantages
Direct operation on the decimal data.
No need to convert from decimal to binary and vice versa.
Disadvantages
The number of flip-flops increases.
The circuit is more complex to build.
The representation of signed decimal numbers in BCD is similar to the representation of signed numbers in binary.
A plus sign is indicated by the decimal digit 0, which is four zeros (0000) in BCD.
A minus sign is indicated by the decimal digit 9, which is 1001 in BCD.
However, for performing arithmetic operations it is desirable to represent a negative number using the 10's complement (which is obtained by taking the 9's complement of each digit and adding 1 to the last digit).
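For example, consider the addition (+375) + (-240) = +135:
+375 is written as 0 375 = 0000 0011 0111 0101
-240 is written as 9 760 = 1001 0111 0110 0000
The sum is 0 135 = 0000 0001 0011 0101 (BCD), after the carry out of the sign digit is discarded.
(9's complement of 240 = 999 - 240 = 759;
10's complement = 9's complement + 1 = 759 + 1 = 760)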
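The same calculation as a minimal Python sketch. The helper names are invented for this example; three magnitude digits plus one sign digit are assumed, as in the slide.

```python
def tens_complement(n: int, digits: int = 3) -> int:
    nines = (10 ** digits - 1) - n        # 9's complement, e.g. 999 - 240 = 759
    return (nines + 1) % 10 ** digits     # 10's complement, e.g. 760

def to_bcd(sign_digit: int, n: int, digits: int = 3) -> str:
    """Write the sign digit and each decimal digit as a 4-bit group."""
    text = str(sign_digit) + str(n).zfill(digits)
    return " ".join(format(int(d), "04b") for d in text)

def add_signed_bcd(a_sign, a, b_sign, b, digits=3):
    """Add two sign-digit + 10's complement numbers, discarding the end carry."""
    scale = 10 ** digits
    total = (a_sign * scale + a + b_sign * scale + b) % (10 * scale)
    return divmod(total, scale)           # (sign digit, magnitude digits)

neg_240 = tens_complement(240)                    # 760
sign, value = add_signed_bcd(0, 375, 9, neg_240)
print(to_bcd(9, neg_240))                         # 1001 0111 0110 0000
print(sign, value, to_bcd(sign, value))           # 0 135 0000 0001 0011 0101
```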
BCD ADDER CIRCUIT
Figure: BCD adder circuit.
Floating-Point Representation
The floating-point representation of a number has two parts:
The first part represents a signed fixed-point number called the mantissa.
The second part, called the exponent, designates the position of the decimal (binary) point.
The fixed-point mantissa may be a fraction or an integer.
For example, the decimal number +6132.789 is represented in floating point with a fraction and an exponent as follows:
Fraction       Exponent
+0.6132789     +04
The value of the exponent indicates the actual position of the decimal point.
Only the mantissa and the exponent are physically represented in the registers. The radix and the radix point are assumed.
Two standard forms of floating-point numbers are from ANSI and IEEE.
According to ANSI, a 32-bit floating-point number in byte format is shown below:
Byte 1      Byte 2      Byte 3     Byte 4
SEEEEEEE    .MMMMMMMM   MMMMMMMM   MMMMMMMM
where S indicates the sign bit, E indicates the exponent, M indicates the mantissa, and . indicates the binary point.
For example:
13 = 1101 = 0.1101 x 2^4
0 0000100 11010000 00000000 00000000
-17 = -10001 = -0.10001 x 2^5
1 0000101 10001000 00000000 00000000
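The bit patterns in these examples can be generated with a short Python sketch. It assumes the layout used by the examples (1 sign bit, a 7-bit exponent, then a 24-bit fractional mantissa) and handles integers only; the function name `encode` is made up, and this is not IEEE 754.

```python
def encode(n: int) -> str:
    """Encode an integer as sign (1 bit) + exponent (7 bits) + mantissa (24 bits).

    Assumes abs(n) fits in 24 bits. The mantissa is the fraction 0.b1b2...,
    so the exponent is the number of significant bits in abs(n).
    """
    sign = "0" if n >= 0 else "1"
    magnitude = abs(n)
    exponent = magnitude.bit_length()                 # 13 = 1101 gives 4
    mantissa = format(magnitude, "b").ljust(24, "0")  # left-aligned fraction bits
    return sign + format(exponent, "07b") + mantissa

print(encode(13))    # 00000100110100000000000000000000
print(encode(-17))   # 10000101100010000000000000000000
```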
ERROR DETECTING CODES
Binary information transmitted through a communication medium can get changed due to external noise.
An error detection code is a mechanism for detecting errors during transmission.
An error correction code is a mechanism for detecting and correcting errors during transmission.
A common error detection method is given below:
Parity System
Simplest method for error detection
One parity bit is attached to the information
Even parity and odd parity
Even Parity
One bit is attached to the information so that the total number of 1 bits is an even number
1011001 0
1010010 1
Odd Parity
One bit is attached to the information so that the total number of 1 bits is an odd number
1011001 1
1010010 0
Parity Bit Generation
For $b_6 b_5 \ldots b_0$ (7-bit information), the even parity bit $b_{even}$ is
$b_{even} = b_6 \oplus b_5 \oplus \cdots \oplus b_0$
For the odd parity bit:
$b_{odd} = b_{even} \oplus 1 = \overline{b_{even}}$
Note: An output of 1 from the parity checker indicates an error. Which count is an error depends on the adopted parity: if odd parity is chosen and the number of ones is even, it indicates an error; likewise, if even parity is chosen and the number of ones is odd, it indicates an error.
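Parity generation and checking reduce to XOR over the bits, as this minimal Python sketch shows. The function names are invented, and bits are passed as lists of 0s and 1s.

```python
from functools import reduce
from operator import xor

def even_parity_bit(bits):
    """XOR of all information bits: 1 when the count of 1s is odd."""
    return reduce(xor, bits, 0)

def odd_parity_bit(bits):
    """Complement of the even parity bit."""
    return even_parity_bit(bits) ^ 1

def check(bits_with_parity, parity="even"):
    """Return 1 (error) when the count of 1s violates the chosen parity."""
    ones_are_odd = reduce(xor, bits_with_parity, 0)
    return ones_are_odd if parity == "even" else ones_are_odd ^ 1

message = [1, 0, 1, 1, 0, 0, 1]
sent = message + [even_parity_bit(message)]
print(sent, check(sent))            # [1, 0, 1, 1, 0, 0, 1, 0] 0
corrupted = [sent[0] ^ 1] + sent[1:]
print(check(corrupted))             # 1
```

A single parity bit detects any odd number of flipped bits, but an even number of flips goes unnoticed.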
PARITY GENERATOR AND PARITY CHECKER FOR EVEN PARITY SYSTEM
Figure: Parity generator circuit (even parity): inputs b6 to b0, output b_even.
Figure: Parity checker: inputs b6 to b0 and b_even, output is the even parity error indicator.
Table showing the original message with its odd and even parity bits
Original message   Odd parity   Even parity
000                1            0
001                0            1
010                0            1
011                1            0
100                0            1
101                1            0
110                1            0
111                0            1
Figure: Circuit showing the parity generator and parity checker for the odd parity system, for the messages in the table above.
MIPS: stands for Millions of Instructions Per Second.
MFLOPS: stands for Millions of Floating-Point Operations Per Second.
These two measures are used to indicate the ability of the processor in performing tasks.
ReviewQuestions
1. Discuss the basic functional units of a computer.(refer slide no.5 to 15)
2. Explain the basic operational concepts of a computer with a block diagram.(refer
slide no.20 to 25)
3. Define Bus. Explain bus structures.(refer slide no.26 to 28)
4. List the functions of system software.(refer slide no.29)
5. Explain Multiprocessors and Multicomputers. (refer slide 30 and 31).
6. How can the performance of a system be improved? Explain the factors that affect the performance. (refer slide no.33 to 41).
7. Explain about Sign Magnitude, Sign ones complement and Sign Twos
complement. (refer slide no.49 to 55).
8. Explain about Fixed Point and Floating Point Representation. (refer slide no.46 to
62).
9. Explain about overflow and decimal point representation. (refer slide no. 56 to
60).
10. Explain error detection codes.(refer slide no.64 to 68).
11. Write short notes on MIPS and MFLOPS. (refer slide no.69).
2/5/2011
BASIC COMPUTER ORGANIZATION AND DESIGN
Instruction Codes
Computer Registers
Instruction Cycle
Memory Reference Instructions
Input Output and Interrupt
Introduction
An instruction code is a group of bits that instructs the computer to perform a specific operation.
A computer instruction is often divided into two parts:
An opcode (operation code) that specifies the operation for that instruction
An address that specifies the registers and/or locations in memory to use for that operation
The symbolic representation of the instruction format is shown below.
Instruction Format
Bit 15: I (addressing mode)
Bits 14-12: Opcode
Bits 11-0: Address
Stored Program Organization
The simplest way to organize a computer is to have one processor register, called the Accumulator, and an instruction code format with two parts.
The first part specifies the operation to be performed and the second part specifies the address.
The memory address tells the control where to find the operand in memory.
The diagram below shows this type of organization.
Figure: Stored program organization. A 4096 x 16 memory holds instructions (programs) in one section and operands (data) in another; each word is 16 bits (15 to 0) and holds either an instruction or a binary operand; the processor register is the AC (Accumulator).
From the above diagram, it is clear that instructions are stored in one section of the memory and data in another section of the memory.
In computers with a single processor register, that register is usually named the Accumulator (AC).
For a memory unit with 4096 words, 12 bits are required to specify the address of a word (2^12 = 4096).
If we store each instruction code in one 16-bit memory word, then of the 16 bits, 4 bits are used to specify the operation and 12 bits are used to specify the address of the operand.
It is sometimes convenient to use the address bits of the instruction code not as an address but as the operand itself. Such an instruction is said to have an immediate operand.
When the second part of the instruction specifies the address of the operand, it is said to have a direct address.
When the second part of the instruction specifies the address of a memory word which in turn contains the address of the actual operand, it is said to have an indirect address.
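Consider the instruction format shown above:
It contains a 3-bit operation code, a 12-bit address and 1 bit designating the addressing mode.
If I = 0, it indicates direct addressing mode.
If I = 1, it indicates indirect addressing mode.
ADDRESSING MODES
Figure: Direct and indirect addressing.
Direct addressing: the instruction at address 22 is "0 ADD 457"; the operand is at address 457 and is added to AC.
Indirect addressing: the instruction at address 35 is "1 ADD 300"; location 300 holds 1350, and the operand at address 1350 is added to AC.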
In the above diagram for direct addressing, the instruction code resides in the memory at address 22. The addressing mode bit is 0, indicating direct addressing mode; the opcode contains ADD, indicating the operation to be performed; and the address part contains 457, which is the address of the operand. The control finds the operand in memory at address 457, adds it to the contents of AC and places the result in AC.
In the above diagram for indirect addressing, the instruction code resides in the memory at address 35. The addressing mode bit is 1, indicating indirect addressing mode; the opcode contains ADD, indicating the operation to be performed; and the address part contains 300, which is the address of a memory location that holds the address of the operand (1350). The control reads that address from location 300, finds the operand in memory at address 1350, adds it to the contents of AC and places the result in AC.
Effective Address (EA)
The address that can be used directly, without modification, to access an operand.
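The two addressing modes differ only in how the effective address is obtained, which a minimal Python sketch makes concrete. The addresses 457, 300 and 1350 come from the example; the operand values 10 and 20, the initial AC value and the function names are assumptions.

```python
def effective_address(memory, i_bit, address):
    """Direct (I = 0): the address field itself. Indirect (I = 1): one extra memory read."""
    return memory[address] if i_bit else address

def execute_add(memory, ac, i_bit, address):
    """ADD: fetch the operand at the effective address and add it to AC."""
    ea = effective_address(memory, i_bit, address)
    return ac + memory[ea]

# Operand values 10 and 20 are made up; location 300 holds the pointer 1350.
memory = {457: 10, 300: 1350, 1350: 20}
print(effective_address(memory, 0, 457), execute_add(memory, 5, 0, 457))   # 457 15
print(effective_address(memory, 1, 300), execute_add(memory, 5, 1, 300))   # 1350 25
```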
COMPUTER/PROCESSOR REGISTERS
A processor has many registers to hold instructions, addresses, data, etc.
The processor has a register named the Program Counter (PC) that holds the memory address of the next instruction to fetch.
In direct or indirect addressing, the processor needs to keep track of which location in memory it is addressing. For this it uses a register named the Address Register (AR). This register uses 12 bits to refer to a location in memory.
When an operand is found, using either direct or indirect addressing, it is placed in the Data Register (DR). The processor then uses this value as data for its operation.
The Basic Computer has a single general-purpose register called the Accumulator (AC).
The significance of a general-purpose register is that it can be referred to in instructions,
e.g. load AC with the contents of a specific memory location; store the contents of AC into a specified memory location.
Often a processor will need a scratch register to store intermediate results or other temporary data; in the Basic Computer this is the Temporary Register (TR).
The Basic Computer uses a very simple model of input/output (I/O) operations:
Input devices are considered to send 8 bits of character data to the processor.
The processor can send 8 bits of character data to output devices.
The Input Register (INPR) holds an 8-bit character received from an input device.
The Output Register (OUTR) holds an 8-bit character to be sent to an output device.
BASIC COMPUTER REGISTERS
Registers in the Basic Computer
Figure: Registers of the Basic Computer. The CPU holds PC and AR (bits 11-0), IR, TR, DR and AC (bits 15-0), and INPR and OUTR (bits 7-0); the memory is 4096 x 16.
List of BC Registers
Register   Bits   Name                   Function
DR         16     Data Register          Holds memory operand
AR         12     Address Register       Holds address for memory
AC         16     Accumulator            Processor register
IR         16     Instruction Register   Holds instruction code
PC         12     Program Counter        Holds address of instruction
TR         16     Temporary Register     Holds temporary data
INPR       8      Input Register         Holds input character
OUTR       8      Output Register        Holds output character
Common Bus System
The Basic Computer has 8 registers, a memory unit and a control unit.
Paths must be provided between the registers, the memory unit and the control unit for the transfer of information.
The number of wires becomes excessive if separate connections are made from the output of each register, the memory and the control unit to the input of every other unit.
An efficient way of transferring information is to use a common bus.
The connection of the registers and memory of the Basic Computer to a common bus is shown below:
COMMON BUS SYSTEM
The outputs of seven registers and the memory unit are connected to the common bus.
The specific output placed on the bus is selected by the binary value of the selection variables s2, s1 and s0.
The number along each output line shows the decimal equivalent of the binary selection.
For example, the number along the output of DR is 3, so the contents of DR are transferred to the common bus when s2, s1 and s0 take the binary values 0, 1 and 1.
The lines from the common bus are connected to the inputs of the registers and the memory unit.
The register whose LD input is enabled receives the data from the bus.
The memory unit receives the data from the bus when its Write input is enabled.
The memory unit places its data on the bus when its Read input is enabled and s2, s1 and s0 take the binary values 1, 1 and 1.
Four registers, DR, AC, IR and TR, have 16 bits each.
Two registers, AR and PC, have 12 bits each. When their 12 bits are applied to the 16-bit common bus, the four most significant bits are set to zero; likewise, when AR or PC receives data from the bus, only the 12 least significant bits are transferred.
The INPR and OUTR have 8 bits each and communicate with the least significant 8 bits of the 16-bit common bus.
Five registers have three control inputs, namely LD (load), INR (increment) and CLR (clear).
The ALU receives its inputs from DR, AC and INPR.
A flip-flop E connected to the ALU stores the extended carry out of the accumulator that results from an addition operation.
The content of any register can be applied onto the bus and an operation can be performed in the adder and logic circuit during the same clock cycle.
COMPUTER INSTRUCTIONS
The Basic Computer supports three instruction code formats, as shown below:
Memory-Reference Instructions (opcode = 000 to 110)
  Bit 15: I | Bits 14-12: Opcode | Bits 11-0: Address
Register-Reference Instructions (opcode = 111, I = 0)
  Bits 15-12: 0111 | Bits 11-0: Register operation
Input-Output Instructions (opcode = 111, I = 1)
  Bits 15-12: 1111 | Bits 11-0: I/O operation
BASIC COMPUTER INSTRUCTIONS
         Hex Code
Symbol   I = 0   I = 1   Description
AND      0xxx    8xxx    AND memory word to AC
ADD      1xxx    9xxx    Add memory word to AC
LDA      2xxx    Axxx    Load AC from memory
STA      3xxx    Bxxx    Store content of AC into memory
BUN      4xxx    Cxxx    Branch unconditionally
BSA      5xxx    Dxxx    Branch and save return address
ISZ      6xxx    Exxx    Increment and skip if zero
CLA      7800            Clear AC
CLE      7400            Clear E
CMA      7200            Complement AC
CME      7100            Complement E
CIR      7080            Circulate right AC and E
CIL      7040            Circulate left AC and E
INC      7020            Increment AC
SPA      7010            Skip next instr. if AC is positive
SNA      7008            Skip next instr. if AC is negative
SZA      7004            Skip next instr. if AC is zero
SZE      7002            Skip next instr. if E is zero
HLT      7001            Halt computer
INP      F800            Input character to AC
OUT      F400            Output character from AC
SKI      F200            Skip on input flag
SKO      F100            Skip on output flag
ION      F080            Interrupt on
IOF      F040            Interrupt off
Instruction Cycle
Each instruction cycle consists of the following phases:
1. Fetch an instruction from memory
2. Decode the instruction
3. Read the effective address from memory if the instruction has an
   indirect address
4. Execute the instruction
Instruction Cycle: Fetch and Decode
- The program counter (PC) is loaded with the address of the first
  instruction in the program.
- The sequence counter (SC) is cleared to zero.
- After each clock pulse, SC is incremented by one, so that the timing
  signals go through a sequence T0, T1, T2 and so on.
- The microoperations for the fetch and decode phases can be specified by
  the following register transfer statements.
FETCH and DECODE
T0: AR ← PC (S2S1S0 = 010, T0 = 1)
T1: IR ← M[AR], PC ← PC + 1 (S2S1S0 = 111, T1 = 1)
T2: D0, ..., D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)
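The three timing steps can be traced in software. The following is only a minimal sketch, assuming memory is modelled as a Python dict of 16-bit words; the function name `fetch_decode` and the returned dictionary layout are illustrative choices, not part of the Basic Computer definition.

```python
def fetch_decode(memory, pc):
    """Simulate timing steps T0-T2 of the fetch and decode phase."""
    ar = pc & 0xFFF                  # T0: AR <- PC
    ir = memory[ar] & 0xFFFF         # T1: IR <- M[AR]
    pc = (pc + 1) & 0xFFF            #     PC <- PC + 1
    opcode = (ir >> 12) & 0b111      # T2: decode IR(12-14) into D0..D7
    d = [int(i == opcode) for i in range(8)]
    ar = ir & 0xFFF                  #     AR <- IR(0-11)
    i_bit = (ir >> 15) & 1           #     I  <- IR(15)
    return {"PC": pc, "AR": ar, "IR": ir, "I": i_bit, "D": d}


# Hypothetical memory image: the word 9123 (hex) is ADD with I = 1.
state = fetch_decode({0x010: 0x9123}, 0x010)
print(state)
```

In this example the decoder output D1 is active (opcode 001, ADD), I is 1 and AR holds 123 (hex), so the next phase would read the effective address from memory.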
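[Figure: register transfers for the fetch phase. AR, PC, IR and the memory unit (Address, Read) are connected to the common bus; T0 and T1 drive the bus selection inputs S2, S1, S0 and the LD and INR inputs of the registers.]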
FLOWCHART FOR INSTRUCTION CYCLE
The input device transfers a character serially through an interface to
the INPR register, and these bits are then transferred in parallel to the
accumulator. This transfer takes place only if the FGI flag is set to 1;
when FGI is 0, no data transfer takes place.
The data in the accumulator is transferred in parallel to the OUTR
register, and the contents of this register are transferred serially to
the output device through an interface. This transfer takes place only if
the FGO flag is set to 1; when FGO is 0, no data transfer takes place.
The programmed control transfer from input device to accumulator and
from accumulator to output device is shown in the next slide.
STACK ORGANIZATION
Stack
- Very useful feature, supported in most computers, for nested
  subroutines and nested interrupt services
- Also efficient for arithmetic expression evaluation
- Storage which can be accessed in LIFO (last-in, first-out) order
- Pointer: SP
- Only PUSH and POP operations are applicable
Register Stack
[Figure: 64-word register stack with addresses 0 to 63 holding items A, B and C; a 6-bit stack pointer SP; flags FULL and EMPTY; data register DR.]
Push, Pop operations
/* Initially, SP = 0, EMPTY = 1, FULL = 0 */
PUSH                             POP
SP ← SP + 1                      DR ← M[SP]
M[SP] ← DR                       SP ← SP - 1
If (SP = 0) then (FULL ← 1)      If (SP = 0) then (EMPTY ← 1)
EMPTY ← 0                        FULL ← 0
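The PUSH and POP sequences above can be sketched in Python. This is an illustrative model only: the class name `RegisterStack` is hypothetical, and the explicit exceptions on a full or empty stack are an added assumption (the slide only defines the flags). The modulo arithmetic stands in for the 6-bit SP wrapping from 63 back to 0.

```python
class RegisterStack:
    """64-word register stack with a 6-bit SP and FULL/EMPTY flags."""

    def __init__(self, size=64):
        self.size = size
        self.mem = [0] * size
        self.sp = 0
        self.empty = 1
        self.full = 0

    def push(self, dr):
        if self.full:
            raise OverflowError("stack is full")
        self.sp = (self.sp + 1) % self.size   # SP <- SP + 1 (wraps after 63)
        self.mem[self.sp] = dr                # M[SP] <- DR
        if self.sp == 0:                      # SP wrapped: last free word used
            self.full = 1
        self.empty = 0

    def pop(self):
        if self.empty:
            raise IndexError("stack is empty")
        dr = self.mem[self.sp]                # DR <- M[SP]
        self.sp = (self.sp - 1) % self.size   # SP <- SP - 1
        if self.sp == 0:                      # back at the initial position
            self.empty = 1
        self.full = 0
        return dr


stack = RegisterStack()
for item in ("A", "B", "C"):
    stack.push(item)
top = stack.pop()
print(top, stack.sp)
```

Because SP starts at 0 and is incremented before the write, the first item lands in word 1 and word 0 is the last word to be filled, which is why SP = 0 after a push signals FULL.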
STACK ORGANIZATION: Memory Stack
- A portion of memory is used as a stack, with a processor register as
  the stack pointer
[Figure: memory with program (instructions), data (operands) and stack segments, addressed by PC, AR and SP respectively; addresses shown are 1000, 3000 and 3997 to 4001; an arrow marks the direction in which the stack grows; DR exchanges data with the stack.]
PUSH: SP ← SP - 1
      M[SP] ← DR
POP:  DR ← M[SP]
      SP ← SP + 1
- Most computers do not provide hardware to check for stack overflow
  (full stack) or underflow (empty stack); the check must be done in
  software
REVERSE POLISH NOTATION
Arithmetic Expressions: A + B
A + B    Infix notation
+ A B    Prefix or Polish notation
A B +    Postfix or reverse Polish notation
The reverse Polish notation is very suitable for stack manipulation
Evaluation of Arithmetic Expressions
Any arithmetic expression can be expressed in parenthesis-free
Polish notation, including reverse Polish notation
(3 * 4) + (5 * 6)  →  3 4 * 5 6 * +
Stack contents after each symbol is scanned (top of stack on the right):
Symbol:   3     4       *      5        6          *         +
Stack:    3     3 4     12     12 5     12 5 6     12 30     42
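The stack evaluation of a reverse Polish expression can be sketched as follows. This is a minimal illustration, assuming space-separated tokens, integer operands and only the operators +, - and *; the function name `eval_rpn` is hypothetical.

```python
def eval_rpn(expression):
    """Evaluate a reverse Polish expression; return (result, stack trace)."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack, trace = [], []
    for token in expression.split():
        if token in ops:
            b = stack.pop()              # top of stack is the right operand
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(int(token))     # operands are pushed as they appear
        trace.append(list(stack))        # snapshot after each symbol
    return stack.pop(), trace


result, trace = eval_rpn("3 4 * 5 6 * +")
print(result)   # 42
for snapshot in trace:
    print(snapshot)
```

The recorded snapshots reproduce the stack contents shown for 3 4 * 5 6 * +.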
PROCESSOR ORGANIZATION
In general, most processors are organized in one of 3 ways
- Single register (Accumulator) organization
  Basic Computer is a good example
  Accumulator is the only general purpose register
- General register organization
  Used by most modern computer processors
  Any of the registers can be used as the source or destination for
  computer operations
- Stack organization
  All operations are done using the hardware stack
  For example, an OR instruction will pop the two top elements from the
  stack, do a logical OR on them, and push the result on the stack
INSTRUCTION FORMAT
Instruction Fields
OP-code field - specifies the operation to be performed
Address field - designates memory address(es) or processor register(s)
Mode field - determines how the address field is to be interpreted (to
             get the effective address or the operand)
The number of address fields in the instruction format
depends on the internal organization of the CPU
The three most common CPU organizations:
Single accumulator organization:
  ADD X            /* AC ← AC + M[X] */
General register organization:
  ADD R1, R2, R3   /* R1 ← R2 + R3 */
  ADD R1, R2       /* R1 ← R1 + R2 */
  MOV R1, R2       /* R1 ← R2 */
  ADD R1, X        /* R1 ← R1 + M[X] */
Stack organization:
  PUSH X           /* TOS ← M[X] */
  ADD
THREE- AND TWO-ADDRESS INSTRUCTIONS
Three-Address Instructions
Program to evaluate X = (A + B) * (C + D):
  ADD R1, A, B     /* R1 ← M[A] + M[B] */
  ADD R2, C, D     /* R2 ← M[C] + M[D] */
  MUL X, R1, R2    /* M[X] ← R1 * R2 */
- Results in short programs
- Instruction becomes long (many bits)
Two-Address Instructions
  MOV R1, A        /* R1 ← M[A] */
  ADD R1, B        /* R1 ← R1 + M[B] */
  MOV R2, C        /* R2 ← M[C] */
  ADD R2, D        /* R2 ← R2 + M[D] */
  MUL R1, R2       /* R1 ← R1 * R2 */
  MOV X, R1        /* M[X] ← R1 */
ONE- AND ZERO-ADDRESS INSTRUCTIONS
One-Address Instructions
- Use an implied AC register for all data manipulation
  LOAD A       /* AC ← M[A] */
  ADD B        /* AC ← AC + M[B] */
  STORE T      /* M[T] ← AC */
  LOAD C       /* AC ← M[C] */
  ADD D        /* AC ← AC + M[D] */
  MUL T        /* AC ← AC * M[T] */
  STORE X      /* M[X] ← AC */
Zero-Address Instructions
- Can be found in a stack-organized computer
  PUSH A       /* TOS ← A */
  PUSH B       /* TOS ← B */
  ADD          /* TOS ← (A + B) */
  PUSH C       /* TOS ← C */
  PUSH D       /* TOS ← D */
  ADD          /* TOS ← (C + D) */
  MUL          /* TOS ← (C + D) * (A + B) */
  POP X        /* M[X] ← TOS */
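The zero-address program can be run on a toy stack machine. The sketch below is illustrative only: the interpreter `run_stack_program`, the dict-based memory and the sample operand values are assumptions made for the example.

```python
def run_stack_program(program, memory):
    """Execute zero-address instructions on a stack; memory maps names to values."""
    stack = []
    for line in program:
        op, *operand = line.split()
        if op == "PUSH":
            stack.append(memory[operand[0]])   # TOS <- M[operand]
        elif op == "POP":
            memory[operand[0]] = stack.pop()   # M[operand] <- TOS
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown instruction: " + op)
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
memory = run_stack_program(program, {"A": 2, "B": 3, "C": 4, "D": 5})
print(memory["X"])   # 45
```

ADD and MUL carry no address at all: their operands are implied to be the two top elements of the stack, which is what makes these instructions zero-address.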
ADDRESSING MODES
Addressing Modes
* Specifies a rule for interpreting or modifying the
  address field of the instruction (before the operand
  is actually referenced)
* Variety of addressing modes
  - to give programming flexibility to the user
  - to use the bits in the address field of the
    instruction efficiently
TYPES OF ADDRESSING MODES
Implied Mode
- Address of the operands is specified implicitly
  in the definition of the instruction
- No need to specify address in the instruction
- EA = AC, or EA = Stack[SP]
- Examples from Basic Computer: CLA, CME, INP
Immediate Mode
- Instead of specifying the address of the operand,
  the operand itself is specified
- However, the operand itself needs to be specified
- Sometimes requires more bits than the address
- Fast to acquire an operand
Register Mode
- Address specified in the instruction is the register address
- Designated operand needs to be in a register
- Shorter address than the memory address
- Saves address field bits in the instruction
- Faster to acquire an operand than with memory addressing
- EA = IR(R) (IR(R): Register field of IR)
Register Indirect Mode
- Instruction specifies a register which contains
  the memory address of the operand
- Saves instruction bits, since a register address
  is shorter than a memory address
- Slower to acquire an operand than both
  register addressing and memory addressing
- EA = [IR(R)] ([x]: Content of x)
Autoincrement or Autodecrement Mode
- When the address in the register is used to access memory, the value in
  the register is incremented or decremented by 1 automatically
Direct Address Mode
- Instruction specifies the memory address which
  can be used directly to access the memory
- Faster than the other memory addressing modes
- Too many bits are needed to specify the address
  for a large physical memory space
- EA = IR(addr) (IR(addr): address field of IR)
Indirect Addressing Mode
- The address field of an instruction specifies the address of a memory
  location that contains the address of the operand
- When the abbreviated address is used, a large physical memory can be
  addressed with a relatively small number of bits
- Slow to acquire an operand because of an additional memory access
- EA = M[IR(address)]
Relative Addressing Modes
- The address field of an instruction specifies part of the address
  (abbreviated address), which is used along with a designated
  register to calculate the address of the operand
- Address field of the instruction is short
- Large physical memory can be accessed with a small number of
  address bits
- EA = f(IR(address), R), R is sometimes implied
3 different Relative Addressing Modes depending on R:
* PC Relative Addressing Mode (R = PC)
  EA = PC + IR(address)
* Indexed Addressing Mode (R = IX, where IX: Index Register)
  EA = IX + IR(address)
* Base Register Addressing Mode
  (R = BAR, where BAR: Base Address Register)
  EA = BAR + IR(address)
ADDRESSING MODES - EXAMPLES
PC = 200, R1 = 400, XR = 100
Memory
Address   Content
200       Load to AC | Mode
201       Address = 500
202       Next instruction
399       450
400       700
500       800
600       900
702       325
800       300
Addressing Mode      Effective Address   Register Transfer        Content of AC
Direct address       500                 /* AC ← (500) */         800
Immediate operand    -                   /* AC ← 500 */           500
Indirect address     800                 /* AC ← ((500)) */       300
Relative address     702                 /* AC ← (PC + 500) */    325
Indexed address      600                 /* AC ← (XR + 500) */    900
Register             -                   /* AC ← R1 */            400
Register indirect    400                 /* AC ← (R1) */          700
Autoincrement        400                 /* AC ← (R1)+ */         700
Autodecrement        399                 /* AC ← -(R1) */         450
For the relative mode, PC has already advanced to 202 (the next instruction)
when the address is computed, so EA = 202 + 500 = 702.
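The numerical example can be checked with a short script. This is a sketch under stated assumptions: the memory image and register values are taken from the example, PC is assumed to hold 202 (the address of the next instruction) when the relative address is formed, and the function name `load_ac` and the mode strings are hypothetical.

```python
MEMORY = {399: 450, 400: 700, 500: 800, 600: 900, 702: 325, 800: 300}
PC, R1, XR, ADDRESS = 202, 400, 100, 500   # PC already points past the instruction


def load_ac(mode):
    """Return (effective address or None, value loaded into AC)."""
    if mode == "immediate":
        return None, ADDRESS               # operand is the address field itself
    if mode == "register":
        return None, R1                    # operand is in the register
    if mode == "direct":
        ea = ADDRESS
    elif mode == "indirect":
        ea = MEMORY[ADDRESS]               # extra memory access for the address
    elif mode == "relative":
        ea = PC + ADDRESS
    elif mode == "indexed":
        ea = XR + ADDRESS
    elif mode in ("register indirect", "autoincrement"):
        ea = R1                            # autoincrement bumps R1 afterwards
    elif mode == "autodecrement":
        ea = R1 - 1                        # R1 is decremented before the access
    else:
        raise ValueError("unknown mode: " + mode)
    return ea, MEMORY[ea]


for mode in ("direct", "immediate", "indirect", "relative", "indexed",
             "register", "register indirect", "autoincrement", "autodecrement"):
    print(mode, load_ac(mode))
```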
DATA TRANSFER INSTRUCTIONS
Typical Data Transfer Instructions
Name       Mnemonic
Load       LD
Store      ST
Move       MOV
Exchange   XCH
Input      IN
Output     OUT
Push       PUSH
Pop        POP
Data Transfer Instructions with Different Addressing Modes
Mode                Assembly Convention   Register Transfer
Direct address      LD ADR                AC ← M[ADR]
Indirect address    LD @ADR               AC ← M[M[ADR]]
Relative address    LD $ADR               AC ← M[PC + ADR]
Immediate operand   LD #NBR               AC ← NBR
Index addressing    LD ADR(X)             AC ← M[ADR + XR]
Register            LD R1                 AC ← R1
Register indirect   LD (R1)               AC ← M[R1]
Autoincrement       LD (R1)+              AC ← M[R1], R1 ← R1 + 1
Autodecrement       LD -(R1)              R1 ← R1 - 1, AC ← M[R1]
DATA MANIPULATION INSTRUCTIONS
Three Basic Types:
- Arithmetic instructions
- Logical and bit manipulation instructions
- Shift instructions
Arithmetic Instructions
Name                        Mnemonic
Increment                   INC
Decrement                   DEC
Add                         ADD
Subtract                    SUB
Multiply                    MUL
Divide                      DIV
Add with Carry              ADDC
Subtract with Borrow        SUBB
Negate (2's Complement)     NEG
Logical and Bit Manipulation Instructions
Name                        Mnemonic
Clear                       CLR
Complement                  COM
AND                         AND
OR                          OR
Exclusive-OR                XOR
Clear carry                 CLRC
Set carry                   SETC
Complement carry            COMC
Enable interrupt            EI
Disable interrupt           DI
Shift Instructions
Name                        Mnemonic
Logical shift right         SHR
Logical shift left          SHL
Arithmetic shift right      SHRA
Arithmetic shift left       SHLA
Rotate right                ROR
Rotate left                 ROL
Rotate right thru carry     RORC
Rotate left thru carry      ROLC
FLAGS, PROCESSOR STATUS WORD
- In the Basic Computer, the processor had several (status) flags, 1-bit
  values that indicated various information about the processor's state:
  E, FGI, FGO, I, IEN, R
- In some processors, flags like these are often combined into a register,
  the processor status register (PSR), sometimes called a processor status
  word (PSW)
- Common flags in the PSW are
  C (Carry): Set to 1 if the carry out of the ALU is 1
  S (Sign): The MSB of the ALU's output
  Z (Zero): Set to 1 if the ALU's output is all 0's
  V (Overflow): Set to 1 if there is an overflow
Status Flag Circuit
[Figure: 8-bit ALU with 8-bit inputs A and B and output F (F7 to F0). C is the carry out c8, V is formed from the carries c7 and c8, S is F7, and Z comes from a check for zero output.]
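The four status bits can be computed for an 8-bit addition as follows. This is a sketch, not the circuit itself: the function name `add8_flags` is hypothetical, and V is taken as the exclusive-OR of the carries c7 and c8, which is the usual overflow test for 2's complement addition.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (F, flags) with flags C, S, Z and V."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    f = total & 0xFF                              # 8-bit ALU output F
    c8 = (total >> 8) & 1                         # carry out of bit 7
    c7 = (((a & 0x7F) + (b & 0x7F)) >> 7) & 1     # carry into bit 7
    flags = {
        "C": c8,
        "S": (f >> 7) & 1,                        # sign = F7
        "Z": int(f == 0),                         # all output bits are 0
        "V": c7 ^ c8,                             # overflow
    }
    return f, flags


print(add8_flags(0x70, 0x70))   # positive + positive overflows into the sign bit
print(add8_flags(0xFF, 0x01))   # -1 + 1 = 0 with a carry out and no overflow
```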
PROGRAM CONTROL INSTRUCTIONS
The next value of PC comes from one of two sources:
- PC + 1: In-Line Sequencing (next instruction is fetched from the
  next adjacent location in the memory)
- Address from another source (current instruction, stack, etc.):
  Branch, Conditional Branch, Subroutine, etc.
Program Control Instructions
Name               Mnemonic
Branch             BR
Jump               JMP
Skip               SKP
Call               CALL
Return             RTN
Compare (by -)     CMP
Test (by AND)      TST
* CMP and TST instructions do not retain the results of their
  operations (- and AND, respectively).
  They only set or clear certain flags.
CONDITIONAL BRANCH INSTRUCTIONS
Mnemonic   Branch condition              Tested condition
BZ         Branch if zero                Z = 1
BNZ        Branch if not zero            Z = 0
BC         Branch if carry               C = 1
BNC        Branch if no carry            C = 0
BP         Branch if plus                S = 0
BM         Branch if minus               S = 1
BV         Branch if overflow            V = 1
BNV        Branch if no overflow         V = 0
Unsigned compare conditions (A - B)
BHI        Branch if higher              A > B
BHE        Branch if higher or equal     A ≥ B
BLO        Branch if lower               A < B
BLOE       Branch if lower or equal      A ≤ B
BE         Branch if equal               A = B
BNE        Branch if not equal           A ≠ B
Signed compare conditions (A - B)
BGT        Branch if greater than        A > B
BGE        Branch if greater or equal    A ≥ B
BLT        Branch if less than           A < B
BLE        Branch if less or equal       A ≤ B
BE         Branch if equal               A = B
BNE        Branch if not equal           A ≠ B
SUBROUTINE CALL AND RETURN
Subroutine Call (other names for the instruction)
- Call subroutine
- Jump to subroutine
- Branch to subroutine
- Branch and save return address
Two Most Important Operations are Implied:
* Branch to the beginning of the subroutine
  - Same as the Branch or Conditional Branch
* Save the return address, to get the address
  of the location in the calling program upon
  exit from the subroutine
Locations for storing the Return Address
- Fixed location in the subroutine (memory)
- Fixed location in memory
- In a processor register
- In a memory stack (most efficient way)
CALL: SP ← SP - 1
      M[SP] ← PC
      PC ← EA
RTN:  PC ← M[SP]
      SP ← SP + 1
PROGRAM INTERRUPT
Types of Interrupts
External interrupts
- External interrupts are initiated from outside the CPU and memory
  - I/O device: data transfer request or data transfer complete
  - Timing device: timeout
  - Power failure
  - Operator
Internal interrupts (traps)
- Internal interrupts are caused by the currently running program
  - Register, stack overflow
  - Divide by zero
  - OP-code violation
  - Protection violation
Software interrupts
- Both external and internal interrupts are initiated by the computer hardware.
  Software interrupts are initiated by executing an instruction.
  - Supervisor call: switches from user mode to supervisor mode;
    allows execution of a certain class of operations
    which are not allowed in user mode
INTERRUPT PROCEDURE
Interrupt Procedure and Subroutine Call
- The interrupt is usually initiated by an internal or
  an external signal rather than from the execution of
  an instruction (except for the software interrupt)
- The address of the interrupt service program is
  determined by the hardware rather than from the
  address field of an instruction
- An interrupt procedure usually stores all the
  information necessary to define the state of the CPU
  rather than storing only the PC.
The state of the CPU is determined from:
- Content of the PC
- Content of all processor registers
- Content of status bits
There are many ways of saving the CPU state,
depending on the CPU architecture
RISC/CISC CHARACTERISTICS
RISC (Reduced Instruction Set Computer)   CISC (Complex Instruction Set Computer)
Supports fewer instructions               Supports a large number of instructions
Supports few addressing modes             Supports more addressing modes
Fixed-length instruction format           Variable-length instruction format
Access limited to registers               Access to memory and registers
Can be applied for simple tasks           Can be applied for complex tasks
Review Questions
1. Explain the processor registers of the Basic Computer. (Refer Slide No. 10, 11
   and 12).
2. Explain the basic instruction cycle. What are the different phases in it?
   Draw the flowchart for the instruction cycle. (Refer Slide No. 19, 20 and 22).
3. Briefly discuss memory reference instructions. (Refer Slide No. 24, 25 and
   26).
4. Explain the organization of processor registers in a common bus system
   with a diagram. (Refer Slide No. 13, 14, 15, 16).
5. Explain stack organization in detail. (Refer Slide No. 36, 37 and 38).
6. Explain the different instruction formats. (Refer Slide No. 40, 41, 42).
7. Explain the different addressing modes with a numerical example. (Refer
   Slide No. 49 to 54).
8. Explain data transfer and manipulation instructions. (Refer Slide No. 55 to
   58).
9. What is an interrupt? Explain the types of interrupts. (Refer Slide No. 61
   and 62).
10. Write short notes on RISC characteristics. (Refer Slide No. 63).
MICROPROGRAMMED CONTROL
Control Memory
Sequencing Microinstructions
Microprogram Example
Design of Control Unit
Microinstruction Format
Introduction
The control unit generates the signals that
initiate the sequence of microoperations needed
to execute an instruction.
There are two approaches to generating these
signals:
Hardwired control
Microprogrammed control
TERMINOLOGY
Microprogram
- Program stored in memory that generates all the control signals required
to execute the instruction set correctly
- Consists of microinstructions
Microinstruction
- Contains a control word and a sequencing word
Control Word - All the control information required for one clock cycle
Sequencing Word - Information needed to decide the next microinstruction
address
Control Memory(Control Storage: CS)
- Storage in the micro programmed control unit to store the micro program
Writeable Control Memory(Writeable Control Storage:WCS)
- CS whose contents can be modified
-> Allows the microprogram to be changed
-> Instruction set can be changed or modified
Dynamic Microprogramming
- Computer system whose control unit is implemented with a micro program
in WCS
- Microprogram can be changed by a systems programmer or a user
TERMINOLOGY
Sequencer (Microprogram Sequencer)
A Microprogram Control Unit that determines the Microinstruction
Address to be executed in the next clock cycle
- In-line Sequencing
- Branch
- Conditional Branch
- Subroutine
- Loop
- Instruction OP-code mapping
COMPARISON OF CONTROL UNIT IMPLEMENTATIONS
Control Unit Implementation
Combinational Logic Circuits (Hard-wired)
[Figure: hard-wired control unit. The control data (IR, status F/Fs from memory and the CPU) and the control unit's state (timing state, instruction cycle state) feed combinational logic circuits, which drive the control points of the CPU.]
Microprogram
[Figure: microprogrammed control unit. The control data (IR, status F/Fs) feeds the next address generation logic, which loads CSAR; control storage (microprogram memory) is read into CSDR, which drives the control points (CPs) of the CPU.]
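MICROINSTRUCTION SEQUENCING
[Figure: selection of the address for control memory. The instruction code passes through mapping logic; multiplexers choose among the mapped address, the branch address, the subroutine register (SBR) and the incrementer output to load the control address register (CAR); branch logic uses the status bits and the MUX select field to select a status bit; the control memory (ROM) outputs the microoperations.]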
Sequencing Capabilities Required in a Control Storage
- Incrementing of the control address register
- Unconditional and conditional branches
- A mapping process from the bits of the machine
instruction to an address for control memory
- A facility for subroutine call and return
CONDITIONAL BRANCH
[Figure: a MUX, driven by the condition select field, picks one of the status (condition) bits; its output decides whether the control address register loads the next address field from control memory or is incremented.]
Conditional Branch
If Condition is true, then Branch (address from
the next address field of the current microinstruction)
else Fall Through
Conditions to Test: O (overflow), N (negative), Z (zero), C (carry), etc.
Unconditional Branch
Fixing the value of one status bit at the input of the multiplexer to 1
MAPPING OF INSTRUCTIONS
Direct Mapping
OP-code   Instruction   Control Storage Address   Contents
0000      ADD           0000                      ADD Routine
0001      AND           0001                      AND Routine
0010      LDA           0010                      LDA Routine
0011      STA           0011                      STA Routine
0100      BUN           0100                      BUN Routine
Mapping Bits: 10 xxxx 010
Control Storage Address   Contents
10 0000 010               ADD Routine
10 0001 010               AND Routine
10 0010 010               LDA Routine
10 0011 010               STA Routine
10 0100 010               BUN Routine
MAPPING OF INSTRUCTIONS TO MICROROUTINES
Mapping from the OP-code of an instruction to the
address of the microinstruction which is the starting
microinstruction of its execution microprogram
Machine instruction:        OP-code 1 0 1 1 | Address
Mapping bits:               0 x x x x 0 0
Microinstruction address:   0 1 0 1 1 0 0
Mapping function implemented by ROM or PLA
[Figure: the OP-code of the machine instruction addresses a mapping memory (ROM or PLA), whose output is the address applied to the control memory.]
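The mapping with bits 0 x x x x 0 0 amounts to shifting the 4-bit OP-code left by two positions. A minimal sketch follows; the function name `map_opcode` is hypothetical.

```python
def map_opcode(opcode):
    """Map a 4-bit OP-code xxxx to the 7-bit control memory address 0xxxx00."""
    if not 0 <= opcode <= 0b1111:
        raise ValueError("OP-code must fit in 4 bits")
    return opcode << 2          # leading 0 and two trailing 0s come for free


address = map_opcode(0b1011)
print(format(address, "07b"))   # 0101100
```

Leaving the two low-order bits free gives each routine four consecutive microinstruction words before the next routine's entry point.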
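MICROPROGRAM EXAMPLE
Computer Configuration
[Figure: computer configuration. Memory 2048 x 16, addressed by AR; AR and PC (bits 10-0) and DR (bits 15-0) are loaded through multiplexers; the arithmetic logic and shift unit feeds AC (bits 15-0); the control unit contains a 128 x 20 control memory with SBR and CAR (bits 6-0).]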
MACHINE INSTRUCTION FORMAT
Machine instruction format
  bit 15: I | bits 14-11: Opcode | bits 10-0: Address
Sample machine instructions
Symbol     OP-code   Description
ADD        0000      AC ← AC + M[EA]
BRANCH     0001      if (AC < 0) then (PC ← EA)
STORE      0010      M[EA] ← AC
EXCHANGE   0011      AC ← M[EA], M[EA] ← AC
EA is the effective address
Microinstruction format
Field:   F1   F2   F3   CD   BR   AD
Bits:    3    3    3    2    2    7
F1, F2, F3: Microoperation fields
CD: Condition for branching
BR: Branch field
AD: Address field
MICROINSTRUCTION FIELD DESCRIPTIONS - F1, F2, F3
F1    Microoperation      Symbol
000   None                NOP
001   AC ← AC + DR        ADD
010   AC ← 0              CLRAC
011   AC ← AC + 1         INCAC
100   AC ← DR             DRTAC
101   AR ← DR(0-10)       DRTAR
110   AR ← PC             PCTAR
111   M[AR] ← DR          WRITE
F2    Microoperation      Symbol
000   None                NOP
001   AC ← AC - DR        SUB
010   AC ← AC ∨ DR        OR
011   AC ← AC ∧ DR        AND
100   DR ← M[AR]          READ
101   DR ← AC             ACTDR
110   DR ← DR + 1         INCDR
111   DR(0-10) ← PC       PCTDR
F3    Microoperation      Symbol
000   None                NOP
001   AC ← AC ⊕ DR        XOR
010   AC ← AC'            COM
011   AC ← shl AC         SHL
100   AC ← shr AC         SHR
101   PC ← PC + 1         INCPC
110   PC ← AR             ARTPC
111   Reserved
MICROINSTRUCTION FIELD DESCRIPTIONS - CD, BR
CD   Condition    Symbol   Comments
00   Always = 1   U        Unconditional branch
01   DR(15)       I        Indirect address bit
10   AC(15)       S        Sign bit of AC
11   AC = 0       Z        Zero value in AC
BR   Symbol   Function
00   JMP      CAR ← AD if condition = 1
              CAR ← CAR + 1 if condition = 0
01   CALL     CAR ← AD, SBR ← CAR + 1 if condition = 1
              CAR ← CAR + 1 if condition = 0
10   RET      CAR ← SBR (Return from subroutine)
11   MAP      CAR(2-5) ← DR(11-14), CAR(0,1,6) ← 0
SYMBOLIC MICROINSTRUCTIONS
Symbols are used in microinstructions as in assembly language
A symbolic microprogram can be translated into its binary equivalent
by a microprogram assembler.
Sample Format
five fields: label; micro-ops; CD; BR; AD
Label: may be empty or may specify a symbolic
       address terminated with a colon
Micro-ops: consists of one, two, or three symbols
           separated by commas
CD: one of {U, I, S, Z}, where U: Unconditional Branch
                               I: Indirect address bit
                               S: Sign of AC
                               Z: Zero value in AC
BR: one of {JMP, CALL, RET, MAP}
AD: one of {Symbolic address, NEXT, empty}
SYMBOLIC MICROPROGRAM - FETCH ROUTINE
During FETCH, read an instruction from memory,
decode the instruction and update PC
Sequence of microoperations in the fetch cycle:
AR ← PC
DR ← M[AR], PC ← PC + 1
AR ← DR(0-10), CAR(2-5) ← DR(11-14), CAR(0,1,6) ← 0
Symbolic microprogram for the fetch cycle:
          ORG 64
FETCH:    PCTAR          U   JMP   NEXT
          READ, INCPC    U   JMP   NEXT
          DRTAR          U   MAP
Binary equivalents translated by an assembler
Binary
address   F1    F2    F3    CD   BR   AD
1000000   110   000   000   00   00   1000001
1000001   000   100   101   00   00   1000010
1000010   101   000   000   00   11   0000000
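The translation from symbolic to binary microinstructions can be sketched with a few lookup tables built from the F1, F2, F3, CD and BR field descriptions. This is an illustrative sketch, not a full microprogram assembler: the function name `assemble` is hypothetical, symbolic labels are not resolved (only NEXT or a numeric address is accepted), and an empty AD field is assumed to assemble to 0000000.

```python
F1 = {"ADD": 1, "CLRAC": 2, "INCAC": 3, "DRTAC": 4, "DRTAR": 5, "PCTAR": 6, "WRITE": 7}
F2 = {"SUB": 1, "OR": 2, "AND": 3, "READ": 4, "ACTDR": 5, "INCDR": 6, "PCTDR": 7}
F3 = {"XOR": 1, "COM": 2, "SHL": 3, "SHR": 4, "INCPC": 5, "ARTPC": 6}
CD = {"U": 0, "I": 1, "S": 2, "Z": 3}
BR = {"JMP": 0, "CALL": 1, "RET": 2, "MAP": 3}


def assemble(address, microops, cd, br, ad=0):
    """Return the 20-bit microinstruction as a string of binary fields."""
    fields = [0, 0, 0]                        # NOP (000) in every field by default
    for op in microops:
        for index, table in enumerate((F1, F2, F3)):
            if op in table:
                fields[index] = table[op]
    target = address + 1 if ad == "NEXT" else ad
    return "{:03b} {:03b} {:03b} {:02b} {:02b} {:07b}".format(
        fields[0], fields[1], fields[2], CD[cd], BR[br], target)


fetch = [
    assemble(64, ["PCTAR"], "U", "JMP", "NEXT"),
    assemble(65, ["READ", "INCPC"], "U", "JMP", "NEXT"),
    assemble(66, ["DRTAR"], "U", "MAP"),
]
for word in fetch:
    print(word)
```

The three words printed match the binary equivalents of the fetch routine listed above.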
SYMBOLIC MICROPROGRAM
Control Storage: 128 20-bit words
The first 64 words: Routines for the 16 machine instructions
The last 64 words: Used for other purposes (e.g., fetch routine and other subroutines)
Mapping: OP-code XXXX into 0XXXX00; the first addresses of the 16 routines are
0 (0 0000 00), 4 (0 0001 00), 8, 12, 16, 20, ..., 60
Partial Symbolic Microprogram
Label       Microops        CD   BR     AD
            ORG 0
ADD:        NOP             I    CALL   INDRCT
            READ            U    JMP    NEXT
            ADD             U    JMP    FETCH
            ORG 4
BRANCH:     NOP             S    JMP    OVER
            NOP             U    JMP    FETCH
OVER:       NOP             I    CALL   INDRCT
            ARTPC           U    JMP    FETCH
            ORG 8
STORE:      NOP             I    CALL   INDRCT
            ACTDR           U    JMP    NEXT
            WRITE           U    JMP    FETCH
            ORG 12
EXCHANGE:   NOP             I    CALL   INDRCT
            READ            U    JMP    NEXT
            ACTDR, DRTAC    U    JMP    NEXT
            WRITE           U    JMP    FETCH
            ORG 64
FETCH:      PCTAR           U    JMP    NEXT
            READ, INCPC     U    JMP    NEXT
            DRTAR           U    MAP
INDRCT:     READ            U    JMP    NEXT
            DRTAR           U    RET
BINARY MICROPROGRAM
                Address             Binary Microinstruction
Micro Routine   Decimal   Binary    F1    F2    F3    CD   BR   AD
ADD             0         0000000   000   000   000   01   01   1000011
                1         0000001   000   100   000   00   00   0000010
                2         0000010   001   000   000   00   00   1000000
                3         0000011   000   000   000   00   00   1000000
BRANCH          4         0000100   000   000   000   10   00   0000110
                5         0000101   000   000   000   00   00   1000000
                6         0000110   000   000   000   01   01   1000011
                7         0000111   000   000   110   00   00   1000000
STORE           8         0001000   000   000   000   01   01   1000011
                9         0001001   000   101   000   00   00   0001010
                10        0001010   111   000   000   00   00   1000000
                11        0001011   000   000   000   00   00   1000000
EXCHANGE        12        0001100   000   000   000   01   01   1000011
                13        0001101   000   100   000   00   00   0001110
                14        0001110   100   101   000   00   00   0001111
                15        0001111   111   000   000   00   00   1000000
FETCH           64        1000000   110   000   000   00   00   1000001
                65        1000001   000   100   101   00   00   1000010
                66        1000010   101   000   000   00   11   0000000
INDRCT          67        1000011   000   100   000   00   00   1000100
                68        1000100   101   000   000   00   10   0000000
This microprogram can be implemented using ROM
DESIGN OF CONTROL UNIT
DECODING ALU CONTROL INFORMATION
[Figure: the microoperation fields F1, F2 and F3 each drive a 3 x 8 decoder (outputs 7 to 0). Decoder outputs such as AND, ADD and DRTAC go to the arithmetic logic and shift unit, which takes its inputs from AC and DR and loads AC. The DRTAR and PCTAR outputs control the multiplexers that select between DR(0-10) and PC as the source loaded into AR on the clock.]
MICROPROGRAM SEQUENCER
NEXT MICROINSTRUCTION ADDRESS LOGIC
[Figure: MUX1, with select inputs S1 and S0, chooses among the in-line address from the incrementer, the return address from SBR, the branch or CALL address from control storage and the external (MAP) address; the selected address is loaded into CAR on the clock. L loads SBR on a subroutine CALL.]
S1 S0   Address Source
00      CAR + 1, In-Line
01      SBR, RETURN
10      CS(AD), Branch or CALL
11      MAP
MUX1 selects an address from one of four sources and routes it into CAR
- In-Line Sequencing: CAR + 1
- Branch, Subroutine Call: CS(AD)
- Return from Subroutine: Output of SBR
- New Machine Instruction: MAP
MICROPROGRAM SEQUENCER
CONDITION AND BRANCH CONTROL
[Figure: MUX2, selected by the CD field of CS, picks the test bit T from the inputs 1, I, S and Z supplied by the CPU. The input logic combines T with the BR field bits I0 and I1 of CS to produce S1 and S0 for next address selection and L (load SBR) for subroutine CALL.]
Input Logic
I0 I1 T   Meaning   Source of Address           S1 S0   L
0  0  0   In-Line   CAR + 1                     00      0
0  0  1   JMP       CS(AD)                      10      0
0  1  0   In-Line   CAR + 1                     00      0
0  1  1   CALL      CS(AD) and SBR ← CAR + 1    10      1
1  0  x   RET       SBR                         01      0
1  1  x   MAP       DR(11-14)                   11      0
S0 = I0
S1 = I0 I1 + I0' T
L  = I0' I1 T
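The input logic can be checked against its truth table with a few lines of Python. This sketch assumes the equations S0 = I0, S1 = I0 I1 + I0' T and L = I0' I1 T, which are the ones consistent with the table on this slide; the function name `input_logic` is hypothetical.

```python
def input_logic(i0, i1, t):
    """Return (S1, S0, L) for BR field bits I0, I1 and test bit T."""
    not_i0 = 1 - i0
    s0 = i0
    s1 = (i0 & i1) | (not_i0 & t)
    load = not_i0 & i1 & t          # load SBR only on a CALL whose test passes
    return s1, s0, load


# (I0, I1, T) -> (S1, S0, L), rows taken from the input logic table
table = {
    (0, 0, 0): (0, 0, 0),   # in-line
    (0, 0, 1): (1, 0, 0),   # JMP
    (0, 1, 0): (0, 0, 0),   # in-line
    (0, 1, 1): (1, 0, 1),   # CALL
    (1, 0, 0): (0, 1, 0),   # RET
    (1, 0, 1): (0, 1, 0),   # RET
    (1, 1, 0): (1, 1, 0),   # MAP
    (1, 1, 1): (1, 1, 0),   # MAP
}
mismatches = [k for k, v in table.items() if input_logic(*k) != v]
print(mismatches)   # []
```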
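MICROPROGRAMMED CONTROL
[Figure: complete microprogram sequencer. MUX2 selects the test bit T from 1, I, S and Z; the input logic takes I0, I1 and T and produces S1, S0 and L; MUX1 selects among the external (MAP) address, SBR, the incrementer output and the AD field; the result is loaded into CAR on the clock and addresses the control memory, whose word holds the fields Microops, CD, BR and AD.]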
MICROINSTRUCTION FORMAT
Information in a Microinstruction
- Control Information
- Sequencing Information
- Constant
  Information which is useful when feeding into the system
This information needs to be organized in some way for
- Efficient use of the microinstruction bits
- Fast decoding
Field Encoding
- Encoding the microinstruction bits
- Encoding slows down the execution speed
  due to the decoding delay
- Encoding also reduces the flexibility due to
  the decoding hardware
HORIZONTAL AND VERTICAL MICROINSTRUCTION FORMAT
Horizontal Microinstructions
Each bit directly controls each micro-operation or each control point
Horizontal implies a long microinstruction word
Advantages: Can control a variety of components operating in parallel.
--> Advantage of efficient hardware utilization
Disadvantages: Control word bits are not fully utilized
--> CS becomes large --> Costly
Vertical Microinstructions
A microinstruction format that is not horizontal
Vertical implies a short microinstruction word
Encoded Microinstruction fields
--> Needs decoding circuits for one or two levels of decoding
One-level decoding
[Figure: Field A (2 bits) drives a 2 x 4 decoder (1 of 4); Field B (3 bits) drives a 3 x 8 decoder (1 of 8).]
Two-level decoding
[Figure: Field A (2 bits) drives a 2 x 4 decoder; Field B (6 bits) drives a 6 x 64 decoder; both feed a second level of decoder and selection logic.]
NANOSTORAGE AND NANOINSTRUCTION
The decoder circuits in a vertical microprogram
storage organization can be replaced by a ROM
=> Two levels of control storage
First level - Control Storage
Second level - Nano Storage
Two-level microprogram
First level
- Vertical format Microprogram
Second level
- Horizontal format Nanoprogram
- Interprets the microinstruction fields, thus converting a vertical
  microinstruction format into a horizontal
  nanoinstruction format.
Usually, the microprogram consists of a large number of short
microinstructions, while the nanoprogram contains fewer
words with longer nanoinstructions.
TWO-LEVEL MICROPROGRAMMING - EXAMPLE
* Microprogram: 2048 microinstructions of 200 bits each
* With 1-Level Control Storage: 2048 x 200 = 409,600 bits
* Assumption:
  256 distinct microinstructions among 2048
* With 2-Level Control Storage:
  Nano Storage: 256 x 200 bits to store 256 distinct nanoinstructions
  Control Storage: 2048 x 8 bits
  To address 256 nano storage locations, 8 bits are needed
* Total 1-Level control storage: 409,600 bits
  Total 2-Level control storage: 67,584 bits (256 x 200 + 2048 x 8)
[Figure: an 11-bit address selects one 8-bit microinstruction from the 2048 x 8 control memory; the microinstruction is the nanomemory address, which selects one 200-bit nanoinstruction from the 256 x 200 nanomemory.]
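The storage comparison can be reproduced with a short calculation. This is a sketch; the function name `control_storage_bits` and its parameter names are hypothetical.

```python
def control_storage_bits(n_micro, width, n_distinct):
    """Return (one-level bits, two-level bits) for a microprogram.

    n_micro:    number of microinstructions in the microprogram
    width:      bits per (horizontal) microinstruction
    n_distinct: number of distinct microinstructions, i.e. nano storage words
    """
    one_level = n_micro * width
    pointer_bits = (n_distinct - 1).bit_length()   # bits to address nano storage
    two_level = n_distinct * width + n_micro * pointer_bits
    return one_level, two_level


one_level, two_level = control_storage_bits(2048, 200, 256)
print(one_level, two_level)   # 409600 67584
```

The saving depends entirely on how few distinct control words occur: if all 2048 microinstructions were different, the two-level scheme would cost more than the one-level scheme.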
COMPUTER ARITHMETIC
Addition and Subtraction
Multiplication Algorithms
Division Algorithms
Floating Point Arithmetic Operations
Decimal Arithmetic Unit
Decimal Arithmetic Operations
AdditionandSubtractionwithSigned
MagnitudeData
AdditionAlgorithm
TheMagnitudesoftwonumbersarerepresentedinAand
BB.
IfthesignbitsofAandBareidenticalthen,Addthe
magnitudesoftwonumberAandBandassignthesignof
numberAtotheresult.
IfthesignbitsofAandBaredifferentthen,Comparethe
magnitudesoftwonumbers,like
If(A>B),thenSubtractBfromAandthenassignthesignof
number A to the result numberAtotheresult.
If(A<B),thenSubtractAfromBandthenassignthe
complementedsignofnumberAtotheresult.
If(A=B),thenSubtractBfromAandthenassignthesignas
positivetotheresult.
Addition and Subtraction with Signed-Magnitude Data
Subtraction Algorithm
The magnitudes of the two numbers are represented in A and B.
If the sign bits of A and B are different, add the magnitudes of A and B and assign the sign of A to the result.
If the sign bits of A and B are identical, compare the magnitudes of the two numbers:
If (A > B), subtract B from A and assign the sign of A to the result.
If (A < B), subtract A from B and assign the complemented sign of A to the result.
If (A = B), subtract B from A and assign a positive sign to the result.
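The two signed-magnitude algorithms can be sketched as follows. This is a minimal illustration, assuming signs are encoded as 0 (plus) and 1 (minus) and ignoring register width and overflow; the names `sm_add` and `sm_sub` are hypothetical.

```python
def sm_add(sign_a, mag_a, sign_b, mag_b):
    """Add two signed-magnitude numbers; returns (sign, magnitude)."""
    if sign_a == sign_b:
        # Identical signs: add magnitudes, keep the sign of A.
        return sign_a, mag_a + mag_b
    if mag_a > mag_b:
        return sign_a, mag_a - mag_b
    if mag_a < mag_b:
        # The result takes the complemented sign of A.
        return 1 - sign_a, mag_b - mag_a
    # Equal magnitudes: the result is zero with a positive sign.
    return 0, 0


def sm_sub(sign_a, mag_a, sign_b, mag_b):
    """Subtract B from A by complementing the sign of B and adding."""
    return sm_add(sign_a, mag_a, 1 - sign_b, mag_b)


print(sm_add(0, 5, 1, 3))   # (+5) + (-3)
print(sm_sub(1, 3, 1, 5))   # (-3) - (-5)
```

Treating subtraction as addition with the sign of B complemented reproduces the subtraction algorithm above: different signs lead to an add of magnitudes, identical signs to a compare and subtract.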
Combinations of Operations for Addition and Subtraction of Two Numbers
Operation      Add Magnitudes   Subtract Magnitudes
                                A > B       A < B       A = B
(+A) + (+B)    +(A + B)
(+A) + (-B)                     +(A - B)    -(B - A)    +(A - B)
(-A) + (+B)                     -(A - B)    +(B - A)    +(A - B)
(-A) + (-B)    -(A + B)
(+A) - (+B)                     +(A - B)    -(B - A)    +(A - B)
(+A) - (-B)    +(A + B)
(-A) - (+B)    -(A + B)
(-A) - (-B)                     -(A - B)    +(B - A)    +(A - B)
Hardware Implementation
As and Bs are the sign bits of the two numbers.
For the addition operation, the augend bits are stored in register A and the addend bits in register B.
For the subtraction operation, the minuend bits are stored in register A and the subtrahend bits in register B.
When flip-flop M is 0, the contents of registers B and A are applied to the parallel adder and the result is stored in register A. Any carry is stored in flip-flop E, from which it is later transferred to the flip-flop AVF (addition overflow).
When flip-flop M is 1, the parallel adder receives the complemented input from register B, the normal input from register A, and the bit value 1 from M as input carry, so it forms A plus the 2's complement of B.
Flowchart Showing the Addition and Subtraction Operation
(Figure: flowchart with a Subtraction branch and an Addition branch.)
Addition and Subtraction Using Signed 2's Complement
Hardware Implementation
Flowchart Showing the Addition and Subtraction Operation Using Signed 2's Complement
Multiplication Operation: Signed Magnitude
Hardware Implementation
Flowchart for Multiply Operation
The multiplicand bits are stored in register B and the multiplier bits in register Q.
The sign bit of the multiplicand is stored in flip-flop Bs and the sign bit of the multiplier in flip-flop Qs.
Flip-flop E and register A are initialized to 0. The sign of the product held in A and Q is obtained by XORing the sign bits Qs and Bs.
The sequence counter is initialized with the number of bits in register Q.
Process:
1. The LSB of Q is checked.
If it is 1:
1. The contents of registers B and A are added and the result is stored in EA.
2. The contents of EAQ are shifted right by one position and the sequence counter is decremented by 1.
Else:
The contents of EAQ are shifted right by one position and the sequence counter is decremented by 1.
2. The sequence counter is checked. If it is 0, the result is in registers A and Q; otherwise Step 1 is repeated.
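The shift-and-add process can be sketched in software with integers standing in for the registers. This is an illustrative model only, assuming n-bit magnitudes and signs encoded as 0 or 1; the function name `sm_multiply` is hypothetical.

```python
def sm_multiply(b_sign, b_mag, q_sign, q_mag, n):
    """Signed-magnitude multiply of two n-bit magnitudes.

    Models registers B, A, Q, flip-flop E and the sequence counter SC.
    Returns (sign, 2n-bit product magnitude held in A and Q).
    """
    mask = (1 << n) - 1
    b = b_mag & mask
    a, e, q, sc = 0, 0, q_mag & mask, n
    while sc > 0:
        if q & 1:
            # LSB of Q is 1: EA <- A + B
            total = a + b
            e, a = total >> n, total & mask
        # Shift EAQ right by one position (E -> A, LSB of A -> Q).
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (e << (n - 1))
        e = 0
        sc -= 1
    return b_sign ^ q_sign, (a << n) | q


print(sm_multiply(0, 23, 0, 19, 5))
```

The product sign is computed separately from the magnitudes, which mirrors the XOR of Bs and Qs described above.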
Booth Algorithm for Multiplication of Signed 2's Complement Numbers
Hardware Implementation
Flowchart of Booth Multiplication Algorithm
The multiplicand bits are stored in register BR and the multiplier bits in register QR.
AC is cleared, Q_{n+1} = 0, and the sequence counter is initialized with the number of bits in register QR.
Process:
1. The bits Q_n Q_{n+1} are checked.
If they are 1 0, the multiplicand is subtracted from AC and the result is stored in AC. The contents of AC and QR are then shifted right by one position, leaving the sign bit in AC unchanged, and the sequence counter is decremented by 1.
If they are 0 1, the multiplicand is added to AC and the result is stored in AC. The contents of AC and QR are then shifted right by one position, leaving the sign bit in AC unchanged, and the sequence counter is decremented by 1.
Otherwise (0 0 or 1 1), the contents of AC and QR are shifted right by one position, leaving the sign bit in AC unchanged, and the sequence counter is decremented by 1.
2. The sequence counter is checked. If it is 0, the result is in registers AC and QR; otherwise Step 1 is repeated.
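The Booth procedure can be modeled with Python integers acting as the n-bit registers. This is a sketch under the assumption that the multiplicand is not the most negative n-bit value (its negation would not fit in AC); the name `booth_multiply` and the variable `q_extra` (standing for Q_{n+1}) are illustrative.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit two's-complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    sign_bit = 1 << (n - 1)
    br = multiplicand & mask
    ac, qr, q_extra, sc = 0, multiplier & mask, 0, n
    while sc > 0:
        pair = (qr & 1, q_extra)
        if pair == (1, 0):
            ac = (ac - br) & mask      # AC <- AC - BR
        elif pair == (0, 1):
            ac = (ac + br) & mask      # AC <- AC + BR
        # Arithmetic shift right of AC, QR and Q_{n+1}; AC keeps its sign bit.
        q_extra = qr & 1
        qr = (qr >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (ac & sign_bit)
        sc -= 1
    product = (ac << n) | qr
    # Interpret the 2n-bit result as a signed value.
    if product & (1 << (2 * n - 1)):
        product -= 1 << (2 * n)
    return product


print(booth_multiply(-9, -13, 5))
```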
Restoring Division Algorithm
An n-bit positive divisor is loaded into register M and an n-bit positive dividend is loaded into register Q.
Register A is set to 0.
After the division, the n-bit quotient is in register Q and the remainder is in register A.
Process:
Do the following n times:
1. Shift A and Q left by one position.
2. Subtract M from A, and place the result in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise set q0 to 1.
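The restoring division steps can be sketched as below. The model uses a Python integer for A, so "the sign of A is 1" is expressed as `a < 0`; the function name `restoring_divide` is an assumption made for illustration.

```python
def restoring_divide(dividend, divisor, n):
    """Divide two n-bit positive integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        # 1. Shift A and Q left by one position (MSB of Q moves into A).
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # 2. Subtract M from A.
        a -= m
        # 3. Negative result: restore A and leave q0 = 0; otherwise q0 = 1.
        if a < 0:
            a += m
        else:
            q |= 1
    return q, a


print(restoring_divide(8, 3, 4))
```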
Non-Restoring Division Algorithm
Process:
Step 1: Do the following n times:
1. If the sign of A is 0, shift A and Q left by one position and subtract M from A; otherwise shift A and Q left and add M to A.
2. If the sign of A is 0, set q0 to 1; otherwise set q0 to 0.
Step 2:
If the sign of A is 1, add M to A.
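A corresponding sketch of the non-restoring procedure, again with a Python integer standing in for register A (negative values model a sign bit of 1). The name `nonrestoring_divide` is illustrative.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Divide two n-bit positive integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        was_negative = a < 0
        # Shift A and Q left by one position.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # Subtract M if A was non-negative, otherwise add M.
        a = a + m if was_negative else a - m
        # Set q0 from the new sign of A.
        if a >= 0:
            q |= 1
    # Step 2: a final correction if the remainder is negative.
    if a < 0:
        a += m
    return q, a


print(nonrestoring_divide(13, 4, 4))
```

Compared with the restoring version, no add-back is performed inside the loop; at most one correcting addition is needed at the end.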
Floating-Point Operation: Addition & Subtraction
Registers
Flowchart for Addition & Subtraction
Floating-Point Operation: Multiplication
Floating-Point Operation: Division
Flowchart for Decimal Adder
Flowchart for Decimal Multiplication
Flowchart for Decimal Division
MEMORY SYSTEM
Basic Concepts
Semiconductor RAM memories
Read only memories
Cache memories
Performance consideration
Virtual Memories
Introduction to RAID
Basic Concepts
The number of bits in the address register (bus) determines the maximum number of addressable locations in memory: a k-bit address can select 2^k locations.
Most modern computers use a byte-addressable format.
There are two ways of assigning byte addresses within a word:
big endian
little endian
Big Endian
Word Address    Byte Address
0               0   1   2   3
4               4   5   6   7
8               8   9   10  11
Little Endian
Word Address    Byte Address
0               3   2   1   0
4               7   6   5   4
8               11  10  9   8
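The two byte orderings can be observed with Python's built-in `int.to_bytes`. The helper `byte_layout` and the sample value are illustrative assumptions; the sketch lists which byte of a 32-bit word lands at each byte address.

```python
def byte_layout(value, word_address=0, word_bytes=4):
    """Return {byte_address: byte} maps for big- and little-endian storage."""
    big = value.to_bytes(word_bytes, byteorder="big")
    little = value.to_bytes(word_bytes, byteorder="little")
    big_map = {word_address + i: b for i, b in enumerate(big)}
    little_map = {word_address + i: b for i, b in enumerate(little)}
    return big_map, little_map


big_map, little_map = byte_layout(0x0A0B0C0D)
# Big endian: the lowest byte address holds the most significant byte.
print({addr: hex(b) for addr, b in big_map.items()})
# Little endian: the lowest byte address holds the least significant byte.
print({addr: hex(b) for addr, b in little_map.items()})
```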
Below is shown a diagram showing the
interconnection between processor and
memory
Memory access time:
the time that elapses between the initiation of an operation and the completion of that operation.
Memory cycle time:
the minimum time delay required between the initiation of two successive memory operations.
RAM (Random Access Memory)         ROM (Read-Only Memory)
Read and write memory              Read-only memory
Volatile memory                    Non-volatile memory
Can access any memory location     Can access any memory location
for reading and writing.           for reading only.
Cache Memory:
These are memories that are small, fast, and expensive. This type of memory is placed between the main memory and the processor.
Semiconductor Memories
Memory is an array of cells arranged in the form of rows and columns.
Each cell is capable of storing one bit of information.
All cells in a row are connected to a common word line; the row of cells constitutes a memory word.
The cells in each column are connected by bit lines to a sense/write circuit, which in turn is connected to a bidirectional data line.
Two control lines, R/W and CS, are provided in addition to the address and data lines.
R/W specifies the required operation to be performed, and the CS input is used to select a given chip in a multichip memory system.
Internal Organization of Memory Chip
Organization of 1K x 1 Memory Chip
Static Memory
Static RAM retains information as long as the power supply is present, but loses it if there is a loss of, or fluctuation in, the power supply.
Static RAM is fast and expensive.
Steps in the construction of an SRAM cell:
Two inverters are cross-connected to form a latch.
One end of the latch is connected through transistor T1 to bit line b.
The other end of the latch is connected through transistor T2 to bit line b' (the complement of b).
Transistors T1 and T2 are controlled by the word line.
The pictorial representation of the SRAM cell is shown below.
CMOS SRAM (Complementary Metal Oxide Semiconductor)
Steps in the construction of CMOS SRAM
Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch.
In state 1, the voltage at point X is maintained high by having transistors T3 and T6 on, while T4 and T5 are off.
If T1 and T2 are then turned on (closed), bit lines b and b' will have high and low signals, respectively.
Read-Only Memories (ROM)
ROM (Read-Only Memory): non-volatile, allows reading only.
The contents of a ROM cell are decided by the manufacturer.
PROM: Programmable ROM, which allows data to be loaded by the user.
EPROM: Erasable PROM, which allows stored data to be erased physically (by exposure to ultraviolet light).
EEPROM: Electrically Erasable PROM, whose contents can be erased electrically.
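Memory Hierarchy
(Figure: levels of the memory hierarchy, listed from the processor outward.)
Processor registers
Primary cache
Secondary cache
Main memory
Secondary storage: magnetic disk, optical storage, tape
Size increases moving away from the processor; speed and cost per bit increase moving toward it.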
CACHE MEMORIES
Cache memories are small, fast storage devices that are used to improve the speed of access.
The effectiveness of cache memory is based on an important property called locality of reference.
Locality of reference: instructions in some part of the program are executed repeatedly during some time period.
It takes two forms:
TEMPORAL LOCALITY OF REFERENCE
SPATIAL LOCALITY OF REFERENCE
Temporal Locality:
If a data location is referenced, it will tend to be referenced again soon.
Spatial Locality:
If a data location is referenced, data locations with nearby addresses will tend to be referenced soon.
Points to Remember:
A block refers to a set of contiguous addresses of some size.
Cache memory can store a reasonable number of blocks at any given time, but this number is small compared to the total number of blocks in the main memory. The correspondence between the main memory blocks and those in the cache is specified by a mapping function.
When the cache is full and the referenced word is not in the cache, the cache control hardware must decide which block should be removed from the cache. The collection of rules for making this decision is called the replacement algorithm.
When the CPU issues a read or write request, the cache control circuitry determines whether the requested word exists in the cache memory.
If it exists, the corresponding operation is performed and a read hit or write hit is said to have occurred; otherwise it is a read miss or write miss.
On a read hit, main memory is not involved.
On a write hit, the system can proceed in two ways:
First technique: write-through protocol
The cache location and the main memory location are updated simultaneously.
Second technique: write-back (or copy-back) protocol
Only the cache location is updated, and the block is marked with a flag bit called the dirty or modified bit.
The main memory copy is updated later, when the block containing the marked word is removed from the cache to make room for a new block.
Mapping Functions
Three types:
Direct Mapping
Associative Mapping
Set-Associative Mapping
The examples assume a 64K-word main memory (4K blocks x 16 words) and a 2K-word cache (128 blocks x 16 words).
1 block consists of 16 words.
Where can a block be placed in a cache? How is a block found?
(Figure: main memory blocks 0 to 4095 are mapped by the mapping function onto cache blocks 0 to 127, each of which carries a tag. The 16-bit main memory address consists of a 12-bit block number followed by a 4-bit word field.)
Direct Mapping
Simplest technique.
Block j of main memory maps onto block (j modulo 128) of the cache.
Example: block 2103 of main memory maps to block (2103 mod 128) = block 55.
Each main memory block has only one place in the cache.
More than one block contends for the same cache position.
Cache block = (Block Address) MOD (Number of Blocks in Cache)
Direct Mapping
16-bit address (64K words)
16 words per block: lower 4 bits
Cache block position: middle 7 bits
The higher 5 bits are stored in the 5 tag bits associated with the cache location.
32 main memory blocks are mapped to the same cache block position.
The higher 5 bits tell which of these 32 blocks of main memory is currently held in that cache position.
How is a block found if it is in the cache?
Direct Mapping
The middle 7 bits determine which location in the cache is used.
The higher-order 5 bits of the main memory address are matched with the tag bits in the cache to check whether the desired block is the one stored in the cache.
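The 5/7/4 field split for the direct-mapped example can be sketched with shifts and masks. The function name `split_direct` is hypothetical; the field widths are those of the 64K-word memory and 128-block cache assumed in the examples.

```python
WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 cache blocks


def split_direct(address):
    """Split a 16-bit word address into (tag, cache_block, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


# Word 5 of main memory block 2103.
address = 2103 * 16 + 5
print(split_direct(address))
```

Main memory block 2103 lands in cache block 55, matching the modulo example, and the tag identifies which of the 32 candidate blocks it is.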
Direct Mapping
(Figure: main memory blocks 0 to 4095 and cache blocks 0 to 127, each cache block with a tag. 16-bit address fields: Tag (5 bits), Block (7 bits), Word (4 bits).)
Associative Mapping
A block of main memory can be mapped to any available cache block location.
The higher 12 bits are stored in the tag bits.
How is a block found if it is in the cache?
Associative Mapping
The tag bits (higher-order 12 bits) of an address are compared with the tag bits of each block to check whether the desired block is present.
The cost of an associative cache is higher than that of direct mapping, as there is a need to search all 128 tags.
Tags must be searched in parallel for performance reasons.
Associative Mapping
(Figure: main memory blocks 0 to 4095 and cache blocks 0 to 127, each cache block with a tag. 16-bit address fields: Tag (12 bits), Word (4 bits).)
Set-Associative Mapping
Cache blocks are grouped into sets.
A main memory block can reside in any block of a specific set.
Less contention than direct mapping.
Less cost than associative mapping.
Set = (Block Address) MOD (Number of Sets in Cache)
k-way set-associative cache: k blocks per set.
How is a block found if it is in the cache? Set-Associative Mapping
Example: the cache groups two blocks per set.
64 sets (6-bit set field).
64 main memory blocks can be mapped onto one set.
The tag bits in each cache block store the upper 6 bits of the address to tell which of the 64 blocks are currently in the cache.
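A small model of the two-way set-associative example is sketched below. The class name `SetAssociativeCache` is hypothetical, and FIFO replacement within a set is an assumption made here for simplicity; the slides do not specify a replacement algorithm.

```python
from collections import deque


class SetAssociativeCache:
    """Tracks only tags; reports hit or miss for each word address."""

    def __init__(self, num_sets=64, ways=2, word_bits=4):
        self.num_sets = num_sets
        self.word_bits = word_bits
        # A deque with maxlen drops its oldest entry when full (FIFO).
        self.sets = [deque(maxlen=ways) for _ in range(num_sets)]

    def access(self, address):
        block = address >> self.word_bits
        set_index = block % self.num_sets   # Set = block MOD number of sets
        tag = block // self.num_sets        # upper bits identify the block
        tags = self.sets[set_index]
        if tag in tags:
            return True                     # hit
        tags.append(tag)                    # miss: load block, maybe evict
        return False


cache = SetAssociativeCache()
# Blocks 0, 64 and 128 all map to set 0.
trace = [0 * 16, 64 * 16, 0 * 16, 128 * 16, 64 * 16, 0 * 16]
results = [cache.access(addr) for addr in trace]
print(results)
```

Two conflicting blocks can coexist in a set, which is the reduction in contention relative to direct mapping; a third conflicting block forces a replacement.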
(Figure: main memory blocks 0 to 4095 and a cache of 128 blocks grouped into sets 0 to 63, two tagged blocks per set. 16-bit address fields: Tag (6 bits), Set (6 bits), Word (4 bits).)
Performance Considerations
Interleaving
Hit Rate and Miss Penalty
Caches on the CPU chip
Other Enhancements
Write Buffer
Prefetching
Lockup-Free Cache
Memory Interleaving
Each memory module has its own ABR (address buffer register) and DBR (data buffer register).
Performance can be improved when memory access operations proceed in more than one module at the same time.
There are two approaches:
Approach 1:
The memory address generated by the CPU is decoded as shown in the diagram below.
The high-order k bits name one of n modules and the low-order m bits name a particular word in that module.
Consecutive words are located in the same module.
Memory Interleaving
(Figure: main memory address fields: Module (k bits), Address in module (m bits). Modules 0, 1, 2, ..., each with its own ABR and DBR.)
CONSECUTIVE WORDS IN A MODULE
Approach 2:
The memory address generated by the CPU is decoded as shown in the diagram below.
The low-order k bits of the memory address select a module and the high-order m bits name a location in that module.
Consecutive addresses are located in successive modules.
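The two address decodings can be compared with a short sketch. The function names and the sample values k = 2, m = 4 are illustrative assumptions.

```python
def high_order_decode(address, k, m):
    """Approach 1: high-order k bits select the module."""
    module = address >> m
    offset = address & ((1 << m) - 1)
    return module, offset


def low_order_decode(address, k, m):
    """Approach 2: low-order k bits select the module (interleaving)."""
    module = address & ((1 << k) - 1)
    offset = address >> k
    return module, offset


k, m = 2, 4   # 4 modules of 16 words each
consecutive = range(4)
print([high_order_decode(a, k, m)[0] for a in consecutive])  # same module
print([low_order_decode(a, k, m)[0] for a in consecutive])   # successive modules
```

With the second decoding, a run of consecutive addresses touches every module in turn, so their accesses can overlap.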
Memory Interleaving
(Figure: main memory address fields: Address in module (m bits), Module (k bits). Modules 0, 1, 2, ..., each with its own ABR and DBR.)
CONSECUTIVE WORDS IN CONSECUTIVE MODULES (INTERLEAVED)
Measuring Cache Performance
Hit Rate
Ratio of the number of hits to the number of all attempted accesses.
Miss Rate
Ratio of the number of misses to the number of all attempted accesses.
Hit rates over 0.9 are essential for high-performance computers.
The hit rate depends on the design of the cache and on the access pattern of the instructions being executed.
Miss Penalty
Extra time needed to bring the desired block into the cache.
Performance can be improved by minimizing the miss penalty.
Unified Cache versus Split Cache
Unified cache
contains both instructions and data
Split cache
separate caches for instructions and data
simultaneous fetching of instruction and data
Performance can be improved by having multiple caches at different levels.
Second-Level Cache
$T_{ave} = h_1 C_1 + (1 - h_1) h_2 C_2 + (1 - h_1)(1 - h_2) M$
$h_1$: hit rate for the L1 cache
$h_2$: hit rate for the L2 cache
$C_1$: access time for the L1 cache
$C_2$: access time for the L2 cache
$M$: memory access time
The above formula is used to compute the average access time experienced by the CPU with a two-level cache.
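The two-level average access time formula translates directly into code. The function name and the sample hit rates and access times below are made-up illustrative values, not measurements from the slides.

```python
def average_access_time(h1, h2, c1, c2, m):
    """T_ave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M."""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m


# Hypothetical figures: times in processor clock cycles.
t_ave = average_access_time(h1=0.5, h2=0.5, c1=1, c2=10, m=100)
print(t_ave)
```

Raising either hit rate shrinks the weight of the slow main memory term, which is why high hit rates matter so much.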
Other Enhancements: Write Buffer
In the write-through protocol, each write operation results in writing a new value into the main memory, and the CPU has to wait until the memory function is completed (signaled by MFC).
To improve performance, a write buffer can be included for temporary storage of write requests.
Other Enhancements: Prefetching
Fetch data into the cache before they are needed.
Two types:
Hardware Prefetching
Software Prefetching
Hardware Prefetching
Fetch instructions/data directly into the cache or into an external buffer.
Hardware can detect patterns in memory references and fetch instructions and data in advance.
The next consecutive block may be prefetched into the cache or an external buffer.
Software Prefetching
Prefetch instructions can be inserted into a program either by the programmer or by the compiler.
A register prefetch loads the value into a register.
A cache prefetch loads only into the cache and not into a register.
Lockup-Free Cache
A cache that can support multiple outstanding misses.
A read miss caused by one instruction would otherwise stall the execution of the following instructions; this can be overcome by having a lockup-free cache.
Virtual Memories
The processor may request execution of a program held on a secondary storage device that cannot be accommodated in the physical main memory.
The technique that makes this possible is virtual memory. It moves program and data automatically between the secondary storage device and the physical main memory, giving the user the illusion that a large main memory exists.
The processor issues an address referencing an instruction or data. This address is called the virtual address or logical address.
This address is translated into a physical address by a combination of hardware and software components.
If the virtual address refers to a part of the program or data that is currently in the physical main memory, it is accessed immediately.
Otherwise, it must be brought into a suitable location in memory before it can be used.
A special unit called the MMU (Memory Management Unit) implements the virtual memory concept and performs the translation from virtual address to physical address, as shown in the figure below:
Address Translation Mechanism
To translate a virtual address to a physical address, we assume that programs and data reside in fixed-length units called pages.
Pages commonly range from 2K to 16K bytes in length.
A virtual memory address translation mechanism based on fixed-length pages is shown below:
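Page-based translation can be sketched with a dictionary standing in for the page table. The 4K page size, the table contents, and the `translate` function are illustrative assumptions; a real MMU performs this lookup in hardware.

```python
PAGE_SIZE = 4096  # assumed page size, within the 2K to 16K range


def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Map a virtual address to a physical address, or signal a page fault."""
    page, offset = divmod(virtual_address, page_size)
    frame = page_table.get(page)
    if frame is None:
        # The page is not in main memory and must be brought in first.
        raise LookupError(f"page fault on virtual page {page}")
    return frame * page_size + offset


# Hypothetical page table: virtual page number -> page frame in main memory.
page_table = {0: 5, 1: 2}
print(translate(4100, page_table))
```

The offset within the page passes through unchanged; only the page number is replaced by the frame number.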
RAID (Redundant Array of Independent Disks)
Also expanded as Redundant Array of Inexpensive Disks
6 levels in common use
Set of physical disks viewed as a single logical drive by the O/S
Data distributed across the physical drives
Can use redundant capacity to store parity information
RAID 0
No redundancy
Data striped across all disks
Round-robin striping
Increased speed
Multiple data requests are probably not on the same disk
Disks seek in parallel
A set of data is likely to be striped across multiple disks
RAID 1
Mirrored disks
Data is striped across disks
2 copies of each stripe on separate disks
Read from either
Write to both
Recovery is simple
Swap faulty disk and re-mirror
No downtime
Expensive
RAID 2
Disks are synchronized
Very small stripes
Often single byte/word
Error correction calculated across corresponding bits on disks
Multiple parity disks store Hamming code error correction in corresponding positions
Lots of redundancy
Expensive
Not used
RAID 3
Similar to RAID 2
Only one redundant disk, no matter how large the array
Simple parity bit for each set of corresponding bits
Data on a failed drive can be reconstructed from the surviving data and parity information
Very high transfer rates
RAID 4
Each disk operates independently
Good for high I/O request rate
Large stripes
Bit-by-bit parity calculated across stripes on each disk
Parity stored on parity disk
RAID 5
Like RAID 4
Parity striped across all disks
Round-robin allocation for parity stripe
Avoids RAID 4 bottleneck at parity disk
Commonly used in network servers
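The parity scheme used by RAID 3, 4, and 5 can be sketched with bytewise XOR. The helper names `parity` and `reconstruct` and the sample strips are illustrative; the sketch assumes all strips have equal length.

```python
from functools import reduce


def parity(strips):
    """XOR corresponding bytes of equal-length strips."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*strips))


def reconstruct(surviving_strips, parity_strip):
    """Rebuild the single missing strip from the survivors and the parity."""
    return parity(list(surviving_strips) + [parity_strip])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # strips on three data disks
p = parity(data)                                  # strip on the parity disk
# Suppose the second disk fails: rebuild it from the others plus parity.
rebuilt = reconstruct([data[0], data[2]], p)
print(p.hex(), rebuilt.hex())
```

Because XOR is its own inverse, the same operation both computes the parity and recovers the lost strip.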
3/31/2011
1
PIPELINING AND VECTOR PROCESSING
Parallel Processing
Pipelining
Arithmetic Pipeline
Instruction Pipeline
RISC Pipeline
Vector Processing
Array Processors
PARALLEL PROCESSING
Execution of concurrent events in the computing process to achieve faster computational speed
Levels of Parallel Processing
Job or program level
Task or procedure level
Inter-instruction level
Intra-instruction level
PARALLEL COMPUTERS
Architectural Classification
Flynn's classification
Based on the multiplicity of instruction streams and data streams
Instruction stream
Sequence of instructions read from memory
Data stream
Operations performed on the data in the processor

                                      Number of Data Streams
                                      Single      Multiple
Number of Instruction    Single       SISD        SIMD
Streams                  Multiple     MISD        MIMD
SISD:
Organization of a single computer containing a control unit, a processor unit, and a memory unit.
Parallel processing can be achieved by means of pipeline processing.
SIMD:
Organization with a single instruction stream and multiple data streams.
It consists of multiple processing units.
MISD:
Organization with multiple instruction streams and a single data stream. No computer has been constructed with this organization.
MIMD:
Organization of a computer capable of processing several programs at the same time.
We consider parallel processing under the following topics:
pipeline processing
vector processing
array processing
COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING
Von Neumann based
  SISD
    Superscalar processors
    Superpipelined processors
    VLIW (Very Long Instruction Word)
  MISD
    Nonexistence
  SIMD
    Array processors
    Systolic arrays
    Associative processors
  MIMD
    Shared-memory multiprocessors
      Bus based
      Crossbar switch based
      Multistage IN based
    Message-passing multicomputers
      Hypercube
      Mesh
      Reconfigurable
Dataflow
Reduction
SISD COMPUTER SYSTEMS
(Figure: control unit, processor unit, and memory, connected by an instruction stream and a data stream.)
Characteristics
Standard von Neumann machine
Instructions and data are stored in memory
One operation at a time
Limitations
Von Neumann bottleneck
Maximum speed of the system is limited by the memory bandwidth (bits/sec or bytes/sec)
Limitation on memory bandwidth
Memory is shared by CPU and I/O
SISD PERFORMANCE IMPROVEMENTS
Multiprogramming
Spooling
Multifunction processor
Pipelining
Exploiting instruction-level parallelism
Superscalar
Superpipelining
VLIW (Very Long Instruction Word)
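MISD COMPUTER SYSTEMS
(Figure: several memory (M), control unit (CU), and processor (P) chains, each with its own instruction stream, operating on a single data stream from a common memory.)
Characteristics
There is no computer at present that can be classified as MISD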
SIMD COMPUTER SYSTEMS
(Figure: a control unit with its memory sends one instruction stream over the data bus to processor units P; the processor units exchange data streams with memory modules M through an alignment network.)
Characteristics
Only one copy of the program exists
A single controller executes one instruction at a time
TYPES OF SIMD COMPUTERS
Array Processors
The control unit broadcasts instructions to all PEs, and all active PEs execute the same instructions
ILLIAC IV, GF-11, Connection Machine, DAP, MPP
Systolic Arrays
Regular arrangement of a large number of very simple processors constructed on VLSI circuits
CMU Warp, Purdue CHiP
Associative Processors
Content addressing
Data transformation operations over many sets of arguments with a single instruction
STARAN, PEPE
MIMD COMPUTER SYSTEMS
(Figure: processor-memory pairs (P, M) connected through an interconnection network to a shared memory.)
Characteristics
Multiple processing units
Execution of multiple instructions on multiple data
Types of MIMD computer systems
Shared-memory multiprocessors
Message-passing multicomputers
SHARED-MEMORY MULTIPROCESSORS
(Figure: processors P connected to memory modules M through an interconnection network (IN): buses, multistage IN, or crossbar switch.)
Characteristics
All processors have equally direct access to one large memory address space
Example systems
Bus and cache-based systems
Sequent Balance, Encore Multimax
Multistage IN-based systems
Ultracomputer, Butterfly, RP3, HEP
Crossbar switch-based systems
C.mmp, Alliant FX/8
Limitations
Memory access latency
Hot spot problem
MESSAGE-PASSING MULTICOMPUTER
(Figure: processors P, each with its own memory M, connected by a message-passing network of point-to-point connections.)
Characteristics
Interconnected computers
Each processor has its own memory, and processors communicate via message passing
Example systems
Tree structure: Teradata, DADO
Mesh-connected: Rediflow, Series 2010, J-Machine
Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III
Limitations
Communication overhead
Hard to program
PIPELINING
A technique of decomposing a sequential process into sub-operations, with each sub-process being executed in a special dedicated segment that operates concurrently with all other segments.
The simplest way of viewing the pipeline structure is to imagine that each segment consists of an input register followed by a combinational circuit.
The register holds the data and the combinational circuit performs the sub-operation in that particular segment.
The output of the combinational circuit in a segment is applied to the input register of the next segment.
A clock is applied to all registers after enough time has elapsed for every segment to complete its sub-operation.
Information flows through the pipeline one step at a time.
An example of pipeline organization is shown below:
Suppose we want to perform the combined multiply and add operations on a stream of numbers:
A_i * B_i + C_i   for i = 1, 2, 3, ..., 7
Each segment has one or two registers and a combinational circuit.
The multiplier and adder are combinational circuits.
The sub-operations performed in each segment of the pipeline are as follows:
R1 <- A_i, R2 <- B_i           Load A_i and B_i
R3 <- R1 * R2, R4 <- C_i       Multiply and load C_i
R5 <- R3 + R4                  Add
(Figure: A_i and B_i are read from memory into R1 and R2 in segment 1; the multiplier output is loaded into R3 and C_i into R4 in segment 2; the adder output is loaded into R5 in segment 3.)
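OPERATIONS IN EACH PIPELINE STAGE
Clock Pulse    Segment 1      Segment 2          Segment 3
Number         R1     R2      R3       R4        R5
1              A1     B1      -        -         -
2              A2     B2      A1*B1    C1        -
3              A3     B3      A2*B2    C2        A1*B1+C1
4              A4     B4      A3*B3    C3        A2*B2+C2
5              A5     B5      A4*B4    C4        A3*B3+C3
6              A6     B6      A5*B5    C5        A4*B4+C4
7              A7     B7      A6*B6    C6        A5*B5+C5
8              -      -       A7*B7    C7        A6*B6+C6
9              -      -       -        -         A7*B7+C7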
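The register contents in the table can be reproduced with a small clocked simulation. This is a sketch, not the hardware itself: the function name `simulate` is hypothetical, `None` stands for an empty register, and the sample operands are made up.

```python
def simulate(a, b, c):
    """Clock the three-segment pipeline; return R5 after each clock pulse."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    r5_history = []
    for pulse in range(n + 2):          # n + 2 pulses drain the pipeline
        # All registers load on the same clock edge, so every new value
        # is computed from the old register contents.
        new_r5 = r3 + r4 if r3 is not None else None
        new_r3 = r1 * r2 if r1 is not None else None
        new_r4 = c[pulse - 1] if 1 <= pulse <= n else None
        new_r1 = a[pulse] if pulse < n else None
        new_r2 = b[pulse] if pulse < n else None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        r5_history.append(r5)
    return r5_history


a = [1, 2, 3, 4, 5, 6, 7]
b = [2] * 7
c = [1] * 7
history = simulate(a, b, c)
print(history)
```

The first result appears on the third pulse and one new result emerges on every pulse after that, for nine pulses in total.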
GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
(Figure: the input passes in sequence through S1, R1, S2, R2, S3, R3, S4, R4; a common clock drives all four registers.)
The operands pass through all four segments in a fixed sequence.
Each segment consists of a combinational circuit S_i that performs a sub-operation over the data stream flowing through the pipe.
The segments are separated by registers R_i that hold the intermediate results between the stages.
Information flows between adjacent stages under the control of a common clock.
The space-time diagram below shows how six tasks, T1 through T6, are executed in a 4-segment pipeline.
Initially, task T1 is handled by segment 1.
After the first clock cycle, segment 2 is busy with T1 while segment 1 is busy with task T2.
Continuing in the same manner, the first task T1 is completed after the fourth cycle.
Space-Time Diagram
Clock cycles:   1    2    3    4    5    6    7    8    9
Segment 1:      T1   T2   T3   T4   T5   T6
Segment 2:           T1   T2   T3   T4   T5   T6
Segment 3:                T1   T2   T3   T4   T5   T6
Segment 4:                     T1   T2   T3   T4   T5   T6
PIPELINE SPEEDUP
n: number of tasks to be performed
Conventional machine (non-pipelined)
$t_n$: clock cycle (time to complete one task)
$t_1$: time required to complete the n tasks
$t_1 = n \cdot t_n$
Pipelined machine (k stages)
$t_p$: clock cycle (time to complete each sub-operation)
$t_k$: time required to complete the n tasks
$t_k = (k + n - 1) \cdot t_p$
Speedup
$S_k$: speedup
$S_k = \dfrac{n \, t_n}{(k + n - 1) \, t_p}$
$\lim_{n \to \infty} S_k = \dfrac{t_n}{t_p}$ ($= k$, if $t_n = k \cdot t_p$)
PIPELINE AND MULTIPLE FUNCTION UNITS
Example
4-stage pipeline
Sub-operation in each stage: t_p = 20 ns
100 tasks to be executed
1 task in the non-pipelined system: 20 * 4 = 80 ns
Pipelined system:
(k + n - 1) * t_p = (4 + 99) * 20 = 2060 ns
Non-pipelined system:
n * k * t_p = 100 * 80 = 8000 ns
Speedup:
S_k = 8000 / 2060 = 3.88
Multiple Functional Units
A 4-stage pipeline is basically identical to a system with 4 identical function units.
(Figure: four parallel units P1, P2, P3, P4 processing instructions I_i, I_{i+1}, I_{i+2}, I_{i+3}.)
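The speedup arithmetic in the example can be checked with a few lines of code. The function name `pipeline_speedup` is illustrative; when `t_n` is not given, the sketch assumes a non-pipelined task takes k sub-operation times, as in the example.

```python
def pipeline_speedup(n, k, t_p, t_n=None):
    """Return (non_pipelined_time, pipelined_time, speedup) for n tasks."""
    if t_n is None:
        t_n = k * t_p                 # assumption: one task = k sub-operations
    non_pipelined = n * t_n
    pipelined = (k + n - 1) * t_p
    return non_pipelined, pipelined, non_pipelined / pipelined


non_pipelined, pipelined, speedup = pipeline_speedup(n=100, k=4, t_p=20)
print(non_pipelined, pipelined, round(speedup, 2))
```

As n grows, the speedup approaches k = 4, the limit given by the speedup formula.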
Arithmetic Pipeline
It is normally used in high-speed computers.
It is used to implement floating-point operations and multiplication of fixed-point numbers.
The following example shows how an arithmetic pipeline is implemented for floating-point addition and subtraction.
$X = A \times 2^a$
$Y = B \times 2^b$
The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers.
A and B are two fractions that represent the mantissas, and a and b are the exponents.
Floating-point addition and subtraction can be performed in four segments.
The registers labeled R are placed between the segments to store intermediate results.
The sub-operations performed in the four segments are:
compare the exponents
align the mantissas
add or subtract the mantissas
normalize the result
ARITHMETIC PIPELINE
Floating-point adder
$X = A \times 2^a$, $Y = B \times 2^b$
[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result
(Figure: four-segment floating-point adder pipeline with registers R between segments. Segment 1: compare exponents a and b by subtraction, producing the difference. Segment 2: choose the exponent and align the mantissas A and B. Segment 3: add or subtract the mantissas. Segment 4: normalize the result and adjust the exponent.)
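The four sub-operations can be sketched sequentially in software. This is only an illustration of the data flow, assuming mantissas are Python floats with 0.5 <= |mantissa| < 1 and exponents are integers; the function name `fp_add` is hypothetical and no rounding or overflow handling is modeled.

```python
def fp_add(x, y):
    """Add two numbers given as (mantissa, exponent) pairs in base 2."""
    (a, exp_a), (b, exp_b) = x, y
    # Segment 1: compare the exponents by subtraction.
    diff = exp_a - exp_b
    exponent = max(exp_a, exp_b)
    # Segment 2: align the mantissa that has the smaller exponent.
    if diff > 0:
        b = b / (2 ** diff)
    elif diff < 0:
        a = a / (2 ** -diff)
    # Segment 3: add the mantissas (subtraction is adding a negative one).
    total = a + b
    # Segment 4: normalize the result and adjust the exponent.
    if total == 0:
        return 0.0, 0
    while abs(total) >= 1:
        total /= 2
        exponent += 1
    while abs(total) < 0.5:
        total *= 2
        exponent -= 1
    return total, exponent


print(fp_add((0.5, 3), (0.75, 1)))   # 4 + 1.5
```

In the pipeline, each of these four steps works on a different pair of operands during the same clock cycle.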
INSTRUCTION PIPELINE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline
Conventional
i      FI DA FO EX
i+1                FI DA FO EX
i+2                            FI DA FO EX
Pipelined
i      FI DA FO EX
i+1       FI DA FO EX
i+2          FI DA FO EX
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE
(Figure: flowchart. Segment 1: fetch instruction from memory. Segment 2: decode instruction and calculate effective address, then test "Branch?". Segment 3: fetch operand from memory. Segment 4: execute instruction, then test "Interrupt?"; on yes, perform interrupt handling. Update PC and empty the pipe when a branch or interrupt is taken.)
(Figure: timing of the pipeline over steps 1 to 13 for instructions 1 to 7, each passing through FI, DA, FO, EX. Instruction 3 is a branch; the instruction fetched after it is followed by empty pipe slots.)
MAJOR HAZARDS IN INSTRUCTION PIPELINE EXECUTION
1. Structural hazards (resource conflicts)
Two segments access the memory at the same time. These conflicts can be resolved by using separate instruction and data memories.
2. Data hazards (data dependency conflicts)
An instruction depends on the result of a previous instruction, which is not yet available.
3. Control hazards (branch difficulties)
Branches and other instructions that change the PC cause the fetch of the next instruction to be delayed.
OVERCOMING DATA HAZARDS
Data hazards can be dealt with by either hardware techniques or software techniques.
Hardware Technique
Interlock
The hardware detects the data dependencies and delays the scheduling of the dependent instruction by stalling for enough clock cycles.
Forwarding (bypassing, short-circuiting)
Accomplished by a data path that routes a value from a source (usually an ALU) to a user, bypassing a designated register. This allows the value to be used at an earlier stage in the pipeline than would otherwise be possible.
Software Technique
Instruction scheduling (compiler) for delayed load: the compiler for such computers is designed to detect a data conflict and reorder the instructions as necessary to delay the loading of the conflicting data.
OVERCOMING CONTROL HAZARDS
* Prefetch Target Instruction
* Branch Target Buffer
* Loop Buffer
* Branch Prediction
* Delayed Branch
CONTROL HAZARDS
Prefetch Target Instruction
Fetch instructions in both streams, branch not taken and branch taken.
Both are saved until the branch is executed. Then select the right instruction stream and discard the wrong stream.
Branch Target Buffer (BTB; associative memory)
Entry: address of a previously executed branch; the target instruction and the next few instructions.
When fetching an instruction, search the BTB.
If found, fetch the instruction stream in the BTB;
if not, a new stream is fetched and the BTB is updated.
Loop Buffer (high-speed register file)
Storage of an entire loop, which allows the loop to be executed without accessing memory.
Branch Prediction
Guess the branch condition and fetch an instruction stream based on the guess. A correct guess eliminates the branch penalty.
Delayed Branch
The compiler detects the branch and rearranges the instruction sequence by inserting useful instructions that keep the pipeline busy in the presence of a branch instruction.
RISC PIPELINE
RISC
Machine with a very fast clock cycle that executes at the rate of one instruction per cycle
Simple instruction set
Fixed-length instruction format
Register-to-register operations
Instruction Cycles of Three-Stage Instruction Pipeline
Data Manipulation Instructions
I: Instruction fetch
A: Decode, read registers, ALU operations
E: Write the output to the register
Load and Store Instructions
I: Instruction fetch
A: Decode, evaluate effective address
E: Register-to-memory or memory-to-register transfer
Program Control Instructions
I: Instruction fetch
A: Decode, evaluate branch address
E: Write the output to the register (PC)
DELAYED LOAD
Three-segment pipeline timing
LOAD:  R1 ← M[address 1]
LOAD:  R2 ← M[address 2]
ADD:   R3 ← R1 + R2
STORE: M[address 3] ← R3
Pipeline timing with data conflict
clock cycle   1  2  3  4  5  6
Load R1       I  A  E
Load R2          I  A  E
Add R1+R2           I  A  E
Store R3               I  A  E
Pipeline timing with delayed load
clock cycle   1  2  3  4  5  6  7
Load R1       I  A  E
Load R2          I  A  E
NOP                 I  A  E
Add R1+R2              I  A  E
Store R3                  I  A  E
The data dependency is taken care of
by the compiler rather
than the hardware.
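As a rough sketch of what such a compiler pass does, the following Python code inserts a NOP after any LOAD whose destination register is read by the very next instruction. The instruction tuples (opcode, destination, sources) and the function names are assumptions made for this illustration; a real compiler would first try to move a useful instruction into the slot.

```python
def insert_delayed_load_nops(program):
    """Insert a NOP after a LOAD whose destination is read by the next instruction.

    Each instruction is a tuple (opcode, destination, sources).
    """
    scheduled = []
    for index, instruction in enumerate(program):
        scheduled.append(instruction)
        opcode, destination, _ = instruction
        if opcode != "LOAD" or index + 1 >= len(program):
            continue
        next_sources = program[index + 1][2]
        if destination in next_sources:
            # The loaded value arrives in segment E, too late for the
            # next instruction's segment A, so delay it by one cycle.
            scheduled.append(("NOP", None, ()))
    return scheduled


def total_cycles(instruction_count, segments=3):
    """Clock cycles needed to drain a pipeline that starts one instruction per cycle."""
    return instruction_count + segments - 1


program = [
    ("LOAD", "R1", ("M[address 1]",)),
    ("LOAD", "R2", ("M[address 2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[address 3]", ("R3",)),
]
```

Applied to the four-instruction program, the pass yields the five-instruction sequence of the second timing table, and total_cycles gives 6 and 7 cycles for the two tables.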
DELAYED BRANCH
The compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps.
Using no-operation instructions
Clock cycles:   1  2  3  4  5  6  7  8  9  10
1. Load A       I  A  E
2. Increment       I  A  E
3. Add                I  A  E
4. Subtract              I  A  E
5. Branch to X              I  A  E
6. NOP                         I  A  E
7. NOP                            I  A  E
8. Instr. in X                       I  A  E
Rearranging the instructions
Clock cycles:   1  2  3  4  5  6  7  8
1. Load A       I  A  E
2. Increment       I  A  E
3. Branch to X        I  A  E
4. Add                   I  A  E
5. Subtract                 I  A  E
6. Instr. in X                 I  A  E
VECTOR PROCESSING
Vector Processor (computer)
Ability to process vectors, and related data structures such as matrices
and multidimensional arrays, much faster than conventional computers
Vector Processing Applications
Problems that can be efficiently formulated in terms of vectors
Long-range weather forecasting
Petroleum explorations
Seismic data analysis
Medical diagnosis
Aerodynamics and space flight simulations
Artificial intelligence and expert systems
Mapping the human genome
Image processing
Vector Processors may also be pipelined
VECTOR PROGRAMMING
   DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
Conventional computer
   Initialize I = 1
20 Read A(I)
   Read B(I)
   Store C(I) = A(I) + B(I)
   Increment I = I + 1
   If I ≤ 100 go to 20
Vector computer
C(1:100) = A(1:100) + B(1:100)
VECTOR INSTRUCTIONS
f1: V → V
f2: V → S
f3: V x V → V
f4: V x S → V
V: Vector operand
S: Scalar operand
Type  Mnemonic  Description (I = 1, ..., n)
f1    VSQR      Vector square root    B(I) ← SQR(A(I))
      VSIN      Vector sine           B(I) ← sin(A(I))
      VCOM      Vector complement     A(I) ← complement of A(I)
f2    VSUM      Vector summation      S ← Σ A(I)
      VMAX      Vector maximum        S ← max{A(I)}
f3    VADD      Vector add            C(I) ← A(I) + B(I)
      VMPY      Vector multiply       C(I) ← A(I) * B(I)
      VAND      Vector AND            C(I) ← A(I) . B(I)
      VLAR      Vector larger         C(I) ← max(A(I), B(I))
      VTGE      Vector test >         C(I) ← 0 if A(I) < B(I)
                                      C(I) ← 1 if A(I) > B(I)
f4    SADD      Vector-scalar add     B(I) ← S + A(I)
      SDIV      Vector-scalar divide  B(I) ← A(I) / S
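The four instruction types can be illustrated with plain Python functions on lists. This is a sketch of the semantics only: the lowercase function names mirror the mnemonics but are hypothetical, and Python executes each one as an ordinary loop, not on a vector pipeline.

```python
import math


def vsqr(a):
    """f1: V -> V. B(I) <- SQR(A(I))"""
    return [math.sqrt(x) for x in a]


def vsum(a):
    """f2: V -> S. S <- sum of A(I)"""
    return sum(a)


def vmax(a):
    """f2: V -> S. S <- max{A(I)}"""
    return max(a)


def vadd(a, b):
    """f3: V x V -> V. C(I) <- A(I) + B(I)"""
    return [x + y for x, y in zip(a, b)]


def vlar(a, b):
    """f3: V x V -> V. C(I) <- max(A(I), B(I))"""
    return [max(x, y) for x, y in zip(a, b)]


def sadd(s, a):
    """f4: V x S -> V. B(I) <- S + A(I)"""
    return [s + x for x in a]


# The single vector statement C(1:100) = A(1:100) + B(1:100):
A = list(range(1, 101))
B = list(range(101, 201))
C = vadd(A, B)
```

The last three lines correspond to the 100-element vector add from the vector programming slide, written as one call instead of an explicit loop.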
VECTOR INSTRUCTION FORMAT
Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length
Vector Instruction Format
[Figure: Source A and Source B feed a multiplier pipeline, followed by an adder pipeline]
Pipeline for Inner Product
MULTIPLE MEMORY MODULE AND INTERLEAVING
Multiple Module Memory
[Figure: four memory modules M0, M1, M2, M3, each with its own address register (AR), memory array and data register (DR), attached to a common address bus and a common data bus]
Address Interleaving
Different sets of addresses are assigned to different memory modules
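One common way to assign addresses to modules is low-order interleaving, sketched below. The slide does not say which scheme is meant, so the choice of low-order bits, the module count of four (taken from the figure) and the function name are assumptions.

```python
def interleave(address, modules=4):
    """Low-order interleaving: return (module number, word within that module).

    The low-order part of the address selects the module and the remaining
    part selects the word inside it (assumed scheme).
    """
    return address % modules, address // modules
```

With this mapping, consecutive addresses 0, 1, 2, 3, 4, ... fall in modules M0, M1, M2, M3, M0, ..., so a run of sequential accesses can be overlapped across all four modules.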
Superscalar Processors
It has a form of parallelism on a single chip,
allowing the system as a whole to run much faster
than it would otherwise.
It fetches, executes and returns results from more
than one instruction during a single pipeline
stage.
A scalar processor operates on one data item at a
time.
In a vector processor, a single instruction operates on
multiple data items.
A superscalar processor is a mixture of the two.
Supercomputers
A computer that has support for vector
instructions and pipelined floating-point
arithmetic operations.
These computers are very powerful and are
used in scientific computations.
Supercomputers are equipped with scalar and
vector processors and multiple functional units,
and the components are packed tightly to
minimize the distance the signals have to travel.
Array Processors
An array processor is a type of processor that performs computations on
large arrays of data.
There are two types of array processors, namely:
Attached array processor
SIMD array processor
Attached Array Processor:
An auxiliary processor attached to a general-purpose computer.
SIMD Array Processor:
A computer with multiple processing units operating in
parallel.
The general block diagram of the array processor is shown
below:
[Figure: block diagram with a master control unit, main memory, a data bus, and processing elements PE1, PE2, ..., PEn with local memories M1, M2, ..., Mn]
It contains a set of identical processing elements, each having a local memory M.
Each processing element includes an ALU, a floating-point arithmetic unit and
working registers.
The master control unit controls the operations in the processing elements.
The main memory is used for storage of the program.
Scalar and program control instructions are executed directly within the master
control unit.
Vector instructions are broadcast to all PEs simultaneously.
MULTIPROCESSORS
Characteristics of Multiprocessors
Interconnection Structures
Interprocessor Arbitration
Interprocessor Communication
and Synchronization
Cache Coherence
Shared Memory Multiprocessors
Characteristics of Multiprocessors
A multiprocessor system is an interconnection of two
or more CPUs with memory and input-output
equipment.
Multiprocessing improves the reliability of the system,
as a failure in one part has a limited effect on the rest of
the system.
The system derives its high performance from the fact
that computations can be done in parallel:
Multiple independent jobs can be made to operate in
parallel.
A single job can be partitioned into multiple parallel tasks.
Based on the memory organization, a multiprocessing system can
be classified as:
Shared Memory Multiprocessor or Tightly Coupled System
Distributed Memory or Loosely Coupled System
Tightly Coupled System
Tasks and/or processors communicate in a highly synchronized
fashion
Communicates through a common shared memory
It is also called a shared memory system
Loosely Coupled System
Tasks or processors do not communicate in a synchronized fashion
Communicates by message-passing packets
Each processor has its own private local memory.
It is also called a distributed memory system
MEMORY
Shared (Global) Memory
A global memory space accessible by all processors
Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
All memory units are associated with processors
To retrieve information from another processor's
memory, a message must be sent there
Uniform Memory
All processors take the same time to reach all memory locations
Nonuniform (NUMA) Memory
Memory access is not uniform
[Figure: SHARED MEMORY: processors and memory connected through a network. DISTRIBUTED MEMORY: combined processor/memory units connected through a network.]
The components that form a multiprocessor system are CPUs, IOPs connected to
input-output devices, and a memory unit that may be partitioned into a number of separate
modules.
There are several physical forms that the interconnection between these
components can take. Some of the forms are listed below:
INTERCONNECTION STRUCTURES
* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System
BUS
Bus
A collection of signal lines that carry module-to-module communication
Data highways connecting several digital system elements
All processors (and memory) are connected to a common bus or buses
Memory access is fairly uniform, but not very scalable
Operations of Bus
[Figure: devices M3, S7, M6, S5, M4 and S2 attached to a common bus]
M3 wishes to communicate with S5
[1] M3 sends signals (address) on the bus that cause S5 to respond
[2] M3 sends data to S5 or S5 sends data to M3 (determined by the command line)
Master Device: Device that initiates and controls the communication
Slave Device: Responding device
Multiple-master buses
→ Bus conflict
→ need bus arbitration
SYSTEM BUS STRUCTURE FOR MULTIPROCESSORS
[Figure: a common shared memory and several processor groups attached to the SYSTEM BUS. Each group has a local bus connecting a CPU, in some groups an IOP, and a local memory, and is attached to the system bus through a system bus controller.]
MULTIPORT MEMORY
Multiport Memory Module
Each port serves a CPU
Memory Module Control Logic
Each memory module has control logic
Resolves memory module conflicts: fixed priority among CPUs
Advantages
Multiple paths → high transfer rate
Disadvantages
Memory control logic
Large number of cables and connections
[Figure: memory modules MM1, MM2, MM3 and MM4, each connected to CPU1, CPU2, CPU3 and CPU4]
CROSSBAR SWITCH
[Figure: crossbar connecting CPU1, CPU2, CPU3 and CPU4 to memory modules MM1, MM2, MM3 and MM4]
Block Diagram of Crossbar Switch
[Figure: the data, address and control lines from CPU1, CPU2, CPU3 and CPU4 enter multiplexers and arbitration logic, which connect to the data, address, R/W and memory enable lines of a memory module]
MULTISTAGE SWITCHING NETWORK
Interstage Switch
[Figure: a 2x2 switch with inputs A and B and outputs 0 and 1, shown in its four settings: A connected to 0, A connected to 1, B connected to 0, B connected to 1]
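MULTISTAGE INTERCONNECTION NETWORK
Binary Tree with 2x2 Switches
[Figure: processors P1 and P2 connected through a tree of 2x2 switches, each with outputs 0 and 1, to destinations 000 through 111]
8x8 Omega Switching Network
[Figure: inputs 0 through 7 connected through 2x2 switches to outputs 000 through 111]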
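Routing through an omega network can be sketched as follows. The slide shows only the figure, so two things are assumed here: the lines are wired in a perfect-shuffle pattern before every stage, and each 2x2 switch picks output 0 or 1 from one bit of the destination address, most significant bit first. The function name omega_route is hypothetical.

```python
def omega_route(source, destination, bits=3):
    """Return the line number occupied after each stage of a 2**bits omega network."""
    mask = (1 << bits) - 1
    position = source
    path = []
    for stage in range(bits):
        # Perfect shuffle: rotate the line number left by one bit.
        position = ((position << 1) | (position >> (bits - 1))) & mask
        # The switch forwards to output 0 or 1 according to the
        # destination bit for this stage (most significant bit first).
        bit = (destination >> (bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path
```

After the last stage the message is on the line whose number equals the destination, whatever the source was, because each stage shifts one more destination bit into the line number.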
HYPERCUBE INTERCONNECTION
n-dimensional hypercube (binary n-cube)
p = 2^n
processors are conceptually on the corners of an
n-dimensional hypercube, and each is directly
connected to the n neighboring nodes
Degree = n
[Figure: One-cube (nodes 0, 1), Two-cube (nodes 00, 01, 10, 11), Three-cube (nodes 000 through 111)]
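Because neighboring nodes differ in exactly one address bit, neighbors and routes can be computed with exclusive-OR. The sketch below assumes one simple routing rule (correct the differing bits from least to most significant); the slide does not prescribe a routing order, and the function names are hypothetical.

```python
def neighbors(node, n):
    """The n nodes directly connected to `node` in a binary n-cube."""
    return [node ^ (1 << bit) for bit in range(n)]


def route(source, destination, n):
    """Path from source to destination, fixing one differing bit per hop."""
    path = [source]
    current = source
    for bit in range(n):
        if (current ^ destination) & (1 << bit):
            current ^= 1 << bit  # move along the dimension of this bit
            path.append(current)
    return path
```

For the three-cube, node 010 has neighbors 011, 000 and 110, and a message from 000 to 111 takes three hops, one per differing bit.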
INTERPROCESSOR ARBITRATION
Bus
Board-level bus
Backplane-level bus
Interface-level bus
System Bus: a backplane-level bus
Printed circuit board
Connects CPU, IOP, and Memory
Each of the CPU, IOP, and Memory boards can be
plugged into a slot in the backplane (system bus)
Bus signals are grouped into 3 groups:
Data, Address, and Control (plus power)
Only one of CPU, IOP, and Memory can be
granted use of the bus at a time
An arbitration mechanism is needed to handle
multiple requests
e.g. IEEE standard 796 bus
86 lines
Data: 16 (multiple of 8)
Address: 24
Control: 26
Power: 20
SYNCHRONOUS & ASYNCHRONOUS DATA TRANSFER
Synchronous Bus
Each data item is transferred over a time slice known to both the source and
destination units
Common clock source
Or a separate clock and synchronization signal is transmitted periodically to
synchronize the clocks in the system
Asynchronous Bus
* Each data item is transferred by a handshake mechanism
The unit that transmits the data transmits a control signal that indicates the
presence of data
The unit that receives the data responds with another control signal to
acknowledge the receipt of the data
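INTERPROCESSOR ARBITRATION: STATIC ARBITRATION
Serial Arbitration Procedure
[Figure: bus arbiters 1 to 4 connected in a daisy chain, the PO output of each arbiter feeding the PI input of the next. The PI input of arbiter 1 is held at 1, giving it the highest priority; the PO output of arbiter 4 goes to the next arbiter.]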
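The serial (daisy-chain) procedure can be modeled by passing a priority signal down the chain. In this sketch the variable names priority_in and priority_out stand for the PI and PO lines, arbiter indices start at 0, and the function name is hypothetical.

```python
def daisy_chain_grant(requests):
    """Return the index of the arbiter granted the bus, or None if nobody asks.

    requests[i] is True when arbiter i wants the bus. Arbiter 0 is first in
    the chain and its PI input is tied to 1.
    """
    priority_in = 1
    granted = None
    for index, wants_bus in enumerate(requests):
        if priority_in and wants_bus:
            granted = index
            priority_out = 0  # block every arbiter further down the chain
        else:
            priority_out = priority_in  # pass the priority along unchanged
        priority_in = priority_out
    return granted
```

The arbiter closest to the start of the chain that is requesting the bus always wins, which is what makes this a fixed-priority (static) scheme.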
Parallel Arbitration Procedure
[Figure: bus arbiters 1 to 4, each with Req and Ack lines and a shared bus busy line. The request lines feed a 4x2 priority encoder, and a 2x4 decoder drives the acknowledge lines.]
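The encoder and decoder pair in the parallel procedure can be sketched as two small functions. It is assumed here that arbiter 1 (index 0) has the highest priority; the function names are hypothetical.

```python
def priority_encode(requests):
    """4x2 priority encoder: index of the highest-priority active request, or None."""
    for index, active in enumerate(requests):
        if active:
            return index
    return None


def decode(index, width=4):
    """2x4 decoder: one acknowledge line is 1, all others are 0."""
    return [1 if line == index else 0 for line in range(width)]


def parallel_arbitrate(requests):
    """Return the acknowledge lines for a set of simultaneous requests."""
    winner = priority_encode(requests)
    if winner is None:
        return [0] * len(requests)
    return decode(winner, len(requests))
```

For example, when arbiters 2 and 4 request the bus together, only the acknowledge line of arbiter 2 is raised.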
INTERPROCESSOR ARBITRATION: DYNAMIC ARBITRATION
Priorities of the units can be changed dynamically
while the system is in operation
Time Slice
A fixed-length time slice is given sequentially to
each processor, in round-robin fashion
Polling
Unit address polling: the bus controller advances
the address to identify the requesting unit
LRU: The processor that has not used the bus for the longest period of time (the least
recently used one) is given the highest priority.
FIFO: The processor that makes its request first is the one that gets serviced first
Rotating Daisy Chain
Conventional Daisy Chain: Highest priority to the unit nearest to the bus controller
Rotating Daisy Chain: Highest priority to the unit that is nearest to the unit that has
most recently accessed the bus (it
becomes the bus controller)
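The rotating daisy chain can be sketched by starting the search just after the unit that used the bus last. The circular ordering of the units and the function name are assumptions made for this illustration.

```python
def rotating_grant(requests, last_granted):
    """Grant the bus to the first requesting unit after the one that used it last.

    The unit that most recently accessed the bus acts as the bus controller,
    so the unit right after it has the highest priority and it has the lowest.
    """
    count = len(requests)
    for offset in range(1, count + 1):
        candidate = (last_granted + offset) % count
        if requests[candidate]:
            return candidate
    return None
```

Unlike the conventional daisy chain, the same unit cannot monopolize the bus: after each grant the priority order rotates so that the winner drops to the end.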
INTERPROCESSOR COMMUNICATION
Interprocessor Communication: Shared Memory
[Figure: the sending processor places a message, marked with its receiver(s), in a communication area in shared memory, from which the receiving processors read it]
[Figure: the same communication area in shared memory, with an interrupt from the sending processor to the receiving processor; labels: Receiver(s), Mark, Message, Instruction]
INTERPROCESSOR SYNCHRONIZATION
Synchronization
Communication of control information between processors
To enforce the correct sequence of processes
To ensure mutually exclusive access to shared writable data
Hardware Implementation
Mutual Exclusion with a Semaphore
Mutual Exclusion
Allows one processor to exclude or lock out access to a shared resource by
other processors when it is in a Critical Section
A Critical Section is a program sequence that,
once begun, must complete execution before
another processor accesses the same shared resource
Semaphore
A binary variable
1: A processor is executing a critical section,
which is not available to other processors
0: Available to any requesting processor
A software-controlled flag that is stored in
memory that all processors can access
SEMAPHORE
Testing and Setting the Semaphore
Avoid having two or more processors test or set the same semaphore at the same time
This may cause two or more processors to enter the
same critical section at the same time
Must be implemented with an indivisible operation
R ← M[SEM]    / Test semaphore /
M[SEM] ← 1    / Set semaphore /
These are done while locked, so that other processors cannot test
and set while the current processor is executing these instructions
If R = 1, another processor is executing the
critical section; the processor that executed
this instruction does not access the
shared memory
If R = 0, the semaphore was available; it is now set to 1 and the processor may access the shared memory
The last instruction in the program must clear the semaphore
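The test-and-set sequence can be imitated in Python with threads standing in for processors. This is a sketch only: a threading.Lock plays the role of the hardware lock that makes the two steps indivisible, and the names BinarySemaphore and run are hypothetical.

```python
import threading
import time


class BinarySemaphore:
    def __init__(self):
        self._hardware_lock = threading.Lock()  # stands in for the bus lock
        self.value = 0                          # M[SEM]

    def test_and_set(self):
        """Indivisibly perform R <- M[SEM]; M[SEM] <- 1 and return R."""
        with self._hardware_lock:
            r = self.value
            self.value = 1
        return r

    def clear(self):
        """Last instruction of the critical section: M[SEM] <- 0."""
        self.value = 0


def run(thread_count=4, iterations=500):
    """Let several threads increment a shared counter under the semaphore."""
    sem = BinarySemaphore()
    shared = {"counter": 0}

    def worker():
        for _ in range(iterations):
            while sem.test_and_set() == 1:
                time.sleep(0)         # R = 1: another thread is inside, retry
            shared["counter"] += 1    # critical section
            sem.clear()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["counter"]
```

Every increment happens inside the critical section, so run(4, 500) returns 2000; without the indivisible test-and-set, two threads could both read R = 0 and enter together.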
CACHE COHERENCE
Caches are Coherent
[Figure: main memory holds X = 52, and the caches of processors P1, P2 and P3 on the bus each hold X = 52]
Cache Incoherency in
Write-Through Policy
[Figure: main memory and the cache of P1 hold X = 120, while the caches of P2 and P3 still hold X = 52]
Cache Incoherency in Write-Back Policy
[Figure: only the cache of P1 holds X = 120; main memory and the caches of P2 and P3 still hold X = 52]
MAINTAINING CACHE COHERENCY
Shared Cache
Disallow private caches
Access time delay
Software Approaches
* Read-Only Data are Cacheable
Private cache is for read-only data
Shared writable data are not cacheable
The compiler tags data as cacheable or noncacheable
Cacheable (nonshared and readable) and noncacheable (shared and writable)
All cacheable data are stored in private caches.
All noncacheable data are stored in main memory.
Degrades performance due to software overhead
* Centralized Global Table
The status of each memory block is maintained in the CGT: RO (Read-Only); RW (Read and Write)
All caches can have copies of RO blocks
Only one cache can have a copy of an RW block
Hardware Approaches
* Snoopy Cache Controller
Cache controllers monitor all the bus requests from CPUs and IOPs
All caches attached to the bus monitor the write operations
When a word in a cache is written, memory is also updated (write-through)
Local snoopy controllers in all other caches check their memory to determine if they have
a copy of that word; if they have, that location is marked invalid (a future reference to
this location causes a cache miss)
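The snoopy write-through scheme can be modeled with a few small classes. This is a simplified sketch: caches are unbounded dictionaries keyed by address, invalidation simply drops the entry, and the class and method names (Bus, SnoopyCache, snoop_write) are hypothetical.

```python
class Bus:
    """Shared bus with main memory and the caches that snoop on it."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, writer, address, value):
        self.memory[address] = value       # write-through to main memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address)  # every other cache sees the write


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}   # address -> value, valid entries only
        self.misses = 0
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:      # miss: fetch from main memory
            self.misses += 1
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.write(self, address, value)

    def snoop_write(self, address):
        """Mark our copy invalid if we hold the written word."""
        self.lines.pop(address, None)
```

With X = 52 cached by three processors, a write of X = 120 by one of them updates memory and invalidates the other two copies, so their next read misses and fetches 120.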
SHARED MEMORY MULTIPROCESSORS
[Figure: memory modules M ... M and processors P ... P connected through an Interconnection Network (IN): buses, multistage IN, or crossbar switch]
Characteristics
All processors have equally direct access to one large memory address space
Example systems
Bus and cache-based systems
Sequent Balance, Encore Multimax
Multistage IN-based systems
Ultracomputer, Butterfly, RP3, HEP
Crossbar switch-based systems
C.mmp, Alliant FX/8
Limitations
Memory access latency
Hot spot problem
