
INSTRUCTIONS AND INSTRUCTION SEQUENCING

Data transfers between the memory and the processor registers


Arithmetic and logic operations on data
Program sequencing and control
I/O transfers
Register Transfer Notation (RTN)
Identify a location by a symbolic name standing for its hardware binary address (LOC, R0, ...).
Contents of a location are denoted by placing square brackets around the name of the
location (R1 ← [LOC], R3 ← [R1] + [R2]).
Assembly Language Notation
Represents machine instructions and programs.
Move LOC, R1 means R1 ← [LOC]
Add R1, R2, R3 means R3 ← [R1] + [R2]
Basic instruction types (4 types)
Three-address instructions: Add A, B, C
A, B: source operands
C: destination operand
Two-address instructions: Add A, B
B ← [A] + [B]
One-address instructions: Add A
Adds the contents of A to the accumulator and stores the sum back in the accumulator.
Zero-address instructions
Operands are held in a structure called a pushdown stack.
Instruction Formats
Three-Address Instructions
o ADD R1, R2, R3 ; R1 ← R2 + R3
Two-Address Instructions
o ADD R1, R2 ; R1 ← R1 + R2
One-Address Instructions
o ADD M ; AC ← AC + M[AR]
Zero-Address Instructions
o ADD ; TOS ← TOS + (TOS - 1)
RISC Instructions
o Lots of registers. Memory access is restricted to Load and Store instructions.

Example: Evaluate X = (A + B) * (C + D)

Three-Address
1. ADD R1, A, B ; R1 ← M[A] + M[B]
2. ADD R2, C, D ; R2 ← M[C] + M[D]
3. MUL X, R1, R2 ; M[X] ← R1 * R2

Two-Address
1. MOV R1, A ; R1 ← M[A]
2. ADD R1, B ; R1 ← R1 + M[B]
3. MOV R2, C ; R2 ← M[C]
4. ADD R2, D ; R2 ← R2 + M[D]
5. MUL R1, R2 ; R1 ← R1 * R2
6. MOV X, R1 ; M[X] ← R1

One-Address
1. LOAD A ; AC ← M[A]
2. ADD B ; AC ← AC + M[B]
3. STORE T ; M[T] ← AC
4. LOAD C ; AC ← M[C]
5. ADD D ; AC ← AC + M[D]
6. MUL T ; AC ← AC * M[T]
7. STORE X ; M[X] ← AC

Zero-Address
1. PUSH A ; TOS ← A
2. PUSH B ; TOS ← B
3. ADD ; TOS ← (A + B)
4. PUSH C ; TOS ← C
5. PUSH D ; TOS ← D
6. ADD ; TOS ← (C + D)
7. MUL ; TOS ← (C + D) * (A + B)
8. POP X ; M[X] ← TOS
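
The zero-address version can be checked with a small stack-machine sketch in Python. This is only an illustration: the function name run_stack_program and the sample values for A, B, C and D are assumptions, and only the four instructions used in the example are modelled.

```python
def run_stack_program(program, memory):
    """Execute a zero-address program on a hypothetical stack machine.

    `memory` maps symbolic names (A, B, ...) to values; it is updated in place.
    """
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "PUSH":            # TOS <- M[name]
            stack.append(memory[args[0]])
        elif op == "POP":           # M[name] <- TOS
            memory[args[0]] = stack.pop()
        elif op == "ADD":           # replace the top two items by their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":           # replace the top two items by their product
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
memory = run_stack_program(program, {"A": 2, "B": 3, "C": 4, "D": 5})
print(memory["X"])  # (2 + 3) * (4 + 5) = 45
```

Note that no instruction except PUSH and POP names an operand; ADD and MUL always work on the top of the stack.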
Instruction Execution and Straight-Line Sequencing

The processor control circuits use the information in the PC to fetch and execute instructions,
one at a time, in order of increasing address.
This is called straight-line sequencing.
Executing an instruction is a two-phase procedure.

1st phase, instruction fetch: the instruction is fetched from the memory location whose address
is in the PC. This instruction is placed in the instruction register (IR) in the processor.
2nd phase, instruction execute: the instruction in the IR is examined to determine which operation
is to be performed.

Branching
A branch instruction loads a new value into the program counter.
As a result, the processor fetches and executes the instruction at this new address, called the
branch target.
A conditional branch causes a branch only if a specified condition is satisfied.
E.g. Branch>0 LOOP is a conditional branch instruction; the branch to LOOP is taken only if the
condition (result greater than zero) is satisfied.
A straight-line program for adding n numbers

Using a loop to add n numbers
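
A minimal Python sketch of the loop idea, assuming a counter register R1 that starts at n, a sum register R0, and a list holding at least one number. The function name add_n_numbers is hypothetical, and the branch is modelled by the test at the bottom of the loop.

```python
def add_n_numbers(numbers):
    """Add n numbers with a counted loop, in the style of Branch>0 LOOP.

    Assumes len(numbers) >= 1, since the loop body runs before the test.
    """
    r1 = len(numbers)   # R1 <- n (loop counter)
    r0 = 0              # R0 <- 0 (running sum)
    index = 0           # position of the next number in the list
    while True:         # LOOP:
        r0 += numbers[index]   # add the next number to R0
        index += 1
        r1 -= 1                # Decrement R1
        if not r1 > 0:         # Branch>0 LOOP: repeat while [R1] > 0
            break
    return r0                  # Move R0, SUM


print(add_n_numbers([1, 2, 3, 4]))  # 10
```

The straight-line version would need one Add instruction per number; the loop keeps the program size fixed regardless of n.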

Condition codes

The processor records the required information about the results of operations in individual
bits called condition code flags.
These flags are grouped together in a special processor register called the condition code
register or status register.
Each condition code flag is either set to 1 or cleared to 0.
Different instructions affect different flags.

Four commonly used flags are:

N (negative): Set to 1 if the result is negative; otherwise, cleared to 0
Z (zero): Set to 1 if the result is 0; otherwise, cleared to 0
V (overflow): Set to 1 if arithmetic overflow occurs; otherwise, cleared to 0
C (carry): Set to 1 if a carry-out results from the operation; otherwise, cleared to 0
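
The four flag definitions can be made concrete with a short Python sketch for an addition on fixed-width two's-complement operands. The function name add_with_flags and the default width of 8 bits are assumptions made for illustration.

```python
def add_with_flags(a, b, bits=8):
    """Add two `bits`-wide values and return (result, flags).

    Operands are treated as bit patterns; N and V interpret them as
    two's-complement numbers, C as unsigned numbers.
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask

    n = int(bool(result & sign))   # N: sign bit of the result
    z = int(result == 0)           # Z: result is all zeros
    c = int(total > mask)          # C: carry out of the most significant bit
    # V: both operands have the same sign but the result's sign differs
    v = int(bool(~(a ^ b) & (a ^ result) & sign))
    return result, {"N": n, "Z": z, "V": v, "C": c}


print(add_with_flags(0x7F, 0x01))  # 127 + 1 overflows: N=1, V=1
print(add_with_flags(0xFF, 0x01))  # -1 + 1 = 0: Z=1, C=1
```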

INSTRUCTION SET ARCHITECTURE

Superscalar processor --can execute more than one instruction per cycle.
Cycle--smallest unit of time in a processor.

Parallelism--the ability to do more than one thing at once.

Pipelining--overlapping parts of a large task to increase throughput without decreasing latency.

Instruction Set Architecture (ISA)

The Instruction Set Architecture (ISA) is the part of the processor that is visible to the
programmer or compiler writer. The ISA serves as the boundary between software and hardware.
We will briefly describe the instruction sets found in many of the microprocessors used today.
The ISA of a processor can be described using 5 categories:
The 3 most common types of ISAs are:
1. Stack - The operands are implicitly on top of the stack.
2. Accumulator - One operand is implicitly the accumulator.
3. General Purpose Register (GPR) - All operands are explicitly mentioned; they are either
registers or memory locations.

Let's look at the assembly code for

C = A + B;

in all 3 architectures:

Stack
PUSH A
PUSH B
ADD
POP C

Accumulator
LOAD A
ADD B
STORE C

GPR
LOAD R1,A
ADD R1,B
STORE R1,C

Stack
Advantages: Simple model of expression evaluation (reverse Polish). Short instructions.
Disadvantages: A stack can't be randomly accessed, which makes it hard to generate efficient code.
The stack itself is accessed on every operation and becomes a bottleneck.
Accumulator
Advantages: Short instructions.
Disadvantages: The accumulator is only temporary storage, so memory traffic is the highest for
this approach.
GPR
Advantages: Makes code generation easy. Data can be stored for long periods in registers.
Disadvantages: All operands must be named, leading to longer instructions.
Earlier CPUs were of the first 2 types, but in the last 15 years all CPUs made are GPR processors.
There are 2 major reasons. First, registers are faster than memory: the more data that can be kept
internally in the CPU, the faster the program will run. Second, registers are easier
for a compiler to use.
ADDRESSING MODES
The different ways in which the location of an operand is specified in an instruction are referred
to as addressing modes.
TYPES OF ADDRESSING MODES
A variable is represented by allocating a register or memory location to hold its value.
1. REGISTER MODE
The operand is the contents of a processor register; the name of the register is given in the
instruction.
E.g. Move LOC, R2 (R2 is specified in register mode).
Processor registers are used as temporary storage locations; data in a register are
accessed using register mode.
2. ABSOLUTE MODE (OR) DIRECT MODE
The operand is in a memory location; the address of this location is given explicitly in
the instruction.
E.g. for variables declared as Integer A, B, absolute mode is used to access A and B.
3. IMMEDIATE MODE
Address and data constants are represented in assembly language using immediate mode.
The operand is given explicitly in the instruction.
E.g. Move #200, R0
The # sign indicates that the value is an immediate operand.
Immediate mode is mainly used to specify the value of a source operand.
4. INDIRECT MODE
The instruction provides information from which the memory address of the operand can be
determined.
This address is called the effective address (EA) of the operand.
In indirect mode, the EA of the operand is the contents of a register named in the instruction.
When absolute mode is not available, indirect addressing through registers is used to access
global variables.
5. INDEX MODE
Deals with lists and arrays.
The EA is generated by adding a constant value to the contents of a register.
The index register is one of the general-purpose registers in the processor.
E.g. X(Ri)
X: constant value given in the instruction.
Ri: name of the register involved.
EA = X + [Ri]
If a second register is used, the index mode is written (Ri, Rj).
The EA is the sum of the contents of registers Ri and Rj.
The second register is called the base register. E.g. X(Ri, Rj)
EA = X + [Ri] + [Rj]
This gives more flexibility.

6. RELATIVE MODE
The EA is determined as in index mode, but using the program counter in place of register Ri.
This mode can be used to access data operands.
Its common use is to specify the target address in a branch instruction.
E.g. Branch>0 LOOP
Program execution goes to the branch target location identified by the name LOOP if the branch
condition is satisfied.
7. AUTOINCREMENT MODE
Useful for accessing data items in successive locations in memory.
The EA of the operand is the contents of a register specified in the instruction.
After the operand is accessed, the contents of the register are automatically incremented to
point to the next item in a list. E.g. (Ri)+
The increment amount is 1 for byte-sized operands,
2 for 16-bit operands, and
4 for 32-bit operands.
8. AUTODECREMENT MODE
The contents of a register specified in the instruction are first automatically decremented and
then used as the EA of the operand.
E.g. -(Ri)
The minus sign indicates that the contents are decremented before being used as the EA.
Operands are accessed in descending address order.
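
The effective-address rules above can be summarised in a short Python sketch. The class name Machine, its method names, and the default operand size of 4 bytes are assumptions for illustration; only the EA calculation is modelled, not the memory access itself.

```python
class Machine:
    """Hypothetical register file used to compute effective addresses (EA)."""

    def __init__(self, registers):
        self.reg = dict(registers)

    def ea_indirect(self, ri):
        # Indirect mode (Ri): EA = [Ri]
        return self.reg[ri]

    def ea_index(self, x, ri, rj=None):
        # Index mode X(Ri): EA = X + [Ri]
        # With a base register, X(Ri, Rj): EA = X + [Ri] + [Rj]
        ea = x + self.reg[ri]
        if rj is not None:
            ea += self.reg[rj]
        return ea

    def ea_autoincrement(self, ri, size=4):
        # (Ri)+ : use [Ri] as the EA, then increment Ri by the operand size
        ea = self.reg[ri]
        self.reg[ri] += size
        return ea

    def ea_autodecrement(self, ri, size=4):
        # -(Ri) : decrement Ri by the operand size first, then use it as the EA
        self.reg[ri] -= size
        return self.reg[ri]


m = Machine({"R1": 1000, "R2": 8})
print(m.ea_index(20, "R1"))          # 1020
print(m.ea_index(20, "R1", "R2"))    # 1028
print(m.ea_autoincrement("R1"))      # 1000, and R1 becomes 1004
print(m.ea_autodecrement("R1"))      # 1000, and R1 is back to 1000
```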

ALU DESIGN
Instructions that involve an arithmetic or logic operation can be executed using similar steps.
They differ from the Load instruction in two ways:
There are either two source registers, or a source register and an immediate source operand.
No access to memory operands is required.
A typical instruction of this type is
Add R3, R4, R5
It requires the following steps:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read the contents of source registers R4 and R5.
3. Compute the sum [R4] + [R5].
4. Load the result into the destination register, R3.
The Add instruction does not require access to an operand in the memory, and therefore could be
completed in four steps instead of the five steps needed for the Load instruction.
However, as we will see in the next chapter, it is advantageous to use the same multi-stage
processing hardware for as many instructions as possible. This can be achieved if we arrange for
all instructions to be executed in the same number of steps. To this end, the Add instruction
should be extended to five steps, patterned along the steps of the Load instruction. Since no
access to memory operands is required, we can insert a step in which no action takes place
between steps 3 and 4 above. The Add instruction would then be performed as follows:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read registers R4 and R5.
3. Compute the sum [R4] + [R5].
4. No action.
5. Load the result into the destination register, R3.
If the instruction uses an immediate operand, as in
Add R3, R4, #1000
the immediate value is given in the instruction word. Once the instruction is
loaded into the IR, the immediate value is available for use in the addition operation. The same
five-step sequence can be used, with steps 2 and 3 modified as:
2. Decode the instruction and read register R4.
3. Compute the sum [R4] + 1000.
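
The five-step sequence can be traced with a small Python sketch. The helper execute_add and its trace strings are hypothetical; step 1 (fetch and PC increment) is only recorded, not simulated, and an operand beginning with # is assumed to be an immediate value.

```python
def execute_add(regs, dest, src, operand):
    """Five-step execution of Add dest, src, operand (hypothetical helper).

    `operand` is a register name such as "R5" or an immediate such as "#1000".
    Returns the list of steps performed; `regs` is updated in place.
    """
    trace = ["1. fetch instruction, increment PC"]  # assumed already done

    a = regs[src]
    if operand.startswith("#"):
        b = int(operand[1:])        # immediate value comes from the IR
        trace.append(f"2. decode, read {src}")
    else:
        b = regs[operand]
        trace.append(f"2. decode, read {src} and {operand}")

    result = a + b
    trace.append("3. compute sum")

    trace.append("4. no action")    # keeps Add at the same length as Load

    regs[dest] = result
    trace.append(f"5. load result into {dest}")
    return trace


regs = {"R3": 0, "R4": 10, "R5": 32}
for step in execute_add(regs, "R3", "R4", "R5"):
    print(step)
print(regs["R3"])  # 42
```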

Addition logic for a single stage

n-bit adder

Cascade n full adder (FA) blocks to form an n-bit adder.
Carries propagate, or ripple, through this cascade; hence the name n-bit ripple-carry adder.
The carry-in c0 into the LSB position provides a convenient way to perform subtraction.

n-bit subtractor

Recall that X - Y is equivalent to adding the 2's complement of Y to X.
The 2's complement is equivalent to the 1's complement + 1.
X - Y = X + Y' + 1, where Y' is the 1's complement (bitwise complement) of Y.
The 2's complement of positive and negative numbers is computed similarly.

n-bit adder/subtractor

Add/Sub control = 0: addition.
Add/Sub control = 1: subtraction.
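
A bit-level Python sketch of the adder/subtractor, assuming the usual arrangement in which the Add/Sub control line both complements each bit of Y (through an XOR gate) and supplies the carry-in c0. The function names and the default width of 4 bits are assumptions for illustration.

```python
def full_adder(x, y, c_in):
    """One full adder (FA) stage: returns (sum bit, carry-out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)
    return s, c_out


def add_sub(x, y, sub, bits=4):
    """n-bit ripple-carry adder/subtractor.

    sub = 0 computes X + Y; sub = 1 computes X - Y as X + Y' + 1.
    Returns (result as an n-bit pattern, final carry-out).
    """
    carry = sub                      # c0 is driven by the Add/Sub control
    result = 0
    for i in range(bits):            # the carry ripples from LSB to MSB
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ sub    # XOR complements Y when subtracting
        s, carry = full_adder(xi, yi, carry)
        result |= s << i
    return result, carry


print(add_sub(5, 3, 0))  # (8, 0)
print(add_sub(5, 3, 1))  # (2, 1)
print(add_sub(3, 5, 1))  # (14, 0); 1110 is -2 in 4-bit two's complement
```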

Sequential multiplication

Recall the rule for generating partial products:

If the ith bit of the multiplier is 1, add the appropriately shifted multiplicand to the
current partial product.
The multiplicand is shifted left before it is added to the partial product.
However, adding a left-shifted multiplicand to an unshifted partial product is equivalent
to adding an unshifted multiplicand to a right-shifted partial product.
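
The right-shift formulation can be sketched in Python for unsigned operands. The register names C, A, Q and M follow a common textbook arrangement and are an assumption here, as are the function name and the default width of 4 bits.

```python
def sequential_multiply(multiplicand, multiplier, bits=4):
    """Unsigned shift-and-add multiplication with a right-shifted partial product.

    M holds the multiplicand, Q the multiplier, A the upper half of the
    partial product, and C the carry out of each addition.
    """
    mask = (1 << bits) - 1
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    c = 0
    for _ in range(bits):
        if q & 1:                     # current multiplier bit is 1
            total = a + m             # add the unshifted multiplicand
            c = total >> bits
            a = total & mask
        # Shift C, A, Q right one position as a single long register
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (c << (bits - 1))
        c = 0
    return (a << bits) | q            # 2n-bit product held in A and Q


print(sequential_multiply(13, 11))  # 143
```

After n cycles the multiplier bits have been shifted out of Q and replaced by the low half of the product, so no separate product register is needed.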

Circuit arrangement for binary division

BASIC PROCESSING UNIT

INTRODUCTION

Instruction Set Processor (ISP)

Central Processing Unit (CPU)

A typical computing task consists of a series of steps specified by a sequence of
machine instructions that constitute a program.

An instruction is executed by carrying out a sequence of more rudimentary operations.

FUNDAMENTAL CONCEPTS

The processor fetches one instruction at a time and performs the operation specified.

Instructions are fetched from successive memory locations until a branch
or a jump instruction is encountered.

The processor keeps track of the address of the memory location containing
the next instruction to be fetched using the Program Counter (PC).

Instruction Register (IR)

EXECUTION OF A COMPLETE INSTRUCTION


Execution of one instruction requires the following three steps to be
performed by the CPU:
1. Fetch the contents of the memory location pointed to by the PC. The
contents of this location are interpreted as an instruction to be
executed. Hence, they are stored in the instruction register (IR).
Symbolically, this can be written as:
IR ← [[PC]]
2. Assuming that the memory is byte addressable and that each instruction
occupies one 4-byte word, increment the contents of the PC by 4, that is
PC ← [PC] + 4
3. Carry out the actions specified by the instruction stored in the IR.

In cases where an instruction occupies more than one word, steps
1 and 2 must be repeated as many times as necessary to fetch the
complete instruction.
The first two steps are usually referred to as the fetch phase.
Step 3 constitutes the execution phase.
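
The three steps can be sketched as a fetch-execute loop in Python. This is only an illustration: instructions are modelled as tuples stored at byte addresses that are multiples of 4, and the Halt instruction, the tuple encoding, and the function name run are assumptions. Branch adds its offset to the already updated PC.

```python
def run(memory, regs, pc=0):
    """Fetch and execute instructions until a hypothetical Halt is reached.

    `memory` maps byte addresses to instruction tuples; returns the final PC.
    """
    while True:
        ir = memory[pc]       # step 1: IR <- [[PC]]
        pc += 4               # step 2: PC <- [PC] + 4
        op, *operands = ir    # step 3: carry out the instruction in the IR
        if op == "Halt":
            return pc
        if op == "Add":       # Add dest, src1, src2
            dest, src1, src2 = operands
            regs[dest] = regs[src1] + regs[src2]
        elif op == "Branch":  # offset is relative to the updated PC
            pc += operands[0]
        else:
            raise ValueError(f"unknown instruction: {ir}")


memory = {
    0: ("Add", "R3", "R4", "R5"),
    4: ("Branch", 4),               # skips the instruction at address 8
    8: ("Add", "R3", "R3", "R3"),
    12: ("Halt",),
}
regs = {"R3": 0, "R4": 2, "R5": 3}
final_pc = run(memory, regs)
print(regs["R3"], final_pc)  # 5 16
```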

EXECUTING AN INSTRUCTION

An instruction is executed by performing one or more of the following operations in some
specified sequence:

Transfer a word of data from one processor register to another or to the ALU.

Perform an arithmetic or a logic operation and store the result in a processor register.

Fetch the contents of a given memory location and load them into a processor register.

Store a word of data from a processor register into a given memory location.

REGISTER TRANSFER
The input and output gates for register Ri are controlled by the signals Riin
and Riout, respectively.

Thus, when Riin is set to 1, the data available on the common bus are
loaded into Ri.

Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.

While Riout is equal to 0, the bus can be used for transferring data
from other registers.

Let us now consider data transfer between two registers. For example, to
transfer the contents of register R1 to R4, the following actions are needed:

Enable the output gate of register R1 by setting R1out to 1. This places
the contents of R1 on the CPU bus.

Enable the input gate of register R4 by setting R4in to 1. This loads data
from the CPU bus into register R4.

Performing an Arithmetic Or Logic Operation

The ALU is a combinational circuit that has no internal storage.

The ALU gets its two operands from the MUX and from the bus. The result is temporarily
stored in register Z.

A sequence of operations to add the contents of register R1 to those of
register R2 and store the result in register R3 is:

1. R1out, Yin
2. R2out, Select Y, Add, Zin
3. Zout, R3in
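
The three control steps can be followed in a small Python sketch, assuming a single shared bus that carries one value per step and the temporary registers Y and Z. The function name is hypothetical, and each line is annotated with the signals it stands for.

```python
def add_r1_r2_into_r3(regs):
    """Model R3 <- [R1] + [R2] on a single-bus datapath, one bus transfer per step."""
    # Step 1: R1out, Yin
    bus = regs["R1"]      # R1 drives the bus
    y = bus               # Y latches the first operand

    # Step 2: R2out, Select Y, Add, Zin
    bus = regs["R2"]      # R2 drives the bus (second ALU input)
    z = y + bus           # MUX selects Y; ALU adds; Z latches the sum

    # Step 3: Zout, R3in
    bus = z               # Z drives the bus
    regs["R3"] = bus      # R3 latches the result
    return regs


print(add_r1_r2_into_r3({"R1": 7, "R2": 5, "R3": 0})["R3"])  # 12
```

Y and Z are needed because the bus can carry only one value at a time, while the ALU needs both operands present and has no storage of its own.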

FETCHING A WORD FROM MEMORY

The CPU transfers the address of the required information word to the memory
address register (MAR), from which it is sent to the main memory.

Meanwhile, the CPU uses the control lines of the memory bus to indicate
that a read operation is required.

After issuing this request, the CPU waits until it receives an answer from
the memory, informing it that the requested function has been completed.
This is accomplished through the use of another control signal on the
memory bus, which will be referred to as Memory Function Completed
(MFC).

The memory sets this signal to 1 to indicate that the contents of the
specified location in the memory have been read and are available on the
data lines of the memory bus.

We will assume that as soon as the MFC signal is set to 1, the information
on the data lines is loaded into MDR and is thus available for use inside
the CPU. This completes the memory fetch operation.

The actions needed for the instruction Move (R1), R2 are:

MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]

The signals activated for this sequence are:

1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
Storing a word in Memory

The procedure is similar to that for fetching a word from memory.

The desired address is loaded into MAR.

Then the data to be written are loaded into MDR, and a Write command is issued.

If we assume that the data word to be stored in the memory is in R2 and
that the memory address is in R1, the Write operation requires the
following sequence:

MAR ← [R1]
MDR ← [R2]
Write
Wait for the MFC

Move R2, (R1) requires the following sequence of signals:

1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
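
The read and write sequences can be compared side by side in a Python sketch. Memory is modelled as a dictionary that responds immediately, so the wait for MFC is only indicated in comments; the function names are hypothetical.

```python
def read_word(regs, memory):
    """Move (R1), R2: load R2 from the memory location whose address is in R1."""
    mar = regs["R1"]       # step 1: R1out, MARin, Read
    mdr = memory[mar]      # step 2: MDRinE, WMFC (data arrives from memory)
    regs["R2"] = mdr       # step 3: MDRout, R2in


def write_word(regs, memory):
    """Move R2, (R1): store R2 into the memory location whose address is in R1."""
    mar = regs["R1"]       # step 1: R1out, MARin
    mdr = regs["R2"]       # step 2: R2out, MDRin, Write
    memory[mar] = mdr      # step 3: MDRoutE, WMFC (memory accepts the data)


memory = {100: 42, 104: 0}
regs = {"R1": 100, "R2": 0}
read_word(regs, memory)    # R2 <- M[100]
regs["R1"] = 104
write_word(regs, memory)   # M[104] <- R2
print(regs["R2"], memory[104])  # 42 42
```

In both directions MAR carries the address and MDR carries the data; only the direction of the MDR transfer changes.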
EXECUTION OF A COMPLETE INSTRUCTION
Consider the instruction:
Add (R3), R1

Executing this instruction requires the following actions:

1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.

Control sequence for the instruction Add (R3), R1:

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, Select Y, Add, Zin
7. Zout, R1in, End

Branch Instructions

Control sequence for a Branch instruction:

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Offset-field-of-IRout, Add, Zin
5. Zout, PCin, End

Figure 7.1. Single-bus organization of the datapath inside a processor.

MULTIPLE BUS ORGANIZATION


One solution to the bandwidth limitation of a single bus is simply to
add more buses. Consider the architecture shown in Figure 2.2, which
contains N processors, P1, P2, ..., PN, each having its own private cache, all
connected to a shared memory by B buses, B1, B2, ..., BB. The shared memory
consists of M interleaved banks, M1, M2, ..., MM, so that simultaneous memory
requests can access the shared memory concurrently. This avoids the loss in
performance that occurs if those accesses must be serialized, which is the
case when there is only one memory bank. Each processor is connected to
every bus, and so is each memory bank. When a processor needs to access a
particular bank, it has B buses from which to choose. Thus each
processor-memory pair is connected by several redundant paths, which implies that the
failure of one or more paths can, in principle, be tolerated at the cost of
some degradation in system performance.
In a multiple bus system, several processors may attempt to access the
shared memory simultaneously. To deal with this, a policy must be
implemented that allocates the available buses to the processors making
requests to memory. In particular, the policy must deal with the case when
the number of requesting processors exceeds B. For performance reasons this allocation
must be carried out by hardware arbiters which, as we shall see, add
significantly to the complexity of the multiple bus interconnection network.
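
One possible allocation policy can be sketched in Python. This is a fixed-priority scheme chosen purely for illustration, not the policy of any particular system: requests are served in list order, each memory bank serves at most one request per cycle, and at most B buses are granted. The function name and the data layout are assumptions.

```python
def allocate_buses(requests, num_buses):
    """Grant buses to (processor, bank) requests in fixed priority order.

    Returns a dict mapping each granted processor to (bus name, bank).
    Requests that lose arbitration are simply absent and must retry later.
    """
    granted = {}
    busy_banks = set()
    for processor, bank in requests:
        if len(granted) == num_buses:   # all B buses are already in use
            break
        if bank in busy_banks:          # bank conflict: one access per bank
            continue
        bus = f"B{len(granted) + 1}"
        granted[processor] = (bus, bank)
        busy_banks.add(bank)
    return granted


requests = [("P1", "M1"), ("P2", "M1"), ("P3", "M2"), ("P4", "M3")]
print(allocate_buses(requests, num_buses=2))
# {'P1': ('B1', 'M1'), 'P3': ('B2', 'M2')}
```

Here P2 loses because bank M1 is already in use, and P4 loses because both buses have been allocated, which is the case the text describes when requests outnumber the B buses.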

Control sequence that adds the contents of R4 and R5 and places the result in R6:

1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRoutB, R=B, IRin
4. R4out, R5outB, SelectA, Add, R6in, End
