
Advanced Computer Architecture Bahria Summer 2012 Instructor: Shaftab Ahmed

Week #4 Lecture # 7&8 Instruction Set Architecture

4/25/2012

ACA Spring 2012 Bahria

Shaftab Ahmed

Instruction Set Architecture

An instruction set architecture (ISA) describes the structure of a computer as seen by the programmer: the CPU's registers, its addressability, and its arithmetic, control and store operations. An assembly or machine language programmer must understand the ISA of the target processor in order to program it.


Instruction Set Architecture


Programs written in any higher-level language are eventually converted to assembly level, where instructions are written in the mnemonics of the instruction set. The assembler converts these into machine language before execution.
High-level language code (C, C++, Java, Fortran)
  compiler
Assembly language code (architecture-specific statements)
  assembler
Machine language code (architecture-specific bit patterns)

[Figure: the instruction set as the layer between software (above) and hardware (below)]

ISA Metrics

Orthogonality: All operand modes are available with any data type or instruction type.
Completeness: Support for a wide range of operations and target applications.
Regularity: No overloading for the meanings of instruction fields.
Streamlined: Resource needs easily determined.
Ease of assembly language programming
Ease of implementation



Instruction Set Design Issues

Instruction set design issues include:
Where are operands stored? registers, memory, stack, accumulator
How many explicit operands are there? 0, 1, 2, or 3
How is the operand location specified? register, immediate, indirect, . . .
What type & size of operands are supported? byte, int, float, double, string, vector . . .
What operations are supported? add, sub, mul, move, compare . . .


Evolution of Instruction Sets


Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
  High-level Language Based (B5000 1963)
  Concept of a Family (IBM 360 1964)
General Purpose Register Machines
  Complex Instruction Sets (VAX, Intel 8086 1977-80)
  Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC (MIPS, SPARC, 88000, IBM RS6000, . . . 1987+)

Classifying ISAs
Accumulator (before 1960):
  1 address    add A            acc <- acc + mem[A]

Stack (1960s to 1970s):
  0 address    add              tos <- tos + next

Memory-Memory (1970s to 1980s):
  2 address    add A, B         mem[A] <- mem[A] + mem[B]
  3 address    add A, B, C      mem[A] <- mem[B] + mem[C]

Register-Memory (1970s to present):
  2 address    add R1, A        R1 <- R1 + mem[A]
               load R1, A       R1 <- mem[A]

Register-Register (Load/Store) (1960s to present):
  3 address    add R1, R2, R3   R1 <- R2 + R3
               load R1, R2      R1 <- mem[R2]
               store R1, R2     mem[R1] <- R2
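
To make the classes concrete, the sketch below computes C = A + B under four of them and counts the instructions each one needs. It is a toy model only: the dict-based memory, the register names and the instruction sequences in the comments are illustrative assumptions, not taken from any real machine.

```python
# Toy model: compute mem["C"] = mem["A"] + mem["B"] under four ISA classes.
# Each function returns the number of instructions its (hypothetical) program uses.

def accumulator(mem):
    acc = mem["A"]                 # load A   : acc <- mem[A]
    acc = acc + mem["B"]           # add B    : acc <- acc + mem[B]
    mem["C"] = acc                 # store C  : mem[C] <- acc
    return 3

def stack(mem):
    s = []
    s.append(mem["A"])             # push A
    s.append(mem["B"])             # push B
    s.append(s.pop() + s.pop())    # add      : tos <- tos + next
    mem["C"] = s.pop()             # pop C
    return 4

def memory_memory(mem):
    mem["C"] = mem["A"] + mem["B"]  # add C, A, B (3 address)
    return 1

def load_store(mem):
    regs = {}
    regs["R1"] = mem["A"]                  # load R1, A
    regs["R2"] = mem["B"]                  # load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]   # add R3, R1, R2
    mem["C"] = regs["R3"]                  # store C, R3
    return 4

def run_all(a, b):
    # Map each model name to (result stored in C, instruction count).
    results = {}
    for model in (accumulator, stack, memory_memory, load_store):
        mem = {"A": a, "B": b, "C": 0}
        count = model(mem)
        results[model.__name__] = (mem["C"], count)
    return results
```

Calling `run_all(2, 5)` gives the same sum for every class; only the instruction count differs, which is the trade-off the classification is meant to expose.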


Types of Addressing Modes (VAX)


Addressing Mode         Example                  Action
1. Register direct      Add R4, R3               R4 <- R4 + R3
2. Immediate            Add R4, #3               R4 <- R4 + 3
3. Displacement         Add R4, 100(R1)          R4 <- R4 + M[100 + R1]
4. Register indirect    Add R4, (R1)             R4 <- R4 + M[R1]
5. Indexed              Add R4, (R1 + R2)        R4 <- R4 + M[R1 + R2]
6. Direct               Add R4, (1000)           R4 <- R4 + M[1000]
7. Memory indirect      Add R4, @(R3)            R4 <- R4 + M[M[R3]]
8. Autoincrement        Add R4, (R2)+            R4 <- R4 + M[R2]; R2 <- R2 + d
9. Autodecrement        Add R4, -(R2)            R2 <- R2 - d; R4 <- R4 + M[R2]
10. Scaled              Add R4, 100(R2)[R3]      R4 <- R4 + M[100 + R2 + R3*d]
Here d is the size of the operand in bytes.
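
The Action column can be read as a small operand-fetch routine. The sketch below is one way to write it, assuming registers and memory are plain dicts and `d` is the operand size in bytes; the mode names and the `fetch_operand` signature are made up for illustration. Autodecrement is modelled as decrement-then-read, following the usual VAX definition.

```python
# Operand fetch for the ten addressing modes listed above (illustrative model).
# regs: register name -> value, mem: address -> value, d: operand size in bytes.

def fetch_operand(mode, regs, mem, reg=None, index=None, const=0, d=4):
    if mode == "register":
        return regs[reg]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return mem[const + regs[reg]]
    if mode == "register_indirect":
        return mem[regs[reg]]
    if mode == "indexed":
        return mem[regs[reg] + regs[index]]
    if mode == "direct":
        return mem[const]
    if mode == "memory_indirect":
        return mem[mem[regs[reg]]]
    if mode == "autoincrement":
        value = mem[regs[reg]]     # read first ...
        regs[reg] += d             # ... then step the pointer forward
        return value
    if mode == "autodecrement":
        regs[reg] -= d             # step the pointer back first ...
        return mem[regs[reg]]      # ... then read
    if mode == "scaled":
        return mem[const + regs[reg] + regs[index] * d]
    raise ValueError("unknown addressing mode: " + mode)
```

An `Add R4, <operand>` instruction then reduces to `regs["R4"] += fetch_operand(...)` in this model.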

Types of Addressing Modes Intel Instruction Set


Register: Instructions involving data manipulation through registers
Immediate: Involves immediate values contained within the instruction
Direct: Transfers data between a memory location and a register
Register Indirect: Transfers a data byte/word to or from a location whose address is specified in a register, e.g. [BX]. BYTE PTR, WORD PTR or DWORD PTR specifies the size of the data.
Base + Index Indirect: MOV AX, [BX+SI]
Relative: MOV AX, [BX+4]
Base relative plus index: MOV AX, [BX+SI+4]
Scaled Index: MOV EAX, [EAX+4*EBX] (scaling requires the 32-bit addressing of the 80386 and later)

Instruction Encoding

Variable Size
  Instruction length varies based on opcode and address specifiers
  For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17 bytes
  Good code density, but difficult to decode and pipeline
Fixed Size
  Only a single size for all instructions
  For example, DLX, MIPS, PowerPC and SPARC all have 32-bit instructions
  Not as good code density, but easier to decode and pipeline
Hybrid Size
  Multiple format lengths, specified by the opcode
  For example, IBM 360/370
  Compromise between code density and ease of decoding

DLX Architecture

Introduced by Hennessy and Patterson in 1990

Derived from many different instruction set architectures from MIPS, Sun, IBM, Intel, HP, AMD, etc.
32-bit fixed length instructions
3 instruction formats
Load/store architecture
Simple branch conditions (no condition codes)

DLX is a typical RISC architecture.

DLX registers

32 32-bit general-purpose registers (R0 = 0)
32 32-bit (or 16 64-bit) floating point registers
Special purpose registers (e.g., FP Status and PC)

DLX Design Decisions

DLX is based on the following design decisions


Use general purpose registers with a load-store architecture
Support commonly used addressing modes
  displacement, immediate, and register deferred
Support simple instructions that occur frequently
  load, store, add, subtract, move, and, shift, compare equal, branch, jump, call, and return
Support commonly required data sizes
  8 (byte), 16 (half word), and 32-bit (word) integers
  32 (float) and 64-bit (double) floating point
Use fixed length instructions that are easy to decode
Provide plenty of general purpose registers and separate floating point registers

DLX Instruction Formats


(a) Register-Register (R-type), e.g. ADD R1, R2, R3
  Bits 31-26   Op
  Bits 25-21   rs1
  Bits 20-16   rs2
  Bits 15-11   rd
  Bits 10-0    function (the slide's bit ruler also marks positions 6 and 5)
(ALU reg. operations, read/write special registers and moves)

(b) Register-Immediate (I-type), e.g. SUB R1, R2, #3
  Bits 31-26   Op
  Bits 25-21   rs1
  Bits 20-16   rd
  Bits 15-0    immediate
(ALU immediate operations, loads and stores, conditional branch, jump)

(c) Jump / Call (J-type), e.g. JUMP end
  Bits 31-26   Op
  Bits 25-0    offset added to PC
(jump, jump and link, trap and return from exception)
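
The three formats amount to packing fields into a 32-bit word with shifts and masks. The sketch below shows that packing under stated assumptions: the R-type function field is taken to occupy bits 10-0, immediates and offsets are stored in two's complement, and the opcode and function numbers passed in are placeholders rather than real DLX encodings.

```python
# Field packing for the three 32-bit instruction formats (illustrative sketch).

def encode_r(op, rs1, rs2, rd, func):
    # Op[31:26] rs1[25:21] rs2[20:16] rd[15:11] function[10:0]
    return (op << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | (func & 0x7FF)

def encode_i(op, rs1, rd, imm):
    # Op[31:26] rs1[25:21] rd[20:16] immediate[15:0]
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)

def encode_j(op, offset):
    # Op[31:26] offset[25:0], added to the PC
    return (op << 26) | (offset & 0x3FFFFFF)

def sign_extend(value, bits):
    # Interpret the low `bits` bits of value as a two's-complement number.
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def decode_i(word):
    # Recover the I-type fields from a 32-bit word.
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rd": (word >> 16) & 0x1F,
        "imm": sign_extend(word & 0xFFFF, 16),
    }
```

Because every field sits at a fixed position, a decoder can extract the register numbers before it knows the opcode, which is the "easy to decode" property claimed for fixed-length formats.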



Intel 80x86 Integer Registers


X86 Operand Types

x86 instructions typically have two operands, where one operand is both a source and a destination operand. Possible combinations include:

Source/destination type    Second source type
Register                   Register
Register                   Immediate
Register                   Memory
Memory                     Register
Memory                     Immediate

No memory-memory or immediate-immediate
Immediate can be 8, 16, or 32 bits

Intel 80x86 Floating Point Registers


80x86 Instructions

Data movement
(move, push, pop)

Arithmetic and logic


(logic ops, tests CCs, shifts, integer and decimal arithmetic)

Control flow
(branches, jumps, calls, returns)

String instructions
(move and compare)

FP data movement
(load, load const., store)

Arithmetic instructions
(add, subtract, multiply, divide, square root, absolute value)

Comparisons
(Result to Flag)

Transcendental functions
(sin, cos, log, etc.)

80x86 Instruction Format

Instruction sizes vary from 1 to 17 bytes


Instruction Set 8088 / 8086 CPU


FORMATS
1. One Byte
The instructions have implied data or register operands. The least significant three bits specify the register, if any.

2. Register to Register
Two-byte instruction in which the first byte contains the opcode followed by the direction (D) and width (W) bits, and the second byte contains the MOD, REG (second register) and R/M fields. The MOD field is 11.

3. Register to / from Memory without displacement

NOTE: Bit D1 of the first byte (D) gives the direction: 0 means the REG field of byte 2 is the source, 1 means the REG field of byte 2 is the destination.
Bit D0 of the first byte (W) specifies whether the data is 8-bit or 16-bit.
When MOD is 11, the R/M field specifies one of 8 registers. The MOD field is 11 for register, 00 for memory without displacement, 01 for memory with 8-bit displacement and 10 for memory with 16-bit displacement.
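
The field layout in the note can be checked by pulling the bits apart in code. The sketch below splits a two-byte instruction into opcode, D, W, MOD, REG and R/M and names the operands for the register-to-register case. The function names are hypothetical; the register numbering is the standard 8086 one (AX, CX, DX, BX, SP, BP, SI, DI for 16-bit operands).

```python
# Decode the first two bytes of an 8086 instruction in the opcode/D/W + MOD/REG/R/M layout.

REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
MOD_MEANING = {
    0b00: "memory, no displacement",
    0b01: "memory, 8-bit displacement",
    0b10: "memory, 16-bit displacement",
    0b11: "register",
}

def decode_two_bytes(byte1, byte2):
    return {
        "opcode": byte1 >> 2,         # upper six bits of byte 1
        "d": (byte1 >> 1) & 1,        # direction: 1 means REG is the destination
        "w": byte1 & 1,               # width: 0 = 8-bit, 1 = 16-bit
        "mod": byte2 >> 6,
        "reg": (byte2 >> 3) & 0b111,
        "rm": byte2 & 0b111,
    }

def describe_reg_to_reg(byte1, byte2):
    # Return (destination, source) register names for a MOD = 11 instruction.
    f = decode_two_bytes(byte1, byte2)
    if f["mod"] != 0b11:
        raise ValueError("not register-to-register: " + MOD_MEANING[f["mod"]])
    names = REG16 if f["w"] else REG8
    reg_name, rm_name = names[f["reg"]], names[f["rm"]]
    return (reg_name, rm_name) if f["d"] else (rm_name, reg_name)
```

For example, the byte pairs `8B C3` and `89 D8` both decode to a move from BX into AX; they differ only in the D bit and in which field holds which register.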

4. Register to / from Memory with Displacement
One or two additional bytes specify the displacement.

5. Immediate operand to Register
In this instruction the first 7 bits of the first byte and bits 3-5 of the second byte specify the opcode. The last two bytes specify the data (one byte when W is 0).


6. Immediate Operand to Memory with 16 bit Displacement
The first two bytes specify the opcode, MOD and R/M as before, followed by two bytes of displacement and two bytes of data.

Significance of OPCODE fields


Graphics and Multimedia Instruction Set Extensions

Several companies have extended their computers' instruction sets to support graphics and multimedia applications.

Intel's MMX Technology
Intel's Internet Streaming SIMD Extensions
AMD's 3DNow! Technology
Sun's Visual Instruction Set
Motorola's and IBM's AltiVec Technology

These extensions improve the performance of

Computer-aided design
Internet applications
Computer visualization
Video games
Speech recognition

MMX Instructions

MMX Technology adds 57 new instructions to the x86 architecture (Reference article on PII MMX). Some of these instructions include:

PADD(b, w, d)         Packed addition
PSUB(b, w, d)         Packed subtraction
PCMPEQ(b, w, d)       Packed compare equal
PMULLw                Packed word multiply low
PMULHw                Packed word multiply high
PMADDwd               Packed word multiply-add
PSRL(w, d, q)         Packed shift right logical
PACKSS(wb, dw)        Pack data
PUNPCK(bw, wd, dq)    Unpack data
PAND, POR, PXOR       Packed logical operations


MMX Data Types

MMX Technology supports operations on the following 64-bit integer data types.
Packed byte (eight 8-bit elements)
Packed word (four 16-bit elements)
Packed double word (two 32-bit elements)
Packed quad word (one 64-bit element)


SIMD Operations
MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD). Example:

PADD[W]: Packed add word

    A3      A2      A1      A0
  + B3      B2      B1      B0
  = A3+B3   A2+B2   A1+B1   A0+B0

4 parallel adds are performed on 16-bit elements. Most MMX instructions only require a single cycle.


Saturating Arithmetic
Both wrap-around and saturating ADD instructions are supported. With saturating arithmetic, a result that overflows is set to the largest representable value (and one that underflows to the smallest) instead of wrapping around. The slide illustrates both types:

PADD[W]: Packed wrap-around add

PADDUS[W]: Packed unsigned saturating add
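
The difference between the two adds shows up only in lanes that overflow. The sketch below emulates both on a 64-bit value treated as four unsigned 16-bit lanes; the function names mirror the mnemonics but are ordinary Python helpers written for this illustration, not a binding to the real instructions.

```python
# Emulate packed 16-bit adds on a 64-bit integer holding four unsigned lanes.

MASK16 = 0xFFFF

def unpack_words(q):
    # Split a 64-bit integer into four 16-bit lanes, lane 0 = least significant.
    return [(q >> (16 * i)) & MASK16 for i in range(4)]

def pack_words(lanes):
    # Inverse of unpack_words.
    q = 0
    for i, lane in enumerate(lanes):
        q |= (lane & MASK16) << (16 * i)
    return q

def paddw(a, b):
    # Wrap-around add: each lane sum is reduced modulo 2**16.
    return pack_words([(x + y) & MASK16
                       for x, y in zip(unpack_words(a), unpack_words(b))])

def paddusw(a, b):
    # Unsigned saturating add: each lane sum is clamped to 0xFFFF.
    return pack_words([min(x + y, MASK16)
                       for x, y in zip(unpack_words(a), unpack_words(b))])
```

With lanes `[0xFFFF, 1, 2, 0x8000]` plus `[1, 1, 3, 0x8000]`, the wrap-around version yields `[0, 2, 5, 0]` while the saturating version yields `[0xFFFF, 2, 5, 0xFFFF]`. For pixel data the saturating result is usually the wanted one, since a bright value should stay bright rather than wrap to black.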


Pack and Unpack Instructions

Pack and unpack instructions provide conversion between standard data types and packed data types

PACKSS[DW]: Pack signed double words into packed words with signed saturation
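
As a rough illustration of what the pack operation does, the sketch below narrows four signed 32-bit values to four signed 16-bit values with saturation. Lists of Python integers stand in for the 64-bit registers, lowest lane first; the helper names are assumptions made for this example.

```python
# Emulate PACKSSDW: narrow signed double words to signed words with saturation.

def saturate_s16(x):
    # Clamp to the signed 16-bit range.
    return max(-32768, min(32767, x))

def packssdw(dst_dwords, src_dwords):
    # Each argument holds two signed 32-bit values, lowest lane first.
    # The destination's values fill the low two result words, the source's the high two.
    return [saturate_s16(v) for v in list(dst_dwords) + list(src_dwords)]
```

For instance, `packssdw([70000, -5], [-70000, 123])` returns `[32767, -5, -32768, 123]`: values that fit are copied unchanged, and values that do not are clamped rather than truncated.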



Multiply-Add Operations

Many graphics applications require multiply-accumulate operations

Vector Dot Products
Matrix Multiplication
Fast Fourier Transforms (FFTs)
Filter implementations

PMADDWD: Packed multiply-add word to double



Vector Dot Product of Two 8-Element Vectors

A dot product of two 8-element vectors can be performed using 9 MMX instructions.

With MMX: 9 instructions
2 loads for one of the vectors (the other vector is loaded by PMADD)
2 PMADDs
2 PADDs
2 shifts (if required to fix precision)
1 store

Partial sums: a0*c0+..+a3*c3 and a4*c4+..+a7*c7
Result: a0*c0+..+a7*c7


Vector Dot Product of Two 8-Element Vectors

Without MMX: 40 instructions
16 loads
8 multiplies
8 shifts
7 adds
1 store

Partial sums: a0*c0+..+a3*c3 and a4*c4+..+a7*c7
Result: a0*c0+..+a7*c7
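
The two approaches can be compared with a small emulation. The sketch below models PMADDWD as a function on lists of four 16-bit values and builds the 8-element dot product from two such calls, next to a plain element-by-element loop. It assumes lists of Python integers in place of registers and ignores 32-bit overflow and the precision-fixing shifts; the function names are made up for illustration.

```python
# Dot product of two 8-element vectors: scalar loop vs. PMADDWD-style grouping.

def pmaddwd(a_words, b_words):
    # Four signed 16-bit lanes per operand -> two 32-bit sums of adjacent products.
    products = [x * y for x, y in zip(a_words, b_words)]
    return [products[0] + products[1], products[2] + products[3]]

def dot_scalar(a, c):
    # One multiply and one add per element.
    total = 0
    for x, y in zip(a, c):
        total += x * y
    return total

def dot_mmx_style(a, c):
    lo = pmaddwd(a[0:4], c[0:4])            # covers a0*c0 .. a3*c3
    hi = pmaddwd(a[4:8], c[4:8])            # covers a4*c4 .. a7*c7
    pair = [lo[0] + hi[0], lo[1] + hi[1]]   # packed add of the two partial results
    return pair[0] + pair[1]                # fold the two lanes (shift, then add)
```

Both functions return the same value for any pair of 8-element vectors in this model; the packed version simply does four multiplies per step instead of one.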


MMX Technology Summary

MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications.
Most MMX instructions can be executed in one clock cycle, so the performance improvement will be more dramatic than the simple ratio of instruction counts.
It provides a speedup of 1.5 to 2.0 for certain applications.
It increases the chip area by only about 5%.


MMX Technology Summary

MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance.
MMX data types use the x86 floating point registers
  Makes it easy to handle context switches
  Makes it hard to perform MMX and floating point instructions at the same time


Internet Streaming SIMD Extensions

ISSE introduced eight 128-bit data registers (called XMM registers).
In 64-bit mode, sixteen 128-bit XMM registers are available.
The 128-bit packed single-precision floating-point data type allows four single-precision operations to be performed simultaneously.


ISSE Data Type

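ISSE extensions introduced one new data type

128-Bit Packed Single-Precision Floating-Point Data Type

SSE2 introduced five more data types (packed double-precision floating point, and 128-bit packed byte, word, double word and quad word integers)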


Internet Streaming SIMD Extensions

Intel's Internet Streaming SIMD Extensions (ISSE)

Help improve the performance of video and 3D applications
70 new instructions beyond MMX Technology
Adds new 128-bit registers
Provide the ability to perform parallel floating point operations
  Four parallel operations on 32-bit numbers
  Reciprocal and reciprocal square root instructions (normalization)
  Packed average instruction (motion compensation)
Provide data pre-fetch instructions
Make certain applications 1.5 to 2.0 times faster.
