Instruction Set Architecture of x86 and DLX CPUs

Advanced Computer Architecture Bahria Summer 2012 Instructor: Shaftab Ahmed
Week #4 Lecture # 7&8 Instruction Set Architecture
4/25/2012
ACA Spring 2012 Bahria
Shaftab Ahmed
Instruction Set Architecture
An instruction set architecture (ISA) describes a computer as the programmer sees it: the CPU's registers, its addressability, and its arithmetic, control, and store operations. An assembly or machine language programmer must understand the ISA of the target processor in order to program it.
Instruction Set Architecture

Programs written in any high-level language are eventually translated to assembly language, whose instructions are written in the mnemonics of the instruction set. The assembler converts these into machine language before execution.
High-level language code: C, C++, Java, Fortran
    (compiler)
Assembly language code: architecture-specific statements
    (assembler)
Machine language code: architecture-specific bit patterns
The instruction set sits between software (above) and hardware (below).
ISA Metrics
Orthogonality: All operand modes are available with any data type or instruction type.
Completeness: Support for a wide range of operations and target applications.
Regularity: No overloading for the meanings of instruction fields.
Streamlined: Resource needs easily determined.
Ease of assembly language programming
Ease of implementation

Instruction Set Design Issues
Instruction set design issues include:
Where are operands stored? registers, memory, stack, accumulator
How many explicit operands are there? 0, 1, 2, or 3
How is the operand location specified? register, immediate, indirect, . . .
What type & size of operands are supported? byte, int, float, double, string, vector, . . .
What operations are supported? add, sub, mul, move, compare, . . .
Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
    High-level Language Based (B5000 1963)
    Concept of a Family (IBM 360 1964)
General Purpose Register Machines
    Complex Instruction Sets (VAX, Intel 8086 1977-80)
    Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC (MIPS, SPARC, 88000, IBM RS6000, . . . 1987+)
Classifying ISAs
Accumulator (before 1960):
    1 address    add A             acc <- acc + mem[A]
Stack (1960s to 1970s):
    0 address    add               tos <- tos + next
Memory-Memory (1970s to 1980s):
    2 address    add A, B          mem[A] <- mem[A] + mem[B]
    3 address    add A, B, C       mem[A] <- mem[B] + mem[C]
Register-Memory (1970s to present):
    2 address    add R1, A         R1 <- R1 + mem[A]
                 load R1, A        R1 <- mem[A]
Register-Register (Load/Store) (1960s to present):
    3 address    add R1, R2, R3    R1 <- R2 + R3
                 load R1, R2       R1 <- mem[R2]
                 store R1, R2      mem[R1] <- R2
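To make the ISA classes above concrete, the following sketch computes C = A + B once per class and returns the number of instructions each one needs. It is only an illustration: the function names, the dictionary standing in for memory, and the load/store/push/pop mnemonics in the comments are assumptions chosen for this example, not taken from any particular machine.

```python
# Toy models of the ISA classes. Each function performs mem["C"] = mem["A"] + mem["B"]
# and returns how many instructions the sequence would take on that class of machine.

def run_accumulator(mem):
    acc = mem["A"]            # load A     : acc <- mem[A]
    acc = acc + mem["B"]      # add B      : acc <- acc + mem[B]
    mem["C"] = acc            # store C    : mem[C] <- acc
    return 3

def run_stack(mem):
    stack = []
    stack.append(mem["A"])                    # push A
    stack.append(mem["B"])                    # push B
    stack.append(stack.pop() + stack.pop())   # add : tos <- tos + next
    mem["C"] = stack.pop()                    # pop C
    return 4

def run_memory_memory(mem):
    mem["C"] = mem["A"] + mem["B"]   # add C, A, B : mem[C] <- mem[A] + mem[B]
    return 1

def run_register_memory(mem):
    r1 = mem["A"]             # load R1, A  : R1 <- mem[A]
    r1 = r1 + mem["B"]        # add R1, B   : R1 <- R1 + mem[B]
    mem["C"] = r1             # store C, R1
    return 3

def run_load_store(mem):
    r1 = mem["A"]             # load R1, A
    r2 = mem["B"]             # load R2, B
    r3 = r1 + r2              # add R3, R1, R2 : R3 <- R1 + R2
    mem["C"] = r3             # store C, R3
    return 4
```

The load/store version uses the most instructions here, but every instruction is simple and only loads and stores touch memory, which is the trade-off the RISC designs later in these notes rely on.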
Types of Addressing Modes (VAX)

Addressing Mode         Example                  Action
1. Register direct      Add R4, R3               R4 <- R4 + R3
2. Immediate            Add R4, #3               R4 <- R4 + 3
3. Displacement         Add R4, 100(R1)          R4 <- R4 + M[100 + R1]
4. Register indirect    Add R4, (R1)             R4 <- R4 + M[R1]
5. Indexed              Add R4, (R1 + R2)        R4 <- R4 + M[R1 + R2]
6. Direct               Add R4, (1000)           R4 <- R4 + M[1000]
7. Memory Indirect      Add R4, @(R3)            R4 <- R4 + M[M[R3]]
8. Autoincrement        Add R4, (R2)+            R4 <- R4 + M[R2]; R2 <- R2 + d
9. Autodecrement        Add R4, -(R2)            R2 <- R2 - d; R4 <- R4 + M[R2]
10. Scaled              Add R4, 100(R2)[R3]      R4 <- R4 + M[100 + R2 + R3*d]
Here d is the size of the operand in bytes.
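The addressing modes above can be read as rules for where the second operand of Add comes from. The sketch below models each rule as a branch of one function. The function name, the parameter names, and the use of dictionaries for the register file and memory are assumptions made for this illustration. Autodecrement is modeled as decrement-then-access, which is how the VAX defines that mode.

```python
def fetch_operand(mode, regs, mem, base=None, index=None, const=0, d=4):
    """Return the source operand for one addressing mode.

    regs and mem are dictionaries; base and index name registers;
    const is the immediate, displacement, or absolute address;
    d is the operand size in bytes. Autoincrement/autodecrement update regs.
    """
    if mode == "register":
        return regs[base]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return mem[const + regs[base]]
    if mode == "register_indirect":
        return mem[regs[base]]
    if mode == "indexed":
        return mem[regs[base] + regs[index]]
    if mode == "direct":
        return mem[const]
    if mode == "memory_indirect":
        return mem[mem[regs[base]]]
    if mode == "autoincrement":
        value = mem[regs[base]]      # access first ...
        regs[base] += d              # ... then step the pointer forward
        return value
    if mode == "autodecrement":
        regs[base] -= d              # step the pointer back first ...
        return mem[regs[base]]       # ... then access
    if mode == "scaled":
        return mem[const + regs[base] + regs[index] * d]
    raise ValueError("unknown addressing mode: " + mode)
```

For example, with R1 = 100, `fetch_operand("displacement", regs, mem, base="R1", const=100)` reads M[200], matching `Add R4, 100(R1)`.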
Types of Addressing Modes Intel Instruction Set

Register: Instructions involving data manipulation through registers
Immediate: Involves immediate values contained within the instruction
Direct: Transfers data between a memory location and a register
Register Indirect: Transfers a data byte/word to or from a location whose address is held in a register, e.g. [BX]. BYTE PTR, WORD PTR, or DWORD PTR specifies the size of the data.
Base + Index Indirect: MOV AX, [BX+SI]
Relative: MOV AX, [BX+4]
Base relative plus index: MOV AX, [BX+SI+4]
Scaled Index: MOV AX, [EAX+4*EBX] (scaling requires 32-bit address registers, 80386 and later)
Instruction Encoding
Variable Size
Instruction length varies based on opcode and address specifiers.
For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17 bytes.
Good code density, but difficult to decode and pipeline.
Fixed Size
Only a single size for all instructions.
For example, DLX, MIPS, PowerPC, and SPARC all have 32-bit instructions.
Not as good code density, but easier to decode and pipeline.
Hybrid Size
Multiple format lengths, specified by the opcode.
For example, IBM 360/370.
Compromise between code density and ease of decoding.
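One way to see why variable-size encodings are harder to decode and pipeline is to ask where each instruction starts. The sketch below contrasts the two cases. The variable-length format used here is a made-up toy in which the first byte of every instruction holds its total length; real encodings such as VAX or x86 derive the length from opcode and address specifiers instead.

```python
def fixed_boundaries(code, size=4):
    """Start offsets of fixed-size instructions: known without looking at the bytes."""
    return list(range(0, len(code), size))

def variable_boundaries(code):
    """Start offsets in a toy variable-length encoding.

    Assumption for this sketch: the first byte of each instruction is its
    total length in bytes. Each boundary depends on decoding the previous
    instruction, so the scan is inherently sequential.
    """
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        length = code[pc]
        if length == 0:
            raise ValueError("invalid instruction length at offset %d" % pc)
        pc += length
    return offsets
```

With fixed 32-bit instructions a fetch unit can locate several instructions at once, while the variable-length scan cannot find instruction n+1 until instruction n has been at least partly decoded.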
DLX Architecture
Introduced by Hennessy and Patterson in 1990
Derived from many different instruction set architectures from MIPS, Sun, IBM, Intel, HP, AMD, etc.
32-bit fixed-length instructions
3 instruction formats
Load/store architecture
Simple branch conditions (no condition codes)
DLX is a typical RISC architecture.

DLX registers

32 32-bit general-purpose registers (R0 = 0)
32 32-bit (or 16 64-bit) floating point registers
Special purpose registers (e.g., FP Status and PC)
DLX Design Decisions
DLX is based on the following design decisions

Use general purpose registers with a load-store architecture
Support commonly used addressing modes: displacement, immediate, and register deferred
Support simple instructions that occur frequently: load, store, add, subtract, move, and, shift, compare equal, branch, jump, call, and return
Support commonly required data sizes: 8 (byte), 16 (half word), and 32-bit (word) integers; 32 (float) and 64-bit (double) floating point
Use fixed length instructions that are easy to decode
Provide plenty of general purpose registers and separate floating point registers
DLX Instruction Formats

(a) Register-Register (R-type), e.g. ADD R1, R2, R3
Bits:    31-26    25-21    20-16    15-11    10-0
Field:   Op       rs1      rs2      rd       function
(ALU register-register operations, read/write special registers, and moves)

(b) Register-Immediate (I-type), e.g. SUBI R1, R2, #3
Bits:    31-26    25-21    20-16    15-0
Field:   Op       rs1      rd       immediate
(ALU immediate operations, loads and stores, conditional branch, jump)

(c) Jump / Call (J-type), e.g. JUMP end
Bits:    31-26    25-0
Field:   Op       offset added to PC
(jump, jump and link, trap and return from exception)
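The three formats above amount to three ways of packing fields into a 32-bit word. The sketch below shows that packing, assuming a 6-bit Op field, 5-bit register fields, an 11-bit function field, a 16-bit immediate, and a 26-bit offset. The opcode and function numbers passed in are placeholders chosen by the caller, not the real DLX opcode assignments, and the helper names are invented for this example.

```python
def _check(value, bits, name):
    """Reject field values that do not fit in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError("%s does not fit in %d bits" % (name, bits))
    return value

def encode_r(op, rs1, rs2, rd, func):
    """R-type: Op[31:26] rs1[25:21] rs2[20:16] rd[15:11] function[10:0]."""
    return (_check(op, 6, "op") << 26) | (_check(rs1, 5, "rs1") << 21) | \
           (_check(rs2, 5, "rs2") << 16) | (_check(rd, 5, "rd") << 11) | \
           _check(func, 11, "func")

def encode_i(op, rs1, rd, imm):
    """I-type: Op[31:26] rs1[25:21] rd[20:16] immediate[15:0] (two's complement)."""
    if not -(1 << 15) <= imm < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (_check(op, 6, "op") << 26) | (_check(rs1, 5, "rs1") << 21) | \
           (_check(rd, 5, "rd") << 16) | (imm & 0xFFFF)

def encode_j(op, offset):
    """J-type: Op[31:26] offset[25:0] (two's complement, added to the PC)."""
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("offset does not fit in 26 bits")
    return (_check(op, 6, "op") << 26) | (offset & 0x3FFFFFF)
```

Because every field sits at a fixed bit position, a decoder can extract the register numbers with shifts and masks before it even knows which operation the word encodes.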

Intel 80x86 Integer Registers
X86 Operand Types
x86 instructions typically have two operands, where one operand is both a source and a destination operand. Possible combinations include:
Source/destination type    Second source type
Register                   Register
Register                   Immediate
Register                   Memory
Memory                     Register
Memory                     Immediate
No memory-memory or immediate-immediate combinations.
Immediates can be 8, 16, or 32 bits.
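The table of operand combinations can be expressed as a small lookup, which is roughly the check an assembler would apply before choosing an encoding. This is only a sketch: the set name, the function name, and the string labels for operand kinds are invented for illustration.

```python
# Allowed (source/destination, second source) pairs, copied from the table above.
ALLOWED_OPERAND_PAIRS = {
    ("register", "register"),
    ("register", "immediate"),
    ("register", "memory"),
    ("memory", "register"),
    ("memory", "immediate"),
}

def is_valid_operand_pair(dest_kind, src_kind):
    """True if a two-operand x86 instruction may combine these operand kinds."""
    return (dest_kind, src_kind) in ALLOWED_OPERAND_PAIRS
```

An immediate can never appear as the source/destination operand because there is nowhere to write the result, and the memory-memory pair is excluded by the table.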
Intel 80x86 Floating Point Registers
80x86 Instructions

Data movement (move, push, pop)
Arithmetic and logic (logic ops, tests CCs, shifts, integer and decimal arithmetic)
Control flow (branches, jumps, calls, returns)
String instructions (move and compare)
FP data movement (load, load const., store)
Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
Comparisons (Result to Flag)
Transcendental functions (sin, cos, log, etc.)
80x86 Instruction Format
Instruction sizes vary from 1 to 17 bytes.
Instruction Set 8088 / 8086 CPU

FORMATS
1. One Byte: The instructions have implied data or register operands. The least significant three bits specify the register, if any.
2. Register to Register: Two-byte instruction. The first byte contains the opcode followed by the direction (D) and width (W) bits; the second byte contains the MOD, REG, and R/M fields, which name the two registers. The MOD field is 11.
3. Register to / from Memory without displacement
NOTE: The D bit (bit D1 of the first byte) gives the direction: 0 means the REG field of byte 2 is the source, 1 means the REG field of byte 2 is the destination.
The W bit (bit D0 of the first byte) specifies whether the data is 8-bit or 16-bit. The R/M field specifies one of 8 registers when MOD is 11. The MOD field is 11 for register, 00 for memory without displacement, 01 for memory with 8-bit displacement, and 10 for memory with 16-bit displacement.
4. Register to / from Memory with Displacement: One or two additional bytes specify the displacement.
5. Immediate Operand to Register: The upper 7 bits of the first byte and bits 3-5 of the second byte specify the opcode. The last two bytes specify the data.
6. Immediate Operand to Memory with 16-bit Displacement: The first two bytes specify the opcode, MOD, and R/M fields as before, followed by two bytes of displacement and two bytes of data.
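The D, W, MOD, REG, and R/M fields described in the formats above can be assembled with a few shifts. The sketch below builds the two bytes of a register-to-register MOV and splits a second byte back into its fields. It assumes the standard 8086 encodings, which are not listed in these notes: the 6-bit MOV opcode 100010 and the 16-bit register codes AX=000 through DI=111. The function names are invented for this example.

```python
# 3-bit codes for the 16-bit registers (used in the REG field, and in R/M when MOD = 11).
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}

MOV_RM_REG = 0b100010   # 6-bit opcode: MOV register/memory to/from register

def first_byte(opcode6, d, w):
    """Byte 1: six opcode bits, then D (bit 1) and W (bit 0)."""
    return (opcode6 << 2) | (d << 1) | w

def mod_reg_rm(mod, reg, rm):
    """Byte 2: MOD (bits 7-6), REG (bits 5-3), R/M (bits 2-0)."""
    return (mod << 6) | (reg << 3) | rm

def encode_mov_reg_reg(dest, src):
    """MOV dest, src for two 16-bit registers, using D = 0 (REG is the source)."""
    return bytes([first_byte(MOV_RM_REG, d=0, w=1),
                  mod_reg_rm(0b11, REG16[src], REG16[dest])])

def split_mod_reg_rm(byte):
    """Return the (MOD, REG, R/M) fields of a second byte."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111
```

Under these assumptions `encode_mov_reg_reg("AX", "BX")` yields the bytes 89 D8. Setting D = 1 and swapping the REG and R/M contents would give a second valid encoding of the same instruction.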
Significance of OPCODE fields
Graphics and Multimedia Instruction Set Extensions
Several companies have extended their computers' instruction sets to support graphics and multimedia applications.

Intel's MMX Technology
Intel's Internet Streaming SIMD Extensions
AMD's 3DNow! Technology
Sun's Visual Instruction Set
Motorola's and IBM's AltiVec Technology

These extensions improve the performance of

Computer-aided design
Internet applications
Computer visualization
Video games
Speech recognition
MMX Instructions

MMX Technology adds 57 new instructions to the x86 architecture (Reference article on PII MMX). Some of these instructions include:

PADD(b, w, d)         Packed addition
PSUB(b, w, d)         Packed subtraction
PCMPEQ(b, w, d)       Packed compare equal
PMULLw                Packed word multiply low
PMULHw                Packed word multiply high
PMADDwd               Packed word multiply-add
PSRL(w, d, q)         Packed shift right logical
PACKSS(wb, dw)        Pack data
PUNPCK(bw, wd, dq)    Unpack data
PAND, POR, PXOR       Packed logical operations
MMX Data Types
MMX Technology supports operations on the following 64-bit integer data types.
Packed byte (eight 8-bit elements)
Packed word (four 16-bit elements)
Packed double word (two 32-bit elements)
Packed quad word (one 64-bit element)
SIMD Operations
MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD). Example:
PADD[W]: Packed add word

      A3    |   A2    |   A1    |   A0
  +   B3    |   B2    |   B1    |   B0
  =  A3+B3  |  A2+B2  |  A1+B1  |  A0+B0
4 parallel adds are performed on 16-bit elements. Most MMX instructions only require a single cycle.
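The packed add can be modeled on an ordinary 64-bit integer to show what "four parallel adds" means at the bit level. In the sketch below the helper names are invented for illustration, and lane 0 (A0) is taken to be the least significant 16 bits. Each lane is added separately and masked to 16 bits, so a carry out of one lane never reaches its neighbour.

```python
LANES = 4               # four words per 64-bit register
WIDTH = 16              # bits per word
MASK = (1 << WIDTH) - 1

def pack_words(words):
    """Pack [A0, A1, A2, A3] into one 64-bit value, A0 in the low 16 bits."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & MASK) << (i * WIDTH)
    return value

def unpack_words(value):
    """Split a 64-bit value back into [A0, A1, A2, A3]."""
    return [(value >> (i * WIDTH)) & MASK for i in range(LANES)]

def paddw(a, b):
    """Wrap-around packed add of four 16-bit lanes held in 64-bit integers."""
    sums = [x + y for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)   # pack_words masks each sum, discarding lane carries
```

A plain 64-bit integer add of the same two values would let the carry from lane 0 spill into lane 1, which is exactly what the packed instruction avoids.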
Saturating Arithmetic
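Both wrap-around and saturating ADD instructions are supported. With saturating arithmetic, a result that overflows is set to the largest representable value (and one that underflows to the smallest) instead of wrapping around. The following instructions are examples of each type: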
PADD[W]: Packed wrap-around add
PADDUS[W]: Packed saturating add
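The difference between the two adds shows up only when a lane overflows. The sketch below models both on lists of unsigned 16-bit lanes; the function names are invented for this example and the lists stand in for the packed register.

```python
U16_MAX = 0xFFFF

def add_wrap_u16(a, b):
    """Wrap-around packed add (PADDW-style): keep only the low 16 bits of each sum."""
    return [(x + y) & U16_MAX for x, y in zip(a, b)]

def add_sat_u16(a, b):
    """Unsigned saturating packed add (PADDUSW-style): clamp each sum at 0xFFFF."""
    return [min(x + y, U16_MAX) for x, y in zip(a, b)]
```

For a pixel brightness of 0xFFF0 increased by 0x20, the wrap-around result is 0x0010 (nearly black) while the saturating result is 0xFFFF (full brightness), which is why saturation is usually the better fit for image and audio data.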
Pack and Unpack Instructions
Pack and unpack instructions provide conversion between standard data types and packed data types
PACKSS[DW]: Pack with signed saturation, packed double word to packed word
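As a rough model of the pack operation, the sketch below narrows two pairs of signed 32-bit values into four signed 16-bit values, clamping anything out of range. The function names are invented for illustration, lists stand in for registers with the lowest lane first, and the lane order (destination elements in the low half, source elements in the high half) follows the usual description of PACKSSDW.

```python
S16_MIN, S16_MAX = -32768, 32767

def saturate_s16(x):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(S16_MIN, min(S16_MAX, x))

def packssdw(dest, src):
    """Pack two signed double words from dest and two from src into four signed words."""
    return [saturate_s16(x) for x in dest] + [saturate_s16(x) for x in src]
```

Values that already fit in 16 bits pass through unchanged, so the instruction can narrow intermediate results without a separate range check.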

Multiply-Add Operations
Many graphics applications require multiply-accumulate operations:

Vector Dot Products
Matrix Multiplication
Fast Fourier Transforms (FFTs)
Filter implementations
PMADDWD: Packed multiply-add word to double
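The multiply-add instruction multiplies four pairs of signed words and adds adjacent products, producing two double words. A minimal sketch of that behaviour follows; the helper names are invented, and lists stand in for the packed registers with the lowest lane first. Results are wrapped to 32 bits, which only matters in the corner case where all four inputs of a pair are -32768.

```python
def to_s32(x):
    """Wrap an integer to signed 32-bit two's complement."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def pmaddwd(a, b):
    """Packed multiply-add: four signed 16-bit lanes in, two signed 32-bit lanes out.

    Result is [a0*b0 + a1*b1, a2*b2 + a3*b3].
    """
    return [to_s32(a[0] * b[0] + a[1] * b[1]),
            to_s32(a[2] * b[2] + a[3] * b[3])]
```

For example, `pmaddwd([1, 2, 3, 4], [5, 6, 7, 8])` gives `[17, 53]`, i.e. half of a four-element dot product in each result lane.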

Vector Dot Product of two 8 Byte vectors
A dot product of two 8-element vectors can be performed using 9 MMX instructions:
2 loads for one of the vectors (the other vector is loaded by PMADD)
2 PMADDs
2 PADDs
2 shifts (if required to fix precision)
1 store
Partial sums: a0*c0+..+ a3*c3 and a4*c4+..+ a7*c7, combined into a0*c0+..+ a7*c7
Vector Dot Product of two 8 Byte vectors

Without MMX: 40 instructions
16 loads
8 multiplies
8 shifts
7 adds
1 store
Partial sums: a0*c0+..+ a3*c3 and a4*c4+..+ a7*c7, combined into a0*c0+..+ a7*c7
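The two instruction counts correspond to two ways of organising the same computation. The sketch below contrasts them, using a list-based stand-in for PMADDWD. The function names are invented, the grouping into two four-element partial sums follows the slides, and 32-bit overflow and the optional precision shifts are ignored for simplicity.

```python
def pmaddwd(a, b):
    """Stand-in for PMADDWD: [a0*b0 + a1*b1, a2*b2 + a3*b3]."""
    return [a[0] * b[0] + a[1] * b[1], a[2] * b[2] + a[3] * b[3]]

def dot8_packed(a, c):
    """8-element dot product built from two packed multiply-adds."""
    lo = pmaddwd(a[0:4], c[0:4])     # products for elements 0..3
    hi = pmaddwd(a[4:8], c[4:8])     # products for elements 4..7
    sum_lo = lo[0] + lo[1]           # a0*c0 + .. + a3*c3
    sum_hi = hi[0] + hi[1]           # a4*c4 + .. + a7*c7
    return sum_lo + sum_hi           # a0*c0 + .. + a7*c7

def dot8_scalar(a, c):
    """Same result one element at a time: 8 multiplies and 7 adds."""
    total = a[0] * c[0]
    for x, y in zip(a[1:], c[1:]):
        total += x * y
    return total
```

Both functions return the same value; the packed version simply does four multiplies and two adds per PMADDWD step, which is where the reduction from 40 instructions to 9 comes from.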
MMX Technology Summary
MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications.
Most MMX instructions can be executed in one clock cycle, so the performance improvement will be more dramatic than the simple ratio of instruction counts.
It provides a speedup of 1.5 to 2.0 for certain applications.
It increases the chip area by only about 5%.
MMX Technology Summary
MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance.
MMX data types use the x86 floating point registers:
Makes it easy to handle context switches
Makes it hard to perform MMX and floating point instructions at the same time
Internet Streaming SIMD Extensions
ISSE introduced eight 128-bit data registers (called XMM registers).
In 64-bit mode, sixteen 128-bit XMM registers are available.
The 128-bit packed single-precision floating-point data type allows four single-precision operations to be performed simultaneously.
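A packed single-precision operation can be pictured as the same scalar operation applied to four 32-bit lanes at once. The sketch below models a packed add in that way. The function names are invented, lists of four Python floats stand in for an XMM register, and single precision is approximated by rounding each result through a 32-bit float with the standard `struct` module.

```python
import struct

def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addps(a, b):
    """Packed single-precision add: four lane-wise additions, each rounded to 32 bits."""
    return [to_f32(x + y) for x, y in zip(a, b)]
```

One call to `addps` corresponds to one instruction operating on a whole 128-bit register, whereas scalar code would need four separate floating-point adds.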
ISSE Data Type
ISSE extensions introduced one new data type
128-Bit Packed Single-Precision Floating-Point Data Type
SSE 2 introduced five data types
Internet Streaming SIMD Extensions
Intel's Internet Streaming SIMD Extensions (ISSE)

Help improve the performance of video and 3D applications
70 new instructions beyond MMX Technology
Add new 128-bit registers
Provide the ability to perform parallel floating point operations
    Four parallel operations on 32-bit numbers
    Reciprocal and reciprocal square root instructions (normalization)
Packed average instruction (motion compensation)
Provide data pre-fetch instructions
Make certain applications 1.5 to 2.0 times faster.
