
Advanced Computer Architecture

Assignment on:

VLIW ARCHITECTURE

Submitted to: Ma'am Vandana Dubey

Compiled by: Somil Kumar (A7605208240), Priya Sharma (A7605208241), 8-CS-D, B-Tech CS&E, 2008-2012

VLIW Architecture
Very Long Instruction Word (VLIW) architectures are a suitable alternative for exploiting instruction-level parallelism (ILP) in programs, that is, for executing more than one basic instruction at a time. A typical VLIW machine has instruction words hundreds of bits in length. As shown in the figure, these processors contain multiple functional units that share a common register file. They fetch a very long instruction word, containing several primitive instructions, from the instruction cache and dispatch the entire VLIW for parallel execution. These capabilities are exploited by a compiler that generates code in which independent primitive instructions executable in parallel are grouped together. Such processors have relatively simple control logic because they perform no dynamic scheduling or re-ordering of operations.

VLIW has been described as a natural successor to RISC because it moves complexity from the hardware to the compiler, allowing simpler, faster processors. The objective of VLIW is to eliminate the complicated instruction scheduling and parallel dispatch that occur in most modern processors. The instruction set for a VLIW architecture tends to consist of simple instructions.

The length of a VLIW instruction depends on two factors: the number of execution units available and the code length required to control each of them. VLIWs usually incorporate a considerable number of execution units, say 5 to 30, and each unit requires a control word of about 16 to 32 bits. This results in instruction word lengths of roughly 100 bits to 1 Kbit.

The compiler must assemble many primitive operations into a single instruction word such that the multiple functional units are kept busy, which requires enough instruction-level parallelism in a code sequence to fill the available operation slots. Such parallelism is uncovered by the compiler through scheduling code speculatively across basic blocks, performing software pipelining, and reducing the number of operations executed, among other techniques.
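As a rough illustration of where these word lengths come from, the following C sketch lays out a hypothetical 4-slot instruction word, one slot per functional unit. The slot assignments, field widths, and opcodes are assumptions chosen for illustration, not the encoding of any real VLIW machine.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 4-slot VLIW instruction word: one slot per functional unit.
 * Each slot carries its own opcode and register specifiers, so the total
 * word length grows linearly with the number of execution units. */
typedef struct {
    uint32_t slot[4];   /* slot[0]: integer ALU, slot[1]: FP unit,
                           slot[2]: load/store unit, slot[3]: branch unit */
} vliw_word_t;          /* 4 slots x 32 bits = 128-bit instruction word */

/* Pack a simple (opcode, dest, src1, src2) operation into one 32-bit slot.
 * The 8/8/8/8-bit field widths are arbitrary illustrative choices. */
static uint32_t encode_op(uint8_t opcode, uint8_t rd, uint8_t rs1, uint8_t rs2)
{
    return ((uint32_t)opcode << 24) | ((uint32_t)rd << 16) |
           ((uint32_t)rs1 << 8) | rs2;
}

int main(void)
{
    /* The compiler, not the hardware, fills every slot with an independent
     * operation (or a NOP when nothing can be scheduled there). */
    vliw_word_t w = {
        .slot = {
            encode_op(0x01, 3, 1, 2),  /* ALU:   r3 = r1 + r2   */
            encode_op(0x10, 7, 5, 6),  /* FP:    f7 = f5 * f6   */
            encode_op(0x20, 9, 4, 0),  /* LD/ST: r9 = load [r4] */
            encode_op(0x00, 0, 0, 0),  /* BR:    NOP (no branch)*/
        }
    };
    printf("word size: %zu bits\n", sizeof w * 8);  /* prints 128 */
    return 0;
}
```

With 5 to 30 units at 16 to 32 control bits each, the same layout scales directly to the 100-bit-to-1-Kbit range mentioned above.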

Pipelining in VLIW Processors


The execution of instructions by an ideal VLIW processor is shown in the figure. Each instruction specifies multiple operations, so the effective CPI is 0.33 in this particular example. VLIW machines behave much like superscalar machines, with three differences:

1. The decoding of VLIW instructions is easier than that of superscalar instructions.
2. The code density of the superscalar machine is better when the available instruction-level parallelism is less than that exploitable by the VLIW machine. This is because the fixed VLIW format includes bits for non-executable operations, while the superscalar processor issues only executable instructions.
3. A superscalar machine can be object-code compatible with a large family of non-parallel machines. By contrast, VLIW machines exploiting different amounts of parallelism would require different instruction sets.

Instruction parallelism and data movement in a VLIW architecture are completely specified at compile time. Run-time resource scheduling and synchronization are thus completely eliminated. One can view a VLIW processor as an extreme case of a superscalar processor in which all independent or unrelated operations are already synchronously compacted together in advance. The CPI of a VLIW processor can therefore be even lower than that of a superscalar processor.
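The 0.33 figure follows from simple arithmetic: if the ideal machine issues one long word per cycle and each word carries three primitive operations, cycles per operation is 1/3. The short C sketch below just makes that calculation explicit; the counts of words and operations per word are assumptions matching the figure, not measured values.

```c
#include <stdio.h>

int main(void)
{
    /* Assumptions for illustration: the ideal VLIW issues one long word per
     * cycle and each word carries 3 primitive operations, as in the figure. */
    const int ops_per_word = 3;
    const int words_issued = 4;                 /* e.g. 4 cycles of execution */

    int cycles     = words_issued;              /* one word issued per cycle */
    int operations = words_issued * ops_per_word;

    /* Effective CPI counts cycles per primitive operation, not per word. */
    double effective_cpi = (double)cycles / operations;
    printf("effective CPI = %.2f\n", effective_cpi);  /* prints 0.33 */
    return 0;
}
```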

Advantages of VLIW
- The compiler prepares fixed packets of multiple operations that give the full "plan of execution".
- Dependencies are determined by the compiler and used to schedule operations according to functional-unit latencies.
- Functional units are assigned by the compiler and correspond to positions within the instruction packet ("slotting"); a minimal slotting sketch follows this list.
- The compiler produces fully scheduled, hazard-free code, so the hardware does not have to "rediscover" dependencies or schedule operations.
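The C sketch below illustrates the slotting idea: each operation is tagged with the functional unit the compiler chose for it, and a greedy pass places it in the earliest word whose slot for that unit is still free. The unit names, slot count, and operations are hypothetical, and dependence and latency checks that a real scheduler would perform are deliberately omitted.

```c
#include <stdio.h>

#define NUM_SLOTS 3   /* hypothetical machine: ALU, MEM, FP slots per word */

typedef enum { ALU = 0, MEM = 1, FPU = 2 } unit_t;

typedef struct {
    const char *text;   /* human-readable operation                    */
    unit_t      unit;   /* functional unit the compiler assigned it to */
} op_t;

int main(void)
{
    /* Independent operations the compiler found in one code region. */
    op_t ops[] = {
        { "r1 = r2 + r3",   ALU },
        { "r4 = load [r5]", MEM },
        { "f1 = f2 * f3",   FPU },
        { "r6 = r1 - r7",   ALU },   /* ALU slot of word 0 is taken */
    };
    int n = sizeof ops / sizeof ops[0];

    /* Greedy slotting: place each op in the earliest word whose slot for
     * its unit is free.  A real compiler would also respect data hazards
     * and functional-unit latencies when choosing the word. */
    const char *words[8][NUM_SLOTS] = { { 0 } };
    int used = 0;
    for (int i = 0; i < n; i++) {
        int w = 0;
        while (words[w][ops[i].unit] != NULL) w++;   /* find a free slot */
        words[w][ops[i].unit] = ops[i].text;
        if (w + 1 > used) used = w + 1;
    }

    for (int w = 0; w < used; w++) {
        printf("word %d: ", w);
        for (int s = 0; s < NUM_SLOTS; s++)
            printf("[%s] ", words[w][s] ? words[w][s] : "nop");
        printf("\n");
    }
    return 0;
}
```

Note how the fourth operation spills into a second word whose MEM and FP slots are empty; those empty slots are exactly the NOPs discussed under code density below.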

Disadvantages of VLIW
- Compatibility across implementations is a major problem: VLIW code will not run properly on a machine with a different number of functional units or different latencies.
- Unscheduled events (e.g., a cache miss) stall the entire processor.
- Code density is another problem: slot utilization is low (mostly NOPs).
- NOPs can be reduced by compression ("flexible VLIW", "variable-length VLIW"), as sketched below.
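One common way to implement such compression is a presence-mask encoding: store a small mask saying which slots are live, then store only the non-NOP slots. The C sketch below shows the idea; the mask size, slot count, and opcode values are illustrative assumptions rather than any particular machine's format.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_SLOTS 4
#define NOP       0u

/* Compress a fixed-width bundle by emitting a presence mask followed only
 * by the non-NOP slots.  This is the basic idea behind "variable-length
 * VLIW" encodings; the exact layout here is an illustrative assumption. */
static int compress(const uint32_t bundle[NUM_SLOTS],
                    uint8_t *mask, uint32_t out[NUM_SLOTS])
{
    int count = 0;
    *mask = 0;
    for (int s = 0; s < NUM_SLOTS; s++) {
        if (bundle[s] != NOP) {
            *mask |= (uint8_t)(1u << s);   /* record which slots are live */
            out[count++] = bundle[s];
        }
    }
    return count;   /* number of 32-bit slot words actually stored */
}

int main(void)
{
    /* A sparsely filled bundle: only 2 of 4 slots hold real operations. */
    uint32_t bundle[NUM_SLOTS] = { 0x01030102u, NOP, 0x20090400u, NOP };
    uint32_t packed[NUM_SLOTS];
    uint8_t  mask;

    int live = compress(bundle, &mask, packed);
    printf("mask = 0x%02x, stored %d of %d slots (%d vs %d bytes)\n",
           mask, live, NUM_SLOTS,
           1 + live * 4, NUM_SLOTS * 4);   /* compressed vs fixed size */
    return 0;
}
```

The decoder reverses the process, expanding the mask back into a full bundle with NOPs in the missing slots, so the execution units still see a fixed-format word.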
