Superscalar Architecture
- Relatively new; first appeared in the early 1990s
- Builds on the concept of pipelining
- Superscalar architectures can process multiple instructions in one clock cycle (multiple instruction execution units)
- Allows the instruction execution rate to exceed the clock rate (CPI of less than 1)
- Intel's first use of a superscalar architecture was in its Pentium processor
- Instruction-level parallelism: instructions that do not depend on one another's outcome execute concurrently, which uses more of the available hardware resources and increases instruction throughput
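The "CPI of less than 1" claim can be made concrete with a little arithmetic. The following is a minimal sketch; the function name `cpi` and the cycle and instruction counts are hypothetical values chosen for illustration, not measurements of any real processor.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction; a value below 1.0 requires multiple issue."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical counts: a scalar pipeline completes at most one instruction
# per cycle, while a superscalar pipeline may complete more than one.
scalar_cpi = cpi(cycles=1000, instructions=1000)
superscalar_cpi = cpi(cycles=600, instructions=1000)

print("scalar CPI:", scalar_cpi)
print("superscalar CPI:", superscalar_cpi)
print("superscalar IPC:", 1 / superscalar_cpi)
```

IPC (instructions per cycle) is simply the reciprocal of CPI, so a CPI below 1 is the same statement as an IPC above 1.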
Intel P5 Microarchitecture
- Used in the initial Pentium processor
- Could execute up to 2 instructions simultaneously
- Instructions were sent through the pipeline in order: if the next two instructions had a dependency, only one was executed and the second execution unit (pipe) went unused for that clock cycle
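The in-order pairing behaviour described above can be sketched as a small cycle counter. This is a simplified model, not the actual P5 pairing rules: it assumes a dependency means the second instruction reads or writes the register the first one writes, and the names `Instr`, `depends`, and `count_cycles` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def depends(first: Instr, second: Instr) -> bool:
    """True if `second` cannot issue in the same cycle as `first`."""
    if first.dest is None:
        return False
    return first.dest in second.srcs or first.dest == second.dest


def count_cycles(program: List[Instr]) -> int:
    """Issue in program order, pairing two instructions when independent."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and not depends(program[i], program[i + 1]):
            i += 2   # both pipes used this cycle
        else:
            i += 1   # second pipe goes unused this cycle
        cycles += 1
    return cycles


program = [
    Instr("eax", ("ebx",)),
    Instr("ecx", ("edx",)),   # independent of the previous one: pairs
    Instr("esi", ("eax",)),
    Instr("edi", ("esi",)),   # needs esi from the previous one: no pairing
]
print(count_cycles(program))
```

Because issue is strictly in order, a single dependent pair costs a whole issue slot even if a later instruction could have filled it; that limitation is what out-of-order designs address.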
Intel P6 Microarchitecture
- Used in the Pentium Pro, Pentium II and Pentium III processors
- 3 instruction decoders, which break each CISC instruction (macro-op) into equivalent micro-operations (µops) for the out-of-order execution unit
- 10-stage instruction pipeline
Intel P6 Microarchitecture
- Out-of-order instruction execution: instructions without data dependency issues execute out of order for a higher level of hardware utilization
- Scheduler unit resolves data dependencies between individual instructions
- Re-Order Buffer puts instructions back in program order before their results are committed to registers and memory
- Up to 3 instructions can be retired per clock cycle
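The interaction between out-of-order completion and in-order retirement can be sketched with a small queue. This is an illustrative model only: the class `ReorderBuffer` and its methods are hypothetical, and the only figure taken from the text is the retirement width of 3.

```python
from collections import deque
from typing import Deque, List

RETIRE_WIDTH = 3  # from the text: up to 3 instructions retired together


class ReorderBuffer:
    """Tracks instructions in program order; they may complete in any order."""

    def __init__(self) -> None:
        self.entries: Deque[list] = deque()   # each entry is [name, done]

    def dispatch(self, name: str) -> None:
        self.entries.append([name, False])

    def complete(self, name: str) -> None:
        for entry in self.entries:
            if entry[0] == name:
                entry[1] = True
                return
        raise KeyError(name)

    def retire(self) -> List[str]:
        """Retire finished instructions from the head, oldest first."""
        retired: List[str] = []
        while (self.entries and self.entries[0][1]
               and len(retired) < RETIRE_WIDTH):
            retired.append(self.entries.popleft()[0])
        return retired


rob = ReorderBuffer()
for name in ("a", "b", "c", "d"):
    rob.dispatch(name)

rob.complete("c")        # younger instructions finish first
rob.complete("b")
print(rob.retire())      # nothing retires while the oldest is unfinished
rob.complete("a")
print(rob.retire())      # the three oldest retire together, in order
```

Retirement stalls on the oldest unfinished instruction, which is what keeps the visible program state consistent with in-order execution.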
- New architecture (NetBurst) used for the Intel Pentium 4 and Xeon processors
Execution Trace Cache
- Alleviates delays in fetching CISC instructions and translating them into µops
- Instructions are decoded by a translation engine, and the resulting µops are stored as traces (sequences of µops) in the execution trace cache
- Traces are stored along the predicted path of program execution, with the outcomes of branches in the code integrated into that path
- Delivers up to 3 µops per clock cycle to the core of the execution unit
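The idea of caching decoded traces so that repeated code paths skip the decoder can be sketched as a lookup table. This is a toy model under strong assumptions: the class `TraceCache`, the `toy_decode` function, and the addresses are all hypothetical, and it ignores capacity limits and branch prediction entirely.

```python
from typing import Callable, Dict, List


class TraceCache:
    """Maps a trace's start address to an already-decoded micro-op sequence."""

    def __init__(self, decode: Callable[[int], List[str]]) -> None:
        self._decode = decode
        self._traces: Dict[int, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, start_addr: int) -> List[str]:
        trace = self._traces.get(start_addr)
        if trace is None:
            # Miss: pay the decode cost once and remember the result.
            self.misses += 1
            trace = self._decode(start_addr)
            self._traces[start_addr] = trace
        else:
            # Hit: the decoder is bypassed.
            self.hits += 1
        return trace


decode_calls: List[int] = []


def toy_decode(start_addr: int) -> List[str]:
    """Stand-in for the translation engine; records each invocation."""
    decode_calls.append(start_addr)
    return [f"uop_{start_addr:#x}_{i}" for i in range(3)]


cache = TraceCache(toy_decode)
for addr in (0x100, 0x200, 0x100, 0x100):
    cache.fetch(addr)
print(cache.hits, cache.misses, len(decode_calls))
```

A loop body that executes many times is decoded once and then served from the cache, which is where the fetch and translation savings come from.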
As a result of these two key factors:
- The R8000 was in the marketplace for only about a year
- The processor was used mainly in the scientific community
Superscalar pipeline architecture of the R10000 processor. Diagram courtesy of the R10000 Microprocessor User's Manual. http://techpubs.sgi.com/library/dynaweb_docs/hdwr/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_12.html
Each decoded instruction is sent to one of 3 instruction queues:
- Address Queue (load/store instructions)
- Integer Queue (integer ALU operations)
- Floating Point Queue (floating-point arithmetic operations)
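The routing of decoded instructions into the three queues can be sketched as a simple dispatch table. The instruction class labels (`load`, `store`, `int_alu`, `fp`), the function name `route`, and the sample instructions are hypothetical; only the three queue roles come from the text.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

# Assumed mapping from instruction class to queue, following the list above.
QUEUE_FOR_CLASS = {
    "load": "address",
    "store": "address",
    "int_alu": "integer",
    "fp": "floating_point",
}


def route(decoded: Iterable[Tuple[str, str]]) -> Dict[str, Deque[str]]:
    """Place each (name, class) pair into its queue, preserving order."""
    queues: Dict[str, Deque[str]] = {
        "address": deque(),
        "integer": deque(),
        "floating_point": deque(),
    }
    for name, kind in decoded:
        queues[QUEUE_FOR_CLASS[kind]].append(name)
    return queues


queues = route([
    ("lw r1", "load"),
    ("add r2", "int_alu"),
    ("mul.d f0", "fp"),
    ("sw r2", "store"),
])
for queue_name, contents in queues.items():
    print(queue_name, list(contents))
```

Splitting by instruction class lets each queue feed its own execution units independently, so a stalled load does not block integer or floating-point work.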
PowerPC
- Direct descendant of the IBM 801, RT PC and RS/6000
- All are RISC
- The RS/6000 was the first superscalar machine in this line
- The PowerPC 601 superscalar design is similar to the RS/6000
- Later versions extend the superscalar concept
T1000 Architectures
- The T1000 architectures are reconfigurable computing architectures embedded into a superscalar processor
- They rely on a programmable functional unit (PFU) integrated into the datapath
- T1000 is assumed to be a 4-issue out-of-order machine, which helps tolerate the latencies of some data-dependent instruction sequences
- A T1000 extended instruction is encoded as a register-register operation with a specific opcode
Hobbes
A multi-threaded architecture that attempts to increase pipeline utilization by concurrently executing instructions from different threads. The architecture chosen was an aggressive, speculative, out-of-order superscalar processor based on the MIPS R2000 instruction set. The Hobbes architecture combines multi-threading with superscalar issue, on the supposition that the strengths of one should offset the weaknesses of the other. By supporting superscalar issue from more than one thread, the architecture overcomes the lack of instruction-level parallelism that plagues other superscalar structures.
Background
The Hobbes micro-architecture draws its inspiration from two widely differing architectures: multi-threaded and superscalar. It is hoped that combining the fundamental concepts of these architectures will build upon their respective strengths and compensate for their corresponding weaknesses, allowing the hybrid to be greater than the sum of its parts.
Multi-threaded Architectures
Multi-threaded processors can concurrently execute instructions from more than one thread. The contexts of multiple threads are stored on-board, which allows instructions to be issued from different threads. Traditional multi-threaded architectures have usually implemented a round-robin execution strategy that switches instruction execution to a new thread every cycle.
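The round-robin strategy described above can be sketched as a per-cycle thread selector. This is a minimal model under stated assumptions: one instruction issues per cycle, each thread is just a list of instruction names, and the function `round_robin_issue` is hypothetical.

```python
from typing import List, Optional, Tuple


def round_robin_issue(threads: List[List[str]],
                      cycles: int) -> List[Tuple[int, Optional[str]]]:
    """Issue one instruction per cycle, moving to the next thread each cycle."""
    next_index = [0] * len(threads)   # per-thread position, part of its context
    issued: List[Tuple[int, Optional[str]]] = []
    for cycle in range(cycles):
        tid = cycle % len(threads)
        if next_index[tid] < len(threads[tid]):
            issued.append((tid, threads[tid][next_index[tid]]))
            next_index[tid] += 1
        else:
            issued.append((tid, None))   # empty slot: thread has nothing left
    return issued


schedule = round_robin_issue([["a0", "a1"], ["b0"]], cycles=4)
for tid, instr in schedule:
    print(tid, instr)
```

The empty slot in the last cycle shows the weakness of strict round-robin: a thread's turn is wasted when it has no instruction ready, even if another thread could have used it.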
The thread unit consists of a fetch buffer, issue buffer, decode logic, branch adder and the thread state storage.
The register file is very similar to that found on the R2000. The register file has two write ports and both of these may be from the same thread. Branches which do not affect the register file are executed in the thread unit and are not issued to the execution unit.
Execution Units
- Integer: 2 ALUs, shifter, multiply/divide, load/store, data cache interface
- FP: FP convert, FP add, FP multiply, FP divide
Superscalar Architecture
Superscalar processors improve performance by reducing the average number of cycles required to execute each instruction. This is accomplished by issuing and executing more than one independent instruction per cycle, rather than limiting execution to just one instruction per cycle as traditional pipelined architectures do. For superscalar architectures to achieve a speed-up over traditional pipelined architectures, the average level of available instruction-level parallelism must be greater than one.
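One common way to estimate the available instruction-level parallelism is to divide the number of instructions by the length of the longest dependency chain. The sketch below uses that estimate and assumes unit latency for every instruction; the function name `available_ilp` and the dependency lists are hypothetical.

```python
from typing import List


def available_ilp(deps: List[List[int]]) -> float:
    """deps[i] lists the indices of earlier instructions that i depends on.

    Returns instruction count divided by critical-path length.
    """
    if not deps:
        return 0.0
    depth: List[int] = []
    for i, needed in enumerate(deps):
        if any(j >= i for j in needed):
            raise ValueError("an instruction may only depend on earlier ones")
        # An instruction can start one step after its latest producer.
        depth.append(1 + max((depth[j] for j in needed), default=0))
    return len(deps) / max(depth)


serial_chain = [[], [0], [1], [2]]    # each instruction needs the previous one
independent = [[], [], [], []]        # no dependencies at all
mixed = [[], [], [0, 1], [2]]         # two independent, then a short chain

print(available_ilp(serial_chain))
print(available_ilp(independent))
print(available_ilp(mixed))
```

A fully serial chain gives a value of exactly one, so extra issue slots cannot help it; only code with a value above one can benefit from superscalar issue.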
References
Hennessy, John L. and Patterson, David A. Computer Organization and Design: The Hardware/Software Interface. San Francisco: Morgan Kaufmann Publishers, 1998.
Sarimento, Sara. Recent History of Intel Architecture: A Refresher. 17 April 2004. Intel Corporation, www.intel.com. 18 April 2004. http://www.intel.com/cd/ids/developer/asmona/eng/microprocessors/ia32/pentium4/optimization/44015.htm
Zhou & Martonosi. Augmenting Modern Superscalar Architectures with Configurable Extended Instructions. 19 April 2004. http://ipdps.eece.unm.edu/2000/raw/18000943.pdf
Kish & Preiss. Hobbes: A Multi-Threaded Superscalar Architecture. 19 April 2004. http://www.brpreiss.com/page75.html
R10000 Processor User's Manual. 9 Dec 1996. SGI Corporation. 22 April 2004. http://techpubs.sgi.com/library/dynaweb_docs/hdwr/SGI_Developer/books/R10K_UM/sgi_html/index.html#HEADING1
MIPS Architecture. 17 April 2004. Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Main_Page. 23 April 2004. http://en.wikipedia.org/wiki/MIPS_architecture
Mapleson, Ian. Indigo 2 and Power Indigo 2 Technical Report. SiliconGraphics. 23 April 2004. http://sgi.cartsys.net/i2sec7.html
Power PC Architecture. 23 April 2004. http://www1.ibm.com/servers/eserver/pseries/hardware/whitepapers/power/ppc_arch.html