UPenn's CIS501 (Martin/Roth): Technology, cost, performance, power, and reliability
• Computer architecture
• Definition of ISA to facilitate implementation of software layers
• CIS 501 mostly about computer micro-architecture
• Design CPU, Memory, I/O to implement ISA …
Shaping Force: Technology
• Basic technology element: MOSFET
• Invention of 20th century
• MOS: metal-oxide-semiconductor
• Conductor, insulator, semi-conductor
• FET: field-effect transistor
• Solid-state component acts like electrical switch
• Channel conducts source→drain when voltage applied to gate
• Channel length: characteristic parameter (short → fast)
[Figure: MOSFET with labels gate, source, drain, channel]

Complementary MOS (CMOS)
• Voltages as values
• Power (VDD) = 1, Ground = 0
• Two kinds of MOSFETs
• N-transistors
• Conduct when gate voltage is 1
• Good at passing 0s
• P-transistors
• Conduct when gate voltage is 0
[Figure: CMOS circuit with labels power (1), p-transistor, input, output ("node"), n-transistor]
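The switching rules for N- and P-transistors can be sketched in a few lines of Python. This is a minimal illustration, assuming the circuit in the figure is the usual inverter arrangement (p-transistor between power and the output node, n-transistor between the output node and ground, both gates tied to the input). The function names are hypothetical and not part of the slides.

```python
def n_conducts(gate: int) -> bool:
    """N-transistor: conducts when the gate voltage is 1."""
    return gate == 1


def p_conducts(gate: int) -> bool:
    """P-transistor: conducts when the gate voltage is 0."""
    return gate == 0


def cmos_inverter(inp: int) -> int:
    """Output node of a p-over-n transistor pair sharing one input (assumed wiring)."""
    if inp not in (0, 1):
        raise ValueError("input must be 0 or 1")
    pulls_up = p_conducts(inp)    # p-transistor connects output to power (1)
    pulls_down = n_conducts(inp)  # n-transistor connects output to ground (0)
    # For any valid input exactly one of the two paths conducts.
    assert pulls_up != pulls_down
    return 1 if pulls_up else 0


for value in (0, 1):
    print(value, "->", cmos_inverter(value))
```

Because the two transistor types respond to opposite gate values, the output is always driven to either power or ground, never both at once.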
Aside: VLSI + Manufacturing
• VLSI (very large scale integration)
• MOSFET manufacturing process
• As important as invention of MOSFET itself
• Multi-step photochemical and electrochemical process
• Fixed cost per step
• Cost per transistor shrinks with transistor size
• Other production costs
• Packaging
• Test
• Mask set
• Design

MOSFET Side View
[Figure: side view with labels gate, insulator, source, channel, drain, substrate]
• MOS: three materials needed to make a transistor
• Metal - Aluminum, Tungsten, Copper: conductor
• Oxide - Silicon Dioxide (SiO2): insulator
• Semiconductor - doped Si: conducts under certain conditions
• FET: field effect (the mechanism) transistor
• Voltage on gate: current flows source to drain (transistor on)
• No voltage on gate: no current (transistor off)
Manufacturing Process
• Grow SiO2
• Grow photo-resist
• Burn "wire-level-1" mask
• Dissolve unburned photo-resist
• And underlying SiO2
• Grow copper "wires"
• Dissolve remaining photo-resist
• Continue with next wire layer…
• Typical number of wire layers: 3-6

Defects
[Figure: three example circuits, labeled Defective, Defective, Slow]
• Defects can arise
• Under-/over-doping
• Over-/under-dissolved insulator
• Mask mis-alignment
• Particle contaminants
• Try to minimize defects
• Process margins
• Design rules
• Minimal transistor size, separation
• Or, tolerate defects
• Redundant or "spare" memory cells
Unit Cost: Integrated Circuit (IC)
• Chips built in multi-step chemical processes on wafers
• Cost / wafer is constant, f(wafer size, number of steps)
• Chip (die) cost grows with area
• Larger chips means fewer of them
• Larger chips means fewer working ones
• Why? Uniform defect density
• Chip cost ~ (chip area)^α
• α = 2–3
• Wafer yield: % wafer that is chips
• Die yield: % chips that work
• Yield is increasingly non-binary - fast vs slow chips

Yield/Cost Examples
• Parameters
• wafer yield = 90%, α = 2, defect density = 2/cm2

Wafer rows give dies per wafer, with working dies in parentheses.

Die size (mm2)   100      144      196      256      324      400
Die yield        23%      19%      16%      12%      11%      10%
6" Wafer         139(31)  90(16)   62(9)    44(5)    32(3)    23(2)
8" Wafer         256(59)  177(32)  124(19)  90(11)   68(7)    52(5)
10" Wafer        431(96)  290(53)  206(32)  153(20)  116(13)  90(9)

               Wafer   Defect   Area    Dies   Yield   Die    Package       Test   Total
               Cost    (/cm2)   (mm2)                  Cost   Cost (pins)   Cost
Intel 486DX2   $1200   1.0      81      181    54%     $12    $11 (168)     $12    $35
IBM PPC601     $1700   1.3      196     66     27%     $95    $3 (304)      $21    $119
DEC Alpha      $1500   1.2      234     53     19%     $149   $30 (431)     $23    $202
Intel Pentium  $1500   1.5      296     40     9%      $417   $19 (273)     $37    $473
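The per-chip figures in the cost table can be reproduced with simple arithmetic. The sketch below assumes that die cost is the wafer cost divided by the number of working dies (dies per wafer times die yield), and that the total is die cost plus package cost plus test cost. The slides do not state these formulas explicitly, but they match the table rows after rounding. Function and variable names are illustrative.

```python
def die_cost(wafer_cost: float, dies_per_wafer: int, die_yield: float) -> float:
    """Cost of one working die: wafer cost spread over the dies that work."""
    return wafer_cost / (dies_per_wafer * die_yield)


def total_cost(die: float, package: float, test: float) -> float:
    """Unit cost as the sum of die, package, and test costs."""
    return die + package + test


# (wafer cost $, dies per wafer, die yield, package cost $, test cost $) from the table
chips = {
    "Intel 486DX2": (1200, 181, 0.54, 11, 12),
    "IBM PPC601": (1700, 66, 0.27, 3, 21),
    "DEC Alpha": (1500, 53, 0.19, 30, 23),
    "Intel Pentium": (1500, 40, 0.09, 19, 37),
}

for name, (wafer, dies, yld, package, test) in chips.items():
    die = round(die_cost(wafer, dies, yld))
    print(f"{name}: die ${die}, total ${total_cost(die, package, test)}")
```

The Pentium row shows how strongly yield dominates: its wafer costs about the same as the Alpha's, but only 9% of its 40 dies work, so each good die carries far more of the wafer cost.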
Performance
• Two definitions
• Latency (execution time): time to finish a fixed task
• Throughput (bandwidth): number of tasks in fixed time
• Very different: throughput can exploit parallelism, latency cannot
• Baking bread analogy
• Often contradictory
• Choose definition that matches goals (most frequently throughput)
• Example: move people from A to B, 10 miles
• Car: capacity = 5, speed = 60 miles/hour
• Bus: capacity = 60, speed = 20 miles/hour
• Latency: car = 10 min, bus = 30 min
• Throughput: car = 15 PPH (count return trip), bus = 60 PPH

Performance Improvement
• Processor A is X times faster than processor B if
• Latency(P,A) = Latency(P,B) / X
• Throughput(P,A) = Throughput(P,B) * X
• Processor A is X% faster than processor B if
• Latency(P,A) = Latency(P,B) / (1+X/100)
• Throughput(P,A) = Throughput(P,B) * (1+X/100)
• Car/bus example
• Latency? Car is 3 times (and 200%) faster than bus
• Throughput? Bus is 4 times (and 300%) faster than car
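The car/bus numbers follow directly from the two definitions. The sketch below recomputes them, assuming (as the slide notes) that throughput counts the empty return trip, so each load of passengers costs one round trip. The helper names are illustrative.

```python
def latency_minutes(distance_miles: float, speed_mph: float) -> float:
    """Time for one one-way trip, in minutes."""
    return distance_miles * 60 / speed_mph


def throughput_pph(capacity: int, distance_miles: float, speed_mph: float) -> float:
    """People per hour, charging each load a full round trip."""
    return capacity * speed_mph / (2 * distance_miles)


def times_faster(better: float, worse: float) -> float:
    """Ratio X such that the better value is 'X times' the worse one."""
    return better / worse


def percent_faster(x_times: float) -> float:
    """Convert 'X times faster' into 'X% faster': X% = (X - 1) * 100."""
    return (x_times - 1) * 100


DISTANCE = 10  # miles from A to B
car_lat = latency_minutes(DISTANCE, 60)
bus_lat = latency_minutes(DISTANCE, 20)
car_tp = throughput_pph(5, DISTANCE, 60)
bus_tp = throughput_pph(60, DISTANCE, 20)

# For latency, lower is better, so the ratio is slower time over faster time.
lat_ratio = times_faster(bus_lat, car_lat)
# For throughput, higher is better.
tp_ratio = times_faster(bus_tp, car_tp)

print(f"latency: car {car_lat} min, bus {bus_lat} min, "
      f"car is {lat_ratio}x ({percent_faster(lat_ratio)}%) faster")
print(f"throughput: car {car_tp} PPH, bus {bus_tp} PPH, "
      f"bus is {tp_ratio}x ({percent_faster(tp_ratio)}%) faster")
```

The same pair of vehicles gives opposite answers depending on the metric, which is why the definition has to be chosen to match the goal.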
Other Benchmarks
• Parallel benchmarks
• SPLASH2 - Stanford Parallel Applications for Shared Memory
• NAS
• SPEC's OpenMP benchmarks
• SPECjbb - Java multithreaded database-like workload
• Transaction Processing Performance Council (TPC)
• TPC-C: On-line transaction processing (OLTP)
• TPC-H/R: Decision support systems (DSS)
• TPC-W: E-commerce database backend workload
• Have parallelism (intra-query and inter-query)
• Heavy I/O and memory components

Adding/Averaging Performance Numbers
• You can add latencies, but not throughput
• Latency(P1+P2,A) = Latency(P1,A) + Latency(P2,A)
• Throughput(P1+P2,A) != Throughput(P1,A) + Throughput(P2,A)
• 1 mile @ 30 miles/hour + 1 mile @ 90 miles/hour
• Average is not 60 miles/hour
• 0.033 hours at 30 miles/hour + 0.011 hours at 90 miles/hour
• Average is only 45 miles/hour! (2 miles / (0.033 + 0.011 hours))
• Throughput(P1+P2,A) = 1 / [(1/Throughput(P1,A)) + (1/Throughput(P2,A))]
• Same goes for means (averages)
• Arithmetic: $\frac{1}{N} \sum_{P=1}^{N} \mathrm{Latency}(P)$
• For units that are proportional to time (e.g., latency)
• Harmonic: $N \,/\, \sum_{P=1}^{N} \frac{1}{\mathrm{Throughput}(P)}$
• For units that are inversely proportional to time (e.g., throughput)
• Geometric: $\sqrt[N]{\prod_{P=1}^{N} \mathrm{Speedup}(P)}$
• For unitless quantities (e.g., speedups)
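The three means can be written out directly from their definitions. The sketch below is illustrative: the function names are not from the slides, and the geometric-mean inputs are made-up speedups. It applies the harmonic mean to the two-leg driving example, where each leg covers the same distance.

```python
import math


def arithmetic_mean(latencies: list[float]) -> float:
    """(1/N) * sum of values; for quantities proportional to time."""
    return sum(latencies) / len(latencies)


def harmonic_mean(throughputs: list[float]) -> float:
    """N / sum of reciprocals; for quantities inversely proportional to time."""
    return len(throughputs) / sum(1 / t for t in throughputs)


def geometric_mean(speedups: list[float]) -> float:
    """N-th root of the product; for unitless ratios."""
    return math.prod(speedups) ** (1 / len(speedups))


speeds = [30, 90]  # miles/hour over two legs of 1 mile each
print("arithmetic (misleading for speeds):", arithmetic_mean(speeds))
print("harmonic (actual average speed):", harmonic_mean(speeds))
print("geometric mean of speedups 2x and 8x (hypothetical):", geometric_mean([2, 8]))
```

The harmonic mean weights the slow leg more heavily because more time is spent there, which is why the average speed lands well below 60 miles/hour.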
Another CPI Example
• Assume a processor with instruction frequencies and costs
• Integer ALU: 50%, 1 cycle
• Load: 20%, 5 cycles

Increasing Clock Frequency: Pipelining
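Returning to the CPI example: CPI for an instruction mix is a frequency-weighted sum of per-class cycle costs. Only the integer ALU and load classes survive in this copy of the slide, so the sketch below computes just their contribution and leaves the remaining 30% of the mix unspecified rather than guessing at it. The function name is illustrative.

```python
def cpi_contribution(mix: dict[str, tuple[float, int]]) -> float:
    """Sum of frequency * cycles over the listed instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


# (frequency, cycles) for the two classes given on the slide.
known_mix = {
    "integer ALU": (0.50, 1),
    "load": (0.20, 5),
}

partial_cpi = cpi_contribution(known_mix)
covered = sum(freq for freq, _ in known_mix.values())

print(f"classes listed cover {covered:.0%} of instructions")
print(f"their contribution to CPI: {partial_cpi}")
```

Loads are only 20% of instructions here but contribute twice as many cycles as the integer ALU operations, which make up 50%.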
Measuring CPI
• How are CPI and execution-time actually measured?
• Execution time: time (Unix): wall clock + CPU + system
• CPI = (CPU time * clock frequency) / dynamic insn count
• How is dynamic instruction count measured?
• More useful is CPI breakdown (CPI_CPU, CPI_MEM, etc.)
• So we know what performance problems are and what to fix
• CPI breakdowns
• Hardware event counters
• Calculate CPI using counter frequencies/event costs
• Cycle-level micro-architecture simulation (e.g., SimpleScalar)
+ Measure exactly what you want
+ Measure impact of potential fixes
• Must model micro-architecture faithfully
• Method of choice for many micro-architects (and you)

Improving CPI
• CIS501 is more about improving CPI than frequency
• Historically, clock accounts for 70%+ of performance improvement
• Achieved via deeper pipelines
• That will (have to) change
• Deep pipelining is not power efficient
• Physical speed limits are approaching
• 1GHz: 1999, 2GHz: 2001, 3GHz: 2002, 4GHz? almost 2006
• Techniques we will look at
• Caching, speculation, multiple issue, out-of-order issue
• Vectors, multiprocessing, more…
• Moore helps because CPI reduction requires transistors
• The definition of parallelism is "more transistors"
• But best example is caches
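The two measurement approaches under Measuring CPI can be sketched numerically. The first function derives CPI from measured CPU time, taking clock frequency in cycles per second, so that cycles = CPU time × frequency. The second builds a CPI breakdown from event counts and per-event costs. All numbers and names below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def measured_cpi(cpu_time_s: float, clock_hz: float, dynamic_insns: float) -> float:
    """CPI from a timing run: total cycles divided by dynamic instruction count."""
    cycles = cpu_time_s * clock_hz
    return cycles / dynamic_insns


def cpi_breakdown(base_cpi: float, events: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Per-component CPI: each event adds (events per insn) * (cycles per event)."""
    parts = {"base": base_cpi}
    for name, (per_insn, cost_cycles) in events.items():
        parts[name] = per_insn * cost_cycles
    return parts


# Hypothetical run: 2 s of CPU time at 3 GHz executing 4 billion instructions.
print("measured CPI:", measured_cpi(2.0, 3e9, 4e9))

# Hypothetical counters: 0.02 cache misses per instruction, 100 cycles each.
parts = cpi_breakdown(1.0, {"memory": (0.02, 100)})
print("breakdown:", parts, "total:", sum(parts.values()))
```

The breakdown form is the more useful of the two because it shows which component dominates and therefore what is worth fixing.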
• For actual performance X, machine capability must be > X
[Figure: performance by year, 1982 to 1994; Intel x86, 35%/yr]
Transistor Speed, Power, and Reliability
• Transistor characteristics and scaling impact:
• Switching speed
• Power
• Reliability

Transistors and Wires
[Figure: IBM SOI Technology. ©IBM. From slides © Krste Asanović, MIT]
[Figure: IBM CMOS7, 6 layers of copper wiring. ©IBM. From slides © Krste Asanović, MIT]
Resistance
• Transistor channel resistance
• function of Vg (gate voltage)
• Wire resistance (negligible for short wires)

Capacitance
• Source/Drain capacitance
• Gate capacitance
• Wire capacitance (negligible for short wires)
Reliability
• Mean Time Between Failures (MTBF)
• How long before you have to reboot or buy a new one
• Not very quantitative yet, people just starting to think about this

Moore's Bad Effect on Reliability
• CMOS devices: CPU and memory
• Historically almost perfectly reliable
• Moore has made them less reliable over time
Summary
• What is computer architecture
• Abstraction and layering: interface and implementation, ISA
• Shaping forces: application and semiconductor technology
• Moore's Law
• Cost
• Unit and startup
• Performance
• Latency and throughput
• CPU performance equation: execution time = insn count * CPI * clock period
• Power and energy
• Dynamic and static power
• Reliability
[Figure: layer stack with labels Application; OS; Compiler, Firmware; CPU, I/O, Memory; Digital Circuits; Gates & Transistors]

CIS501
• CIS501: Computer Architecture
• Mostly about micro-architecture
• Mostly about CPU/Memory
• Mostly about general-purpose
• Mostly about performance
• We'll still only scratch the surface
• Next time
• Instruction set architecture