1. From Components to Applications
2. Computer Systems and Their Parts
3. Generations of Progress
4. Processor and Memory Technologies
5. Peripherals, I/O, and Communications
6. Software Systems and Applications
Figure 3.1 [Diagram spanning the low-level view (electronic components) to the high-level view (application domains), across hardware and software. Designer roles shown: circuit designer, logic designer, computer designer, system designer, application designer.]
Figure 3.2 Like a building architect, whose place at the engineering/arts and goals/means interfaces is seen in this diagram, a computer architect reconciles many conflicting or competing demands.
Figure 3.3 The space of computer systems, with what we normally mean by the word "computer" highlighted. [Classification axes: analog vs. digital; fixed-function vs. stored-program; electronic vs. nonelectronic; general-purpose vs. special-purpose; number cruncher vs. data manipulator.]
Computer Architecture, Background and Motivation 5
Price/Performance Pyramid
Class         Price range
Super         $Millions
Mainframe     $100s Ks
Server        $10s Ks
Workstation   $1000s
Personal      $100s
Embedded      $10s
Differences in scale, not in substance
Figure 3.4
Figure 3.5 Embedded computers are ubiquitous, yet invisible. They are found in our automobiles, appliances, and many other places. [Automobile examples labeled: airbags, brakes, engine.]
Figure 3.6 Notebooks, a common class of portable computers, are much smaller than desktops but offer substantially the same capabilities. What are the main reasons for the size difference?
Figure 3.7 The (three, four, five, or) six main units of a digital computer. Usually, the link unit (a simple bus or a more elaborate network) is not explicitly included in such diagrams.
Generations of Progress
Table 3.2 The 5 generations of digital computers, and their ancestors.
Generation (begun): 0 (1600s), 1 (1950s), 2 (1960s), 3 (1970s), 4 (1980s), 5 (1990s)
Figure 3.8 [Diagram labels: dicer, die, die tester, part tester; dimensions marked ~1 cm.]
Figure 3.9 [Example wafer: 26 dies, 15 good.]
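$$\text{Die yield} \overset{\text{def}}{=} \frac{\text{number of good dies}}{\text{total number of dies}}$$
$$\text{Die yield} = \text{Wafer yield} \times \left[1 + \frac{\text{Defect density} \times \text{Die area}}{a}\right]^{-a}$$
$$\text{Die cost} = \frac{\text{cost of wafer}}{\text{total number of dies} \times \text{die yield}} = \frac{\text{cost of wafer} \times (\text{die area} / \text{wafer area})}{\text{die yield}}$$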
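The yield and cost formulas can be tried out numerically. The following Python sketch is only an illustration: the function names are hypothetical, and the wafer cost of 1300 is an assumed figure, not a value from the slides. Only the 26-die, 15-good wafer comes from Figure 3.9.

```python
def empirical_die_yield(good_dies, total_dies):
    """Die yield by definition: good dies divided by total dies."""
    return good_dies / total_dies


def modeled_die_yield(wafer_yield, defect_density, die_area, a):
    """Die yield = wafer yield * [1 + (defect density * die area) / a]^(-a)."""
    return wafer_yield * (1 + defect_density * die_area / a) ** (-a)


def die_cost(wafer_cost, total_dies, die_yield):
    """Die cost = wafer cost / (total number of dies * die yield)."""
    return wafer_cost / (total_dies * die_yield)


# Wafer from Figure 3.9: 26 dies, 15 good.
yield_fig_3_9 = empirical_die_yield(15, 26)

# Assumed wafer cost (illustrative only).
cost_per_good_die = die_cost(1300.0, 26, yield_fig_3_9)

print(f"die yield = {yield_fig_3_9:.3f}")
print(f"cost per good die = {cost_per_good_die:.2f}")
```

Because total dies times die yield equals the number of good dies, the die cost reduces to the wafer cost divided by the number of good dies.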
Figure 3.11 [Packaging diagrams. (a) 2D or 2.5D packaging, now common; labels: bus, CPU, connector, memory. Second panel: stacked layers glued together.]
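Moore's Law
[Plot: processor performance (kIPS, MIPS, GIPS, TIPS) and memory chip capacity (kb, Mb, Gb, Tb) versus calendar year, 1980 to 2010. Processor trend: ×1.6/yr, i.e., ×2 per 18 months or ×10 per 5 years; data points: 68000, 80286, 80386, 80486, 68040, Pentium, Pentium II, R10000. Memory trend: ×4 per 3 years; data points: 64kb, 256kb, 1Mb, 4Mb, 16Mb, 64Mb, 256Mb, 1Gb.]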
Figure 3.12 [Storage devices. (a) Cutaway view of a hard disk drive (dimension marked: typically 2-9 cm). Other items shown: floppy disk, CD-ROM, ...]
Communication Technologies
Figure 3.13 Latency and bandwidth characteristics of different classes of communication links. [Axes: bandwidth (b/s, up to 10^12) versus latency (s, from nanoseconds to hours). Link classes: processor bus, I/O network, system-area network (SAN), local-area network (LAN), metro-area network (MAN), geographically distributed networks.]
Software
Application: word processor, spreadsheet, circuit simulator, ...
Translator: MIPS assembler, C compiler, ...
Coordinator: scheduling, load balancing, diagnostics, ...
Manager: virtual memory, security, file system, ...
Figure 3.15
Figure 3.14 [Diagram labels: compiler, assembler, interpreter; one translation step is marked "mostly one-to-one".]
Computer Performance
1. Cost, Performance, and Cost/Performance
2. Defining Computer Performance
3. Performance Enhancement and Amdahl's Law
4. Performance Measurement vs Modeling
5. Reporting Computer Performance
6. The Quest for Higher Performance
[Plot: computer cost, from $1 to $1 M, versus calendar year, 1960 to 2020.]
Cost/Performance
Figure 4.1 [Performance plotted against cost. Curves labeled: superlinear (economy of scale); linear (ideal?).]
Figure 4.2 Pipeline analogy shows that imbalance between processing power and I/O capabilities leads to a performance bottleneck. [Stages: input, processing, output; case shown: I/O-bound task.]
DC-8-50
Range (km)   Speed (km/h)   Price ($M)
8 300        895            120
6 700        980            200
12 300       885            120
7 450        980            180
6 400        2 200          350
14 000       875            80
$$\text{CPU time} = \text{Instructions} \times (\text{Cycles per instruction}) \times (\text{Secs per cycle}) = \frac{\text{Instructions} \times \text{CPI}}{\text{Clock rate}}$$
Instruction count, CPI, and clock rate are not completely independent, so improving one by a given factor may not lead to overall execution time improvement by the same factor.
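To make the interdependence concrete, here is a minimal Python sketch of the CPU time equation. The function name `cpu_time` and all workload numbers are assumptions chosen for illustration. In particular, the rise in CPI that accompanies the faster clock is a made-up scenario, not data from the slides.

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU time in seconds = instructions * CPI / clock rate."""
    return instructions * cpi / clock_rate_hz


# Hypothetical baseline: 1e9 instructions, CPI of 1.5, 1 GHz clock.
base = cpu_time(1e9, 1.5, 1e9)

# Hypothetical redesign: the clock doubles to 2 GHz, but CPI is assumed
# to rise to 2.0 (for example, because memory is not any faster).
faster_clock = cpu_time(1e9, 2.0, 2e9)

print(f"baseline: {base:.2f} s, faster clock: {faster_clock:.2f} s")
print(f"speedup: {base / faster_clock:.2f} (clock rate improved by 2.00)")
```

Under these assumed numbers the clock rate doubles but the execution time improves only by a factor of 1.5.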
Average CPI:
Clock rate:
Each "for" consists of two instructions: increment index, check exit condition.
Innermost loop: (2 + 10 instructions) × 100 iterations = 1200 instructions in all
Middle loop: (2 + 40 + 1200 instructions) × 100 iterations = 124,200 instructions in all
Outermost loop: (2 + 20 + 124,200 instructions) × 100 iterations = 12,422,200 instructions in all
Total: 12,422,450 instructions
for i = 1, n
while x > 0
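The dynamic instruction count above can be reproduced with a short Python sketch. The helper name `loop_total` is hypothetical. The 250 instructions outside the loops are an assumption inferred from the difference between the stated total (12,422,450) and the outermost-loop count (12,422,200).

```python
LOOP_OVERHEAD = 2   # per iteration: increment index, check exit condition
ITERATIONS = 100    # each of the three nested loops runs 100 times


def loop_total(body_instructions, nested_total=0, iterations=ITERATIONS):
    """Instructions executed by one loop, including any loop nested in it."""
    per_iteration = LOOP_OVERHEAD + body_instructions + nested_total
    return per_iteration * iterations


inner = loop_total(10)            # (2 + 10) * 100
middle = loop_total(40, inner)    # (2 + 40 + 1200) * 100
outer = loop_total(20, middle)    # (2 + 20 + 124,200) * 100

# Assumption: instructions outside the loops, inferred from the totals.
OUTSIDE_LOOPS = 250
total = outer + OUTSIDE_LOOPS

print(inner, middle, outer, total)
```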
Solution
Figure 4.3 [Diagram labels: 1 GHz; 2 GHz; 20 steps.]
In this example, addition time does not improve in going from a 1 GHz clock to a 2 GHz clock.
$$s = \frac{1}{f + (1 - f)/p} \le \min(p,\ 1/f)$$
where f = fraction unaffected and p = speedup of the rest.
Figure 4.4 Amdahl's law: speedup achieved if a fraction f of a task is unaffected and the remaining 1 − f part runs p times as fast. [Plot: speedup (s) versus enhancement factor (p), both axes from 0 to 50, with curves for f = 0.01, 0.02, 0.05, and 0.1.]
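As a quick numerical check of Amdahl's formula and its upper bound min(p, 1/f), consider the following Python sketch. The function names are hypothetical; the values of f are those labeled on the curves of Figure 4.4, and p = 50 is the right edge of the plotted range.

```python
def amdahl_speedup(f, p):
    """Speedup when a fraction f is unaffected and the rest runs p times as fast."""
    return 1.0 / (f + (1.0 - f) / p)


def speedup_bound(f, p):
    """Upper bound on the speedup: min(p, 1/f)."""
    return min(p, 1.0 / f) if f > 0 else p


for f in (0.01, 0.02, 0.05, 0.1):
    s = amdahl_speedup(f, 50)
    print(f"f = {f:<4}  s = {s:6.2f}  bound = {speedup_bound(f, 50):6.2f}")
```

Even with an enhancement factor of 50, an unaffected fraction of only 0.1 keeps the speedup below 10.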
Figure 4.5 [Chart labels: programs A, B, C, D, E, F; Machine 2, Machine 3.]
Performance Benchmarks
Example 4.3 You are an engineer at Outtel, a start-up aspiring to compete with Intel via its new processor design that outperforms the latest Intel processor by a factor of 2.5 on floating-point instructions. This level of performance was achieved by design compromises that led to a 20% increase in the execution time of all other instructions. You are in charge of choosing benchmarks that would showcase Outtel's performance edge.
a. What is the minimum required fraction f of time spent on floating-point instructions in a program on the Intel processor to show a speedup of 2 or better for Outtel?
Solution
a. We use a generalized form of Amdahl's formula in which a fraction f is sped up by a given factor (2.5) and the rest is slowed down by another factor (1.2):
$$\frac{1}{1.2(1 - f) + f/2.5} \ge 2 \;\Rightarrow\; 1.2 - 0.8f \le 0.5 \;\Rightarrow\; f \ge 0.875$$
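The solution to Example 4.3 can be checked with a small Python sketch of the generalized Amdahl formula. The function and parameter names are hypothetical; the factors 2.5 and 1.2 and the target speedup of 2 come from the example.

```python
def relative_time(f, fp_speedup=2.5, other_slowdown=1.2):
    """New execution time relative to the old one.

    A fraction f of the original time is sped up by fp_speedup;
    the remaining 1 - f is slowed down by other_slowdown.
    """
    return other_slowdown * (1 - f) + f / fp_speedup


def min_fp_fraction(target_speedup, fp_speedup=2.5, other_slowdown=1.2):
    """Smallest f with 1 / relative_time(f) >= target_speedup.

    Obtained by solving other_slowdown*(1 - f) + f/fp_speedup = 1/target for f.
    """
    return (other_slowdown - 1 / target_speedup) / (other_slowdown - 1 / fp_speedup)


f_min = min_fp_fraction(2)
print(f"minimum floating-point fraction: {f_min:.3f}")
print(f"speedup at that fraction: {1 / relative_time(f_min):.3f}")
```

A program must therefore spend at least 87.5% of its time on floating-point instructions before the slowdown of everything else stops masking the gain.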
Performance Estimation
$$\text{Average CPI} = \sum_{\text{all instruction classes}} (\text{Class-}i\text{ fraction}) \times (\text{Class-}i\text{ CPI})$$
$$\text{Machine cycle time} = 1 / \text{Clock rate}$$
$$\text{CPU execution time} = \text{Instructions} \times (\text{Average CPI}) / (\text{Clock rate})$$
Table 4.3 Usage frequency, in percentage, for various instruction classes in four representative applications.
Application               Usage (%) per instruction class
Data compression          25  32  16   0  19   8
C language compiler       37  28  13   0  13   9
Reactor simulation        32  17   2  34   9   6
Atomic motion modeling    37   5   1  42  10   4
a. What are the peak performances of M1 and M2 in MIPS?
b. If 50% of instructions executed are class-N, with the rest divided equally among F and I, which machine is faster? By what factor?
Solution
a. Peak MIPS for M1 = 600 / 2.0 = 300; for M2 = 500 / 2.0 = 250
b. Average CPI for M1 = 5.0 / 4 + 2.0 / 4 + 2.4 / 2 = 2.95; for M2 = 4.0 / 4 + 3.8 / 4 + 2.0 / 2 = 2.95. The average CPIs are equal, so M1 is faster by the ratio of the clock rates, 600 / 500 = 1.2.
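The two answers can be recomputed with the Python sketch below. The clock rates (600 and 500 MHz) and per-class CPIs are read off the solution's arithmetic. Which of the two equally weighted classes, F or I, carries which CPI is an assumption based on the order of the terms; it does not affect the result. The function names are hypothetical.

```python
# CPIs read from the solution; the F/I assignment is assumed from term order.
CPI = {
    "M1": {"F": 5.0, "I": 2.0, "N": 2.4},
    "M2": {"F": 4.0, "I": 3.8, "N": 2.0},
}
CLOCK_MHZ = {"M1": 600, "M2": 500}
MIX = {"F": 0.25, "I": 0.25, "N": 0.50}  # 50% class N, rest split equally


def peak_mips(machine):
    """Peak rate: every instruction belongs to the class with the lowest CPI."""
    return CLOCK_MHZ[machine] / min(CPI[machine].values())


def average_cpi(machine):
    """Mix-weighted average CPI."""
    return sum(MIX[c] * CPI[machine][c] for c in MIX)


def mips(machine):
    """Delivered MIPS for the given instruction mix."""
    return CLOCK_MHZ[machine] / average_cpi(machine)


for m in ("M1", "M2"):
    print(m, peak_mips(m), round(average_cpi(m), 2), round(mips(m), 1))
print("M1 / M2 speed ratio:", round(mips("M1") / mips("M2"), 2))
```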
a. What are the run times of the two programs with a 1 GHz clock?
b. Which compiler produces faster code and by what factor?
c. Which compiler's output runs at a higher MIPS rate?
Solution
a. Running time 1 (2) = (600M × 1 + 400M × 2) / 10^9 = 1.4 s (1.2 s)
b. Compiler 2's output runs 1.4 / 1.2 = 1.17 times as fast
c. MIPS rating for compiler 1, with CPI = 1.4 (compiler 2, with CPI = 1.5) = 1000 / 1.4 = 714 (1000 / 1.5 = 667). Compiler 1's output has the higher MIPS rating even though it takes longer to run.
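The following Python sketch reworks the compiler comparison to show how a higher MIPS rating can coincide with a longer run time. The instruction counts for compiler 1 come from the solution; for compiler 2 only the run time (1.2 s) and CPI (1.5) are given, so the sketch uses those directly rather than guessing at instruction counts. The function name `run_time` is hypothetical.

```python
CLOCK_HZ = 1e9  # 1 GHz clock, as stated in the question


def run_time(counts_and_cpis):
    """Run time in seconds for a list of (instruction count, CPI) pairs."""
    return sum(n * cpi for n, cpi in counts_and_cpis) / CLOCK_HZ


# Compiler 1: 600M instructions at CPI 1 and 400M at CPI 2.
time1 = run_time([(600e6, 1), (400e6, 2)])
cpi1 = time1 * CLOCK_HZ / (600e6 + 400e6)
mips1 = (CLOCK_HZ / 1e6) / cpi1

# Compiler 2: values as given in the solution.
time2 = 1.2
cpi2 = 1.5
mips2 = (CLOCK_HZ / 1e6) / cpi2

print(f"compiler 1: {time1:.1f} s, CPI {cpi1:.1f}, {mips1:.0f} MIPS")
print(f"compiler 2: {time2:.1f} s, CPI {cpi2:.1f}, {mips2:.0f} MIPS")
print(f"compiler 2 output is {time1 / time2:.2f} times as fast")
```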
Table 4.4 Measured or estimated execution times for three programs.
              Time on machine X   Time on machine Y   Speedup of Y over X
Program A     20                  200                 0.1
Program B     1000                100                 10.0
Program C     1500                150                 10.0
All 3 progs   2520                450                 5.6
Analogy: If a car is driven to a city 100 km away at 100 km/h and returns at 50 km/h, the average speed is not (100 + 50) / 2 = 75 km/h but is obtained from the fact that it travels 200 km in 3 hours, giving about 66.7 km/h.
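The car analogy amounts to weighting each speed by the time spent at it, which the Python sketch below spells out. The function name `average_speed` is hypothetical; the distances and speeds are those of the analogy.

```python
def average_speed(leg_distance_km, speeds_kmh):
    """Average speed over equal-length legs: total distance / total time."""
    total_distance = leg_distance_km * len(speeds_kmh)
    total_time = sum(leg_distance_km / v for v in speeds_kmh)
    return total_distance / total_time


# 100 km out at 100 km/h, 100 km back at 50 km/h.
avg = average_speed(100, [100, 50])
naive = (100 + 50) / 2

print(f"true average: {avg:.1f} km/h, naive arithmetic mean: {naive:.1f} km/h")
```

For equal distances, the result is the harmonic mean of the two speeds, which is always at or below the arithmetic mean.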
Speedup of X over Y
Geometric mean does not yield a measure of overall speedup, but provides an indicator that at least moves in the right direction.
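To compare the different ways of summarizing speedups, the Python sketch below applies them to the execution times of Table 4.4. The variable names are hypothetical; the times are those in the table. It requires Python 3.8 or later for `math.prod`.

```python
import math

# Execution times from Table 4.4.
times_x = {"A": 20, "B": 1000, "C": 1500}
times_y = {"A": 200, "B": 100, "C": 150}

# Per-program speedup of Y over X.
speedups = [times_x[p] / times_y[p] for p in times_x]

# Overall speedup from total run times (all three programs run once each).
overall = sum(times_x.values()) / sum(times_y.values())

arithmetic_mean = sum(speedups) / len(speedups)
geometric_mean = math.prod(speedups) ** (1 / len(speedups))

print(f"per-program speedups: {speedups}")
print(f"overall (total time): {overall:.2f}")
print(f"arithmetic mean:      {arithmetic_mean:.2f}")
print(f"geometric mean:       {geometric_mean:.2f}")
```

One property worth noting is that the geometric mean of the speedups of X over Y is exactly the reciprocal of the value computed here, so it gives a consistent verdict whichever machine is taken as the reference.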
a. Find the effective CPI for the two applications on both machines.
Solution
a. CPI of DC on M1: 0.25 × 4.0 + 0.32 × 1.5 + 0.16 × 1.2 + 0 × 6.0 + 0.19 × 2.5 + 0.08 × 2.0 = 2.31
DC on M2: 2.54
RS on M1: 3.94
RS on M2: 2.89
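The effective CPI values for M1 can be recomputed with the Python sketch below. The per-class CPIs for M1 are read off the worked sum for DC, and the instruction mixes are the Table 4.3 rows, on the assumption that DC and RS stand for data compression and reactor simulation. The per-class CPIs for M2 do not appear in this part of the text, so M2 is left out. The function name is hypothetical.

```python
# Per-class CPIs for M1, in table order, taken from the worked sum for DC.
M1_CPI = [4.0, 1.5, 1.2, 6.0, 2.5, 2.0]

# Instruction mixes in percent (Table 4.3 rows); DC/RS naming is assumed.
MIX_PERCENT = {
    "DC": [25, 32, 16, 0, 19, 8],   # data compression
    "RS": [32, 17, 2, 34, 9, 6],    # reactor simulation
}


def effective_cpi(mix_percent, class_cpi):
    """Sum over instruction classes of (class fraction) * (class CPI)."""
    return sum(pct / 100 * cpi for pct, cpi in zip(mix_percent, class_cpi))


for app, mix in MIX_PERCENT.items():
    print(f"{app} on M1: {effective_cpi(mix, M1_CPI):.2f}")
```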
Figure 3.10 Trends in processor performance and DRAM memory chip capacity (Moore's law). [Plot: processor performance (kIPS to GIPS) and memory chip capacity (kb to Gb) versus calendar year, 1980 to 2010; processor trend ×1.6/yr, memory trend ×4 per 3 years.]
"Can I call you back? We just bought a new computer and we're trying to set it up before it's obsolete."
Supercomputers
Figure 4.7 [Plot: supercomputer performance (MFLOPS, GFLOPS, TFLOPS, PFLOPS) versus calendar year, 1980 to 2010. Labels: vector supercomputers (Cray X-MP, Y-MP); CM-2, CM-5; massively parallel processors ($30M MPPs, $240M MPPs).]
Figure 4.8 Milestones in the DOE's Accelerated Strategic Computing Initiative (ASCI) program with extrapolation up to the PFLOPS level. [Plot: performance (TFLOPS), with ticks at 1, 10, and 100, versus calendar year, starting at 1995. Milestones: 3+ TFLOPS, 1.5 TB; 10+ TFLOPS, 5 TB; 30+ TFLOPS, 10 TB (ASCI Q); 100+ TFLOPS, 20 TB (ASCI Purple).]
Figure 25.1 Trend in computational performance per watt of power used in general-purpose processors and DSPs. [Plot: performance (from kIPS upward) versus calendar year, 1980 to 2010.]