
COMPUTER ORGANIZATION AND DESIGN, ARM Edition
The Hardware/Software Interface

EE360
EMBEDDED SYSTEMS
Slides adapted from the original slides provided in the
instructor resources for the text
About us
• Ribhu ribhufec@iitg.ac.in
• Akankhya Sarmah akank174102022@iitg.ac.in
• Biplab Sengyung sengy174102050@iitg.ac.in
• Soumendu Ghosh ghosh174102019@iitg.ac.in

Chapter 1 — Computer Abstractions and Technology — 2


Book(s)
• Computer Organization and Design by Patterson and Hennessy.
• Indian Edition not available
• http://staff.ustc.edu.cn/~llxx/cod/reference_books_tools/Computer%20Organization%20and%20Design%20ARM%20edition.pdf
• We must thank USTC, China
• Do NOT host this.



Book(s)

• Computer Organization and Design: The Hardware/Software Interface, 4th ed. (with CD-ROM), ARM Edition, paperback, 2009
• Available in the Institute library and seen in the Core-I bookstore.
Scoring
End Term Mid Term Quiz Project

15%

10%
45%

30%



Class Schedule
• Thursdays: double lectures
• No classes from 07/02/2019 to 18/02/2019
• Quiz 1 on 07/02/2019 during regular class hours
• Quiz 2 to be announced
• Randomly sampled attendance



About the Project
• Can be a coding project or a history project
• Three members per group.
• Choose your own problem based on the hints given in class, and make sure it is unique.
• Since the project carries a 15% weight, you are expected to spend 15-20 hours on it.
• 15 minutes for evaluation
• History projects: a term paper on the history of a particular aspect of computer architecture
• History projects require a PowerPoint presentation and a written report.
• Grammar will also be evaluated for history projects.



More about the project
• Coding projects must be well commented and must run
• Coding projects must be written for ARMv8.
• Both complexity and implementation count
• Projects are presented to the CI and TAs.
• The deadline is fixed.
• No project presented after the deadline will be evaluated.




CHAPTER 1
Computer Abstractions and Technology
The Three Great Revolutions

Agricultural

Industrial

Computer
§1.1 Introduction
The Computer Revolution
• Progress in computer technology
• Underpinned by Moore’s Law
• Every tenfold decrease in cost opens up new applications
• Makes novel applications feasible
• Computers in automobiles
• Cell phones/smartphones
• Human genome project
• World Wide Web

• Computers are pervasive

Classes of Computers
• Personal computers
• General purpose, variety of software
• Subject to cost/performance tradeoff

• Server computers
• Network based
• High capacity, performance, reliability
• Range from small servers to building-sized installations

Classes of Computers
• Supercomputers
• High-end scientific and
engineering calculations
• Highest capability
• Small fraction of the overall computer market
• Embedded computers
• Hidden as components of
systems
• Stringent
power/performance/cost
constraints
• Computers as components
• You don’t even know that you
are using a computer

The PostPC Era
[Figure: Cell Phones, Tablets]
The Post-PC Era
Personal Mobile Device (PMD)

• Battery operated
• Connects to the Internet
• A few thousand rupees
• IoT
• Smart phones, tablets, electronic glasses

Cloud computing

• Warehouse Scale Computers (WSC)
• Software as a Service (SaaS)
• Part of the software runs on a PMD and part runs in the Cloud
• Amazon and Google
What You Will Learn
• How programs are translated into machine language
• And how the hardware executes them
• The hardware/software interface
• What determines program performance
• And how it can be improved
• How hardware designers improve performance
• What parallel processing is

What Determines the Performance of a Computer System?
Algorithm
• Determines number of operations executed
• e.g., DFT (O(n²) operations) vs. FFT (O(n log n) operations)
Programming language, compiler, architecture
• Determine number of machine instructions executed per operation
Processor and memory system
• Determine how fast instructions are executed
I/O system (including OS)
• Determines how fast I/O operations are executed
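The DFT/FFT contrast can be made concrete with a minimal Python sketch (not from the original slides). Both functions compute the same transform, but the naive DFT performs n² complex multiplications while a radix-2 FFT performs about (n/2)·log₂n. The function names are hypothetical, and the FFT assumes the input length is a power of two.

```python
import cmath

def dft(x):
    """Naive DFT: each of the n outputs sums n products -> n*n complex multiplies."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # one "twiddle" multiply
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft_multiplies(n):
    return n * n

def fft_multiplies(n):
    # n/2 twiddle multiplies per stage (trivial ones included), log2(n) stages
    return (n // 2) * (n.bit_length() - 1)

signal = [1, 2, 3, 4, 0, 0, 0, 0]
same = all(abs(a - b) < 1e-9 for a, b in zip(dft(signal), fft(signal)))
print("same result:", same)
for n in (8, 1024):
    print(n, dft_multiplies(n), fft_multiplies(n))
```

Same answer, very different operation counts: for n = 1024 the sketch counts 1,048,576 multiplies for the DFT versus 5,120 for the FFT.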

§1.2 Eight Great Ideas in Computer Architecture
Eight Great Ideas
Design for Moore’s Law

Use abstraction to simplify design

Make the common case fast

Performance via parallelism

Performance via pipelining

Performance via prediction

Hierarchy of memories

Dependability via redundancy


§1.3 Below Your Program
The Layers of Abstraction
Application software
• Written in high-level
language

System software
• Compiler: translates HLL code
to machine code
• Operating System: service code
• Handling input/output
• Managing memory and
storage
• Scheduling tasks & sharing
resources
Hardware
• Processor, memory, I/O
controllers

Breakthrough
• You can write a program (a compiler) that converts a high-level-language program into machine language
• Example:



Levels of Program Code

• High-level language
• Level of abstraction closer to
problem domain
• Provides for productivity and
portability
• Assembly language
• Textual representation of
instructions
• Hardware representation
• Binary digits (bits)
• Encoded instructions and data
§1.4 Under the Covers
Components of a Computer
The BIG Picture
• Same components for all kinds of computer
• Desktop, server, embedded
• Input/output includes
• User-interface devices
• Display, keyboard, mouse
• Storage devices
• Hard disk, CD/DVD, flash
• Network adapters
• For communicating with other computers
Through the Looking Glass
• LCD screen: picture elements (pixels)
• Mirrors content of frame buffer memory
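How a frame buffer "mirrors" the screen can be sketched in a few lines of Python. This is a hypothetical model, not a real display API: it assumes a row-major layout with 3 bytes (R, G, B) per pixel, so writing bytes at a pixel's offset is what changes that pixel on the LCD.

```python
class FrameBuffer:
    """Hypothetical row-major frame buffer with 3 bytes (R, G, B) per pixel."""
    BYTES_PER_PIXEL = 3

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.data = bytearray(width * height * self.BYTES_PER_PIXEL)

    def offset(self, x, y):
        # Row-major: skip y full rows, then x pixels within the current row
        return (y * self.width + x) * self.BYTES_PER_PIXEL

    def set_pixel(self, x, y, rgb):
        i = self.offset(x, y)
        self.data[i:i + self.BYTES_PER_PIXEL] = bytes(rgb)

    def get_pixel(self, x, y):
        i = self.offset(x, y)
        return tuple(self.data[i:i + self.BYTES_PER_PIXEL])

fb = FrameBuffer(1920, 1080)
fb.set_pixel(10, 5, (255, 0, 0))   # make one pixel red
print(len(fb.data), "bytes of frame buffer memory")
print(fb.get_pixel(10, 5))
```

Under these assumptions a 1920 × 1080 display needs 6,220,800 bytes of frame buffer memory.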

Touchscreen
• PostPC device
• Supersedes keyboard and mouse
• Resistive and capacitive types
• Most tablets and smartphones use capacitive
• Capacitive allows multiple touches simultaneously

Inside an iPad 2
[Figure labels: Capacitive multitouch LCD screen; 3.8 V, 25 Watt-hour battery; Computer board]

Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
• Small, fast SRAM memory for immediate access to data
Inside Apple A5

Abstractions
The BIG Picture
• Abstraction helps us deal with complexity
• Hide lower-level detail
• The hardware/software interface is called the instruction set architecture (ISA)
• Application binary interface (ABI)
• The ISA plus system software interface
• Implementation
• The details underlying an interface
Memory
• Volatile main memory
• Loses instructions and data when power is off
• Non-volatile secondary memory
• Magnetic disk
• Flash memory (limited number of write/erase cycles)

Networks
• Communication, resource sharing, nonlocal access
• Local area network (LAN): Ethernet
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth

§1.5 Technologies for Building Processors and Memory
Technology Trends
[Figure: DRAM Capacity]
§1.5 Technologies for Building Processors and Memory
Technology Trends
Year   Technology    Relative performance/cost
1951   Vacuum tube   1
1965   Transistor    35
1975   IC            900
1995   VLSI          2,400,000
2013   ULSI          250,000,000,000


Semiconductor Technology

• Silicon: semiconductor
• Add materials to transform properties:
• Conductors
• Insulators
• Switch

Manufacturing ICs
[Figure: IC manufacturing process (8”-12” diameter, 12”-24” length)]
• Yield: proportion of working dies per wafer


Intel Core i7 (2012) Wafer
• 300 mm wafer, 280 chips, 32 nm technology
• Each chip is 20.7 × 10.5 mm
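Integrated Circuit Cost

$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per unit area} \times \text{Die area}/2\right)^2}$$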

Integrated Circuit Cost

• Nonlinear relation to area and defect rate
• Wafer cost and area are fixed
• Defect rate is determined by the manufacturing process
• Die area is determined by the architecture and circuit design
• The third equation (yield) is empirical
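The three cost equations can be evaluated directly to see the nonlinearity. In this minimal Python sketch the wafer cost ($5000) and defect density (0.001 defects/mm²) are made-up assumptions; only the 300 mm wafer and the 20.7 × 10.5 mm die come from the i7 slide. The area-ratio approximation ignores partial dies at the wafer edge, so it gives about 325 dies rather than the 280 quoted for the real wafer.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Slide approximation: wafer area / die area (ignores partial edge dies)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area_mm2 / die_area_mm2

def die_yield(defects_per_mm2, die_area_mm2):
    """Empirical yield model: 1 / (1 + defects x area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * die_yield(defects_per_mm2, die_area_mm2))
    return wafer_cost / good_dies

# Hypothetical wafer cost and defect density; die size from the i7 slide
area = 20.7 * 10.5
small = cost_per_die(5000, 300, area, 0.001)
large = cost_per_die(5000, 300, 2 * area, 0.001)
print(f"{area:.1f} mm^2 die: ${small:.2f}")
print(f"{2 * area:.1f} mm^2 die: ${large:.2f} ({large / small:.2f}x the cost)")
```

Doubling the die area more than doubles the cost per die, because there are half as many dies and each one is less likely to be defect-free.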

§1.6 Performance
Defining Performance
• Which airplane has the best performance?
[Figure: bar charts comparing Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers × mph]

Response Time and Throughput
• Response time
• How long it takes to do a task
• Throughput
• Total work done per unit time
• e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
• Replacing the processor with a faster version?
• Adding more processors?
• We'll focus on response time for now…

Relative Performance
• Define Performance = 1/Execution Time
• "X is n times faster than Y":

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Example: time taken to run a program
• 10 s on A, 15 s on B
• Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
• So A is 1.5 times faster than B
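A minimal Python sketch of the definition above (the function names are hypothetical): performance is the reciprocal of execution time, so the "times faster" factor is the ratio of performances, which equals the inverse ratio of execution times.

```python
def performance(execution_time_s):
    """Performance is defined as 1 / execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """n in 'X is n times faster than Y' = Performance_X / Performance_Y."""
    return performance(time_x_s) / performance(time_y_s)

n = times_faster(10.0, 15.0)  # program takes 10 s on A and 15 s on B
print(f"A is {n:.1f} times faster than B")  # prints: A is 1.5 times faster than B
```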
Measuring Execution Time
• Elapsed time
• Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
• Determines system performance
• CPU time
• Time spent processing a given job
• Excludes I/O time and other jobs' shares
• Comprises user CPU time and system CPU time
• Determines CPU performance
CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock
[Figure: clock signal labeled Clock (cycles), Clock period, Data transfer and computation, Update state]
• Clock period: duration of a clock cycle
• e.g., 250 ps = 0.25 ns = 250 × 10⁻¹² s
• Clock frequency (rate): cycles per second
• e.g., 4.0 GHz = 4000 MHz = 4.0 × 10⁹ Hz
CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• Performance improved by
• Reducing number of clock cycles
• Increasing clock rate
• Hardware designer must often trade off clock rate against cycle count
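Both forms of the CPU time equation give the same answer because clock cycle time = 1 / clock rate. In this minimal Python sketch the 8 × 10⁹ cycle count is a hypothetical workload; the 250 ps period is the example from the clocking slide, which corresponds to a 4 GHz clock.

```python
def cpu_time_from_period(clock_cycles, clock_period_s):
    """CPU Time = CPU Clock Cycles x Clock Cycle Time."""
    return clock_cycles * clock_period_s

def cpu_time_from_rate(clock_cycles, clock_rate_hz):
    """CPU Time = CPU Clock Cycles / Clock Rate."""
    return clock_cycles / clock_rate_hz

period = 250e-12          # 250 ps
rate = 1 / period         # about 4.0 GHz
cycles = 8_000_000_000    # hypothetical cycle count for some program
print(f"clock rate: {rate / 1e9:.1f} GHz")
print(cpu_time_from_period(cycles, period), cpu_time_from_rate(cycles, rate))
```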

CPU Time Example
• Computer A: 2 GHz clock, 10 s CPU time
• Designing Computer B
• Aim for 6 s CPU time
• Can use a faster clock, but this causes 1.2 × as many clock cycles
• How fast must Computer B's clock be?


Clock Cycles B 1.2  Clock Cycles A


Clock Rate B  
CPU Time B 6s
Clock Cycles A  CPU Time A  Clock Rate A
 10s  2GHz  20  109
1.2  20  109 24  109
Clock Rate B    4GHz
6s 6s
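The same three steps as a minimal Python sketch; `required_clock_rate` is a hypothetical helper name, and the inputs are the slide's numbers.

```python
def required_clock_rate(cpu_time_a_s, clock_rate_a_hz, cycle_factor, target_time_b_s):
    """Clock rate B needs when B executes cycle_factor x the cycles of A."""
    cycles_a = cpu_time_a_s * clock_rate_a_hz   # Clock Cycles_A
    cycles_b = cycle_factor * cycles_a          # Clock Cycles_B
    return cycles_b / target_time_b_s           # Clock Rate_B

rate_b = required_clock_rate(10.0, 2e9, 1.2, 6.0)
print(f"Clock rate B = {rate_b / 1e9:.1f} GHz")  # prints: Clock rate B = 4.0 GHz
```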
Instruction Count and CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

• Instruction Count for a program
• Determined by program, ISA and compiler
• Average cycles per instruction (CPI)
• Determined by CPU hardware
• If different instructions have different CPI
• Average CPI affected by instruction mix

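Example
• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same architecture (ISA)
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

• A is faster, by a factor of 1.2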
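A minimal Python sketch of this comparison. Because both machines run the same ISA, the instruction count is the same and cancels in the ratio; the value of one million used below is an arbitrary assumption.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s

instruction_count = 1_000_000  # hypothetical; it cancels in the ratio
time_a = cpu_time(instruction_count, 2.0, 250e-12)
time_b = cpu_time(instruction_count, 1.2, 500e-12)
if time_a < time_b:
    faster, ratio = "A", time_b / time_a
else:
    faster, ratio = "B", time_a / time_b
print(f"{faster} is faster by {ratio:.2f}x")  # prints: A is faster by 1.20x
```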
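More CPI
• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

where Instruction Count_i / Instruction Count is the relative frequency of class i.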
Example (Algorithm Design)
• Alternative compiled code sequences using instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

• Sequence 1: IC = 5
• Clock Cycles = 2×1 + 1×2 + 2×3 = 10
• Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
• Clock Cycles = 4×1 + 1×2 + 1×3 = 9
• Avg. CPI = 9/6 = 1.5
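The weighted-CPI formula applied to the table above, as a minimal Python sketch (function and variable names are hypothetical). Sequence 2 executes more instructions but fewer clock cycles, so it is the faster sequence.

```python
def clock_cycles(cpi_by_class, ic_by_class):
    """Clock Cycles = sum over classes of CPI_i x Instruction Count_i."""
    return sum(cpi_by_class[c] * ic_by_class[c] for c in cpi_by_class)

def average_cpi(cpi_by_class, ic_by_class):
    """Weighted average CPI = Clock Cycles / total Instruction Count."""
    return clock_cycles(cpi_by_class, ic_by_class) / sum(ic_by_class.values())

cpi = {"A": 1, "B": 2, "C": 3}
sequences = {
    "Sequence 1": {"A": 2, "B": 1, "C": 2},
    "Sequence 2": {"A": 4, "B": 1, "C": 1},
}
for name, ic in sequences.items():
    print(name, "IC =", sum(ic.values()),
          "cycles =", clock_cycles(cpi, ic),
          "avg CPI =", average_cpi(cpi, ic))
```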
Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Performance depends on
• Algorithm: affects IC, possibly CPI (the effect on IC is direct; however, an algorithm may also favor lighter instructions over heavier ones, e.g., FFT has a lower CPI than DFT)
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc

Example
• A Java application takes 15 s to run. A new compiler is released that requires only 0.6 times as many instructions as the old one but increases the CPI by a factor of 1.1. How long does the application take with the new compiler?
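One way to work the question, sketched in Python under two assumptions the slide does not state: the clock rate is unchanged, and the run time is all CPU time, so time scales with IC × CPI.

```python
def new_execution_time(old_time_s, ic_ratio, cpi_ratio):
    """Assumes an unchanged clock rate: time scales with IC x CPI."""
    return old_time_s * ic_ratio * cpi_ratio

t_new = new_execution_time(15.0, 0.6, 1.1)
print(f"New time: {t_new:.1f} s")  # prints: New time: 9.9 s
```

Under those assumptions, the new compiler gives a speedup of 15 / 9.9 ≈ 1.5 even though each instruction takes longer on average.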



§1.7 The Power Wall
Power Trends

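• In CMOS circuits, power is consumed in capacitive loading while switching.

$$E = \tfrac{1}{2} C V^2 \qquad P = \tfrac{1}{2} C V^2 f$$

• Leakage currents

$$\text{Power} \propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

• Trend: power ×30, supply voltage 5 V → 1 V, clock frequency ×1000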

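Reducing Power
• Suppose a new CPU has
• 85% of capacitive load of old CPU
• 15% voltage and 15% frequency reduction

$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{old}} \times 0.85 \times (V_{\text{old}} \times 0.85)^2 \times F_{\text{old}} \times 0.85}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}} = 0.85^4 = 0.52$$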
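The ratio can be checked numerically with a minimal Python sketch of the dynamic power formula. The old CPU's C, V and f are normalized to 1 (an assumption that costs nothing, since they cancel in the ratio).

```python
def dynamic_power(cap_load, voltage, frequency):
    """Dynamic CMOS power, proportional to 1/2 x C x V^2 x f."""
    return 0.5 * cap_load * voltage ** 2 * frequency

p_old = dynamic_power(1.0, 1.0, 1.0)     # normalized old CPU
p_new = dynamic_power(0.85, 0.85, 0.85)  # each factor reduced to 85%
print(f"P_new / P_old = {p_new / p_old:.2f}")  # prints: P_new / P_old = 0.52
```

Voltage enters squared, which is why scaling it down was historically the most effective lever on power.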

• The power wall
• We can't reduce voltage further
• We can't remove more heat
• How else can we improve performance?
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance

Constrained by power, instruction-level parallelism, and memory latency
Multiprocessors
• Multicore microprocessors
• More than one processor per chip

• Requires explicitly parallel programming
• Compare with instruction-level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer

• Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization

More on this in the following chapters.


SPEC CPU Benchmark
• Programs used to measure performance
• Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
• Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
• Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
• Normalize relative to reference machine
• Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
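A minimal Python sketch of the geometric-mean summary. All execution times below are made up for illustration, and the ratio is taken as reference time / measured time (higher is better). The sketch shows a useful property: the comparison between two machines comes out the same whichever reference machine is used for normalization.

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of n ratios (computed via logs)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def spec_score(times, ref_times):
    # Ratio per program: reference time / measured time
    return geometric_mean([r / t for r, t in zip(ref_times, times)])

# Hypothetical execution times (s) for three benchmark programs
ref_1 = [100.0, 200.0, 400.0]
ref_2 = [50.0, 300.0, 100.0]
machine_x = [10.0, 40.0, 20.0]
machine_y = [20.0, 20.0, 40.0]

a = spec_score(machine_x, ref_1) / spec_score(machine_y, ref_1)
b = spec_score(machine_x, ref_2) / spec_score(machine_y, ref_2)
print(a, b)  # same X-vs-Y ratio under either reference machine
```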

CINT2006 for Intel Core i7 920

SPEC Power Benchmark

• Power consumption of server at different workload levels
• Performance: ssj_ops/sec
• Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Big/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$

SPECpower_ssj2008 for Xeon X5650

§1.10 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
• Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

• Example: multiply accounts for 80 s of a 100 s run
• How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20 \quad\rightarrow\quad \text{Can't be done!}$$

• Corollary: make the common case fast
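Amdahl's Law can be inverted to find the improvement factor n needed for a target overall speedup. In this minimal Python sketch (helper names are hypothetical), the function returns `None` when the unaffected time alone already exceeds the target, which is exactly the "can't be done" case above.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's Law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / factor + t_unaffected

def factor_for_overall_speedup(t_affected, t_unaffected, target_speedup):
    """Improvement factor n needed on the affected part, or None if impossible."""
    target_time = (t_affected + t_unaffected) / target_speedup
    remaining = target_time - t_unaffected  # time left for the improved part
    if remaining <= 0:
        return None  # even an infinitely fast multiply cannot reach the target
    return t_affected / remaining

print(factor_for_overall_speedup(80.0, 20.0, 5.0))  # None: 20 = 80/n + 20 has no solution
print(factor_for_overall_speedup(80.0, 20.0, 4.0))  # 16.0, since 80 / (25 - 20) = 16
```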
Fallacy: Low Power at Idle
• Look back at i7 power benchmark
• At 100% load: 258 W
• At 50% load: 170 W (66%)
• At 10% load: 121 W (47%)
• Google data center
• Mostly operates at 10%-50% load
• At 100% load less than 1% of the time
• Consider designing processors to make power proportional to load
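The percentages above follow from dividing by the 258 W peak. This minimal Python sketch recomputes them and sets them beside a hypothetical, perfectly load-proportional server (power = peak × load), which is the design goal the last bullet suggests.

```python
def percent_of_peak(power_w, peak_power_w):
    return 100.0 * power_w / peak_power_w

peak = 258.0
measured = {100: 258.0, 50: 170.0, 10: 121.0}  # load % -> watts, from the slide
for load, watts in measured.items():
    ideal = peak * load / 100  # hypothetical perfectly proportional server
    print(f"{load:3d}% load: {watts:5.0f} W = {percent_of_peak(watts, peak):3.0f}% of peak"
          f" (proportional ideal: {ideal:5.1f} W)")
```

At 10% load the measured server still draws almost half its peak power, versus 25.8 W for the proportional ideal.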
Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
• Doesn't account for
• Differences in ISAs between computers
• Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

• CPI varies between programs on a given CPU
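A minimal Python sketch of the pitfall, using two hypothetical compilations of the same program on one 2 GHz machine (all instruction counts and CPIs are made up). The version with more, simpler instructions earns the higher MIPS rating yet takes longer to run.

```python
def mips(instruction_count, execution_time_s):
    """MIPS = Instruction count / (Execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_rate(clock_rate_hz, cpi):
    """Equivalent form: Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

clock = 2e9
compiled = {
    "compiler X": {"ic": 10e9, "cpi": 1.0},  # many simple instructions
    "compiler Y": {"ic": 4e9, "cpi": 2.0},   # fewer, more complex instructions
}
for name, c in compiled.items():
    time_s = c["ic"] * c["cpi"] / clock
    print(f"{name}: {time_s:.1f} s, {mips(c['ic'], time_s):.0f} MIPS")
```

Compiler X rates 2000 MIPS but takes 5.0 s; compiler Y rates only 1000 MIPS but finishes in 4.0 s. Execution time, not MIPS, identifies the faster program.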

§1.9 Concluding Remarks
Concluding Remarks
• Cost/performance is improving
• Due to underlying technology development

• Hierarchical layers of abstraction


• In both hardware and software

• Instruction set architecture


• The hardware/software interface

• Execution time: the best performance measure


• Power is a limiting factor
• Use parallelism to improve performance

Practice Problems
• 1.2, 1.3, 1.4, 1.5, 1.7-1.15

