
[Figure 1.1 A Top-Down View of a Computer: the computer consists of main memory, I/O, and the CPU connected by the system bus; within the CPU, registers, the ALU, and the control unit (sequencing logic, control unit registers and decoders, control memory) are connected by an internal bus.]

[Figure 3.15 Computer Modules: the memory module (N words, read and write signals, address and data lines); the I/O module (M ports, read and write signals, address, internal data, external data, and interrupt signals); and the CPU (instructions, data, address, control signals, and interrupt signals).]

The interconnection structure must support the following types of transfers (a small sketch of these transfers follows the list):
- Memory to processor: the processor reads an instruction or a unit of data from memory
- Processor to memory: the processor writes a unit of data to memory
- I/O to processor: the processor reads data from an I/O device via an I/O module
- Processor to I/O: the processor sends data to the I/O device
- I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access (DMA)
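To make the five transfer types above concrete, here is a minimal sketch that models them as operations on a single shared bus object. The class and method names are hypothetical illustrations, not part of any real bus interface.

```python
# Minimal sketch (illustrative only) of the five transfer types.

class SystemBus:
    def __init__(self):
        self.memory = {}        # address -> data word
        self.io_devices = {}    # port -> device object with read()/write()

    def mem_read(self, address):            # memory to processor
        return self.memory.get(address, 0)

    def mem_write(self, address, data):     # processor to memory
        self.memory[address] = data

    def io_read(self, port):                # I/O to processor (via I/O module)
        return self.io_devices[port].read()

    def io_write(self, port, data):         # processor to I/O
        self.io_devices[port].write(data)

    def dma_transfer(self, port, address, count):   # I/O to/from memory (DMA)
        # The block moves between device and memory without the processor.
        for offset in range(count):
            self.memory[address + offset] = self.io_devices[port].read()
```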

Buses and Their Appeal


[Figure: the three sets of lines found in a bus: control (handshaking, direction, transfer mode, arbitration, ...), address, and data (one bit (serial) to several bytes; may be shared).]


A typical computer may use a dozen or so different buses:
1. Legacy buses: PC bus, ISA, RS-232, parallel port
2. Standard buses: PCI, SCSI, USB, Ethernet
3. Proprietary buses: for specific devices and maximum performance

What is a bus?
A bus is:
- A shared communication link
- A single set of wires used to connect multiple subsystems

[Figure: the classic components of a computer (processor with control and datapath, memory, input, output) connected by a bus]

A bus is also a fundamental tool for composing large, complex systems:
- A systematic means of abstraction


Advantages of Buses

[Figure: a processor, memory, and several I/O devices attached to a single shared bus]

- Versatility:
  - New devices can be added easily
  - Peripherals can be moved between computer systems that use the same bus standard
- Low cost:
  - A single set of wires is shared in multiple ways

Disadvantage of Buses

[Figure: a processor, memory, and several I/O devices contending for a single shared bus]

- It creates a communication bottleneck:
  - The bandwidth of that bus can limit the maximum I/O throughput (a rough bandwidth calculation follows)
- The maximum bus speed is largely limited by:
  - The length of the bus
  - The number of devices on the bus
  - The need to support a range of devices with:
    - Widely varying latencies
    - Widely varying data transfer rates
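As a rough illustration of why the shared bus becomes the bottleneck: peak bus bandwidth is approximately bus width times clock rate, and every attached device shares it. The numbers below are hypothetical, chosen only to show the arithmetic.

```python
# Back-of-the-envelope calculation with hypothetical numbers.
bus_width_bytes = 8           # a 64-bit wide data bus
bus_clock_hz = 100e6          # 100 MHz bus clock
peak_bandwidth = bus_width_bytes * bus_clock_hz   # bytes per second

print(peak_bandwidth / 1e6, "MB/s peak")            # 800.0 MB/s peak
# If four devices stream continuously, each sees at most a quarter of that.
print(peak_bandwidth / 4 / 1e6, "MB/s per device")  # 200.0 MB/s
```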

General Organization of a Bus

[Figure: control lines and data lines shared by the devices on the bus]

- Control lines:
  - Signal requests and acknowledgments
  - Indicate what type of information is on the data lines
- Data lines carry information between the source and the destination:
  - Data and addresses
  - Complex commands

Master versus Slave

[Figure: the bus master issues the command; data can go either way between bus master and bus slave]

- A bus transaction includes two parts (sketched below):
  - Issuing the command (and address): the request
  - Transferring the data: the action
- The master is the one who starts the bus transaction by:
  - Issuing the command (and address)
- The slave is the one who responds to the address by:
  - Sending data to the master if the master asks for data
  - Receiving data from the master if the master wants to send data
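A minimal sketch of the two-part transaction described above: the master issues the command and address (the request), and the data then moves in either direction (the action). The classes and method names are hypothetical, used only to illustrate the master/slave roles.

```python
# Hypothetical illustration of a bus transaction.

class BusSlave:
    def __init__(self):
        self.storage = {}

    def respond(self, command, address, data=None):
        # The slave is activated by the transaction: it returns data on a
        # read and accepts data on a write.
        if command == "READ":
            return self.storage.get(address, 0)
        if command == "WRITE":
            self.storage[address] = data

class BusMaster:
    def __init__(self, slave):
        self.slave = slave

    def transaction(self, command, address, data=None):
        # Part 1 (request): issue the command and the address.
        # Part 2 (action): transfer the data in either direction.
        return self.slave.respond(command, address, data)

slave = BusSlave()
master = BusMaster(slave)          # only the master initiates transactions
master.transaction("WRITE", 0x10, 42)
assert master.transaction("READ", 0x10) == 42
```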

Types of Busses
- Processor-memory bus (design specific)
  - Short and high speed
  - Only needs to match the memory system
    - Maximize memory-to-processor bandwidth
  - Connects directly to the processor
    - Optimized for cache block transfers
- I/O bus (industry standard)
  - Usually lengthy and slower
  - Needs to match a wide range of I/O devices
  - Connects to the processor-memory bus or backplane bus
- Backplane bus (standard or proprietary)
  - Backplane: an interconnection structure within the chassis
  - Allows processors, memory, and I/O devices to coexist
  - Cost advantage: one bus for all components

Example: Pentium System Organization

[Figure: a Pentium system built around the processor/memory bus, the PCI bus, and I/O busses]

A Computer System with One Bus: Backplane Bus

[Figure: processor, memory, and I/O devices all attached to a single backplane bus]

- A single bus (the backplane bus) is used for:
  - Processor-to-memory communication
  - Communication between I/O devices and memory
- Advantages: simple and low cost
- Disadvantages: slow, and the bus can become a major bottleneck
- Example: IBM PC-AT

A Two-Bus System

[Figure: the processor and memory sit on the processor-memory bus; bus adaptors connect several I/O buses to it]

- I/O buses tap into the processor-memory bus via bus adaptors:
  - Processor-memory bus: mainly for processor-memory traffic
  - I/O buses: provide expansion slots for I/O devices
- Example: Apple Macintosh II
  - NuBus: processor, memory, and a few selected I/O devices
  - SCSI bus: the rest of the I/O devices

A Three-Bus System

[Figure: a bus adaptor connects the processor-memory bus to a backplane bus; further bus adaptors connect I/O buses to the backplane bus]

- A small number of backplane buses tap into the processor-memory bus
  - The processor-memory bus is used only for processor-memory traffic
  - I/O buses are connected to the backplane bus
- Advantage: loading on the processor bus is greatly reduced

North/South Bridge Architectures: Separate Busses

[Figure: the processor (with a backside cache) on the processor-memory bus; bus adaptors connect memory, the backplane bus, and the I/O buses]

- Separate sets of pins for different functions:
  - Memory bus
  - Caches
  - Graphics bus (for a fast frame buffer)
  - I/O busses are connected to the backplane bus
- Advantages:
  - Busses can run at different speeds
  - Much less overall loading!

What defines a bus?
- Transaction protocol
- Timing and signaling specification
- Bunch of wires
- Electrical specification
- Physical / mechanical characteristics
  - The connectors

Synchronous and Asynchronous Bus

- Synchronous bus:
  - Includes a clock in the control lines
  - A fixed protocol for communication that is relative to the clock
  - Advantage: involves very little logic and can run very fast
  - Disadvantages:
    - Every device on the bus must run at the same clock rate
    - To avoid clock skew, synchronous buses cannot be long if they are fast
- Asynchronous bus:
  - Is not clocked
  - Can accommodate a wide range of devices
  - Can be lengthened without worrying about clock skew
  - Requires a handshaking protocol (see the sketch below)
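A minimal sketch of the kind of request/acknowledge handshake an asynchronous bus relies on, written as a four-phase handshake. The signal and function names are hypothetical, not taken from any particular bus standard.

```python
# Hypothetical four-phase req/ack handshake for an asynchronous read.

class AsyncSlave:
    def __init__(self):
        self.data = {0x10: 42}
        self.ack = False

    def request_asserted(self, address):
        value = self.data.get(address, 0)   # put the data on the data lines
        self.ack = True                     # assert ack: data is valid
        return value

    def request_dropped(self):
        self.ack = False                    # deassert ack: handshake complete

def async_read(slave, address):
    # Phase 1: master asserts req with the address on the address lines.
    value = slave.request_asserted(address)
    # Phase 2: slave asserts ack; the master latches the data.
    assert slave.ack
    # Phase 3: master deasserts req.
    slave.request_dropped()
    # Phase 4: slave deasserts ack; the bus is free for the next transfer.
    assert not slave.ack
    return value

print(async_read(AsyncSlave(), 0x10))   # 42
```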

Busses so far

[Figure: the master drives the control, address, and data lines to the slave]

- Bus master: has the ability to control the bus; initiates the transaction
- Bus slave: the module activated by the transaction
- Bus communication protocol: the specification of the sequence of events and timing requirements in transferring information
  - Asynchronous bus transfers: control lines (req, ack) serve to orchestrate sequencing
  - Synchronous bus transfers: sequence relative to a common clock

Arbitration: Obtaining Access to the Bus

[Figure: the master initiates requests over the control lines; data can go either way between bus master and bus slave]

- One of the most important issues in bus design:
  - How is the bus reserved by a device that wishes to use it?
- Chaos is avoided by a master-slave arrangement:
  - Only the bus master can control access to the bus:
    - It initiates and controls all bus requests
  - A slave responds to read and write requests
- The simplest system:
  - The processor is the only bus master
  - All bus requests must be controlled by the processor
  - Major drawback: the processor is involved in every transaction

Multiple Potential Bus Masters: the Need for Arbitration

- Bus arbitration scheme:
  - A bus master wanting to use the bus asserts the bus request
  - A bus master cannot use the bus until its request is granted
  - A bus master must signal to the arbiter the end of the bus utilization
- Bus arbitration schemes usually try to balance two factors:
  - Bus priority: the highest-priority device should be serviced first
  - Fairness: even the lowest-priority device should never be completely locked out from the bus
- Bus arbitration schemes can be divided into four broad classes:
  - Daisy chain arbitration
  - Centralized, parallel arbitration
  - Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus
  - Distributed arbitration by collision detection: each device just goes for it; problems are found after the fact

The Daisy Chain Bus Arbitration Scheme

[Figure: the bus arbiter passes the grant signal from Device 1 (highest priority) through Device 2 down to Device N (lowest priority); the request and release lines are wired-OR]

- Advantage: simple
- Disadvantages:
  - Cannot assure fairness: a low-priority device may be locked out indefinitely (illustrated in the sketch below)
  - The use of the daisy chain grant signal also limits the bus speed
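A minimal sketch of daisy-chain grant propagation (the function and its arguments are hypothetical). It also shows the fairness problem: a low-priority device starves whenever a higher-priority device keeps requesting.

```python
# Hypothetical daisy-chain arbitration: the grant enters at the highest-
# priority device and is passed down only if that device is not requesting.

def daisy_chain_arbitrate(requests):
    """requests: list of booleans, index 0 = highest priority.
    Returns the index of the device that keeps the grant, or None."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device   # this device keeps the grant
        # otherwise the grant ripples to the next device in the chain
    return None

# Device 0 keeps requesting, so device 2 never gets the bus (no fairness).
print(daisy_chain_arbitrate([True, False, True]))   # 0
print(daisy_chain_arbitrate([False, False, True]))  # 2
```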

Centralized Parallel Arbitration

[Figure: Device 1 through Device N each have their own request (Req) and grant lines to a central bus arbiter]

- Used in essentially all processor-memory busses and in high-speed I/O busses (a small arbiter sketch follows)
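For contrast, here is a minimal sketch of a centralized parallel arbiter. All request lines are visible to the arbiter at once, unlike the daisy chain; rotating the starting priority after each grant is one simple way to provide the fairness mentioned earlier (the rotation is an illustrative choice, not something the slides prescribe).

```python
# Hypothetical centralized parallel arbiter with rotating priority.

class RotatingArbiter:
    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.next_start = 0      # device checked first on the next cycle

    def grant(self, requests):
        """requests: list of booleans, one per device.
        Returns the granted device index, or None if nobody is requesting."""
        for offset in range(self.num_devices):
            device = (self.next_start + offset) % self.num_devices
            if requests[device]:
                self.next_start = (device + 1) % self.num_devices
                return device
        return None

arb = RotatingArbiter(3)
print(arb.grant([True, False, True]))   # 0
print(arb.grant([True, False, True]))   # 2 (device 0 can no longer starve it)
```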

A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working. SPOFs are undesirable in any system with a goal of high availability or reliability, be it a business practice, software application, or other industrial system.

PCI = Peripheral Component Interconnect
AGP = Accelerated Graphics Port (high-speed, point-to-point)

[Figure: comparison of performance per pin for various busses]

Simple Organization for Input/Output

[Figure: input/output via a single common bus. The CPU (with its cache and interrupt lines) and main memory share the system bus with I/O controllers for two disks, a graphics display, and a network.]

I/O Organization for Greater Performance

[Figure: input/output via intermediate and dedicated I/O buses. The CPU (with its cache and interrupt lines) and main memory sit on the memory bus; AGP and a bus adapter to the PCI bus provide intermediate buses/ports; further bus adapters lead to I/O buses whose controllers serve a graphics display, a network, disks, and a CD/DVD drive.]

PCI Bus Based Platform

The shared bus design (even in the case of multiple busses, each of which has a data, address, and control bus) was the prevalent CPU/component interconnect architecture for decades. Regardless of which arbitration or timing scheme, or type of bus, is in use, any shared bus architecture has limitations whenever the capacity of the bus is reached. The solution: physically connect computer devices that you know are often exchanging information/data.

Point-to-Point Interconnect

- The principal reason for the change was the electrical constraints encountered when increasing the frequency of wide synchronous buses
- At higher and higher data rates it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion
- A conventional shared bus on the same chip magnified the difficulties of increasing the bus data rate and reducing the bus latency to keep up with the processors
- Point-to-point interconnect has lower latency, a higher data rate, and better scalability

Latency is the time it takes for the data requested by the CPU to start arriving. Bandwidth is the rate at which the data arrives (a worked example follows).

In the example configuration, each point is connected directly to each of the other four points.
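A quick worked example of the latency/bandwidth distinction defined above, using transfer time = latency + size / bandwidth. All numbers are hypothetical, chosen only to illustrate the formula.

```python
# Illustrative only: hypothetical latency and bandwidth figures.
latency_s = 50e-9             # 50 ns until the first data starts arriving
bandwidth_bytes_per_s = 16e9  # 16 GB/s once the transfer is streaming
block_bytes = 64              # one cache block

transfer_time = latency_s + block_bytes / bandwidth_bytes_per_s
print(transfer_time * 1e9, "ns")   # 54.0 ns: latency dominates small transfers
```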

Point-to-Point Interconnect: Significant Characteristics

- Multiple direct connections: direct pairwise connections to other components, eliminating arbitration (as in a shared system)
- Layered protocol architecture: similar to layered data networks, rather than the use of control signals
- Packetized data transfer: a sequence of packets, each of which includes control headers and error controls

Peripheral Component Interconnect (PCI) Express

- PCI Express (PCIe): a point-to-point interconnect scheme intended to replace bus-based schemes such as PCI
- A key requirement is high capacity to support the needs of higher-data-rate I/O devices, such as Gigabit Ethernet
- Another requirement deals with the need to support time-dependent data streams

[Figure 3.21 Typical Configuration Using PCIe: the cores and memory attach to the chipset (host bridge); PCIe links from the chipset connect a Gigabit Ethernet device, a PCIe-PCI bridge, a legacy endpoint, and a switch that fans out to several PCIe endpoints.]


All information moves across an active PCI Express link in fundamental chunks called packets. The two major classes of packets exchanged between two PCIe devices are high-level Transaction Layer Packets (TLPs) and low-level link maintenance packets called Data Link Layer Packets (DLLPs). Collectively, the various TLPs and DLLPs allow two devices to perform memory, I/O, and Configuration Space transactions reliably, and to use messages to initiate power management events, generate interrupts, report errors, etc.

Each PCIe packet has a known size and format. The packet header, positioned at the beginning of each DLLP and TLP, indicates the packet type and the presence of any optional fields. The size of each packet field is either fixed or defined by the packet (transaction) type, i.e. memory, I/O, configuration, or message. The size of any data payload is conveyed in the TLP header Length field (a simplified packing sketch follows).
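To illustrate the idea of a header that announces the packet type and the size of its payload, here is a deliberately simplified packing/parsing sketch. The field layout (1-byte type, 1-byte flags, 2-byte length) is hypothetical and does not reproduce the actual PCIe TLP header format.

```python
import struct

# Hypothetical, simplified "TLP-like" header; real PCIe headers differ.
HEADER_FORMAT = ">BBH"   # type (1 byte), flags (1 byte), payload length (2 bytes)

def build_packet(pkt_type, flags, payload):
    header = struct.pack(HEADER_FORMAT, pkt_type, flags, len(payload))
    return header + payload

def parse_packet(raw):
    header_size = struct.calcsize(HEADER_FORMAT)
    pkt_type, flags, length = struct.unpack(HEADER_FORMAT, raw[:header_size])
    return pkt_type, flags, raw[header_size:header_size + length]

MEMORY_WRITE = 0x01      # hypothetical type code
pkt = build_packet(MEMORY_WRITE, 0, b"\xde\xad\xbe\xef")
print(parse_packet(pkt))  # (1, 0, b'\xde\xad\xbe\xef')
```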

[Figure 3.22 PCIe Protocol Layers: each device has a transaction layer, a data link layer, and a physical layer; transaction layer packets (TLPs) are exchanged between transaction layers, and data link layer packets (DLLPs) between data link layers.]

The TL supports four address spaces:
- Memory: the memory space includes system main memory and PCIe I/O devices; certain ranges of memory addresses map into I/O devices
- I/O: this address space is used for legacy PCI devices, with reserved address ranges used to address legacy I/O devices
- Configuration: this address space enables the TL to read/write configuration registers associated with I/O devices
- Message: this address space is for control signals related to interrupts, error handling, and power management

Table 3.2 PCIe TLP Transaction Types

Address space: Memory
  TLP types: Memory Read Request; Memory Read Lock Request; Memory Write Request
  Purpose: Transfer data to or from a location in the system memory map.

Address space: I/O
  TLP types: I/O Read Request; I/O Write Request
  Purpose: Transfer data to or from a location in the system memory map for legacy devices.

Address space: Configuration
  TLP types: Config Type 0 Read Request; Config Type 0 Write Request; Config Type 1 Read Request; Config Type 1 Write Request
  Purpose: Transfer data to or from a location in the configuration space of a PCIe device.

Address space: Message
  TLP types: Message Request; Message Request with Data
  Purpose: Provides in-band messaging and event reporting.

Address space: Memory, I/O, Configuration
  TLP types: Completion; Completion with Data; Completion Locked; Completion Locked with Data
  Purpose: Returned for certain requests.

[Figure: PCIe layered protocol and TLP assembly/disassembly]

QuickPath Interconnect (QPI)

- Introduced in 2008
- Multiple direct connections
  - Direct pairwise connections to other components, eliminating the need for arbitration found in shared transmission systems
- Layered protocol architecture
  - These processor-level interconnects use a layered protocol architecture rather than the simple use of control signals found in shared bus arrangements
- Packetized data transfer
  - Data are sent as a sequence of packets, each of which includes control headers and error control codes

QPI
- Intel shared front-side bus, up until 2004
  - For the Intel Xeon 64-bit processor and the Intel Itanium 128-bit processor
- Intel dual independent busses, circa 2005
- Intel dedicated high-speed interconnects, 2007

[Figure 3.17 Multicore Configuration Using QPI: cores A, B, C, and D each have DRAM attached over a memory bus; the cores are interconnected by QPI links and also connect via QPI to I/O hubs, which reach I/O devices over PCI Express.]

[Figure 3.18 QPI Layers: each side of the link has a protocol layer (exchanging packets), a routing layer, a link layer (exchanging flits), and a physical layer (exchanging phits).]

[Figure 3.19 Physical Interface of the Intel QPI Interconnect: the QuickPath Interconnect ports of components A and B are joined by transmission lanes and reception lanes in each direction, each with a forwarded clock (Fwd Clk) and a received clock (Rcv Clk).]

QPI:
- Contains multiple direct pairwise connections between components, eliminating the need for arbitration found in shared bus systems
- Has a layered protocol architecture, similar in design to the classical network protocols that govern today's networks (the Internet, for example)
- Relies on the concept of packets, which are bundles of information with control, error, data payloads, etc.

QPI Link Layer

- Performs two key functions: flow control and error control
- Both operate on the level of the flit (flow control unit)
  - Each flit consists of a 72-bit message payload and an 8-bit error control code called a cyclic redundancy check (CRC)
- Flow control function: needed to ensure that a sending QPI entity does not overwhelm a receiving QPI entity by sending data faster than the receiver can process the data and clear buffers for more incoming data
- Error control function: detects and recovers from bit errors, and so isolates higher layers from experiencing bit errors (a CRC sketch follows)
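A minimal sketch of how an 8-bit CRC over a 72-bit flit payload could be computed at the sender and checked at the receiver. The polynomial and the framing below are assumptions made for illustration; they are not the CRC actually specified for QPI.

```python
# Illustrative CRC-8 over a 72-bit (9-byte) payload. The polynomial
# (x^8 + x^2 + x + 1, i.e. 0x07) is an assumed example, not QPI's.

def crc8(data: bytes, poly: int = 0x07) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

payload = b"\x11\x22\x33\x44\x55\x66\x77\x88\x99"   # 72-bit message payload
flit = payload + bytes([crc8(payload)])              # 80-bit flit: payload + CRC

# Receiver side: recompute and compare; on a mismatch the link layer would
# recover (e.g. by retransmission) so higher layers never see the bit error.
received_payload, received_crc = flit[:9], flit[9]
assert crc8(received_payload) == received_crc
```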

QPI Routing and Protocol Layers

- Routing layer:
  - Used to determine the course that a packet will traverse across the available system interconnects
  - Routing tables are defined by firmware and describe the possible paths that a packet can follow
- Protocol layer:
  - The packet is defined as the unit of transfer
  - One key function performed at this level is a cache coherency protocol, which deals with making sure that main memory values held in multiple caches are consistent
  - A typical data packet payload is a block of data being sent to or from a cache

Protocol layer: the high-level set of rules for exchanging packets of data between devices.
Routing layer: provides the framework for directing packets from one location of the network to another; relies on routing algorithms to determine the fastest path, the best and least-congested route, etc.
Link layer: responsible for reliable transmission and flow control. The link layer's unit of transfer is an 80-bit flit (flow control unit).
Physical layer: the actual wires carrying the signals, as well as the circuitry and logic to support transmission and receipt of 1s and 0s. The unit of transfer at the physical layer is 20 bits, which is called a phit (physical unit); a small flit-to-phit sketch follows.
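A small sketch of the flit/phit relationship described above: an 80-bit flit is carried by the physical layer as four 20-bit phits. The splitting code is illustrative only; how phits are mapped onto physical lanes is not covered here.

```python
# Illustrative only: split an 80-bit flit into 20-bit phits and reassemble.

FLIT_BITS, PHIT_BITS = 80, 20

def flit_to_phits(flit: int):
    assert flit < (1 << FLIT_BITS)
    mask = (1 << PHIT_BITS) - 1
    # Most significant phit first; 80 / 20 = 4 phits per flit.
    return [(flit >> shift) & mask
            for shift in range(FLIT_BITS - PHIT_BITS, -1, -PHIT_BITS)]

def phits_to_flit(phits):
    flit = 0
    for phit in phits:
        flit = (flit << PHIT_BITS) | phit
    return flit

flit = 0x123456789ABCDEF01234            # an arbitrary 80-bit value
phits = flit_to_phits(flit)
assert len(phits) == 4 and phits_to_flit(phits) == flit
```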
