Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
• Memories in Verilog
• Memories on the FPGA
• External Memories
-- SRAM (async, sync)
-- DRAM
-- Flash
Lecture 12 1
Memories: a practical primer
• The good news: huge selection of technologies
– Small & faster vs. large & slower
– Every year capacities go up and prices go down
– Almost cost competitive with hard disks: high density, fast flash
memories
• Non-volatile, read/write, no moving parts! (robust, efficient)
• The bad news: perennial system bottleneck
– Latencies (access time) haven’t kept pace with cycle times
– Separate technology from logic, so must communicate between
silicon, so physical limitations (# of pins, R’s and C’s and L’s) limit
bandwidths
• New hopes: capacitive interconnect, 3D IC’s
– Likely the limiting factor in cost & performance of many digital
systems: designers spend a lot of time figuring out how to keep
memories running at peak bandwidth
– “It’s the memory - just add more faster memory”
// read ports
assign rd1 = (ra1 == 5’d31) ? 32’d0 : regfile[ra1];
assign rd2 = (ra2 == 5’d31) ? 32’d0 : regfile[ra2];
// write port
always @(posedge clk)
if (werf) regfile[wa] <= wd;
clk
1,2,4 16K,8K,4K,2K,1K,512
1
2
4
8
16
32
Select “IP”
Fill in name
Fill in name
(again?!)
Select RAM vs
ROM
Fill in width
& depth
Click “Next” …
6.111 Fall 2017 Lecture 12 16
BRAM Example
Click “Next” …
6.111 Fall 2017 Lecture 12 17
BRAM Example
Select polarity of
control pins; active
high default is
usually just fine
Click “Next” …
6.111 Fall 2017 Lecture 12 18
BRAM Example
00000000,
00111110,
01100011, Memory contents with location 0 first, then
00000011, location 1, etc. You can specify input radix, in
00000011, this example we’re using binary. MSB is on
the left, LSB on the right. Unspecified
00011110, locations (if memory has more locations than
00000011, given in .coe file) are set to 0.
00000011,
01100011,
00111110,
00000000,
00000000,
6.111 Fall 2017 Lecture 12 20
Using result in your Verilog
• Look at generated Verilog for module defintion (click on “View HDL
Functional Model” under Coregen):
Read-Write
Memory Non-Volatile
Read-Only
Random Read-Write
Sequential Memory
Access Memory
Access
EPROM Mask-
SRAM
FIFO E2PROM Programmed
DRAM
FLASH ROM
Address
Works fine for small memory blocks (e.g., small register files)
Inefficient in area for large memories
Density is the key metric in large memory circuits
CLK CLK
Alternative view
Memory Array Architecture
Small cells small mosfets small dV on bit line
2LxM memory
AK Row Decode
AK+1 Thi i t tl b di l d
This image
cannotThis
currently
be
imag
e
displaycan…
ed.
by
Mx2 column
K
M*2K
A0
Column Decode Selects appropriate word
AK-1
(i.e., multiplexer)
Input-Output
(M bits)
M1 M3
Write: Set BL, BL to (0,VDD )
or (VDD,0) then enable WL (= VDD)
BL BL
Read: Disconnect drivers from BL and
BL, then enable WL (=VDD). Sense a
small change in BL or BL
• Address pins drive row and column • Output Enable gates the
decoders chip’s tristate driver
• Data pins are bidirectional: shared • Write Enable sets the
by reads and writes memory’s read/write mode
• Chip Enable/Chip Select acts
Concept of “Data Bus” as a “master switch”
A4 E1
Memory matrix E2
A5 256 rows
…
A7 32 Column
A8
A9
A11 W
… G Pinout
Sense Amps/Drivers
Column Decoder
0
A
6
A
A12
A1
A10
• Read cycle begins when all enable signals (E1, E2, OE) are active
OE
Bus enable Bus tristate
time time Data
Data Data Data
1 2 3
E2 assumed high (enabled), WE =1 (read mode)
Clock/E1
OE
WE
Address Address for write Address for read
OE (active_low)
clk int_data
Write data D Q
ext_data
Read data Q D
Row Decoder
Chip Enable
Memory
Address matrix
…
Pins Data
Pins
… Read
Sense Amps/Drivers Logic Output Enable
Column Decoder
long “flow-through”
difference between read and write timings combinational path creates high
creates wasted cycles (“wait states”) CLK-Q delay
R1 R2 W3 R4 W5
CE
WE
CLK
Address A1 A2 A3 A4 A5
Data Q1 Q2 D3 Q4 D5
R1 R2 W3 R4 W5
CE
WE
CLK
Address A1 A2 A3 A4 A5
Data Q1 Q2 D3 Q4 D5
pipelining register
R1 R2 W3 R4 W5
CE
WE
CLK
Address A1 A2 A3 A4 A5
Data one-cycle
Q1 Q2 D3 Q4 D5
latency... (ZBT write to A3) (ZBT write to A5)
The lower DCM is used to ensure that the fpga_clock signal, which clocks all of
the FPGA flip-flops, is in phase with the reference clock (clock_27mhz).
Intel 20 V 0V
Floating [Rabaey03]
gate
10 V 5V 20 V 5V 0V
S D S D
Address Data
Charge
Chip Enable pump
EPROM omits
Output Enable Programming FSM, charge
voltage (12V)
FSM pump, and write
Write Enable
enable
6.111 Fall 2017 Lecture 12 39
Flash Memory – Nitty Gritty
• Flash memory uses NOR or NAND flash.
– NAND cells connected in series like resembling NAND gate.
– NAND requires 60% of the area compared to NOR. NAND used in flash
drives.
– Endurance: 100,000 – 300,000 p/e cycles
– Life cycle extended through wear –leveling: mapping of physical blocks
changes over time.
Flash is slow, cache to
RAM for fast read
• Flash memory limitations speed
– Can be read or written byte a time
– Can only be erased block at a time
– Erasure sets bits to 1.
– Location can be re-written if the new bit is zero.
http://www.embeddedintel.com/special_features.php?article=124
VDD
BL
VDD/2 VDD /2
CBL sensing
Cell Plate Si
[Rabaey03]
Capacitor Insulator
Refilling Poly
To Write: set Bit Line (BL) to 0 or VDD Storage Node Poly
& enable Word Line (WL) (i.e., set to VDD ) Si Substrate
2nd Field Oxide
To Read: set Bit Line (BL) to VDD /2
& enable Word Line (i.e., set it to VDD )
RAS
CAS
(Tristate)
Data Q (data from RAM)
RAS-before-CAS CAS-before-RAS
for a read or write for a refresh
(Row and column addresses taken
on falling edges of RAS and CAS)
Address[12:0]
Address[12:0]
Address[12:0]
Data[7:0]
Data[7:0]
Data[7:0]
• SRAM-like interface often used
for peripherals
~E1
~E1
~E1
~W
~W
~G
~G
~G
– Known as “memory mapped” OE
peripherals WE
[12:0]
[12:0]
[12:0]
Address[15:0]
15 C Y7
Memory Map 14
[2:0]
B Y6
13 A Y5
‘138
Y4
0xFFFF
0xE000
EPROM ~G2B
~G2A
Y3
Y2
0xDFFF Bus Enable
SRAM 2
G1 Y1
Y0
0xC000 +5V
0xBFFF
0xA000
SRAM 1 Data[7:0]
0x9FFF
~G
~W
~E1
Data[7:0]
Address[2:0]
0x2000 Analog
0x1FFF
ADC ADC Input
0x0000