[Figure: NAND flash internal architecture. Column and row decode logic select word lines (Word 0–3) in the NAND cell array; control and address logic manage erase blocks, each consisting of multiple flash pages; an I/O data register transfers one page at a time, with each page comprising a data page and a spare area.]
[Figure: cell threshold voltage (VT in Volts, −4 to 1) against program/erase cycles (10^0 to 10^5, log scale).]

Device           Size       Cell  Rated endurance (cycles)  Process
HY27US08121A     512 Mbit   SLC   10^5                      90 nm
MT29F2G08AAD     2 Gbit     SLC   10^5                      50 nm
MT29F4G08AAC     4 Gbit     SLC   10^5                      72 nm
NAND08GW3B2C     8 Gbit     SLC   10^5                      60 nm
MT29F8G08MAAWC   8 Gbit     MLC   10^4                      72 nm
29F16G08CANC1    16 Gbit    SLC   10^5                      50 nm
MT29F32G08QAA    32 Gbit    MLC   10^4                      50 nm

Table 1: Devices tested.

[Figure: measured endurance in cycles (log scale, values up to 10^7).]
[Figure 5: Wear-related changes in latency. Program and erase latency are plotted separately over the lifetime of the same block in the 8Gb MLC device. Quantization of latency is due to iterative internal algorithms. (Top panel: write latency, 0–300 µs; bottom panel: erase latency, 0–10 ms; x-axis: iterations, 1×10^5 to 9×10^5.)]

[Figure 6: USB Flash drive modified for logic analyzer probing.]
These high endurance values affect the characteristics of entire flash-based storage systems, such as USB drives; we therefore attempted to determine whether the endurance of the devices tested is typical, or is instead due to anomalies in the testing process. In particular, we varied both program/erase behavior and environmental conditions to determine their effects. Due to the high variance of the measured endurance values, we have not collected enough data to draw strong inferences, and so report general trends instead of detailed results.
Usage patterns – The results reported above were measured by repeatedly programming the first page of a block with all zeroes (the programmed state for SLC flash) and then immediately erasing the entire block. Several devices were tested by writing to all pages in a block before erasing it; endurance appeared to decrease with this pattern, but by no more than a factor of two. Additional tests were performed with varying data patterns, but no difference in endurance was detected.
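For concreteness, the test loop amounts to the following Python sketch. The names program_page and erase_block are hypothetical stand-ins for the flash tester's commands, and the 2KB page size is illustrative; this is a sketch of the procedure described above, not the actual harness used.

    # Chip-level endurance test: program the first page of a block with
    # all zeroes, erase the block, and repeat until an operation fails.
    # program_page/erase_block are caller-supplied (hypothetical) flash
    # commands returning False on failure.
    def block_endurance(block, program_page, erase_block, page_bytes=2048):
        zeros = b"\x00" * page_bytes      # programmed state for SLC flash
        cycles = 0
        while True:
            if not program_page(block, page=0, data=zeros):
                return cycles             # program failure ends the test
            if not erase_block(block):
                return cycles             # erase failure ends the test
            cycles += 1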
Environmental conditions – The processes resulting in flash failure are exacerbated by heat [32], although internal compensation is used to mitigate this effect [22]. The 16Gbit device was tested at 80°C, and no noticeable difference in endurance was seen.
Conclusions: The high endurance values measured were unexpected, and no doubt contribute to the measured performance of USB drives reported below, which achieve high endurance using very inefficient wear-leveling algorithms. Additional experimentation is needed to determine whether these results hold across the most recent generation of devices, and whether flash algorithms may be tailored to produce access patterns which maximize endurance, rather than treating it as a constant. Finally, the increased erase time and decreased programming time of aged cells bear implications for optimal flash device performance, as well as offering a predictive failure-detection mechanism.

3.2 FTL Investigation

Having examined the performance of NAND flash itself, we next turn to systems comprising both flash and FTL. While work in the previous section covers a wide range of flash technologies, we concentrate here on relatively small mass-market USB drives due to the difficulties inherent in reverse-engineering and destructive testing of more sophisticated devices.

Methodology: we reverse-engineered FTL operation in three different USB drives, as listed in Table 3: Generic, an unbranded device based on the Hynix HY27US08121A 512Mbit chip; House, a MicroCenter branded 2GB device based on the Intel 29F16G08CANC1; and Memorex, a 512MB Memorex “Mini TravelDrive” based on an unidentified part.

In Figure 6 we see one of the devices with probe wires attached to the I/O bus on the flash chip itself. Reverse-engineering was performed by issuing specific logical operations from a Linux USB host (direct I/O reads or writes to the corresponding block device) and using an IO-3200 logic analyzer to capture the resulting transactions over the flash device bus. From this captured data we were then able to decode the flash-level operations (read, write, erase, copy) and the physical addresses corresponding to a particular logical read or write.
                               Generic                House                       Memorex
Structure                      16 zones               4 zones                     4 zones
Zone size (physical blocks)    256                    2048                        1024
Free blocks per zone           6                      30–40                       4
Mapping scheme                 Block-level            Block-level / Hybrid        Hybrid
Merge operations               Partial merge          Partial merge / Full merge  Full merge
Garbage collection frequency   At every data update   At every data update        Variable
Wear-leveling algorithm        Dynamic                Dynamic                     Static

Table 4: FTL parameters of the investigated devices.
We characterize the flash devices based on the following parameters: zone organization (number of zones, zone size, number of free blocks), mapping schemes, merge operations, garbage collection frequency, and wear-leveling algorithms. Investigation of these specific attributes is motivated by their importance: they are fundamental in the design of any FTL [2, 3, 17, 19, 20, 21, 25], determining space requirements, i.e. the size of the mapping tables to keep in RAM (zone organization, mapping schemes); overhead and performance (merge operations, garbage collection frequency); and device endurance (wear-leveling algorithms). The results are summarized in Table 4 and discussed in the next sections.

[Figure 7: Generic device page update, using block-level mapping and a partial merge operation during garbage collection. LPN = Logical Page Number. New data is merged with block A and an entire new block (B) is written; block A is then erased during space reclamation.]
Zone organization: The flash devices are divided in zones, which represent contiguous regions of flash memory with disjoint logical-to-physical mappings: a logical block pertaining to a zone can be mapped only to a physical block from the same zone. Since the zones function independently from each other, when one of the zones becomes unusable, other zones on the same device can still be accessed. We report actual values of zone sizes and free list sizes for the investigated devices in Table 4.

Mapping schemes: Block-mapped FTLs require smaller mapping tables to be stored in RAM, compared to page-mapped FTLs (Section 2.2). For this reason, the block-level mapping scheme is more practical and was identified in both the Generic drive and the multi-page updates of the House drive. For single-page updates, House uses the simplified hybrid mapping scheme (which we will describe next), similar to Ban's NFTL [3]. The Memorex flash drive uses hybrid mapping: the data blocks are block-mapped and the log blocks are page-mapped.
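To make the zoned, block-level translation concrete, here is a minimal Python sketch of the lookup implied above. The constants follow the Generic drive's organization (256 physical blocks per zone, 6 on the free list); the structures are illustrative assumptions, not reverse-engineered firmware.

    # Zoned block-level address translation: a logical page number maps
    # through a small per-zone block table (kept in RAM) to a physical
    # block plus page offset. Illustrative constants and structures.
    PAGES_PER_BLOCK = 64
    BLOCKS_PER_ZONE = 256
    FREE_PER_ZONE = 6
    LOGICAL_PER_ZONE = BLOCKS_PER_ZONE - FREE_PER_ZONE  # mappable blocks

    def translate(lpn, zone_maps):
        """Map a logical page number to (physical block, page offset)."""
        lbn, page = divmod(lpn, PAGES_PER_BLOCK)           # logical block
        zone, lbn_in_zone = divmod(lbn, LOGICAL_PER_ZONE)  # zones independent
        pbn = zone_maps[zone][lbn_in_zone]                 # per-zone table
        return pbn, page

Because each zone's table maps blocks rather than pages, the RAM footprint is a factor of PAGES_PER_BLOCK smaller than a page-mapped table, which is the trade-off discussed above.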
Garbage collection: For the Generic drive, garbage collection is handled immediately after each write, eliminating the overhead of managing stale data. For House and Memorex, the hybrid mapping allows several sequential updates to be placed in the same log block. Depending on specific writing patterns, garbage collection can therefore occur at a variable frequency. The number of sequential updates that can be placed in a 64-page log block (before a new free log block is allocated to hold updated pages of the same logical block) ranges from 1 to 55 for Memorex and 1 to 63 for House.

We illustrate how garbage collection works after being triggered by a page update operation.

The Generic flash drive implements a simple page update mechanism (Figure 7). When a page is overwritten, a block is selected from the free block list, and the data to be written is merged with the original data block and written to this new block in a partial merge, resulting in the erasure of the original data block.

The House drive allows multiple updates to occur before garbage collection, using an approach illustrated in Figure 8. Flash is divided into two planes, even and odd (blocks B-even and B-odd in the figure); one log block can represent updates to a single block in the data area. When a single page is written, meta-data is written to the first page in the log block and the new data is written to the second page; a total of 63 pages may be written to the same block before the log must be merged. If a page is written to another block in the plane, however, the log must be merged immediately (via a full merge) and a new log started.

We observe that the House flash drive implements an optimized mechanism for multi-page updates, requiring 2 erasures rather than 4. This is done by eliminating the intermediary storage step in log blocks B-even and B-odd, and writing the updated pages directly to blocks C-even and C-odd.
[Figure 8: House device single-page update, using hybrid mapping and a full merge operation during garbage collection. LPN = Logical Page Number. LPN 4 is written to block B, “shadowing” the old value in block A. On garbage collection, LPN 4 from block B is merged with LPNs 0 and 2 from block A and written to a new block.]

[Figure 9: Memorex device page update, using hybrid mapping and a full merge operation during garbage collection. LPN = Logical Page Number. LPN 2 is written to the log block of block B and the original LPN 2 marked invalid. If this requires a new log block, an old log block (Log A) must be freed by doing a merge with its corresponding data block.]
The Memorex flash drive employs a complex garbage collection mechanism, which is illustrated in Figure 9. When one or more pages are updated in a block (B), a merge is triggered if there is no active log block for block B or the active log block is full, with the following operations being performed:

• The new data pages, together with some settings information, are written in a free log block (Log B).
• A full merge operation occurs between two blocks (data block A and log block Log A) that were accessed 4 steps back. The result is written in a free block (Merged A). Note that the merge operation may be deferred until the log block is full.
• After merging, the two blocks (A and Log A) are erased and added to the list of free blocks.
Wear-leveling aspects: Of the reverse-engineered devices, static wear-leveling was detected only in the case of the Memorex flash drive, while both the Generic and House devices use dynamic wear-leveling. As observed during the experiments, the Memorex flash drive periodically (after every 138th garbage collection operation) moves data from a physical block containing rarely updated data into a physical block taken from the list of free blocks. The block into which the static data has been moved is taken out of the free list and replaced by the rarely used block.
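As a sketch, this swap policy reduces to a counter on garbage collections. The Python model below is illustrative; copy_block and erase_block are caller-supplied hypothetical primitives, not observed firmware routines.

    # Illustrative model of the static wear-leveling observed on the
    # Memorex drive: after every 138th garbage collection, a rarely
    # updated (cold) block is copied into a block taken from the free
    # list, and the cold block's physical location joins the free list.
    SWAP_PERIOD = 138

    class StaticWearLeveler:
        def __init__(self, free_list, copy_block, erase_block):
            self.free_list = free_list
            self.copy_block, self.erase_block = copy_block, erase_block
            self.gc_count = 0

        def after_garbage_collection(self, cold_block):
            self.gc_count += 1
            if self.gc_count % SWAP_PERIOD == 0:
                fresh = self.free_list.pop(0)
                self.copy_block(cold_block, fresh)    # relocate static data
                self.erase_block(cold_block)
                self.free_list.append(cold_block)     # cold block wears too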
Conclusions: The three devices examined were found to have flash translation layers ranging from simple (Generic) to somewhat complex (Memorex). Our investigation provided detailed parameters of each FTL, including zone organization, free list size, mapping scheme, and static vs. dynamic wear-leveling methods. In combination with the chip-level endurance measurements presented above, we will demonstrate in Section 3.4 below the use of these parameters to predict overall device endurance.

3.3 Timing Analysis

Additional information on the internal operation of a flash drive may be obtained by timing analysis: measuring the latency of each of a series of requests and detecting patterns in the results. This is possible because of the disparity in flash operation times, typically 20µs, 200-300µs, and 2-4ms for read, write and erase respectively [9]. Selected patterns of writes can trigger differing sequences of flash operations, incurring different delays observable as changes in write latency. These changes offer clues which can help infer the following characteristics: (a) wear-leveling mechanism (static or dynamic) and parameters, (b) garbage collection mechanism, and (c) device end-of-life status.

Approach: Timing analysis uses sequences of writes to addresses {A1, A2, ..., An} which are repeated to provoke periodic behavior on the part of the device. The most straightforward sequence is to repeatedly write the same block; these writes completed in constant time for the Generic device, while results for the House device are seen in Figure 10. These results correspond to the FTL algorithms observed in Section 3.2 above; the Generic device performs the same block copy and erase for every write, while the House device is able to write to Block B (see Figure 8) 63 times before performing a merge operation and corresponding erase.

More complex flash translation layers require more complex sequences to characterize them. The hybrid …
[Figure 10: House device write timing. Write address is constant; peaks every 63 operations correspond to the merge operation (including erasure) described in Section 3.2. Time in ms vs. write count.]

[Figure 11: write timing under repeated sequences sized to 55, 60, and 64 writes per block. Time in ms vs. write count.]

[Figure 12: Memorex device static wear-leveling. Lower values represent normal writes and erasures, while peaks include time to swap a static block with one from the free list. Peaks recur regularly, once every 138 writes/erasures. Time in ms vs. write count.]

[Figure 13: House device end-of-life signature. Latency of the final 5 × 10^4 writes before failure, from w − 50,000 to w = 106,612,284.]
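The measurement itself needs nothing more than timed direct writes. A minimal Python sketch follows (Linux; the device path and sizes are illustrative, and this is not the authors' exact harness):

    # Timing probe: repeatedly write one logical sector with direct
    # (unbuffered) I/O and record each write's latency. Periodic spikes
    # then reveal merges and wear-leveling swaps (cf. Figures 10 and 12).
    import os, time, mmap

    def time_writes(dev="/dev/sdb", count=1000, size=512, offset=0):
        buf = mmap.mmap(-1, size)       # page-aligned, as O_DIRECT requires
        fd = os.open(dev, os.O_WRONLY | os.O_DIRECT | os.O_SYNC)
        latencies = []
        try:
            for _ in range(count):
                t0 = time.perf_counter()
                os.pwrite(fd, buf, offset)        # constant logical address
                latencies.append(time.perf_counter() - t0)
        finally:
            os.close(fd)
        return latencies  # e.g. a spike every 63 writes on the House device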
The difference from common writes to the highest peaks is likely due to the data copy operations implementing static wear-leveling.

End-of-life signature: Write latency was measured during endurance tests, and a distinctive signature was seen in the operations leading up to device failure. This may be seen in Figure 13, showing latency of the final 5 × 10^4 operations before failure of the House device. First the 80ms peaks stop, possibly indicating the end of some garbage collection operations due to a lack of free pages. At 25,000 operations before the end, all operations slow to 40ms, possibly indicating an erasure for every write operation; finally the device fails and returns an error.

Conclusions: By analyzing write latency for varying patterns of operations we have been able to determine properties of the underlying flash translation algorithm, which have been verified by reverse engineering. Those properties include wear-leveling mechanism and frequency, as well as number and organization of log blocks. Additional details which should be possible to observe via this mechanism include zone boundaries and possibly free list size.

3.4 Device-level Endurance

By device-level endurance we denote the number of successful writes at the logical level before a write failure occurs. Endurance was tested by repeated writes to a constant address (and to 5 constant addresses in the case of Memorex) until failure was observed. Testing was performed on Linux 2.6.x using direct (unbuffered) writes to the block devices.

Several failure behaviors were observed:
• silent: The write operation succeeds, but a read verifies that the data was not written.
• unknown error: On multiple occasions, the test application exited without any indication of error. In many cases, further writes were possible.
• error: An I/O error is returned by the OS. This was observed for the House flash drive; further write operations to any page in a zone that had been worn out failed, returning an error.
• blocking: The write operation hangs indefinitely. This was encountered for both Generic and House flash drives, especially when testing was resumed after failure.

Endurance limits with dynamic wear-leveling: We measured an endurance of approximately 106 × 10^6 writes for House; in two different experiments, Generic sustained up to 103 × 10^6 writes and 77 × 10^6 writes, respectively. As discussed in Section 3.2, the House flash drive performs 4 block erasures for 1-page updates, while the Generic flash drive performs only one block erasure. However, the list of free blocks is about 5 times larger for House (see Table 4), which may explain the higher device-level endurance of the House flash drive.

Endurance limits with static wear-leveling: Wearing out a device that employs static wear-leveling (e.g. the Memorex and House-2 flash drives) takes considerably longer than wearing out one that employs dynamic wear-leveling (e.g. the Generic and House flash drives). In the experiments conducted, the Memorex and House-2 flash drives had not worn out before the paper was submitted, reaching more than 37 × 10^6 writes and 26 × 10^8 writes, respectively.

Conclusions: The primary insight from these measurements is that wear-leveling techniques lead to a significant increase in the endurance of the whole device, compared to the endurance of the memory chip itself, with static wear-leveling providing much higher endurance than dynamic wear-leveling.

Table 5 presents a synthesis of predicted and measured endurance limits for the devices studied. We use the following notation:
N = total number of erase blocks,
k = total number of pages in the erase block,
h = maximum number of program/erase cycles of a block (i.e. the chip-level endurance),
z = number of erase blocks in a zone, and
m = number of free blocks in a zone.

Ideally, the device-level endurance is Nkh. In practice, based on the FTL implementation details presented in Section 3.2, we expect device-level endurance limits of mh for Generic, between mh and mkh for House, and zkh for Memorex. In the following computations we use the program/erase endurance values, i.e. h, from Figure 4, and the m and z values reported in Table 4. For Generic, mh = 6 × 10^7, which approaches the actual measured values of 7.7 × 10^7 and 10.3 × 10^7. For House, mh = 3 × 10^7 and mkh = 30 × 64 × 10^6 ≈ 1.9 × 10^9, with the measured device-level endurance of 10.6 × 10^7 falling between these two limits. For Memorex, we do not have chip-level endurance measurements, but we use h = 10^6 in our computations, since it is the predominant value for the tested devices. We estimate the best-case limit of device-level endurance to be zkh = 1024 × 64 × 10^6 ≈ 6 × 10^10 for Memorex, which is about three orders of magnitude higher than for the Generic and House devices, demonstrating the major impact of static wear-leveling.
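These bounds are easy to reproduce from the parameters above. A quick Python check, with k = 64 pages per block, h from Figure 4 (and the assumed h = 10^6 for Memorex), and m and z from Table 4 (House's free list taken at its lower bound of 30):

    # Reproducing the endurance-limit arithmetic: mh, mkh and zkh per
    # device. Values of m, z, h are as used in the text above.
    k = 64
    devices = {
        #            m,    z,    h
        "Generic": ( 6,  256, 1e7),
        "House":   (30, 2048, 1e6),
        "Memorex": ( 4, 1024, 1e6),
    }
    for name, (m, z, h) in devices.items():
        print(f"{name:8s} mh={m*h:.1e}  mkh={m*k*h:.1e}  zkh={z*k*h:.1e}")
    # Generic: mh  = 6.0e+07 (measured 7.7e7 and 10.3e7)
    # House:   mh  = 3.0e+07, mkh = 1.9e+09 (measured 10.6e7 in between)
    # Memorex: zkh = 6.6e+10 (best case, ~3 orders above the others)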
[Figure 14: Unscheduled access vs. optimal scheduling for disk and flash. The requested access sequence contains both reads (R) and writes (W). Addresses are rounded to track numbers (disk) or erase block numbers (flash), and “X” denotes either a seek operation to change tracks (disk) or garbage collection to erase blocks (flash). We ignore the rotational delay of disks (caused by searching for a specific sector of a track), which may produce additional overhead. Initial head position (disk) = track 35.]
3.5 Implications for Storage Systems

Space management: Space management policies for flash devices are substantially different from those used for disks, mainly for the following reasons. Compared to electromechanical devices, solid-state electronic devices have no moving parts, and thus no mechanical delays. With no seek latency, they feature fast random access times and no read overhead. However, they exhibit asymmetric write vs. read performance. Write operations are much slower than reads, since flash memory blocks need to be erased before they can be rewritten. Write latency depends on the availability (or lack thereof) of free, programmable blocks. Garbage collection is carried out to reclaim previously written blocks which are no longer in use.

Disks address the seek overhead problem with scheduling algorithms. One well-known method is the elevator algorithm (also called SCAN), in which requests are sorted by track number and serviced only in the current direction of the disk arm. When the arm reaches the edge of the disk, its direction reverses and the remaining requests are serviced in the opposite order.

Since the latency of flash vs. disks has entirely different causes, flash devices require a different method than disks to address the latency problem. Request scheduling algorithms for flash have not yet been implemented in practice, leaving space for much improvement in this area. Scheduling algorithms for flash need to minimize garbage collection, and thus their design must depend upon the FTL implementation. FTLs are built to take advantage of temporal locality; thus a significant performance increase can be obtained by reordering data streams to maximize this advantage. FTLs map successive updates to pages from the same data block together in the same log block. When writes to the same block are issued far apart from each other in time, however, new log blocks must be allocated. Therefore, most benefit is gained with a scheduling policy in which the same data blocks are accessed successively. In addition, unlike for disks, for flash devices there is no reason to reschedule reads.

To illustrate the importance of scheduling for performance, as well as the conceptually different aspects of disk vs. flash scheduling, we look at the following simple example (Figure 14).

Disk scheduling. Let us assume that the following requests arrive: R 70, R 10, R 50, W 70, W 10, W 50, R 70, R 10, R 50, W 70, W 10, W 50, where R = read, W = write, and the numbers represent tracks. Initially, the head is positioned on track 35. We ignore the rotational delay of searching for a sector on a track. Without scheduling, the overhead (seek time) is 495 tracks. If the elevator algorithm is used, the requests are processed in the direction of the arm movement, resulting in the following ordering: R 50, W 50, R 50, W 50, R 70, W 70, R 70, W 70, (arm movement changes direction), R 10, W 10, R 10, W 10. Requests to the same track are thereby grouped together, minimizing seek time; data integrity is still preserved, since reads and writes to the same disk track are processed in the requested order (they might access the same address). This gives an overhead of 95 tracks, roughly 5x smaller than without scheduling.
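The arithmetic in this example is easy to verify. A small Python check of both orderings, modeling seek cost as the track distance:

    # Check of the disk-scheduling example: total seek distance (in
    # tracks) for the arrival order vs. the elevator ordering, starting
    # from track 35. Seek cost is |destination - position|.
    def seek_overhead(requests, start=35):
        pos, total = start, 0
        for _op, track in requests:   # op (R/W) does not affect seek cost
            total += abs(track - pos)
            pos = track
        return total

    arrival  = [('R',70),('R',10),('R',50),('W',70),('W',10),('W',50)] * 2
    elevator = [('R',50),('W',50),('R',50),('W',50),('R',70),('W',70),
                ('R',70),('W',70),('R',10),('W',10),('R',10),('W',10)]

    print(seek_overhead(arrival))    # 495, as in the text
    print(seek_overhead(elevator))   # 95: about 5x less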
Flash scheduling. Let us assume that the same sequence of requests arrives: R 70, R 10, R 50, W 70, W 10, W 50, R 70, R 10, R 50, W 70, W 10, W 50, where R = read, W = write, and the numbers represent erase blocks. Also assume that blocks are of size 3 pages, and that there are 3 free blocks, with one block empty at all times. Without scheduling, 4 erasures are needed to accommodate the last 4 writes. An optimal scheduling gives the following ordering of the requests: R 70, R 10, R 50, W 70, R 70, W 70, W 10, R 10, W 10, W 50, R 50, W 50. We observe that there is no need to reschedule reads; however, data integrity has to be preserved (reads/writes to the same block must be processed in the requested order, since they might access the same address). After scheduling, the first two writes are mapped together to the same free block, the next two are also mapped together, and so on. A single block erasure is then necessary to free one block and accommodate the last two writes. The garbage collection overhead is 4x smaller with scheduling than without.
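One way to see the effect is to count how often the write stream switches between erase blocks, since each switch forces a new log block and, eventually, a merge. The Python sketch below is an illustrative proxy for this pressure, not the FTL's actual cost model; it uses a stable sort by block number, which groups blocks while preserving each block's internal read/write order, so the data-integrity constraint above is respected.

    # Proxy for the flash-scheduling example: count switches of the
    # write stream between erase blocks. Each switch allocates a new
    # log block and eventually costs a merge/erase.
    def write_stream_switches(ops):
        switches, current = 0, None
        for op, block in ops:
            if op == 'W' and block != current:
                switches, current = switches + 1, block
        return switches

    seq = [('R',70),('R',10),('R',50),('W',70),('W',10),('W',50)] * 2
    grouped = sorted(seq, key=lambda o: o[1])  # stable: per-block order kept

    print(write_stream_switches(seq))      # 6 switches unscheduled
    print(write_stream_switches(grouped))  # 3 once writes are adjacent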
Applicability: Although we have explored only a few devices, some of the methods presented here (e.g. timing analysis) can be used to characterize other flash devices as well. FTLs range in complexity across devices; at the low end, however, there are many similarities. Our results are likely to apply to a large class of devices that use flash translation layers, including most removable devices (SD, CompactFlash, etc.) and low-end SSDs. For high-end devices, such as enterprise drives (e.g. the Intel X25-E [16] or BiTMICRO Altima [6] series) or high-end consumer drives (e.g. the Intel X25-M [15]), we may expect to find more complex algorithms operating with more free space and buffering.

As an example, JMicron's JMF602 flash controller [18] has been used for many low-end SSDs with 8-16 flash chips; it contains 16K of onboard RAM and uses flash configurations with about 7% free space. Having little free space or RAM for mapping tables, its flash translation layer is expected to be similar in design and performance to the hybrid FTL that we investigated above.

At present, several flash devices including low-end SSDs have a built-in controller that performs wear-leveling and error correction. A disk file system in conjunction with an FTL that emulates a block device is preferred for compatibility, and also because current flash file systems still have implementation drawbacks (e.g. JFFS2 has large memory consumption and implements only write-through caching instead of write-back) [26]. Flash file systems could become more prevalent as the capacity of flash memories increases. Operating directly over raw flash chips, flash file systems present some advantages. They deal with long erase times in the background, while the device is idle, and use file pointers (which are remapped when updated data is allocated to a free block), thus eliminating the second level of indirection needed by FTLs to maintain the mappings. They also have to manage only one free space pool instead of two, as required by an FTL combined with a disk file system. In addition, unlike conventional file systems, flash file systems do not need to handle seek latencies and file fragmentation; rather, a new and better-suited scheduling algorithm, as described before, can be implemented to increase performance.

4 Analysis and Simulation

In the previous section we have examined the performance of several real wear-leveling algorithms under close to worst-case conditions. To place these results in perspective, we wish to determine the maximum theoretical performance which any such on-line algorithm may achieve. Using terminology defined above, we assume a device (or zone within a device) consisting of N erase blocks, each block containing k separately writable pages, with a limit of h program/erase cycles for each erase block, and m free erase blocks (i.e. the physical size of the device is N erase blocks, while the logical size is N − m blocks).

Previous work by Ben-Aroya and Toledo [5] has proved that in the typical case where k > 1, and with reasonable bounds on m, upper bounds exist on the performance of wear-leveling algorithms. Their results, however, offer little guidance for calculating these bounds. We approach the problem from the bottom up, using Monte Carlo simulation to examine achievable performance in the case of uniform random writes to physical pages. We choose a uniform distribution because it is both achievable (by means such as Ban's randomized wear leveling method [4]) and in the worst case unavoidable by any on-line algorithm when faced with uniform random writes across the logical address space. We claim therefore that our numeric results represent a tight bound on the performance of any on-line wear-leveling algorithm in the face of arbitrary input.

We look for answers to the following questions:
• How efficiently can we perform static wear leveling? We examine the case where k = 1, thus ignoring erase block fragmentation, and ask whether there are on-line algorithms which achieve near-ideal endurance in the face of arbitrary input.
• How efficiently can we perform garbage collection? For typical values of k, what are the conditions needed for an on-line algorithm to achieve good performance with arbitrary access patterns?

In doing this we use the endurance degradation of an algorithm, its relative decrease in performance, as a figure of merit; the ideal spreads wear evenly across the physical device, even in the face of highly non-uniform writes to the logical device. For ease of analysis we make two simplifications:
• Erase unit and program unit are of the same size, i.e. k = 1. We examine k > 1 below, when looking at garbage collection efficiency.
• Writes are uniformly distributed across physical pages, as described above.

Letting X1, X2, ..., XN be the number of times that pages 1...N have been erased, we observe that at any point each Xi is a random variable with mean w/N, where w is the total number of writes so far. If the variance of each Xi is high and m ≪ N, then it is likely that m of them will reach h well before w = Nh, the point at which the expected value of each Xi reaches h. This may be seen in Figure 15, where in a trivial case (N = 20, m = 4, h = 10) the free list has been exhausted after a total of only Nh/2 writes.

[Figure 15: Trivial device failure (N = 20, m = 4, h = 10). Program/erase cycles per page; four worn-out blocks have reached their erase limit (10) after 100 total writes, half the theoretical maximum of Nh, or 200.]

In Figure 16 we see simulation results for a more realistic set of parameters. We note the following points:

[Figure 16: Endurance degradation vs. ideal, for N = 10^6, m = 1; N = 10^6, m = 32; and N = 10^4, m = 8.]
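The Monte Carlo setup for the k = 1 model takes only a few lines. The Python sketch below follows the two simplifications above, with the parameters of Figure 15; it is a sketch of the stated model, not the authors' exact simulator.

    # Monte Carlo sketch of the k = 1 model: writes land on uniform
    # random physical pages; the device fails once m pages have reached
    # the erase limit h (free list exhausted). Returns writes at failure.
    import random

    def writes_until_failure(N, m, h):
        wear = [0] * N
        worn = writes = 0
        while worn < m:
            i = random.randrange(N)    # uniform random physical page
            wear[i] += 1
            writes += 1
            if wear[i] == h:
                worn += 1              # another block at its erase limit
        return writes

    # Trivial case of Figure 15: failure arrives well short of the
    # ideal N*h = 200 writes; the printed ratio is the degradation.
    N, m, h = 20, 4, 10
    runs = [writes_until_failure(N, m, h) for _ in range(10_000)]
    print((N * h) / (sum(runs) / len(runs)))   # degradation factor > 1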
[5] Ben-Aroya, A., and Toledo, S. Competitive Analysis of Flash-Memory Algorithms. 2006, pp. 100–111.

[6] BiTMICRO Networks. Datasheet: e-disk Altima E3F4FL Fibre Channel 3.5". Available from www.bitmicro.com, Nov. 2009.

[7] Bouganim, L., Jonsson, B., and Bonnet, P. uFLIP: understanding flash IO patterns. In Int'l Conf. on Innovative Data Systems Research (CIDR) (Asilomar, California, 2009).

[8] Chung, T., Park, D., Park, S., Lee, D., Lee, S., and Song, H. System Software for Flash Memory: A Survey. In Proceedings of the International Conference on Embedded and Ubiquitous Computing (2006), pp. 394–404.

[9] Desnoyers, P. Empirical evaluation of NAND flash memory performance. In Workshop on Hot Topics in Storage and File Systems (HotStorage) (Big Sky, Montana, October 2009).

[10] DRAMeXchange. 2009 NAND flash demand bit growth forecast. www.dramexchange.com, 2009.

[11] Gal, E., and Toledo, S. Algorithms and data structures for flash memories. ACM Computing Surveys 37, 2 (2005), 138–163.

[12] Grupp, L., Caulfield, A., Coburn, J., Swanson, S., Yaakobi, E., Siegel, P., and Wolf, J. Characterizing flash memory: Anomalies, observations, and applications. In 42nd International Symposium on Microarchitecture (MICRO) (December 2009).

[13] Gupta, A., Kim, Y., and Urgaonkar, B. DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (Washington, DC, USA, 2009), ACM, pp. 229–240.

[14] Huang, P., Chang, Y., Kuo, T., Hsieh, J., and Lin, M. The Behavior Analysis of Flash-Memory Storage Systems. In IEEE Symposium on Object Oriented Real-Time Distributed Computing (2008), IEEE Computer Society, pp. 529–534.

[15] Intel Corp. Datasheet: Intel X18-M/X25-M SATA Solid State Drive. Available from www.intel.com, May 2009.

[16] Intel Corp. Datasheet: Intel X25-E SATA Solid State Drive. Available from www.intel.com, May 2009.

[17] Intel Corporation. Understanding the flash translation layer (FTL) specification. Application Note AP-684, Dec. 1998.

[18] JMicron Technology Corporation. JMF602 SATA II to Flash Controller. Available from http://www.jmicron.com/Product_JMF602.htm, 2008.

[24] Lee, J., Choi, J., Park, D., and Kim, K. Degradation of tunnel oxide by FN current stress and its effects on data retention characteristics of 90 nm NAND flash memory cells. In IEEE Int'l Reliability Physics Symposium (2003), pp. 497–501.

[25] Lee, S., Shin, D., Kim, Y., and Kim, J. LAST: Locality-Aware Sector Translation for NAND Flash Memory-Based Storage Systems. In Proceedings of the International Workshop on Storage and I/O Virtualization, Performance, Energy, Evaluation and Dependability (SPEED) (2008).

[26] Memory Technology Devices (MTD). Subsystem for Linux: JFFS2. Available from http://www.linux-mtd.infradead.org/faq/jffs2.html, January 2009.

[27] O'Brien, K., Salyers, D. C., Striegel, A. D., and Poellabauer, C. Power and performance characteristics of USB flash drives. In World of Wireless, Mobile and Multimedia Networks (WoWMoM) (2008), pp. 1–4.

[28] Park, M., Ahn, E., Cho, E., Kim, K., and Lee, W. The effect of negative VTH of NAND flash memory cells on data retention characteristics. IEEE Electron Device Letters 30, 2 (2009), 155–157.

[29] Sanvido, M., Chu, F., Kulkarni, A., and Selinger, R. NAND flash memory and its role in storage architectures. Proceedings of the IEEE 96, 11 (2008), 1864–1874.

[30] Shu, F., and Obr, N. Data Set Management Commands Proposal for ATA8-ACS2. ATA8-ACS2 proposal e07154r6, available from www.t13.org, 2007.

[31] Sooman, D. Hard disk shipments reach new record level. www.techspot.com, February 2006.

[32] Yang, H., Kim, H., Park, S., Kim, J., et al. Reliability issues and models of sub-90nm NAND flash memory cells. In Solid-State and Integrated Circuit Technology (ICSICT) (2006), pp. 760–762.