
2009 International Forum on Computer Science-Technology and Applications

APRA: Adaptive Page Replacement Algorithm for NAND Flash Memory Storages

Baichuan Shen, Xin Jin, Yong Ho Song, Sang Sun Lee


School of Electronics and Computer Engineering, Hanyang University, Seoul, 133-791, Korea
{ baichuan74, jxjason328, yhsong, ssnlee }@hanyang.ac.kr

978-0-7695-3930-0/09 $26.00 © 2009 IEEE
DOI 10.1109/IFCSTA.2009.9

ABSTRACT: This paper presents a new page replacement algorithm called Adaptive Page Replacement Algorithm (APRA), which aims to reduce the number of read, write, and erase operations and thereby improve the performance of NAND flash memory based storage systems. APRA uses a learning rule to adaptively and continually revise its parameter in response to diverse workloads with different access patterns. Simulation experiments showed that the proposed algorithm performs better than other page replacement algorithms such as LRU, CFLRU, CFLRU/C, and LRU-WSR in terms of read and write hit counts and the number of erase operations.

KEYWORDS: NAND flash memory; Buffer management; Page replacement; Embedded storages; LRU

I. INTRODUCTION

Flash memory has recently been adopted as a storage medium in place of the Hard Disk Drive (HDD) for personal computers and mobile embedded systems. Flash memory has several advantages over traditional HDDs, including non-volatility, shock resistance, a small and lightweight form, low power consumption, and solid-state reliability, so its practical application has grown well beyond its original design goals. Hence, flash memory based Solid State Disks (SSD) are expected to substitute for conventional HDDs in the foreseeable future.

To enhance the performance of storage systems, a common approach is to use buffer caching. The buffer is located between the file system and the storage device, and can reduce the number of read or write requests issued from the file system to the storage device. Various buffer replacement algorithms for traditional HDDs have been proposed in the last few decades. These algorithms focus on maximizing the read/write hit ratio because of the high execution time cost of read/write operations to hard disks. In a flash memory based storage system, the costs of read and write operations are significantly asymmetric, so the buffer replacement algorithm for flash memory has to cater to these characteristics.

In this paper, we propose a novel page replacement algorithm for flash memory called Adaptive Page Replacement Algorithm (APRA), which not only maintains a high hit ratio for read and write requests, but can also adaptively revise itself according to different workloads. APRA is designed based on CFLRU to deal with the asymmetric replacement cost of read and write operations for flash memory. At the same time, APRA is able to track changes in the workload and dynamically adjust a relevant parameter to achieve better performance. Below, we perform simulation experiments with different types of generated traces. The simulation results showed that APRA reduced the number of read and write operations of flash memory and outperformed other algorithms in terms of read and write hit counts and the number of erase operations.

The rest of this paper is organized as follows. Section II describes other work related to our study. Section III discusses the details of the proposed APRA. Section IV describes the performance results of APRA compared to LRU, CFLRU, CFLRU/C, and LRU-WSR. Finally, Section V concludes the paper.

II. RELATED WORK

Buffer caching, which has a large influence on I/O execution time, is of great importance in storage systems. DRAM is assumed as the buffer in this paper. It is at least a thousand times faster than NAND flash [1]: its full read or write cycle is about 100 ns, while MLC NAND flash costs 60 us for a read and 800 us for a write, as shown in TABLE I.

TABLE I. CHARACTERISTICS OF NAND FLASH MEMORY [1]

Under the demand paging model, the objective of the page replacement algorithm is to select a proper victim page which is least likely to be accessed again, and then make it free. Therefore, the replacement policy is usually the algorithm of great interest. In general, we measure a replacement policy by its hit ratio, the fraction of pages that can be served from the physical memory, and by its overhead, which should be kept low through low computational complexity and a minimal memory footprint.

Various page replacement algorithms have been proposed for traditional disk-based storage systems. LRU (Least Recently Used) is the most common replacement algorithm because of its simplicity and acceptable hit ratio in some scenarios. LRU only considers the recency of each page, that is, it keeps pages in memory in order of last reference time. It always selects the least recently used page as a victim page when a free page is needed. Since the performance of LRU in some cases is too poor to satisfy the
requirements of certain applications, several advanced replacement algorithms have been proposed, such as LRFU [2], LIRS [3], and ARC [4].

Existing replacement algorithms for traditional hard disks are designed to minimize the page miss ratio, in that these algorithms treat the costs of page reads and writes as equal. When flash memory is used as the storage medium, it is necessary to pay attention to the physical characteristics of flash memory.

Park et al. [5] proposed a buffer replacement algorithm called CFLRU (Clean First LRU) for flash memory based systems. CFLRU maintains the LRU order in the list. A fixed-size window is applied to divide the list into two regions: the working region and the clean-first region. The pages in the LRU list are classified into two types: clean pages and dirty pages. Clean pages are the ones that have not been changed during their residency in the buffer, while dirty pages have been modified and need to be written to the flash memory to keep the data consistent. CFLRU attempts to choose a clean page within the window as a victim page because the writing cost is much more expensive. If the window size reduces to 1, CFLRU changes to the normal LRU. CFLRU does improve the performance of flash memory to some extent by reducing the number of write operations. However, for certain workloads involving mainly read requests, CFLRU performs worse, since the clean pages within the window are evicted first. Under these circumstances, the read hit ratio of CFLRU is extremely low. Moreover, a pre-specified window size will benefit the performance of some applications, but not others.

Yoo et al. [6] proposed CFLRU/C, which also selects the least recently used clean page as the victim within the window. If there is no clean page within the window, CFLRU/C evicts the dirty page with the lowest access frequency, which is not likely to be referenced again soon. CFLRU/C increases the write hit ratio by considering the access frequencies of dirty pages. However, CFLRU/C still has the same drawback as CFLRU, because it is difficult to specify an appropriate size for the window.

Jung et al. [7] proposed an enhanced LRU algorithm, called LRU-WSR (LRU-Write Sequence Reordering). Since not-cold pages have been frequently accessed during a short period, they are more likely to be accessed again soon. LRU-WSR delays the eviction of not-cold dirty pages to reduce the number of write operations to the flash memory by utilizing the hash-table-based cold-detection algorithm. LRU-WSR does not need to maintain a fixed-size window to search clean pages; nonetheless, its hit ratio in some cases may be lower than that of CFLRU, resulting in more physical page reads and writes.

As mentioned above, CFLRU only considers the different I/O costs of read and write operations in the flash memory, while CFLRU/C and LRU-WSR further consider the frequency of page references. However, all three algorithms may show poor performance for certain access patterns, because real-time workloads possess a great deal of variation and a static, fixed-parameter replacement policy will not work well for all of them.

III. APRA

Figure 1. APRA page replacement example.

We now introduce the adaptive page replacement algorithm APRA. In APRA, there are two LRU lists, both of size L. The first list (L1) maintains a window to search clean pages just like CFLRU(/C). However, there are two major differences: (a) Whereas the size of the window in CFLRU(/C) is pre-specified, the size of the window, w, in APRA can vary from Wmin to Wmax according to the adaptation rule that will be discussed later. The value of w, which should be carefully chosen in consideration of access patterns for certain applications, indicates the extent to which the clean pages are evicted from the buffer. (b) We maintain the access frequency for each page whether it is dirty or clean. If there are several clean pages within the window, we select the one with the least access frequency as the victim when a free page is needed for replacement. If there is no clean page within the window, we still evict the dirty page with the least access frequency. By considering the access frequency, we incorporate the recency and frequency of page references to some degree for determining the victim page.

The second list (L2) is just used to store information on the pages replaced from the first list (L1). Because L2 does not store the contents of pages but just maintains the metadata of the evicted pages, we call it a ghost list. The reason for maintaining a ghost list is that limited buffer space makes it impossible to store all the referenced pages
but yet practical to keep the metadata of the evicted pages. The reference history of evicted pages will contribute to the increase of hit ratios, since APRA exploits it to decide the access pattern of the current workload. Although dirty pages evicted from L1 have already been written to the flash memory, we still mark these pages as dirty pages in L2 to indicate the write operations.

APRA continually revises the parameter w based on the observation of the current workload, since an appropriate value for w will significantly increase the hit ratio. If there is a hit in L2, we should increase or decrease the window size of L1 depending on whether the state of the hit page is dirty or clean. Hits in L2 indicate that too many of either clean pages or dirty pages are being evicted, and therefore APRA has to adapt and tune w in response to the observed workload. Hence, on a hit on a dirty page in L2, we increase w, and on a hit on a clean page in L2, we decrease w. The magnitude of the revision in w is also very important. The quantities p and q control the magnitude of revision, where p = L/w and q = 1/(1 - w/L) set the revision rates depending on w. Thus, the smaller w is, the larger the increment p will be. Similarly, the larger w is, the larger the decrement q will be.

Suppose a large number of I/O requests mainly involving write operations come from the file system. APRA will increase the window size to evict more clean pages for a high write hit ratio. Likewise, when read requests dominate, the window size will decrease to a certain degree to achieve a high read hit ratio. In summary, APRA never stops adapting and thus tracks changes in the access pattern, changing the window size according to the recent past.

IV. SIMULATION

In this section, we present the experimental results for various replacement algorithms to evaluate the effectiveness of the proposed algorithm. We compare APRA with LRU, CFLRU, CFLRU/C, and LRU-WSR in terms of the number of read and write hits, and erase operations.

Simulation traces were collected from Windows XP on the NTFS file system using Diskmon for Windows [8]. This tool can record all hard disk activity while it is running. There were two different work tasks for our experiments: Office07 Installation and Application Loading. Let L denote the size of the buffer cache, and w*L the size of the window for CFLRU and CFLRU/C. As mentioned before, the size of the window influences the hit ratios of CFLRU and CFLRU/C. In this experiment, we set w = 0.5 for CFLRU and CFLRU/C, and Wmin = 0.1 and Wmax = 0.9 for APRA.

For the Office07 Installation work task, which tends to be write intensive, APRA increases the size of the window to keep more dirty pages in the buffer, which in turn significantly increases the write hit counts, especially when the buffer size is larger than 8 MB, as shown in Fig. 2. APRA increased the number of write hits by up to 8.9% compared with CFLRU/C under the Office07 Installation work task. The size of the window in APRA can be adaptively increased to Wmax = 0.9, which is larger than w = 0.5 in CFLRU(/C); this explains why APRA has more write hit counts and fewer erase operations than CFLRU(/C). On the other hand, APRA inevitably had fewer read hit counts than the other algorithms. However, more write hits and fewer erase operations will be much more beneficial for the performance of flash memory, whose write and erase operations are expensive. Since CFLRU does not consider the access frequency of page references, its write hit count is much lower than that of CFLRU/C.

Figure 2. Office07 Installation. (a) Number of read hits, (b) number of write hits, and (c) number of erase operations versus buffer size (1 to 32 MB) for LRU, CFLRU, CFLRU/C, LRU-WSR, and APRA.
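
The window behaviour discussed above follows from the victim selection and adaptation rule of Section III. The following Python sketch is one possible reading of that rule, not the authors' implementation. Several details are assumptions that the paper does not state: the class name `ApraSketch`, measuring w in pages with Wmin and Wmax taken as fractions of L, the starting window of 0.5*L, clamping w after each revision, a ghost list capped at L entries, and breaking frequency ties in favour of the least recently used page.

```python
from collections import OrderedDict


class ApraSketch:
    """Illustrative APRA-style buffer (hypothetical; see assumptions above)."""

    def __init__(self, capacity, w_min=0.1, w_max=0.9):
        self.capacity = capacity              # L, in pages
        self.w_lo = w_min * capacity          # assumed: bounds are fractions of L
        self.w_hi = w_max * capacity
        self.w = 0.5 * capacity               # assumed starting window, in pages
        self.l1 = OrderedDict()               # page -> [dirty, freq], LRU end first
        self.l2 = OrderedDict()               # ghost list: page -> dirty flag only
        self.flash_writes = 0                 # dirty evictions written to flash

    def _adapt(self, ghost_was_dirty):
        """Revise w on a ghost hit: grow by p = L/w, shrink by q = 1/(1 - w/L)."""
        L, w = self.capacity, self.w
        if ghost_was_dirty:
            w += L / w
        else:
            w -= 1.0 / (1.0 - w / L)
        self.w = min(self.w_hi, max(self.w_lo, w))   # assumed clamping

    def _evict(self):
        """Evict the least frequently used clean page in the window, else dirty."""
        window = list(self.l1.items())[: max(1, int(self.w))]
        clean = [(page, meta) for page, meta in window if not meta[0]]
        pool = clean if clean else window
        victim, meta = min(pool, key=lambda item: item[1][1])
        del self.l1[victim]
        if meta[0]:
            self.flash_writes += 1
        self.l2[victim] = meta[0]             # keep metadata only
        if len(self.l2) > self.capacity:
            self.l2.popitem(last=False)

    def access(self, page, is_write):
        """Return True on a buffer hit, False on a miss."""
        if page in self.l1:
            meta = self.l1.pop(page)
            meta[0] = meta[0] or is_write
            meta[1] += 1
            self.l1[page] = meta              # move to the MRU end
            return True
        if page in self.l2:
            self._adapt(self.l2.pop(page))
        if len(self.l1) >= self.capacity:
            self._evict()
        self.l1[page] = [is_write, 1]
        return False
```

With a write-heavy trace, ghost hits on dirty pages push w towards its upper bound, so clean pages are evicted first and dirty pages stay buffered longer, which is consistent with the write hit behaviour reported for Office07 Installation.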

Figure 3. Application Loading. (a) Number of read hits, (b) number of write hits, and (c) number of erase operations versus buffer size (1 to 32 MB) for LRU, CFLRU, CFLRU/C, LRU-WSR, and APRA.

From Fig. 3 we can see that the number of read hits is several times the number of write hits, although the write request ratio of Application Loading is approximately 50%. Since the frequent read requests of Application Loading have high locality, keeping more clean pages in the buffer will increase the read hit counts. APRA maintains a ghost list and the access frequency for each page whether it is clean or dirty, which explains why APRA was able to track the access pattern of the workload and thus achieve higher read hit counts than the other algorithms, by up to 14%. In contrast, CFLRU and CFLRU/C performed a little better than APRA on write hit counts; however, since they attempted to evict all the clean pages in the window no matter what the current workload was like, their number of read hits was even lower than that of common LRU, as shown in Fig. 3 (a).

V. CONCLUSION

In this paper, we presented a new page replacement algorithm for flash memory based storage systems, called APRA, which is workload adaptive and self-tuning. APRA maintains two LRU lists, considers the access frequency of page references, and dynamically changes the window size of the first list by tracking changes in the workload. We have empirically demonstrated that APRA outperforms other algorithms in terms of the number of read and write hits of the buffer and erase operations of flash memory.

Since the physical characteristics of flash memory differ considerably from those of conventional hard disks, the performance improvement resulting from a high buffer hit ratio alone is not sufficient for flash memory based storage systems. Our future work will focus on adapting APRA to flash memory based on certain specific structures to improve the overall performance.

REFERENCES

[1] Samsung Semiconductor, Inc., Product Selection Guide: Memory and Storage, January 2009.
[2] D. Lee, J. Choi, et al., "LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies," IEEE Trans. Comput., vol. 50, no. 12, pp. 1352-1361, 2001.
[3] S. Jiang and X. Zhang, "LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance," in Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems. Marina Del Rey, California: ACM, 2002.
[4] N. Megiddo and D. S. Modha, "ARC: A Self-Tuning, Low Overhead Replacement Cache," in Proceedings of the 2nd USENIX Conference on File and Storage Technologies. San Francisco, CA: USENIX Association, 2003.
[5] C. Park, J.-U. Kang, S.-Y. Park, and J.-S. Kim, "Energy-aware demand paging on NAND flash-based embedded storages," in Proceedings of the 2004 international symposium on Low power electronics and design. Newport Beach, California, USA: ACM, 2004.
[6] Y.-S. Yoo, H. Lee, Y. Ryu, and H. Bahn, "Page Replacement Algorithms for NAND Flash Memory Storages," in Computational Science and Its Applications – ICCSA 2007, 2007, pp. 201-212.
[7] H. Jung, H. Shim, S. Park, S. Kang, and J. Cha, "LRU-WSR: Integration of LRU and writes sequence reordering for flash memory," IEEE Transactions on Consumer Electronics, vol. 54, pp. 1215-1223, Aug 2008.
[8] http://technet.microsoft.com/en-us/sysinternals/bb896646.aspx
