Caching
How UNIX Optimizes File System Performance and Presents Data to User Processes Using a Virtual File System
V-NODE Layer
Hierarchical naming;
Locking;
Quotas;
Object (file) creation and deletion, read and write, changes in space allocation.
For local data files, the v-node refers to a UNIX-specific structure called an i-node (index node), which holds all the information needed to access the actual data store.
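A minimal C sketch of the kind of information a classic UNIX i-node carries; the structure name, field layout, and the 12 direct pointers below are illustrative, not the definition used by any real kernel:

```c
/* Illustrative sketch of a classic UNIX i-node; simplified, not a real
 * kernel structure. */
#include <stdint.h>
#include <sys/types.h>
#include <time.h>

#define NDIRECT 12                   /* direct block pointers (typical classic layout) */

struct inode_sketch {
    mode_t   mode;                   /* file type and permission bits      */
    uid_t    uid;                    /* owner                              */
    gid_t    gid;                    /* group                              */
    off_t    size;                   /* file size in bytes                 */
    time_t   atime, mtime, ctime;    /* access / modify / change times     */
    nlink_t  nlink;                  /* number of hard links               */
    uint32_t direct[NDIRECT];        /* direct data block addresses        */
    uint32_t single_indirect;        /* block containing block addresses   */
    uint32_t double_indirect;        /* block of indirect blocks           */
    uint32_t triple_indirect;
};
```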
File data is made available to applications via a pre-allocated main memory region: the buffer cache.
The file system transfers data between the buffer cache and the disk at the granularity of disk blocks.
The data is explicitly copied between the buffer cache and the user application's (process's) address space.
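A small C sketch of this copy-based path, assuming a placeholder file name data.bin: each read() moves bytes that the kernel has staged in the buffer cache into the caller's own buffer.

```c
/* Classic read()-based I/O: the kernel stages disk blocks in the buffer
 * cache, then copies the requested bytes into the user-supplied buffer.
 * Sketch; "data.bin" is a placeholder path, error handling is minimal. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];                       /* user-space buffer (the copy target) */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* buf now holds bytes copied out of the kernel buffer cache */
        fwrite(buf, 1, (size_t)n, stdout);
    }
    close(fd);
    return 0;
}
```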
A file (or a portion thereof) is mapped into a contiguous region of the process's virtual memory (see the mmap sketch after the list below).
Advantages:
reduced copying
no need for a pre-allocated buffer cache in main memory
Disadvantages:
less or no control over when the data actually reaches the disk: the file data remains volatile until written back
a mapped area must fit in the virtual address space
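A minimal C sketch of the mapped alternative using the POSIX mmap() call, again with data.bin as a placeholder path; the file contents are accessed directly through the mapping, without read()/write() copies:

```c
/* Memory-mapped file I/O: the file appears directly in the process's
 * virtual address space. Sketch; "data.bin" is a placeholder path. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* The whole mapped area must fit in the virtual address space. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    fwrite(p, 1, (size_t)st.st_size, stdout);   /* access file bytes directly */

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}
```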
Read/Write Mapping
[Figure: kernel main memory with the buffer cache holding disk blocks of Files A, B, and C]
[Figure: a read of File C from byte offset 1324 (= 1024 + 300) to offset 3172 transfers 1848 bytes (= 724 + 1024 + 100) from three cached blocks through a buf pointer]
[Figure: the same access when File C contains an unallocated region within the requested range]
All disk I/O goes through the buffer cache. Both data and metadata (e.g., i-nodes, directories) are cached using LRU replacement.
A dirty (modified) marker indicates whether write-back is needed for a data block.
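A toy C sketch of a buffer-cache entry with a dirty flag and LRU links; the structure, block size, and helper below are illustrative only, real kernels keep far more state:

```c
/* Toy buffer-cache entry: block number, dirty flag, LRU list hooks.
 * Illustrative sketch, not a real kernel structure. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096

struct buf {
    uint64_t    blkno;                  /* disk block number being cached     */
    bool        dirty;                  /* write-back needed before eviction? */
    struct buf *lru_prev, *lru_next;    /* links for LRU replacement list     */
    char        data[BLOCK_SIZE];
};

/* Modify a cached block; the actual disk write is deferred (write-back). */
static void buf_write(struct buf *b, const void *src, size_t off, size_t len)
{
    if (off + len > BLOCK_SIZE)
        return;                         /* out of range for this sketch */
    memcpy(b->data + off, src, len);
    b->dirty = true;                    /* mark for later write-back */
}
```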
Advantages:
- Hides disk-access details from the user program: block size, memory alignment, memory allocation in multiples of the block size, etc.
- Disk blocks are cached
- Block aggregation for small transfers (locality)
- Block re-use across processes
- Transient data might never be written to disk
Disadvantages:
- Extra copying: Disk->buffer cache->user space
- Vulnerability to failures
- Loss of cached user data blocks is the lesser concern
- The control data blocks (metadata) are the real problem:
i-nodes, pointer blocks, and directories can be in the cache when a failure occurs.
As a result, the file system's internal state might be corrupted.
fsck is then required, resulting in long (re-)boot times.
File system data consists of file control data (metadata) and user data.
Failures can cause data loss and corruption for cached metadata or user data.
A power failure during a sector write may physically corrupt the data stored in that sector.
Loss or corruption of metadata can lead to a much more massive loss of user data.
File systems must care about the metadata more than about the user data:
the operating system takes care of the file system data (e.g., metadata);
users must take care of their data themselves (e.g., backups).
Solutions:
write-through: writes go to disk immediately, bypassing the cache
write-back: dirty blocks are written asynchronously (by background flusher processes)
Metadata writes use the write-through policy: updates are written to disk immediately, bypassing the cache.
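As a user-space analogy of the two policies (not the kernel's internal metadata handling): opening with O_SYNC gives write-through behavior, while a plain write() followed by a later fsync() resembles write-back with an explicit flush. The path log.txt is a placeholder.

```c
/* Write-through vs. write-back as seen from user space.
 * O_SYNC makes each write() reach stable storage before returning;
 * a plain write() is cached and only forced out by fsync().
 * Sketch; "log.txt" is a placeholder path. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Write-through style: every write is synchronous. */
    int fd_sync = open("log.txt", O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd_sync < 0) { perror("open O_SYNC"); return 1; }
    if (write(fd_sync, "critical record\n", 16) != 16)
        perror("write O_SYNC");
    close(fd_sync);

    /* Write-back style: writes are buffered, flushed explicitly. */
    int fd = open("log.txt", O_WRONLY | O_APPEND);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, "bulk data\n", 10) != 10)
        perror("write");
    fsync(fd);                  /* force dirty blocks to disk now */
    close(fd);
    return 0;
}
```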
Problem:
- Some data is not written in place, so the system can go back to the last consistent version.
- Some data is replicated, e.g., the UNIX superblock.
- The file system goes through a consistency check/repair cycle at boot time, as specified in the /etc/fstab options (see the manpages for fsck and fstab).
- Write-through negatively affects performance.
Solution: maintain a sequential log of metadata updates, a journal, e.g., IBM's Journaled File System (JFS) in AIX.
A cursor (pointer) into the journal is maintained. The cursor is advanced once the updated blocks associated with a transaction have been written to disk (hardened). Hardened transaction records can be deleted from the journal.
Upon recovery: Re-do all the operations starting from the last cursor position.
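A self-contained C sketch of redo recovery over such a journal; the record layout and the in-memory journal below are hypothetical, real journals (e.g., JFS) define their own on-disk formats:

```c
/* Redo recovery sketch over a toy in-memory journal: replay every record
 * from the last hardened cursor position forward. Hypothetical format. */
#include <stdio.h>
#include <stdint.h>

struct journal_record {
    uint64_t    txid;           /* transaction this update belongs to */
    uint64_t    target_blkno;   /* metadata block being updated       */
    const char *description;    /* stand-in for the actual payload    */
};

/* Toy journal: records before the cursor are already hardened on disk. */
static const struct journal_record journal[] = {
    { 1, 100, "alloc i-node 17"       },
    { 1, 205, "add directory entry"   },
    { 2, 100, "update i-node 17 size" },
    { 2, 310, "alloc data block"      },
};

int main(void)
{
    size_t cursor = 2;   /* everything before this index is hardened */
    size_t nrec = sizeof journal / sizeof journal[0];

    /* Re-do every logged update from the cursor onward; replaying an
     * already-applied update is safe as long as updates are idempotent. */
    for (size_t i = cursor; i < nrec; i++)
        printf("redo tx %llu: block %llu (%s)\n",
               (unsigned long long)journal[i].txid,
               (unsigned long long)journal[i].target_blkno,
               journal[i].description);
    return 0;
}
```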
Advantages:
Asynchronous metadata writes
Fast recovery: recovery time depends on the journal size, not on the file-system size
Disadvantages:
extra writes (each metadata update is written both to the journal and in place)
space consumed by the journal (insignificant)