
Distributed Systems

Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science


Room R4.20, steen@cs.vu.nl

Chapter 11: Distributed File Systems


Version: December 4, 2011


Contents

Chapter
01: Introduction
02: Architectures
03: Processes
04: Communication
05: Naming
06: Synchronization
07: Consistency & Replication
08: Fault Tolerance
09: Security
10: Distributed Object-Based Systems
11: Distributed File Systems
12: Distributed Web-Based Systems
13: Distributed Coordination-Based Systems


11.1 Architecture

Distributed File Systems

General goal
Try to make a file system transparently available to remote clients.

[Figure: two models for accessing remote files. Remote access model: requests from the client to access the remote file are handled at the server; the file stays on the server. Upload/download model: (1) the file is moved to the client, (2) accesses are done on the client, (3) when the client is done, the file is returned to the server.]


Example: NFS Architecture


NFS
NFS is implemented using the Virtual File System (VFS) abstraction, which is now used in many different operating systems.

[Figure: NFS architecture. On both client and server, a system call layer sits on top of a virtual file system (VFS) layer. On the client, the VFS layer dispatches either to the local file system interface or to the NFS client, whose RPC client stub communicates across the network with the server's RPC server stub; the NFS server then accesses the server's local file system interface through its VFS layer.]



Example: NFS Architecture

Essence
VFS provides a standard file system interface and hides the difference between accessing a local and a remote file system.

Question
Is NFS actually a file system?



NFS File Operations


Operation  v3   v4   Description
Create     Yes  No   Create a regular file
Create     No   Yes  Create a nonregular file
Link       Yes  Yes  Create a hard link to a file
Symlink    Yes  No   Create a symbolic link to a file
Mkdir      Yes  No   Create a subdirectory
Mknod      Yes  No   Create a special file
Rename     Yes  Yes  Change the name of a file
Remove     Yes  Yes  Remove a file from a file system
Rmdir      Yes  No   Remove an empty subdirectory
Open       No   Yes  Open a file
Close      No   Yes  Close a file
Lookup     Yes  Yes  Look up a file by means of a name
Readdir    Yes  Yes  Read the entries in a directory
Readlink   Yes  Yes  Read the path name in a symbolic link
Getattr    Yes  Yes  Get the attribute values for a file
Setattr    Yes  Yes  Set one or more file-attribute values
Read       Yes  Yes  Read the data contained in a file
Write      Yes  Yes  Write data to a file

Cluster-Based File Systems


Observation
With very large data collections, a simple client-server approach is not going to speed up file accesses ⇒ apply striping techniques by which files can be fetched in parallel (a sketch follows the figure).

[Figure: whole-file distribution versus a file-striped system. With whole-file distribution, every server stores complete copies of files a–e; in a file-striped system, the blocks of each file are spread across the servers, so that the blocks of a file can be fetched in parallel.]
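A minimal sketch of reading a striped file, with in-memory dicts standing in for the servers; the layout table and fetch_block helper are illustrative, not part of any real system. The point is that all blocks of a file can be fetched concurrently and reassembled in order:

```python
import concurrent.futures

# Hypothetical servers: server index -> {block index -> block data}.
# In a real striped file system these lookups would be network fetches.
SERVERS = [
    {0: b"aaaa", 3: b"dddd"},
    {1: b"bbbb", 4: b"eeee"},
    {2: b"cccc"},
]

def fetch_block(server_id: int, block_idx: int) -> bytes:
    """Fetch one block from one server (stands in for an RPC)."""
    return SERVERS[server_id][block_idx]

def read_striped_file(layout: dict[int, int], num_blocks: int) -> bytes:
    """Fetch all blocks of a file in parallel and reassemble them in order."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {idx: pool.submit(fetch_block, layout[idx], idx)
                   for idx in range(num_blocks)}
        return b"".join(futures[idx].result() for idx in range(num_blocks))

# Block index -> server holding it (round-robin striping).
layout = {0: 0, 1: 1, 2: 2, 3: 0, 4: 1}
print(read_striped_file(layout, 5))  # b"aaaabbbbccccddddeeee"
```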


Example: Google File System


[Figure: Google File System architecture. A GFS client sends (file name, chunk index) to the master and gets back a contact address; it then sends (chunk ID, byte range) to a chunk server and receives the chunk data. The master exchanges instructions and chunk-server state with the chunk servers, each of which stores its chunks in a local Linux file system.]

The Google solution
Divide files into large 64 MB chunks, and distribute/replicate chunks across many servers:
The master maintains only a (file name, chunk server) table in main memory ⇒ minimal I/O.
Files are replicated using a primary-backup scheme; the master is kept out of the loop (see the sketch below).
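A minimal sketch of the master's role, assuming a plain dict as the in-memory table; the Master class and its lookup method are illustrative names, not the real GFS interface. The master only translates (file name, chunk index) pairs into chunk-server addresses; chunk data never flows through it:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses large 64 MB chunks

class Master:
    def __init__(self):
        # (file name, chunk index) -> chunk-server addresses; by convention
        # in this sketch, the first address is the primary replica.
        self.chunk_table = {}

    def lookup(self, file_name: str, offset: int) -> list[str]:
        """Translate (file name, byte offset) into chunk-server addresses.
        The client then talks to the chunk servers directly; the master
        stays out of the data path."""
        chunk_index = offset // CHUNK_SIZE
        return self.chunk_table[(file_name, chunk_index)]

master = Master()
master.chunk_table[("/logs/web.log", 0)] = ["cs1:7000", "cs2:7000", "cs3:7000"]
print(master.lookup("/logs/web.log", 10_000_000))  # offset falls in chunk 0
```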


P2P-based File Systems


[Figure: the Ivy P2P file system as a layered stack on every node: a file system layer (Ivy) on top of block-oriented storage (DHash) on top of a DHT layer (Chord), connected through the network; one of the nodes is where a file system is rooted.]

Basic idea
Store data blocks in the underlying P2P system (a sketch follows):
Every data block with content D is stored on a node with hash h(D); this allows for an integrity check.
Public-key blocks are signed with the associated private key and looked up with the public key.
A local log of file operations keeps track of ⟨blockID, h(D)⟩ pairs.
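A small sketch of content-addressed block storage in this style, with a plain dict standing in for the Chord/DHash DHT and SHA-1 as the content hash; put_block and get_block are hypothetical helpers, not Ivy's API. Because the key is the hash of the content, any node can check a fetched block's integrity:

```python
import hashlib

dht = {}  # stands in for the distributed hash table

def put_block(data: bytes) -> str:
    """Store a data block under the hash of its content."""
    key = hashlib.sha1(data).hexdigest()
    dht[key] = data
    return key

def get_block(key: str) -> bytes:
    """Fetch a block and verify its integrity against the key."""
    data = dht[key]
    if hashlib.sha1(data).hexdigest() != key:
        raise ValueError("block content does not match its hash")
    return data

block_id = put_block(b"hello, Ivy")
assert get_block(block_id) == b"hello, Ivy"
```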

11.5 Synchronization

File sharing semantics


Problem
When dealing with distributed file systems, we need to take into account the ordering of concurrent read/write operations and the expected semantics (i.e., consistency).

[Figure: (a) On a single machine, when process B reads file "ab" after process A has appended "c", B's read returns "abc". (b) In a distributed file system with local copies, process A on client machine #1 reads "ab" and appends "c" to its local copy, while the file server still holds the original file "ab"; a subsequent read by process B on client machine #2 still returns "ab".]


File sharing semantics

Semantics
UNIX semantics: a read operation returns the effect of the last write operation ⇒ can only be implemented for remote access models in which there is only a single copy of the file.
Transaction semantics: the file system supports transactions on a single file ⇒ the issue is how to allow concurrent access to a physically distributed file.
Session semantics: the effects of read and write operations are seen only by the client that has opened (a local copy of) the file ⇒ what happens when a file is closed (only one client may actually win)? See the sketch below.
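A toy illustration of session semantics, with an in-memory "server" and an illustrative Session class (nothing here is a real file system API): each session works on its own copy, and closing simply overwrites the server's file, so the last close wins.

```python
server = {"f": "ab"}  # server-side file contents

class Session:
    def __init__(self, name: str):
        self.name = name
        self.copy = server[name]      # open: take a private local copy

    def write(self, data: str):
        self.copy += data             # visible only within this session

    def close(self):
        server[self.name] = self.copy # whole copy replaces server's file

s1, s2 = Session("f"), Session("f")   # two concurrent sessions on "f"
s1.write("c"); s1.close()             # server now holds "abc"
s2.write("d"); s2.close()             # s2's close wins; "c" is lost
print(server["f"])                    # -> "abd"
```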



Example: File sharing in Coda

Essence
Coda assumes transactional semantics, but without the full-fledged capabilities of real transactions. Note: transactional issues reappear in the form of "this ordering could have taken place."

[Figure: transactional behavior in sharing files in Coda. One client opens file f for reading (session SA) and receives a copy from the server. A second client then opens f for writing (session SB) and, on close, ships the updated file back, causing the server to send an invalidate to the first client; the first client nevertheless finishes session SA on its own copy.]
11.6 Consistency and Replication

Consistency and replication

Observation
In modern distributed file systems, client-side caching is the preferred
technique for attaining performance; server-side replication is done for fault
tolerance.

Observation
Clients are allowed to keep (large parts of) a file, and will be notified when control is withdrawn ⇒ servers are now generally stateful.
[Figure: file delegation. (1) The client asks for a file; (2) the server delegates the (old) file to the client, which then performs its accesses on a local copy; (3) the server recalls the delegation; (4) the client returns the updated file.]
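A minimal sketch of the delegation/recall idea, not NFSv4's actual protocol; all class and method names are illustrative, and the recall trigger (a second client opening the file) is an assumption for the example. The server is stateful: it remembers which client holds each file so it can recall the delegation before handing the file to someone else.

```python
class Server:
    def __init__(self):
        self.files = {"f": b"old contents"}
        self.delegations = {}             # file name -> client holding it

    def open(self, client, name):
        holder = self.delegations.get(name)
        if holder is not None and holder is not client:
            # Recall: the holder must return its (possibly updated) copy.
            self.files[name] = holder.return_file(name)
        self.delegations[name] = client   # delegate to the new client
        return self.files[name]

class Client:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def open(self, name):
        self.cache[name] = self.server.open(self, name)

    def write(self, name, data):
        self.cache[name] = data           # local update; server not contacted

    def return_file(self, name):
        return self.cache.pop(name)       # invoked on delegation recall

srv = Server()
a, b = Client(srv), Client(srv)
a.open("f"); a.write("f", b"new contents")
b.open("f")                               # server recalls a's delegation first
print(b.cache["f"])                       # b sees a's update
```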



Example: Client-side caching in Coda

[Figure: client-side caching in Coda. Client A opens session SA (Open(RD)) and the server transfers file f. Client B then runs session SB (Open(WR) ... Close), shipping a modified f back to the server, which sends client A an invalidate (callback break). A's next Open(RD) therefore requires transferring f again, whereas B's next Open(WR) is answered with "OK (no file transfer)" because B's cached copy is still valid.]

Note
By making use of transactional semantics, it becomes possible to
further improve performance.



Example: Server-side replication in Coda

[Figure: two clients with a different AVSG for the same replicated file. The file is stored at servers S1, S2, and S3; the network is broken such that client A can reach only {S1, S2} and client B only {S3}.]

Main issue
Ensure that concurrent updates are detected:
Each client has an Accessible Volume Storage Group (AVSG): the subset of the actual VSG that the client can reach.
Version vector: CVVi(f)[j] = k means that server Si knows that server Sj has seen version k of f.
Example: A updates f ⇒ S1 = S2 = [+1, +1, +0]; B updates f ⇒ S3 = [+0, +0, +1] (see the sketch below).
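The example can be replayed with a small version-vector sketch; the update and compare helpers are illustrative, not Coda's code. An update increments only the entries for the servers in the client's AVSG, so updates made in different partitions produce incomparable vectors, which is exactly how the conflict is detected on repair:

```python
def update(cvv, avsg):
    """Commit an update at the servers in the client's AVSG."""
    return [v + 1 if i in avsg else v for i, v in enumerate(cvv)]

def compare(v1, v2):
    """Compare two version vectors: one dominates, or updates conflict."""
    if all(x <= y for x, y in zip(v1, v2)):
        return "second dominates"
    if all(x >= y for x, y in zip(v1, v2)):
        return "first dominates"
    return "conflict: concurrent updates detected"

# VSG = {S1, S2, S3}; the network partitions into {S1, S2} and {S3}.
cvv0 = [0, 0, 0]
at_s1_s2 = update(cvv0, {0, 1})   # A updates f -> [1, 1, 0] at S1 and S2
at_s3    = update(cvv0, {2})      # B updates f -> [0, 0, 1] at S3
print(compare(at_s1_s2, at_s3))   # conflict: concurrent updates detected
```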

11.7 Fault Tolerance

High availability in P2P systems

Problem
There are many fully decentralized file-sharing systems, but because churn is high (i.e., nodes come and go all the time), we may face an availability problem ⇒ replicate files all over the place (replication factor: r_rep).

Alternative
Apply erasure coding:
Partition a file F into m fragments, and recode them into a collection of n > m fragments.
Property: any m fragments from the collection are sufficient to reconstruct F.
Replication factor: r_ec = n/m.


Replication vs. erasure coding

Comparison
With an average node availability a and a required file unavailability ε, we have for erasure coding

$1 - \varepsilon = \sum_{i=m}^{r_{ec}\,m} \binom{r_{ec}\,m}{i}\, a^{i} (1 - a)^{r_{ec}\,m - i}$

and for file replication

$1 - \varepsilon = 1 - (1 - a)^{r_{rep}}$

[Figure: the replication factors r_rep and r_ec required to attain the same unavailability ε, plotted against node availability a.]
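Both formulas are easy to evaluate numerically. The sketch below does so with Python's math.comb; the function names and the example parameters (a = 0.5, m = 5, r_ec = 3) are ours, chosen only for illustration:

```python
from math import comb

def avail_replication(a: float, r_rep: int) -> float:
    """1 - epsilon for replication: the file is unavailable only when
    all r_rep replicas are down."""
    return 1 - (1 - a) ** r_rep

def avail_erasure(a: float, m: int, r_ec: float) -> float:
    """1 - epsilon for erasure coding: the file is available when at
    least m of the n = r_ec * m fragments are reachable."""
    n = round(r_ec * m)
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i)
               for i in range(m, n + 1))

print(avail_replication(0.5, r_rep=3))    # 1 - 0.5**3 = 0.875
print(avail_erasure(0.5, m=5, r_ec=3.0))  # any 5 of 15 fragments suffice
```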

