
Byzantine fault-tolerance

Overview
Models
Synchronous vs. asynchronous systems
Byzantine failure model
Secure storage with self-certifying data
Byzantine quorums
Byzantine state machines

Models
Synchronous system: bounded message delays (implies a reliable network!)
Asynchronous system: message delays are unbounded

In practice (Internet): reasonable to assume that network failures are eventually fixed (weak synchrony assumption).
Data and services (state machines) can be replicated on a set of nodes R.
Each node in R fails independently, with identical probability (i.i.d. failures)
Can specify a bound f on the number of nodes that can fail simultaneously
Byzantine failures
no assumption about the nature of the fault
failed nodes can behave in arbitrary ways
may act as an intelligent adversary (compromised node), with full knowledge of the protocols
failed nodes may collude

Byzantine quorums
Data is not self-certifying (multiple writers without shared keys)
Idea: replicate the data on a sufficient number of replicas (relative to f) to be able to rely on a majority vote
Byzantine quorums: r/w variable
Representative problem: implement a read/write variable

Assuming no concurrent reads or writes, for now
Assuming trusted clients, for now
Byzantine quorums: r/w variable
How many replicas do we need?
clearly, we need at least 2f+1, so that we have a majority of good nodes
write(x): send x to all replicas, wait for acknowledgments (must get at least f+1)
read(x): request x from all replicas, wait for responses, take majority vote (if there are no concurrent writes, must get f+1 identical votes!)
Byzantine quorums: r/w variable
Does this work? Yes, but only if
the system is synchronous (bounded msg delay)
faulty nodes cannot forge messages (messages are authenticated!)
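
A minimal sketch of this 2f+1 scheme in Python, assuming a synchronous network (every correct replica answers within the timeout) and authenticated messages; send(r, x) and request(r) are hypothetical transport stubs, not part of any real library:

from collections import Counter

def write(replicas, f, x, send):
    # send(r, x) is an assumed stub: True on ack, False on timeout.
    # In a synchronous system, all >= f+1 good replicas ack in time.
    acks = sum(1 for r in replicas if send(r, x))
    assert acks >= f + 1, "write quorum not reached"

def read(replicas, f, request):
    # request(r) is an assumed stub: r's value, or None on timeout.
    # With no concurrent writes, the >= f+1 good replicas all return
    # the same value, outvoting the <= f faulty ones.
    responses = [request(r) for r in replicas]
    votes = Counter(v for v in responses if v is not None)
    value, count = votes.most_common(1)[0]
    assert count >= f + 1, "no value reached f+1 identical votes"
    return value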


Byzantine quorums: r/w variable
Now, assume
weak synchrony (network failures are fixed eventually)
messages are authenticated (e.g., signed with the sender's private key)
Byzantine quorums: r/w variable
Let's try 3f+1 replicas (the known lower bound)
write(x): send x to all replicas, wait for 2f+1 responses (must have at least f+1 good replicas with the correct value)
read(x): request x from all replicas, wait for 2f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? no: it is possible that the f nodes that did not respond were good nodes!)
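
To make the failure concrete, here is a worked scenario for f = 1, n = 4; the replica names and vote lists are purely illustrative:

from collections import Counter

# f = 1, n = 3f+1 = 4 replicas: A, B, C are good, D is faulty.
# write(x_new) waits for 2f+1 = 3 acks, say from A, B, and D; only the
# good repliers A and B store x_new, and C still holds x_old.
# read waits for 2f+1 = 3 responses; suppose the good replica A is slow,
# so the reader hears from B, C, and D.
votes = ["x_new",   # B: good, stored the write
         "x_old",   # C: good, but missed the write quorum
         "x_old"]   # D: faulty, deliberately votes for the stale value
print(Counter(votes).most_common(1))   # [('x_old', 2)] -- wrong value wins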

Byzantine quorums: r/w variable
Let's try 4f+1 replicas
write(x): send x to all replicas, wait for 3f+1 responses (must have at least 2f+1 good replicas with the correct value)
read(x): request x from all replicas, wait for 3f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? no: it is possible that the f faulty nodes vote with the good nodes that still have an old value of x!)

Byzantine quorums: r/w variable
Let's try 5f+1 replicas
write(x): send x to all replicas, wait for 4f+1 responses (must have at least 3f+1 good replicas with the correct value)
read(x): request x from all replicas, wait for 4f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!)
Actually, one can use only 5f replicas if data is written with monotonically increasing timestamps

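The counting behind these three attempts can be checked mechanically. The sketch below is my own illustration (worst_case_tallies is a made-up helper) of the slides' worst case: the f faulty replicas all vote for the old value, and the f replicas the reader never hears from are good ones that hold the new value:

def worst_case_tallies(n, f):
    # The writer waited for n - f acks, so at least n - 2f good replicas
    # hold the new value; the reader waits for n - f responses.
    good_with_new = (n - f) - f               # write quorum minus f possibly-faulty acks
    good_with_old = (n - f) - good_with_new   # remaining good replicas are stale
    new_votes = good_with_new - f             # f good "new" replicas stay silent
    old_votes = good_with_old + f             # stale good replicas + all f faulty ones
    return new_votes, old_votes

f = 1
for n in (3 * f + 1, 4 * f + 1, 5 * f + 1):
    new, old = worst_case_tallies(n, f)
    print(f"n={n}: new={new}, old={old}, read safe: {new > old}")
# n=4: new=1, old=2, read safe: False
# n=5: new=2, old=2, read safe: False (a tie; the faulty nodes can swing it)
# n=6: new=3, old=2, read safe: True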
Byzantine quorums: r/w variable
Still relies on trusted clients
a malicious client could send different values to different replicas, or send a value to fewer than a full quorum
to fix this, we need a Byzantine agreement protocol among the replicas

Still doesn't handle concurrent accesses
Still doesn't handle group changes

Byzantine state machine
Byzantine Fault Tolerance (BFT):
Can implement any service that behaves like a deterministic state machine
Can tolerate malicious clients
Safe with concurrent requests
Requires 3f+1 replicas
5 rounds of messages
Byzantine state machine
Clients send requests to one replica (the primary)
Correct replicas execute all requests in the same order
An atomic multicast protocol among the replicas ensures that all replicas receive and execute all requests in the same order
Since all replicas start in the same state, correct replicas produce identical results
The client waits for f+1 identical results from different replicas
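
A minimal sketch of the client side of this scheme, assuming a hypothetical replies iterator that yields (replica_id, result) pairs as responses arrive; f+1 matching replies from distinct replicas must include at least one correct replica:

from collections import defaultdict

def await_result(replies, f):
    # Accept a result once f+1 distinct replicas report the same value.
    voters = defaultdict(set)              # result -> ids of replicas reporting it
    for replica_id, result in replies:
        voters[result].add(replica_id)
        if len(voters[result]) >= f + 1:   # at least one voter is correct
            return result
    raise RuntimeError("not enough matching replies")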
BFT protocol
BFT: Protocol overview
Client c sends m = <REQUEST, o, t, c>_σc to the primary. (o = operation, t = monotonic timestamp, σc = signature with c's private key)
The primary p assigns seq# n to m and sends <PRE-PREPARE, v, n, m>_σp to the other replicas. (v = current view, i.e., replica set)
If replica i accepts the message, it sends <PREPARE, v, n, d, i>_σi to the other replicas. (d is a hash of the request.) This signals that i agrees to assign n to m in v.

BFT: Protocol overview
Once replica i has a pre-prepare and 2f+1 matching prepare messages, it sends <COMMIT, v, n, d, i>_σi to the other replicas. At this point, correct replicas agree on an order of requests within a view.
Once replica i has 2f+1 matching prepare and commit messages, it executes m, then sends <REPLY, v, t, c, i, r>_σi to the client. (The need for this last step has to do with view changes.)
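
A hedged sketch of the quorum checks a replica might run over its message log; the Prepare/Commit records and the log structure are invented for illustration and do not follow the actual Castro-Liskov implementation:

from collections import namedtuple

Prepare = namedtuple("Prepare", "v n d i")
Commit = namedtuple("Commit", "v n d i")

def prepared(log, v, n, d, f):
    # True once the pre-prepare for (v, n, d) is backed by 2f+1 matching
    # prepares from distinct replicas: the order is fixed within view v.
    senders = {m.i for m in log
               if isinstance(m, Prepare) and (m.v, m.n, m.d) == (v, n, d)}
    return len(senders) >= 2 * f + 1

def committed(log, v, n, d, f):
    # True once 2f+1 matching prepares and 2f+1 matching commits are
    # logged; the replica may then execute the request and send REPLY.
    commits = {m.i for m in log
               if isinstance(m, Commit) and (m.v, m.n, m.d) == (v, n, d)}
    return prepared(log, v, n, d, f) and len(commits) >= 2 * f + 1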
BFT
More complexity is related to view changes and garbage collection of the message logs
Public-key crypto signatures are the bottleneck: a variation of the protocol uses symmetric crypto (MACs) to provide authenticated channels. (Not easy: MACs are less powerful, since they can't prove authenticity to a third party!)
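
To illustrate that last point, a small sketch using Python's standard hmac module (the key and message are placeholders): a MAC only convinces the holder of the shared key, so a replica cannot present a MAC'd message to another replica as proof of who sent it, whereas anyone can verify a public-key signature:

import hashlib
import hmac

key_ci = b"secret shared by client c and replica i"   # placeholder key
msg = b"<REQUEST, o, t, c>"                           # placeholder message

# The client authenticates msg to replica i under their shared key.
tag = hmac.new(key_ci, msg, hashlib.sha256).digest()

# Replica i can verify the tag...
assert hmac.compare_digest(tag, hmac.new(key_ci, msg, hashlib.sha256).digest())

# ...but a third replica j has no key_ci, and i itself could have forged
# the tag, so the tag proves nothing about the sender to anyone else.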
