Quantcast
Channel: keeping simple
Viewing all articles
Browse latest Browse all 195

Making Paxos face facts

$
0
0

The   Paxos Made Simple paper is notoriously hard to understand but at least part of the difficulty is that the algorithm  changes radically in the middle of the presentation.  The first part of the paper presents a subtle, confusing, method that is supposed to permit multiple processes (or network sites) to concurrently propose values to be considered for consensus.  Then, with little warning, a second algorithm limited to a single proposer or leader is substituted.

The presentation in the paper involves a set  of processes (or network sites) some of which are proposers, some acceptors, and some learners.  The proposers propose values to the acceptors.  When the acceptors agree on a proposed value, that value becomes the “chosen value”. The goal is to define an algorithm so learners can get reliable answers when they ask for the chosen value. That is, learners should only learn about chosen values that were actually agreed on by the acceptors and proposed by proposers and that value should be unique. But the algorithm that is  presented first is liable to what we used to call livelock even in the absence of failure:

It’s easy to construct a scenario in which two proposers each keep issuing a sequence of proposals with increasing numbers, none of which are ever chosen.

It’s argued that this original algorithm works in the sense that it never can reach an inconsistent state, but the livelock scenario means it can easily fail to progress even without failure or loss of messages. To avoid this scenario, on page seven of an eleven page paper, the algorithm is replaced by one where the set of  proposers has a single member – a distinguished proposer.

To guarantee progress, a distinguished proposer must be selected as the only one to try issuing proposals

Then, on page nine, when going over how Paxos can be used to build replicated state machines, Lamport writes:

In normal operation, a single server is elected to be the leader, which acts as the distinguished proposer (the only one that tries to issue proposals) in all instances of the consensus algorithm

Since much of the complexity of the first part of the paper involves describing how Paxos safely resolves contention between competing proposers, the modification is  significant.  The problem of reliable consensus with a single proposer is radically simpler and doesn’t need the clever algorithm described in the first part of the paper.  But one thing that makes Paxos so hard to understand is that the specification of Paxos is not fixed. 

The original problem statement and a brute force commit solution

The Made Simple paper begins with a problem statement:

Assume a collection of processes that can propose values. A consensus algorithm ensures that a single one among the proposed values is chosen. If no value is proposed, then no value should be chosen. If a value has been chosen, then processes should be able to learn the chosen value. The safety requirements for consensus are:

  • Only a value that has been proposed may be chosen,
  • Only a single value is chosen, and
  • A process never learns that a value has been chosen unless it actually has been

What, exactly do the bolded words mean?  “Chosen” and “Learn” are the two hard ones. “Proposed” is pretty clear: a process sends messages to all the other processes that says ” I , process A, propose value V with supporting documentation x“.

A proposer sends a proposed value to a set of acceptors. An acceptor may accept the proposed value. The value is chosen when a large enough set of acceptors have accepted it.

Proposal is a simple action: the proposer sends a proposal message.

There are the usual assumptions: messages may be lost or duplicated or delivered out of order, but are neither corrupted or spurious. That is: if a process receives a message from a second process, the second process must have previously transmitted that message.   A process can propose a value, but that proposal may never arrive at any of the other process or it may arrive at all of them or some of them. It makes sense that an accept operation is the transmit of an accept message by some process  that received the proposal message.  Presumably then, the value is chosen when the original proposer receives accept messages from a large enough set of the Acceptor processes  – a majority of the processes. Learning is also simple:

To learn that a value has been chosen, a learner must find out that a proposal has been accepted by a majority of acceptors.

Suppose that there is a distinguished proposer process D. Suppose that process sends value to all Acceptors and marks the value as “chosen” if it receives an accept response from a majority of Acceptors. Acceptors only send accept messages to the distinguished proposer if they receive a proposal from the distinguished proposer.  Let Learners ask the distinguished proposer for chosen values. To make the learning process more efficient, the distinguished proposer can notify Acceptors that the value has been chosen and they can answer Learners too. All properties are now satisfied:

  • Only a value that has been proposed may be chosen: only D proposes so all it has to do is to select a single value.
  • Only a single value is chosen: follows from the first.
  • A process never learns that a value has been chosen unless it actually has been: the process D knows reliably and can inform users and Acceptors that have received notification from D can also inform Learners reliably.

If there are no failures, there is nothing else to discuss. If processes can fail, there is still not much to discuss. As long as the distinguished proposer doesn’t fail, everything works. If the distinguished proposer and some  of the Acceptors fail before any Acceptor has accepted, a poll of Acceptors will reveal no accepted value – and we know that no Learner has been falsely told there is some consensus value. If some Learner was told about a chosen value and the leader and some minority of Acceptors fail,  since a majority of Acceptors must have accepted and a minority have failed, there is at least one Acceptor that knows the accepted value and it can be copied to all surviving Acceptors. There cannot be two different accepted values. If  no Learner was told about an accepted value, either no value was chosen or one was and nobody was informed. In either case, we can just copy an accepted value if any of the Acceptors has one, or start from a blank slate otherwise. No harm no foul.

Notice, we have not had to worry about reliable store. All we need is the absence of spurious or corrupted messages and the survival of at least 1/2 of the Acceptors. If we need a sequence of values, the distinguished proposer can just rerun the same process as needed and incorporate a sequence number in the values. The distinguished proposer is, of course, the single point of failure but an election mechanism can select new proposers.

(number of minor edits for clarity )


Viewing all articles
Browse latest Browse all 195

Trending Articles