Channel: keeping simple

More on Fischer, Lynch, Paterson and the parrot theorem.


I’m thinking about distributed consensus algorithms, timestamping, and databases, and I keep seeing references to the Fischer, Lynch, Paterson “theorem”. Here is the problem statement:

The problem is for all the data manager processes that have participated in the processing of a particular transaction to agree on whether to install the transaction’s results in the database or to discard them. The latter action might be necessary, for example, if some data managers were, for any reason, unable to carry out the required transaction processing. Whatever decision is made, all data managers must make the same decision in order to preserve the consistency of the database.

We have a set of data manager processes that must come to a consensus about whether to commit or to discard. The problem statement requires that ALL of the processes agree on a result. The processes communicate in some way: they can send each other information or request information from each other. Implicitly, if a process fails, the others can ignore its opinion. But there is no way for any process to detect whether another process has failed or whether it is just thinking about its response to a request, because there is no upper bound on how long a non-failed process can take to send a response (maybe it’s just very busy working on some other problem). A single failed process is then an insurmountable problem because it is impossible to tell whether it is dead or just resting.
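To make the Dead-or-Resting point concrete, here is a minimal sketch in Python. It is not taken from the paper; the names `observe` and `indistinguishable` and the toy model of a "reply delay" are my own assumptions for illustration. It only shows that, however long an observer waits, some live-but-slow process looks exactly like a crashed one.

```python
from typing import Optional


def observe(reply_delay: Optional[float], waited: float) -> bool:
    """Return True if a reply has arrived after waiting `waited` time units.

    Hypothetical model: `reply_delay` is the time the remote process takes
    to answer, or None if the process has crashed and will never answer.
    """
    return reply_delay is not None and reply_delay <= waited


def indistinguishable(waited: float) -> bool:
    """Check that a dead process and a sufficiently slow live process
    give the observer the same view after `waited` time units."""
    dead = None
    # With no upper bound on delay, a live process may always be slower
    # than whatever finite time we chose to wait.
    slow = waited + 1.0
    return observe(dead, waited) == observe(slow, waited)


if __name__ == "__main__":
    for waited in (1.0, 60.0, 1e9):
        print(waited, indistinguishable(waited))
```

Whatever value of `waited` is chosen, the observer sees "no reply yet" in both cases, which is all the argument in the paragraph above relies on.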

In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assume that the message system is reliable: it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement.

I cannot imagine what was ever surprising about this result. The problem statement says: you cannot distinguish between Dead and Resting. And the “surprising result” is that you cannot distinguish between Dead and Resting. Surely there is something more here?

We also assume that processes do not have access to synchronized clocks, so algorithms based on time-outs, for example, cannot be used. (In particular, the solutions in [6] are not applicable.) Finally, we do not postulate the ability to detect the death of a process, so it is impossible for one process to tell whether another has died (stopped entirely) or is just running very slowly.

There is no way to tell whether a process is Dead or Resting (running very slowly). Therefore, no algorithm can determine the set of live processes. There is, then, no algorithm for obtaining a consensus among all live processes. QED. Maybe someone can enlighten me on what I missed.

 

 

