Distributed Consensus Algorithms
The concept of consensus is an engrossing topic in a decentralized or distributed network. Consensus means a procedure to arrive at a common agreement in a decentralized or distributed multi-agent platform. In a conventional distributed system, we apply consensus to ensure reliability which ensures correct execution in the presence of faulty individuals and fault tolerance. In addition to this, a distributed consensus mechanism should satisfy certain properties such as termination, validation, integrity and agreement. An example of consensus is state machine replication, which is a key aspect of any distributed consensus protocol. For example, if we want to run some kind of distributed protocol over a network, every individual entity runs the current protocol and they store the state of the protocol in different state machines. So the entire execution part of the protocol can be represented as a state machine. Now this state machine needs to be replicated to multiple entities so that every individual entity can reach a common output of the protocol. Achieving consensus can be easy and straightforward for certain architectures under certain scenarios. The scenarios could be either that the entire system is faultless or that there is not be any failure in the system, so that every entity can receive the message [19] correctly or the system behaves in a synchronous way, i.e., it is expected that you will receive all messages within some predefined time interval.
However, achieving consensus can be non-trivial in the case of a distributed environment [ 13] due to the presence of multiple types of failure. Typically in a distributed system, we consider three different types of failure:
- 1) Node Failure: A node suddenly crashes or becomes unavailable in the middle of communication. Therefore we are not expected to receive any message from that particular node. This can be a hardware or software fault.
- 2) Partitioned Faults: This type of fault occurs whenever links fail, which results in partition in the network. This can hamper reaching of consensus.
- 3) Byzantine Faults: This kind of fault is more difficult to handle in a distributed environment. Here the entity starts behaving maliciously. In both the above faults, we can expect an effect on the network, but in this case, it is difficult to guess because it completely depends on how maliciously the entity is behaving and what the entity is doing.
The correctness of a distributed consensus protocol can be characterized by the following two properties; safety and liveliness. Safety ensures that one will never converge to an incorrect state. And liveliness ensures every correct value must be accepted eventually.
Consensus In Permissionless Blockchain System
Conventional consensus mechanisms will not be applicable in an open or permissionless blockchain system. Therefore the following section shows how consensus has been achieved in bitcoin-like open environments along with their shortcomings.
Bitcoin Consensus
The main purpose of consensus in bitcoin is to add a new block to the existing blockchain. There can be multiple miners in the bitcoin network and these miners can propose their new blocks based on the transactions they have heard of. It is not necessary that every miner propose the same block. Generally the miners include those transactions in the new block that they have heard of since the last time a block has been added. The miners also need to ensure that size of the newly proposed block does not exceed a certain threshold. We should focus on the following two observations before designing the algorithm:
Any valid block (a block with all valid transactions) can be accepted even if it is proposed by only one miner.
The entire protocol can work in rounds. Broadcast the accepted block to the peers and collect the next set of transactions.
Based on the above two observations, we can design solutions so that every miner independently tries to solve a problem and the block is accepted for the miner who can prove first that the challenge has been solved.
Proof of Work (PoW)
The idea of PoW came in the year 1992 from Dwork and Naor to combat junk emails where we have to do some work to send a valid email. The attacker this way will be discouraged from sending junk emails because in that case they have to do some work of some complexity in order to forward the junk emails that does not prove beneficial for them. A blockchain-based PoW system must have certain features such as asymmetry: The task must be relatively hard but feasible for the service requester; the task must be easy to verify for the service provider. In this way the service requester will get discouraged from carrying out the work but the service provider can easily check the validity of the work because of the asymmetric nature of the work. Bitcoin-based PoW systems extend the hashcash-based PoW system and develop a methodology to protect the blockchain by applying the distributed consensus mechanism. The hashcash system was proposed by Adam Back and uses the puzzle-friendliness property of the cryptographic hash function. Now coming to bitcoin-based PoW, miners who take part in the consensus process need to give a proof that they have done some work before proposing a new block. The attacker will be discouraged from proposing a new block or making change in existing blocks. Because in that case they have to do the entire work of the blockchain which is computationally difficult in a generic environment. Every miner will try to find out a nonce value which will satisfy a certain hash equation. Most implementations of bitcoin PoW use a double SHA256 hash function. In the default set up, all miners wait for around ten minutes and look for all transactions which have taken place within that duration. Then they start the mining process. As we have seen, the probability of getting a PoW is very low; hence it will never happen that any miner will be able to control the bitcoin network exclusively. In case any attacker wants to make some changes in one block, they have to do more work compared to the collective work of all the blocks in the longest chain, which is computationally difficult, though not impossible, thus making an attack difficult with current hardware and hence the bitcoin system is tamper-proof. These tamperproof characteristics of PoW also ensure that double spending does not happen in the case of a blockchain network. Apart from this, PoW-based systems suffer from a number of security attacks such as sybil attack, DOS attack, etc. In the case of a sybil attack, the attacker tries to fill the network with clients under its control. This helps the attacker to control the entire network. These clients work according to the instructions of the attacker which compromises the PoW mechanism. In DOS, the malicious node will send a lot of data to a particular node or nodes which will hamper the normal functioning of the bitcoin transactions. Another major flaw in a bitcoin system is the monopoly problem. This happens when a particular miner gains control of the network by deploying huge servers for the mining process. So it may happen that this particular miner will gradually generate a huge number of blocks and hence the miner can control the entire flow of transactions. As a result, other nodes will get discouraged from joining as miners and only a few miners with large computing resources [10] will control the network.
Proof of Stake (PoS)
The PoS mechanism is an improved version of PoW to reduce the excessive electricity consumption of PoW-based systems. As a substitute to the computationally expensive PoW mechanism, PoS aims to stake nodes’ economic share in the network. Like PoW, a node (miner) is selected to add a block to the blockchain. But here the miner selection procedure is different from PoW. In PoS, the selection of a miner is proportional to the amount of bitcoin it holds. Block finality in the PoS-based system is faster compared to PoW blockchain, as computationally difficult puzzle solving is not within PoS. The PoS-based algorithm developed by Ethereum is known as Casper. The PoS algorithm pseudo-randomly chooses miners to create blocks of the blockchain, so that no miner can predict its turn in advance. It also solves the monopoly problem of PoW-based consensus. Naive PoS algorithms are prone to an attack known as Nothing-at-Stake, and thus require some improvements in order to provide safety.
Consensus in Permissioned Blockchain System
Permissioned blockchain consensus is achieved through the help of a smart contract, which is basically an extension of bitcoin scripts. Smart contracts are self-executing software programs in which the terms and conditions among the negotiating parties are written down. Generally, the concept of state machine replication mechanism is used to ensure consensus in the permissioned blockchain environment because of the following reasons such as the network being closed and the nodes knowing each other, so state replication is possible among the known nodes. And it avoids the overhead of mining.
Paxos
The first ever consensus algorithm, Paxos, was proposed by Lamport with the objective of choosing a single value under crash or network fault. The idea behind Paxos is very simple: All the nodes in the network have been categorized into three types; namely the proposers, the acceptors and the learners. Everyone is a learner in the network that learns the consensus value. The proposer initially prepares a proposal with a proposal number known as a prepare message and sends it to the acceptors. This proposal number forms a timeline and the biggest number is considered up to date. Each acceptor compares received proposal number with current known values for each proposer’s proposal message. If it gets a higher number, then it accepts the proposal or otherwise declines it. Then the acceptor prepares a response message of the following form: Prepare response where the proposal number is the biggest number the acceptor has seen and accepted values are the already accepted values from another proposer. Next a vote is taken based on the majority decision. The proposer checks whether the majority of the acceptors have rejected the proposal. If yes, then the proposer updates it with the latest proposal number. If no, then the proposer further checks whether the majority of the acceptors have already accepted values. If yes then the proposer’s value cannot be selected or otherwise it sends an accept message. Finally, the proposer sends an accept message with the following format to all the acceptors. Whenever the acceptor accepts a value, it informs the learner nodes about it so that everyone will learn about the accepted value. If more than (Nonce/2 - 1) acceptors fail, then no proposer will get enough reply messages to form a conclusion and we cannot reach consensus. Even though the concept behind Paxos is very simple to understand, the theoretical proof is very complicated. Real-life application of this may need to go for a sequence of selections which is known as a multi-Paxos system. Multi-Paxos systems need a repeated application of Paxos which increases the system complexity. This is because the number of messages exchanged between the nodes also increases. There is also a chance of livelock or starvation in Paxos-based systems.