A consensus algorithm that distributed systems must know: Raft-AI-php.cn

1. Overview of Raft

Raft algorithmis the first choice for distributed system development Consensus algorithm. For example, Etcd and Consul are popular now.

If you master this algorithm, you can easily handle the fault tolerance and of most scenarios. #ConsistencyRequirements. For example, distributed configuration systems, distributed NoSQL storage, etc. can easily break through the single-machine limitations of the system. The Raft algorithm achieves consensus on a series of values and consistency of logs of each node through all methods based on the leader.

2. Raft Role

2.1 Role

Follower ：

Ordinary people

, silently receive messages from the leader. When the leader’s heartbeat message times out, they will take the initiative to stand up and recommend themselves as candidates. Candidate (Candidate):

Candidate

will request voting RPC messages from other nodes to notify other nodes to vote if they win the majority Vote and be promoted to leader. Leader:

Overbearing President

, everything is subject to me. Process write requests, manage log replication, and continuously send heartbeat information to inform other nodes that "I am the leader, I am still alive, and you don't want to" initiate a new election without having to find a new leader to replace me. As shown in the figure below, three types of figures are used to represent followers, candidates and leaders. Role

3. Single-node system

3.1 Database server

Now let’s imagine, There is a single-node system. This node serves as a database server and stores a value of X. Database Server

3.2 Client

The solid green circle on the left is the client, and the solid blue circle on the right is Node a (Node a ). Term represents the term of office, which will be discussed later.

Client

A consensus algorithm that distributed systems must know: Raft

3.3 The client sends data to the server

The client sends data to the single-node server An update operation sets the value stored in the database to 8. In a stand-alone environment (single server node), the value the client gets from the server is also 8. Consistency is very easy to ensure.

The client sends data to the server

A consensus algorithm that distributed systems must know: Raft

3.4 How to ensure consistency among multiple nodes?

But if there are multiple server nodes, how to ensure consistency? For example, there are three nodes: a, b, c. As shown below. These three nodes form a database cluster. When the client performs update operations on these three nodes, how to ensure that the values stored in the three nodes are consistent? This is a distributed consistency issue. The Raft algorithm is here to solve this problem. Of course, there are other protocols that can also guarantee this. This article only focuses on the Raft algorithm.

In a multi-node cluster, how does the Raft algorithm ensure that there is only one leader in the cluster at the same time under abnormal circumstances such as node failure or partition error? Let’s begin to explain the process of leader election by the Raft algorithm.

4. Leadership election process

4.1 Initial state

In the initial state, all nodes in the cluster are followers status.

As shown in the figure below, there are three nodes (Node) a, b, and c, and the term (Term) is 0.

Initial state

4.2 Become a candidate

Raft algorithm implements the characteristics of random timeout, each time The timeout interval for each node to wait for heartbeat information from the leader node is random. For example, the waiting timeout interval of node A is 150 ms, that of node B is 200 ms, and that of node C is 300 ms. Then a times out first. First, it times out because it does not wait for the leader's heartbeat information. As shown in the figure below, the timeout timers for the three nodes start running.

Timeout time

When the timeout time of node A expires, node A becomes a candidate and increases its term number. The Term value is updated from 0 to 1. And cast a vote for myself.

Node A: Term = 1, Vote Count = 1.

Node B: Term = 0.

Node C: Term = 0.

Become a candidate

4.3 Vote

Let’s see how a candidate can become a leader of.

Leader election

Step 1: After node A becomes a candidate, send a request to vote RPC to other nodes message and ask them to elect themselves as leaders.

Step 2: After node B and node C receive the voting request information sent by node A, they will vote without voting during the term numbered 1. The vote is cast to node A and its term number is increased.

The third step: Node A received 3 votes, got the votes of the majority of nodes, and became the new leader during this term from the candidate.

Step 4: Node A, as the leader, sends heartbeat information to node B and node C at fixed intervals, telling node B and C that I am the leader and organizes others to follow. to initiate a new election.

Step 5: Node B and node C send response information to node A, telling node A that I am normal.

4.4 Term

The English word is term, and leaders have terms of office.

Automatic increase: After the follower times out waiting for the leader's heartbeat information, it recommends itself as a candidate and increases its term number. As shown in the figure above, the term of node A is 0. When nominating yourself as a candidate, the term number increases to 1.

Update to a larger value: When a node finds that its term number is smaller than that of other nodes, it will update to a larger number value. For example, node A's term is 1 and requests a vote. The voting message contains node A's term number, and the number is 1. After node B receives the message, it will update its term number to 1.

Revert to follower: If a candidate or leader finds that its term number is smaller than that of other nodes, it will immediately revert to follower status. This scenario occurs after partition error recovery. If the leader with term 3 receives a heartbeat message with term 4, the leader will immediately return to the follower state.

Rejection message: If a node receives a request for a smaller term number value, it will directly reject the request. For example, node A with term number 6 receives the term number. For node B's request to vote for 5 RPC messages, node A will reject the message.

4.5 Election Rules

During a term, the leader will always be the leader until there is a problem (such as downtime) or the network problem (delay), other nodes initiate a new round of elections.

In an election, each server node will cast at most one vote for a term number, and it will be gone after the vote is completed.

4.6 Most

Assuming a cluster consists of N nodes, then the majority is at least N/2 1. For example: for a cluster of 3 nodes, most are 2.

4.7 Heartbeat Timeout

In order to prevent multiple nodes from initiating voting at the same time, each node will be assigned a random election timeout. During this time, the node cannot become a candidate and can only wait until timeout. For example, in the above example, node A times out first and becomes a candidate first. With this clever design, in most cases only one server node initiates the election first instead of initiating the election at the same time, which reduces the number of election failures due to vote division.

Become a candidate

5. Leader failure

If the leader node fails, it will Trigger a new round of elections. As shown in the figure below, if leader node A fails, node B and node C will re-elect the leader.

Leader Failure

Step 1 : Node A failure , Node B And node C does not receive the heartbeat information from the leader node A, and the wait times out.

Second step: Node C times out first, and node C becomes the candidate.

Step 3: Node C initiates a request for voting information to node A and node B.

Step 4: Node C responded to the vote and voted for C, but node A was unable to respond to C’s voting request because of a failure.

Step 5: Node C receives two votes (majority of votes) and becomes the leader.

Step 6: Node C sends heartbeat information to nodes A and B. Node B responds to the heartbeat information. Node A does not respond to the heartbeat information because A is faulty.

Summary

The Raft algorithm uses the following methods to conduct leadership elections, ensuring that there is only one leader in one term, greatly reducing the chance of election failure. Condition.

Term

Leader heartbeat information

Random election timeout

First come, first served voting principle

Big Majority Vote Principle

This article uses animated graphics to explain how the Raft algorithm elects leaders, making it easier to understand and digest.

The above is the detailed content of A consensus algorithm that distributed systems must know: Raft. For more information, please follow other related articles on the PHP Chinese website!