What are the two implementation solutions for Redis high availability?

In order to achieve High Availability (HA), the following two methods are used in Redis:

Master-slave replication of data.
Sentinels that monitor the data nodes: if the master node fails, service continues on one of the slave nodes.

Master-slave replication

In Redis, data replication between master and slave nodes can be divided into full replication and partial replication.

Full replication in older versions

In older versions, full replication is implemented with the sync command. The process is:

The slave server sends the sync command to the master server.
After receiving sync, the master server runs bgsave to generate an RDB file and sends this file to the slave server. Once the slave has loaded the RDB file, its state matches the master's state at the moment bgsave was executed.
The master server also buffers the write commands it receives while the RDB file is generated and sent. It then sends these buffered commands to the slave, which executes them, so that the slave's state matches the master's current state.
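The three steps above can be simulated in a few lines. This is a toy model, not Redis code: the hypothetical `Master` and `Slave` classes stand in for servers, a deep copy of a dict stands in for the RDB file, and the write buffer is a plain list.

```python
import copy


class Master:
    """Toy master: a dict of keys plus a buffer of writes made during bgsave."""

    def __init__(self):
        self.data = {}
        self.buffered_writes = []  # writes received after the snapshot point
        self.snapshotting = False

    def set(self, key, value):
        self.data[key] = value
        if self.snapshotting:
            # Writes after the snapshot must also reach the slave later.
            self.buffered_writes.append(("SET", key, value))

    def bgsave(self):
        """Stand-in for bgsave: freeze a copy of the data (the 'RDB file')."""
        self.snapshotting = True
        return copy.deepcopy(self.data)

    def drain_buffer(self):
        """Stop buffering and hand over the writes collected since bgsave."""
        self.snapshotting = False
        writes, self.buffered_writes = self.buffered_writes, []
        return writes


class Slave:
    def __init__(self):
        self.data = {"stale": "value"}  # old contents are replaced by a full sync

    def sync(self, master, concurrent_writes=()):
        rdb = master.bgsave()                  # master snapshots its state
        for key, value in concurrent_writes:   # clients keep writing meanwhile
            master.set(key, value)
        self.data = rdb                        # slave loads the "RDB file"
        for op, key, value in master.drain_buffer():
            if op == "SET":                    # replay buffered write commands
                self.data[key] = value


master = Master()
master.set("a", 1)
slave = Slave()
slave.sync(master, concurrent_writes=[("b", 2)])
print(slave.data == master.data)  # True
```

Note that the slave's old contents are thrown away even if most of them were still valid, which is exactly the inefficiency described next.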

The biggest problem with this older full replication is that when a slave disconnects and reconnects, it must perform a full replication again, even if it already holds some of the data. This is very inefficient, so newer versions of Redis improved this part.

Replication in newer versions: psync

Since Redis 2.8, the psync command replaces the sync command. psync supports both full synchronization and partial synchronization; partial synchronization relies on the three mechanisms below.

Replication offset

Both sides of the replication, the master and the slave, each maintain a replication offset:

Every time the master server propagates N bytes of data to the slave server, it adds N to its own replication offset.
Every time the slave server receives N bytes of data from the master server, it adds N to its own replication offset.

Comparing the two offsets shows whether master and slave are consistent: equal offsets mean the slave has received everything the master sent.
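A small sketch of the offset bookkeeping, assuming each propagated command is a SET encoded in the Redis protocol (RESP); `Endpoint` and `propagate` are hypothetical names used only for illustration.

```python
class Endpoint:
    """One side of a replication link; offset counts stream bytes handled."""

    def __init__(self):
        self.offset = 0


def propagate(master, slave, command, delivered=True):
    master.offset += len(command)      # master: +N after sending N bytes
    if delivered:
        slave.offset += len(command)   # slave: +N after receiving N bytes


SET_K_V = b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"  # SET k v in RESP, 27 bytes

master, slave = Endpoint(), Endpoint()
propagate(master, slave, SET_K_V)
print(master.offset == slave.offset)    # True: consistent
propagate(master, slave, SET_K_V, delivered=False)  # e.g. the link dropped
print(master.offset - slave.offset)     # 27 bytes the slave is missing
```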

Replication backlog buffer

The master server maintains a fixed-length first-in-first-out (FIFO) queue as the replication backlog buffer; its default size is 1 MB.

When the master propagates commands, it not only sends each write command to the slaves but also appends it to the replication backlog buffer. The backlog therefore always holds the most recent part of the replication stream.
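The backlog can be modelled as a bounded FIFO of bytes. This sketch uses Python's `collections.deque` with `maxlen`, so the oldest bytes are evicted automatically; the class and method names are hypothetical, and offsets are assumed to number the first byte ever written as 1. Real Redis sizes the buffer with the `repl-backlog-size` setting.

```python
from collections import deque
from itertools import islice


class ReplicationBacklog:
    """Keeps only the most recent `size` bytes of the replication stream."""

    def __init__(self, size=1024 * 1024):  # 1 MB, the default mentioned above
        self.buf = deque(maxlen=size)       # oldest bytes drop off the front
        self.master_offset = 0              # offset of the last byte written

    def feed(self, data: bytes) -> None:
        """Append a propagated write command to the backlog."""
        self.buf.extend(data)
        self.master_offset += len(data)

    def first_offset(self) -> int:
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.buf) + 1

    def missing_since(self, slave_offset: int):
        """Bytes after slave_offset, or None if some were already evicted."""
        if slave_offset > self.master_offset or slave_offset + 1 < self.first_offset():
            return None  # cannot serve a partial resync from the backlog
        skip = slave_offset + 1 - self.first_offset()
        return bytes(islice(self.buf, skip, None))


backlog = ReplicationBacklog(size=8)
backlog.feed(b"abcdef")
backlog.feed(b"ghij")              # "ab" is evicted; buffer holds "cdefghij"
print(backlog.missing_since(6))    # b'ghij'
print(backlog.missing_since(1))    # None: byte 2 is gone, full sync needed
```

The size trade-off is visible here: a larger backlog lets a slave stay disconnected longer and still catch up with a partial resync.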

Server run ID

Each Redis server has a run ID, generated automatically when the server starts. The master server sends its run ID to the slave server, and the slave saves it.

When a slave disconnects and then reconnects, the run ID determines how synchronization proceeds:

If the master run ID saved on the slave matches the run ID of the master it is now connected to, the slave is reconnecting to the same master it replicated before, so a partial synchronization can be attempted.
Otherwise, if the two run IDs differ, the slave was replicating a different master, and a full synchronization must be performed.
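The run ID check itself is a simple comparison. The sketch below uses hypothetical names and generates IDs as 40 random hex characters (the same length as the `run_id` field shown by `INFO server`) with Python's `secrets` module; it illustrates the rule and is not how Redis itself produces the ID.

```python
import secrets


def new_run_id() -> str:
    """Generated once per server start: 20 random bytes as 40 hex chars."""
    return secrets.token_hex(20)


class SlaveReplState:
    def __init__(self):
        self.saved_master_run_id = None  # set when the master sends its run ID

    def remember_master(self, run_id: str) -> None:
        self.saved_master_run_id = run_id

    def can_try_partial_sync(self, current_master_run_id: str) -> bool:
        # Same ID: this is the master we replicated before -> try partial sync.
        # Different (or none saved): a full synchronization is required.
        return self.saved_master_run_id == current_master_run_id


first_master = new_run_id()
state = SlaveReplState()
state.remember_master(first_master)
print(state.can_try_partial_sync(first_master))   # True
print(state.can_try_partial_sync(new_run_id()))   # False: another master
```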

psync command process

With these pieces in place, let's walk through the psync process:

If the slave server has never replicated any master, or has executed the slaveof no one command, it sends psync ? -1 to the master to request a full synchronization.
Otherwise, if the slave has previously replicated some data, it sends psync <runid> <offset> to the master, where runid is the run ID of the master it last replicated and offset is the slave's current replication offset.
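The slave's choice between the two requests can be written as a small function. The name `build_psync_request` is hypothetical, and the sketch follows the description above rather than reproducing byte for byte what a real replica sends.

```python
from typing import Optional


def build_psync_request(saved_run_id: Optional[str], offset: int) -> str:
    """Return the psync request a slave sends when it (re)connects."""
    if saved_run_id is None:
        # Never replicated, or ran slaveof no one: request a full sync.
        return "PSYNC ? -1"
    # Replicated before: name the old master and where we stopped.
    return f"PSYNC {saved_run_id} {offset}"


print(build_psync_request(None, 0))        # PSYNC ? -1
print(build_psync_request("abc123", 420))  # PSYNC abc123 420
```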

When the master receives psync in either case, it replies in one of three ways:

The master replies +FULLRESYNC <runid> <offset>: a full synchronization with the slave is required. runid is the master's current run ID and offset is its current replication offset.
The master replies +CONTINUE: a partial synchronization is performed, and the master sends the slave only the data it is missing.
The master replies -ERR: the master is older than 2.8 and does not recognize psync. The slave then sends the sync command and performs a complete full synchronization.
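Putting the run ID and the backlog together, the master's decision and the slave's reaction to each reply might look like the sketch below. It rests on simplifying assumptions: `backlog_first` is the offset of the oldest byte still in the backlog, offsets follow the same convention as the request above, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def master_handle_psync(master_run_id: str, master_offset: int, backlog_first: int,
                        req_run_id: str, req_offset: int) -> Tuple[str, Optional[int]]:
    """Master side: reply to 'PSYNC <req_run_id> <req_offset>'.

    Returns the reply line and, for +CONTINUE, the offset to resume after.
    """
    same_master = req_run_id == master_run_id   # "?" never matches a real ID
    # Bytes req_offset+1 .. master_offset must all still be in the backlog.
    in_backlog = backlog_first - 1 <= req_offset <= master_offset
    if same_master and in_backlog:
        return "+CONTINUE", req_offset
    return f"+FULLRESYNC {master_run_id} {master_offset}", None


def slave_next_step(reply: str) -> str:
    """Slave side: what to do with each of the three possible replies."""
    if reply.startswith("+FULLRESYNC"):
        return "load the RDB file, then apply the command stream"
    if reply.startswith("+CONTINUE"):
        return "apply only the missing bytes"
    if reply.startswith("-ERR"):
        return "fall back to SYNC"  # master is older than 2.8
    raise ValueError(f"unexpected reply: {reply!r}")


print(master_handle_psync("abc", 1000, 500, "abc", 900))  # ('+CONTINUE', 900)
print(master_handle_psync("abc", 1000, 500, "abc", 100))  # ('+FULLRESYNC abc 1000', None)
print(master_handle_psync("abc", 1000, 500, "?", -1))     # ('+FULLRESYNC abc 1000', None)
```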

Overview of the sentinel mechanism

Redis uses the sentinel mechanism to achieve high availability (HA). Its general working principle is:

Redis uses a set of sentinel nodes to monitor the availability of the master-slave Redis service.
Once the Redis master node is found to have failed, one sentinel node is elected as the leader.
The sentinel leader then selects one of the remaining slave nodes as the new master node to serve clients.

The above divides Redis nodes into two categories:

Sentinel node (sentinel): responsible for monitoring the data nodes.
Data node: the Redis node that normally serves client requests, divided into master and slave.

The above is the general process. This process needs to solve the following problems:

How are Redis data nodes monitored?
How is a Redis data node determined to have failed?
How is the sentinel leader elected?
On what basis does the sentinel leader select the new master node?

Let’s answer these questions one by one.

Three monitoring tasks

The sentinel node monitors the service availability of the Redis data node through three scheduled monitoring tasks.

info command

Every 10 seconds, each sentinel node will send the info command to the master and slave Redis data nodes to obtain new topology information.

Redis topology information includes:

The role of this node: master or slave.
The address and port information of the master and slave nodes.

In this way, the sentinel node learns about the slave nodes from the info reply, so slaves added later are discovered automatically without explicit configuration.
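For example, a sentinel can discover slaves by parsing the replication section of a master's info reply. The sketch assumes the `key:value` line format and the `slaveN:ip=...,port=...` entries that Redis prints in that section; the sample text is made up and the function names are hypothetical.

```python
def parse_info(text: str) -> dict:
    """Turn INFO output into a dict, skipping blank and '# Section' lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def discover_slaves(info_text: str) -> list:
    """Return (ip, port) for every slaveN entry in a master's INFO reply."""
    slaves = []
    for key, value in parse_info(info_text).items():
        if key.startswith("slave") and key[len("slave"):].isdigit():
            attrs = dict(item.split("=", 1) for item in value.split(","))
            slaves.append((attrs["ip"], int(attrs["port"])))
    return slaves


SAMPLE_INFO = (
    "# Replication\r\n"
    "role:master\r\n"
    "connected_slaves:2\r\n"
    "slave0:ip=10.0.0.2,port=6380,state=online,offset=1234,lag=0\r\n"
    "slave1:ip=10.0.0.3,port=6381,state=online,offset=1234,lag=1\r\n"
)

print(parse_info(SAMPLE_INFO)["role"])  # master
print(discover_slaves(SAMPLE_INFO))     # [('10.0.0.2', 6380), ('10.0.0.3', 6381)]
```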

Publishing to the __sentinel__:hello channel

Every 2 seconds, each sentinel node publishes a message to the __sentinel__:hello channel of the Redis data nodes, containing its own information and its view of the master node. Because the other sentinel nodes also subscribe to this channel, this lets the sentinels exchange information about the master node and about each other.

This actually accomplishes two things:

Discovering new sentinel nodes: when a new sentinel node joins, its information is saved and a connection to it is subsequently established.
Exchanging status information about the master node, which is later used to determine objectively whether the master node is offline.
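A sketch of how a sentinel might process hello messages. It assumes the eight comma-separated fields that Redis Sentinel publishes (the sentinel's IP, port, run ID and current epoch, then the master's name, IP, port and configuration epoch); `SentinelView` and its methods are hypothetical.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    f = payload.split(",")
    if len(f) != 8:
        raise ValueError(f"unexpected hello payload: {payload!r}")
    return Hello(f[0], int(f[1]), f[2], int(f[3]), f[4], f[5], int(f[6]), int(f[7]))


class SentinelView:
    def __init__(self, my_run_id: str):
        self.my_run_id = my_run_id
        self.known_sentinels = {}  # run ID -> (ip, port)
        self.master_reports = {}   # run ID -> (master ip, master port)

    def on_hello(self, payload: str) -> bool:
        """Process one message; return True if a new sentinel was discovered."""
        h = parse_hello(payload)
        if h.sentinel_run_id == self.my_run_id:
            return False  # our own message echoed back by the channel
        is_new = h.sentinel_run_id not in self.known_sentinels
        self.known_sentinels[h.sentinel_run_id] = (h.sentinel_ip, h.sentinel_port)
        self.master_reports[h.sentinel_run_id] = (h.master_ip, h.master_port)
        return is_new


view = SentinelView(my_run_id="s1")
msg = "10.0.0.5,26379,s2,1,mymaster,10.0.0.1,6379,0"
print(view.on_hello(msg))  # True: s2 is new (a connection would be opened next)
print(view.on_hello(msg))  # False: already known
```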

Perform heartbeat detection on data nodes

Every 1 second, each sentinel node sends a ping command to the master and slave data nodes and other sentinel nodes for heartbeat detection. This heartbeat detection is the basis for subsequent subjective judgments that the data node is offline.

Subjective offline and objective offline

Subjective offline

The third of these monitoring tasks is heartbeat detection. If a data node returns no valid reply within the configured down-after-milliseconds, the sentinel considers it "subjectively offline" (sdown).
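The subjective-offline rule amounts to a timeout on valid replies. In this sketch time is passed in explicitly, in milliseconds, so it is easy to test; the class name is hypothetical, and the replies treated as valid (+PONG, -LOADING, -MASTERDOWN) follow the Redis Sentinel documentation.

```python
VALID_PING_REPLIES = ("+PONG", "-LOADING", "-MASTERDOWN")


class HeartbeatMonitor:
    def __init__(self, down_after_ms: int):
        self.down_after_ms = down_after_ms  # the down-after-milliseconds setting
        self.last_valid = {}                # node -> time of last valid reply

    def watch(self, node: str, now_ms: int) -> None:
        self.last_valid[node] = now_ms      # start the clock when monitoring begins

    def on_ping_reply(self, node: str, reply: str, now_ms: int) -> None:
        if reply.startswith(VALID_PING_REPLIES):  # other replies don't count
            self.last_valid[node] = now_ms

    def is_sdown(self, node: str, now_ms: int) -> bool:
        return now_ms - self.last_valid[node] > self.down_after_ms


mon = HeartbeatMonitor(down_after_ms=30_000)
mon.watch("master", now_ms=0)
mon.on_ping_reply("master", "+PONG", now_ms=1_000)
mon.on_ping_reply("master", "-ERR oops", now_ms=20_000)  # not a valid reply
print(mon.is_sdown("master", now_ms=25_000))  # False
print(mon.is_sdown("master", now_ms=31_001))  # True
```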

Why "subjectively" offline? In a distributed system many machines work together and all kinds of network problems can occur, so one node's judgment alone is not enough to conclude that a data node is down. That conclusion requires the subsequent "objective offline" process.

Objective offline

When a sentinel node considers the master subjectively offline, it uses the sentinel is-master-down-by-addr command to ask the other sentinel nodes whether they also consider the master offline. If the number of sentinels that agree (including itself) reaches the quorum configured for this master (the quorum argument of sentinel monitor), the master is considered "objectively offline" (odown).
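Counting the answers is straightforward. In this sketch the agreement threshold is a parameter, `quorum`, so it can express either a fixed majority or a per-master quorum configured in Sentinel; the function name is hypothetical, and the asking sentinel counts its own judgment as one vote.

```python
def is_odown(self_sdown: bool, peer_answers: dict, quorum: int) -> bool:
    """peer_answers maps another sentinel's run ID to 'is the master down?'."""
    if not self_sdown:
        return False  # only a sentinel that sees sdown asks the others
    agree = 1 + sum(1 for down in peer_answers.values() if down)  # 1 = ourselves
    return agree >= quorum


answers = {"s2": True, "s3": False, "s4": True}
print(is_odown(True, answers, quorum=3))   # True: s1, s2, s4 agree
print(is_odown(True, answers, quorum=4))   # False
```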

Electing the sentinel leader

When the master node is objectively offline, one sentinel node must be elected as the sentinel leader to carry out the work of choosing a new master node.

The general idea of this election is:

Each sentinel node asks the other sentinel nodes to elect it as the sentinel leader by sending them the sentinel is-master-down-by-addr command.
A sentinel node that receives this command votes only for the first node that asks; the same request from any other node is rejected.
If a sentinel node receives votes from more than half of the sentinel nodes, it becomes the sentinel leader.
If no sentinel leader emerges from the steps above within a certain time, a new election round starts.
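One election round can be simulated as follows. The sketch assumes each candidate first votes for itself, that vote requests arrive in the order given, and that a voter grants its single vote per round to the first candidate that asks; `elect_leader` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def elect_leader(requests: Iterable[Tuple[str, str]], num_sentinels: int) -> Optional[str]:
    """Run one round. requests are (candidate, voter) pairs in arrival order."""
    voted_for = {}
    for candidate, voter in requests:
        if voter not in voted_for:        # first request gets the vote...
            voted_for[voter] = candidate  # ...later ones are rejected
    tally = Counter(voted_for.values())
    for candidate, votes in tally.items():
        if votes > num_sentinels // 2:    # more than half of all sentinels
            return candidate              # at most one candidate can get here
    return None                           # no leader: retry in a new round


# s1 and s2 both stand; s3 hears from s1 first.
round1 = [("s1", "s1"), ("s2", "s2"), ("s1", "s2"), ("s1", "s3"), ("s2", "s3")]
print(elect_leader(round1, num_sentinels=3))  # s1
# Four sentinels each vote for themselves: a split vote.
split = [(s, s) for s in ("s1", "s2", "s3", "s4")]
print(elect_leader(split, num_sentinels=4))   # None
```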

As you can see, electing the sentinel leader is very similar to leader election in Raft.

Select a new master node

Among the remaining Redis slave nodes, select a new master node in the following order:

Filter out "unhealthy" slave nodes: for example, slaves that are subjectively offline or disconnected, slaves that have not replied to the sentinel's ping command within the last five seconds, and slaves that have lost contact with the master node.
If one slave node has the highest priority (in Redis, the lowest slave-priority value; 0 means the slave is never promoted), return it; otherwise, continue with the next step.
Select the slave node with the largest replication offset, since its data is the most complete. If exactly one such slave exists, return it; otherwise, continue with the next step.
At this point the remaining slave nodes are tied, so select the one with the smallest run ID.
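The selection order maps neatly onto a sort key. This sketch assumes the health filtering of the first step has already been reduced to a boolean, and follows the Redis convention that a lower `slave-priority` value means higher priority and 0 means never promote; `SlaveInfo` and `select_new_master` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveInfo:
    run_id: str
    priority: int      # slave-priority: lower is preferred, 0 = never promote
    repl_offset: int   # replication offset
    healthy: bool      # passed the "unhealthy node" filters


def select_new_master(slaves: List[SlaveInfo]) -> Optional[SlaveInfo]:
    candidates = [s for s in slaves if s.healthy and s.priority != 0]
    if not candidates:
        return None
    # Highest priority first, then largest offset, then smallest run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


slaves = [
    SlaveInfo("c3", priority=100, repl_offset=900, healthy=True),
    SlaveInfo("a1", priority=100, repl_offset=900, healthy=True),
    SlaveInfo("b2", priority=100, repl_offset=950, healthy=False),
    SlaveInfo("d4", priority=0, repl_offset=999, healthy=True),
]
print(select_new_master(slaves).run_id)  # a1: tie on priority and offset
```

Encoding the rules as one tuple key keeps the tie-breaking order explicit: a later field only matters when all earlier fields are equal.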

Promote the new master node

After the new master node is selected, a final set of steps turns it into the new master node:

The sentinel leader sends the slaveof no one command to the slave node selected in the previous step, making it the master node.
The sentinel leader sends commands to the remaining slave nodes to make them slaves of the new master node.
The sentinel nodes record the original master node as a slave; when it recovers, it is ordered to replicate from the new master node.
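The promotion can be expressed as the list of commands the leader sends. Hosts are hypothetical `(ip, port)` pairs and `promotion_plan` is a made-up name; the commands use the `SLAVEOF` spelling from the text (Redis 5 and later also accept `REPLICAOF`).

```python
from typing import List, Tuple

Addr = Tuple[str, int]


def promotion_plan(new_master: Addr, other_slaves: List[Addr]) -> List[Tuple[Addr, str]]:
    """(target, command) pairs, in the order the steps above issue them."""
    ip, port = new_master
    plan = [(new_master, "SLAVEOF NO ONE")]                      # step 1
    plan += [(s, f"SLAVEOF {ip} {port}") for s in other_slaves]  # step 2
    return plan


def on_old_master_recovered(old_master: Addr, new_master: Addr) -> Tuple[Addr, str]:
    """Step 3: once the old master is back, make it replicate the new one."""
    ip, port = new_master
    return old_master, f"SLAVEOF {ip} {port}"


new = ("10.0.0.2", 6380)
for target, cmd in promotion_plan(new, [("10.0.0.3", 6381)]):
    print(target, cmd)
print(on_old_master_recovered(("10.0.0.1", 6379), new))
```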
