What is the principle of Sentinel failover in Redis?-Redis-php.cn

What is Sentinel?

Sentinel is Redis's high-availability solution. The master-slave replication discussed earlier is the foundation of high availability, but with replication alone a failover requires manual intervention. Sentinel solves this problem. With master-slave replication in place, when the master node fails, Sentinel automatically detects the failure and performs the failover, giving Redis true high availability. In a Sentinel cluster, each Sentinel monitors the state of all Redis servers and of the other Sentinel nodes, detects failures promptly and completes the failover, which keeps Redis highly available.

Building a Sentinel Cluster

Although Sentinel is essentially a Redis service, it provides different functions from an ordinary Redis service. Sentinel uses a distributed architecture: to keep Redis highly available, Sentinel must first be highly available itself. A Sentinel deployment therefore needs at least three instances, preferably an odd number, because the failover process described later involves voting.
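The following small Python calculation shows why an odd number is preferred. It assumes the strict-majority voting rule from the leader-election section: a leader needs more than half of the Sentinels. Adding a fourth Sentinel to three therefore raises the majority from 2 to 3 without letting the group survive any extra failure.

```python
def majority(n: int) -> int:
    """Votes needed for a strict majority (more than half) of n Sentinels."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Sentinels that can fail while the survivors can still reach a majority."""
    return n - majority(n)


for n in range(1, 7):
    print(f"{n} Sentinels: majority = {majority(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Three and four Sentinels both tolerate one failure, and five tolerate two. An even count costs an extra instance without adding fault tolerance.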

The Redis GitHub project contains a file called sentinel.conf, which we can download and use as the template for our Sentinel configuration. Alternatively, you can use a redis.conf configuration file and add the Sentinel-related settings to it.

Sentinel has only a few configuration items. The main ones are:

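# Port. The default, 26379, is the Redis port 6379 plus 20000, so we keep that convention
port 26379
# Run as a daemon
daemonize yes
# Log file location. This matters a lot: the log shows the failover process step by step
logfile "26379.log"
# Monitor a Redis master named mymaster (a user-chosen name) at IP address 127.0.0.1, port 6379.
# The trailing 2 means at least two Sentinels must consider the master failed before a
# failover is performed; otherwise the master is considered healthy
sentinel monitor mymaster 127.0.0.1 6379 2
# Time (ms) without a reply after which this Sentinel considers the server down
sentinel down-after-milliseconds mymaster 30000
# After a failover, the maximum number of slaves that may resync with the new master at once.
# A smaller number means all slaves take longer to finish resyncing;
# a larger number puts more load on the master
sentinel parallel-syncs mymaster 1
# Failover timeout (ms)
sentinel failover-timeout mymaster 180000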


Apart from port and logfile, which differ for each Sentinel instance, all configuration items are the same. After editing the configuration, start a Sentinel with the ./redis-sentinel sentinel.conf command, which is similar to starting a Redis instance. Because Sentinel is also a Redis instance, we can view the current Sentinel information with the ./redis-cli -p 26379 info sentinel command, as shown in the figure below:

[Figure: Sentinel information]
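Because only port and logfile differ, the per-instance files can be generated from one template. The sketch below is a hypothetical Python helper. The ports 26380 and 26381 and the file names sentinel-<port>.conf are assumptions for a three-Sentinel setup on one host.

```python
# Settings shared by every instance (taken from the sample configuration).
SHARED = (
    "daemonize yes\n"
    "sentinel monitor mymaster 127.0.0.1 6379 2\n"
    "sentinel down-after-milliseconds mymaster 30000\n"
    "sentinel parallel-syncs mymaster 1\n"
    "sentinel failover-timeout mymaster 180000\n"
)


def sentinel_conf(port: int) -> str:
    """Build one Sentinel config; only port and logfile vary per instance."""
    return f'port {port}\nlogfile "{port}.log"\n' + SHARED


# Hypothetical layout: three Sentinels on one host, one file per port.
configs = {f"sentinel-{p}.conf": sentinel_conf(p) for p in (26379, 26380, 26381)}

for name, text in configs.items():
    print(f"--- {name}\n{text}")
    # To write the files instead: pathlib.Path(name).write_text(text)
```

Each instance needs its own writable file. Sentinel rewrites its configuration file at runtime to persist what it learns, such as the slaves and Sentinels it has discovered.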

Question: When Sentinel is configured with only the master server, how does it discover the slave servers and the other Sentinels?

Sentinel discovers the slave servers by asking the master: it sends the INFO command to the master, and the reply lists the master's slaves. Other Sentinel nodes are discovered through the publish/subscribe feature, by sending messages to the __sentinel__:hello channel. This works in two steps:

1. Every 2 seconds, each Sentinel uses publish/subscribe to send a message to the __sentinel__:hello channel of every master and slave server it monitors. The message contains the Sentinel's IP address, port number and run ID (runid).

2. Each Sentinel subscribes to the __sentinel__:hello channel of every master and slave server it monitors and looks for Sentinels it has not seen before (unknown Sentinels). When a Sentinel discovers a new Sentinel, it adds it to its list of all other known Sentinels that monitor the same master.
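The discovery loop can be sketched in Python. This sketch assumes the hello payload layout used in the Redis source (sentinel.c): eight comma-separated fields. The first four are the Sentinel's IP, port, run ID and current epoch. The last four are the master's name, IP, port and configuration epoch.

The SentinelRegistry class and the short run IDs are hypothetical. Real run IDs are 40-character hex strings. Real Sentinels also use the master fields to update the master's address, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    ip: str
    port: int
    runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split a __sentinel__:hello payload into its eight comma-separated fields."""
    f = payload.split(",")
    if len(f) != 8:
        raise ValueError(f"expected 8 fields, got {len(f)}")
    return Hello(f[0], int(f[1]), f[2], int(f[3]), f[4], f[5], int(f[6]), int(f[7]))


class SentinelRegistry:
    """Per-master list of the other Sentinels this Sentinel knows about."""

    def __init__(self, my_runid: str):
        self.my_runid = my_runid
        self.known = {}  # master name -> {runid: Hello}

    def on_hello(self, payload: str) -> bool:
        """Process one hello message; return True if it revealed a new Sentinel."""
        hello = parse_hello(payload)
        if hello.runid == self.my_runid:
            return False  # our own message, received back via the subscription
        peers = self.known.setdefault(hello.master_name, {})
        is_new = hello.runid not in peers
        peers[hello.runid] = hello  # add a new entry or refresh an existing one
        return is_new


reg = SentinelRegistry(my_runid="aaa")
msg = "127.0.0.1,26380,bbb,0,mymaster,127.0.0.1,6379,0"
print(reg.on_hello(msg))  # True: first time we hear from "bbb"
print(reg.on_hello(msg))  # False: "bbb" is already known
print(reg.on_hello("127.0.0.1,26379,aaa,0,mymaster,127.0.0.1,6379,0"))  # False: ourselves
```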

Sentinel failover principle

Failover is Sentinel's main job, and the logic behind it is complex. For the full implementation details, see a book on Redis internals. I summarize Sentinel's failover in the following three points:

1. Monitoring the servers

Every second, each Sentinel node sends a PING command to the master, the slaves and the other Sentinel nodes as a heartbeat, to determine each server's status.

Each node replies to Sentinel's PING. Only the following three replies count as valid:

+PONG
-LOADING
-MASTERDOWN

If a node gives no valid reply at all within the time set by the down-after-milliseconds option in the Sentinel configuration file, Sentinel marks the server as offline. This is called subjectively offline: only this one Sentinel believes the server is down.
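Here is a minimal Python sketch of the subjective-offline check. Only the three valid replies refresh the timer, and a server is flagged once the gap since its last valid reply exceeds down-after-milliseconds. The PingTracker class and its millisecond timestamps are illustrative assumptions, not Sentinel's internal structures.

```python
# The three replies treated as valid for the PING health check.
VALID_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}


class PingTracker:
    """Subjective-offline tracking for one monitored server."""

    def __init__(self, down_after_ms: int, start_ms: int):
        self.down_after_ms = down_after_ms
        self.last_valid_ms = start_ms  # start the clock when monitoring begins

    def on_reply(self, reply: str, now_ms: int) -> None:
        # Compare only the status/error prefix, e.g. "-LOADING Redis is loading ..."
        if reply.split(" ", 1)[0] in VALID_REPLIES:
            self.last_valid_ms = now_ms

    def is_sdown(self, now_ms: int) -> bool:
        """Subjectively down: no valid reply for more than down_after_ms."""
        return now_ms - self.last_valid_ms > self.down_after_ms


t = PingTracker(down_after_ms=30_000, start_ms=0)
t.on_reply("+PONG", now_ms=1_000)
t.on_reply("-ERR unknown command", now_ms=2_000)  # not a valid reply: ignored
print(t.is_sdown(now_ms=31_000))  # False: exactly 30 s since the last valid reply
print(t.is_sdown(now_ms=31_001))  # True: more than 30 s
```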

If the subjectively offline server is a master, the Sentinel checks whether it is really down. It asks the other Sentinels monitoring the same master, with the SENTINEL is-master-down-by-addr command, whether they also consider it offline. When enough Sentinels agree (at least the quorum set in the sentinel monitor line), the Sentinel marks the master as objectively offline. The master is now considered truly down, and a failover is performed on it.
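The objective-offline decision reduces to a vote count. In this sketch, the hypothetical function is_odown counts the asking Sentinel itself plus every peer that replied that the master is down. It then compares the total with the quorum from the sentinel monitor line, which is 2 in the sample configuration.

```python
def is_odown(self_sdown: bool, peer_says_down: dict[str, bool], quorum: int) -> bool:
    """Objectively down if at least `quorum` Sentinels, this one included, agree."""
    if not self_sdown:
        return False  # peers are only consulted after this Sentinel's own SDOWN verdict
    votes = 1 + sum(1 for down in peer_says_down.values() if down)
    return votes >= quorum


# Quorum 2, as in "sentinel monitor mymaster 127.0.0.1 6379 2".
print(is_odown(True, {"s2": True, "s3": False}, quorum=2))   # True: 2 Sentinels agree
print(is_odown(True, {"s2": False, "s3": False}, quorum=2))  # False: only this Sentinel
```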

2. Electing a Sentinel leader to perform the failover

Failover is not carried out by all Sentinels together. Once the master is marked objectively offline, the Sentinels elect one of them as leader, using an election based on the Raft algorithm, and the leader performs the failover. Redis elects the Sentinel leader roughly as follows:

Every online Sentinel is eligible, so any Sentinel may become the leader.
When a Sentinel marks the master as objectively offline, it sends the SENTINEL is-master-down-by-addr command to the other Sentinel nodes, asking them to make it the leader.
A Sentinel that receives this request applies first come, first served. If it has not yet voted for another Sentinel in the current epoch, it grants the request; otherwise it refuses.
A Sentinel that obtains votes from more than half of the Sentinels (and at least the configured quorum) becomes the leader.
If no leader is elected within the time limit, a new election is held after a while, repeating until a leader is elected.
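The voting rules above can be simulated in Python. This sketch makes several simplifying assumptions:

- Requests arrive in a fixed order.
- Every Sentinel votes at most once per epoch, including for itself when it is a candidate.
- The winner needs at least max(quorum, majority) votes, which is the threshold Redis applies.

The Sentinel class and elect_leader function are hypothetical.

```python
from collections import Counter


class Sentinel:
    """Remembers which candidate this Sentinel voted for in each epoch."""

    def __init__(self, name: str):
        self.name = name
        self.votes = {}  # epoch -> candidate name

    def request_vote(self, candidate: str, epoch: int) -> str:
        # First come, first served: the first request in an epoch wins the vote;
        # later requests in that epoch only learn who already has it.
        return self.votes.setdefault(epoch, candidate)


def elect_leader(requests, sentinels, epoch, quorum):
    """requests: (candidate, voter_name) pairs in arrival order.

    Returns the elected leader's name, or None if nobody reached the threshold.
    """
    by_name = {s.name: s for s in sentinels}
    for candidate, voter in requests:
        by_name[voter].request_vote(candidate, epoch)
    tally = Counter(s.votes[epoch] for s in sentinels if epoch in s.votes)
    needed = max(quorum, len(sentinels) // 2 + 1)
    top = tally.most_common(1)
    return top[0][0] if top and top[0][1] >= needed else None


trio = [Sentinel("s1"), Sentinel("s2"), Sentinel("s3")]
# s1 and s2 both ask; s1's request reaches s3 first.
print(elect_leader([("s1", "s1"), ("s2", "s2"), ("s1", "s3"), ("s2", "s3")],
                   trio, epoch=1, quorum=2))  # s1: 2 of 3 votes

quad = [Sentinel(f"s{i}") for i in range(1, 5)]
split = [("s1", "s1"), ("s1", "s2"), ("s3", "s3"), ("s3", "s4")]
print(elect_leader(split, quad, epoch=1, quorum=2))  # None: 2-2 split, 3 needed
print(elect_leader([("s1", s.name) for s in quad], quad, epoch=2, quorum=2))  # s1
```

The split vote shows why a failed round is simply retried: the next epoch starts with fresh votes.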

3. Elect the new master server to complete the failover

The elected Sentinel leader carries out the rest of the failover, which has three main steps:

(1) Select a new master server

The leader picks one of the offline master's slave servers and converts it into the master. The rules for choosing the new master are as follows:

Among the failed master's slaves, those marked subjectively offline, those that are disconnected, and those whose last reply to PING was more than five seconds ago are eliminated.
Among the remaining slaves, those whose connection to the failed master has been down for more than ten times the down-after-milliseconds value are eliminated.
Among the slaves left after these two rounds, the one with the lowest replica priority (the replica-priority setting, slave-priority in older versions; 0 means never promote) is preferred. Among equal priorities, the slave with the largest replication offset becomes the new master. If the offset is unavailable, or the offsets are equal, the slave with the smallest run ID becomes the new master.

The leader then executes the SLAVEOF NO ONE command on the selected slave to make it the master node.
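The selection rules translate into a filter followed by a sort. The sketch below uses hypothetical Replica fields. It also applies replica-priority, which Redis checks before the replication offset: lower is preferred, and 0 means never promote. For the link-down check it uses the plain 10 × down-after-milliseconds limit, whereas Redis also adds the time the master has already been marked down.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    runid: str
    priority: int                  # replica-priority; 0 means "never promote"
    offset: int                    # replication offset
    sdown: bool = False            # marked subjectively offline
    connected: bool = True
    ms_since_ping_reply: int = 0
    ms_master_link_down: int = 0   # how long its link to the old master has been down


def choose_new_master(replicas, down_after_ms):
    """Return the replica to promote, or None if none qualifies."""
    candidates = [
        r for r in replicas
        if not r.sdown
        and r.connected
        and r.ms_since_ping_reply <= 5_000               # replied to PING in the last 5 s
        and r.ms_master_link_down <= 10 * down_after_ms  # not cut off from the old master too long
        and r.priority != 0
    ]
    if not candidates:
        return None
    # Lower priority first, then larger offset, then lexicographically smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.runid))


replicas = [
    Replica("r3", priority=100, offset=1000),
    Replica("r1", priority=100, offset=1000),
    Replica("r2", priority=100, offset=900),
    Replica("r0", priority=0, offset=5000),                               # never promoted
    Replica("r4", priority=100, offset=9000, sdown=True),                 # eliminated
    Replica("r5", priority=100, offset=9000, ms_since_ping_reply=6_000),  # eliminated
]
print(choose_new_master(replicas, down_after_ms=30_000).runid)  # r1
```

r3 and r1 tie on priority and offset, so the smaller run ID, r1, wins. The leader would then send SLAVEOF NO ONE to it.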

(2) Modify the replication targets of other slave servers

Once the new master exists, the Sentinel leader makes the other slaves replicate it by sending them the SLAVEOF <new_master_ip> <new_master_port> command. How many slaves resynchronize at the same time is governed by the parallel-syncs option in the configuration file.
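Here is a sketch of how parallel-syncs limits the reconfiguration. The hypothetical helper below splits the remaining slaves into groups of at most parallel_syncs, and each group receives SLAVEOF <ip> <port>. Real Sentinel uses a sliding window instead of fixed batches: a new slave starts as soon as one finishes. Treat this as an approximation; the addresses are made up.

```python
def reconfigure_batches(replicas, new_master_ip, new_master_port, parallel_syncs):
    """Split the SLAVEOF commands into groups of at most parallel_syncs replicas."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    cmd = f"SLAVEOF {new_master_ip} {new_master_port}"
    return [
        [(replica, cmd) for replica in replicas[i:i + parallel_syncs]]
        for i in range(0, len(replicas), parallel_syncs)
    ]


# Hypothetical addresses: three remaining slaves and the newly promoted master.
slaves = ["10.0.0.2:6379", "10.0.0.3:6379", "10.0.0.4:6379"]
for step, batch in enumerate(reconfigure_batches(slaves, "10.0.0.5", 6379, 1), 1):
    print(step, batch)
```

With parallel-syncs 1 (the sample value), the slaves resync one at a time. This is the slowest way to finish, but it puts the least load on the new master.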

(3) Turn the old master server into a slave server

The last step of the failover is to make the offline old master a slave of the new master. Sentinel keeps watching it, and once it comes back online, commands it to replicate the new master node.
