What is Redis Cluster, and why does Redis need it? This article takes a closer look at Redis Cluster and discusses how much data a cluster can support. I hope you find it helpful!

What is a cluster? Why is Cluster needed in Redis?

This article breaks down the main aspects of the cluster in depth: nodes, slot assignment, command execution, resharding, redirection, failover, and messages.

Overview of Redis Cluster principles

The goal is to understand what Cluster is, how Cluster sharding works, how clients locate data, failover, master election, which scenarios call for Cluster, and how to deploy a cluster.

Why Cluster is needed

65 Brother: Brother Ma, ever since I used the Sentinel cluster you told me about to get automatic failover, I can finally relax with my girlfriend without worrying about Redis crashing in the middle of the night.

But I recently encountered a troublesome problem. Redis needs to save 8 million key-value pairs, occupying 20 GB of memory.

I deployed it on a host with 32 GB of memory, but Redis was sometimes very slow to respond. I used the INFO command to check the latest_fork_usec metric (the duration of the most recent fork, in microseconds) and found that it was particularly high.

This is mainly caused by the Redis RDB persistence mechanism. Redis forks a child process to perform RDB persistence, and the time the fork takes grows with the amount of data held in Redis.

While the fork is executing, the main thread is blocked. With a large data set, the main thread is blocked for too long, so Redis appears slow to respond.

65 Brother: As the business grows, the amount of data keeps increasing. With a master-slave architecture, upgrading the hardware of a single instance is hard to keep scaling, and storing a large amount of data makes responses slow. Is there any way to solve this?

To store large amounts of data, besides using a host with more memory, we can also use a sharded cluster. As the saying goes, "when everyone gathers firewood, the flames rise high." If one machine cannot hold all the data, let multiple machines share it.

Using Redis Cluster mainly solves the various slowness problems caused by storing large amounts of data, and it also makes horizontal expansion easier.

These two approaches correspond to the two ways of scaling Redis as data grows: vertical expansion (scale up) and horizontal expansion (scale out).

Vertical expansion: upgrade the hardware configuration of a single Redis instance, for example by adding memory, adding disk capacity, or using a more powerful CPU.
Horizontal expansion: increase the number of Redis instances, with each node responsible for part of the data.

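For example, if you need server resources totaling 24 GB of memory and 150 GB of disk, there are two options: provision a single host of that size (vertical expansion), or spread the same capacity across several smaller instances (horizontal expansion).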


When the user base reaches millions or tens of millions, a horizontally scalable Redis sharded cluster is a very good choice.

65 Brother: What are the advantages and disadvantages of these two options?

Horizontal expansion makes it easy to scale without worrying about the hardware and cost limits of a single instance. However, a sharded cluster brings the distributed management problems of running multiple instances.
You have to solve how to distribute data sensibly across instances while still letting clients reach the data on the right instance.

What is Redis Cluster

Redis Cluster is a distributed database solution. The cluster manages data through sharding (a practice of "divide and conquer" thinking) and provides replication and failover.

The data is divided into 16384 slots, and each node is responsible for a subset of them. Every node stores the slot assignment information.

It is decentralized. As shown in the figure, the cluster consists of three Redis nodes, each responsible for part of the cluster's data. The amount of data each node holds may differ.

Redis Cluster architecture

The three nodes connect to one another to form a peer-to-peer cluster and exchange cluster information through the Gossip protocol. Eventually, every node holds the slot allocation of all the other nodes.

Opening Message

Technology is not omnipotent, and programmers are not the most powerful people around. Be clear about this and don't think "I am the best in the world." Once we fall into that mindset, it may hold back our growth.

Technology is to solve problems. If a technology cannot solve problems, then this technology is worthless.

Don't show off your skills, it's meaningless.

Cluster installation


A Redis cluster usually consists of multiple nodes. At the beginning, the nodes are independent of one another, each in a cluster that contains only itself. To build a truly working cluster, we must connect the independent nodes into a cluster containing multiple nodes.

The nodes are connected with the CLUSTER MEET command: CLUSTER MEET <ip> <port>.

Sending the CLUSTER MEET command to a node makes that node handshake with the node specified by ip and port. When the handshake succeeds, the node adds the node specified by ip and port to the cluster it currently belongs to.

CLUSTER MEET

It is as if the node said: "Hey, brother with ip = xx, port = xx, would you like to join the 'Code Byte' technology group? Just join the cluster. If you are a brother, come with me!"

For official details about Redis Cluster, please see: redis.io/topics/clus…

Cluster implementation principle

65 Brother: After data slicing, the data needs to be distributed on different instances. How to correspond between data and instances?

Starting from Redis 3.0, Redis officially provides the Redis Cluster solution for sharded clusters, which defines the rules for mapping data to instances. Redis Cluster uses hash slots (which I will simply call slots from here on) to handle the mapping between data and instances.

Divide the data into multiple parts and store them on different instances

The entire database of the cluster is divided into 16384 slots. Each key in the database belongs to one of these 16384 slots, and each node in the cluster can handle anywhere from 0 to 16384 slots.

The mapping process between Key and hash slot can be divided into two major steps:

Using the key of the key-value pair, compute a 16-bit value with the CRC16 algorithm;
Take the 16-bit value modulo 16384 to obtain a number from 0 to 16383, which is the hash slot for the key.
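
The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the Redis source: the function names are made up for this example, and the CRC16 variant used here (XMODEM, polynomial 0x1021, initial value 0) is the one named in the Redis Cluster specification. Hash tags are ignored in this sketch.

```python
TOTAL_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                      # feed the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and apply the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Step 1: CRC16 of the key bytes. Step 2: modulo 16384."""
    return crc16_xmodem(key.encode("utf-8")) % TOTAL_SLOTS


if __name__ == "__main__":
    for example_key in ("foo", "hello"):
        print(example_key, "->", key_slot(example_key))
```

On a running cluster, the result can be compared against the slot number reported by the CLUSTER KEYSLOT command for the same key.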

Cluster also lets users force a key into a specific slot. By embedding a hash tag in the key string (the part between { and }), the key is placed in the slot that the tag alone hashes to, so keys that share a tag land in the same slot.
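
As a rough sketch of the tag rule, the snippet below hashes only the text between the first { and the first } that follows it, and falls back to the whole key when there is no such pair or the braces are empty. The helper names and the example keys are illustrative; binascii.crc_hqx with an initial value of 0 is used as a stand-in for the CRC16 step.

```python
import binascii


def hashed_part(key: bytes) -> bytes:
    """Return the portion of the key that is fed to CRC16."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Slot of a key, taking an embedded hash tag into account."""
    return binascii.crc_hqx(hashed_part(key.encode("utf-8")), 0) % 16384


if __name__ == "__main__":
    # Both keys share the tag "user1000", so they map to the same slot.
    print(key_slot("{user1000}.following"), key_slot("{user1000}.followers"))
```

Keeping related keys in one slot this way is what makes multi-key operations on them possible in a cluster.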

Hash slot and Redis instance mapping

65 Brother: How is the hash slot mapped to the Redis instance?

When a cluster is created with redis-cli --cluster create, Redis automatically distributes the 16384 hash slots evenly across the instances: with N nodes, each node gets about 16384 / N hash slots.
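
To make the 16384 / N arithmetic concrete, here is a small sketch that splits the slot space into N contiguous ranges. It is one straightforward way to do the split and is not claimed to be the exact routine redis-cli runs; the function name is made up for this example.

```python
from typing import List, Tuple

TOTAL_SLOTS = 16384


def even_slot_ranges(node_count: int) -> List[Tuple[int, int]]:
    """Split slots 0..16383 into node_count contiguous, near-equal ranges."""
    ranges = []
    first = 0
    for i in range(node_count):
        # Cumulative boundary after node i, rounded to the nearest slot.
        last = round((i + 1) * TOTAL_SLOTS / node_count) - 1
        ranges.append((first, last))
        first = last + 1
    return ranges


if __name__ == "__main__":
    print(even_slot_ranges(3))  # [(0, 5460), (5461, 10922), (10923, 16383)]
```

For three nodes the ranges hold 5461, 5462, and 5461 slots, since 16384 is not divisible by 3.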

Alternatively, you can use the CLUSTER MEET command to connect the three nodes 7000, 7001, and 7002 into one cluster. The cluster is still offline at that point, however, because none of the three instances has been assigned any hash slots.

You can use the CLUSTER ADDSLOTS command to specify which hash slots each instance handles.

65 Brother: Why would you assign them manually?

Those who are able should do more. The Redis instances added to the cluster may have different hardware configurations. If they all carried the same load, the weaker machines would struggle, so let the more powerful machines take on more slots.

For a cluster of three instances, allocate hash slots to each instance with the following commands: instance 1 is responsible for hash slots 0~5460, instance 2 for hash slots 5461~10922, and instance 3 for hash slots 10923~16383.

# CLUSTER ADDSLOTS takes individual slot numbers; {a..b} is bash brace expansion
redis-cli -h 172.16.19.1 -p 6379 cluster addslots {0..5460}
redis-cli -h 172.16.19.2 -p 6379 cluster addslots {5461..10922}
redis-cli -h 172.16.19.3 -p 6379 cluster addslots {10923..16383}


The mapping relationship between key-value pair data, hash slots, and Redis instances is as follows:


Each key in the figure, such as "Code Byte", is hashed with CRC16 and then taken modulo the total number of hash slots, 16384; the results map the keys to instance 1 and instance 2 respectively.

Remember: the Redis cluster works normally only when all 16384 slots have been allocated.

Replication and Failover

65 Brother: How does the Redis cluster achieve high availability? Do the master and slave still split reads and writes?

The master handles the slots, and the slave nodes synchronize the master's data using the mechanism described in "Redis master-slave architecture data synchronization".

When the Master goes offline, the Slave continues to process requests on behalf of the master node. There is no read-write separation between the master and slave nodes, and the Slave is only used as a high-availability backup for Master failure.

Redis Cluster can set up several slave nodes for each master node. When a single master node fails, the cluster will automatically promote one of the slave nodes to the master node.

If a master node has no slave nodes, then when it fails, the cluster will be completely unavailable.

However, Redis also provides the cluster-require-full-coverage parameter: setting it to no allows the remaining nodes to keep serving requests when some nodes have failed.

For example, if the master node 7000 goes down, 7003, which is its slave, becomes the master node and continues to provide service. When the offline node 7000 comes back online, it becomes a slave of the current master, 7003.

Fault Detection

65 Brother: In "Redis High Availability: Sentinel Cluster Principle" I learned that Sentinel monitors the instances, automatically switches the master, and notifies clients, which gives automatic failover. How does Cluster implement automatic failover?

Just because one node thinks another node has lost contact does not mean that all nodes think so. Only when a majority of the nodes responsible for slots determine that a node is offline does the cluster consider that a master-slave switch is needed for that node.

Redis cluster nodes use the Gossip protocol to broadcast their own status and changes in their knowledge of the entire cluster. For example, if a node discovers that a certain node is lost (PFail), it will broadcast this information to the entire cluster, and other nodes can also receive this lost connection information.

About the Gossip protocol, you can read an article by Brother Wukong: "Virus invasion, all depends on distribution"

When a node has received lost-contact reports (PFail) about a certain node from a majority of the cluster, it marks that node as confirmed offline (Fail) and broadcasts this to the entire cluster, forcing the other nodes to accept that the node is offline as well, and a master-slave switch is performed for the lost node immediately.
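
The PFail to Fail promotion can be pictured as counting distinct reporters. The sketch below is a simplified model under stated assumptions: the class and node names are hypothetical, "majority" is taken as more than half of the slot-serving masters, and details a real node has to handle, such as old reports expiring, are left out.

```python
from typing import Dict, Set


class FailureTracker:
    """Counts which masters have reported a node as PFail."""

    def __init__(self, master_count: int) -> None:
        self.master_count = master_count
        self.reports: Dict[str, Set[str]] = {}

    def report_pfail(self, suspect: str, reporter: str) -> str:
        """Record one report and return the resulting state of the suspect."""
        reporters = self.reports.setdefault(suspect, set())
        reporters.add(reporter)               # repeated reports from one node count once
        needed = self.master_count // 2 + 1   # majority of the masters
        return "FAIL" if len(reporters) >= needed else "PFAIL"


if __name__ == "__main__":
    tracker = FailureTracker(master_count=3)
    print(tracker.report_pfail("node-7000", "node-7001"))
    print(tracker.report_pfail("node-7000", "node-7002"))
```

With three masters, the first report leaves the suspect in PFAIL and the second, from a different master, reaches the majority of two.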

Failover

When a Slave finds that its master node has entered the offline state, the slave node will begin to fail over the offline master node.

One of the offline master's slave nodes is selected to become the new master node.
The new master node revokes all slot assignments of the offline master node and assigns those slots to itself.
The new master node broadcasts a PONG message to the cluster. This PONG message lets the other nodes in the cluster know immediately that this node has changed from a slave to a master and has taken over the slots originally handled by the offline node.
The new master node begins to accept command requests for the slots it now handles, and the failover is complete.

Master election process

65 Brother: How is the new master node elected?

The cluster's configuration epoch is an auto-incrementing counter with an initial value of 0; it is incremented by 1 each time a failover is performed.
A slave node that detects that its master is offline broadcasts a CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST message to the cluster, asking every master node that receives the message and has voting rights to vote for it.
If a master node has not yet voted for another slave node, it returns a CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK message to the requesting slave, indicating that it supports this slave becoming the new master.
Each slave participating in the election collects the CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK messages it receives. If a slave collects at least N/2 + 1 supporting votes, where N is the number of masters with voting rights, it is elected as the new master.
If no slave collects enough votes within a configuration epoch, the cluster enters a new configuration epoch and holds the election again, until a new master is elected.
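
The voting rule for a single configuration epoch can be sketched as follows. This models only the counting: each master grants its vote to the first request it sees, and a candidate wins with at least N/2 + 1 votes (integer division). The function and node names are hypothetical, and message passing, timeouts, and epoch handling are omitted.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def elect(master_count: int, requests: List[Tuple[str, str]]) -> Optional[str]:
    """requests holds (voting_master, candidate_slave) pairs in arrival order."""
    granted: Dict[str, str] = {}
    for master, candidate in requests:
        # A master votes at most once per epoch: the first request wins.
        granted.setdefault(master, candidate)
    needed = master_count // 2 + 1
    for candidate, votes in Counter(granted.values()).items():
        if votes >= needed:
            return candidate
    return None  # nobody reached the threshold; a new epoch would be needed


if __name__ == "__main__":
    print(elect(3, [("m1", "s1"), ("m2", "s1"), ("m2", "s2")]))
    print(elect(3, [("m1", "s1"), ("m2", "s2")]))
```

In the second call the two surviving masters split their votes, so no candidate reaches two votes and the function returns None, which corresponds to starting another round in a new epoch.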

As with Sentinel, the election is modeled on the Raft algorithm. The process is shown in the figure:


Is it feasible to use a table to record the mapping between key-value pairs and instances?

65 Brother, let me test you: "The Redis Cluster solution allocates key-value pairs to different instances through hash slots. This requires a CRC calculation on the key and a modulo by the total number of hash slots before mapping to an instance. If a table were used to record the correspondence between key-value pairs and instances directly (for example, key-value pair 1 is on instance 2 and key-value pair 2 is on instance 1), there would be no need to compute the key's hash slot; a table lookup would do. Why doesn't Redis do this?"

With a global table, whenever the relationship between key-value pairs and instances changes (resharding, adding or removing instances), the table must be modified. With single-threaded access, every operation has to be serialized, and performance suffers.

With multi-threaded access, locking is involved. In addition, if the number of key-value pairs is very large, the storage space needed for the table that maps key-value pairs to instances is also very large.

With hash slot calculation, the relationship between hash slots and instances must still be recorded, but the number of hash slots is much smaller, only 16384, so the overhead is very small.

How does the client locate the instance where the data is located?

65 Brother: How does the client determine which instance the accessed data is distributed on?

The Redis instance will send its hash slot information to other instances in the cluster through the Gossip protocol, realizing the diffusion of hash slot allocation information.

In this way, each instance in the cluster has mapping relationship information between all hash slots and instances.

When slicing data, the key is calculated as a value through CRC16 and then modulo 16384 to obtain the corresponding Slot. This calculation task can be performed on the client when sending a request.

However, after locating the slot, you need to further locate the Redis instance where the Slot is located.

When the client connects to any instance, the instance responds to the client with the mapping relationship between the hash slot and the instance, and the client caches the hash slot and instance mapping information locally.

When the client makes a request, the hash slot corresponding to the key will be calculated, the hash slot instance mapping information in the local cache is used to locate the instance where the data is located, and then the request is sent to the corresponding instance.
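
A client-side slot cache along these lines might look like the sketch below. It assumes the mapping has already been fetched from some instance as a list of (first_slot, last_slot, address) ranges; the class name is made up, the addresses reuse the three example instances from earlier in the article, binascii.crc_hqx with an initial value of 0 stands in for the CRC16 step, and hash tags are ignored.

```python
import binascii
from typing import List, Optional, Tuple

TOTAL_SLOTS = 16384


class SlotCache:
    """Local copy of the slot-to-instance mapping kept by a client."""

    def __init__(self, ranges: List[Tuple[int, int, str]]) -> None:
        self.slot_to_node: List[Optional[str]] = [None] * TOTAL_SLOTS
        for first, last, address in ranges:
            for slot in range(first, last + 1):
                self.slot_to_node[slot] = address

    def node_for_key(self, key: str) -> Optional[str]:
        """Compute the key's slot locally, then look up the owning instance."""
        slot = binascii.crc_hqx(key.encode("utf-8"), 0) % TOTAL_SLOTS
        return self.slot_to_node[slot]


if __name__ == "__main__":
    cache = SlotCache([
        (0, 5460, "172.16.19.1:6379"),
        (5461, 10922, "172.16.19.2:6379"),
        (10923, 16383, "172.16.19.3:6379"),
    ])
    print(cache.node_for_key("foo"))
```

A flat list of 16384 entries keeps the lookup to a single index operation, at the cost of a little memory per client.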

How the Redis client locates the node that holds the data

Redistribute hash slots

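65 Brother: The mapping between hash slots and instances is redistributed when instances are added or the load is rebalanced. What should I do when the distribution changes?

The instances in the cluster exchange messages through the Gossip protocol to obtain the latest hash slot allocation, but the client cannot perceive these changes.

Redis Cluster provides a redirection mechanism: when the client sends a request to an instance that does not hold the corresponding data, that Redis instance tells the client to send the request to another instance.

65 Brother: How does Redis tell the client to redirect to the new instance?

There are two cases: the MOVED error and the ASK error.

MOVED error

MOVED error (load balancing; the data has already been migrated to another instance): when the client sends a key-value operation to an instance and the slot containing the key is not handled by that instance, the instance returns a MOVED error that directs the client to the node currently responsible for the slot.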

GET 公众号:码哥字节
(error) MOVED 16330 172.17.18.2:6379


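This response indicates that hash slot 16330, which contains the requested key-value pair, has been migrated to the instance 172.17.18.2, port 6379. The client therefore connects to 172.17.18.2:6379 and sends the GET request there.

At the same time, the client updates its local cache so that the mapping between this slot and the Redis instance is correct.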

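MOVED redirection

ASK error

65 Brother: What if a slot holds a lot of data, part of which has been migrated to the new instance while the rest has not?

If the requested key is found on the current node, the command is executed directly; otherwise an ASK error response is needed. When a slot's migration is only partially complete and the slot containing the requested key is being migrated from instance 1 to instance 2, instance 1 returns an ASK error to the client: the hash slot containing the requested key is being migrated to instance 2, so first send an ASKING command to instance 2 and then send the actual command.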

GET 公众号:码哥字节
(error) ASK 16330 172.17.18.2:6379


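For example, the client locates slot 16330 of key = "公众号:码哥字节" on instance 172.17.18.1. If node 1 can find the key, it executes the command directly; otherwise it responds with an ASK error and directs the client to the migration target node 172.17.18.2.

ASK error

Note: an ASK error does not update the hash slot allocation cached by the client.

So when the client requests data in slot 16330 again, it still sends the request to instance 172.17.18.1 first; the node simply responds with an ASK error telling the client to send that one request to the new instance.

A MOVED error, by contrast, updates the client's local cache so that subsequent commands for that slot go to the new instance.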
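
The difference between the two redirections can be sketched as a small decision function on the client side. This is an illustration only: the function name and the dictionary-based cache are assumptions for this example, and no network calls are made; the function just returns where to retry and which commands to send.

```python
from typing import Dict, List, Tuple


def plan_retry(error: str, slot_cache: Dict[int, str],
               command: List[str]) -> Tuple[str, List[List[str]]]:
    """Return (target_address, commands_to_send) for a redirection error."""
    kind, slot, target = error.split()
    if kind == "MOVED":
        # Permanent move: remember the new owner for all later requests.
        slot_cache[int(slot)] = target
        return target, [command]
    if kind == "ASK":
        # One-off redirect during migration: the cache is left untouched,
        # and ASKING must be sent to the target before the command itself.
        return target, [["ASKING"], command]
    raise ValueError("not a redirection error: " + error)


if __name__ == "__main__":
    cache = {16330: "172.17.18.1:6379"}
    print(plan_retry("ASK 16330 172.17.18.2:6379", cache, ["GET", "some-key"]))
    print(cache)  # still points at 172.17.18.1:6379
    print(plan_retry("MOVED 16330 172.17.18.2:6379", cache, ["GET", "some-key"]))
    print(cache)  # now points at 172.17.18.2:6379
```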

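How large can a cluster be?

65 Brother: With Redis Cluster I no longer have to fear large data volumes. Can I scale out horizontally without limit?

The answer is no. The upper limit the Redis project gives for Redis Cluster is 1000 instances.

65 Brother: What exactly limits the cluster size?

The key is the communication overhead between instances. Every instance in the cluster stores the mapping between all hash slots and instances (the slot-to-node table) as well as its own state information.

Instances propagate node data to one another through the Gossip protocol, which works roughly as follows:

Each instance picks some instances from the cluster at random and sends them PING messages at a certain frequency, to check their state and exchange information. A PING message carries the sender's own state, the state of some other instances, and the slot-to-instance mapping table.
After receiving a PING message, an instance replies with a PONG message that contains the same kind of information as the PING.

Through the Gossip protocol, after a period of time every instance obtains the state of all the other instances.

So when a new node joins, a node fails, or the slot mapping changes, the exchange of PING and PONG messages synchronizes the cluster state across every instance.

Gossip messages

The messages sent are made up of clusterMsgDataGossip structs: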

typedef struct {
    char nodename[CLUSTER_NAMELEN];  // 40 bytes
    uint32_t ping_sent;              // 4 bytes
    uint32_t pong_received;          // 4 bytes
    char ip[NET_IP_STR_LEN];         // 46 bytes
    uint16_t port;                   // 2 bytes
    uint16_t cport;                  // 2 bytes
    uint16_t flags;                  // 2 bytes
    uint32_t notused1;               // 4 bytes
} clusterMsgDataGossip;


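So each Gossip entry an instance sends takes 104 bytes. A PING carries entries for roughly one tenth of the nodes the sender knows about, so in a cluster of 1000 instances each PING message takes about 10 KB (100 × 104 bytes).

In addition, when instances propagate the slot mapping table, each message also contains a bitmap 16384 bits long.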

Each bit corresponds to a slot; a value of 1 means the slot belongs to the sending instance. The bitmap takes 2 KB (16384 / 8 = 2048 bytes), so a PING message is about 12 KB.

A PONG has the same content as a PING, so one send plus its reply adds up to about 24 KB. As the cluster grows, heartbeat messages take up more and more of the cluster's network bandwidth and reduce the cluster's throughput.
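
The size estimate can be reproduced with a few lines of arithmetic. The sketch below assumes, as the 10 KB figure for 1000 instances implies, that one PING carries gossip entries for about a tenth of the cluster; it counts only the gossip entries and the slot bitmap and ignores the other header fields, so treat the result as a rough lower bound. The function name is made up for this example.

```python
import struct

# Field sizes of clusterMsgDataGossip, packed without padding:
# 40s nodename, I ping_sent, I pong_received, 46s ip,
# H port, H cport, H flags, I notused1
GOSSIP_ENTRY_BYTES = struct.calcsize("<40sII46sHHHI")
SLOT_BITMAP_BYTES = 16384 // 8  # one bit per slot


def ping_size_bytes(cluster_size: int) -> int:
    """Approximate PING size: gossip entries for ~1/10 of the nodes plus the bitmap."""
    gossip_entries = cluster_size // 10
    return gossip_entries * GOSSIP_ENTRY_BYTES + SLOT_BITMAP_BYTES


if __name__ == "__main__":
    one_way = ping_size_bytes(1000)
    print(GOSSIP_ENTRY_BYTES, one_way, 2 * one_way)
```

For 1000 instances this gives 12448 bytes for one PING and 24896 bytes for a PING plus its PONG, which matches the roughly 12 KB and 24 KB quoted above.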

Instance communication frequency

65 Brother: Brother Ma, the frequency of sending PING messages will also affect the cluster bandwidth, right?

After a Redis Cluster instance starts, by default it randomly selects 5 instances from its local instance list every second, finds the one among them that has gone the longest without receiving a PING, and sends a PING message to that instance.

65 Brother: It picks 5 at random, but there is no guarantee that the chosen instance is the one that has gone longest without PING communication in the whole cluster. Some instances may never receive a message, so the cluster information they hold could be long out of date. What then?

That is a good question. A Redis Cluster instance scans its local instance list every 100 ms. When it finds an instance whose last PONG was received more than cluster-node-timeout / 2 ago, it immediately sends that instance a PING to refresh the cluster state information for that node.

As the cluster grows, network latency between instances increases further, which may cause even more PING messages to be sent.
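
The two selection rules can be sketched as two small functions. This is a simplified model under stated assumptions: the function names are hypothetical, each node's "staleness" is tracked as the timestamp of the last PONG received from it, and times are plain numbers in seconds.

```python
import random
from typing import Dict, List


def pick_ping_target(last_pong: Dict[str, float], sample_size: int = 5) -> str:
    """Once per second: sample a few nodes at random, ping the stalest of them."""
    nodes = sorted(last_pong)
    candidates = random.sample(nodes, min(sample_size, len(nodes)))
    return min(candidates, key=lambda node: last_pong[node])


def overdue_nodes(last_pong: Dict[str, float], now: float,
                  node_timeout: float) -> List[str]:
    """Every 100 ms: nodes whose last PONG is older than node_timeout / 2."""
    return [node for node, seen in last_pong.items()
            if now - seen > node_timeout / 2]


if __name__ == "__main__":
    last_pong = {"node-a": 10.0, "node-b": 2.0, "node-c": 9.0}
    print(pick_ping_target(last_pong))
    print(overdue_nodes(last_pong, now=12.0, node_timeout=15.0))
```

The random sample keeps per-second traffic bounded, while the 100 ms scan acts as a safety net for nodes the sampling keeps missing.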

Reduce the communication overhead between instances

Each instance sends a PING message every second. Lowering this frequency may prevent the state of each instance from spreading through the cluster in time.
Every 100 ms each instance checks whether PONG reception has exceeded cluster-node-timeout / 2. This is the default frequency of the Redis instance's periodic check task, and we should not change it lightly.

So the only thing left to adjust is cluster-node-timeout, the heartbeat timeout the cluster uses to decide whether an instance has failed. The default is 15 seconds.

To keep heartbeat messages from taking up too much cluster bandwidth, raise cluster-node-timeout to 20 or 30 seconds, which eases the PONG reception timeouts.

However, it must not be set too large. Otherwise, when an instance really fails, the failure is only detected after cluster-node-timeout has elapsed, which affects the cluster's normal service.

Summary

The Sentinel cluster implements automatic failover, but when the amount of data is too large, generating an RDB takes too long. The main thread is blocked while the fork executes, and with a large data set it is blocked for too long, so Redis responds slowly.
Using Redis Cluster mainly solves the various slowness problems caused by storing large amounts of data, and it also makes horizontal expansion easier. When facing millions or tens of millions of users, a horizontally scalable Redis sharded cluster is a very good choice.
The entire database of the cluster is divided into 16384 slots. Each key in the database belongs to one of these 16384 slots, and each node in the cluster can handle anywhere from 0 to 16384 slots.
Redis cluster nodes use the Gossip protocol to broadcast their own status and changes in their knowledge of the entire cluster.
After the client connects to any instance in the cluster, the instance sends the hash slot to instance mapping to the client, and the client caches it and uses it to route each key to the corresponding node.
The cluster cannot grow without limit. Because the cluster propagates instance information through the Gossip protocol, the communication frequency is the main factor limiting cluster size; it can mainly be tuned by modifying cluster-node-timeout.
