What are the strange data types and cluster knowledge of redis?

Various data types

The string type is simple and convenient, and it supports space preallocation: when a string grows, redis allocates more space than is strictly needed, so that the next time the string gets longer there is no need to request additional memory. Of course, this only helps if the spare space is large enough.

The list type can implement a simple message queue, but note that messages may be lost, because it has no ACK mechanism.

The hash type is a bit like a row in a relational database. As a hash grows larger and larger, avoid commands such as HGETALL: requesting a large amount of data at once blocks redis, and every request queued behind it has to wait.

The set type can help with statistics. For example, to count the active users on a given day, just add each user ID to a set. Sets support some handy operations: SDIFF returns the difference between sets and SUNION returns their union. These features come at a price, though. They consume CPU and IO resources and may block redis, so do not run them casually on large sets.

zset, the sorted set, is arguably the brightest star. Because its members are kept in order, it fits many scenarios, such as the top N users by likes, delayed queues, and so on.

The advantage of bitmap is that it saves space, especially for statistics such as counting how many users signed in on a given day and checking whether a particular user signed in. Without bitmap, you might think of using a set:

SADD day 1234 // a sign-in adds the user to the set
SISMEMBER day 1234 // check whether 1234 has signed in
SCARD day // how many users have signed in

A set meets the functional requirement, but it uses more storage than a bitmap. Under the hood a set is stored as either an intset (integer set) or a hashtable. The intset is only used when the set is small, by default no more than 512 elements, and every element is an integer. An intset is compact and contiguous in memory; lookups use binary search, so the time complexity is O(logN). The hashtable is different. It is the same structure that backs the hash type among redis's five basic data types, except that there is no value: the value pointer is NULL. Members of a set are unique, so there are no duplicate keys, but rehashing still has to be considered. That is a digression, though. Back to the sign-in problem: when there are many users, the set will certainly be stored as a hashtable, and in a hashtable every element is a dictEntry structure:

typedef struct dictEntry {
    // key
    void *key;
    // value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    // points to the next hash table entry, forming a linked list
    struct dictEntry *next;
} dictEntry;

What does this structure tell us? Even though the value union (there is no value) and next (no chained entry) are empty, the structure itself still takes up space, and so does the key. That memory cost is real. With a bitmap, a single bit is enough to represent one user ID, which saves a lot of space. Here is how to set and count bits in a bitmap:

SETBIT day 1234 1 // sign in
GETBIT day 1234 // check whether 1234 has signed in
BITCOUNT day // how many users have signed in
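
To make the one-bit-per-user idea concrete, here is a minimal Python sketch that mimics SETBIT, GETBIT and BITCOUNT on a plain bytearray. It is only an illustration, not redis code, and the class name TinyBitmap is made up for this example. Like redis, it treats offset 0 as the most significant bit of the first byte.

```python
class TinyBitmap:
    """Toy stand-in for a redis string used as a bitmap (hypothetical helper)."""

    def __init__(self):
        self.data = bytearray()

    def setbit(self, offset, value):
        """Set the bit at offset to 0 or 1 and return the previous bit."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            # Grow with zero bytes so that unset offsets read as 0.
            self.data.extend(bytes(byte_index + 1 - len(self.data)))
        mask = 0x80 >> bit_index  # offset 0 is the most significant bit
        previous = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, offset):
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0
        return 1 if self.data[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self):
        return sum(bin(b).count("1") for b in self.data)


day = TinyBitmap()
day.setbit(1234, 1)   # user 1234 signs in
day.setbit(99, 1)     # user 99 signs in
print(day.getbit(1234), day.getbit(1235), day.bitcount(), len(day.data))
# prints: 1 0 2 155
```

The 155 bytes cover every user ID up to 1239, whereas a hashtable-backed set would need a dictEntry plus a key for each member.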

bf is the Bloom filter provided by RedisBloom, which is supported from redis 4.0 onward; the module has to be loaded separately. We could also implement our own Bloom filter on top of the bitmap described above, but since redis already supports one, RedisBloom saves development time. I won't go into what a Bloom filter does here; let's look at how RedisBloom is used.

# quickly pull the image with docker to try it out
docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
docker exec -it redis-redisbloom bash
redis-cli
# related operations
bf.reserve sign 0.001 10000
bf.add sign 99 // add user 99
bf.exists sign 99 // check whether user 99 exists

Because a Bloom filter can produce false positives, bf lets you set a custom error rate. Here 0.001 is the false positive rate and 10000 is the number of elements the filter is sized to hold. When the number of elements actually stored exceeds this value, the false positive rate goes up.
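
The relationship between the error rate, the capacity and the memory used can be illustrated with a small self-written Bloom filter. The sketch below is not RedisBloom's implementation: the class name SimpleBloom, the SHA-256 based double hashing, and the textbook sizing formulas (m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions) are assumptions chosen for illustration, and RedisBloom may size its filters differently.

```python
import hashlib
import math


class SimpleBloom:
    """Toy Bloom filter (hypothetical helper, not RedisBloom's implementation)."""

    def __init__(self, error_rate, capacity):
        # Textbook sizing: m bits and k hash functions for the target error rate.
        self.m = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two halves of one SHA-256 digest
        # (double hashing); the hash choice is an assumption of this sketch.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


sign = SimpleBloom(0.001, 10000)   # mirrors: bf.reserve sign 0.001 10000
sign.add(99)                       # mirrors: bf.add sign 99
print(sign.exists(99))             # True: an added item is never reported missing
print(sign.m, sign.k)              # bits and hash functions chosen by the formulas
```

An item that was added is always reported as present; an item that was never added is usually reported as absent, but occasionally a false positive slips through, and more often once the filter holds more elements than it was sized for.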

HyperLogLog is used for statistics. Its advantage is a very small memory footprint: it needs only 12KB to count up to 2^64 elements. What does it count? Mainly cardinality, such as UV (unique visitors). Functionally, UV could be stored in a set or a hash, but that consumes storage and easily turns into a big key. A bitmap saves space too, but a 12KB bitmap can only track 12*1024*8=98304 elements, while HyperLogLog can count 2^64. Such a powerful technique is not exact, however: HyperLogLog counts probabilistically, with a standard error of 0.81%. For massive data sets where accuracy requirements are not strict, HyperLogLog is a very good way to save space.

PFADD uv 1 2 3 // 1, 2 and 3 are active users
PFCOUNT uv // count
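
For readers wondering where the 12KB and 0.81% figures come from, the following short Python calculation reproduces them. It relies on two facts about redis's HyperLogLog: it uses 16384 registers of 6 bits each, and the standard error of the algorithm is roughly 1.04 divided by the square root of the number of registers. The variable names are only for this example.

```python
import math

REGISTERS = 16384        # number of registers in redis's HyperLogLog
BITS_PER_REGISTER = 6    # each register stores a small counter

hll_bytes = REGISTERS * BITS_PER_REGISTER // 8
std_error = 1.04 / math.sqrt(REGISTERS)
bitmap_capacity = 12 * 1024 * 8   # one bit per element in a 12KB bitmap

print(hll_bytes)             # 12288 bytes, i.e. 12KB
print(f"{std_error:.2%}")    # 0.81%
print(bitmap_capacity)       # 98304
```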

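GEO is used for location-based features, such as WeChat's People Nearby or finding nearby vehicles. First, consider how you would find people near you without the GEO data structure. You would have to report your own location, for example longitude 116.397128 and latitude 39.916527, which could be stored in a string or a hash. But string and hash are no help when it comes to searching for nearby people: you cannot scan all the data every time, because that is far too slow. Nor can you use the longitude and latitude directly as the score of a zset. However, if the longitude and latitude could somehow be converted into a single number and used as the score, then zrangebyscore key v1 v2 would be enough to find nearby people.

Does it really have to be that much trouble? This is where GEO comes in. GEO converts a longitude and latitude into a number by "bisecting the interval and encoding each interval". What does that mean? Take longitude, whose range is [-180,180]. A 3-bit code requires 3 bisections, where landing in the left half is recorded as 0 and landing in the right half as 1. For longitude 121.48941, the first bisection puts it in [0,180], so record 1; the second puts it in [90,180], so record 1 again; the third puts it in [90,135], so record 0. Latitude follows the same logic. Suppose the latitude encodes to 010. Finally the two codes are merged, with the longitude bits in the even positions and the latitude bits in the odd positions (counting from 0 at the left).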

1 1 0   // longitude
 0 1 0  // latitude
------------
101100 // value for the longitude and latitude
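
The bisect-and-interleave procedure can be sketched in a few lines of Python. The helper names encode_axis and interleave are invented for this example, and the latitude code 010 is simply the value assumed in the text; redis itself uses many more bits per coordinate than the 3 used here.

```python
def encode_axis(value, low, high, bits):
    """Bisect [low, high] `bits` times; 0 means the left half, 1 the right half."""
    code = []
    for _ in range(bits):
        mid = (low + high) / 2
        if value >= mid:
            code.append(1)
            low = mid
        else:
            code.append(0)
            high = mid
    return code


def interleave(lon_bits, lat_bits):
    """Merge the two codes, longitude first: lon, lat, lon, lat, ..."""
    merged = []
    for lon_bit, lat_bit in zip(lon_bits, lat_bits):
        merged += [lon_bit, lat_bit]
    return "".join(str(bit) for bit in merged)


lon_code = encode_axis(121.48941, -180, 180, 3)
print(lon_code)                          # [1, 1, 0]
print(interleave(lon_code, [0, 1, 0]))   # 101100
```

Because nearby points share long common prefixes in this code, storing the resulting number as a zset score turns a proximity search into a range query.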

That is the principle. Now let's see how to use GEO in redis:

GEOADD location 112.123456 41.112345 99 // report the location of user 99
GEORADIUS location 112.123456 41.112345 1 km ASC COUNT 10 // get the people within 1 km

Understanding clusters

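Few production environments run a single redis instance. The risks of a single instance are:

A single point of failure means the whole service fails; there is no backup
A single instance is under heavy load, since it has to serve both reads and writes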

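So the first thing that comes to mind is the classic master-slave mode, usually with one master and several slaves. Most applications read far more than they write, so the master handles updates and the slaves serve reads. Even if the master goes down, one of the slaves can be chosen to take over as master, and the application as a whole can keep serving requests.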

Details of the replication process

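When a redis instance becomes the slave of a master for the first time, the master has to send it its data, in the form of an rdb file. To do this, the master forks a child process, which runs bgsave to save the current data and get it ready to send to the new slave. In essence, bgsave reads the data currently in memory and writes it to an rdb file. This involves a lot of IO, and doing it directly in the main process would very likely block normal requests, so using a child process is a wise choice.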

What happens if new write requests arrive while the forked child process is running bgsave?

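Strictly speaking, the data to be saved is the snapshot as of the moment the child process is created. So is the memory simply copied at that moment? And if it is not copied, what happens to changes made in the meantime? This is where the copy-on-write (COW) mechanism comes in. Memory looks like one contiguous space, but that would be hard to manage, so the operating system divides it into small blocks; this is paged memory management, and a page is typically 4K, 8K or 16K. redis's data is spread across these pages. For efficiency, the forked child process shares the same memory as the main process, and nothing is copied. If the main process changes data during this time, the quickest way to keep the two views apart is to copy the affected page; the main process then applies its change to the new copy and leaves the original page untouched. This guarantees that the child process still sees the snapshot as it was.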

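The changes above were discussed from the point of view of the snapshot. From the point of view of data consistency, how are the changes made in the meantime delivered to the slave once it has applied the rdb snapshot? The answer is a buffer, called the replication buffer. Once the master receives the synchronization request, it stores all changes made during this period in the buffer. After sending the rdb to the slave, it sends the contents of the replication buffer as well, and the master and slave end up consistent.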

The replication buffer is not a cure-all

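Let's look at how long the replication buffer keeps being written to.

During master-slave synchronization, the master forks a child process to do the work, so changes are written to the replication buffer from the moment the child starts bgsave until it finishes.
Once the rdb is generated, it has to be sent to the slave. The network transfer also takes time, and changes keep being written to the replication buffer during it.
After receiving the rdb, the slave has to load it into memory. The slave is blocked and cannot serve requests while it does so, so changes are written to the replication buffer during this period as well.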

Since the replication buffer is a buffer, its size is limited. If any of the three steps above takes a long time, the replication buffer grows quickly (assuming normal write traffic). When it exceeds its limit, the connection between the master and the slave is dropped. When the slave reconnects, replication starts over, and the same lengthy steps are repeated. The size of the replication buffer is therefore critical, and it generally has to be chosen by weighing factors such as the write rate, the volume written per second, and the network transfer speed.
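
As a rough way to reason about that size, the buffer has to absorb everything written during the three phases combined. The Python snippet below works through one such estimate. Every number in it is a made-up example rather than a measurement or a recommendation, and the variable names are only for illustration. In redis the limit itself is set with the client-output-buffer-limit configuration directive.

```python
# All figures below are hypothetical examples; measure your own workload.
write_rate_mb_per_s = 2   # replication traffic generated by writes
bgsave_seconds = 30       # time for the child process to produce the rdb
transfer_seconds = 20     # time to ship the rdb over the network
load_seconds = 25         # time for the slave to load the rdb into memory

window_seconds = bgsave_seconds + transfer_seconds + load_seconds
required_mb = write_rate_mb_per_s * window_seconds

print(window_seconds)   # 75
print(required_mb)      # 150
```

With these example figures the buffer limit would need to be comfortably above 150MB, otherwise the full synchronization would be interrupted and restarted.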

What if the slave's network is unreliable and it gets disconnected from the master?

Normally, once the connection between master and slave is established, subsequent changes on the master can be sent straight to the slave, which replays them. But we cannot guarantee that the network is 100% reliable, so disconnections between the slave and the master also have to be considered.

Before redis 2.8, whenever a slave was disconnected, even briefly, the master would blindly perform a full synchronization when the slave reconnected. From version 2.8 onward, incremental replication is supported. Incremental replication relies on a buffer that records changes, called repl_backlog_buffer. Logically it is a ring buffer: when it is full, writing wraps around and overwrites from the beginning, so it too has a size limit. When the slave reconnects, it tells the master: "I have replicated up to offset xx." The master then checks whether the data at offset xx is still in repl_backlog_buffer. If it is, the master sends the slave everything after xx. If it is not, there is no alternative but to perform a full synchronization again.
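
The decision between incremental and full synchronization can be modeled with a toy ring buffer. The Python sketch below is a simplified illustration, not the redis source: the class ReplBacklog and its method names are hypothetical, and real redis tracks additional state such as the replication ID.

```python
class ReplBacklog:
    """Toy model of repl_backlog_buffer (hypothetical, not redis source)."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray(size)
        self.master_offset = 0   # total number of bytes ever written

    def feed(self, data):
        for byte in data:
            # Writing wraps around and overwrites the oldest bytes.
            self.buf[self.master_offset % self.size] = byte
            self.master_offset += 1

    def try_partial_sync(self, slave_offset):
        """Return the bytes the slave is missing, or None if a full sync is needed."""
        oldest = max(0, self.master_offset - self.size)
        if slave_offset < oldest or slave_offset > self.master_offset:
            return None
        return bytes(self.buf[i % self.size]
                     for i in range(slave_offset, self.master_offset))


backlog = ReplBacklog(size=8)
backlog.feed(b"abcdef")               # offsets 0..5
print(backlog.try_partial_sync(4))    # b'ef'
backlog.feed(b"ghijkl")               # offsets 6..11, overwrites the oldest bytes
print(backlog.try_partial_sync(2))    # None: offset 2 has been overwritten
print(backlog.try_partial_sync(6))    # b'ghijkl'
```

The larger the backlog, the longer a slave can stay disconnected and still catch up incrementally.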

A manager is required

In master-slave mode, if the master goes down we can promote a slave to master, but this process is manual. Relying on people cannot keep losses to a minimum, so an automatic management and election mechanism is needed. This is Sentinel. Sentinel is itself a service, but it does not handle reads and writes of data; it is only responsible for managing all the redis instances. A sentinel communicates with every redis instance at regular intervals (a ping), and an instance shows that it is alive simply by responding within the specified time. Of course, a sentinel itself may go down or lose its network connection, so sentinels are usually deployed as a cluster as well. An odd number of sentinels, such as 3 or 5, is best; the odd number is mainly for elections (the minority yields to the majority).

When a sentinel sends a ping and does not receive a pong in time, it marks that redis instance as offline. At this point the instance is not yet considered truly offline; the other sentinels also judge whether the instance is really offline. When a majority of sentinels agree that the instance is offline, it is removed from the cluster. If it is a slave, that is all: it is simply removed. If it is the master, an election has to be triggered. The election is not arbitrary; it selects the most suitable slave to become the new master. The most suitable slave is generally determined by the following criteria, in priority order:

Weight. Each slave can be assigned a weight, and the slave with the highest priority is preferred. In redis this is the slave-priority (replica-priority) setting, where a lower number means a higher priority and 0 means the slave is never promoted.
Replication progress. Each slave may have replicated a different amount of data; the one with the smallest gap from the master is preferred.
Service ID. Every redis instance has its own ID. If the criteria above are equal, the slave with the smallest ID is chosen as the new master.
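
The three criteria amount to a lexicographic comparison, which the following Python sketch makes explicit. The Slave class and pick_new_master function are hypothetical names for this example, not Sentinel's actual code. One detail worth stating: in redis the weight is a number (the replica-priority setting, formerly slave-priority) where a lower value means a higher priority, and 0 means the instance must never be promoted.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    run_id: str
    priority: int      # lower wins; 0 means never promote
    repl_offset: int   # how much of the master's stream has been replicated


def pick_new_master(slaves):
    """Apply the three criteria in order; return None if nobody is eligible."""
    candidates = [s for s in slaves if s.priority > 0]
    if not candidates:
        return None
    # Lowest priority number, then largest offset, then smallest ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


slaves = [
    Slave("c3", priority=100, repl_offset=900),
    Slave("a1", priority=100, repl_offset=1000),
    Slave("b2", priority=100, repl_offset=1000),
    Slave("d4", priority=0, repl_offset=2000),
]
print(pick_new_master(slaves).run_id)   # a1
```

Here d4 has replicated the most data but is excluded by its priority of 0, and a1 beats b2 only on the ID tie-break.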

Stronger horizontal scalability

The master-slave mode solves the single point of failure, and read-write separation lets the application handle more load. The sentinel mode supervises the cluster automatically, elects a new master automatically, and removes faulty nodes automatically.

Normally, when read pressure increases, we can add slaves to relieve it. But what if the pressure on the master is very high? This brings us to sharding: we split the master's data into several pieces and deploy them on different machines. In redis, sharding is based on the concept of slots. redis divides the key space into slots numbered 0~16383, 16384 slots in total, and these slots are distributed evenly across the shard nodes to balance the load. Which slot does a key belong to? redis first applies CRC16 to the key to get a 16-bit number, then takes that number modulo 16384:

crc16(key)%16384
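
The slot calculation can be tried out in Python with the standard library. The sketch below assumes that binascii.crc_hqx with an initial value of 0 matches the CRC16 variant (XMODEM) that redis cluster uses; the function name key_slot is invented for this example, and it ignores hash tags (the {...} syntax that real redis also honors).

```python
import binascii


def key_slot(key):
    """Slot for a key: CRC16 of the key modulo 16384. Hash tags are ignored here."""
    return binascii.crc_hqx(key.encode(), 0) % 16384


print(key_slot("123456789"))   # 12739, because CRC16 of this string is 0x31C3
print(key_slot("user:1000"))   # some slot between 0 and 16383
```

On a live cluster the result can be compared against the CLUSTER KEYSLOT command.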

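The client caches the slot information, so that for each key it only needs to compute the slot to know which instance should handle it. The cached slot information is not fixed, however. When an instance is added, for example, the slots are re-sharded and the information cached by the client becomes inaccurate. Two errors commonly appear in this situation; strictly speaking they are not errors but more like notifications. One is MOVED and the other is ASK. MOVED means that data previously handled by instance A has been migrated to instance B, and that the migration is complete. ASK means that the migration is still in progress: part of the data previously handled by instance A has moved to instance B, and the rest is still waiting to be migrated. Once the migration finishes, ASK becomes MOVED. When the client receives a MOVED message it updates its local cache, so the next request does not run into these two errors again.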
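
How a client might react to the two redirects can be sketched as follows. This is a simplified Python illustration rather than the code of any real client library: the SlotCache class and handle_redirect method are hypothetical, and the node addresses are made-up examples. A real client also sends the ASKING command to the target node before retrying after an ASK.

```python
class SlotCache:
    """Toy client-side slot table (hypothetical, not a real client library)."""

    def __init__(self):
        self.slot_to_node = {}

    def handle_redirect(self, error_message):
        """Parse 'MOVED <slot> <host:port>' or 'ASK <slot> <host:port>'.

        Returns the node to retry against. Only MOVED updates the cache,
        because ASK means the slot is still being migrated.
        """
        kind, slot, node = error_message.split()
        if kind == "MOVED":
            self.slot_to_node[int(slot)] = node
        elif kind != "ASK":
            raise ValueError(f"not a redirect: {error_message}")
        return node


cache = SlotCache()
cache.slot_to_node[3999] = "127.0.0.1:6380"
print(cache.handle_redirect("ASK 3999 127.0.0.1:6381"))     # 127.0.0.1:6381
print(cache.slot_to_node[3999])                             # 127.0.0.1:6380, unchanged
print(cache.handle_redirect("MOVED 3999 127.0.0.1:6381"))   # 127.0.0.1:6381
print(cache.slot_to_node[3999])                             # 127.0.0.1:6381
```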
