This article walks you through 15 pitfalls you may encounter when using Redis. I hope it will be helpful to everyone.
Hello everyone, my name is Kaito.
In this article, I want to talk to you about the "pits" you may encounter when using Redis.
If you have encountered the following "weird" scenarios when using Redis, then there is a high probability that you have stepped into a "pit":
A key clearly has an expiration time set, so why does it never expire?
How can the O(1)-complexity SETBIT command cause Redis to be OOM-killed?
Can executing RANDOMKEY to pick a random key block Redis?
With the same command, why can the data be found on the slave but not on the master?
Why does the slave use more memory than the master?
Why is data written to Redis inexplicably lost?
...
What are the reasons for these problems?
In this article, I will review with you the pitfalls you may encounter when using Redis and how to avoid them.
I divided these questions into three parts:
What are the pitfalls of common commands?
What are the pitfalls of data persistence?
What are the pitfalls of master-slave database synchronization?
The causes behind these problems may well "overturn" your assumptions. If you are ready, follow my train of thought and let's begin!
This article has a lot of useful information, I hope you can read it patiently.
First, let's look at some common commands that can produce "unexpected" results.
1) The expiration time was accidentally lost?
When you use Redis, you must often use the SET command. It is very simple.
In addition to setting the key-value, SET can also set the expiration time of the key, as follows:
127.0.0.1:6379> SET testkey val1 EX 60
OK
127.0.0.1:6379> TTL testkey
(integer) 59
Later, if you want to modify the key's value but simply use a plain SET command without the "expiration time" parameter, the key's expiration time will be "erased".

127.0.0.1:6379> SET testkey val2
OK
127.0.0.1:6379> TTL testkey  // the key never expires!
(integer) -1

See that? testkey now never expires!
If you have just started using Redis, you have almost certainly stepped into this pit.
The reason for this problem is: If the expiration time is not set in the SET command, Redis will automatically "erase" the expiration time of the key.
If you find that the memory of Redis continues to grow, and many keys originally had expiration times set, but later find that the expiration time has been lost, it is most likely due to this reason.
At this time, there will be a large number of non-expired keys in your Redis, consuming too many memory resources.
So, if you set an expiration time with the SET command in the first place, then whenever you modify the key later, you must pass the expiration parameter again to avoid losing the expiration time.
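To make this behavior concrete, here is a minimal in-memory model of the SET/TTL semantics described above (a Python sketch, not real Redis; MiniRedis and its methods are invented for illustration):

```python
import time

class MiniRedis:
    """Tiny in-memory model of Redis SET/TTL semantics (illustration only)."""
    def __init__(self):
        self.data = {}       # key -> value
        self.expire_at = {}  # key -> absolute expiry timestamp

    def set(self, key, value, ex=None):
        self.data[key] = value
        if ex is not None:
            self.expire_at[key] = time.time() + ex
        else:
            # a plain SET erases any previous expiration, like real Redis
            self.expire_at.pop(key, None)

    def ttl(self, key):
        if key not in self.data:
            return -2
        if key not in self.expire_at:
            return -1  # key exists but never expires
        return max(0, round(self.expire_at[key] - time.time()))

r = MiniRedis()
r.set("testkey", "val1", ex=60)
print(r.ttl("testkey"))   # about 60
r.set("testkey", "val2")  # no EX: the TTL is erased
print(r.ttl("testkey"))   # -1
```

Note that since Redis 6.0, SET also accepts a KEEPTTL option that preserves the existing expiration; on older versions, always pass the expiration parameter again yourself.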
2) DEL can also block Redis?
To delete a key, you will certainly use the DEL command. But have you ever thought about its time complexity?
O(1)? Not necessarily.
If you read the official documentation of Redis carefully, you will find: The time it takes to delete a key is related to the type of the key.
Redis official documentation describes the DEL command as follows:
key is String type, and the DEL time complexity is O(1)
key is List/Hash/Set/ZSet type, DEL time complexity is O(M), M is the number of elements
In other words, if you want to delete a non-String type key, the more elements the key has, the longer it will take to execute DEL!
Why is this?
The reason is that when deleting this kind of key, Redis needs to release the memory of each element in turn. The more elements there are, the more time-consuming this process will be.
Such a long operation will inevitably block the entire Redis instance and affect the performance of Redis.
So, when deleting List/Hash/Set/ZSet keys, you must pay special attention: do not run DEL without thinking. Instead, delete them like this:

Query the number of elements: run LLEN/HLEN/SCARD/ZCARD
Judge the number of elements: if the count is small, run DEL directly; otherwise delete in batches
Delete in batches: run LRANGE/HSCAN/SSCAN/ZSCAN together with LPOP/RPOP/HDEL/SREM/ZREM to remove elements gradually
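The three steps above can be sketched for a List key like this (a Python sketch assuming a redis-py-style client; safe_delete_list and FakeListClient are illustrative names, and lpop with a count argument requires Redis 6.2+ in real deployments):

```python
def safe_delete_list(client, key, threshold=100, batch=100):
    """Delete a potentially huge List key in batches instead of one
    blocking DEL. `client` is assumed to expose redis-py style
    llen/lpop/delete methods (an assumption; adapt to your library)."""
    if client.llen(key) <= threshold:
        client.delete(key)           # small key: a plain DEL is fine
        return
    while client.llen(key) > 0:
        client.lpop(key, batch)      # pop `batch` elements per round trip

# A tiny fake client to illustrate the flow without a live server:
class FakeListClient:
    def __init__(self, n):
        self.items = list(range(n))
    def llen(self, key):
        return len(self.items)
    def lpop(self, key, count):
        popped, self.items = self.items[:count], self.items[count:]
        return popped
    def delete(self, key):
        self.items = []

c = FakeListClient(1050)
safe_delete_list(c, "biglist")
print(c.llen("biglist"))  # 0
```

For Hash/Set/ZSet keys the same pattern applies, with HSCAN+HDEL, SSCAN+SREM, or ZSCAN+ZREM in place of LPOP.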
Now that we understand DEL's impact on List/Hash/Set/ZSet data, let's consider: does deleting a String-type key cause the same problem?
Huh? Didn't I just mention that the official documentation says deleting a String key is O(1)? Surely that won't block Redis?
Actually, not necessarily!
Think about it, what if this key occupies a very large amount of memory?
For example, if this key stores 500MB of data (obviously, it is a bigkey), then when executing DEL, the time will still become longer!
This is because it takes time for Redis to release such a large memory to the operating system, so the operation will take longer.
So, for the String type, you'd better not store too large data, otherwise there will be performance problems when deleting it.
At this point, you may be thinking: Didn’t Redis 4.0 introduce the lazy-free mechanism? If this mechanism is turned on, the operation of releasing memory will be executed in the background thread. Will it not block the main thread?
This is a very good question.
Is this really the case?
Let me give you the conclusion first: even with lazy-free enabled, when Redis deletes a String-type bigkey, it is still freed in the main thread rather than in a background thread. So there is still a risk of blocking Redis!
Why is that?
Here's a clue: interested readers can look into the lazy-free internals first and find the answer for themselves. :)
In fact, there is a lot to say about lazy-free. For space reasons, I plan to write a dedicated article later; stay tuned~
3) RANDOMKEY can also block Redis?
If you want to randomly view a key in Redis, you usually use the RANDOMKEY command.
This command will "randomly" extract a key from Redis.
Since it's random, it must execute very fast, right?
Actually, no.
To explain this problem clearly, we need to combine it with the expiration strategy of Redis.
If you know something about the expiration strategy of Redis, you should know that Redis cleans up expired keys by using a combination of scheduled cleaning and lazy cleaning.
After RANDOMKEY randomly takes out a key, it will first check whether the key has expired.
If the key has expired, Redis will delete it. This process is lazy cleanup.
But the cleanup is not over yet. Redis still needs to find a "non-expired" key and return it to the client.
At this time, Redis will continue to randomly take out a key, and then determine whether it has expired, until an unexpired key is found and returned to the client.
The whole process goes like this:

The master randomly picks a key and checks whether it has expired
If the key has expired, delete it and pick another key at random
Repeat this loop until a non-expired key is found, then return it

But there is a problem here: if a large number of keys in Redis have expired but have not yet been cleaned up, this loop can run for a long time, and that time is spent cleaning up expired keys and hunting for a non-expired one.
The result is that RANDOMKEY takes longer to execute, which hurts Redis performance.
The flow above is what actually happens on the master.
If you run RANDOMKEY on a slave, the problem gets even worse!
Why?
The main reason is that a slave does not clean up expired keys by itself.
So when does a slave delete expired keys?
When a key is due to expire, the master cleans it up first, then sends a DEL command to the slave telling it to delete the key as well, which keeps master and slave data consistent.
Back to the same scenario: Redis holds a large number of expired but not-yet-cleaned keys. When RANDOMKEY runs on the slave, the following happens:

The slave randomly picks a key and checks whether it has expired
The key has expired, but the slave will not delete it; it just keeps randomly looking for a non-expired key
Because so many keys have expired, the slave cannot find a qualifying key, and it falls into an "infinite loop"!

In other words, running RANDOMKEY on a slave can hang the entire Redis instance!
Didn't expect that? Randomly fetching a key on a slave can have consequences this serious.
This is actually a Redis bug, and it was not fixed until version 5.0.
The fix: when RANDOMKEY runs on a slave, Redis first checks whether every key in the instance has an expiration time set. If so, to avoid searching for a qualifying key for too long, the slave looks up at most 100 times in the hash table and exits the loop whether or not it finds one.
The fix simply adds a maximum retry count, which prevents the infinite loop.
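The fixed slave-side loop can be sketched like this (a simplified Python model, not the Redis source; the 100-try cap follows the description above):

```python
import random

MAX_TRIES = 100  # the cap introduced by the fix, per the text

def randomkey_on_slave(keys, is_expired):
    """Sketch of the fixed slave-side RANDOMKEY loop: when every key may
    be logically expired, give up after MAX_TRIES instead of spinning."""
    for _ in range(MAX_TRIES):
        if not keys:
            return None
        k = random.choice(keys)
        if not is_expired(k):
            return k
    return None  # all sampled keys were expired; avoid the infinite loop

# With every key expired, the pre-5.0 loop would never return; this one does:
print(randomkey_on_slave(["a", "b", "c"], lambda k: True))  # None
```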
Although this fix keeps the slave from looping forever and hanging the whole instance, running the command on the master can still occasionally take a long time.
So if Redis "jitters" while you are using RANDOMKEY, this may well be the reason!
4) O(1)-complexity SETBIT can actually cause Redis OOM?
When using the Redis String type, besides writing a plain string, you can also treat it as a bitmap.
Concretely, you can operate on a String key bit by bit, like this:

127.0.0.1:6379> SETBIT testkey 10 1
(integer) 1
127.0.0.1:6379> GETBIT testkey 10
(integer) 1

Each bit position operated on is called an offset.
But there is a pit here that you need to watch out for.
If the key does not exist, or currently uses very little memory, and the offset you operate on is very large, Redis has to allocate a "much larger chunk of memory". That allocation takes time and hurts performance.
So when using SETBIT, always mind the size of the offset: operating on an overly large offset can also make Redis stall.
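A quick back-of-the-envelope calculation shows why a large offset is dangerous (a sketch; setbit_alloc_bytes is an illustrative helper, and 2^32 - 1 is SETBIT's maximum allowed offset):

```python
def setbit_alloc_bytes(offset):
    """Bytes Redis must allocate so that bit `offset` fits in the string
    (a rough estimate for illustration, not Redis source logic)."""
    return offset // 8 + 1

# A single SETBIT at the maximum offset (2^32 - 1) forces a ~512 MB
# allocation for an otherwise empty key:
print(setbit_alloc_bytes(2**32 - 1))  # 536870912 bytes = 512 MB
print(setbit_alloc_bytes(10))         # 2 bytes: tiny offsets cost almost nothing
```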
A key like this is also a typical bigkey: besides the memory allocation hurting performance, deleting it takes longer too.
5) Running MONITOR can also cause Redis OOM?
You have surely heard about this pit many times.
When you run MONITOR, Redis writes every command it processes into that client's "output buffer", from which the client then reads the server's results.
But if your Redis handles a high QPS, this output buffer keeps growing, consuming a large amount of Redis memory. If the machine happens to be short on memory, the Redis instance risks being OOM-killed.
So, you need to use MONITOR with caution, especially when the QPS is high.
The above problem scenarios all occur when we use common commands, and they are likely to be triggered "unintentionally".
Next, let's look at the pitfalls in Redis "data persistence".
Redis data persistence is divided into two methods: RDB and AOF.
Among them, RDB is a data snapshot, and AOF will record every write command to the log file.
Problems with data persistence are mainly concentrated in these two blocks. Let’s look at them in turn.
1) The master is down and the slave data is also lost?
If your Redis is deployed as follows, data loss can occur:

A master-slave + sentinel deployment
The master has data persistence disabled
The Redis process is managed by supervisor, configured to "auto-restart on crash"
If the master goes down at this point, the following happens:

The master goes down; before the sentinel initiates a failover, the master process is immediately restarted by supervisor
But the master has no persistence enabled, so after startup it is an "empty" instance
To stay consistent with the master, the slave automatically "clears" all its data and becomes an "empty" instance too

See that? In this scenario, all data on both master and slave is lost.
When the business application then hits Redis and finds the cache empty, it sends every request to the backend database, which can escalate into a "cache avalanche" with a major business impact.
So you must prevent this situation from happening. My advice:

Do not use a process management tool to auto-restart Redis instances
After the master goes down, let the sentinel initiate the failover and promote a slave to master
After the failover completes, restart the old master and let it demote to a slave
You should avoid this problem when configuring data persistence.
2) Does AOF everysec really not block the main thread?
When Redis turns on AOF, you need to configure the AOF flushing strategy.
Balancing performance and data safety, you will most likely choose the appendfsync everysec option.
The working mode of this solution is that the background thread of Redis flushes the data of the AOF page cache to the disk (fsync) every 1 second.
The advantage of this solution is that the time-consuming operation of AOF disk brushing is executed in the background thread, avoiding the impact on the main thread.
But does it really not affect the main thread?
The answer is no.
In fact, there is such a scenario: when the Redis background thread flushes the AOF page cache to disk (fsync), if the disk IO load is too high at that moment, the fsync call blocks.
Meanwhile the main thread is still receiving write requests, so it first checks whether the background thread's previous flush has succeeded.
How does it judge?
After each successful flush, the background thread records the time of that fsync.
The main thread uses that timestamp to see how long ago the last flush was. The whole flow goes like this:

Before writing to the AOF page cache (the write system call), the main thread checks whether the background fsync has completed
If the fsync has completed, the main thread writes to the AOF page cache directly
If not, it checks how long it has been since the last successful fsync
If it is within 2 seconds of the last successful fsync, the main thread returns directly without writing to the AOF page cache
If more than 2 seconds have passed since the last successful fsync, the main thread forces the write to the AOF page cache (the write system call)
Because the disk IO load is high, the background fsync is blocked, so the main thread's write to the AOF page cache blocks too (both operate on the same fd; fsync and write are mutually exclusive, and one must wait for the other to succeed before continuing)
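The decision flow above can be condensed into a small sketch (illustrative Python; the function and constant names are not actual Redis source identifiers):

```python
AOF_POSTPONE_LIMIT = 2  # seconds; the 2-second window described in the text

def aof_write_decision(fsync_in_progress, seconds_since_last_fsync):
    """Sketch of the everysec write-path logic described above."""
    if not fsync_in_progress:
        return "write page cache"        # normal, non-blocking path
    if seconds_since_last_fsync < AOF_POSTPONE_LIMIT:
        return "postpone"                # give the background fsync up to 2s
    return "forced write (may block)"    # same fd as the stuck fsync

print(aof_write_decision(False, 0.5))  # write page cache
print(aof_write_decision(True, 1.2))   # postpone
print(aof_write_decision(True, 2.5))   # forced write (may block)
```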
From this analysis we can see that even with the appendfsync everysec flushing strategy configured, there is still a risk of blocking the main thread.
The crux of the problem is that an overloaded disk causes fsync to block, which in turn causes the main thread to block when writing to the AOF page cache.
So you must make sure the disk has enough IO capacity to avoid this problem.
3) AOF everysec really only loses 1 second of data?
Let's continue from the previous question.
Here we need to focus on step 4 of the flow above.
That is: when the main thread writes to the AOF page cache, it first checks the time of the last successful fsync; if that was within 2 seconds, the main thread returns directly and does not write to the AOF page cache.
This means that while the background thread runs fsync, the main thread may wait up to 2 seconds without writing to the AOF page cache.
If Redis crashes at that moment, the AOF file loses 2 seconds of data, not 1!
Let's keep digging: why does the Redis main thread wait 2 seconds without writing the AOF page cache?
With appendfsync everysec, the background thread normally runs an fsync every second; with sufficient disk resources, it never blocks.
In that case, the main thread would not need to care whether the background flush succeeded at all; it could just write to the AOF page cache unconditionally.
But the Redis author considered that, when disk IO resources are tight, the background fsync may block.
So before writing to the AOF page cache, the main thread checks the time since the last successful fsync. If more than 1 second has passed without success, the main thread knows the fsync may be stuck.
Hence the main thread waits up to 2 seconds without writing the AOF page cache, in order to:

Reduce the risk of blocking the main thread (writing the page cache unconditionally would block it immediately)
Give the background thread an extra second of grace for the fsync to succeed

The price is that if Redis goes down during this window, the AOF loses 2 seconds of data instead of 1.
This should be a further trade-off between performance and data safety by the Redis author.
Either way, what you need to know here is that even with AOF configured to flush every second, in the extreme situation above the AOF actually loses 2 seconds of data.
4) OOM occurs in Redis when RDB and AOF rewrite?
Finally, let’s take a look at the problems that occur when Redis performs RDB snapshots and AOF rewrite.
When Redis does RDB snapshot and AOF rewrite, it will create a child process to persist the data in the instance to the disk.
Creating a child process will call the fork function of the operating system.
After the fork execution is completed, the parent process and the child process will share the same memory data at the same time.
But the parent process can still serve write requests during this time, and incoming writes modify memory using copy-on-write.
That is, once the parent needs to modify some data, Redis does not modify the existing memory in place; it first copies the memory out and then modifies the new copy. This is "copy-on-write".
Copy-on-write can be summed up as: whoever writes copies first, then modifies.
As you can see, if the parent wants to modify a key, it must copy the original memory to new memory, and this involves requesting "new memory".
If your workload is "write-heavy, read-light" with very high OPS, a large amount of memory copying happens during RDB and AOF rewrite.
What's the problem with that?
Because there are many write requests, the Redis parent process requests a lot of extra memory. The wider the range of keys modified during this period, the more new memory it needs.
If your machine is short on memory, Redis is at risk of OOM!
This is why you will hear DBA colleagues say that you should reserve memory on Redis machines.
The purpose is to prevent Redis from OOMing during RDB and AOF rewrite.
Those are the pitfalls of "data persistence". How many have you stepped in?
Next, let's look at the problems in "master-slave replication".
To guarantee high availability, Redis provides master-slave replication, giving Redis multiple "replicas": when the master goes down, we still have slaves to use.
During master-slave synchronization there are still plenty of pits. Let's go through them one by one.
1) Can master-slave replication lose data?
First, you need to know that Redis master-slave replication is "asynchronous".
This means that if the master suddenly goes down, some data may not yet have been synchronized to the slave.
What problem does this cause?
If you use Redis purely as a cache, it has no real impact on the business.
The data the master failed to sync to the slave can simply be re-read from the backend database.
But for businesses that use Redis as a database, or as a distributed lock, asynchronous replication may lead to data loss / lock loss.
I won't expand on the reliability of Redis distributed locks here; I will write a dedicated article analyzing it later. For now, just know that Redis master-slave replication can lose data with some probability.
2) The same command queries a key, but master and slave return different results?
Have you ever thought about this question: if a key has expired but has not yet been cleaned up by the master, what does querying it on the slave return?

The slave returns the key's value normally
The slave returns NULL

Which do you think it is? Take a moment to think.
The answer is: it depends.
Huh? Why does it depend?
This is a very interesting question. Follow my reasoning closely and I will walk you through the causes step by step.
What gets returned actually depends on three factors:

The Redis version
The specific command executed
The machine clock

First, the Redis version.
If you are using a Redis version below 3.2, then as long as the key has not yet been cleaned up by the master, querying it on the slave will always return the value.
That is, even though the key has expired, you can still read it on the slave.

// Redis 2.8, on the slave
127.0.0.1:6479> TTL testkey
(integer) -2       // expired
127.0.0.1:6479> GET testkey
"testval"          // still readable!

But if you query the key on the master at the same moment, the master sees it has expired, cleans it up, and returns NULL.

// Redis 2.8, on the master
127.0.0.1:6379> TTL testkey
(integer) -2
127.0.0.1:6379> GET testkey
(nil)

See that? Querying the same key on master and slave gives different results!
In fact, the slave should stay consistent with the master: once a key has expired, it should return NULL to the client instead of the key's value.
Why does this happen?
This is actually a Redis bug: in versions below 3.2, when a key is queried on a slave, Redis does not check whether it has expired; it just returns the result to the client regardless.
The bug was fixed in version 3.2, but the fix was "not thorough enough".
What does "not thorough enough" mean?
To explain that, we need the second factor mentioned above: "the specific command executed".
Redis 3.2 fixed the bug but missed one command: EXISTS.
That is, for an expired key, reading its data directly on the slave, for example with GET/LRANGE/HGETALL/SMEMBERS/ZRANGE, returns NULL.
But if you run EXISTS, the slave still reports that the key exists.

// Redis 3.2, on the slave
127.0.0.1:6479> GET testkey
(nil)              // logically expired
127.0.0.1:6479> EXISTS testkey
(integer) 1        // still exists!

The reason is that EXISTS and the data-reading commands do not go through the same code path.
The Redis author added the expiration check only to data reads; EXISTS still skipped it.
Not until Redis 4.0.11 was this remaining bug completely fixed.
If you use a version at or above that, then for an expired key on the slave, both data reads and EXISTS return "does not exist".
Let's briefly sum up: slaves went through 3 stages when querying expired keys:

Below 3.2: as long as an expired key has not been cleaned up, any command on the slave returns the value normally
3.2 to 4.0.11: data reads return NULL, but EXISTS still returns true
Above 4.0.11: all commands are fixed; querying an expired key on the slave returns "does not exist"

Special thanks here to Fu Lei, author of "Redis Development and Operations".
I first saw this problem in one of his articles and found it fascinating that Redis once had such a bug. I then read the relevant source code, sorted out the logic, and only then wrote it up here to share with you.
I have already thanked him personally on WeChat, but let me express my gratitude to him once more here~
Finally, let's look at the third factor affecting query results: the "machine clock".
Suppose we have avoided the version bugs above, for example by using Redis 5.0. Can querying a key on the slave still give a different result from the master?
The answer is: it still can.
This has to do with the machine clocks of the master and the slave.
Both master and slave decide whether a key has expired based on their own "local clock".
If the slave's clock runs "faster" than the master's, then from the slave's point of view a key may already be expired even though it actually isn't yet, and a client querying it on the slave gets NULL.
Interesting, isn't it? A tiny expired key hides this much trickery.
If you run into a similar situation, you can follow the steps above to troubleshoot and confirm whether you have stepped into this pit.
3) A master-slave failover causes a cache avalanche?
This problem is an extension of the previous one.
Suppose the slave's machine clock runs "faster" than the master's, and much faster.
From the slave's perspective, a "large portion" of the data in Redis is already expired.
Now suppose a "failover" takes place and this slave is promoted to the new master.
Once promoted, it starts cleaning up expired keys in bulk, which leads to:

The new master cleans up a huge number of expired keys, blocking the main thread and delaying client requests
A large amount of data in Redis expires at once, triggering a cache avalanche

You see, when the master and slave clocks are badly out of sync, the impact on the business is huge!
So if you are a DBA or ops engineer, make sure the machine clocks of master and slave stay consistent to avoid these problems.
4) Large-scale data inconsistency between master and slave?
There is another scenario that leads to large-scale inconsistency between master and slave data.
It involves Redis's maxmemory configuration.
maxmemory caps the memory usage of the whole instance; once the cap is exceeded and an eviction policy is configured, the instance starts evicting data.
But here is the problem: if the master and slave are configured with different maxmemory values, data inconsistency follows.
For example, with maxmemory set to 5G on the master but 3G on the slave, once the data grows past 3G the slave starts evicting data "ahead of time", and the master-slave data diverges.
Also, even when master and slave have the same maxmemory, you must be careful when adjusting it, or the slave will again evict data:

When increasing maxmemory, adjust the slave first, then the master
When decreasing maxmemory, adjust the master first, then the slave

This way the slave never hits maxmemory before the master does.
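The ordering rule can be sketched like this (illustrative Python; adjust_maxmemory and the config dicts are invented for the example):

```python
def adjust_maxmemory(master_cfg, slave_cfg, new_limit_mb):
    """Apply the adjustment order described above, so the slave's limit
    is never the smaller one while data may still exceed it."""
    if new_limit_mb >= master_cfg["maxmemory"]:
        # raising the limit: slave first, then master
        slave_cfg["maxmemory"] = new_limit_mb
        master_cfg["maxmemory"] = new_limit_mb
    else:
        # lowering the limit: master first (it evicts down to the new
        # limit and propagates those evictions), then the slave
        master_cfg["maxmemory"] = new_limit_mb
        slave_cfg["maxmemory"] = new_limit_mb

m, s = {"maxmemory": 3072}, {"maxmemory": 3072}
adjust_maxmemory(m, s, 5120)
print(m["maxmemory"], s["maxmemory"])  # 5120 5120
```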
In fact, think about it: what is the crux of these problems?
The root cause is that once the slave exceeds maxmemory, it evicts data "on its own".
If the slave were never allowed to evict data by itself, could all these problems be avoided?
Exactly.
Regarding this issue, Redis officials should also have received feedback from many users. In Redis 5.0 version, the official finally solved this problem completely!
Redis 5.0 adds a configuration item: replica-ignore-maxmemory, the default is yes.
This parameter indicates that even if the slave memory exceeds maxmemory, it will not eliminate data on its own!
In this way, the slave will always be on par with the master, and will only faithfully copy the data sent by the master, and will not engage in "little tricks" on its own.
At this point, the data of master / slave can be guaranteed to be completely consistent!
If you happen to be using version 5.0, you don’t have to worry about this problem.
5) The slave actually has a memory leak problem?
Yes, you are not mistaken.
How did this happen? Let’s take a look at it in detail.
When you use Redis, a slave memory leak is triggered if the following conditions are all met:

Redis version below 4.0
The slave is configured with read-only=no (a writable slave)
Keys with expiration times are written directly to the slave

In that case the slave leaks memory: keys on the slave are not cleaned up automatically even after they expire.
If you don't delete them actively, these keys stay in the slave's memory, consuming it.
The most vexing part: querying these keys with commands returns nothing, yet they still occupy memory!
This is the slave "memory leak" problem.
This is actually a Redis bug, and it was only fixed in Redis 4.0.
The fix: on a writable slave, when keys with an expiration time are written, the slave "records" these keys.
The slave then scans the recorded keys periodically and cleans up those that have reached their expiration time.
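The fix can be modeled with a small sketch (illustrative Python, not the Redis implementation):

```python
import time

class WritableSlave:
    """Sketch of the 4.0 fix: a writable slave records keys it wrote
    with an expiry and sweeps them itself, since it will never receive
    a DEL from the master for them."""
    def __init__(self):
        self.data = {}
        self.writable_keys_with_expire = {}  # key -> absolute expiry time

    def slave_set(self, key, value, ex):
        self.data[key] = value
        self.writable_keys_with_expire[key] = time.time() + ex

    def sweep(self):
        """Periodic scan of the recorded keys; expired ones are removed."""
        now = time.time()
        for key, exp in list(self.writable_keys_with_expire.items()):
            if exp <= now:
                self.data.pop(key, None)
                del self.writable_keys_with_expire[key]

sl = WritableSlave()
sl.slave_set("session:1", "data", ex=-1)  # already past its expiry, for the demo
sl.sweep()
print("session:1" in sl.data)  # False
```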
If your business needs to temporarily store data on the slave, and these keys have expiration times set, then you should pay attention to this issue.
You need to confirm your Redis version. If it is a version below 4.0, be sure to avoid this pitfall.
In fact, the best solution is to establish a Redis usage convention: slaves must be forced read-only, with writes forbidden. This guarantees master-slave data consistency and also avoids the slave memory leak problem.
6) Why does master-slave full synchronization keep failing?
When the master-slave is fully synchronized, you may encounter the problem of synchronization failure. The specific scenario is as follows:
The slave initiates a full synchronization request to the master, and the master generates an RDB and sends it to slave, slave loads RDB.
Because the RDB data is too large, the slave loading time will also become very long.
At this point you will find that the slave has not completed loading the RDB, but the connection between the master and the slave has been disconnected, and data synchronization has failed.
After that, you will find that the slave initiates full synchronization again, and the master generates RDB and sends it to the slave.
Similarly, when slave loads RDB, master/slave synchronization fails again, and so on.
What is going on here?
Actually, this is the "replication storm" problem of Redis.
What is a replication storm?
Just like what was just described: The master-slave full synchronization failed, the synchronization was restarted, and then the synchronization failed again. This goes back and forth, a vicious cycle, and continues to waste machine resources.
Why does this cause this problem?
If your Redis has the following characteristics, this problem may occur:
The instance data of the master is too large, and the slave takes too long to load the RDB.
The copy buffer (slave client-output-buffer-limit) configuration is too small
The master has a large number of write requests
When the master and slave are fully synchronizing data, the write request received by the master will first be written to the master-slave "copy buffer". The "upper limit" of this buffer is determined by the configuration.
When the slave loads the RDB too slowly, it will cause the slave to be unable to read the data in the "replication buffer" in time, which causes the replication buffer to "overflow".
In order to avoid continuous memory growth, the master will "forcibly" disconnect the slave at this time, and full synchronization will fail.
After that, the slave that failed to synchronize will "re" initiate full synchronization, and then fall into the problem described above, and then repeat in a vicious circle. This is the so-called "replication storm".
How to solve this problem? Here are my suggestions:

Keep the Redis instance from growing too large; avoid oversized RDBs
Configure the replication buffer as large as you can afford, giving the slave enough time to load the RDB and reducing the chance of full-sync failure
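You can sanity-check a buffer size with a rough feasibility model (an illustrative sketch; in redis.conf the actual limit is set with the client-output-buffer-limit directive for the slave class, whose default hard limit is 256mb):

```python
def full_sync_feasible(rdb_load_secs, write_mb_per_sec, repl_buffer_mb):
    """Rough model: full sync can only succeed if the replication buffer
    absorbs the master's writes for as long as the slave takes to load
    the RDB. All numbers are illustrative estimates."""
    return write_mb_per_sec * rdb_load_secs <= repl_buffer_mb

# 10 MB/s of writes during a 120s RDB load needs >= 1200 MB of buffer:
print(full_sync_feasible(120, 10, 512))   # False -> replication storm risk
print(full_sync_feasible(120, 10, 2048))  # True
```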
If you have also stepped on this pit, you can solve it through this solution.
OK, to summarize: in this article we covered the pitfalls Redis may have in three areas: "command usage", "data persistence", and "master-slave synchronization".
How about it? Did it overturn your assumptions?
This article carries a lot of information. If your head feels a bit "scrambled" right now, don't worry: I have also prepared a mind map for you to help you digest and remember it.
I hope that when you use Redis, you can avoid these pitfalls in advance and let Redis provide better services.
Finally, I want to talk to you about my experience and thoughts about stepping into pitfalls during the development process.
In fact, when you come into contact with any new field, you will go through several stages: unfamiliarity, familiarity, stepping on pitfalls, absorbing experience, and being comfortable.
So at this stage of stepping on pitfalls, how can you avoid stepping on pitfalls? Or how to efficiently troubleshoot problems after stepping into a trap?
Here I have summarized 4 aspects, which should be able to help you:
1) Read the official documentation and the configuration file comments carefully
Good software warns you about many potential risks right in its documentation and comments. Reading them carefully helps you sidestep many basic problems in advance.
2) Don't gloss over the details of a problem; keep asking why
Stay curious. When a problem appears, learn to peel it apart layer by layer to locate the cause, and keep the mindset of chasing the essence of the problem.
3) Dare to question; the source code doesn't lie
If a problem looks strange, it may well be a bug: dare to question it.
Finding the truth in the source code beats reading a hundred articles copied from one another online (copies of copies are very likely wrong).
4) There is no perfect software, excellent software is iterated step by step
Any excellent software is iterated step by step. During the iteration process, it is normal for bugs to exist, and we need to look at it with the right mentality.
These experiences and insights are applicable to any field of study and I hope they will be helpful to you.