
Summary of common interview questions about Redis (with answer analysis)

青灯夜游
2021-04-08

After interviewing at six major companies, I have summarized the Redis interview questions that are commonly asked (with answer analysis) to share with you. I hope you find it a useful reference.



Caching knowledge points

What are the types of cache?

Caching is an effective means to improve hotspot data access performance in high concurrency scenarios, and is often used when developing projects.

The types of cache are: local cache, distributed cache, and multi-level cache.

Local cache:

A local cache lives in the process's own memory — in the JVM heap, for example. It can be implemented with an LRUMap, or with tools such as Ehcache.

Because a local cache is pure memory access with no remote round trips, it has the best performance. However, it is limited by single-machine capacity: the cache is generally small and cannot be scaled out.

Distributed cache:

A distributed cache solves the capacity problem well.

Distributed caches generally have good horizontal scalability and can handle scenarios with large amounts of data. The disadvantage is that remote requests are required, and the performance is not as good as local caching.

Multi-level cache:

To balance the two, real systems generally use a multi-level cache: the local cache keeps only the most frequently accessed hotspot data, while the rest of the hot data goes into the distributed cache.

Among today's first-tier companies, this is also the most commonly used caching architecture; a single cache layer often cannot support high-concurrency scenarios on its own.

Eviction strategies

Whether local or distributed, a cache stores data in memory for performance. Because memory is costly and limited, cached data must be evicted once it exceeds the cache's capacity.

Common eviction strategies include FIFO (evict the oldest data), LRU (evict the least recently used data), and LFU (evict the least frequently used data). Redis itself offers the following maxmemory policies:

  • noeviction: return an error when the memory limit has been reached and the client attempts a command that would use more memory (most write commands; DEL and a few others are exceptions).

  • allkeys-lru: evict the least recently used (LRU) keys to make room for newly added data.

  • volatile-lru: evict the least recently used (LRU) keys, but only among keys with an expiration set, to make room for newly added data.

  • allkeys-random: evict random keys to make room for newly added data.

  • volatile-random: evict random keys to make room for newly added data, but only among keys with an expiration set.

  • volatile-ttl: evict keys with an expiration set, preferring those with the shortest remaining time to live (TTL), to make room for newly added data.

    If no key satisfies the eviction precondition, the volatile-lru, volatile-random, and volatile-ttl strategies behave like noeviction.
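These policies are chosen with the maxmemory-policy directive in redis.conf (or at runtime via CONFIG SET). A minimal example — the 100mb cap is just an illustrative value:

```
# Cap Redis memory use and evict with approximate LRU across all keys
maxmemory 100mb
maxmemory-policy allkeys-lru
```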

In fact, the familiar LinkedHashMap also implements the LRU algorithm: construct it with accessOrder=true and override removeEldestEntry. When the capacity exceeds 100, the LRU policy kicks in and the least recently used TimeoutInfoHolder object is evicted.
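A minimal, hedged sketch of that LinkedHashMap-based LRU (generic keys and values stand in for the TimeoutInfoHolder example, and the eviction capacity is a constructor argument rather than the fixed 100):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache built on LinkedHashMap: accessOrder=true moves an entry to the
// tail on every get/put, and removeEldestEntry evicts the head (the least
// recently used entry) once the capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // initialCapacity, loadFactor, accessOrder
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

This is exactly the "find a data structure to implement it" answer: LinkedHashMap maintains the access order for you, so the whole LRU fits in a dozen lines.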

In a real interview you may be asked to hand-write the LRU algorithm. Don't attempt the fully from-scratch version — it really is too long to finish. Either give the LinkedHashMap answer above, or pick an existing data structure to build on; a Java LRU is quite manageable as long as you know the principle.

Memcache

Note: Memcache will be abbreviated as MC below. Let's first look at MC's characteristics:

  • MC processes requests with multi-threaded asynchronous IO, which makes good use of multi-core CPUs and performs excellently;
  • MC's functionality is simple, and it stores data in memory;
  • I won't go into MC's memory structure and its calcification problem here; see the official site for details;
  • MC lets you set an expiration time on cached data, and expired data is cleared;
  • Expiration is lazy: a key is checked for expiry only when it is accessed again;
  • When capacity is full, cached data is evicted: besides cleaning up expired keys, data is also evicted according to the LRU policy.

In addition, MC has some limitations that are serious drawbacks in today's Internet scenarios, and an important reason why many teams choose Redis or MongoDB instead:

  • A key cannot exceed 250 bytes;
  • A value cannot exceed 1 MB;
  • The maximum expiration time of a key is 30 days;
  • Only the K-V structure is supported; there is no persistence and no master-slave replication.

Redis

Let’s briefly talk about the characteristics of Redis to facilitate comparison with MC.

  • Unlike MC, Redis processes requests in a single thread, for two reasons: first, it uses a non-blocking asynchronous event-handling mechanism; second, all cached data lives in memory, so IO time is short and a single thread avoids the cost of thread context switching.
  • Redis supports persistence, so it can serve not only as a cache but also as a NoSQL database.
  • Compared with MC, Redis has another big advantage: beyond K-V, it supports multiple data structures such as list, set, sorted set, and hash.
  • Redis provides master-slave replication and Cluster deployment, enabling highly available services.

Detailed explanation of Redis

The knowledge point structure of Redis is shown in the figure below.

Functions

Let's see what functions Redis provides!

Let’s look at the basic types first:

String:

String is the most commonly used type in Redis. Internally it is stored as an SDS (Simple Dynamic String), which is similar to Java's ArrayList: it pre-allocates redundant space to reduce frequent memory allocation. It is the simplest type — plain set and get, simple KV caching.

But in real development, many people serialize complex structures into a String: converting an object or a List to a JSON string, storing it, then reading it back and deserializing it. I won't debate the rights and wrongs of that here, but I still hope everyone uses the most appropriate data structure in the most appropriate scenario. When someone takes over your code and sees clean, well-chosen types, they'll think "this person knows their stuff"; if they see everything stuffed into String — rubbish!

Okay, that was a digression; I just hope everyone keeps it in mind. Habits become second nature, and small habits make you successful.

The practical application scenarios of String are quite broad:

  • Cache function: String is the most commonly used data type — and not just in Redis; it is the most basic type in every language. Use Redis as a cache in front of another database that acts as the storage layer; Redis's support for high concurrency greatly speeds up the system's reads and writes and reduces pressure on the back-end database.

  • Counter:

    Many systems use Redis as a real-time counter, which implements counting and query functions quickly; the final results can be flushed to a database or other storage at specific times for permanent keeping.

  • Shared User Session:

    When a user refreshes a page, without shared session storage they might have to log in again, or the page has to rely on a cached cookie. Instead, use Redis to manage user sessions centrally: you only need to keep Redis highly available, and every session update and lookup completes quickly. This greatly improves efficiency.

Hash:

This is a Map-like structure. It lets you cache structured data — for example an object (provided the object doesn't nest other objects) — in Redis, and then read or write a single field of the Hash on each cache access.

But this scenario is actually somewhat limited, because many objects these days are fairly complex: a product object, say, may contain other objects among its attributes. I don't use Hash that much in my own cases.

List:

List is an ordered list, and you can still play a lot of tricks with it.

For example, you can use a List to store list-shaped data structures, such as fan lists and article comment lists.

For example, you can use the lrange command to read the elements in a closed interval, and implement paging queries on top of List. This is a great feature: simple high-performance paging based on Redis, in the style of Weibo's pull-down continuous paging — page after page, with high performance.

For example, you can build a simple message queue: push in at the head of the List and pop out at its tail.

List itself is a data structure we use constantly in everyday development, never mind for hot data.

  • Message queue: Redis's linked-list structure makes it easy to implement a blocking queue, using left-in, right-out commands. For example, a producer inserts data from the left with Lpush, and multiple consumers block on the tail of the list with BRpop to consume it.

  • Article lists or paged data display.

    For example, on a typical blog site, as the number of users grows, each user has an article list of their own, and when articles are numerous the list has to be displayed in pages. Here you can consider using a Redis list: it is ordered and supports fetching elements by range, which solves paging queries perfectly and greatly improves query efficiency.
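To make the lrange-based paging concrete, here is a hedged sketch that mimics LRANGE's semantics in plain Java — note that, unlike Java's subList, both ends of a Redis LRANGE are inclusive, and negative indexes count from the tail:

```java
import java.util.List;

// Mirrors Redis LRANGE semantics: start and stop are both inclusive,
// negative indexes count back from the tail, and out-of-range indexes
// are clamped rather than raising an error.
class ListPager {
    static <T> List<T> lrange(List<T> list, int start, int stop) {
        if (start < 0) start = Math.max(0, list.size() + start);
        if (stop < 0) stop = list.size() + stop;
        stop = Math.min(stop, list.size() - 1);
        if (start > stop || start >= list.size()) return List.of();
        return list.subList(start, stop + 1);
    }
}
```

Page n of size 10 is then lrange(list, n*10, n*10 + 9), mirroring LRANGE key 0 9, LRANGE key 10 19, and so on.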

Set:

Set is an unordered collection that automatically removes duplicates.

Just throw any data that needs system-wide deduplication into a Set and it is deduplicated automatically. If you need fast global deduplication, you could of course also use a HashSet in the JVM's own memory — but what if your system is deployed on multiple machines? Then global deduplication has to be done with a Redis Set.

You can also perform intersection, union, and difference operations based on Set. For example, with an intersection you can combine two people's friend lists and find their common friends.

There are plenty of scenarios like this, because the operations are fast and simple: two queries can be done with one Set operation.
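The common-friends example can be sketched with Java's HashSet standing in for Redis sets — SINTER corresponds to retainAll, SUNION to addAll, and SDIFF to removeAll (the friend names below are made up for illustration):

```java
import java.util.HashSet;
import java.util.Set;

class FriendSets {
    // Common friends of two users: the analogue of SINTER key1 key2.
    static Set<String> commonFriends(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.retainAll(b);   // keep only members present in both sets
        return result;
    }
}
```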

Sorted Set:

Sorted Set is a sorted Set: it deduplicates but also sorts — you supply a score when writing, and members are automatically ordered by score.

The usage scenarios of sorted sets are similar to those of sets, except that a plain Set is unordered while a Sorted Set orders its members by score, sorting on insertion. So when you need an ordered, duplicate-free list, the Sorted Set is the data structure to choose.

  • Ranking lists: the classic usage scenario of sorted sets. For example, a video site ranks the videos users upload, and the ranking may be maintained along several dimensions: by time, by view count, by number of likes, and so on.

  • Use Sorted Sets to build a weighted queue. For example, give ordinary messages a score of 1 and important messages a score of 2; a worker thread can then fetch tasks in descending score order, handling important tasks first.

    The Weibo hot-search list works the same way: a popularity value at the back, a topic name at the front.
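As a conceptual stand-in for ZADD/ZREVRANGE (Redis actually uses a skiplist plus a hash table; this sketch simply sorts on read), a minimal leaderboard might look like:

```java
import java.util.*;
import java.util.stream.Collectors;

// Conceptual analogue of a Redis sorted set: members carry a score, and
// ranking queries return members ordered by score. This is illustration
// only — Redis keeps the order incrementally, not by sorting on read.
class Leaderboard {
    private final Map<String, Double> scores = new HashMap<>();

    // Analogue of ZADD: set (or update) a member's score.
    void zadd(String member, double score) { scores.put(member, score); }

    // Analogue of ZREVRANGE 0 (n-1): top-n members, highest score first.
    List<String> top(int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```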

Advanced usage:

Bitmap:

Bitmap stores information bit by bit, and can be used to implement a BloomFilter;
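A quick sketch of the bitmap idea using Java's BitSet as a stand-in for the SETBIT/GETBIT/BITCOUNT commands; the per-user sign-in scenario is an illustrative assumption, not from the text above:

```java
import java.util.BitSet;

// Stand-in for Redis bitmap commands: SETBIT ~ set, GETBIT ~ get,
// BITCOUNT ~ cardinality. A common use is per-user daily sign-in
// tracking, one bit per day of the month.
class SignInBitmap {
    private final BitSet bits = new BitSet();

    void signIn(int dayOfMonth) { bits.set(dayOfMonth); }              // SETBIT key day 1
    boolean signedIn(int dayOfMonth) { return bits.get(dayOfMonth); }  // GETBIT key day
    int signInCount() { return bits.cardinality(); }                   // BITCOUNT key
}
```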

HyperLogLog:

Provides approximate, deduplicated counting, well suited to cardinality statistics over large data sets — counting UVs, for example;

Geospatial:

Can store geographic coordinates, compute the distance between locations, and find locations within a radius. Ever thought about implementing "people nearby" with Redis, or computing distances on a map?

These three can actually be counted as data structures in their own right. I wonder how many friends remember that I mentioned them in the Redis basics article where the dream began: if you only know the five basic types, you get a passing 60 points; if you can also talk about the advanced usages, the interviewer will think you've got something.

pub/sub:

A publish/subscribe function that can serve as a simple message queue.

Pipeline:

Executes a batch of commands and returns all the results at once, cutting down on frequent request/response round trips.

Lua:

Redis supports submitting Lua scripts to perform a series of functions.

At my old e-commerce employer we often used this in flash-sale scenarios — it makes sense there, taking advantage of its atomicity.

By the way, want to see a flash-sale design? I remember being asked about it in almost every interview; if you'd like it, like and comment and I'll get it out quickly.

Transaction:

The last function is transactions — but Redis does not provide strict transactions. Redis only guarantees that the commands execute serially and that all of them are executed; if a command fails along the way, it does not roll back, but keeps executing the rest.

Persistence

Redis provides two persistence methods: RDB and AOF. RDB writes the in-memory data set to disk as a snapshot; the actual write is performed by a forked child process, and the file uses compressed binary storage. AOF records every write and delete operation Redis handles as a text log.

RDB saves the entire Redis data set in a single file, which suits disaster recovery well. The drawback is that if Redis goes down before the next snapshot, the data written since the previous snapshot is lost; also, saving a snapshot may make the service briefly unavailable.

AOF appends write operations to the log file and has a flexible sync policy, supporting sync per second, sync per modification, and no sync. The drawbacks are that for the same data set the AOF file is larger than the RDB file, and AOF usually operates more slowly than RDB.

For details, please go to the chapter on high availability, especially the advantages and disadvantages of the two, and how to choose.

"Beat the Interviewer" series - Redis sentry, persistence, master-slave, hand-shredded LRU

High availability

Let's look at Redis's high availability. Redis supports master-slave replication and provides a Cluster deployment mode, with Sentinels monitoring the state of the Redis master. When the master fails, a new master is selected from the slave nodes according to certain strategies, and the other slaves are re-pointed at the new master.

There are three simple master-selection strategies:

  • The lower a slave's configured priority value, the higher its precedence;
  • With equal priority, the more data a slave has replicated (the larger its replication offset), the higher its precedence;
  • All else being equal, the smaller the runid, the more likely the slave is to be chosen.
Sentinel itself is also deployed as multiple instances, and the sentinels use the Raft protocol to ensure their own high availability.

Redis Cluster uses a sharding mechanism: internally there are 16384 slots spread across all the master nodes, with each master responsible for part of the slots. On each data operation, a CRC16 of the key determines which slot it falls into and hence which master handles it; slave nodes provide data redundancy.
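The slot computation is just CRC16(key) mod 16384 — Redis Cluster uses the CRC-16/XMODEM variant. A sketch (it omits the {hash tag} rule, under which only the substring between the first { and } is hashed):

```java
// CRC-16/XMODEM (polynomial 0x1021, initial value 0), the variant used by
// Redis Cluster, followed by mod 16384 to pick the slot.
class KeySlot {
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        return crc16(key.getBytes()) % 16384;
    }
}
```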

Sentinel

Sentinel should be deployed with at least three instances to guarantee its own robustness. Sentinel plus master-slave cannot guarantee zero data loss, but it can guarantee the high availability of the cluster.

Why three instances? Let's first see what happens with two sentinels.

Suppose the master goes down. As long as one of the two sentinels S1 and S2 considers the master down, a switchover is triggered and one sentinel is elected to perform the failover — but the failover also requires a majority of sentinels to be running.

So what's the problem? If only M1 goes down and S1 is still alive, everything is fine — but what if the entire machine hosting M1 and S1 goes down? The only sentinel left is S2, and with no majority there is no quorum to authorize a failover. Even though R1 is alive on the other machine, the failover never executes.

The classic sentinel cluster looks like this:

The machine hosting M1 goes down, but two sentinels remain on the other machines. The two of them agree the master is down, so they can elect one of themselves to perform the failover.

Being the helpful type, let me briefly summarize the main functions of the sentinel component:

  • Cluster monitoring: Responsible for monitoring whether the Redis master and slave processes are working normally.
  • Message notification: If a Redis instance fails, Sentinel is responsible for sending messages as alarm notifications to the administrator.
  • Failover: if the master node hangs, a slave is automatically promoted to master.
  • Configuration Center: If a failover occurs, notify the client of the new master address.

Master-Slave

This topic is closely tied to the RDB and AOF persistence mechanisms covered earlier.

First, why do we need a master-slave architecture at all? As mentioned earlier, single-machine QPS has an upper limit, and Redis is exactly the component that must absorb high-concurrency reads. Reading and writing on one machine — who could withstand that? But if the master machine handles the writes and synchronizes data to the slave machines, which serve all the reads, a large volume of requests is spread out — and when you need more capacity, horizontal scaling comes easily.

When you start a slave, it sends a psync command to the master. If this is the slave's first connection to this master, a full replication is triggered: the master forks a child process to generate an RDB snapshot while buffering new write commands in memory. Once the RDB file is ready, the master sends it to the slave, which first writes it to local disk and then loads it into memory. Finally, the master sends all the write commands buffered in memory to the slave.

After I posted this, a CSDN user, Jian_Shen_Zer, asked:

"During master-slave synchronization, a new slave that joins is caught up with an RDB — but what about the data written afterwards? How does new data on the master get synchronized to the slave?"

The same way as AOF: the incremental command log is synchronized to the slave.

Key expiration mechanism

Redis lets you set an expiration time on a key. For expired keys, Redis combines passive and active deletion: like MC, it deletes a key lazily when it is accessed; and it also actively deletes expired keys on a periodic schedule.
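The passive (lazy) half of that mechanism can be sketched in a few lines — each entry stores an absolute deadline, and an access that finds the deadline passed deletes the entry and reports a miss; the periodic active half would simply scan samples of keys on a timer:

```java
import java.util.HashMap;
import java.util.Map;

// Lazy expiry: each entry stores an absolute deadline; a read that finds
// an expired entry deletes it and reports a miss, mirroring Redis's
// passive deletion path.
class ExpiringCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long expireAtMillis;
        Entry(V value, long expireAtMillis) {
            this.value = value;
            this.expireAtMillis = expireAtMillis;
        }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();

    // Analogue of SETEX: store a value with a time-to-live.
    void setex(K key, V value, long ttlMillis) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expireAtMillis) {
            map.remove(key);   // passive deletion on access
            return null;
        }
        return e.value;
    }
}
```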

Caching FAQ

Cache update method

How to update the cache is something to think through as soon as you decide to use one.

Cached data needs updating when the data source changes; the source may be a DB or a remote service. One option is active update: when the source is a DB, update the cache right after updating the DB. When the source is not a DB but some other remote service, you may not be able to detect data changes promptly; in that case you generally give cached entries an expiration time, which is the maximum tolerated window of inconsistency. In this scenario you can choose invalidation update: when a key is missing or expired, first request the latest data from the source, then cache it again and reset the expiration time.

But there is a problem with this: if the remote service you depend on fails during the update, the data becomes unavailable. The improvement is asynchronous update — don't clear the data when it expires; keep serving the old value while an asynchronous thread performs the refresh. This avoids the window of unavailability at the moment of expiry. There is also a purely asynchronous mode that refreshes data in batches at regular intervals. In practice, choose the update mode that fits the business scenario.
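The invalidation-update flow described above — read the cache, on a miss load from the source, then re-cache — is the classic cache-aside pattern. A hedged sketch with a plain Map standing in for Redis (TTL handling is omitted; a real client would pass one to SETEX):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside read path: a hit returns the cached value; a miss loads
// from the data source, writes the result back to the cache, and returns
// it. The loader stands in for a DB or remote-service lookup.
class CacheAside<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;
    int sourceLoads = 0;               // counter to observe cache behavior

    CacheAside(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        V v = cache.get(key);
        if (v != null) return v;       // cache hit
        sourceLoads++;
        v = loader.apply(key);         // miss: fetch from the source
        if (v != null) cache.put(key, v);  // re-cache for later reads
        return v;
    }
}
```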

Data Inconsistency

The second problem is data inconsistency — whenever you use a cache, you have to face it. The cause is usually a failed active update: for example, after updating the DB, the update request to Redis times out for network reasons, or an asynchronous update fails. The remedies: if the service is not latency-sensitive, add retries; if it is latency-sensitive, handle failed updates with asynchronous compensation tasks. And if short-lived inconsistency doesn't hurt the business, it's enough that the next update succeeds, guaranteeing eventual consistency.

Cache Penetration

The cause of this problem is often an external malicious attack. For example, user info is cached, but an attacker keeps hitting the interface with user IDs that don't exist: the cache query misses, the subsequent DB query also misses, and a flood of requests penetrates the cache and lands on the DB.

The solution is as follows.

  • For a non-existent user, cache an empty object as a marker so the same ID doesn't hit the DB again. However, this doesn't always solve the problem well and can leave a large amount of useless data sitting in the cache.

  • Use a BloomFilter. Its defining property is existence testing: if the BloomFilter says an element does not exist, it definitely does not exist; if it says the element exists, the data may still be absent. That makes it a very good fit for this problem.
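A toy Bloom filter showing the "definitely absent / possibly present" property described above — the bit-array size and the two hash functions are arbitrary illustrative choices, nothing like a production-tuned filter:

```java
import java.util.BitSet;

// Toy Bloom filter: each element sets a few hashed bit positions.
// A query that finds any position unset proves the element was never
// added; all-set positions mean "possibly present" (false positives
// can occur, false negatives cannot).
class BloomFilter {
    private final BitSet bits;
    private final int size;

    BloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    private int h1(String s) { return Math.floorMod(s.hashCode(), size); }
    private int h2(String s) { return Math.floorMod(s.hashCode() * 31 + 7, size); }

    void add(String s) {
        bits.set(h1(s));
        bits.set(h2(s));
    }

    boolean mightContain(String s) {
        return bits.get(h1(s)) && bits.get(h2(s));
    }
}
```

Before querying the DB for a user ID, check the filter; an ID the filter rejects can be refused without touching cache or DB.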

Cache breakdown

Cache breakdown happens when a piece of hotspot data expires and a flood of requests for that data penetrates through to the data source.

There are several ways to solve this problem.

  • Use a mutex-guarded update so that, within a process, there are never concurrent requests to the DB for the same piece of data, reducing DB pressure.

  • Use random back-off: on a miss, sleep for a short random time, query again, and perform the update if it still fails.

  • To keep many hot keys from expiring at the same moment, set each cache TTL to a fixed time plus a small random number, so the hot keys don't all expire together.
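The fixed-time-plus-random-jitter idea from the last bullet, as a one-liner sketch (the base TTL and jitter range are arbitrary illustrative values):

```java
import java.util.concurrent.ThreadLocalRandom;

// Spread out expirations: a fixed base TTL plus a small random jitter so
// that hot keys written at the same moment don't all expire together.
class TtlJitter {
    static long ttlSeconds(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }
}
```

The result would be passed as the expiry argument of a SETEX-style write, e.g. a one-hour base plus up to five minutes of jitter.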

Cache avalanche

A cache avalanche happens when the cache as a whole goes down, so every request is sent straight to the DB.

Solution:

  • Use a fast-failure circuit breaker strategy to reduce the instantaneous pressure on the DB;

  • Use master-slave and Cluster deployment modes to keep the cache service highly available.

In actual scenarios, these two methods will be used in combination.

Old friends know why I haven't gone into great depth here and only touched on the key points: my earlier articles covered all of this in real detail, so rather than duplicate them, I'll just link them — go give them a read (and a like):

  • "Hitting the Interviewer" Series - Redis Basics
  • "Hitting the Interviewer" Series - Cache avalanche, breakdown, penetration
  • "Hitting the Interviewer" Series - Redis Sentinel, Persistence, Master-Slave, Hand-Teared LRU
  • "Hitting the Interviewer" Series-Redis Final Chapter-Winter is Coming, FPX-The New King Ascends to the Throne

Test Points and Bonus Points

Take notes!

Test points

Interviewers ask about caching mainly to test your understanding of cache characteristics and your mastery of the features and usage of MC and Redis.

  • You need to know cache usage scenarios and how the different types of cache are used, for example:

    - Cache DB hot data to reduce DB pressure; cache the results of dependent services to improve concurrency;
    - Simple K-V caching scenarios can use MC; to cache special data formats such as list and set, use Redis;
    - To cache a list of videos a user recently played, use a Redis list; to compute ranking data, use Redis's zset structure.

  • Understand the common commands of MC and Redis, such as atomic increment/decrement and the commands for operating on the different data structures.

  • Understand how MC and Redis lay out data in memory — helpful when estimating capacity.

  • Understand the data expiration and deletion strategies of MC and Redis, such as actively triggered periodic deletion and passively triggered lazy deletion.

  • Understand the principles of Redis persistence, master-slave replication, and Cluster deployment, such as how RDB and AOF are implemented and how they differ.

  • Know the similarities and differences among cache penetration, breakdown, and avalanche, and their solutions.

  • Whether or not you have e-commerce experience, I think you should know the concrete implementation and details of flash sales.

  • ……..

Extra points

If you want to perform better in interviews, you should also know the following bonus points.

  • Introduce cache usage from real application scenarios. For example, when calling a back-end service interface to fetch information, you can use local plus remote multi-level caching; for dynamic ranking scenarios, consider Redis's Sorted Set; and so on.

  • It's best to have experience designing and using distributed caches: in which scenarios the project used Redis, which data structures were used, and what class of problems they solved; or, when using MC, tuning the slab allocation parameters according to estimated sizes; and so on.

  • It's best to understand the problems that can arise with caches. For example, Redis handles requests on a single thread, so time-consuming single-request tasks should be avoided lest they block other requests; Redis should not be deployed on the same machine as other CPU-intensive processes; and swap should be disabled so Redis's cached data can't be paged out to disk and hurt performance. There's also the MC calcification problem mentioned earlier.

  • Understand Redis's typical application scenarios, for example implementing distributed locks with Redis, implementing a BloomFilter with Bitmap, doing UV statistics with HyperLogLog, and so on.

  • Know the new features in Redis 4.0 and 5.0, such as Stream, the persistent message queue with multicast support, and custom function extensions via the Module system.

  • ……..



Statement:
This article is reproduced from csdn.net. If there is any infringement, please contact admin@php.cn to have it deleted.