1. Requirement background
This scenario is a DMP cache storage requirement. A DMP has to manage a large amount of third-party ID data, including the mappings between various media cookies and its own cookie (hereinafter referred to as superid), the population tags of superids, the population tags of mobile IDs (mainly IDFA and imei), and some blacklisted IDs, IPs and other data.
Storing hundreds of billions of records offline in HDFS is not difficult, but a DMP also has to serve millisecond-level real-time queries. Cookie IDs are inherently unstable: the browsing behavior of many real users constantly generates large numbers of new cookies. A DMP population tag can only be hit if the mapping data is synchronized in time, and higher hit rates cannot be obtained simply by preheating, which poses a great challenge to cache storage.
Actual testing showed that storing just 5 billion of these kv records in the conventional way requires more than 1T of memory, and if highly available multiple replicas are needed the consumption becomes enormous. In addition, inconsistent kv lengths produce a lot of memory fragmentation. A very large-scale storage solution is needed to address these problems.
2. What kind of data is stored
Population tags are mainly cookie, imei and idfa together with their corresponding gender, age (age group) and geo (region); the mapping relationship is mainly from media cookies to superid. The following is an example of the data format:
1) PC ID:
Media number-Media cookie => superid
superid => { age => age-segment code, gender => gender code, geo => geolocation code }
2) Device-side ID:
imei or idfa => { age => age-segment code, gender => gender code, geo => geolocation code }
Obviously, PC data needs two storage types, key => value and key => hashmap, while Device data only needs one type, key => hashmap.
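For illustration, this is how the two shapes map onto plain Redis types before any of the optimizations described below, using the Jedis client; the media number, cookie values and tag codes here are made up for the example:
import redis.clients.jedis.Jedis;

public class NaiveLayout {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // PC side, type 1: media number-media cookie => superid (key => value)
            jedis.set("1001-f33ab0012c", "1605242015141689522");
            // PC side, type 2: superid => {age, gender, geo} (key => hashmap)
            jedis.hset("1605242015141689522", "age", "3");
            jedis.hset("1605242015141689522", "gender", "1");
            jedis.hset("1605242015141689522", "geo", "110000");
            // Device side: imei/idfa => {age, gender, geo} (key => hashmap)
            jedis.hset("2d131005dc0f37d362a5d97094103633", "age", "4");
            jedis.hset("2d131005dc0f37d362a5d97094103633", "gender", "2");
        }
    }
}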
3. Data characteristics
Short keys and short values: superid is a 21-digit number, such as 1605242015141689522; imei is a lowercase md5, such as 2d131005dc0f37d362a5d97094103633; idfa is an uppercase md5 with "-", such as 51DFFC83-9541-4411-FA4F-356927E39D04;
The media's own cookies vary in length;
The full data set must be served: superids number in the tens of billions, media mappings in the hundreds of billions, and mobile IDs in the billions;
Billions of mapping relationships are generated every day;
Hot data can be predicted within a larger time window (some stable cookies persist);
It is impossible to predict hot data from the current mapping data, many of which are newly generated cookies;
4. Existing technical challenges
1) Different lengths can easily cause memory fragmentation;
2) Because of the large number of pointers, memory amplification is relatively high, generally around 7 times, a common problem for pure in-memory storage;
3) Although the popularity of cookies can be predicted by their behavior, there are still many newly generated IDs every day (the percentage is sensitive and will not be disclosed for now);
4) Requests must be served within 100ms over the public network (domestic public network latency is under 60ms), so in principle the mappings and population tags newly updated that day must all be in memory, so that requests never fall through to the backend's cold data;
5) For business reasons, in principle all data is retained for at least 35 days, or even longer;
6) Memory is still relatively expensive, and storage solutions with tens of billions of keys or even hundreds of billions of keys are imperative!
5. Solution
5.1 Elimination Strategy
A large amount of new data enters the store every day, and this is a major source of storage pressure, so cleaning the data in a timely manner becomes especially important. The main approach is to discover and retain hot data while evicting cold data.
The number of internet users is nowhere near the billions of IDs we hold; IDs keep changing over time and have a limited lifespan, so to a large extent the IDs we store are actually dead. In fact, front-end queries are driven by ad exposure, which is tied to human behavior, so the access behavior of a given ID shows a certain amount of repetition within a time window (perhaps a campaign, half a month, a few months).
Before data initialization, we first use HBase to aggregate and deduplicate the IDs in the logs and define a TTL window, usually 35 days, so that IDs that have not appeared in the last 35 days are cut off. In Redis, the expiration time is likewise set to 35 days: when a key is accessed and hit it is renewed and its expiration pushed out, while keys that do not appear within 35 days are naturally evicted. This works well for stable cookies and IDs; in practice this life-extension approach has proven especially effective for IDFA and imei, and with long-term accumulation it achieves very good hit rates.
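As a minimal sketch of the renew-on-hit part (assuming the Jedis client; the 35-day window is the TTL mentioned above, everything else is illustrative):
import redis.clients.jedis.Jedis;

public class RenewOnHit {
    private static final int TTL_SECONDS = 35 * 24 * 3600; // 35-day retention window

    // Look up a tag and, on a hit, push the expiration out another 35 days
    public static String getAndRenew(Jedis jedis, String id, String field) {
        String value = jedis.hget(id, field);
        if (value != null) {
            jedis.expire(id, TTL_SECONDS); // renew the key so active IDs survive
        }
        return value;
    }
}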
5.2 Reduce expansion
The size of the hash table and the number of keys determine the collision rate (or, measured another way, the load factor). Within any reasonable range, the more keys there are, the larger the hash table must be, and the more memory is consumed. On top of that, the large number of pointers are themselves long integers, so the memory amplification of in-memory storage is considerable. Let's first look at how to reduce the number of keys.
Let's first look at a storage structure. We want key1 => value1 stored in Redis as follows: use the fixed-length, pseudo-random md5(key) value as the Redis key, which we call the BucketId, and store key1 => value1 inside that key's hashmap structure, so that at query time the client follows the same steps, computes the hash, and reads out value1.
The access path simply changes from get(key1) to hget(md5(key1), key1) to obtain value1.
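A small sketch of that read path with the Jedis client, using the full 32-character md5 hexString as the BucketId for clarity; the optimized implementation further truncates the digest, as described next:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import redis.clients.jedis.Jedis;

public class BucketReadPath {
    // Hex-encode md5(key); the optimized version below truncates this to a short BucketId
    static String md5Hex(String key) throws Exception {
        byte[] md = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (byte b : md) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // get(key1) becomes hget(md5(key1), key1)
    static String get(Jedis jedis, String key1) throws Exception {
        return jedis.hget(md5Hex(key1), key1);
    }
}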
If we deliberately allow many keys to collide in the BucketId space through pre-calculation, then multiple keys hang under a single BucketId. For example, with an average of 10 keys per BucketId, we theoretically reduce the number of Redis keys by more than 90%.
There are some complications in the implementation, and you have to think through the capacity scale before adopting this approach. The md5 we normally use is a 32-character hexString (hexadecimal characters) with a 128-bit space, which is far too large; what we need to store is tens of billions of records, roughly 33 bits, so we need a mechanism to generate a hash with the appropriate number of bits. And to save memory, we fill the key with the full character range (ASCII codes 0 to 127) rather than a hexString, which cuts the key length in half.
The following is the specific implementation
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public static byte[] getBucketId(byte[] key, Integer bit) throws NoSuchAlgorithmException {
    MessageDigest mdInst = MessageDigest.getInstance("MD5");
    mdInst.update(key);
    byte[] md = mdInst.digest();
    byte[] r = new byte[(bit - 1) / 7 + 1]; // because only 7 bits of each byte can be represented as a single character
    int a = (int) Math.pow(2, bit % 7) - 2;
    md[r.length - 1] = (byte) (md[r.length - 1] & a);
    System.arraycopy(md, 0, r, 0, r.length);
    for (int i = 0; i < r.length; i++) {
        if (r[i] < 0) r[i] &= 127; // clear the sign bit so every byte stays in the ASCII range 0~127
    }
    return r;
}
The final size of the BucketId space is determined by the parameter bit; the available space sizes are discrete integer powers of 2. The reason only 7 bits of each byte are usable is that when Redis stores a key it must be ASCII (0~127), not an arbitrary byte array. If we plan for tens of billions of records with about 10 kv shared per bucket, we only need 2^30 = 1073741824 buckets, which is the final number of keys.
5.3 Reduce fragmentation
The main causes of fragmentation are memory that cannot be aligned and memory that cannot be reallocated after expiration and deletion. With the method described above we can store the population tags and mapping data in this way, which has the advantage that all Redis keys are of equal length. In addition, we optimize the key inside the hashmap, taking the last six characters of the cookie or deviceid as the field, which also keeps memory aligned (a sketch of this write path appears at the end of this subsection). In theory collisions are possible, but the probability of two IDs sharing the same suffix in the same bucket is extremely low: think of the ID as an almost random string; the probability that 10 random, fairly long IDs share the same suffix, multiplied by the number of buckets, gives the expected number of conflicts, which is negligible.
In addition, there is a fairly low-tech but effective way to reduce fragmentation: restart the slave and then force a failover to swap master and slave, which amounts to defragmenting the master's memory.
Recommended memory allocators are Google's tcmalloc and Facebook's jemalloc, which reduce fragmentation and memory consumption when values are not large. Some have measured that libc is more economical when values are large.
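To tie 5.2 and 5.3 together, here is a rough sketch of the write and read path: the bucket key comes from getBucketId above, the field is the six-character suffix, and the Jedis binary API is assumed; the bit width and helper names are illustrative, not the production code:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import redis.clients.jedis.Jedis;

public class AlignedBucketStore {
    private static final int BIT = 30; // 2^30 buckets, roughly 10 kv per bucket at this scale

    // Field inside the bucket hash: the last six characters of the original id (see 5.3)
    static byte[] suffixField(String id) {
        String s = id.length() > 6 ? id.substring(id.length() - 6) : id;
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static void put(Jedis jedis, String id, byte[] value) throws Exception {
        byte[] bucket = getBucketId(id.getBytes(StandardCharsets.US_ASCII), BIT);
        jedis.hset(bucket, suffixField(id), value); // equal-length keys and fields keep memory aligned
    }

    static byte[] get(Jedis jedis, String id) throws Exception {
        byte[] bucket = getBucketId(id.getBytes(StandardCharsets.US_ASCII), BIT);
        return jedis.hget(bucket, suffixField(id));
    }

    // Same method as shown in 5.2, repeated here so the sketch is self-contained
    static byte[] getBucketId(byte[] key, Integer bit) throws Exception {
        MessageDigest mdInst = MessageDigest.getInstance("MD5");
        mdInst.update(key);
        byte[] md = mdInst.digest();
        byte[] r = new byte[(bit - 1) / 7 + 1];
        int a = (int) Math.pow(2, bit % 7) - 2;
        md[r.length - 1] = (byte) (md[r.length - 1] & a);
        System.arraycopy(md, 0, r, 0, r.length);
        for (int i = 0; i < r.length; i++) {
            if (r[i] < 0) r[i] &= 127;
        }
        return r;
    }
}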
6. Issues that need attention with the md5 hash bucket method
1) The kv storage volume must be planned in advance; the tolerable floating range is about ten to fifteen times the number of buckets. For example, to store about 10 billion kv it is best to choose 30bit~31bit as the bucket count. In other words, business growth within a reasonable range (10 to 15 times) is fine; if the business grows by too many multiples, the hashsets will grow too fast, query time will increase, and the zip-list threshold may even be exceeded, causing memory to increase dramatically.
2) It is suitable for short values. If the value is too large or has too many fields it is not suitable, because this method requires the value to be fetched in one go. The population tag, for example, is a very short code; even 3 or 4 bits would be enough to hold it.
3) It is the typical trade of time for space. Since our business scenario does not demand extremely high qps, generally on the order of 100 million to 1 billion requests per day, making reasonable use of otherwise idle CPU is quite economical.
4) After applying the message digest, keys are shrunk and fixed in length, so the original keys can no longer be retrieved at random from Redis; if an export is required it must be done from the cold data.
5) Expiration has to be implemented by yourself. The current algorithm is very simple: since bucket consumption only grows on writes, writes are sampled at a certain ratio, HLEN is used to check whether a bucket holds more than 15 entries, and only when it does are the expired fields deleted; the TTL timestamp is stored in the first 32 bits of the value (a sketch follows after this list).
6) Bucket consumption statistics need to be maintained, and expired keys need to be cleaned regularly so that Redis queries do not slow down.
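The sampled cleanup from point 5 might look roughly like this with the Jedis binary API; the threshold of 15 and the 32-bit timestamp prefix come from the text above, while the 1% sampling rate and the helper names are assumptions for the sketch:
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Random;
import redis.clients.jedis.Jedis;

public class SampledExpire {
    private static final Random RND = new Random();
    private static final double SAMPLE_RATE = 0.01; // only a fraction of writes pay the cleanup cost
    private static final int MAX_FIELDS = 15;       // clean a bucket only when HLEN exceeds this

    // Value layout assumed here: first 4 bytes = expiration timestamp (seconds), rest = payload
    static byte[] encode(int expireAtSeconds, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length).putInt(expireAtSeconds).put(payload).array();
    }

    static void writeWithCleanup(Jedis jedis, byte[] bucket, byte[] field, byte[] value) {
        jedis.hset(bucket, field, value);
        if (RND.nextDouble() < SAMPLE_RATE && jedis.hlen(bucket) > MAX_FIELDS) {
            int now = (int) (System.currentTimeMillis() / 1000);
            for (Map.Entry<byte[], byte[]> e : jedis.hgetAll(bucket).entrySet()) {
                int expireAt = ByteBuffer.wrap(e.getValue()).getInt(); // TTL lives in the first 32 bits
                if (expireAt < now) {
                    jedis.hdel(bucket, e.getKey()); // drop expired fields from the bucket
                }
            }
        }
    }
}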
7. Test results
There are 10 billion records of population tags and mapping data.
Before optimization about 2.3T of storage was used, with a fragmentation rate around 2; after optimization about 500g is used, the average usage of each bucket is about 4, and the fragmentation rate is around 1.02. Queries consume very little CPU.
It should also be mentioned that the consumption of each bucket is not actually uniform, but follows a multinomial distribution.
(formula image: probability distribution of bucket consumption)
The formula above gives the probability distribution of bucket consumption. It is mainly a reminder that bucket consumption cannot be taken for granted: it is possible for some buckets to hold hundreds of keys, although the reality is not that extreme. Imagine tossing a coin, with only two outcomes, heads and tails; that is equivalent to having only two buckets. If you toss it an enormous number of times, each toss being a Bernoulli trial, the two buckets will end up very even. When you run a huge number of generalized Bernoulli trials across many buckets, the probability distribution acts like an invisible hand, and the consumption of the buckets tends toward a stable value. Let's look at the actual bucket consumption distribution obtained by sampling:
With 31bit (more than 2 billion) buckets, the average consumption per bucket is 4.18.
(chart: sampled bucket consumption distribution)
For 10 billion records this saves 1.8T of memory: not only is 78% of the original memory saved, the bucket consumption indicator is also far below the planned warning line of 15.
There is also a certain number of buckets that never appear; if there were too many of them, capacity planning would be inaccurate. In fact their number follows the binomial distribution: for 2^30 buckets storing 2^32 kv, the number of empty buckets is roughly (on the order of millions, with little impact):
Math.pow((1 - 1.0 / Math.pow(2, 30)), Math.pow(2, 32)) * Math.pow(2, 30);
Don't worry too much about uneven bucket consumption. As time goes by, buckets whose HLEN exceeds 15 are trimmed during writes. By the multinomial distribution, once the number of trials is large enough the bucket distribution tends to even out (toss a coin countless times and the numbers of heads and tails will be about the same), and the expire strategy reduces bucket consumption further; in practice every bucket has already gone through a great many trials.
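As a quick sanity check on the planning numbers quoted above (2^30 buckets for roughly 2^32 kv), the average load and the expected number of empty buckets can be recomputed in a few lines:
public class BucketPlanning {
    public static void main(String[] args) {
        double buckets = Math.pow(2, 30);
        double kvs = Math.pow(2, 32);
        // Expected load per bucket (4 at this scale)
        double avgPerBucket = kvs / buckets;
        // Probability that a given bucket receives none of the kvs, times the bucket count
        double emptyBuckets = Math.pow(1 - 1.0 / buckets, kvs) * buckets;
        System.out.printf("average kv per bucket: %.2f%n", avgPerBucket);
        System.out.printf("expected empty buckets: %.0f%n", emptyBuckets);
    }
}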