Comprehensive analysis of memcached – 3. memcached deletion mechanism and development direction

The following is the third part of "Comprehensive Analysis of Memcached".

Published date: 2008/7/16
Author: Toru Maesaka
Original link: http://gihyo.jp/dev/feature/01/memcached/0003

The link to this series of articles is here:

The 1st time: http://www.phpchina.com/html/29/n-35329.html
The 2nd time: http://www.phpchina.com/html/30/n-35330.html
The 3rd time: http://www.phpchina.com/html /31/n-35331.html
The 4th time: http://www.phpchina.com/html/32/n-35332.html href="http://www.phpchina.com/html/32/n-35332.html">
The 5th time: http://www.phpchina.com/html/32/n-35333.html href="http://www.phpchina.com/html/32/n-35333.html"> href="http://www.phpchina.com/html/32/n-35332.html">

memcached effectively utilizes resources in terms of data deletion
- Data will not really disappear from memcached
- Lazy Expiration
LRU: The principle of effectively deleting data from cache
memcached The latest development direction
- About the binary protocol
- The format of the binary protocol
- The eye-catching aspects of HEADER
External engine support
- The necessity of external engine support
- The key to the success of simple API design
- Re-examine the current system
Summary

Memcached is a cache, so the data will not be permanently saved on the server. This is the prerequisite for introducing memcached into the system. This article introduces the data deletion mechanism of memcached, as well as the latest development direction of memcached - Binary Protocol and external engine support.

Memcached effectively utilizes resources in terms of data deletion

Data will not really disappear from memcached

As introduced last time, memcached will not release allocated memory. After the record times out, the client can no longer see the record (invisible, transparent), and its storage space can be reused.

Lazy Expiration

Memcached does not internally monitor whether the record has expired. Instead, it checks the timestamp of the record when getting it to check whether the record has expired. This technique is called lazy expiration. Therefore, memcached does not consume CPU time on expiration monitoring.

LRU: The principle of effectively deleting data from the cache

memcached will give priority to the space of the timed-out records, but even so, there will be insufficient space when appending new records. This A mechanism called Least Recently Used (LRU) is used to allocate space. As the name suggests, this is a mechanism for deleting "least recently used" records. Therefore, when memcached's memory space is insufficient (new space cannot be obtained from slab class), it searches from recently unused records and allocates its space to new records. From a caching practical perspective, this model is ideal.

However, in some cases, the LRU mechanism can cause trouble. LRU can be disabled through the "-M" parameter when memcached is started, as shown below:

$ memcached -M -m 1024

Copy after login

It must be noted when starting that the lowercase "-m" option is used to specify the maximum memory size. If no specific value is specified, the default value of 64MB is used.

After starting with the "-M" parameter specified, memcached will return an error when the memory is exhausted. Having said that, memcached is not a memory after all, but a cache, so it is recommended to use LRU.

The latest development direction of memcached

There are two big goals on the roadmap of memcached. One is the planning and implementation of the binary protocol, and the other is the loading function of the external engine.

About the binary protocol

The reason for using the binary protocol is that it does not require the parsing and processing of the text protocol, which improves the performance of the originally high-speed memcached and reduces the vulnerabilities of the text protocol. . It has been mostly implemented, and the code base for development already contains this feature. There is a link to the code library on the memcached download page.

http://danga.com/memcached/download.bml

The format of the binary protocol

The packet of the protocol is a 24-byte frame , followed by keys and unstructured data (Unstructured Data). The actual format is as follows (quoted from the protocol documentation):

 Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0/ HEADER / / / / / / / +---------------+---------------+---------------+---------------+ 24/ COMMAND-SPECIFIC EXTRAS (as needed) / +/ (note length in th extras length header field) / +---------------+---------------+---------------+---------------+ m/ Key (as needed) / +/ (note length in key length header field) / +---------------+---------------+---------------+---------------+ n/ Value (as needed) / +/ (note length is total body length header field, minus / +/ sum of the extras and key length body fields) / +---------------+---------------+---------------+---------------+ Total 24 bytes

Copy after login

As shown above, the packet format is very simple. It should be noted that the 16-byte header (HEADER) is divided into two types: Request Header and Response Header. The header contains information such as Magic bytes indicating the validity of the package, command type, key length, value length, etc. The format is as follows:

Request Header Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| Magic | Opcode | Key length | +---------------+---------------+---------------+---------------+ 4| Extras length | Data type | Reserved | +---------------+---------------+---------------+---------------+ 8| Total body length | +---------------+---------------+---------------+---------------+ 12| Opaque | +---------------+---------------+---------------+---------------+ 16| CAS | | | +---------------+---------------+---------------+---------------+

Copy after login

Response Header Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| Magic | Opcode | Key Length | +---------------+---------------+---------------+---------------+ 4| Extras length | Data type | Status | +---------------+---------------+---------------+---------------+ 8| Total body length | +---------------+---------------+---------------+---------------+ 12| Opaque | +---------------+---------------+---------------+---------------+ 16| CAS | | | +---------------+---------------+---------------+---------------+

Copy after login

If you want to know the details of each part, You can checkout the code tree of the binary protocol of memcached and refer to the protocol_binary.txt document in the docs folder.

What’s striking about HEADER

My impression after seeing the HEADER format is that the upper limit of keys is too big! In the current memcached specification, the maximum key length is 250 bytes, but the key size in the binary protocol is expressed as 2 bytes. Therefore, in theory, a maximum length of 65536 bytes (2¹⁶) can be used. Although keys larger than 250 bytes are not very common, the release of the binary protocol will allow the use of huge keys.

Binary protocol is supported starting from the next version 1.3 series.

External engine support

I experimentally transformed the memcached storage layer into pluggable last year.

http://alpha.mixi.co.jp/blog/?p=129

After MySQL’s Brian Aker saw this transformation, he sent the code Go to the memcached mailing list. The developers of memcached were also very interested and put it in the roadmap. It is now developed (specification design, implementation and testing) by me and Trond Norbye, the developer of memcached. The time difference in collaborative development with foreign countries is a big problem, but with the same vision, we can finally announce the prototype of the scalable architecture. The code base can be accessed from memcached’s download page.

The necessity of external engine support

There are many derivatives of memcached in the world. The reason is that they want to permanently save data, achieve data redundancy, etc., even if they sacrifice some performance. Before I developed memcached, I also considered reinventing memcached in mixi's R&D department.

The loading mechanism of the external engine can encapsulate memcached's network functions, event processing and other complex processing. Therefore, the current difficulties in cooperating with memcached and storage engines through forced means or redesign will disappear, and it will be easy to try various engines.

The key to success in simple API design

What we attach most importance to in this project is API design. Too many functions will cause trouble for engine developers; if they are too complex, the threshold for implementing the engine will be too high. Therefore, the initial version of the interface function only has 13. The specific content is limited to space, so it is omitted here. It only explains the operations that the engine should complete:

Engine information (version, etc.)
Engine initialization
Engine shutdown
Engine statistics
In terms of capacity, test whether a given record can be saved
Allocate memory for the item (record) structure
Release item ( Record) memory
Delete record
Save record
Recycle record
Update record timestamp
Mathematical operation processing
Data flush

Readers who are interested in detailed specifications can checkout the code of the engine project and engine.h in the reader.

Re-examine the current system

The difficulty with memcached supporting external storage is that the network and event processing related code (core server) is closely related to the memory storage code. This phenomenon is also called tightly coupled. In-memory storage code must be separated from the core server to flexibly support external engines. Therefore, based on the API we designed, memcached was restructured into the following:

Comprehensive analysis of memcached – 3. memcached deletion mechanism and development direction_PHP tutorial

After refactoring, we compared the performance with version 1.2.5, binary protocol support version, etc., and confirmed that it does not There will be a performance impact.

When considering how to support external engine loading, it is easiest to let memcached perform concurrency control. However, for the engine, concurrency control is the essence of performance, so we adopted Multi-threading support is completely left to the design of the engine.

Future improvements will make memcached more widely used.

Summary

This article introduces the timeout principle of memcached, how to delete data internally, etc. On top of this, it also introduces the latest development direction of memcached such as binary protocol and external engine support. These features will not be supported until version 1.3, so stay tuned!

This is my last article in this series. Thank you all for reading my article!

Next time, Nagano will introduce the application knowledge and application compatibility of memcached.