The following is the third part of "Comprehensive Analysis of Memcached".
Published date: 2008/7/16
Author: Toru Maesaka
Original link: http://gihyo.jp/dev/feature/01/memcached/0003
The link to this series of articles is here:
Memcached is a cache, so the data will not be permanently saved on the server. This is the prerequisite for introducing memcached into the system. This article introduces the data deletion mechanism of memcached, as well as the latest development direction of memcached - Binary Protocol and external engine support.
As introduced last time, memcached will not release allocated memory. After the record times out, the client can no longer see the record (invisible, transparent), and its storage space can be reused.
Memcached does not internally monitor whether the record has expired. Instead, it checks the timestamp of the record when getting it to check whether the record has expired. This technique is called lazy expiration. Therefore, memcached does not consume CPU time on expiration monitoring.
memcached will give priority to the space of the timed-out records, but even so, there will be insufficient space when appending new records. This A mechanism called Least Recently Used (LRU) is used to allocate space. As the name suggests, this is a mechanism for deleting "least recently used" records. Therefore, when memcached's memory space is insufficient (new space cannot be obtained from slab class), it searches from recently unused records and allocates its space to new records. From a caching practical perspective, this model is ideal.
However, in some cases, the LRU mechanism can cause trouble. LRU can be disabled through the "-M" parameter when memcached is started, as shown below:
$ memcached -M -m 1024
It must be noted when starting that the lowercase "-m" option is used to specify the maximum memory size. If no specific value is specified, the default value of 64MB is used.
After starting with the "-M" parameter specified, memcached will return an error when the memory is exhausted. Having said that, memcached is not a memory after all, but a cache, so it is recommended to use LRU.
There are two big goals on the roadmap of memcached. One is the planning and implementation of the binary protocol, and the other is the loading function of the external engine.
The reason for using the binary protocol is that it does not require the parsing and processing of the text protocol, which improves the performance of the originally high-speed memcached and reduces the vulnerabilities of the text protocol. . It has been mostly implemented, and the code base for development already contains this feature. There is a link to the code library on the memcached download page.
The packet of the protocol is a 24-byte frame , followed by keys and unstructured data (Unstructured Data). The actual format is as follows (quoted from the protocol documentation):
Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0/ HEADER / / / / / / / +---------------+---------------+---------------+---------------+ 24/ COMMAND-SPECIFIC EXTRAS (as needed) / +/ (note length in th extras length header field) / +---------------+---------------+---------------+---------------+ m/ Key (as needed) / +/ (note length in key length header field) / +---------------+---------------+---------------+---------------+ n/ Value (as needed) / +/ (note length is total body length header field, minus / +/ sum of the extras and key length body fields) / +---------------+---------------+---------------+---------------+ Total 24 bytes
As shown above, the packet format is very simple. It should be noted that the 16-byte header (HEADER) is divided into two types: Request Header and Response Header. The header contains information such as Magic bytes indicating the validity of the package, command type, key length, value length, etc. The format is as follows:
Request Header Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| Magic | Opcode | Key length | +---------------+---------------+---------------+---------------+ 4| Extras length | Data type | Reserved | +---------------+---------------+---------------+---------------+ 8| Total body length | +---------------+---------------+---------------+---------------+ 12| Opaque | +---------------+---------------+---------------+---------------+ 16| CAS | | | +---------------+---------------+---------------+---------------+
Response Header Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| Magic | Opcode | Key Length | +---------------+---------------+---------------+---------------+ 4| Extras length | Data type | Status | +---------------+---------------+---------------+---------------+ 8| Total body length | +---------------+---------------+---------------+---------------+ 12| Opaque | +---------------+---------------+---------------+---------------+ 16| CAS | | | +---------------+---------------+---------------+---------------+
If you want to know the details of each part, You can checkout the code tree of the binary protocol of memcached and refer to the protocol_binary.txt document in the docs folder.
My impression after seeing the HEADER format is that the upper limit of keys is too big! In the current memcached specification, the maximum key length is 250 bytes, but the key size in the binary protocol is expressed as 2 bytes. Therefore, in theory, a maximum length of 65536 bytes (216) can be used. Although keys larger than 250 bytes are not very common, the release of the binary protocol will allow the use of huge keys.
Binary protocol is supported starting from the next version 1.3 series.
I experimentally transformed the memcached storage layer into pluggable last year.
After MySQL’s Brian Aker saw this transformation, he sent the code Go to the memcached mailing list. The developers of memcached were also very interested and put it in the roadmap. It is now developed (specification design, implementation and testing) by me and Trond Norbye, the developer of memcached. The time difference in collaborative development with foreign countries is a big problem, but with the same vision, we can finally announce the prototype of the scalable architecture. The code base can be accessed from memcached’s download page.
There are many derivatives of memcached in the world. The reason is that they want to permanently save data, achieve data redundancy, etc., even if they sacrifice some performance. Before I developed memcached, I also considered reinventing memcached in mixi's R&D department.
The loading mechanism of the external engine can encapsulate memcached's network functions, event processing and other complex processing. Therefore, the current difficulties in cooperating with memcached and storage engines through forced means or redesign will disappear, and it will be easy to try various engines.
What we attach most importance to in this project is API design. Too many functions will cause trouble for engine developers; if they are too complex, the threshold for implementing the engine will be too high. Therefore, the initial version of the interface function only has 13. The specific content is limited to space, so it is omitted here. It only explains the operations that the engine should complete:
Readers who are interested in detailed specifications can checkout the code of the engine project and engine.h in the reader.
The difficulty with memcached supporting external storage is that the network and event processing related code (core server) is closely related to the memory storage code. This phenomenon is also called tightly coupled. In-memory storage code must be separated from the core server to flexibly support external engines. Therefore, based on the API we designed, memcached was restructured into the following:
After refactoring, we compared the performance with version 1.2.5, binary protocol support version, etc., and confirmed that it does not There will be a performance impact.
When considering how to support external engine loading, it is easiest to let memcached perform concurrency control. However, for the engine, concurrency control is the essence of performance, so we adopted Multi-threading support is completely left to the design of the engine.
Future improvements will make memcached more widely used.
This article introduces the timeout principle of memcached, how to delete data internally, etc. On top of this, it also introduces the latest development direction of memcached such as binary protocol and external engine support. These features will not be supported until version 1.3, so stay tuned!
This is my last article in this series. Thank you all for reading my article!
Next time, Nagano will introduce the application knowledge and application compatibility of memcached.