Introduction to Linux memory management

On Linux, when you check the memory usage of the system or of a process with commands such as top, vmstat, and free, you often see terms like buff/cache, swap, and avail Mem. What do they mean? This article walks through memory management under Linux and answers that question.

Discussing memory management under Linux really means discussing how Linux implements virtual memory. I am not a kernel expert, so this article only introduces concepts and does not go into implementation details. Some descriptions may not be accurate.

In the early days, physical memory was limited, and people wanted programs to be able to use more memory than was physically installed, so the concept of virtual memory appeared. Over time, however, the meaning of virtual memory has grown far beyond that original idea.

1. Virtual memory

Virtual memory is the technique Linux uses to manage memory. It makes each application believe it has its own independent, contiguous address space. In reality, that space is usually mapped onto many separate pieces of physical memory, and some parts are kept temporarily on external disk storage and loaded into memory only when needed.

The size of the virtual address space available to each process depends on the CPU's word size. On a 32-bit system it is 2^32 bytes = 4 GiB. On a 64-bit system it is nominally 2^64 bytes = 16 EiB, although current x86-64 CPUs implement only 48 of those address bits (57 with five-level paging). The actual physical memory may be far smaller than the virtual address space.
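To make the sizes concrete, here is a small sketch that converts a number of address bits into a human-readable size. The function name `address_space_size` is made up for this illustration, and the 48-bit case is included on the assumption that it matches common x86-64 hardware.

```python
# Size of a flat address space for a given number of address bits.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def address_space_size(bits):
    """Return 2**bits bytes as a human-readable string."""
    size = 2 ** bits
    unit = 0
    # Divide by 1024 until the value fits the largest suitable unit.
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"

for bits in (32, 48, 64):
    print(f"{bits}-bit: {address_space_size(bits)}")
```

The loop prints 4 GiB, 256 TiB, and 16 EiB for 32, 48, and 64 bits respectively.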

Virtual addresses are tied to processes. The same virtual address in different processes does not necessarily point to the same physical address, so it makes no sense to talk about a virtual address without reference to a process.

Note: Many articles on the Internet equate virtual memory with swap space. That description is not rigorous: swap space is only one part of the larger virtual memory picture.

2. The relationship between virtual memory and physical memory

The following diagram shows the relationship between them intuitively:

Process X                                                                  Process Y
+-------+                                                                  +-------+
| VPFN7 |--+                                                               | VPFN7 |
+-------+  |      Process X's                             Process Y's      +-------+
| VPFN6 |  |      Page Table                              Page Table     +-| VPFN6 |
+-------+  |      +------+                                +------+       | +-------+
| VPFN5 |  +----->| .... |---+                    +-------| .... |<---+  | | VPFN5 |
+-------+         +------+   |        +------+    |       +------+    |  | +-------+
| VPFN4 |    +--->| .... |---+-+      | PFN4 |    |       | .... |    |  | | VPFN4 |
+-------+    |    +------+   | |      +------+    |       +------+    |  | +-------+
| VPFN3 |--+ |    | .... |   | | +--->| PFN3 |<---+  +----| .... |<---+--+ | VPFN3 |
+-------+  | |    +------+   | | |    +------+       |    +------+    |    +-------+
| VPFN2 |  +-+--->| .... |---+-+-+    | PFN2 |<------+    | .... |    |    | VPFN2 |
+-------+    |    +------+   | |      +------+            +------+    |    +-------+
| VPFN1 |    |               | +----->| PFN1 |                        +----| VPFN1 |
+-------+    |               |        +------+                             +-------+
| VPFN0 |----+               +------->| PFN0 |                             | VPFN0 |
+-------+                             +------+                             +-------+
Virtual memory                    Physical memory                     Virtual memory


PFN (page frame number): the number of a physical page. VPFN (virtual page frame number): the number of a virtual page.

When a process runs a program, the CPU must first fetch the program's instructions from memory and then execute them. Instructions are fetched using virtual addresses, which are fixed when the program is linked (the address ranges of dynamic libraries are adjusted when the kernel loads and initializes the process). To get the actual data, the CPU must translate the virtual address into a physical address. The translation uses the process's page table, and the contents of the page table are maintained by the operating system.

Note: On most architectures, including x86, kernel code also runs with virtual addresses. Most of physical memory is mapped into the kernel's address space at a fixed offset, so converting between a kernel virtual address and a physical address is simple arithmetic rather than a per-process lookup.

To make translation easier, Linux splits both virtual memory and physical memory into fixed-size pages. On x86 systems the usual page size is 4K. Each page is assigned a unique number, the page frame number (PFN).
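With fixed-size pages, a virtual address splits cleanly into a page number and an offset inside the page. The sketch below assumes 4 KiB pages; `split_address` is a name invented for this example.

```python
PAGE_SIZE = 4096                         # 4 KiB pages, assumed for this sketch
PAGE_SHIFT = PAGE_SIZE.bit_length() - 1  # 12 bits of offset for 4 KiB pages

def split_address(vaddr):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)

vpn, offset = split_address(0x2194)
print(vpn, hex(offset))   # page 2, offset 0x194
```

Only the page number has to be translated. The offset is carried over unchanged into the physical address.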

As the diagram above shows, virtual pages are mapped to physical pages through the page table. Processes X and Y have independent virtual memory and independent page tables, but they share physical memory. A process can access its own virtual address space freely, while page tables and physical memory are maintained by the kernel. When a process accesses memory, the CPU translates the virtual address into a physical address using the process's page table and then performs the access.

Note: Not every page in the virtual address space has an entry in the page table. An entry is added only after the virtual address has been assigned to the process, that is, after the process calls a function such as malloc. If the process accesses a virtual address that has no entry in the page table, the system sends it a SIGSEGV signal, which causes the process to exit. This is why dereferencing a wild pointer so often ends in a segmentation fault. In other words, although each process has a 4 GiB virtual address space (on a 32-bit system), it can use only the ranges it has requested from the system, and accessing an unallocated range produces a segmentation fault. By default Linux maps nothing at virtual address 0, so dereferencing a null pointer results in a segmentation fault.
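The lookup and the failure case can be sketched with a toy model. The dictionaries below follow the arrows in the diagram above, with each page table reduced to a plain mapping from virtual page number to physical page number. `SegmentationFault` and `translate` are hypothetical names, and a real page table is a multi-level hardware structure, not a dictionary.

```python
PAGE_SIZE = 4096  # assumed page size

class SegmentationFault(Exception):
    """Stands in for the SIGSEGV signal the kernel would deliver."""

# Toy per-process page tables: virtual page number -> physical page number.
PAGE_TABLES = {
    "X": {0: 1, 3: 3, 7: 0},
    "Y": {1: 3, 6: 2},
}

def translate(process, vaddr):
    """Translate a virtual address of the given process to a physical address."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    table = PAGE_TABLES[process]
    if vpn not in table:
        raise SegmentationFault(f"process {process}: no mapping for page {vpn}")
    return table[vpn] * PAGE_SIZE + offset

# Different virtual addresses in X and Y reach the same physical address.
print(hex(translate("X", 3 * PAGE_SIZE + 0x10)))
print(hex(translate("Y", 1 * PAGE_SIZE + 0x10)))

# Page 0 is unmapped in process Y, so this access fails.
try:
    translate("Y", 0)
except SegmentationFault as err:
    print("fault:", err)
```

Both successful calls print 0x3010, because VPFN3 of process X and VPFN1 of process Y both point at PFN3.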

3. Advantages of virtual memory

● Larger address space: the space is also contiguous, which makes programs simpler to write and link

● Process isolation: the virtual addresses of different processes are unrelated, so the operations of one process do not affect other processes

● Data protection: each region of virtual memory has read, write, and execute attributes, which can protect a program's code segments from being modified and its data blocks from being executed, increasing the security of the system

● Memory mapping: with virtual memory, files on disk (executables or dynamic libraries) can be mapped directly into the virtual address space. This allows physical memory to be allocated lazily: file contents are loaded from disk only when they actually need to be read, and when memory is tight these pages can be dropped to make better use of physical memory, all transparently to the application

● Shared memory: a dynamic library, for example, needs only one copy in memory, which is then mapped into the virtual address spaces of different processes, each of which feels that it has the file to itself. Memory sharing between processes can also be achieved by mapping the same physical memory into the virtual address spaces of different processes

● Physical memory management: the physical address space is managed entirely by the operating system, and processes cannot allocate or reclaim it directly, so the system can make better use of memory and balance the memory demands of different processes

● Others: with a virtual address space, features such as swap space and COW (copy on write) are easy to implement

4. Page table

The page table can be thought of as a list of memory mappings (the actual structure is far more complex). Each memory mapping in it maps a virtual address to a specific resource (physical memory or external storage). Each process has its own page table, which is unrelated to the page tables of other processes.

5. Memory mapping

Each memory mapping describes one section of virtual memory: the starting virtual address and length, the permissions (whether the data in this memory can be read, written, or executed), and the associated resource (physical memory pages, pages in swap space, file contents on disk, and so on).

When a process requests memory, the system returns a virtual address, creates a memory mapping for the corresponding virtual memory, and puts it into the page table. It does not necessarily allocate physical memory at that point. Usually physical memory is allocated and attached to the memory mapping only when the process actually accesses the memory. This is known as delayed (on-demand) allocation.

Each memory mapping carries a tag indicating the type of the associated physical resource. There are two broad categories, anonymous and file backed, each with more specific subtypes: shared and copy on write under anonymous, and device backed under file backed. Each type is described below.

file backed

This type means the physical resource behind the memory mapping is stored in a file on disk. The mapping records the file's location, the offset, the rwx permissions, and so on.

When the process accesses the corresponding virtual page for the first time, no physical memory is found in the memory mapping, so the CPU raises a page fault. The operating system handles the fault by loading the file contents into physical memory and then updating the memory mapping, so that the CPU can access the virtual address the next time. Data loaded this way is generally placed in the page cache, which is introduced later.

Executable files and dynamic libraries are generally mapped into a process's virtual address space in this way.

device backed

Similar to file backed, except that the backing store is a range of physical addresses on the disk rather than a file. For example, when physical memory is swapped out, its mapping is marked as device backed.

anonymous

The program's own data segment and stack, as well as shared memory allocated through mmap, have no corresponding file on disk, so these pages are called anonymous pages. The biggest difference between anonymous and file backed pages shows up when memory is tight: the system can simply drop the physical memory behind an unmodified file backed page, because it can be reloaded from disk the next time it is needed, whereas an anonymous page cannot be dropped and can only be swapped out.

shared

Multiple memory mappings in the page tables of different processes can point to the same physical address, so the same content can be accessed through virtual addresses that may differ from process to process. When one process modifies the memory, another process can read the change immediately. This is generally used for fast data sharing between processes (for example, through mmap). When a memory mapping marked as shared is deleted, the reference count on the physical page must be updated so that the page can be reclaimed once the count drops to 0.
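A rough way to see shared mappings in action is Python's standard mmap module. The sketch below is a simplification: it creates two shared mappings of one temporary file inside a single process, whereas real use would normally involve two processes. The file and variable names are invented for the example.

```python
import mmap
import os
import tempfile

# Create a one-page temporary file to act as the shared backing store.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * mmap.PAGESIZE)
    path = f.name

try:
    with open(path, "r+b") as f1, open(path, "r+b") as f2:
        # Two independent shared mappings of the same file.
        view_a = mmap.mmap(f1.fileno(), mmap.PAGESIZE, access=mmap.ACCESS_WRITE)
        view_b = mmap.mmap(f2.fileno(), mmap.PAGESIZE, access=mmap.ACCESS_WRITE)
        view_a[0:5] = b"hello"           # write through the first mapping
        seen_by_b = bytes(view_b[0:5])   # read through the second mapping
        print(seen_by_b)
        view_a.close()
        view_b.close()
finally:
    os.remove(path)
```

The write made through `view_a` is visible through `view_b` without any explicit copy, because both mappings are backed by the same physical pages.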

copy on write

Copy on write builds on the shared technique. Reading this type of memory needs no special handling, but when a process writes to it, the system allocates a new piece of memory, copies the data from the original memory into it, attaches the new memory to the corresponding memory mapping, and only then performs the write. Many Linux features rely on copy on write for performance, fork being one example.
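Copy-on-write behaviour can be observed from user space with a private file mapping. This sketch assumes Python's mmap module, where `ACCESS_COPY` requests a private, copy-on-write mapping. The file contents and variable names are made up for the illustration.

```python
import mmap
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"original")
    path = f.name

try:
    with open(path, "r+b") as f:
        # ACCESS_COPY asks for a private, copy-on-write mapping of the file.
        private = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
        private[0:8] = b"modified"        # the write lands in a private copy
        in_mapping = bytes(private[0:8])
        private.close()
    with open(path, "rb") as f:
        on_disk = f.read()                # the file itself is untouched
    print(in_mapping, on_disk)
finally:
    os.remove(path)
```

The mapping shows the modified bytes while the file on disk still holds the original ones, which is the same mechanism that lets fork share pages between parent and child until one of them writes.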

Through the above introduction, we can simply summarize the memory usage process as follows:

1. The process sends a memory allocation request to the system

2. The system checks whether the process's virtual address space is used up; if there is space left, it assigns a virtual address to the process

3. The system creates the corresponding memory mapping (possibly several) for this virtual address and puts it into the process's page table

4. The system returns the virtual address to the process, and the process begins to access it

5. The CPU finds the corresponding memory mapping in the process's page table based on the virtual address, but the mapping is not associated with physical memory, so a page fault occurs

6. After the operating system receives the page fault, it allocates real physical memory and associates it with the corresponding memory mapping

7. After the fault has been handled, the CPU can access the memory

Of course, a page fault does not happen every time. It occurs only when the system decides it is worth delaying the allocation; in many cases the system allocates real physical memory in step 3 and associates it with the memory mapping right away.
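The steps above can be mimicked with a small simulation. This is a toy model only: `ToyProcess` and its methods are hypothetical names, and it always delays allocation, while a real kernel decides case by case.

```python
class ToyProcess:
    """Virtual pages are reserved first; physical frames are attached on first access."""

    def __init__(self):
        self.mappings = {}     # virtual page -> physical frame, or None if not yet backed
        self.next_frame = 0    # next free physical frame in this toy model
        self.page_faults = 0

    def allocate(self, first_page, count):
        # Steps 1-4: reserve virtual pages without touching physical memory.
        for page in range(first_page, first_page + count):
            self.mappings[page] = None

    def access(self, page):
        if page not in self.mappings:
            raise MemoryError(f"page {page} was never allocated")
        if self.mappings[page] is None:
            # Steps 5-6: page fault, the "kernel" attaches a physical frame.
            self.page_faults += 1
            self.mappings[page] = self.next_frame
            self.next_frame += 1
        # Step 7: the access proceeds.
        return self.mappings[page]

proc = ToyProcess()
proc.allocate(first_page=10, count=4)
proc.access(10)
proc.access(10)   # second access to the same page: no new fault
proc.access(12)
print(proc.page_faults, "faults,", proc.next_frame, "of 4 pages backed")
```

Four pages were reserved but only the two that were touched consumed physical frames, which is the saving that delayed allocation is after.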

6. Other concepts

As long as the operating system implements the mapping between virtual memory and physical memory, it can work correctly. Making memory access efficient, however, involves many more considerations. Here are some other memory-related concepts and what they do.

MMU (Memory Management Unit)

The MMU is a module of the CPU that translates a process's virtual addresses into physical addresses. Put simply, its inputs are the process's page table and a virtual address, and its output is a physical address. The speed of this translation directly affects the speed of the whole system, so the CPU includes this module to accelerate it.

TLB (Translation Lookaside Buffer)

As described above, the MMU takes the page table as input, and the page table is stored in memory. Compared with the CPU's caches, memory is very slow, so to speed up translation further, CPUs include a TLB: a small hardware cache inside the CPU that holds recently resolved virtual-to-physical mappings. The TLB is checked before each translation, and if the mapping is already there, the page table in memory does not have to be consulted.
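The effect of caching translations can be sketched as follows. `ToyMMU` is a hypothetical class. Its TLB is an unbounded dictionary, whereas a hardware TLB has a small fixed number of entries and must evict old ones, so treat this as an illustration of the idea only.

```python
PAGE_SIZE = 4096  # assumed page size

class ToyMMU:
    def __init__(self, page_table):
        self.page_table = page_table   # virtual page -> physical page
        self.tlb = {}                  # cache of already resolved translations
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.hits += 1
        else:
            # Slow path: look the page up in the page table, then cache it.
            self.misses += 1
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn] * PAGE_SIZE + offset

mmu = ToyMMU({0: 5, 1: 9})
for addr in (0, 8, 16, 4096, 4100):
    mmu.translate(addr)
print(mmu.hits, "hits,", mmu.misses, "misses")
```

Five accesses touch only two distinct pages, so only two of them take the slow path.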

Allocate physical pages on demand

Since physical memory is in practice much smaller than virtual memory, the operating system must allocate it carefully to get the most out of it. One way to save physical memory is to load only the data for the virtual pages currently in use. For example, in a large database program, if you only run queries, there is no need to load the code responsible for inserts, deletes, and so on. This can save a lot of physical memory. This approach is called on-demand allocation of physical pages, also known as delayed loading.

The principle is simple. When the CPU accesses a virtual page whose data has not yet been loaded into physical memory, it notifies the operating system that a page fault has occurred, and the operating system loads the data into physical memory. Because loading data is time-consuming, the kernel does not leave the CPU idle but schedules other processes in the meantime; by the time the faulting process is scheduled again, the data is in physical memory.

Linux mainly uses this method to load executable files and dynamic libraries. When a program is scheduled for execution, the kernel maps the process's executable file and dynamic libraries into its virtual address space and loads only the small portion that will be used immediately into physical memory. The rest is loaded only when the CPU accesses it.

Swap space

When a process needs to load data into physical memory but physical memory is already used up, the operating system has to reclaim some pages of physical memory to satisfy the current process.

For file backed data, that is, data in physical memory that comes from a file on disk, the kernel simply removes the data from memory to free up space, and loads it from disk again the next time a process accesses it. If the data has been modified and not yet written back to the file, however, it is dirty and cannot simply be dropped. Dirty pages of a shared mapping must first be written back to the file. Pages of a private mapping (a disk file mapped with mmap and MAP_PRIVATE) are a special case: they are file backed before modification, but once modified they no longer correspond to the file, become anonymous, and can only be moved to swap space. (Executable files and dynamic library files are normally not modified.)

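Anonymous data has no corresponding file on disk, so it cannot simply be dropped; instead the system moves it to swap space. Swap space is a special area reserved on disk that the system uses to temporarily hold infrequently accessed memory data. The next time a process accesses data that is in swap space, the system loads it back into memory. Because swap space is on disk, access is much slower than memory, and frequent reads and writes to swap cause performance problems.

For a detailed introduction to swap space, see the article "Linux swap space".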

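Shared memory

With virtual memory, sharing memory between processes becomes very convenient. All memory accesses by a process go through virtual addresses, and each process has its own page tables. When two processes share a piece of physical memory, it is enough to map the physical page number into both processes' page tables; the two processes can then access the same physical memory through different virtual addresses.

In the diagram above, processes X and Y share physical page PFN3. In process X, PFN3 is mapped to VPFN3, while in process Y it is mapped to VPFN1, yet the physical memory the two processes reach through these different virtual addresses is the same.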

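Access control

Every virtual-to-physical mapping record (memory mapping) in the page table contains control information. When a process accesses a piece of virtual memory, the system uses this information to check whether the operation is legal.

Why is this check needed? Some memory holds the program's executable code, which should not be modified. Some holds the data the program uses at run time, which may be read and written but should not be executed. Some holds kernel code, which must not be executed from user mode. These checks greatly strengthen the security of the system.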
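As a minimal sketch of such a check, the example below attaches permission flags to two made-up regions and tests a requested operation against them. `Perm`, `MAPPINGS`, and `check_access` are hypothetical names. In a real system the check is performed by the hardware using bits stored in the page table entries.

```python
from enum import Flag, auto

class Perm(Flag):
    READ = auto()
    WRITE = auto()
    EXEC = auto()

# Hypothetical mappings of one process: region name -> allowed operations.
MAPPINGS = {
    "code": Perm.READ | Perm.EXEC,    # executable code: must not be modified
    "data": Perm.READ | Perm.WRITE,   # run-time data: must not be executed
}

def check_access(region, wanted):
    """Return True if every requested operation is allowed on the region."""
    return (MAPPINGS[region] & wanted) == wanted

print(check_access("code", Perm.WRITE))              # False
print(check_access("data", Perm.READ | Perm.WRITE))  # True
print(check_access("data", Perm.EXEC))               # False
```

An access that fails this kind of check on real hardware triggers a fault that the kernel typically turns into SIGSEGV.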

huge pages

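Because the CPU's cache space is limited, the TLB can hold only a limited number of entries. With huge pages, each page is larger (for example, 2M or 4M instead of 4K, depending on the architecture), so although the number of TLB entries stays the same, those entries cover a larger range of addresses. The same TLB therefore caches mappings for more memory, which reduces the number of page table lookups and speeds up the translation of virtual addresses to physical addresses.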
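The gain is simple arithmetic: the address range a TLB can cover is the number of entries multiplied by the page size. The sketch below assumes a hypothetical 64-entry TLB and compares 4 KiB pages with 2 MiB pages, the common huge page size on x86-64. The function name `tlb_reach` is chosen for this example.

```python
def tlb_reach(entries, page_size):
    """Bytes of address space covered by a TLB with the given entry count."""
    return entries * page_size

KIB, MIB = 1024, 1024 * 1024
ENTRIES = 64   # hypothetical TLB size, for illustration only

small = tlb_reach(ENTRIES, 4 * KIB)
huge = tlb_reach(ENTRIES, 2 * MIB)
print(small // KIB, "KiB with 4 KiB pages")
print(huge // MIB, "MiB with 2 MiB pages")
print("ratio:", huge // small)
```

With the same 64 entries, coverage grows from 256 KiB to 128 MiB, a factor of 512.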

Caches

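To improve system performance, Linux uses several caches related to memory management and tries to devote free memory to them. These caches are shared system-wide:

Buffer Cache
Buffers data from block devices such as disks. When a block device is read or written, the system stores the data in this cache so that the next access can be served directly from the cache, which improves efficiency. Its data structure maps a block device ID and block number to the data, so the data can be found from those two values.
Page Cache
Mainly speeds up reading and writing files on disk. Its data structure maps a file ID and offset to file contents, so the data can be found from those two values (the file ID may be the inode or the path; I have not looked into this closely).

As these definitions show, the page cache and the buffer cache overlap. In practice, however, the buffer cache holds only what the page cache does not, such as the metadata of files on disk. So compared with the page cache, the size of the buffer cache is usually negligible.

Caches have drawbacks too: maintaining them costs time and space, and if a cache goes wrong, the whole system goes down.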

7. Summary

With the knowledge above, let's return to the question raised at the beginning, using the output of the top command as an example:

KiB Mem :   500192 total,   349264 free,    36328 used,   114600 buff/cache
KiB Swap:   524284 total,   524284 free,        0 used.   433732 avail Mem

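KiB Mem is physical memory and KiB Swap is swap space; both are measured in KiB.

total, used, and free need little explanation: how much there is in total, how much is used, and how much is left.

buff/cache is the total used by buff and cache. buff is the space taken by the buffer cache; because it mainly caches the metadata of files on disk, it is usually small and negligible next to cache. cache is the sum of the page cache and a few other caches that are small and fairly fixed in size, so cache is roughly equal to the page cache. The exact size of the page cache is the Cached value in /proc/meminfo. Because the page cache holds the contents of files on disk, it takes up a lot of space, and Linux generally uses as much free physical memory as possible for it.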

avail Mem is the amount of physical memory available to satisfy a process's next allocation. It is generally somewhat larger than free, because besides free memory, the system can also release some memory immediately, for example from the page cache.
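The same figures can be read programmatically from /proc/meminfo. The sketch below parses text in that format. The helper names `parse_meminfo` and `summarize` are invented here, and the sample text is illustrative only: its totals match the top output above, but the split between Buffers and Cached is made up. Note also that top's buff/cache figure includes some reclaimable kernel memory, so it will not exactly equal Buffers plus Cached.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in KiB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

# Illustrative sample only: the Buffers/Cached split is invented.
SAMPLE = """\
MemTotal:         500192 kB
MemFree:          349264 kB
MemAvailable:     433732 kB
Buffers:            9000 kB
Cached:           100000 kB
"""

def summarize(info):
    """Pick out the values discussed in this section."""
    return {
        "free": info["MemFree"],
        "available": info["MemAvailable"],
        "buffers+cached": info["Buffers"] + info["Cached"],
    }

print(summarize(parse_meminfo(SAMPLE)))
```

On a Linux machine, the live values can be obtained by passing the contents of /proc/meminfo to `parse_meminfo` instead of `SAMPLE`.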

So how do you tell whether the current memory usage is abnormal? Here are a few points for reference:

● Mem free is relatively small, and buff/cache is also small.
A small free value alone does not necessarily indicate a problem, because Linux uses as much memory as it can for the page cache. But if buff/cache is small as well, memory is tight and the system does not have enough left for caching. If the server runs an application that reads and writes the disk heavily, such as an FTP server, the performance impact will be significant.

● Swap used is relatively large.
This is more serious than the previous case. Normally swap should be used very little, and a large used value means swap space is in heavy use. If vmstat also shows frequent swap in/out, system memory is seriously insufficient and overall performance is already badly affected.
