Start with lsof and gain an in-depth understanding of the Linux virtual file system

Release: 2023-08-04 16:15:49

Background

Sometimes a disk reports that it is full, yet a check of how much space individual files occupy shows that plenty of space should still be free.
1. Execute the df command to check the disk usage and find that the disk is full.
-bash-4.2$ df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/vda1      ext4       30G   30G     0 100% /
devtmpfs       devtmpfs  489M     0  489M   0% /dev
tmpfs          tmpfs     497M     0  497M   0% /dev/shm
tmpfs          tmpfs     497M   50M  447M  11% /run
tmpfs          tmpfs     497M     0  497M   0% /sys/fs/cgroup


2. Run the du command to check the disk usage of each directory. Adding up the sizes of the files in each directory shows that the disk is far from full: more than 10 GB of space is unaccounted for.


-bash-4.2$ du -h --max-depth=1 /home
16M     /home/logs
11G     /home/serverdog
11G     /home
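The two views can be compared with a minimal Python sketch. It assumes a POSIX system, and the helper names used_by_filesystem and used_by_walk are illustrative only. The first asks the file system for its own accounting (the df view); the second sums what is reachable through directory entries (roughly the du view, ignoring the blocks used by directories themselves).

```python
import os
import shutil


def used_by_filesystem(path):
    """Bytes the file system itself reports as used (the df view)."""
    return shutil.disk_usage(path).used


def used_by_walk(path):
    """Bytes reachable by walking directory entries (roughly the du view)."""
    total = 0
    seen = set()  # count each inode once, even if it has several hard links
    for root, dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is not accessible
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_blocks * 512  # st_blocks is in 512-byte units
    return total
```

A file that has been deleted but is still open is counted by the first function and invisible to the second, which is one way the two numbers can drift apart.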


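3. Why does this happen?
Although the files have been deleted, some processes still have them open, so the disk space they occupy has not been released. Running the lsof command shows the files that are deleted but still open. Restart the offending process (or empty the file) and the disk space is released.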
-bash-4.2# lsof | grep delete
mysqld  2470 mysql   4u  REG 253,1 0 523577 /var/tmp/ibfTeQFn (deleted)
mysqld  2470 mysql   5u  REG 253,1 0 523579 /var/tmp/ibaHcIdW (deleted)
mysqld  2470 mysql   6u  REG 253,1 0 523581 /var/tmp/ibLjiALu (deleted)
mysqld  2470 mysql   7u  REG 253,1 0 523585 /var/tmp/ibCFnzTB (deleted)
mysqld  2470 mysql  11u  REG 253,1 0 523587 /var/tmp/ibCjuqva (deleted)
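The behavior is easy to reproduce on a small scale. The following is a minimal sketch, assuming a POSIX system and a local file system for the temporary directory; the function name unlink_while_open is made up for this illustration. It deletes a file that is still open and reports what the kernel keeps around.

```python
import os
import tempfile


def unlink_while_open(payload=b"x" * 4096):
    """Delete a file that is still open and report what remains reachable."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                   # remove the only directory entry
        st = os.fstat(fd)                 # the inode is still reachable via fd
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return {
            "name_exists": os.path.exists(path),
            "link_count": st.st_nlink,
            "size": st.st_size,
            "still_readable": data == payload,
        }
    finally:
        os.close(fd)                      # last reference dropped: space can be freed
```

The name disappears at once, but the data stays readable and keeps occupying disk space until the descriptor is closed.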

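So why is the Linux file system designed this way? Working that out is not trivial, so the sections below start from a few basic concepts and go through them step by step:
  • What is the virtual file system (VFS)?

  • What is the common file model?

    • Superblock object

    • Inode object

    • File object

    • Dentry object

    • The concept of a file

  • Representation of a file

    • In-memory representation

    • On-disk representation

  • Building the directory tree

    • Soft link vs hard link

  • File & disk management

    • Inode states

  • File & process management

    • Operations: open & delete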

Virtual file system

The figure below shows the basic components responsible for file management in the Linux operating system. The upper half is user mode and the lower half is kernel mode. Applications use the standard library libc to access files, and the library maps requests to system calls in order to enter kernel mode.

[Figure: components involved in file management on Linux, with user mode above and kernel mode below]

The entry point for all file-related operations is the virtual file system (VFS), not a specific file system (such as Ext3, ReiserFS, or NFS). VFS provides the interface between system libraries and specific file systems. It is more than an abstraction layer: it provides a basic file system implementation that the specific file systems can use and extend. To understand how file systems work, you must therefore first understand VFS.

Common file model

The main idea of VFS is to introduce a common file model, which consists of the following object types:

Superblock object

Memory: created when the file system is mounted; stores information about the file system
Disk: corresponds to the filesystem control block stored on disk

Inode object (index node)

Memory: created when the file is accessed; stores general information about a specific file (inode structure)
Disk: corresponds to the file control block stored on disk
Each inode object has an inode number that uniquely identifies the file within the file system

File object

Memory: created when a file is opened; stores information about the interaction between the open file and the process (file structure)
Open-file information exists in kernel memory only while the process is accessing the file.

Directory entry object (dentry object)

Memory: once a directory entry is read into memory, VFS converts it into a directory entry object (dentry structure)
Disk: each specific file system stores directory entries on disk in its own way
Stores the information that links a directory entry (that is, a file name) to the corresponding file
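These kernel objects are not directly visible to applications, but each has a user-space counterpart. The sketch below is a rough mapping under that assumption, written for a POSIX system; describe and its result keys are illustrative names, not kernel or library terms.

```python
import os


def describe(path):
    """Collect user-space views of the four VFS object types for `path`."""
    vfs = os.statvfs(path)            # superblock-level data: block size, free blocks
    st = os.stat(path)                # inode-level data: number, link count, size
    fd = os.open(path, os.O_RDONLY)   # a file object exists while this fd is open
    try:
        offset = os.lseek(fd, 0, os.SEEK_CUR)  # per-open state, kept in the file object
    finally:
        os.close(fd)
    parent = os.path.dirname(os.path.abspath(path))
    with os.scandir(parent) as entries:         # directory entries: the names in parent
        names = {entry.name for entry in entries}
    return {
        "block_size": vfs.f_bsize,
        "inode": st.st_ino,
        "links": st.st_nlink,
        "offset": offset,
        "listed_in_parent": os.path.basename(path) in names,
    }
```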

Directory tree

In short, the Linux root file system is the first file system the kernel mounts at startup. The kernel image is stored in the root file system, and once the root file system is mounted, the boot process loads some basic initialization scripts and services into memory and runs them (the file system and the kernel are two completely independent parts). Other file systems are then mounted, by scripts or commands, as child file systems on directories of file systems that are already mounted, eventually forming the complete directory tree.

start_kernel
  vfs_caches_init
    mnt_init
      init_rootfs      // register the rootfs file system
      init_mount_tree  // mount the rootfs file system
  …
  rest_init
    kernel_thread(kernel_init, NULL, CLONE_FS);

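For a single file system, the superblock object is created when the file system is mounted. When a file is looked up along the tree, the search always starts in the initial directory: find the matching directory entry there to obtain the corresponding inode, read the directory file of that inode and convert it into dentry objects, then check again for the matching directory entry. This process repeats until the inode of the target file is found and its inode object is created.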
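The lookup loop can be imitated from user space. The following is a simplified sketch, not the kernel's algorithm: lookup is a hypothetical helper, and symbolic links, "." and "..", and mount points are deliberately left out.

```python
import os


def lookup(path, root="/"):
    """Resolve `path` one component at a time, starting at `root`.

    Returns the (name, inode number) pairs visited along the way.
    """
    current = root
    visited = [(root, os.lstat(root).st_ino)]
    for name in (part for part in path.split("/") if part):
        with os.scandir(current) as entries:      # read the directory file
            for entry in entries:                 # check each directory entry
                if entry.name == name:
                    ino = entry.stat(follow_symlinks=False).st_ino
                    current = entry.path          # descend into the match
                    break
            else:
                raise FileNotFoundError(name)
        visited.append((name, ino))
    return visited
```

For example, lookup("etc/hostname") would visit "/", then "etc", then "hostname", assuming that file exists on the machine.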

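Soft link vs hard link

A soft link is an ordinary file whose content is the path name of another file. Hard links, by contrast, point to the same inode; the number of hard links is recorded in the i_nlink field of the inode object. When i_nlink is zero, no hard link points to the file.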
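The difference shows up directly in inode numbers and link counts. The following is a small sketch, assuming a POSIX file system that supports both link types; link_demo and the file names are illustrative.

```python
import os
import tempfile


def link_demo():
    """Create a file plus a hard link and a soft link, then compare inodes."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data")
        hard = os.path.join(d, "hard")
        soft = os.path.join(d, "soft")
        with open(target, "w") as f:
            f.write("hello")
        os.link(target, hard)        # second directory entry, same inode
        os.symlink(target, soft)     # new inode whose content is a path name
        t, h, s = os.lstat(target), os.lstat(hard), os.lstat(soft)
        result = {
            "hard_same_inode": h.st_ino == t.st_ino,
            "soft_same_inode": s.st_ino == t.st_ino,
            "nlink_with_hard_link": t.st_nlink,
            "soft_points_to_target": os.readlink(soft) == target,
        }
        os.unlink(hard)              # drop one name again
        result["nlink_after_unlink"] = os.lstat(target).st_nlink
        return result
```

The link count reported by lstat is the user-space view of i_nlink: it rises with each hard link and falls with each unlink, while the soft link never affects it.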

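File & process management

The figure below is a simple example of how processes interact with files. Three different processes open the same file, and each process has its own file object. Two of the processes use the same hard link (each hard link corresponds to one dentry object), and both dentry objects point to the same inode object.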

[Figure: three processes, each with its own file object, reaching one shared inode object through two dentry objects]
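The "one file object per open" rule can be observed even inside a single process, since every open() call creates a new file object while dup() only adds a second descriptor for an existing one. The sketch below relies on that; it uses one process instead of three for brevity, and offsets_demo is an illustrative name.

```python
import os
import tempfile


def offsets_demo():
    """Show that each open() gets its own file object, while dup() shares one."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        a = os.open(path, os.O_RDONLY)   # new file object
        b = os.open(path, os.O_RDONLY)   # another new file object
        c = os.dup(a)                    # second descriptor for a's file object
        try:
            os.read(a, 4)                # advances the offset shared by a and c
            return {
                "same_inode": os.fstat(a).st_ino == os.fstat(b).st_ino,
                "offset_a": os.lseek(a, 0, os.SEEK_CUR),
                "offset_b": os.lseek(b, 0, os.SEEK_CUR),
                "offset_c": os.lseek(c, 0, os.SEEK_CUR),
            }
        finally:
            for descriptor in (a, b, c):
                os.close(descriptor)
    finally:
        os.close(fd)
        os.unlink(path)
```

Both opens reach the same inode, yet only the descriptors that share a file object see the read position move.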

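An inode's data consists of two parts: in-memory data and on-disk data. Linux uses write-back as the consistency strategy for inode data. An inode is loaded into memory only when the file is opened; when no process uses it any longer, it is evicted from memory; if it was updated in the meantime, the data must be written back to disk.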
* "in_use" - valid inode, i_count > 0, i_nlink > 0
* "dirty"  - as "in_use" but also dirty
* "unused" - valid inode, i_count = 0

Whether an inode is still in use is tracked through the open() and close() operations, which create and destroy file objects. File objects update the inode's i_count field through iget and iput to maintain the usage count: open increases i_count by one, and close decreases it by one. On close, the kernel checks whether the inode can be released. If i_count = 0, no process references it any longer and it is released from memory.
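The counting scheme can be pictured with a toy model. The classes below are purely illustrative and follow the simplified description in this article; they are not kernel code, and the kernel's real bookkeeping is considerably more involved. Only the names iget, iput, and i_count are borrowed from the text.

```python
class ToyInode:
    """Toy stand-in for an in-memory inode; only the usage counter is modeled."""

    def __init__(self, ino):
        self.ino = ino
        self.i_count = 0


class ToyInodeCache:
    """Keeps inodes in memory for as long as someone holds a reference."""

    def __init__(self):
        self.in_memory = {}

    def iget(self, ino):
        """Take a reference, loading the inode into memory on first use."""
        inode = self.in_memory.get(ino)
        if inode is None:
            inode = self.in_memory[ino] = ToyInode(ino)
        inode.i_count += 1
        return inode

    def iput(self, inode):
        """Drop a reference; evict the inode once nobody uses it any more."""
        inode.i_count -= 1
        if inode.i_count == 0:
            del self.in_memory[inode.ino]
```

Two opens of the same file share one ToyInode with i_count = 2, and the inode leaves the cache only after the second iput.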

File & Disk Management

The operations most closely related to file and disk management are touch and rm, and the latter is the more critical of the two. Use strace (or dtruss) to view the system calls that rm actually makes:

# dtruss rm tmp
...
geteuid(0x0, 0x0, 0x0)                        = 0 0
ioctl(0x0, 0x4004667A, 0x7FFEE06F09C4)        = 0 0
lstat64("tmp\0", 0x7FFEE06F0968, 0x0)         = 0 0
access("tmp\0", 0x2, 0x0)                     = 0 0
unlink("tmp\0", 0x0, 0x0)                     = 0 0

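As the trace shows, rm is actually carried out through unlink. unlink removes the directory entry and decrements the link count of its inode. According to the common file model, the parent directory is itself a file, which means the directory entry is part of that file's data. Removing a directory entry is therefore equivalent to removing data from the parent directory's file, which in turn means the parent directory's file has to be opened first. The delete operation can thus be understood as follows:

  1. The delete command (a process) uses the open operation to obtain the file object of the parent directory

  2. iget increases the count of the directory file's inode object

  3. The directory file's data is read

  4. The directory file's data is converted into dentry objects

  5. Because the directory entry refers to the file's inode, iget is likewise used to increase the count of the file's inode object

  6. The entry is removed from the directory

  7. The hard link count i_nlink of the file's inode object is decreased

  8. iput ends the operation on the file's inode object, decreasing the usage count i_count by one

    • Check whether i_count is zero; if so, release the memory

    • Then check whether i_nlink is zero; if so, release the disk space

  9. iput ends the operation on the directory's inode object.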
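The steps above can be condensed into a toy model that tracks both counters. Everything here is hypothetical (ToyFS, ToyFile, sizes as plain integers) and follows only the simplified sequence described in this article, not the kernel's implementation.

```python
class ToyFile:
    """Toy inode: a size plus the two counters discussed above."""

    def __init__(self, size):
        self.size = size
        self.i_nlink = 1   # one directory entry points here
        self.i_count = 0   # nobody is using it yet


class ToyFS:
    def __init__(self, capacity):
        self.free = capacity
        self.directory = {}              # name -> ToyFile (the directory entries)

    def create(self, name, size):
        self.directory[name] = ToyFile(size)
        self.free -= size

    def open(self, name):
        inode = self.directory[name]
        inode.i_count += 1               # iget
        return inode

    def close(self, inode):
        self._iput(inode)

    def unlink(self, name):
        inode = self.directory[name]
        inode.i_count += 1               # iget while the delete works on the inode
        del self.directory[name]         # remove the directory entry
        inode.i_nlink -= 1               # one hard link fewer
        self._iput(inode)

    def _iput(self, inode):
        inode.i_count -= 1
        if inode.i_count == 0 and inode.i_nlink == 0:
            self.free += inode.size      # only now is disk space returned
```

If a file is unlinked while another holder still has it open, free stays unchanged until that holder calls close, which mirrors the situation from the Background section.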

Summary

Looking back at the original problem, it can be understood from two angles:

Index and data

Whether the topic is the file system and files, disk management and files, or process management and files, the core is always the file's index, not the file's data. Separating data from index is the key to understanding file systems.



Caching strategy

Because the operating system uses a write-back strategy, disk space can be released only after the in-memory inode has been released.

Why lsof?

The model above makes the cause clear: the directory no longer indexes the file, but the open file still references its inode, so the disk space cannot be freed immediately.
Why can lsof find files that are deleted but not yet released?
As its name suggests (list open files), lsof works by enumerating the open files, so it also finds files that have been deleted but whose space has not been released.
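A small part of that idea can be approximated in a few lines. The sketch below is Linux-specific and is not how lsof is implemented in full: it only walks /proc/<pid>/fd, where each entry is a symbolic link to the open file and the kernel appends " (deleted)" to the link target once the file's name is gone. The function name deleted_open_files is made up for this example.

```python
import os


def deleted_open_files(pid="self"):
    """Return (fd, target) pairs for open files of `pid` whose name was removed."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return found               # no /proc, no such process, or no permission
    for fd in fds:
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue               # descriptor was closed while scanning
        if target.endswith(" (deleted)"):
            found.append((int(fd), target))
    return found
```

Calling deleted_open_files(2470) as root would list entries comparable to the mysqld lines shown in the Background section, assuming that process still exists.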

Source: Linux中文社区