Linux IO refers to operations on files. In Linux, a file is a stream of bytes, so exchanging information means sending and receiving data on these streams; these operations are called I/O operations. Because Linux uses virtual memory and keeps user space separate from kernel space, a process must ask the kernel through system calls to complete an IO operation.
The operating environment of this tutorial: Linux 5.9.8, Dell G3 computer.
What does Linux IO refer to?
We all know that in the Linux world, everything is a file.
A file is a stream of bytes: whether it is a socket, a FIFO, a pipe or a terminal, to us everything is a stream.
When exchanging information, we send and receive data on these streams; these operations are called I/O operations.
To read data from a stream, a process issues the read system call; to write data, it issues the write system call.
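As a minimal sketch of the read and write calls described above, the following Python snippet pushes bytes through a pipe. It assumes Python's os module, whose os.read and os.write are thin wrappers around the corresponding system calls; the function name pipe_round_trip is made up for this illustration.

```python
import os


def pipe_round_trip(payload: bytes):
    """Write payload into a pipe and read it back (hypothetical helper)."""
    # A pipe is one kind of stream; os.pipe() returns two file descriptors.
    read_fd, write_fd = os.pipe()
    try:
        # os.write() wraps the write system call and returns the byte count.
        written = os.write(write_fd, payload)
        # os.read() wraps the read system call and returns at most 1024 bytes.
        data = os.read(read_fd, 1024)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written, data


if __name__ == "__main__":
    print(pipe_round_trip(b"hello stream"))
```

The same two calls work unchanged on sockets, FIFOs and terminals, which is the practical meaning of "everything is a stream".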
Usually a complete IO of the user process is divided into two stages:
Disk IO:
Network IO:
The operating system and drivers run in kernel space, and applications run in user space. The two cannot exchange data by passing pointers, because Linux uses virtual memory and the address spaces are separate. An application must therefore ask the kernel through system calls to complete IO actions.
There are three types of IO: memory IO, network IO and disk IO. Usually the IO we talk about refers to the latter two!
Why an IO model is needed
If communication is synchronous, all operations are executed sequentially in one thread, so the disadvantages are obvious:
This is why IO models are needed.
Linux IO model
Before describing the Linux IO model, let's first look at how a Linux system reads data:
The following uses a user requesting the index.html file as an example.
Basic concepts
User space and kernel space
The core of the operating system is the kernel, which is independent of ordinary applications, can access the protected memory space, and has full permission to access the underlying hardware devices.
Process switching
To control the execution of processes, the kernel must be able to suspend the process running on the CPU and resume the execution of a previously suspended process. This behavior is called process switching. It can therefore be said that every process runs with the support of the operating system kernel and is closely tied to it.
Blocking of the process
When a running process finds that an event it expects has not occurred, such as a failed request for a system resource, an operation that has not yet completed, new data that has not yet arrived, or no new work to do, the system executes the blocking primitive (Block) and the process changes itself from the running state to the blocked state. Blocking is therefore an active behavior of the process itself, and only a process in the running state (one that holds the CPU) can move to the blocked state. While blocked, the process does not occupy CPU resources.
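The claim that a blocked process does not occupy the CPU can be checked with a rough sketch like the one below. It assumes Python, a pipe as the event source, and a timer thread that supplies the data after a delay; blocked_read_cost and its delay parameter are hypothetical names used only here.

```python
import os
import threading
import time


def blocked_read_cost(delay=0.3):
    """Block in read() for about `delay` seconds; return (wall, cpu, data)."""
    read_fd, write_fd = os.pipe()

    # Assumption for the demo: another thread delivers the data later.
    timer = threading.Timer(delay, os.write, args=(write_fd, b"ready"))
    timer.start()

    wall_start = time.monotonic()
    cpu_start = time.process_time()
    data = os.read(read_fd, 16)  # the process is blocked here, waiting for data
    cpu_used = time.process_time() - cpu_start
    wall_used = time.monotonic() - wall_start

    timer.join()
    os.close(read_fd)
    os.close(write_fd)
    return wall_used, cpu_used, data


if __name__ == "__main__":
    wall, cpu, data = blocked_read_cost()
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s data={data!r}")
```

The wall-clock time should be close to the delay, while the CPU time consumed by the process stays far below it, because the process is not scheduled while it is blocked.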
File Descriptor
A file descriptor is a computer science term for an abstraction that refers to a file. In form, it is a non-negative integer. In practice, it is an index into a per-process table, maintained by the kernel, of the files that process has opened.
Cached IO
The default IO operation for most file systems is cached IO. The reading and writing process is as follows:
Read operation: The operating system checks whether the kernel buffer already holds the required data. If it is cached, it is returned directly from the cache; otherwise, it is read in from the disk, network card, etc. and then cached by the operating system;
Write operation: Data is copied from user space to the cache in kernel space. At this point the write is complete as far as the user program is concerned. When the data is actually written to the disk, network card, etc. is decided by the operating system, unless a sync is explicitly requested.
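The write path can be sketched as follows. This is an illustration under assumptions, not the article's own code: it uses Python's os.fsync, the per-file counterpart of the sync command mentioned above, and a temporary file created only for the demo; write_durably is a hypothetical name.

```python
import os
import tempfile


def write_durably(payload: bytes) -> bytes:
    """Write payload to a temp file, flush it to storage, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        # write returns as soon as the data sits in the kernel cache.
        os.write(fd, payload)
        # fsync asks the kernel to push this file's cached data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(write_durably(b"cached, then flushed"))
```

Without the fsync call the program would still read back the same bytes, because the read is served from the kernel cache; the difference only matters if the machine loses power before the kernel writes the data out.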
Assuming that there is no required data in the kernel space cache, the user process reads data from the disk or network in two stages:
Phase one: The kernel reads data from the disk, network card, etc. into the kernel space cache;
Phase two: The data is copied from the kernel space cache to user space.
Disadvantages of cached IO:
During data transmission, the data must be copied multiple times between the application address space and kernel space. The CPU and memory overhead of these copies can be significant.
In blocking IO, a user space application executes a system call and then blocks, doing nothing until the data is ready and has been copied from the kernel to the user process; only then does the process handle the data. During both stages, waiting for the data and copying it, the whole process is blocked and cannot handle other network IO.
This is also the simplest IO model. It works well when there are few file descriptors (FDs) and data becomes ready quickly.
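The cost of blocking on a single descriptor can be sketched with two pipes. In this hypothetical Python example, data on the second pipe is ready from the start, but the process cannot reach it until the blocking read on the first pipe returns; the function name and the timer-based writer are assumptions made for the demo.

```python
import os
import threading
import time


def serve_two_pipes_blocking(delay=0.3):
    """Read a slow pipe, then a fast one; return (first, second, waited)."""
    slow_r, slow_w = os.pipe()
    fast_r, fast_w = os.pipe()

    os.write(fast_w, b"fast")  # this data is ready immediately
    timer = threading.Timer(delay, os.write, args=(slow_w, b"slow"))
    timer.start()

    start = time.monotonic()
    first = os.read(slow_r, 16)   # blocks until the delayed writer runs
    second = os.read(fast_r, 16)  # was ready all along, but had to wait
    waited = time.monotonic() - start

    timer.join()
    for fd in (slow_r, slow_w, fast_r, fast_w):
        os.close(fd)
    return first, second, waited


if __name__ == "__main__":
    print(serve_two_pipes_blocking())
```

With only a few descriptors that become ready quickly this delay is negligible, which is why the blocking model remains a reasonable default in simple programs.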
In non-blocking IO, the system call does not block the process: the kernel returns to the process immediately. If the data is not ready yet, an error is returned (EAGAIN or EWOULDBLOCK).
After the call returns, the process can do other things before making the system call again.
The process repeats this, making the system call in a loop. This is often called polling.
Polling keeps checking the kernel until the data is ready, then the data is copied to the process for processing.
Note that while the data is being copied, the process is still blocked.
To use this model in a program, set O_NONBLOCK on the socket.
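The polling loop described above might look like the following sketch. It assumes Python, where os.set_blocking(fd, False) sets O_NONBLOCK on the descriptor and a read with no data available raises BlockingIOError (the Python form of EAGAIN/EWOULDBLOCK). A pipe stands in for the socket, and poll_until_ready is a hypothetical name.

```python
import os
import threading
import time


def poll_until_ready(delay=0.2, interval=0.01):
    """Poll a non-blocking pipe until data arrives; return (data, attempts)."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # sets O_NONBLOCK on the read end

    # Assumption for the demo: the data shows up after `delay` seconds.
    timer = threading.Timer(delay, os.write, args=(write_fd, b"ready"))
    timer.start()

    attempts = 0
    while True:
        attempts += 1
        try:
            data = os.read(read_fd, 16)  # returns at once, data or not
            break
        except BlockingIOError:
            # No data yet: the process is free to do other work here.
            time.sleep(interval)

    timer.join()
    os.close(read_fd)
    os.close(write_fd)
    return data, attempts


if __name__ == "__main__":
    print(poll_until_ready())
```

The attempt counter makes the trade-off visible: the process never blocks while waiting, but it spends system calls asking the same question repeatedly.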
IO multiplexing lets a process tell the kernel in advance which IO conditions it is interested in, so that the kernel notifies the process when one or more of them are ready.
This enables a process to wait on a whole set of events at once.
The main implementations of IO multiplexing today are select, poll and epoll.
Pseudocode describing IO multiplexing:
while (status == OK) {                   // keep polling
    ready_fd_list = io_wait(fd_list);    // which descriptors have data ready in the kernel buffer?
    for (fd in ready_fd_list) {
        data = read(fd);                 // read the ready data into the user buffer
        process(data);
    }
}
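A runnable counterpart of the pseudocode, as a sketch only: Python's select.select plays the role of io_wait, and three pipes stand in for the descriptor list. The helper names read_ready and demo are invented for this example; poll or epoll could be substituted without changing the overall shape.

```python
import os
import select


def read_ready(fds, timeout=1.0):
    """Return {fd: data} for every descriptor select() reports readable."""
    ready, _, _ = select.select(fds, [], [], timeout)
    return {fd: os.read(fd, 1024) for fd in ready}


def demo():
    """Three pipes, two with pending data; return what was read, by name."""
    pipes = {name: os.pipe() for name in ("a", "b", "c")}
    os.write(pipes["a"][1], b"one")
    os.write(pipes["c"][1], b"three")

    names = {read_end: name for name, (read_end, _) in pipes.items()}
    ready = read_ready(list(names))

    for read_end, write_end in pipes.values():
        os.close(read_end)
        os.close(write_end)
    return {names[fd]: data for fd, data in ready.items()}


if __name__ == "__main__":
    print(demo())
```

A single call covers all three descriptors, and only the ones with data are read, so the idle pipe "b" costs nothing beyond its entry in the list.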
In signal-driven IO, we first enable signal-driven IO on the socket and install a signal handler; the process then continues to run without blocking.
When the data is ready, the process receives a SIGIO signal and can call the I/O function in the signal handler to process the data.
The process is as follows:
Enable signal-driven IO on the socket
Call sigaction to install the signal handler (non-blocking, returns immediately)
When the data is ready, the kernel generates a SIGIO signal and notifies the application through the signal handler to read the data
There is a big problem with this IO method: the signal queue in Linux is limited in length. If the number of pending signals exceeds this limit, the data cannot be read.
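The three steps above can be sketched in Python on Linux. This is an illustration under several assumptions: signal.signal takes the place of sigaction, fcntl with F_SETOWN and O_ASYNC enables signal-driven IO, a local socketpair stands in for a network socket, and the code runs in the main thread (Python only allows signal handlers to be installed there). The name wait_for_sigio is hypothetical.

```python
import fcntl
import os
import signal
import socket
import time


def wait_for_sigio(timeout=2.0):
    """Receive one message via a SIGIO handler; return the list of payloads."""
    receiver, sender = socket.socketpair()
    receiver.setblocking(False)
    received = []

    def on_sigio(signum, frame):
        try:
            received.append(receiver.recv(1024))
        except OSError:
            pass  # spurious signal: nothing to read

    # Step 2: install the handler (returns immediately).
    previous = signal.signal(signal.SIGIO, on_sigio)
    try:
        # Step 1: deliver SIGIO for this socket to our process.
        fd = receiver.fileno()
        fcntl.fcntl(fd, fcntl.F_SETOWN, os.getpid())
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_ASYNC)

        # Step 3: data arrives, the kernel raises SIGIO, the handler reads.
        sender.send(b"ping")
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)  # the process keeps running in the meantime
    finally:
        # Close the signalling socket before restoring the old handler.
        receiver.close()
        sender.close()
        signal.signal(signal.SIGIO, previous)
    return received


if __name__ == "__main__":
    print(wait_for_sigio())
```

The handler reads in non-blocking mode so that an extra or late signal cannot stall the process, which is one practical consequence of signals carrying no payload of their own.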
The asynchronous IO process is as follows:
When the user thread calls aio_read, the call returns immediately and the thread can go on to do other things; the user thread does not block
The kernel starts the first phase of IO: preparing the data. Once the data is ready, the kernel copies it from the kernel buffer to the user buffer
The kernel sends a signal to the user thread, or invokes the callback that the user thread registered, to tell it that the read operation is complete
The user thread reads the data in the user buffer and carries on with subsequent business operations
Compared to synchronous IO, asynchronous IO is not executed sequentially.
After the user process calls aio_read, control returns to the user process immediately, whether or not the kernel data is ready, and the user-mode process can then do other things.
When the data is ready, the kernel copies it directly to the process and then sends the process a notification.
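Python's standard library has no binding for aio_read, so the sketch below only emulates the submit-then-notify pattern described above: a worker thread performs the read and a registered callback fires once the data is already in a user-space buffer. It is not POSIX AIO, and async_read_demo is a hypothetical name.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def async_read_demo():
    """Submit a read, keep working, get notified on completion."""
    read_fd, write_fd = os.pipe()
    events = []
    done = threading.Event()

    def on_complete(future):
        # Completion notification: the data has already been copied for us.
        events.append(("completed", future.result()))
        done.set()

    with ThreadPoolExecutor(max_workers=1) as pool:
        # Submitting returns at once; the caller does not block.
        future = pool.submit(os.read, read_fd, 16)
        future.add_done_callback(on_complete)

        events.append(("submitted", None))  # the caller does other things
        os.write(write_fd, b"payload")      # later, the data becomes ready
        done.wait(timeout=2.0)

    os.close(read_fd)
    os.close(write_fd)
    return events


if __name__ == "__main__":
    print(async_read_demo())
```

The key point is the order of events: the caller continues straight after submission and never issues a read itself; it only learns that the finished result is available.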
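Compared with signal-driven IO, the main difference is this: with signal-driven IO the kernel tells the process when an IO operation can be started, whereas with asynchronous IO the kernel tells the process when the IO operation has completed.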
Asynchronous IO is also called event-driven IO. Unix defines a set of library functions for asynchronous access to files, the AIO interfaces.
Use aio_read or aio_write to initiate an asynchronous IO operation, and use aio_error to check the status of an operation in progress. The current kernel implementation of AIO in Linux is only effective for file IO. If you want real AIO, you need to implement it yourself.
There are currently many open source asynchronous IO libraries, such as libevent, libev, and libuv.