What does Linux IO mean?

The operating environment of this tutorial: Linux 5.9.8, Dell G3 computer.

In Linux, everything is a file, and a file is a stream of bytes. Whether it is a socket, a FIFO, a pipe, or a terminal, to us everything is a stream.

When exchanging information, we send and receive data on these streams. These operations are referred to as I/O operations.
To read data from a stream, a process makes the read system call; to write data, it makes the write system call.
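
As a minimal sketch of the read and write calls on a stream, the Python snippet below pushes bytes through a pipe using os.write and os.read, which are thin wrappers over the write and read system calls. The function name roundtrip is made up for this illustration.

```python
import os


def roundtrip(message: bytes) -> bytes:
    """Write bytes into a pipe and read them back with raw system calls."""
    read_fd, write_fd = os.pipe()  # a pipe is one kind of stream
    try:
        written = os.write(write_fd, message)  # wraps write(2)
        return os.read(read_fd, written)       # wraps read(2)
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(roundtrip(b"hello stream"))
```

The same two calls would work unchanged on a socket or a terminal descriptor, which is the point of treating everything as a stream.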

Usually a complete IO of the user process is divided into two stages:

Disk IO:

[Figure: the two stages of a disk IO request]

Network IO:

[Figure: the two stages of a network IO request]

The operating system and device drivers run in kernel space, and applications run in user space. Because Linux uses virtual memory, the two cannot pass data to each other by pointer; an application must ask the kernel to perform IO through system calls.

There are three types of IO: memory IO, network IO and disk IO. Usually the IO we talk about refers to the latter two!

Why IO model is needed

If communication is synchronous, all operations are executed sequentially in one thread, so the disadvantage is obvious:

A synchronous communication operation blocks every other operation on the same thread, and subsequent operations can only run after it completes. Synchronous blocking with multiple threads (one thread per socket) appeared as a workaround, but the number of threads in a system is limited and thread switching wastes time, so it only suits situations with few sockets.

This is why IO models are needed.

Linux IO model

Before describing the Linux IO model, let’s first understand the process of reading Linux system data:

[Figure: how data is read on a Linux system]

Take a user requesting the index.html file as an example:

[Figure: serving a request for index.html]

Basic concepts

User space and kernel space

The core of the operating system is the kernel, which is independent of ordinary applications and has access to protected memory space and all permissions to access underlying hardware devices.

Process Switching

To control the execution of processes, the kernel must be able to suspend a process running on the CPU and resume the execution of a previously suspended process.

This behavior is called process switching.

So it can be said that any process runs with the support of the operating system kernel and is closely related to the kernel.

Blocking of the process

When an event that a running process expects has not occurred, for example a request for system resources failed, an operation has not completed, new data has not arrived yet, or there is no new work to do, the system automatically executes the blocking primitive (Block), and the process moves itself from the running state to the blocked state.

Blocking is therefore an active behavior of the process itself, and only a process in the running state (one that holds the CPU) can put itself into the blocked state.

When the process enters the blocking state, it does not occupy CPU resources.

File Descriptor

File descriptor is a computer science term for an abstraction that represents a reference to a file.

In form, a file descriptor is a non-negative integer. In fact, it is an index into the table of open files that the kernel maintains for each process.
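
A small sketch of what this looks like from a program: the hypothetical helper open_descriptors below opens a temporary file and a pipe and returns the integers the kernel handed back. The exact values depend on what the process already has open (0, 1, and 2 are conventionally standard input, output, and error), so the code makes no claim about specific numbers.

```python
import os
import tempfile


def open_descriptors():
    """Open a temp file and a pipe; return the descriptors the kernel assigned."""
    tmp_fd, tmp_path = tempfile.mkstemp()  # descriptor for a regular file
    pipe_r, pipe_w = os.pipe()             # two more descriptors for a pipe
    fds = [tmp_fd, pipe_r, pipe_w]
    for fd in fds:
        os.close(fd)                       # releases the slot in the per-process table
    os.unlink(tmp_path)
    return fds


if __name__ == "__main__":
    print(open_descriptors())  # three distinct small non-negative integers
```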

Cached IO

The default IO operation for most file systems is cached IO.

The reading and writing process is as follows:

Read operation: The operating system checks whether the kernel buffer already holds the required data. If it is cached, it is returned directly from the cache; otherwise it is read in from the disk, network card, etc., and then stored in the operating system's cache;
Write operation: The data is copied from user space to the cache in kernel space. At this point the write is complete as far as the user program is concerned. When the data is actually written to the disk, network card, etc. is decided by the operating system, unless a sync call is made explicitly.
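
The write path above can be sketched as follows. This is an illustration only: write_durably is a made-up name, and it uses os.fsync, the per-file relative of the sync call mentioned above, to ask the kernel to flush one file's cached data to the device.

```python
import os
import tempfile


def write_durably(data: bytes) -> bytes:
    """Write through the kernel cache, force a flush, then read the data back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)            # returns once the data is in the kernel cache
        os.fsync(fd)                  # explicit request to flush this file to the device
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file
        return os.read(fd, len(data)) # typically served from the kernel cache
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(write_durably(b"cached write"))
```

Without the os.fsync line the program would behave the same from its own point of view; the difference is only in when the data reaches the device.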

Assuming that there is no required data in the kernel space cache, the user process reads data from the disk or network in two stages:

Phase one: The kernel reads data from the disk, network card, etc. into the kernel-space cache;
Phase two: The user program copies the data from the kernel-space cache into user space.

Disadvantages of cached IO:

During data transfer, the data has to be copied several times between the application address space and kernel space. The CPU and memory overhead of these copies is very large.

Synchronous blocking

A user-space application makes a system call, which blocks the application until the data is ready and has been copied from the kernel to the user process; only then does the process handle the data. During both stages, waiting for the data and copying it, the whole process is blocked and cannot handle other network IO.

The calling application consumes no CPU while it waits for the response, so from a processing perspective this is very efficient.

This is also the simplest IO model. It works well when there are few file descriptors and data becomes ready quickly.
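
A minimal sketch of a blocking read, assuming a local socket pair stands in for a network connection and a timer thread plays the role of the remote peer. The helper name blocking_recv and the 0.2 second delay are arbitrary choices for the illustration. It also measures wall-clock time against CPU time to show that the blocked process uses almost no CPU.

```python
import socket
import threading
import time


def blocking_recv(delay: float = 0.2):
    """Block in recv() until a peer sends data; report wall and CPU time."""
    reader, writer = socket.socketpair()
    # The timer thread plays the remote peer and sends after `delay` seconds.
    timer = threading.Timer(delay, writer.sendall, args=(b"ready",))
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    timer.start()
    try:
        data = reader.recv(1024)  # the calling thread is blocked here
    finally:
        timer.join()
        reader.close()
        writer.close()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return data, wall, cpu


if __name__ == "__main__":
    data, wall, cpu = blocking_recv()
    print(data, f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

While recv is waiting, nothing else can run on that thread, which is the limitation the following models try to remove.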


Synchronous non-blocking

After a non-blocking system call is made, the process is not blocked: the kernel returns to the process immediately. If the data is not ready yet, the call returns an error (EAGAIN or EWOULDBLOCK).

After the call returns, the process can do other work before making the system call again.
Repeating this in a loop is often called polling.
The process polls the kernel until the data is ready, and then the data is copied to the process for processing.
Note that the process is still blocked for the whole time the data is being copied.
In code, this mode is selected by setting O_NONBLOCK on the socket.
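
The polling loop can be sketched like this, again assuming a local socket pair and a timer thread as a stand-in peer. In Python, setblocking(False) sets O_NONBLOCK on the descriptor, and a read with no data available raises BlockingIOError, which corresponds to EAGAIN/EWOULDBLOCK. The name poll_until_ready and the sleep interval are illustrative.

```python
import socket
import threading
import time


def poll_until_ready(delay: float = 0.1, interval: float = 0.01):
    """Poll a non-blocking socket until data arrives; count the attempts."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)  # sets O_NONBLOCK on the descriptor
    timer = threading.Timer(delay, writer.sendall, args=(b"ready",))
    timer.start()
    attempts = 0
    try:
        while True:
            attempts += 1
            try:
                data = reader.recv(1024)  # returns or raises immediately
                return data, attempts
            except BlockingIOError:       # no data yet (EAGAIN / EWOULDBLOCK)
                time.sleep(interval)      # stand-in for doing other work
    finally:
        timer.join()
        reader.close()
        writer.close()


if __name__ == "__main__":
    print(poll_until_ready())
```

The attempt counter makes the cost visible: every failed call is a system call that did no useful work.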


IO multiplexing

IO multiplexing is the ability of a process to tell the kernel in advance which IO conditions it cares about, so that the kernel notifies the process when one or more of those conditions are ready.

It lets a process wait on a whole set of events at once.

The main implementations of IO multiplexing today are select, poll, and epoll.


Pseudocode describes IO multiplexing:

while (status == OK) {                    // keep polling
    ready_fd_list = io_wait(fd_list);     // which descriptors have data ready in the kernel buffer?
    for (fd in ready_fd_list) {
        data = read(fd);                  // copy the ready data into the user buffer
        process(data);
    }
}
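
A runnable counterpart to the pseudocode, as a sketch: it uses Python's selectors module, whose DefaultSelector picks the best mechanism available on the platform (epoll on Linux), in place of the abstract io_wait. The function name multiplex is made up, local socket pairs stand in for client connections, and the messages are assumed to be non-empty.

```python
import selectors
import socket


def multiplex(messages):
    """Wait on several sockets at once and read whichever ones are ready."""
    sel = selectors.DefaultSelector()  # epoll on Linux
    pairs = [socket.socketpair() for _ in messages]
    for index, (reader, _writer) in enumerate(pairs):
        sel.register(reader, selectors.EVENT_READ, data=index)
    for (_reader, writer), message in zip(pairs, messages):
        writer.sendall(message)  # assumes non-empty messages
    received = {}
    try:
        while len(received) < len(messages):
            # One call waits on all registered descriptors together.
            for key, _events in sel.select(timeout=1.0):
                received[key.data] = key.fileobj.recv(1024)
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
    return [received[i] for i in range(len(messages))]


if __name__ == "__main__":
    print(multiplex([b"a", b"bb", b"ccc"]))
```

A single thread services every socket here, which is what makes this model scale better than one thread per socket.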


Signal-driven

First we enable signal-driven IO on the socket and install a signal handler. The process then continues to run without blocking.

When the data is ready, the process will receive a SIGIO signal and can call the I/O operation function in the signal processing function to process the data.

The process is as follows:

Enable signal-driven IO on the socket
Call sigaction to install the signal handler (non-blocking; the call returns immediately)
When the data is ready, a SIGIO signal is generated and the application is notified through the signal handler to read the data

This IO method has a serious problem: the signal queue in Linux is limited in length. Once that limit is exceeded, the data can no longer be read.
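
The three steps can be sketched in Python on Linux as shown below. This is an illustration under several assumptions: a local socket pair stands in for a network socket, signal.signal is used where C code would call sigaction, the function must run in the main thread (Python only allows installing handlers there), and the name signal_driven_recv is made up.

```python
import fcntl
import os
import signal
import socket
import time


def signal_driven_recv(payload: bytes = b"ready", timeout: float = 2.0) -> bytes:
    """Receive data from inside a SIGIO handler while the main loop keeps running."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    chunks = []

    def on_sigio(signum, frame):
        try:
            chunks.append(reader.recv(1024))  # data is ready in the kernel buffer
        except BlockingIOError:
            pass  # the signal was raised for some other event

    previous = signal.signal(signal.SIGIO, on_sigio)        # install the handler
    fcntl.fcntl(reader, fcntl.F_SETOWN, os.getpid())        # send SIGIO to this process
    flags = fcntl.fcntl(reader, fcntl.F_GETFL)
    fcntl.fcntl(reader, fcntl.F_SETFL, flags | os.O_ASYNC)  # enable signal-driven IO
    try:
        writer.sendall(payload)
        deadline = time.monotonic() + timeout
        while not chunks and time.monotonic() < deadline:
            time.sleep(0.01)  # stand-in for other work
    finally:
        fcntl.fcntl(reader, fcntl.F_SETFL, flags)  # stop SIGIO before cleaning up
        reader.close()
        writer.close()
        if previous is not None:
            signal.signal(signal.SIGIO, previous)
    return b"".join(chunks)


if __name__ == "__main__":
    print(signal_driven_recv())
```

O_ASYNC is cleared before the handler is restored because the default action for SIGIO terminates the process.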


Asynchronous non-blocking

The asynchronous IO process is as follows:

When the user thread calls aio_read, the call returns immediately and the thread can go on to do other things; the user thread does not block
The kernel starts the first phase of IO: preparing the data. Once the data is ready, the kernel copies it from the kernel buffer to the user buffer
The kernel sends a signal to the user thread, or invokes the callback the user thread registered, to tell it that the read operation has completed
The user thread reads the data in the user buffer and carries on with its business logic
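
Python's standard library has no binding for aio_read, so the sketch below only emulates the flow of these steps: a worker thread stands in for the kernel, os.pread performs the read into a buffer, and a completion callback plays the role of the notification. The names async_read_demo and on_complete are hypothetical.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def async_read_demo(content: bytes = b"file contents"):
    """Submit a read, keep working, and get the result through a callback."""
    fd, path = tempfile.mkstemp()
    os.write(fd, content)
    done = threading.Event()
    result = {}

    def on_complete(future):
        # Runs once the read has finished; the data is already in user space.
        result["data"] = future.result()
        done.set()

    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(os.pread, fd, len(content), 0)  # returns at once
            future.add_done_callback(on_complete)
            result["other_work"] = sum(range(1000))  # the caller is not blocked
            done.wait(timeout=2.0)
    finally:
        os.close(fd)
        os.unlink(path)
    return result


if __name__ == "__main__":
    print(async_read_demo())
```

Unlike the earlier models, the caller never issues the read itself after being notified; by the time the callback runs, the copy into user space is already done.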

Compared to synchronous IO, asynchronous IO is not executed sequentially.

After the user process calls aio_read, the call returns to the user process immediately, whether or not the kernel data is ready, and the user-mode process can then do other things.

When the data is ready, the kernel directly copies the data to the process, and then sends a notification from the kernel to the process.

The main difference between asynchronous IO and signal-driven IO is:

With signal-driven IO, the kernel tells us when an IO operation can be started (the data is in the kernel buffer), while with asynchronous IO the kernel tells us when the IO operation has completed (the data is already in user space).

Asynchronous IO is also called event-driven IO. Unix defines a set of library functions, the AIO interfaces, for asynchronous access to files.

Use aio_read or aio_write to start an asynchronous IO operation, and use aio_error to check the status of an operation in progress.

The current kernel implementation of AIO in Linux only works for file IO. If you want true AIO, you have to implement it yourself.

There are currently many open source asynchronous IO libraries, such as libevent, libev, and libuv.
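
The libraries above are C libraries. To keep the examples in one language, the sketch below uses Python's built-in asyncio instead, which offers a similar event-loop programming style; note that on Linux it is built on IO multiplexing and is not kernel AIO. The function event_loop_demo and the inner coroutine names are illustrative.

```python
import asyncio
import socket


async def event_loop_demo(message: bytes = b"ready"):
    """One coroutine waits for data while another keeps working, in one thread."""
    left, right = socket.socketpair()
    reader, left_writer = await asyncio.open_connection(sock=left)
    _unused, writer = await asyncio.open_connection(sock=right)
    log = []

    async def consumer():
        data = await reader.read(1024)  # suspends only this coroutine
        log.append("read")
        return data

    async def producer():
        log.append("other work")        # runs while the consumer is waiting
        await asyncio.sleep(0.05)
        writer.write(message)
        await writer.drain()

    data, _ = await asyncio.gather(consumer(), producer())
    for stream in (left_writer, writer):
        stream.close()
        await stream.wait_closed()
    return data, log


if __name__ == "__main__":
    print(asyncio.run(event_loop_demo()))
```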
