JAVA IO and NIO understanding

Michael Jordan
2018-09-14 09:23:24

Thanks to Netty, I learned a bit about asynchronous IO. NIO in Java supplements the original IO classes. This article records the underlying implementation principles of IO in Java and introduces zerocopy technology.

IO essentially means moving data into and out of buffers. For example, when a user program issues a read operation, this triggers a "syscall read" system call and data is moved into a buffer; when it issues a write operation, this triggers a "syscall write" system call and the data in a buffer is moved out (sent to the network or written to a disk file).

This process looks simple, but how the underlying operating system implements it is quite complicated. Because the implementations differ, there is one approach for file transfer under ordinary circumstances (call it ordinary IO for now) and another for large file transfer or bulk data transfer, such as zerocopy technology.

The flow of the entire IO process is as follows:

1) The programmer writes code that creates a buffer (this is the user buffer), then calls the read() method in a while loop to read data (each call triggers the "syscall read" system call):

    byte[] b = new byte[4096];
    int read;
    long total = 0;
    // read() returns -1 at end of stream
    while ((read = inputStream.read(b)) != -1) {
        total = total + read;
        // other code...
    }


2) When the read() method executes, a lot actually happens underneath:

① The kernel sends a command to the disk controller asking for the data in certain disk blocks. ("kernel issuing a command to the disk controller hardware to fetch the data from disk")

② Under DMA control, the data on disk is read into the kernel buffer. ("The disk controller writes the data directly into a kernel memory buffer by DMA")

③ The kernel copies the data from the kernel buffer to the user buffer. ("kernel copies the data from the temporary buffer in kernel space")

The user buffer here is the byte[] array created with new in the code we wrote.

What can we learn from the steps above?

ⓐ For the operating system, the JVM is just a user process running in user space. Processes in user space cannot operate the underlying hardware directly, but IO operations need the hardware, such as disks. IO operations must therefore be completed with the help of the kernel (interrupt, trap), which means a switch from user mode to kernel mode.

ⓑ When we write new byte[] in code, we usually create an array of arbitrary size, for example new byte[128], new byte[1024], new byte[4096]...

A disk, however, is not read in arbitrary sizes: each disk access reads one or several whole disk blocks (because disk access is very expensive, and because of the principle of locality). This is why an intermediate buffer is needed, namely the kernel buffer. Data is first read from disk into the kernel buffer and then moved from the kernel buffer to the user buffer.
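To make point ⓑ concrete, here is a minimal sketch that reads the same file through user buffers of different sizes. The class name UserBufferSizes and its helper are made up for illustration. The total number of bytes is the same in every case; only the number of read() calls changes, while the kernel still fetches whole blocks from disk underneath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of any library.
final class UserBufferSizes {
    /** Bytes read and number of successful read() calls for one pass over a file. */
    static final class ReadStats {
        final long totalBytes;
        final long readCalls;

        ReadStats(long totalBytes, long readCalls) {
            this.totalBytes = totalBytes;
            this.readCalls = readCalls;
        }
    }

    /** Reads the whole file through a user buffer of the given size (must be > 0). */
    static ReadStats readAll(Path file, int userBufferSize) throws IOException {
        byte[] userBuffer = new byte[userBufferSize];
        long total = 0;
        long calls = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(userBuffer)) != -1) { // -1 signals end of file
                total += n;
                calls++;
            }
        }
        return new ReadStats(total, calls);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("io-demo", ".bin");
        try {
            Files.write(file, new byte[64 * 1024]);
            for (int size : new int[] {128, 1024, 4096}) {
                ReadStats s = readAll(file, size);
                System.out.println(size + "-byte buffer: " + s.totalBytes
                        + " bytes in " + s.readCalls + " read() calls");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```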

This is why the first read operation often feels slow while subsequent reads are fast: the data needed by subsequent reads is likely to be in the kernel buffer already, so it only has to be copied from the kernel buffer to the user buffer, with no disk access involved.

The kernel tries to cache and/or prefetch data, so the data being requested by the process may already be available in kernel space. If so, the data requested by the process is copied out. If the data isn’t available, the process is suspended while the kernel goes about bringing the data into memory.

In other words, if the data is not available, the process is suspended and waits until the kernel has fetched the data from disk into the kernel buffer.

One might then ask: why doesn't DMA read the data from disk directly into the user buffer? One reason is the role of the kernel buffer described in ⓑ: as an intermediate buffer, it bridges the "arbitrary size" of the user buffer and the fixed size of the disk blocks read each time. The other reason is that the user buffer lives in user space, while a DMA transfer is driven by the underlying hardware, which generally cannot access user space directly (probably because user-space addresses are virtual addresses managed by the OS).

In summary, since DMA cannot access user space (the user buffer) directly, ordinary IO has to move data back and forth between the user buffer and the kernel buffer, which limits IO speed in some programs. Is there a solution?

There is: memory-mapped IO, which is the memory-mapped file mentioned in Java NIO, or direct memory... these terms express similar ideas. The kernel-space buffer and the user-space buffer are mapped to the same physical memory area.

Its main features are as follows:

① No read or write system calls are needed to operate on the file. ("The user process sees the file data as memory, so there is no need to issue read() or write() system calls.")

② When the user process accesses an address in the memory-mapped file, a page fault is generated automatically, and the OS then brings the data from disk into memory. Regarding paged memory management, please refer to: Some understandings of memory allocation and memory management

As the user process touches the mapped memory space, page faults will be generated automatically to bring in the file data from disk. If the user modifies the mapped memory space, the affected page is automatically marked as dirty and will be subsequently flushed to disk to update the file.

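This is the memory-mapped buffer (MappedByteBuffer) of Java NIO, which is similar to the direct buffer (Direct Buffer) of Java NIO. A MappedByteBuffer can be created with the map method of java.nio.channels.FileChannel (a channel).

Operating on a file through a memory-mapped buffer can be much faster than reading the file with ordinary IO, and can even be faster than operating on the file through a FileChannel. The reason is that a memory-mapped buffer involves no explicit system calls (read, write), and the OS also caches some of the file's memory pages automatically.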
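As a rough illustration of the map method, the sketch below maps a small file read-write, changes its first byte through the mapping, and reads the file back. The class MappedFileDemo and its helper are invented for this example, and it assumes a non-empty file containing ASCII text.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of any library.
final class MappedFileDemo {
    /**
     * Upper-cases the first byte of a non-empty ASCII file through a memory
     * mapping, then returns the file content as read back from disk.
     */
    static String upperCaseFirstByte(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the whole file; no read()/write() calls are issued on the channel below.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
            byte first = mapped.get(0);               // touching the page may trigger a page fault
            mapped.put(0, (byte) Character.toUpperCase((char) first));
            mapped.force();                           // ask the OS to flush the dirty page to the file
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-demo", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "hello mapped file".getBytes(StandardCharsets.UTF_8));
        System.out.println(upperCaseFirstByte(file));
    }
}
```

The get and put calls look like plain memory access; the page faults and write-back described above happen underneath, out of sight of the Java code.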

Introduction to zerocopy technology

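With the underlying implementation of IO operations covered, zerocopy technology becomes easy to understand. IBM has an article titled "Efficient data transfer through zero copy" that gives a complete introduction to zerocopy. I found it very good, and the notes below record my own understanding based on that article.

The goal of zerocopy technology is to improve the performance of IO-intensive Java applications. As described earlier in this article, IO operations copy data frequently between the kernel buffer and the user buffer. Zerocopy reduces the number of these copies, and also reduces the number of context switches (switches between user mode and kernel mode).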

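For example, one operation that most web applications perform is: accept a user request -> read data from the local disk -> the data enters the kernel buffer -> user buffer -> kernel (socket) buffer -> sent out through the socket.

Every copy of data between the kernel buffer and the user buffer consumes CPU cycles and memory bandwidth. Zerocopy effectively reduces the number of these copies.

Each time data traverses the user-kernel boundary, it must be copied, which consumes CPU cycles and memory bandwidth.
Fortunately, you can eliminate these copies through a technique called, appropriately enough, zero copy.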

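So how does it do that?

As we know, the JVM (Java Virtual Machine) gives the Java language cross-platform consistency and hides the implementation details of the underlying operating system. As a result, it is hard for Java to use the clever tricks offered by a particular operating system directly.

To implement zerocopy, operating system support is needed first, and the JDK class library must also provide a corresponding interface. Fortunately, the JDK has supported NIO since JDK 1.4: the transferTo() method of the java.nio.channels.FileChannel class can transfer bytes directly to a writable channel (WritableByteChannel) without passing the bytes through the user program's space (the user buffer).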

You can use the transferTo() method to transfer bytes directly from the channel on which it is invoked to
another writable byte channel, without requiring data to flow through the application
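A minimal sketch of calling transferTo() might look like this. The class TransferToDemo and the method sendFile are names chosen for illustration; the target is assumed to be a blocking channel, since a non-blocking target can make transferTo() return 0 and this simple loop would then spin.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of any library.
final class TransferToDemo {
    /**
     * Sends the whole file to the target channel without an application-level
     * byte[] buffer. Returns the number of bytes transferred.
     */
    static long sendFile(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo() may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

Whether the call actually avoids copies depends on the operating system and on the type of the target channel; the Java API only makes the optimization possible.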

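Let us now analyze in detail what a classic web server (for example a file server) does: it reads a file from disk and sends it to the client over the network (a socket).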

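File.read(fileDesc, buf, len);
Socket.send(socket, buf, len);

In code this is just two steps. Step one: read the file into buf. Step two: send the data in buf through the socket. However, these two steps need four context switches (switches between user mode and kernel mode) and four copy operations to complete.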

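① The first context switch occurs when read() is invoked: the server is about to read the file from disk, which leads to a sys_read() system call. Execution switches from user mode to kernel mode, and DMA reads the data from disk into the kernel buffer (this is also the first copy).

② The second context switch occurs when read() returns (which also shows that read() is a blocking call), meaning the data has been read from disk into the kernel buffer. Execution returns from kernel mode to user mode, and the data in the kernel buffer is copied to the user buffer (the second copy).

③ The third context switch occurs when send() is invoked: the server is ready to send the data. Execution switches from user mode to kernel mode, and the data in the user buffer is copied to the kernel buffer (the third copy).

④ The fourth context switch occurs when send() returns [send() can return asynchronously, meaning the thread returns from send() right away and the remaining copying and sending is left to the operating system]. Execution returns from kernel mode to user mode, and the data in the kernel buffer is passed to the protocol engine (the fourth copy).

I do not know much about the protocol engine, but judging from the diagram in the IBM article it is the NIC (Network Interface Card) buffer, that is, the buffer of the network card.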
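The four steps above correspond roughly to the following Java sketch of the traditional approach. The class TraditionalCopyDemo and the method copy are made-up names; in a file server the input would be a file stream and the output a socket's output stream, which is an assumption of this sketch and not something the method enforces.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical demo class, not part of any library.
final class TraditionalCopyDemo {
    /**
     * Copies everything from in to out through a user-space byte[] buffer.
     * With a file input and a socket output, each read() moves data
     * kernel buffer -> user buffer, and each write() moves it
     * user buffer -> kernel (socket) buffer.
     */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096]; // the user buffer
        long total = 0;
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);  // only the bytes actually read
            total += len;
        }
        out.flush();
        return total;
    }
}
```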

The following passage is well worth reading; it explains once more why the kernel buffer is needed.

Use of the intermediate kernel buffer (rather than a direct transfer of the data
into the user buffer) might seem inefficient. But intermediate kernel buffers were
introduced into the process to improve performance. Using the intermediate
buffer on the read side allows the kernel buffer to act as a “readahead cache”
when the application hasn't asked for as much data as the kernel buffer holds.
This significantly improves performance when the requested data amount is less
than the kernel buffer size. The intermediate buffer on the write side allows the write to complete asynchronously.

The core point is that the kernel buffer improves performance. Isn't that strange? Earlier we said that it is precisely the kernel buffer (the intermediate buffer) that causes data to be copied back and forth, which reduces efficiency.

Let’s first look at why the kernel buffer is said to improve performance.

For read operations, the kernel buffer acts as a "readahead cache". When the user program reads only a small amount of data at a time, the operating system still reads a large block from disk into the kernel buffer, and the user program takes away only a small part of it (I can create a byte array of just 128 bytes: new byte[128]). The next time the user program reads, the data can be fetched directly from the kernel buffer and the operating system does not need to access the disk again, because the data is already there. This is the reason mentioned earlier why subsequent read() calls are noticeably faster than the first one. From this perspective, the kernel buffer does improve read performance.

Now the write operation: it can complete "asynchronously". When the user program calls write(dest[]), it asks the operating system to write the contents of the dest[] array to some file. The contents of the user buffer (dest[]) are copied to the kernel buffer and the write method returns; the operating system then writes the data in the kernel buffer to disk later, in the background. As long as the kernel buffer is not full, the user's write can return quickly. This is the asynchronous flush-to-disk strategy.
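The difference between "write() has returned" and "the data is on disk" can be sketched in Java as follows. The class FlushDemo and the method writeDurably are names invented for this example; the comments describe typical operating system behavior and are not a guarantee of the Java API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of any library.
final class FlushDemo {
    /** Writes data to the file and forces it to the storage device before returning. */
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                // Typically returns once the bytes are in the kernel buffer;
                // the disk write may happen later in the background.
                ch.write(buf);
            }
            // Explicitly request that file content and metadata reach the storage device.
            ch.force(true);
        }
    }
}
```

Skipping the force(true) call keeps the fast asynchronous behavior described above, at the cost of possibly losing recent writes if the machine crashes before the kernel flushes them.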

(At this point, a question that used to trouble me, the difference between synchronous IO, asynchronous IO, blocking IO, and non-blocking IO, no longer seems very important. These concepts are just different perspectives on the same problem. Blocking and non-blocking describe the thread itself; synchronous and asynchronous describe the relationship between the thread and the external events that affect it...) [For a more complete and incisive explanation, please refer to this series of articles: Communication Between Systems (3) - IO Communication Model and JAVA Practice, Part 1]

If the kernel buffer is so powerful, why do we need zerocopy at all?

Unfortunately, this approach itself can become a performance bottleneck if the size of the data requested
is considerably larger than the kernel buffer size. The data gets copied multiple times among the disk, kernel buffer,
and user buffer before it is finally delivered to the application.
Zero copy improves performance by eliminating these redundant data copies.

Now it is finally zerocopy's turn. When the data to be transferred is much larger than the kernel buffer, the kernel buffer becomes a bottleneck. This is why zerocopy technology suits large file transfers. Why does the kernel buffer become a bottleneck? I think a major reason is that it can no longer act as a "buffer": the amount of data being transferred is simply too large.

Let’s look at how zerocopy technology handles file transfer.

When the transferTo() method is called, execution switches from user mode to kernel mode, and DMA reads the data from disk into the read buffer (the first data copy). Then, still in kernel space, the data is copied from the read buffer to the socket buffer (the second data copy), and finally from the socket buffer to the NIC buffer (the third data copy). Execution then returns from kernel mode to user mode.

The whole process involves only three data copies and two context switches. It may seem that only one data copy has been saved, but the user-space buffer is no longer involved at all.

Of the three data copies, only one (the second) requires CPU intervention. The traditional approach needs four copies, two of which (kernel buffer to user buffer, and user buffer to kernel buffer) require CPU intervention.

This is an improvement: we've reduced the number of context switches from four to two and reduced the number of data copies
from four to three (only one of which involves the CPU)

If this were all zerocopy technology could do, it would be nothing special.

We can further reduce the data duplication done by the kernel if the underlying network interface card supports
gather operations. In Linux kernels 2.4 and later, the socket buffer descriptor was modified to accommodate this requirement.
This approach not only reduces multiple context switches but also eliminates the duplicated data copies that
require CPU involvement.

In other words, if the underlying network hardware and the operating system support it, the number of data copies and the amount of CPU intervention can be reduced even further.

Now there are only two copies and two context switches. Moreover, both copies are DMA copies that need no CPU intervention (strictly speaking, CPU involvement is greatly reduced, not entirely absent).

The entire process is as follows:

The user program calls the transferTo() method, which results in a system call and a switch from user mode to kernel mode. DMA copies the data from disk into the read buffer.

A descriptor records the address and length of the data to be transferred, and DMA transfers the data directly from the read buffer to the NIC buffer. This copy does not require CPU intervention.
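Putting the pieces together, a file server that sends a file to a client with transferTo() could be sketched as below. This is only an illustration under several assumptions: the class ZeroCopyFileServerDemo and the method serveOnce are invented names, server and client run in one process over the loopback interface, and whether the gather optimization described above is actually used depends on the operating system and the network hardware.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of any library.
final class ZeroCopyFileServerDemo {
    /** Serves the file once to a loopback client; returns the bytes the client received. */
    static long serveOnce(Path file) throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            // Server side: accept one connection and push the file with transferTo().
            Thread sender = new Thread(() -> {
                try (SocketChannel client = server.accept();
                     FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                    long size = in.size();
                    long pos = 0;
                    while (pos < size) {
                        pos += in.transferTo(pos, size - pos, client);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            // Client side: read until the server closes the connection.
            long received = 0;
            try (SocketChannel socket = SocketChannel.open(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), port))) {
                ByteBuffer buf = ByteBuffer.allocate(8192);
                int n;
                while ((n = socket.read(buf)) != -1) {
                    received += n;
                    buf.clear();
                }
            }
            sender.join();
            return received;
        }
    }
}
```

Note that the server side never allocates a byte[] or ByteBuffer for the file data; only the client, which has to look at the bytes, uses a user-space buffer.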
