Node, PHP, Java and Go server I/O performance competition: who do you think will win?

This article first briefly introduces the basic concepts behind I/O, then compares the I/O performance of Node, PHP, Java, and Go side by side and offers suggestions for choosing among them.

Understanding an application's input/output (I/O) model helps you understand how it handles load, both ideally and in practice. While your application is small and does not serve a high load, this may matter less. However, as traffic grows, using the wrong I/O model can have very serious consequences.

In this article, we compare Node, Java, Go, and PHP with Apache, discuss how each language models I/O, look at the advantages and disadvantages of each model, and finish with some basic performance benchmarks. If you are concerned about the I/O performance of your next web application, this article is for you.

I/O Basics: A Quick Review

To understand the factors involved in I/O, we must first review the concepts at the operating system level. You are unlikely to deal with many of these concepts directly, but you deal with them indirectly all the time through your application's runtime environment. Details matter.

System call

First, let's get to know the system call, which works as follows:

Your program asks the operating system kernel to perform an I/O operation on its behalf.
A "system call" is the means by which a program asks the kernel to do something. The implementation details vary between operating systems, but the basic concept is the same. When a system call is made, specific instructions transfer control from your program to the kernel. Generally speaking, system calls are blocking, which means that your program waits until the kernel returns the result.
The kernel performs the underlying I/O operation on the physical device (disk, network card, etc.) and replies to the system call. In the real world, the kernel may have to do a lot of things to fulfill your request, including waiting for the device to be ready and updating its internal state, but as an application developer you don't need to care about that. It's the kernel's business.

Blocking calls and non-blocking calls

I said above that system calls are generally blocking. However, some calls are "non-blocking", which means that the kernel puts your request in a queue or buffer and returns immediately without waiting for the actual I/O to occur. So it "blocks" only for a very short time, just long enough to enqueue the request.

To illustrate this point, here are a few examples (Linux system calls):

read() is a blocking call. You pass it a file handle and a buffer to hold the data, and it returns once the data has been placed in the buffer. It has the advantage of being nice and simple.
epoll_create(), epoll_ctl() and epoll_wait() let you, respectively, create a group of handles to listen on, add or remove handles from that group, and block until there is activity on any of them. These system calls allow you to control a large number of I/O operations efficiently with a single thread. They are very useful, but quite complex to use.

It is important to grasp the order of magnitude of the time difference here. If a CPU core runs at 3GHz then, leaving aside any optimizations the CPU can perform, it executes 3 billion cycles per second (that's 3 cycles per nanosecond). A non-blocking system call may take on the order of tens of cycles, or a few nanoseconds. A call that blocks while waiting for data from the network may take much longer, say 200 milliseconds (1/5 of a second).

Say, for example, that the non-blocking call took 20 nanoseconds and the blocking call took 200,000,000 nanoseconds. The process then waited 10 million times longer for the blocking call.

The kernel provides the means to do both blocking I/O ("read from this network connection and give me the data") and non-blocking I/O ("tell me when any of these network connections have new data"), and the two mechanisms block the calling process for dramatically different lengths of time.

Scheduling

The third very critical thing is what happens when a lot of threads or processes start to block.

For our purposes, there is not much difference between a thread and a process. In practice, the most noticeable performance-related difference is that threads share the same memory while each process has its own memory space, so separate processes tend to take up more memory. When we talk about scheduling, though, what it really boils down to is a list of things (threads and processes alike) that each need a slice of execution time on the available CPU cores.

If you have 8 cores running 300 threads, then you have to slice the time so that each thread gets its time slice and each core runs for a short time and then switches to the next thread. This is done via a "context switch", which allows the CPU to switch from one thread/process to the next.

Context switching has a cost: it takes time. In fast cases it may be under 100 nanoseconds, but depending on the implementation details, processor speed and architecture, CPU cache, and other hardware and software factors, it is not uncommon for it to take 1000 nanoseconds or longer.

The greater the number of threads (or processes), the greater the number of context switches. If there are thousands of threads, and each thread takes hundreds of nanoseconds to switch, the system will become very slow.

However, a non-blocking call essentially tells the kernel "only call me when new data or events arrive on these connections." These non-blocking calls handle large I/O loads efficiently and reduce context switches.

It is worth noting that although the examples in this article are small, database access, external caching systems (memcache and the like), and anything else that requires I/O will end up making some sort of I/O call under the hood, and the same principles apply.

There are many factors that affect the choice of programming language in a project. Even if you only consider performance, there are many factors. However, if you are worried that your program is mainly limited by I/O, and performance is an important factor in determining the success or failure of the project, then the following suggestions are what you need to consider.

"Keep It Simple": PHP

Back in the 1990s, there were a lot of people wearing Converse shoes writing CGI scripts in Perl. Then, PHP came and a lot of people liked it and it made it easier to create dynamic web pages.

The model PHP uses is very simple. Although there are variations, the typical PHP server works like this:

The user's browser issues an HTTP request, and the request reaches the Apache web server. Apache creates a separate process for each request, with some optimizations to reuse these processes and minimize how many it has to create (creating processes is relatively slow).

Apache calls PHP and tells it to run the appropriate .php file on disk.

The PHP code executes and makes blocking I/O calls. When you call file_get_contents() in PHP, under the hood it makes read() system calls and waits for the results.

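<?php
// blocking network I/O: the script waits here until the database replies
// ($db is assumed to be an already open database connection)
$result = $db->query('SELECT id, data FROM examples ORDER BY id DESC limit 100');
?>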

It’s simple: one process per request. I/O calls are blocking. What about the advantages? Simple yet effective. What about the disadvantages? If there are 20,000 concurrent clients, the server will be paralyzed. This approach is difficult to scale because the tools provided by the kernel for handling large amounts of I/O (epoll, etc.) are not fully utilized. Worse, running a separate process for each request tends to take up a lot of system resources, especially memory, which is often the first to be exhausted.

*Note: In this respect, Ruby's situation is very similar to PHP's.

Multi-threading: Java

Then Java came along. Java has multi-threading built into the language, which was a great feature, especially for the time it was created.

Most Java web servers will start a new execution thread for each request, and then call the developer-written function in this thread.

Doing I/O in a Java Servlet often looks like this:

public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    // blocking file I/O
    InputStream fileIs = new FileInputStream("/path/to/file");

    // blocking network I/O
    URLConnection urlConnection = (new URL("http://example.com/example-microservice")).openConnection();
    InputStream netIs = urlConnection.getInputStream();

    // some more blocking network I/O: write the response back to the client
    PrintWriter out = response.getWriter();
    out.println("...");
}

Since the doGet method above corresponds to one request and runs in its own thread, we get a separate thread per request rather than a separate process that requires its own memory. Each request gets a new thread, and the various I/O operations block inside that thread until the request is fully handled. The application will use a thread pool to minimize the cost of creating and destroying threads, but thousands of connections still mean thousands of threads, which is bad for the scheduler.
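The thread-pool idea can be sketched in a few lines. The sketch below is written in Go rather than Java, with goroutines standing in for pooled threads, so it is an analogy and not what a servlet container actually runs; handleAll, the pool size, and the sleeping handler are all made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// handleAll processes every request on a fixed pool of workers. Each worker
// handles one request at a time and blocks until it finishes, which mirrors
// the thread-per-request model backed by a thread pool.
func handleAll(requests []int, workers int, handle func(int) int) []int {
	jobs := make(chan int)
	results := make([]int, len(requests))
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = handle(requests[i]) // this worker is busy until handle returns
			}
		}()
	}
	for i := range requests {
		jobs <- i // waits here whenever every worker is busy
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Hypothetical handler: sleep to imitate a blocking I/O call.
	slowDouble := func(x int) int {
		time.Sleep(10 * time.Millisecond)
		return 2 * x
	}
	fmt.Println(handleAll([]int{1, 2, 3, 4, 5}, 2, slowDouble)) // [2 4 6 8 10]
}
```

With only two workers, the five requests are served in three rounds; serving thousands of simultaneous connections without queuing would need thousands of workers, which is where the scheduler cost discussed earlier comes in.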

It is worth noting that Java 1.4 (with a significant upgrade in 1.7) added the ability to do non-blocking I/O. Although most applications don't use this feature, it is at least available. Some Java web servers try to take advantage of it, but the vast majority of deployed Java applications still work as described above.

Java provides many out-of-the-box features for I/O, but if you encounter the situation of creating a large number of blocking threads to perform a large number of I/O operations, Java does not have a good solution.

Make non-blocking I/O a top priority: Node

The popular choice when it comes to better I/O is Node.js. Anyone with a basic understanding of Node knows that it is "non-blocking" and handles I/O efficiently. That is true in a general sense, but the details and the way it is implemented matter.

When you need to do something that involves I/O, you make the request and supply a callback function. Node calls this function once the request has been processed.

Typical code to perform I/O operations in a request is as follows:

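const http = require('http');
const fs = require('fs');

http.createServer(function(request, response) {
    // first callback: runs when a request comes in
    fs.readFile('/path/to/file', 'utf8', function(err, data) {
        // second callback: runs when the file data is available
        response.end(data);
    });
});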

As shown above, there are two callback functions. The first function is called when the request starts, and the second function is called when the file data is available.

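This way, Node can handle the I/O behind these callbacks more efficiently. A more telling example is a database call in Node: your program starts the database operation and gives Node a callback function; Node performs the I/O separately using non-blocking calls and then invokes your callback when the requested data is available. This mechanism of queuing up I/O calls, letting Node handle them, and then getting a callback is called the "event loop". It works very well.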
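The mechanism can be sketched in a few dozen lines. The following is a toy model written in Go, not how Node is actually implemented: the EventLoop type and its methods are invented for this example, background goroutines stand in for the non-blocking I/O, and every callback runs on the one goroutine that calls Run.

```go
package main

import (
	"fmt"
	"sync"
)

// EventLoop runs every callback on a single goroutine, one at a time.
type EventLoop struct {
	events  chan func()
	pending sync.WaitGroup
}

func NewEventLoop() *EventLoop {
	return &EventLoop{events: make(chan func(), 64)}
}

// Submit starts op in the background and, once it finishes, queues callback
// with op's result so that it runs on the loop goroutine.
func (l *EventLoop) Submit(op func() string, callback func(string)) {
	l.pending.Add(1)
	go func() {
		result := op() // the slow I/O happens off the loop
		l.events <- func() { callback(result) }
	}()
}

// Run executes queued callbacks until every submitted operation is done.
func (l *EventLoop) Run() {
	go func() {
		l.pending.Wait()
		close(l.events)
	}()
	for callback := range l.events {
		callback() // callbacks never overlap: the next one waits for this one
		l.pending.Done()
	}
}

func main() {
	loop := NewEventLoop()
	var seen []string // only touched from callbacks, so no locking is needed

	loop.Submit(func() string { return "file contents" }, func(data string) {
		seen = append(seen, "read: "+data)
	})
	loop.Submit(func() string { return "db rows" }, func(data string) {
		seen = append(seen, "query: "+data)
	})
	loop.Run()

	fmt.Println(len(seen)) // 2 (the order of the two entries may vary)
}
```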

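However, this model has a catch. Under the hood, the reason has to do with how the V8 JavaScript engine (Node uses Chrome's JS engine) is implemented: all the JS code you write runs in a single thread. Think about that for a moment. It means that while I/O is performed using efficient non-blocking techniques, any CPU-bound work in your JS code runs in that single thread, and each chunk of code blocks the next. A common example is looping over database records, processing them in some way, and then outputting them to the client. The following code shows how this works:

var handler = function(request, response) {
    connection.query('SELECT ...', function(err, rows) {
        if (err) { throw err; }
        for (var i = 0; i < rows.length; i++) {
            // do processing on each row
        }
        response.end(...); // write out the results
    });
};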

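While Node handles I/O efficiently, the for loop in the example above uses CPU cycles in the one and only main thread. This means that if you have 10,000 connections, that loop could take up the entire application's time. Each request has to take its turn for a slice of time in the main thread.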

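The whole concept rests on the premise that I/O operations are the slowest part, so handling them efficiently matters most, even if that means doing other processing serially. This is true in some cases, but not in all.

Another point is that writing a pile of nested callbacks is tedious, and some people consider such code ugly. It is not uncommon to find callbacks nested four, five, or even more levels deep in Node code.

Once again, it is time to weigh the trade-offs. If your main performance problem is I/O, the Node model will serve you well. Its drawback is that if you put CPU-intensive code in a function that handles an HTTP request, you can inadvertently bring every connection to a crawl.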

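Naturally non-blocking: Go

Before I get into Go, I should disclose that I am a Go fan. I have used it on many projects.

Let's look at how it handles I/O. A key feature of the Go language is that it contains its own scheduler. Instead of each thread of execution corresponding to a single operating system thread, it works with the concept of "goroutines". The Go runtime can assign a goroutine to an operating system thread and have it execute, or suspend it. Each request to a Go HTTP server is handled in a separate goroutine.

In effect, apart from the callback mechanism being built into the implementation of the I/O calls and interacting with the scheduler automatically, what the Go runtime does is not very different from what Node does. It also does not suffer from the restriction that all of your handler code must run in the same thread; Go automatically maps your goroutines to as many operating system threads as it deems appropriate, based on the logic in its scheduler. The result is code like this: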

// db is assumed to be an open *sql.DB
func ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // the underlying network call here is non-blocking
    rows, err := db.Query("SELECT ...")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer rows.Close()

    for rows.Next() {
        // do something with the rows,
        // each request in its own goroutine
    }

    w.Write(...) // write the response, also non-blocking
}

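As you can see above, the basic code structure is simpler, and it still achieves non-blocking I/O.

In most cases, this really does end up being "the best of both worlds". Non-blocking I/O is used for all of the important things, yet the code looks like it is blocking, so it tends to be easier to understand and maintain. The interaction between the Go scheduler and the OS scheduler handles the rest. It is not magic, and if you are building a large system it is worth taking the time to understand how it works. At the same time, what you get "out of the box" works and scales quite well.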

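Go may well have its faults, but generally speaking, the way it handles I/O is not among them.

Performance benchmarks

It is difficult to give exact timings for the context switching involved in these different models. I could also argue that doing so would not be of much use to you. Instead, I will run a basic benchmark that compares the HTTP serving performance of these server environments. Keep in mind that many factors are involved in end-to-end HTTP request/response performance.

For each environment, I wrote code that reads random bytes from a 64k file, runs a SHA-256 hash on them N times (N is given in the URL's query string, e.g. .../test.php?n=100), and prints the result in hexadecimal. I chose this because it is an easy way to run some consistent I/O while increasing CPU usage in a controlled way.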
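The benchmark code itself is not shown here, so the following Go sketch is only a guess at its shape based on the description above. It assumes that "N times" means feeding each digest into the next round and that the whole file is read on every request; the names hashN and handler, the /test path, and the temporary file are all made up.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strconv"
)

// hashN applies SHA-256 n times, feeding each digest into the next round,
// and returns the final value as a hexadecimal string.
func hashN(data []byte, n int) string {
	sum := data
	for i := 0; i < n; i++ {
		digest := sha256.Sum256(sum)
		sum = digest[:]
	}
	return hex.EncodeToString(sum)
}

// handler reads the file at path on every request and hashes its contents
// n times, where n comes from the "n" query parameter.
func handler(path string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n, err := strconv.Atoi(r.URL.Query().Get("n"))
		if err != nil || n < 1 {
			http.Error(w, "n must be a positive integer", http.StatusBadRequest)
			return
		}
		data, err := os.ReadFile(path) // the I/O part of the test
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, hashN(data, n)) // the CPU part of the test
	})
}

func main() {
	// Hypothetical 64 KiB data file filled with random bytes.
	path := filepath.Join(os.TempDir(), "bench-data.bin")
	buf := make([]byte, 64*1024)
	if _, err := rand.Read(buf); err != nil {
		panic(err)
	}
	if err := os.WriteFile(path, buf, 0o600); err != nil {
		panic(err)
	}
	defer os.Remove(path)

	srv := httptest.NewServer(handler(path))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/test?n=100")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body)) // 64 hexadecimal characters and a newline
}
```

Raising n increases the CPU work per request while the file read stays the same, which is the knob the description relies on.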

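At this volume of connections and computation, the results we see have more to do with the general execution of the languages themselves. Note that the "scripting languages" are the slowest.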

Suddenly, Node's performance drops significantly, because the CPU-intensive operations in each request block each other. Interestingly, PHP's performance in this test got better (relative to the others), even beating Java. (It is worth noting that in PHP the SHA-256 implementation is written in C, and the execution path spends much more time in that loop, since we do 1000 hash iterations this time.)

My guess is that at a higher number of connections, the overhead of spawning new processes, and the additional memory that comes with them in PHP with Apache, becomes the main factor affecting PHP's performance. Clearly, Go is the winner this time, followed by Java, Node, and finally PHP.

While many factors are involved in overall throughput, and they vary widely from application to application, the better you understand the underlying principles and the trade-offs involved, the better your application will perform.

Summary

To summarize, as languages evolve, so do the solutions for large applications that handle large amounts of I/O.

To be fair, both PHP and Java have available non-blocking I/O implementations for web applications. However, these implementations are not as widely used as the methods described above, and there are maintenance overheads to consider. Not to mention that the application's code must be structured in a way that is suitable for this environment.

Let’s compare several important factors that affect performance and ease of use:

Language	Threads vs. processes	Non-blocking I/O	Ease of use
PHP	Processes	No	-
Java	Threads	Available	Requires callbacks
Node.js	Threads	Yes	Requires callbacks
Go	Threads (goroutines)	Yes	No callbacks needed

Because threads share the same memory space and processes do not, threads are usually much more memory efficient than processes. Reading the table above from top to bottom, each entry improves on the one before it in terms of the I/O-related factors. So, if I had to pick a winner in the comparison above, it would definitely be Go.

That said, in practice, the choice of environment for building your application is closely tied to your team's familiarity with that environment and the overall productivity the team can achieve with it. So jumping straight into developing web applications and services in Node or Go may not be the best choice for every team.

Hopefully this helps you understand more clearly what's going on under the hood and provides you with some suggestions on how to handle application scalability.
