This article was first published on the InfoQ Chinese site. Authors: Ming Ling (dragon), Fenng. Note: if you reprint it, please credit the first author of this article!
This article is a summary that grew out of an email discussion with my friend dragon. Whether to sort in the DB or in the application is a very interesting topic. Dragon had already summarized it very well in his first email; I only added a few suggestions. I am posting it here to share with everyone. It has also been submitted to the InfoQ Chinese site.
Q: What are the reasons for sorting in PHP rather than in MySQL? Give some examples where sorting has to be done in MySQL.
A: Generally speaking, execution efficiency depends on the load on CPU, memory and disk. Assuming the MySQL server and the PHP server are both configured in the most suitable way, the main goals are the scalability of the system and user-perceived performance. In practice, MySQL often keeps data in memory in structures such as hash tables and B-trees, so operations are very fast, and an index already keeps the data pre-sorted. For many applications, sorting in MySQL is therefore the first choice. Sorting in the application layer (PHP) is also done in memory. Compared with sorting in MySQL, it has the following advantages:
- 1. In terms of the scalability and overall performance of the whole website, sorting in the application layer (PHP) clearly reduces the load on the database and so improves the scalability of the site. Sorting in the database is expensive: it consumes memory and CPU, and with many concurrent sorts the DB easily becomes a bottleneck;
- 2. If there is a data middle layer between the application layer (PHP) and MySQL and it is used well, sorting in PHP pays off even more;
- 3. PHP's in-memory data structures are designed for the specific application, so they are more concise and efficient than the database's;
- 4. PHP does not have to deal with disaster recovery for the data, so it avoids that overhead;
- 5. There is no table locking problem in PHP;
- 6. When MySQL sorts, the request and the returned result both have to travel over a network connection, whereas PHP can return the result directly after sorting, which reduces network I/O.
As for execution speed, the difference should not be large unless a flaw in the application design causes a lot of unnecessary network I/O. In addition, the application layer has to watch PHP's cache settings: if the configured limit is exceeded, an internal error is reported, and the cache then has to be evaluated or adjusted for the application. The right choice depends on the specific application.
Some situations where sorting in PHP is better:
- 1. The data is not in MySQL at all; it comes from disk, memory, network requests, etc.;
- 2. The data is stored in MySQL, the amount is small, and there is no matching index. In this case it is faster to fetch the data and sort it in PHP;
- 3. The data comes from multiple MySQL servers. In this case it is faster to fetch the data from each server and then sort it in PHP;
- 4. Besides MySQL there are other data sources, such as disk, memory or network requests. In this case it is not appropriate to load that data into MySQL just to sort it.
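Case 3 above can be sketched roughly as follows. This is only a minimal illustration, assuming PHP 7 or later: the two arrays stand in for result sets already fetched from two MySQL servers, and the row layout (`id`, `name`, `score`) and the function name `mergeAndSortByScore` are hypothetical, not taken from the original discussion.

```php
<?php
// Hypothetical rows standing in for result sets fetched from two MySQL servers.
// In a real application these would come from two separate connections.
$rowsFromServerA = [
    ['id' => 1, 'name' => 'alice', 'score' => 72],
    ['id' => 2, 'name' => 'bob',   'score' => 91],
];
$rowsFromServerB = [
    ['id' => 3, 'name' => 'carol', 'score' => 85],
    ['id' => 4, 'name' => 'dave',  'score' => 64],
];

/**
 * Merge several result sets and sort them by score, highest first.
 * Ties are broken by ascending id so the order is deterministic.
 */
function mergeAndSortByScore(array ...$resultSets): array
{
    $rows = array_merge(...$resultSets);
    usort($rows, function (array $a, array $b): int {
        // Array comparison works element by element:
        // score descending first, then id ascending.
        return [$b['score'], $a['id']] <=> [$a['score'], $b['id']];
    });
    return $rows;
}

$sorted = mergeAndSortByScore($rowsFromServerA, $rowsFromServerB);
foreach ($sorted as $row) {
    echo $row['name'], ' ', $row['score'], PHP_EOL;
}
```

Neither server could produce this combined order on its own, which is why the final sort has to happen in the application layer here.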
Some examples where sorting must be done in MySQL:
- 1. An index with the required sort order already exists in MySQL;
- 2. The amount of data in MySQL is large and only a very small subset of the result is needed; for example, taking the TOP 10 out of 1,000,000 rows;
- 3. The data is sorted once and used many times, for example statistical aggregates that are served to different services. Sorting in MySQL is preferred here. For deep data mining, on the other hand, the usual approach is to do complex operations such as sorting in the application layer and then store the results in MySQL so that they can be reused;
- 4. Wherever the data comes from, once it reaches a certain size it is no longer suitable for sorting in PHP because of the memory/cache it occupies. The data should then be copied, imported or stored in MySQL and sorted with the help of an index, which works better than PHP. It would be better still, however, to handle such operations in Java or even C++. [For aggregations or summaries over large data sets, sorting on the client side costs more than it gains. Of course, approaches similar to those of search engines can also be used for this kind of application.]
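The TOP 10 case in item 2 can be sketched as below. This is a hedged illustration rather than a MySQL benchmark: to stay runnable without a server it uses an in-memory SQLite database through PDO as a stand-in for MySQL (this assumes the `pdo_sqlite` extension is available), and the table `scores`, its columns and the function `topScores` are hypothetical. The point it shows is that only the requested rows leave the database, however many rows the table holds.

```php
<?php
// Stand-in database: in-memory SQLite via PDO, used here instead of MySQL
// so that the sketch runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table. An index on score can let the database read rows
// in sorted order instead of sorting the whole table on every request.
$pdo->exec('CREATE TABLE scores (id INTEGER PRIMARY KEY, score INTEGER NOT NULL)');
$pdo->exec('CREATE INDEX idx_scores_score ON scores (score)');

// Fill the table with 1000 rows whose scores are all distinct.
$insert = $pdo->prepare('INSERT INTO scores (id, score) VALUES (?, ?)');
$pdo->beginTransaction();
for ($id = 1; $id <= 1000; $id++) {
    $insert->execute([$id, ($id * 37) % 1001]);
}
$pdo->commit();

/**
 * Let the database sort and cut the result down to $limit rows,
 * so only those rows are transferred to PHP.
 */
function topScores(PDO $pdo, int $limit): array
{
    $stmt = $pdo->prepare('SELECT id, score FROM scores ORDER BY score DESC LIMIT :n');
    $stmt->bindValue(':n', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$top = topScores($pdo, 10);
echo count($top), ' rows transferred', PHP_EOL;
```

Fetching every row and calling `usort` in PHP would give the same ten rows, but it would first move the whole table across the connection and hold it in PHP's memory.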
Looking at the website as a whole, manpower and cost must also be taken into account. If the site's size and load are small and manpower is limited (in headcount or in skills), sorting in the application layer (PHP) takes a lot of development and debugging work, which is time-consuming and not worth the cost; it is better to do it in the DB, which is simple and fast. For large websites, electricity and server costs are very high, and careful planning of the system architecture can save a great deal of money, which the company needs for sustainable development. In that case, if sorting in the application layer (PHP) can meet the business needs, try to do it there.