How to achieve deep paging over hundreds of millions of rows in a way that works for MySQL, ES, and MongoDB?

## Interview Questions & Real Experience
Interview question: how do you implement deep paging when the data volume is very large? You may well run into this question in an interview or while preparing for one.

Most answers boil down to splitting databases and tables (sharding) and building indexes. That is the standard, correct answer, but reality is harsh. The interviewer will usually follow up: the schedule is too tight and the team is short-handed, so how do you implement deep paging now? At this point, candidates without hands-on experience are usually stuck. So, hear me out.
### Painful Lessons
First, let's be clear: deep paging can be done, but random jumps to deep pages must be strictly prohibited.
### Why random deep page jumps cannot be allowed
Let's briefly look, from a technical point of view, at why random deep page jumps cannot be allowed, or in other words, why deep paging is not recommended.

#### MySQL

The basic principle of paging:

```sql
SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;
```

`LIMIT 10000, 20` means scanning 10,020 rows that meet the conditions, discarding the first 10,000, and returning the last 20. With `LIMIT 1000000, 100`, 1,000,100 rows must be scanned. In a high-concurrency application where every query scans more than a million rows, it would be surprising if the database did not fall over.
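The cost described above can be made concrete with a tiny Python model. This is not how MySQL works internally. It only sketches the behaviour just described:

- rows are read in index order;
- the first `offset` rows are read and then discarded;
- reading stops once `limit` rows have been collected.

The function `offset_page` is a hypothetical name, and ids are ascending here rather than `DESC`, for simplicity.

```python
from typing import Iterable, List, Tuple


def offset_page(rows_in_index_order: Iterable[int], offset: int, limit: int) -> Tuple[List[int], int]:
    """Model of LIMIT offset, limit: returns (page, rows_scanned)."""
    page: List[int] = []
    scanned = 0
    for row in rows_in_index_order:
        scanned += 1
        if scanned <= offset:
            continue  # the row is read, then thrown away
        page.append(row)
        if len(page) == limit:
            break  # LIMIT satisfied: stop reading
    return page, scanned


page, scanned = offset_page(range(2_000_000), offset=1_000_000, limit=100)
print(len(page), scanned)  # 100 1000100
```

The number of rows read grows linearly with the offset, even though only 100 rows come back.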
#### MongoDB
The basic principle of paging:

```javascript
db.t_data.find().limit(5).skip(5);
```

Similarly, as the page number grows, the number of documents skipped by `skip` grows with it. Skipping is done by advancing the cursor's iterator, so the CPU cost becomes significant. When requests for very large page numbers are frequent, the server will inevitably be overwhelmed.
#### ElasticSearch
From a business perspective, ElasticSearch is not a typical database; it is a search engine. If the filter conditions do not surface the data you want, paging deeper will not find it either.

Taking a step back, if we do use ES as a database for queries, paging will inevitably run into the `max_result_window` limit. By default, Elasticsearch caps the offset (strictly, `from + size`) at 10,000. The query process works like this:

- To query page 501 with 10 items per page, the client sends the request to one node.
- That node broadcasts the request to every shard, and each shard retrieves its own top 5,010 documents.
- The shards return their results to the node, which merges them and keeps the overall top 5,010.
- The node returns the last 10 of those (the requested page) to the client.
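The query process above can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch code:

- shards are plain lists of integer scores;
- `scatter_gather_page` and `MAX_RESULT_WINDOW` are hypothetical names;
- the 10,000 cap mirrors the default `index.max_result_window` check on `from + size`.

It shows that, for page 501, each of three shards sends 5,010 hits to the coordinating node just so 10 can be returned.

```python
import heapq
import itertools
from typing import List, Tuple

MAX_RESULT_WINDOW = 10_000  # mirrors the default index.max_result_window


def scatter_gather_page(shards: List[List[int]], page: int, size: int) -> Tuple[List[int], int]:
    """Model of from/size paging across shards.

    Each int stands for a hit's score (higher ranks first). Returns the
    requested page and how many hits the shards sent to the coordinating node.
    """
    from_ = (page - 1) * size
    window = from_ + size
    if window > MAX_RESULT_WINDOW:
        raise ValueError(f"from + size = {window} exceeds {MAX_RESULT_WINDOW}")
    # Each shard sorts its own hits and returns its top `window` of them.
    per_shard = [sorted(hits, reverse=True)[:window] for hits in shards]
    sent = sum(len(hits) for hits in per_shard)
    # The coordinating node merges the sorted lists and keeps the global top `window`...
    top = list(itertools.islice(heapq.merge(*per_shard, reverse=True), window))
    # ...but only the last `size` of them are returned to the client.
    return top[from_:], sent


shards = [list(range(i, 30_000, 3)) for i in range(3)]  # 3 shards, 10,000 hits each
hits, sent = scatter_gather_page(shards, page=501, size=10)
print(hits[0], hits[-1], sent)  # 24999 24990 15030
```

The work done per request grows with `from + size` multiplied by the number of shards, which is why the window is capped.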
### Push back on the product team again
As the saying goes, what technology cannot solve, the business should solve! During my internship I fell for the product manager's nonsense and had to implement deep paging with arbitrary page jumps. It is time to set things right, and the business side must make the following changes:

- Add default filter conditions wherever possible, such as a time range, to reduce the amount of data displayed.
- Change how page jumps are presented: switch to infinite scrolling, or only allow jumps within a small range of pages.
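A small-range pager can be sketched in Python as follows. The helpers `page_links` and `is_allowed_jump` and the radius of 3 are illustrative assumptions, not part of the article. The point is that every reachable page stays close to one the user has already seen, so the backend never needs a large offset.

```python
from typing import List


def page_links(current: int, total_pages: int, radius: int = 3) -> List[int]:
    """Page numbers the UI offers: only pages within `radius` of the current one."""
    first = max(1, current - radius)
    last = min(total_pages, current + radius)
    return list(range(first, last + 1))


def is_allowed_jump(current: int, target: int, radius: int = 3) -> bool:
    """Reject arbitrary deep jumps; only small-range jumps are served."""
    return abs(target - current) <= radius


print(page_links(10, 1_000_000))  # [7, 8, 9, 10, 11, 12, 13]
print(is_allowed_jump(10, 5000))  # False
```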

## General solution

A quick, short-term solution mainly comes down to the following points:

- Required: the sort fields and the filter conditions must be indexed.
- Core: reduce the offset by reusing data you already know, either from nearby pages (small-range page jumps) or from previously loaded batches (infinite scrolling).
- Extra: in cases that are hard to handle this way, you can also fetch somewhat more data than needed and trim it. Within reason, the performance impact is not significant.
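Here is a sketch of the "core" and "extra" points together, using Python's built-in `sqlite3` as a stand-in for MySQL. The `year_score` columns and the 20-row page size are assumptions based on this section's queries, and `jump_forward` is a hypothetical helper.

When the user jumps a few pages ahead of a page whose last id is known, the helper does two things:

1. It fetches every row up to the end of the target page, starting after that id.
2. It drops the rows in front of the target page.

The over-fetch grows only with the jump distance, not with the absolute page number.

```python
import sqlite3
from typing import List, Tuple

PAGE_SIZE = 20


def jump_forward(conn: sqlite3.Connection, last_id_of_current_page: int, pages_ahead: int) -> List[Tuple[int, int]]:
    """Jump `pages_ahead` pages forward from a page whose last id is known."""
    # Over-fetch every row between the known page and the end of the target page...
    rows = conn.execute(
        "SELECT id, score FROM year_score WHERE year = 2017 AND id > ? ORDER BY id LIMIT ?",
        (last_id_of_current_page, pages_ahead * PAGE_SIZE),
    ).fetchall()
    # ...then keep only the rows that belong to the target page.
    return rows[(pages_ahead - 1) * PAGE_SIZE:]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE year_score (id INTEGER PRIMARY KEY, year INTEGER, score INTEGER)")
conn.execute("CREATE INDEX idx_year_id ON year_score (year, id)")
conn.executemany("INSERT INTO year_score VALUES (?, 2017, ?)", [(i, i % 100) for i in range(1, 1001)])

# Page 3 covers ids 41..60, so its last id is 60; jump two pages ahead to page 5.
page5 = jump_forward(conn, last_id_of_current_page=60, pages_ahead=2)
print(page5[0][0], page5[-1][0])  # 81 100
```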
### MySQL

Original paging SQL:
```sql
# Page 1
SELECT * FROM `year_score` WHERE `year` = 2017 ORDER BY id LIMIT 0, 20;
# Page N: bind ? to the offset (N - 1) * 20
SELECT * FROM `year_score` WHERE `year` = 2017 ORDER BY id LIMIT ?, 20;
```
Using context (the last row already known), rewrite it as:
```sql
# ? is bound to known data: the id of the last row already shown (e.g. the last row of the previous page)
SELECT * FROM `year_score` WHERE `year` = 2017 AND id > ? ORDER BY id LIMIT 20;
```
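As mentioned in the article "No Moles Here, Just the Goods! SQL Optimization and Diagnosis" (没内鬼,来点干货!SQL优化和诊断), LIMIT stops the query as soon as enough rows satisfying the conditions have been found. This approach therefore sharply reduces the total number of rows scanned and gives a huge efficiency gain.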
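The rewrite can be checked end to end with Python's built-in `sqlite3` standing in for MySQL. The table contents, `keyset_pages` and the 20-row page size are illustrative assumptions.

The sketch walks all pages for one year by remembering the last id of each page. Every page it produces is cross-checked against the corresponding `LIMIT offset, 20` page. Each keyset query seeks to `id > last_id` instead of skipping rows. This only works because the sort key (`id`) is unique: with a non-unique key, rows sharing the boundary value could be skipped.

```python
import sqlite3
from typing import Iterator, List, Tuple

PAGE_SIZE = 20


def keyset_pages(conn: sqlite3.Connection, year: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield pages in id order, seeking past the last seen id instead of using an offset."""
    last_id = 0  # assumes ids are positive integers
    while True:
        page = conn.execute(
            "SELECT id, score FROM year_score WHERE year = ? AND id > ? ORDER BY id LIMIT ?",
            (year, last_id, PAGE_SIZE),
        ).fetchall()
        if not page:
            return
        yield page
        last_id = page[-1][0]  # the "known data" for the next request


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE year_score (id INTEGER PRIMARY KEY, year INTEGER, score INTEGER)")
conn.execute("CREATE INDEX idx_year_id ON year_score (year, id)")
conn.executemany(
    "INSERT INTO year_score VALUES (?, ?, ?)",
    [(i, 2017 if i % 2 else 2018, i % 100) for i in range(1, 501)],
)

pages = list(keyset_pages(conn, 2017))
# Cross-check every page against the offset-based query it replaces.
for n, page in enumerate(pages):
    expected = conn.execute(
        "SELECT id, score FROM year_score WHERE year = ? ORDER BY id LIMIT ? OFFSET ?",
        (2017, PAGE_SIZE, n * PAGE_SIZE),
    ).fetchall()
    assert page == expected
print(len(pages), pages[-1][-1])  # 13 (499, 99)
```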
### ES
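The approach is the same as for MySQL. Because the offset stays small, we can use the `from`/`size` API freely without worrying about the maximum result window limit.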
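A sketch of such a request body, built as a Python dict, assuming the index maps `year` and a unique numeric `id` field. The helper name `es_next_page_body` is hypothetical.

The body uses the standard `bool`/`filter`, `term` and `range` query clauses. `from` stays at 0, and the known last id does the work an offset would otherwise do.

```python
import json
from typing import Any, Dict, Optional


def es_next_page_body(year: int, last_id: Optional[int], size: int = 20) -> Dict[str, Any]:
    """Build a search body that pages by `id > last_id` with `from` fixed at 0."""
    filters = [{"term": {"year": year}}]
    if last_id is not None:
        filters.append({"range": {"id": {"gt": last_id}}})
    return {
        "query": {"bool": {"filter": filters}},
        "sort": [{"id": "asc"}],
        "from": 0,  # the offset never grows, so from + size stays far below 10,000
        "size": size,
    }


print(json.dumps(es_next_page_body(2017, last_id=123456)))
```

Elasticsearch also provides a built-in `search_after` parameter, which pages from the sort values of the last hit and serves the same purpose.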
### MongoDB
The approach is essentially the same as for MySQL.
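Here is a sketch of the same idea for MongoDB, assuming pages are ordered by `_id`. The next page is requested with `{_id: {$gt: lastId}}` plus a sort on `_id`:

- in the mongo shell: `db.t_data.find({_id: {$gt: lastId}}).sort({_id: 1}).limit(5)`;
- with PyMongo: `collection.find(flt).sort("_id", 1).limit(5)`.

Here `lastId` and `flt` are placeholders. To keep the example runnable without a server, `run_find` is a hypothetical in-memory stand-in that only understands this one filter shape.

```python
from typing import Any, Dict, List, Optional


def mongo_next_page_filter(last_id: Optional[int]) -> Dict[str, Any]:
    """Filter document for the next page: _id greater than the last one seen."""
    return {} if last_id is None else {"_id": {"$gt": last_id}}


def run_find(docs: List[Dict[str, Any]], flt: Dict[str, Any], limit: int) -> List[Dict[str, Any]]:
    """In-memory stand-in for find(flt).sort({_id: 1}).limit(limit); supports only $gt on _id."""
    gt = flt.get("_id", {}).get("$gt")
    matching = [d for d in docs if gt is None or d["_id"] > gt]
    return sorted(matching, key=lambda d: d["_id"])[:limit]


docs = [{"_id": i, "v": i * i} for i in range(1, 101)]
page1 = run_find(docs, mongo_next_page_filter(None), 5)
page2 = run_find(docs, mongo_next_page_filter(page1[-1]["_id"]), 5)
print([d["_id"] for d in page2])  # [6, 7, 8, 9, 10]
```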


## What if you really must support random deep page jumps?
What if you couldn't win the argument with the product manager? Don't worry, there is still a slim chance.
The SQL optimization article also covers a trick for deep paging in MySQL:
```sql
# Anti-pattern (took 129.570 s)
select * from task_result LIMIT 20000000, 10;
# Better (took 5.114 s)
SELECT a.* FROM task_result a, (select id from task_result LIMIT 20000000, 10) b where a.id = b.id;
# Notes:
# task_result is a production table with 34 million rows in total; id is the primary key; the offset reaches 20 million
```
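The core idea relies on the clustered index, and works in two steps:

1. The subquery needs only the primary-key id, so it can quickly locate the IDs at the requested offset without going back to the table for full rows.
2. The outer query then uses the clustered index to fetch the full rows.

Only 10 rows are fetched in the second step, so the query is very efficient.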
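The equivalence of the two queries can be checked with Python's built-in `sqlite3`, used here purely as a stand-in for MySQL; the table contents and sizes are made up. Two assumptions apply:

- an explicit `ORDER BY id` is added to both queries, so they are guaranteed to pick the same rows;
- SQLite stores rows differently from InnoDB, so this demonstrates correctness only, not the 129 s versus 5 s timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_result (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO task_result (id, payload) VALUES (?, ?)",
    ((i, f"result-{i}") for i in range(1, 100_001)),
)

OFFSET, LIMIT = 90_000, 10

# Anti-pattern: page over full rows directly.
naive = conn.execute(
    "SELECT * FROM task_result ORDER BY id LIMIT ? OFFSET ?", (LIMIT, OFFSET)
).fetchall()

# Deferred join: page over ids only, then fetch the 10 full rows by primary key.
deferred = conn.execute(
    """
    SELECT a.* FROM task_result a
    JOIN (SELECT id FROM task_result ORDER BY id LIMIT ? OFFSET ?) b ON a.id = b.id
    ORDER BY a.id
    """,
    (LIMIT, OFFSET),
).fetchall()

print(deferred == naive, deferred[0])  # True (90001, 'result-90001')
```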
So when handling MySQL, ES, and MongoDB, we can use the same approach:
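- Restrict the fields retrieved: using only the filter conditions, page deeply to obtain just the primary-key IDs.
- Query the required data directly by those primary-key IDs.

Drawback: when the offset is very large, this still takes a long time, such as the 5 s in the example above.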
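Applied to ES and MongoDB, the two steps could look like the Python sketch below. It only builds request bodies and does not talk to a server. Everything in it is an assumption for illustration:

- the helper names;
- the `id` sort field;
- passing the MongoDB dicts as keyword arguments to PyMongo's `Collection.find`.

`_source: false` and the `ids` query are standard Elasticsearch search options. Note that step 1 on ES is still subject to `max_result_window`, so this reduces the data moved per hit rather than lifting the 10,000 cap.

```python
from typing import Any, Dict, List


# Elasticsearch request bodies (hypothetical helpers)
def es_ids_page_body(query: Dict[str, Any], from_: int, size: int) -> Dict[str, Any]:
    """Step 1: deep-page with the filter only; _source False returns hit ids without document bodies."""
    return {"query": query, "sort": [{"id": "asc"}], "from": from_, "size": size, "_source": False}


def es_fetch_by_ids_body(ids: List[str]) -> Dict[str, Any]:
    """Step 2: fetch the full documents for exactly these ids."""
    return {"query": {"ids": {"values": ids}}, "size": len(ids)}


# MongoDB find() arguments, e.g. collection.find(**args) in PyMongo (hypothetical helpers)
def mongo_ids_page_args(flt: Dict[str, Any], skip: int, limit: int) -> Dict[str, Any]:
    """Step 1: project only _id while skipping deep into the result set."""
    return {"filter": flt, "projection": {"_id": 1}, "sort": [("_id", 1)], "skip": skip, "limit": limit}


def mongo_fetch_by_ids_args(ids: List[Any]) -> Dict[str, Any]:
    """Step 2: fetch the full documents for exactly these _ids."""
    return {"filter": {"_id": {"$in": ids}}}


def restore_order(docs: List[Dict[str, Any]], ids: List[Any], key: str = "_id") -> List[Dict[str, Any]]:
    """Neither an ids query nor $in guarantees result order, so re-sort by the step-1 order."""
    by_id = {doc[key]: doc for doc in docs}
    return [by_id[i] for i in ids if i in by_id]


step1 = es_ids_page_body({"term": {"year": 2017}}, from_=5000, size=10)
fetched = [{"_id": "b", "v": 2}, {"_id": "a", "v": 1}]
print(restore_order(fetched, ["a", "b"]))  # [{'_id': 'a', 'v': 1}, {'_id': 'b', 'v': 2}]
```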
Source: https://juejin.im/post/5f0de4d06fb9a07e8a19a641