How to achieve deep paging over hundreds of millions of records, compatible with MySQL + ES + MongoDB?

Guanhui

Jul 27, 2020 pm 05:24 PM

mysql

## Interview Questions & Real Experience

Interview question: How to achieve deep paging when the amount of data is large?

You may run into the question above in an interview or while preparing for one. Most answers boil down to sharding databases and tables and building indexes. That is the standard correct answer, but reality is harsh, so the interviewer will usually follow up: given a tight schedule and too few people, how do you achieve deep paging now?

At this point, candidates without practical experience are usually stumped. So, please hear me out.

Painful Lessons

First of all, let's be clear: deep paging can be done, but random deep page jumps absolutely must be banned.

First, a picture:

[Image: pagination screenshot, not preserved]

Take a guess: if I click on page 142360, will the service blow up?

With databases such as MySQL and MongoDB it is tolerable. They are real databases, and if the query is handled poorly the worst case is that it is slow. With ES the situation is different: we have to use the Search After API to fetch data in a loop, which raises the issue of memory usage. If the code is written carelessly, it can lead directly to an out-of-memory error.

Why random depth page jumps cannot be allowed

Let's briefly discuss, from a technical point of view, why random deep page jumps cannot be allowed, or in other words, why deep paging is not recommended.

MySQL

The basic principle of paging:

SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;

LIMIT 10000, 20 means scanning the 10,020 rows that satisfy the condition, discarding the first 10,000, and returning the last 20. With LIMIT 1000000, 100, 1,000,100 rows have to be scanned. In a highly concurrent application where every query scans more than 1,000,000 rows, it would be strange if the service did not blow up.
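To make the arithmetic concrete, here is a minimal Python sketch of the cost model described above. The function names are hypothetical, and the page size of 20 used for page 142360 is an assumption, since the article does not state it.

```python
def offset_for_page(page: int, page_size: int) -> int:
    """Offset of a 1-based page number, i.e. (N - 1) * page_size."""
    return (page - 1) * page_size


def rows_scanned(offset: int, page_size: int) -> int:
    """Rows read for `LIMIT offset, page_size`: skipped rows plus returned rows."""
    return offset + page_size


# The two examples from the text.
print(rows_scanned(10_000, 20))       # 10020
print(rows_scanned(1_000_000, 100))   # 1000100

# Page 142360 at an assumed 20 rows per page.
deep_offset = offset_for_page(142_360, 20)
print(deep_offset, rows_scanned(deep_offset, 20))  # 2847180 2847200
```

The cost grows linearly with the page number, which is why a single deep jump is far more expensive than the first page.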

MongoDB

The basic principle of paging:

db.t_data.find().limit(5).skip(5);

Similarly, as the page number grows, the number of items skipped by skip grows too. This operation is implemented by iterating the cursor, so the CPU cost becomes very noticeable. When page numbers are very large and such requests are frequent, the service will inevitably blow up.

ElasticSearch

From a business perspective, ElasticSearch is not a typical database; it is a search engine. If the desired data is not found under the given filter conditions, continuing to page deeper will not find it either. Taking a step back, if we do use ES as a database for queries, paging will certainly run into the max_result_window limit. The official default caps the maximum offset at 10,000.

Query process:

To query page 501 with 10 items per page, the client sends a request to one node.
That node broadcasts the request to every shard, and each shard queries its first 5010 items.
Each shard returns its results to the node, which merges them and takes the first 5010 items.
The result is returned to the client.
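The query process above can be mimicked with a toy simulation. This is only a sketch of the scatter-gather idea in plain Python, not ElasticSearch code: the shards are ordinary lists of integers, and the function names are hypothetical.

```python
import heapq
from itertools import islice


def shard_top(shard_docs, n):
    """Each shard sorts its own documents and returns only its first n."""
    return sorted(shard_docs)[:n]


def coordinator_page(shards, page, page_size):
    """Return (page items, number of items collected from all shards)."""
    start = (page - 1) * page_size
    need = start + page_size                    # from + size
    partials = [shard_top(docs, need) for docs in shards]
    merged = heapq.merge(*partials)             # merge the sorted shard results
    top = list(islice(merged, need))            # first `need` items overall
    collected = sum(len(p) for p in partials)
    return top[start:], collected


# Three shards holding 30,000 documents in total.
shards = [list(range(i, 30_000, 3)) for i in range(3)]
items, collected = coordinator_page(shards, page=501, page_size=10)
print(items)      # the 10 items of page 501
print(collected)  # how many items the coordinating node had to gather
```

In this toy run the coordinating node gathers 5010 items from each of the three shards, 15,030 in total, to return 10.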

From this we can see why the offset has to be limited. In addition, if you implement deep page jumps with a scrolling method such as the Search After API, you still have to scroll through thousands of items at a time, possibly millions or tens of millions in total, just to get the last 20. The efficiency is easy to imagine.

Re-align with the product team

As the saying goes, problems that technology cannot solve should be solved by the business!

During my internship I fell for the product team's demands and insisted on implementing deep paging with page jumps. Now we must set things right, and the following changes are needed on the business side:

Add default filter conditions wherever possible, such as a time period, to reduce the amount of data displayed

Change how page jumps are presented: switch to scrolling display, or allow page jumps only within a small range

Scrolling display reference picture:

[Image not preserved]

Small-range page jump reference picture:

[Image not preserved]

## General solution

In the short term, the quick fixes mainly come down to the following points:

MySQL

Original paging SQL:

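# First page
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit 0, 20;
# Page N (pseudocode: the offset (N - 1) * 20 is computed by the application)
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit (N - 1) * 20, 20;

Using the context of the previous page, this can be rewritten as:

# XXXX stands for an already-known value, such as the last id of the previous page
SELECT * FROM `year_score` where `year` = 2017 and id > XXXX ORDER BY id limit 20;

As mentioned in the earlier article on SQL optimization and diagnosis, LIMIT stops the query as soon as enough matching rows have been found, so this approach drastically reduces the total number of rows scanned and improves efficiency enormously.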
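A minimal runnable sketch of this rewrite, using Python's built-in sqlite3 module in place of MySQL so that it is self-contained. The `score` column, the sample data, and the helper names are assumptions made for illustration; only the table name, the `year` filter, and the `id > XXXX` condition come from the text.

```python
import sqlite3


def make_db(n_rows=1000):
    """Create an in-memory table with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE year_score (id INTEGER PRIMARY KEY, year INTEGER, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO year_score (id, year, score) VALUES (?, ?, ?)",
        [(i, 2017 if i % 2 else 2016, i % 100) for i in range(1, n_rows + 1)],
    )
    return conn


def next_page(conn, last_id, page_size=20):
    """Fetch the page that follows last_id, the known value XXXX in the text."""
    return conn.execute(
        "SELECT id, year, score FROM year_score "
        "WHERE year = ? AND id > ? ORDER BY id LIMIT ?",
        (2017, last_id, page_size),
    ).fetchall()


conn = make_db()
page1 = next_page(conn, last_id=0)             # first page: nothing seen yet
page2 = next_page(conn, last_id=page1[-1][0])  # continue after the last id seen
print([row[0] for row in page2])
```

The caller only has to remember the last id it displayed, which fits the scrolling display suggested earlier but cannot serve a jump to an arbitrary page.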

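ElasticSearch

The approach is the same as for MySQL. With it, we can use the from/size paging API freely, without having to worry about the maximum limit.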

MongoDB

The approach is basically the same. The basic code is as follows:

[Image: code sample, not preserved]

Related performance test:

[Image: performance test results, not preserved]
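As a hedged illustration of how the same range-based approach might look for MongoDB, the sketch below builds the filter, sort, and limit for the next page. It is not the author's code. It assumes paging on `_id` in ascending order, and the helper name and the example `year` filter are hypothetical.

```python
def next_page_query(last_id=None, page_size=20, base_filter=None):
    """Build the query parts for the page that follows last_id."""
    query = dict(base_filter or {})
    if last_id is not None:
        # Only documents after the last one already shown.
        query["_id"] = {"$gt": last_id}
    return {"filter": query, "sort": [("_id", 1)], "limit": page_size}


first = next_page_query()
following = next_page_query(last_id=100, page_size=5, base_filter={"year": 2017})
print(first)
print(following)

# With PyMongo (an assumed driver choice) the parts could be applied as:
#   spec = next_page_query(last_id=100)
#   cursor = db.t_data.find(spec["filter"]).sort(spec["sort"]).limit(spec["limit"])
```

Because the filter starts from a known `_id`, no skip is needed, so the cursor does not have to iterate over the earlier documents.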

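If you really must support random deep page jumps

What if you could not win the argument with the product manager? Don't worry, there is still a sliver of hope.

The SQL optimization article also mentioned a technique for handling deep paging in MySQL. The code is as follows: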

# Bad example (took 129.570s)
select * from task_result LIMIT 20000000, 10;
# Good example (took 5.114s)
SELECT a.* FROM task_result a, (select id from task_result LIMIT 20000000, 10) b where a.id = b.id;
# Notes
# task_result is a table in a production environment with 34 million rows in total; id is the primary key; the offset reaches 20 million

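The core logic of this approach relies on the clustered index: it quickly obtains the primary key IDs of the rows at the given offset without going back to the table, and then uses the clustered index to look up the full rows. At that point only 10 rows are involved, so it is very efficient.

Therefore, when dealing with MySQL, ES, and MongoDB, we can use the same method:

Limit the fields fetched: using only the filter conditions, deep-page to obtain the primary key IDs
Query the required data directly by primary key ID

Drawback: when the offset is very large, it still takes a long time, such as the 5s in the example above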
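The two steps can be sketched as follows, again with Python's built-in sqlite3 standing in for the real stores. The `payload` column, the sample data, and the helper names are assumptions, and an `ORDER BY id` has been added so the result is deterministic.

```python
import sqlite3


def page_ids(conn, offset, page_size):
    """Step 1: deep-page over the primary key only."""
    rows = conn.execute(
        "SELECT id FROM task_result ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return [r[0] for r in rows]


def rows_by_ids(conn, ids):
    """Step 2: fetch the full rows for exactly those IDs."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, payload FROM task_result WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_result (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO task_result VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 5001)],
)

ids = page_ids(conn, offset=4000, page_size=10)
page = rows_by_ids(conn, ids)
print(ids)
print(page[0])
```

Splitting the work this way keeps the expensive offset scan on the narrow ID column and touches the wide rows only for the final page.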

Source: https://juejin.im/post/5f0de4d06fb9a07e8a19a641


Statement

This article is reproduced from juejin. If there is any infringement, please contact admin@php.cn for removal.
