Pagination is a critical part of any database operation when dealing with large datasets. It allows you to split data into manageable chunks, making it easier to browse, process, and display. Two pagination approaches are common in MongoDB: offset-based and cursor-based. While both serve the same purpose, they differ significantly in performance and usability, especially as the dataset grows.
Let's dive into the two approaches and see why cursor-based pagination often outperforms offset-based pagination.
Offset-based pagination is straightforward. It retrieves a specific number of records starting from a given offset. For example, the first page might retrieve records 0-9, the second page retrieves records 10-19, and so on.
However, this method has a significant drawback: as you move to higher pages, the query becomes slower. This is because the database needs to skip over the records from the previous pages, which involves scanning through them.
Here’s the code for offset-based pagination:
    async function offset_based_pagination(params) {
      const { page = 5, limit = 100 } = params;
      const skip = (page - 1) * limit;
      const results = await collection.find({}).skip(skip).limit(limit).toArray();
      console.log(`Offset-based pagination (Page ${page}):`, results.length, "page", page, "skip", skip, "limit", limit);
    }
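To make the cost of skipping concrete, here is a small sketch of the arithmetic behind the skip value. The helper name `offsetWindow` is hypothetical and not part of the MongoDB driver; it only shows how many documents have to be passed over before a given page starts.

```javascript
// Hypothetical helper (not a MongoDB API): translate a 1-based page number
// into skip/limit values and the zero-based range of records it covers.
function offsetWindow(page, limit) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be an integer >= 1");
  }
  const skip = (page - 1) * limit;
  return {
    skip, // documents walked past and discarded
    limit, // documents actually returned
    firstIndex: skip, // zero-based position of the first result
    lastIndex: skip + limit - 1, // zero-based position of the last result
  };
}

// The skipped count grows linearly with the page number.
for (const page of [1, 2, 1000]) {
  console.log(`page ${page}:`, offsetWindow(page, 10));
}
```

With a limit of 10, page 2 covers records 10-19 after skipping 10 documents, while page 1000 has to skip 9,990 documents to return the same 10.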
Cursor-based pagination, also known as keyset pagination, relies on a unique identifier (e.g., an ID or timestamp) to paginate through the records. Instead of skipping a certain number of records, it uses the last retrieved record as the reference point for fetching the next set.
This approach is more efficient because it avoids the need to scan the records before the current page. As long as the field used as the cursor is indexed, the database can jump straight to the reference point, so the query time remains consistent regardless of how deep into the dataset you go.
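One caveat when the reference field is a timestamp: several records can share the same value, so a plain "greater than" comparison may skip some of them. A common refinement is to add a unique tie-breaker. The sketch below assumes hypothetical field names `createdAt` and `_id`; adapt them to your own schema.

```javascript
// Hypothetical field names: createdAt (may repeat) and _id (unique tie-breaker).
// `last` is the final document of the previous page, or null for the first page.
function buildKeysetFilter(last) {
  if (last == null) return {};
  return {
    $or: [
      // Strictly later timestamps...
      { createdAt: { $gt: last.createdAt } },
      // ...or the same timestamp with a larger unique id.
      { createdAt: last.createdAt, _id: { $gt: last._id } },
    ],
  };
}

// The sort order must match the comparison order used in the filter.
const keysetSort = { createdAt: 1, _id: 1 };

console.log(JSON.stringify(buildKeysetFilter({ createdAt: 1700000000000, _id: 42 })));
```

The resulting filter would be passed as `collection.find(buildKeysetFilter(last)).sort(keysetSort).limit(limit)`, and it would normally be paired with a compound index on `{ createdAt: 1, _id: 1 }`.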
Here's the code for cursor-based pagination:
    async function cursor_based_pagination(params) {
      const { lastDocumentId, limit = 100 } = params;
      // Compare against null/undefined so that a lastDocumentId of 0 is still treated as a cursor
      const query = lastDocumentId != null ? { documentId: { $gt: lastDocumentId } } : {};
      const results = await collection
        .find(query)
        .sort({ documentId: 1 })
        .limit(limit)
        .toArray();
      console.log("Cursor-based pagination:", results.length);
    }
In this example, lastDocumentId is the ID of the last document from the previous page. When querying for the next page, the database fetches documents with an ID greater than this value, ensuring a seamless transition to the next set of records.
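The hand-off between pages can be sketched without a database. In the example below, `fakeFind` is a hypothetical in-memory stand-in for `collection.find(filter).sort({ documentId: 1 }).limit(limit).toArray()`, and `nextPageFilter`, `cursorFrom`, and `walkAllPages` are illustrative helper names rather than driver functions.

```javascript
// Build the filter for the page that follows `lastDocumentId`.
// The `!= null` check keeps 0 usable as a cursor value.
function nextPageFilter(lastDocumentId) {
  return lastDocumentId != null ? { documentId: { $gt: lastDocumentId } } : {};
}

// Read the cursor for the following request off the last document of a page.
function cursorFrom(results) {
  return results.length > 0 ? results[results.length - 1].documentId : null;
}

// In-memory stand-in for the MongoDB query; it understands only the
// { documentId: { $gt } } filter shape used in this sketch.
function fakeFind(docs, filter, limit) {
  const min = filter.documentId ? filter.documentId.$gt : -Infinity;
  return docs
    .filter((d) => d.documentId > min)
    .sort((a, b) => a.documentId - b.documentId)
    .slice(0, limit);
}

// Walk every page, feeding each page's last id into the next request.
function walkAllPages(docs, limit) {
  const pages = [];
  let cursor = null;
  for (;;) {
    const page = fakeFind(docs, nextPageFilter(cursor), limit);
    if (page.length === 0) break;
    pages.push(page.map((d) => d.documentId));
    cursor = cursorFrom(page);
  }
  return pages;
}

const docs = Array.from({ length: 7 }, (_, i) => ({ documentId: i }));
console.log(walkAllPages(docs, 3)); // [ [ 0, 1, 2 ], [ 3, 4, 5 ], [ 6 ] ]
```

Each request carries only the last id it saw, so no page depends on how many records came before it.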
Let’s see how these two methods perform with a large dataset.
    async function testMongoDB() {
      console.time("MongoDB Insert Time:");
      await insertMongoDBRecords();
      console.timeEnd("MongoDB Insert Time:");

      // Create an index on the documentId field
      await collection.createIndex({ documentId: 1 });
      console.log("Index created on documentId field");

      console.time("Offset-based pagination Time:");
      await offset_based_pagination({ page: 2, limit: 250000 });
      console.timeEnd("Offset-based pagination Time:");

      console.time("Cursor-based pagination Time:");
      await cursor_based_pagination({ lastDocumentId: 170000, limit: 250000 });
      console.timeEnd("Cursor-based pagination Time:");

      await client.close();
    }
In the performance test, you should see offset-based pagination take longer, and the gap widens as the page number increases because every skipped document still has to be walked. Cursor-based pagination stays roughly constant, which makes it the better choice for large datasets. The example also shows how much indexing matters: try dropping the index and running the test again to compare the results.
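A single measurement per method shows a difference but not a trend. One way to see how timings change with depth is to run the same fetch at several depths and collect the results. This is only a sketch: `timeMs` and `measureDepths` are hypothetical helper names, and `fetchPage` is whatever function you supply to load one page.

```javascript
// Time one async call and return the elapsed milliseconds.
// Uses the global `performance` object available in recent Node.js versions.
async function timeMs(fn) {
  const start = performance.now();
  await fn();
  return performance.now() - start;
}

// Run `fetchPage(depth)` once per depth, sequentially, and collect the timings.
async function measureDepths(depths, fetchPage) {
  const rows = [];
  for (const depth of depths) {
    const ms = await timeMs(() => fetchPage(depth));
    rows.push({ depth, ms });
  }
  return rows;
}
```

For the offset-based method, `fetchPage` could be something like `(page) => collection.find({}).skip((page - 1) * 100).limit(100).toArray()`, called with depths such as `[1, 100, 1000, 10000]`. Printing the rows with `console.table` makes the trend easy to read. Timings vary between runs and machines, so repeat each measurement a few times before drawing conclusions.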
Without an index, MongoDB would need to perform a collection scan, which means it has to look at each document in the collection to find the relevant data. This is inefficient, especially as your dataset grows. Indexes allow MongoDB to efficiently find the documents that match your query conditions, significantly speeding up query performance.
In the context of cursor-based pagination, the index ensures that fetching the next set of documents (based on documentId) is quick and does not degrade in performance as more documents are added to the collection.
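To check whether a query actually uses the index, you can inspect its explain output and look at the stage names in the winning plan: `IXSCAN` indicates an index scan, `COLLSCAN` a full collection scan. The sketch below walks a plan object and collects the stage names. The two sample plans are simplified, hand-written shapes for illustration only; real explain output contains many more fields and varies between server versions.

```javascript
// Collect every `stage` name found in a (possibly nested) explain plan node.
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectStages(item, found);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    for (const value of Object.values(node)) collectStages(value, found);
  }
  return found;
}

// Simplified sample plans (assumed shapes, not captured from a real server).
const indexedPlan = {
  queryPlanner: {
    winningPlan: {
      stage: "LIMIT",
      inputStage: {
        stage: "FETCH",
        inputStage: { stage: "IXSCAN", indexName: "documentId_1" },
      },
    },
  },
};
const unindexedPlan = {
  queryPlanner: {
    winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } },
  },
};

console.log(collectStages(indexedPlan.queryPlanner.winningPlan)); // [ 'LIMIT', 'FETCH', 'IXSCAN' ]
console.log(collectStages(unindexedPlan.queryPlanner.winningPlan)); // [ 'SORT', 'COLLSCAN' ]
```

Against a live collection, the plan would come from something like `await collection.find(query).sort({ documentId: 1 }).limit(limit).explain("executionStats")`. The `executionStats` section of that output also reports how many documents and index keys were examined, which is a direct measure of the work done.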
While offset-based pagination is easy to implement, it can become inefficient with large datasets due to the need to scan through records. Cursor-based pagination, on the other hand, provides a more scalable solution, keeping performance consistent regardless of the dataset size. If you are working with large collections in MongoDB, it’s worth considering cursor-based pagination for a smoother and faster experience.
Here's the complete code:

    const { MongoClient } = require("mongodb");

    const uri = "mongodb://localhost:27017";
    const client = new MongoClient(uri);
    const db = client.db("testdb");
    const collection = db.collection("testCollection");

    async function insertMongoDBRecords() {
      try {
        let bulkOps = [];
        for (let i = 0; i < 2000000; i++) {
          // bulkWrite expects each insertOne operation to wrap its data in `document`
          bulkOps.push({
            insertOne: {
              document: {
                documentId: i,
                name: `Record-${i}`,
                value: Math.random() * 1000,
              },
            },
          });
          // Execute every 10000 operations and reinitialize
          if (bulkOps.length === 10000) {
            await collection.bulkWrite(bulkOps);
            bulkOps = [];
          }
        }
        // Flush whatever is left over from the last partial batch
        if (bulkOps.length > 0) {
          await collection.bulkWrite(bulkOps);
          console.log("Inserted remaining records -> ", bulkOps.length);
        }
        console.log("MongoDB Insertion Completed");
      } catch (err) {
        console.error("Error in inserting records", err);
      }
    }

    async function offset_based_pagination(params) {
      const { page = 5, limit = 100 } = params;
      const skip = (page - 1) * limit;
      const results = await collection.find({}).skip(skip).limit(limit).toArray();
      console.log(`Offset-based pagination (Page ${page}):`, results.length, "page", page, "skip", skip, "limit", limit);
    }

    async function cursor_based_pagination(params) {
      const { lastDocumentId, limit = 100 } = params;
      // Compare against null/undefined so that a lastDocumentId of 0 is still treated as a cursor
      const query = lastDocumentId != null ? { documentId: { $gt: lastDocumentId } } : {};
      const results = await collection
        .find(query)
        .sort({ documentId: 1 })
        .limit(limit)
        .toArray();
      console.log("Cursor-based pagination:", results.length);
    }

    async function testMongoDB() {
      // Wait for the connection before issuing any operations
      await client.connect();

      console.time("MongoDB Insert Time:");
      await insertMongoDBRecords();
      console.timeEnd("MongoDB Insert Time:");

      // Create an index on the documentId field
      await collection.createIndex({ documentId: 1 });
      console.log("Index created on documentId field");

      console.time("Offset-based pagination Time:");
      await offset_based_pagination({ page: 2, limit: 250000 });
      console.timeEnd("Offset-based pagination Time:");

      console.time("Cursor-based pagination Time:");
      await cursor_based_pagination({ lastDocumentId: 170000, limit: 250000 });
      console.timeEnd("Cursor-based pagination Time:");

      await client.close();
    }

    testMongoDB().catch(console.error);