mongodb - pymongo count is slow

Question

I have 30,000 documents, each containing only a random number: {"digit": random number}. Requirement: find the most frequent number in the collection table. {code...} One run takes five or six minutes. Running it with 100 threads is not much faster, and the fan gets very loud... What is the right approach?

迷茫 · Answer

The right approach is to use aggregation.

db.table.aggregate([
    {$group: {_id: "$digit", count: {$sum: 1}}},    // count how many times each number occurs
    {$sort: {count: -1}},    // sort by count, descending
    {$limit: 1}    // keep only the first record
]);

For how to use $group, refer to the documentation.
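Since the question is about pymongo, here is a minimal sketch of the same pipeline called from Python. The helper name most_frequent_digit is made up for illustration, and the collection argument is assumed to be a pymongo Collection (anything with an aggregate method would do).

```python
from typing import Any, Optional

# The same three stages as the shell pipeline, written as Python dicts.
PIPELINE = [
    {"$group": {"_id": "$digit", "count": {"$sum": 1}}},  # count occurrences of each number
    {"$sort": {"count": -1}},                              # most frequent first
    {"$limit": 1},                                         # keep only the top record
]


def most_frequent_digit(collection: Any) -> Optional[dict]:
    """Return {"_id": <number>, "count": <n>} for the most frequent number.

    Returns None when the collection is empty. `collection` is assumed to be
    a pymongo Collection; the counting itself runs on the server.
    """
    return next(iter(collection.aggregate(PIPELINE)), None)


# Hypothetical usage; the connection string and the database name "test"
# are assumptions, not taken from the question:
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["test"]["table"]
#   print(most_frequent_digit(coll))
```

This needs only one round trip to the server, unlike issuing a separate count query for each distinct number.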
Note that a requirement like this is unlikely to come up in practice; I assume it is a practice exercise. Even with aggregation, finding the most frequent number still requires traversing every document in the collection. So when the collection holds a large number of records, this kind of full scan cannot be fast. Queries like this usually appear only in OLAP scenarios, where speed requirements are typically not strict. So in theory the aggregation framework is the right tool, but a real requirement would still need to be analyzed case by case.
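To make the full-traversal point concrete, here is a rough client-side sketch of the same computation in plain Python. It is not a replacement for the aggregation pipeline, only an illustration that every document has to be visited once, so the cost grows linearly with the collection size. The function name and the sample data are made up for this example.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional, Tuple


def most_frequent_client_side(docs: Iterable[Mapping]) -> Optional[Tuple[object, int]]:
    """One pass over all documents; returns (number, count), or None if there are none."""
    counts = Counter(doc["digit"] for doc in docs)  # visits every document exactly once
    top = counts.most_common(1)
    return top[0] if top else None


# Stand-in data. With pymongo the documents could come from
# collection.find({}, {"digit": 1, "_id": 0}) instead (an assumption).
sample = [{"digit": 5}, {"digit": 2}, {"digit": 5}, {"digit": 9}]
print(most_frequent_client_side(sample))  # (5, 2)
```

For 30,000 small documents either approach is a single linear scan. The server-side pipeline is still preferable because it avoids sending every document over the network.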