
How to use MongoDB indexes

小云云 | Original | 2017-12-01 11:38:25 | 2521 views

This article explains in detail how to use MongoDB indexes. An index is like the table of contents of a book. Without a table of contents, finding a particular topic means paging through the whole book, which is very inefficient. With one, you can jump straight to the section you need, and lookups become dramatically faster.

Introduction to Index

First, open a terminal and run mongo. By default, the shell connects to the database named test.

➜ ~ mongo

MongoDB shell version: 2.4.9
connecting to: test
> show collections
>

Running show collections (or show tables) confirms that the database is empty.

Then run the following code in the mongo shell:

> for(var i=0;i<100000;i++) {
... db.users.insert({username:'user'+i})
... }
> show collections
system.indexes
users
>

Checking the database again shows two collections: system.indexes and users. The former stores the metadata of the indexes in this database, and the latter is the newly created collection.
The users collection now holds 100,000 documents.

> db.users.find()
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e4"), "username" : "user0" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e5"), "username" : "user1" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e6"), "username" : "user2" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e7"), "username" : "user3" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e8"), "username" : "user4" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e9"), "username" : "user5" }

Now let's look up a single document, for example:

> db.users.find({username: 'user1234'})
{ "_id" : ObjectId("5694d5db8fad9e319c5b48b6"), "username" : "user1234" }

The document is found, but to see how the query was executed we need to append the explain() method:

   
> db.users.find({username: 'user1234'}).explain()
{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 1,
  "nscannedObjects" : 100000,
  "nscanned" : 100000,
  "nscannedObjectsAllPlans" : 100000,
  "nscannedAllPlans" : 100000,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 30,
  "indexBounds" : {
      
  },
  "server" : "root:27017"
}

The output has many fields. For now we only care about two of them: "nscanned" : 100000 and "millis" : 30.

nscanned is the total number of documents MongoDB scanned while running this query. Here every document in the collection was scanned, and the query took 30 milliseconds in total.

With 10 million documents, every such query would still traverse the whole collection, and the time spent would be considerable.

For queries like this, an index is a very good solution.

> db.users.ensureIndex({"username": 1})

Then search for user1234 again:

> db.users.find({username: 'user1234'}).explain()
{
  "cursor" : "BtreeCursor username_1",
  "isMultiKey" : false,
  "n" : 1,
  "nscannedObjects" : 1,
  "nscanned" : 1,
  "nscannedObjectsAllPlans" : 1,
  "nscannedAllPlans" : 1,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 0,
  "indexBounds" : {
    "username" : [
      [
        "user1234",
        "user1234"
      ]
    ]
  },
  "server" : "root:27017"
}

The difference is striking. The query completes almost instantly because the index lets MongoDB examine a single document instead of 100,000.
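To make the contrast between the two explain() outputs concrete, here is a minimal sketch in plain JavaScript. It is only a toy model and not how MongoDB is implemented: the names collectionScan, usernameIndex and indexLookup are hypothetical, and the "index" is simply a sorted array searched with binary search. The comparison count it reports is also not the same thing as nscanned; it only shows why a sorted structure avoids touching every document.

```javascript
// Toy model only: this is NOT MongoDB's storage engine.
// Build 100,000 documents shaped like the users collection above.
const docs = [];
for (let i = 0; i < 100000; i++) {
  docs.push({ _id: i, username: "user" + i });
}

// Without an index: every document has to be examined.
function collectionScan(username) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined++;
    if (doc.username === username) matches.push(doc);
  }
  return { matches, examined };
}

// A toy "index": [key, position] pairs kept sorted by key.
const usernameIndex = docs
  .map((doc, pos) => [doc.username, pos])
  .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));

// With the index: binary search over the sorted keys
// (usernames are unique here, so at most one match exists).
function indexLookup(username) {
  let lo = 0;
  let hi = usernameIndex.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    const key = usernameIndex[mid][0];
    if (key === username) {
      return { matches: [docs[usernameIndex[mid][1]]], examined };
    }
    if (key < username) lo = mid + 1;
    else hi = mid - 1;
  }
  return { matches: [], examined };
}

const scan = collectionScan("user1234");
const lookup = indexLookup("user1234");
console.log(scan.examined);   // 100000
console.log(lookup.examined); // never more than 17 for 100,000 keys
```

Both functions return the same document; only the amount of work differs.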

Of course, indexes come at a cost: every index you add makes each write operation (insert, update, delete) slower. When data changes, MongoDB must update not only the document but also every index on the collection. For this reason, MongoDB limits each collection to a maximum of 64 indexes. As a rule of thumb, keep the number of indexes on any one collection small, typically no more than a couple.
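The write cost can be illustrated with a small sketch. This is a toy model under stated assumptions, not MongoDB code: ToyCollection and its indexWrites counter are hypothetical names, and each index is modelled as a plain Map from key value to document ids.

```javascript
// Toy model: every insert must also update each index on the collection.
class ToyCollection {
  constructor(indexedFields) {
    this.docs = new Map();
    // One Map per indexed field: key value -> list of document ids.
    this.indexes = new Map(indexedFields.map((field) => [field, new Map()]));
    this.indexWrites = 0; // counts index updates caused by inserts
  }

  insert(doc) {
    this.docs.set(doc._id, doc);
    for (const [field, index] of this.indexes) {
      const key = doc[field];
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
      this.indexWrites++;
    }
  }
}

const oneIndex = new ToyCollection(["username"]);
const threeIndexes = new ToyCollection(["username", "age", "email"]);

for (let i = 0; i < 1000; i++) {
  const doc = {
    _id: i,
    username: "user" + i,
    age: 20 + (i % 10),
    email: `u${i}@example.com`,
  };
  oneIndex.insert(doc);
  threeIndexes.insert(doc);
}

console.log(oneIndex.indexWrites);     // 1000
console.log(threeIndexes.indexWrites); // 3000
```

The same 1,000 inserts cause three times as many index updates when three fields are indexed, which is the overhead the paragraph above describes.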

Tips

If a query is very common, or it has become a performance bottleneck, creating an index on the relevant field (such as username) is a good choice. If the field is only used in queries run by administrators, who do not care much about how long a query takes, it should not be indexed.

Compound index

Index entries are stored in sorted order, so sorting documents by an indexed key is very fast.

db.users.find().sort({'age': 1, 'username': 1})

This sorts by age first and then by username, so an index on username alone does not help much. To optimize this sort, create an index on both age and username.

db.users.ensureIndex({'age':1, 'username': 1})

This creates a compound index (an index built on multiple fields). Such an index is very useful when the query conditions involve multiple keys.

Once the compound index is built, each index entry contains an age value and a username value and points to the location of the document on disk.
Entries are ordered strictly by age in ascending order, and entries with the same age are ordered by username in ascending order.
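The ordering of entries in an {age: 1, username: 1} index can be sketched with an ordinary comparator. The sample users and the names compareAgeUsername and compoundEntries are made up for illustration; the pos field stands in for the pointer to the document on disk.

```javascript
// Hypothetical sample data, used only to show the ordering.
const sampleUsers = [
  { username: "carol", age: 30 },
  { username: "bob", age: 21 },
  { username: "alice", age: 30 },
  { username: "dave", age: 21 },
];

// Compare by age first; fall back to username only when ages are equal.
function compareAgeUsername(a, b) {
  if (a.age !== b.age) return a.age - b.age;
  if (a.username < b.username) return -1;
  if (a.username > b.username) return 1;
  return 0;
}

// Each entry holds both keys plus a pointer (here: the array position)
// back to the full document.
const compoundEntries = sampleUsers
  .map((user, pos) => ({ age: user.age, username: user.username, pos }))
  .sort(compareAgeUsername);

console.log(compoundEntries.map((e) => `${e.age}:${e.username}`));
// [ '21:bob', '21:dave', '30:alice', '30:carol' ]
```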

Query method

Point query

A point query looks for a single value (although several documents may contain that value):

db.users.find({'age': 21}).sort({'username': -1})

We have already created a compound index on age and username, with both keys in ascending order (the 1 in the index definition). Suppose the collection still holds 100,000 documents. A point query for {age: 21} will probably match many documents, because many users may be 21. sort({'username': -1}) then asks for those documents in descending username order. The index stores username in ascending order, but that is no obstacle: MongoDB starts at the last index entry with age 21 and walks backwards, which yields exactly the requested order.

The sort direction does not matter here, because MongoDB can traverse an index in either direction.
In short, a compound index is very efficient for point queries: MongoDB jumps straight to the matching age and returns the results without sorting them in memory.
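The backward traversal can be sketched as follows. This is an illustration under assumptions, not MongoDB internals: ageNameIndex is a hand-written array of [age, username] entries already in index order, and pointQueryDescending is a hypothetical helper. A real index would seek to the end of the matching run instead of stepping to it linearly.

```javascript
// Entries of a toy {age: 1, username: 1} index, already in index order.
const ageNameIndex = [
  [20, "zed"],
  [21, "amy"],
  [21, "bob"],
  [21, "eve"],
  [22, "al"],
];

// Return usernames for one age in DESCENDING order by walking the
// ascending index backwards; no separate sort step is needed.
function pointQueryDescending(age) {
  // Step back to the last entry whose age is not greater than the target.
  let end = ageNameIndex.length - 1;
  while (end >= 0 && ageNameIndex[end][0] > age) end--;

  const result = [];
  for (let i = end; i >= 0 && ageNameIndex[i][0] === age; i--) {
    result.push(ageNameIndex[i][1]);
  }
  return result;
}

console.log(pointQueryDescending(21)); // [ 'eve', 'bob', 'amy' ]
```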

Multi-value query (multi-value-query)

db.users.find({'age': {"$gte": 21, "$lte": 30}})

This finds documents that match a range of values. A multi-value query can be thought of as several point queries.
Here the age must be between 21 and 30. MongoDB uses the first key of the index, "age", to find the matching documents, and the results are generally returned in index order.

db.users.find({'age': {"$gte": 21, "$lte": 30}}).sort({'username': 1})

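This is similar to the previous query, but this time the results must be sorted.
Without sort, the results come back in index order: age 21 first, then age 22, and so on, and documents with the same age are ordered by username (0-9, A-Z). With sort({'username': 1}), all results must be ordered by username, so MongoDB has to sort them in memory before returning them. This is less efficient than the previous query.

Of course, when there are only a few documents, the sort takes very little time.
If the result set is large, for example over 32 MB, MongoDB refuses to sort that much data.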
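A small sketch shows why a range query on an {age: 1, username: 1} index cannot return usernames in order without an extra sort. The data and the names rangeIndex and rangeScan are hypothetical, and the array stands in for the index.

```javascript
// Toy {age: 1, username: 1} index entries, in index order.
const rangeIndex = [
  [21, "mia"],
  [21, "zoe"],
  [22, "adam"],
  [25, "bea"],
  [30, "carl"],
  [31, "dan"],
];

// Range scan on the first key: results come back in index order,
// that is, by age first and by username only within the same age.
function rangeScan(minAge, maxAge) {
  return rangeIndex
    .filter(([age]) => age >= minAge && age <= maxAge)
    .map(([, username]) => username);
}

const inIndexOrder = rangeScan(21, 30);
console.log(inIndexOrder); // [ 'mia', 'zoe', 'adam', 'bea', 'carl' ]

// Usernames are not globally ordered across ages, so a sort by
// username has to be done separately, in memory.
const sortedByUsername = [...inIndexOrder].sort();
console.log(sortedByUsername); // [ 'adam', 'bea', 'carl', 'mia', 'zoe' ]
```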

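There is another solution.

You can also create a second index, {'username': 1, 'age': 1}. With username as the first key, sorting by username needs no extra work because the index entries are already in that order. The price is that MongoDB must scan the whole index to find the users whose age matches, so the search takes longer.

Which one is more efficient?

If several indexes exist, how do you choose between them?
It depends. With no limit on the result size, the second index avoids the sort but has to scan the entire collection, which takes far longer than the first approach. When only part of the data is returned (for example with limit(1000)), there is a new winner.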

   
> db.users.find({'age': {"$gte": 21, "$lte": 30}}).
sort({'username': 1}).
limit(1000).
hint({'age': 1, 'username': 1}).
explain()['millis']
2031ms

> db.users.find({'age': {"$gte": 21, "$lte": 30}}).
sort({'username': 1}).
limit(1000).
hint({'username': 1, 'age': 1}).
explain()['millis']
181ms

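hint lets you specify which index to use.
So this approach has a clear advantage in such cases. In a typical scenario we do not fetch all the data, only the most recent part, so this index is the more efficient choice.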
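The trade-off between the two indexes under a limit can be sketched in plain JavaScript. Everything here is an assumption made for illustration: the data set is synthetic (100,000 users with ages cycling from 18 to 57), and viaAgeFirst and viaUsernameFirst are hypothetical functions that only mimic the two access patterns. They do not reproduce the timings shown above.

```javascript
// Synthetic data: 100,000 users, ages cycling 18..57.
const people = [];
for (let i = 0; i < 100000; i++) {
  people.push({ username: "user" + String(i).padStart(6, "0"), age: 18 + (i % 40) });
}

const byUsername = (a, b) =>
  a.username < b.username ? -1 : a.username > b.username ? 1 : 0;
const inAgeRange = (p) => p.age >= 21 && p.age <= 30;

// Pattern of {age: 1, username: 1}: fetch every match in the age range,
// sort ALL of them by username in memory, then keep the first `limit`.
function viaAgeFirst(limit) {
  const matches = people.filter(inAgeRange);
  matches.sort(byUsername);
  return { results: matches.slice(0, limit), sortedInMemory: matches.length };
}

// Pattern of {username: 1, age: 1}: walk entries in username order,
// keep the ones whose age matches, and stop as soon as `limit` is reached.
const usernameOrder = [...people].sort(byUsername); // stands in for the index
function viaUsernameFirst(limit) {
  const results = [];
  let examined = 0;
  for (const p of usernameOrder) {
    examined++;
    if (inAgeRange(p)) results.push(p);
    if (results.length === limit) break;
  }
  return { results, examined, sortedInMemory: 0 };
}

const ageFirst = viaAgeFirst(1000);
const usernameFirst = viaUsernameFirst(1000);
console.log(ageFirst.sortedInMemory);  // 25000 documents sorted in memory
console.log(usernameFirst.examined);   // a few thousand entries walked, no sort
```

Both strategies return the same 1,000 users. The second one can stop early because its entries are already in the requested order, which is why it wins once a limit is applied.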

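Index types

Unique index

A unique index ensures that the specified key has a unique value in every document of the collection.

db.users.ensureIndex({'username': 1}, {'unique': true})

For example, with the mongoose framework you can specify unique: true when defining the schema.
If you insert two documents that both have the username Zhang San, the second insert fails. The index on _id is itself a unique index, and it cannot be dropped.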
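The behaviour of a unique index can be sketched with a Map. ToyUniqueUsernameIndex is a hypothetical class and the error message is made up; the real server reports a duplicate key error of its own.

```javascript
// Toy model of a unique index on one field.
class ToyUniqueUsernameIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Map(); // key value -> document
  }

  insert(doc) {
    const key = doc[this.field];
    if (this.keys.has(key)) {
      // Reject the write instead of storing a second document with this key.
      throw new Error(`duplicate key for ${this.field}: ${key}`);
    }
    this.keys.set(key, doc);
  }
}

const uniqueUsernames = new ToyUniqueUsernameIndex("username");
uniqueUsernames.insert({ username: "Zhang San", age: 21 });

let secondInsertFailed = false;
try {
  uniqueUsernames.insert({ username: "Zhang San", age: 35 });
} catch (err) {
  secondInsertFailed = true;
}
console.log(secondInsertFailed); // true
```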

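Sparse index

Use the sparse option to create a sparse index. A sparse index only contains entries for documents that have the indexed field.

> db.users.ensureIndex({'email': 1}, {'unique': true, 'sparse': true})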
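Combining unique with sparse matters because a non-sparse index treats a missing field as null, so a unique index would allow only one document without an email. The sketch below models that difference. ToyEmailIndex is a hypothetical class that captures only this one aspect of the behaviour.

```javascript
// Toy model of a unique index that can optionally be sparse.
class ToyEmailIndex {
  constructor(field, { sparse = false } = {}) {
    this.field = field;
    this.sparse = sparse;
    this.keys = new Set();
  }

  // Returns true if the document was added to the index,
  // false if a sparse index skipped it.
  insert(doc) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, this.field);
    if (!hasField && this.sparse) return false; // not indexed at all
    const key = hasField ? doc[this.field] : null; // missing field counts as null
    if (this.keys.has(key)) throw new Error("duplicate key: " + key);
    this.keys.add(key);
    return true;
  }
}

const sparseUnique = new ToyEmailIndex("email", { sparse: true });
sparseUnique.insert({ username: "a" }); // skipped, no email
sparseUnique.insert({ username: "b" }); // skipped as well, no conflict
sparseUnique.insert({ username: "c", email: "c@example.com" });

const plainUnique = new ToyEmailIndex("email");
plainUnique.insert({ username: "a" }); // indexed under null
let plainRejected = false;
try {
  plainUnique.insert({ username: "b" }); // a second null key is a duplicate
} catch (err) {
  plainRejected = true;
}
console.log(sparseUnique.keys.size, plainRejected); // 1 true
```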

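Index management

The system.indexes collection contains detailed information about every index:

db.system.indexes.find()

1. ensureIndex(): create an index

db.users.ensureIndex({'username': 1})

To build an index in the background, so that the database can keep serving reads and writes while the index is being built, specify the background option:

db.test.ensureIndex({"username":1},{"background":true})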

2. getIndexes(): view indexes

db.collectionName.getIndexes()
db.users.getIndexes()
[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "ns" : "test.users",
    "name" : "_id_"
  },
  {
    "v" : 1,
    "key" : {
      "username" : 1
    },
    "ns" : "test.users",
    "name" : "username_1"
  }
]

The v field is used internally only; it identifies the index version.

3. dropIndex(): drop an index

> db.users.dropIndex("username_1")
{ "nIndexesWas" : 2, "ok" : 1 }

or

> db.users.dropIndex({"username":1})

That concludes this detailed look at using MongoDB indexes. I hope it helps.
