Home >Database >Mysql Tutorial >Analyze count(*) in MySQL. Is it really faster than count(1)?
Is count(*) in MySQL really faster than count(1)? The following article will compare count(*) and count(1) in MySQL to see their performance differences. I hope it will be helpful to you!
Someone told me today that count(1)
is faster than count(*)
in MySQL. Can I tolerate this? You have to have a break with him.
Statement: The following discussion is based on the InnoDB storage engine. I will talk about MyISAM separately at the end of the article because of the special situation. [Related recommendations: mysql video tutorial]
Let me talk about the conclusion first: There is not much difference in performance between the two.
I prepared a table with 1 million pieces of data. The table structure is as follows:
CREATE TABLE `user` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `username` varchar(255) DEFAULT NULL, `address` varchar(255) DEFAULT NULL, `password` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
As you can see, there is a primary key index.
Let’s count the number of records in the table in two ways, as follows:
It can be seen that the execution efficiency of the two SQLs is actually about the same. Both are 0.14s.
Let’s look at the other two statistics:
id is the primary key, username and address are ordinary fields.
It can be seen that using id for statistics has some advantages. Brother Song, because the test data sample is relatively small, the effect is not obvious. Friends can increase the amount of test data, then the difference will be more obvious.
So what exactly is the reason for this difference? Let’s briefly analyze it next.
Let’s first use explain to look at the different execution plans of these SQLs:
You can see , the execution plans of the first three statistical methods are the same, and the last two are the same.
I will compare the different items in the explain with you here:
Through explain, we can actually roughly see that the execution efficiency of the first three statistical methods is higher (because of the use of indexes), while the statistical efficiency of the latter two is relatively low. It is said to be lower (no index is used, full table scan is required).
The above analysis alone is not enough, let’s analyze it from a principle perspective.
3.1 Primary key index and ordinary index
Before starting the principle analysis, I would like to first lead Let's take a look at the B-tree, which plays an important role in understanding the following content.
As we all know, the storage structure of indexes in InnoDB is a B-tree (as for what a B-tree is and what is the difference between it and a B-tree, this article will not discuss this. Both of them can be combined separately. article), the storage of primary key index and ordinary index is different. The following figure shows the primary key index:
As you can see, in the primary key index, the leaf nodes are stored data for each row.
In an ordinary index, the leaf node stores the primary key value. When we use an ordinary index to search for data, we first find the primary key in the leaf node, and then take the primary key to the primary key index. Searching for data is equivalent to doing two searches, which is what we usually call the table return operation.
3.2 Principle Analysis
I don’t know if my friends have noticed that when we were learning MySQL, the count function was classified as an aggregate function. That category is avg, sum, etc. The count function is grouped with these, indicating that it is also an aggregate function.
Since it is an aggregate function, it is necessary to judge the returned result set line by line. This involves a question, what is the returned result? Let’s look at it separately:
For the query select count(1) from user;
, the InnoDB engine will find a minimum index tree to traverse (not necessarily the primary key index) , but the data will not be read, but a leaf node will be read, 1 will be returned, and the results will be accumulated.
For the query select count(id) from user;
, the InnoDB engine will traverse the entire primary key index, then read the id and return it, but because the id is the primary key, it is on the leaf of the B tree on the node, so this process does not involve random IO (it does not require operations such as returning to the table to get data from the data page), and the performance is also OK.
For the query select count(username) from user;
, the InnoDB engine will traverse the entire table to perform a full table scan, read the username field of each row and return it. If username If not null is set during definition, then the number of usernames will be counted directly; if username is not set as not null when defined, then first determine whether username is empty and then count.
Finally, let’s talk about select count(*) from user;
. The special thing about this SQL is that it has been optimized by MySQL. When MySQL sees count(*)
If you want to count the total number of records, you will find a minimum index tree to traverse, and then count the number of records.
Because the leaf nodes of the primary key index (clustered index) are data, while the leaf nodes of the ordinary index are the primary key values, the index tree of the ordinary index is smaller. However, in the above case, we only have the primary key index, so the primary key index is ultimately used.
Now, if I modify the above table and add an index to the username field, then we will look at the execution plan of explain select count(*) from user;
:
As you can see, the index used at this time is the username index, which is consistent with our previous analysis results.
We can see from the above description that the first query has the highest performance, the second is second (because it needs to read the id and returns), and the third is the worst (because it requires a full table scan) ), the query performance of the fourth one is close to that of the first one.
Some friends may know that the select count(*) from user;
operation in the MyISAM engine is very fast. That is because MyISAM directly stores the number of rows in the table. Once it is in the disk, it can be read directly when needed, so it is very fast.
The reason why the MyISAM engine does this is mainly because it does not support transactions, so its statistics are actually very easy, just add a row of records.
But our commonly used InnoDB cannot do this! Why? Because InnoDB supports transactions! In order to support transactions, InnoDB introduces MVCC multi-version concurrency control, so there may be problems such as dirty reads, phantom reads, and non-repeatable reads when reading data.
For details, please refer to: https: //www.bilibili.com/video/BV14L4y1B7mB
So, InnoDB needs to take out each row of data and determine whether the row of data is visible to the current session. If it is visible, count the row of data. Otherwise, it will not be counted.
Of course, MVCC in MySQL is actually a very grand topic. Brother Song will introduce MVCC to you in detail when he is free in the future.
Okay, friends, do you understand now? If you have any questions, please leave a message for discussion.
For more programming related knowledge, please visit: Programming Video! !
The above is the detailed content of Analyze count(*) in MySQL. Is it really faster than count(1)?. For more information, please follow other related articles on the PHP Chinese website!