I'm just talking about the query here, since it's similar to a piece of SQL I used recently. In a table with nearly a million rows, I take an article id (docId), look up its words (the words come from segmenting the article title), and then find all related article IDs based on those words. The initial approach was: select docId from tab1 where word in (select word from tab1 where docId=123) group by docId limit 1000;
Indexes were created on each field, yet the query still took about 35 seconds. Later I rewrote it as a join, which got it down to around 50ms. After some discussion, the problem turned out to be the in clause, which kept the query from using the index. The optimized SQL is: select docId from (select word from tab1 where docId=123) as t2 join tab1 t on t.word=t2.word where t2.word is not null group by docId limit 1000
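For anyone who wants to reproduce the comparison, here is a rough sketch of the setup I'm describing (the table/column names tab1, docId, word are the ones above; the column types and index names are my own guesses), with EXPLAIN in front so the two plans can be compared:

-- assumed schema; types and index names are guesses
CREATE TABLE tab1 (
  docId INT NOT NULL,
  word  VARCHAR(64) NOT NULL,
  KEY idx_docid (docId),
  KEY idx_word (word)
);

-- slow version: subquery with in, about 35s on ~1M rows in my case
EXPLAIN
SELECT docId FROM tab1
WHERE word IN (SELECT word FROM tab1 WHERE docId = 123)
GROUP BY docId LIMIT 1000;

-- rewritten version: derived table + join, about 50ms
EXPLAIN
SELECT t.docId
FROM (SELECT word FROM tab1 WHERE docId = 123) AS t2
JOIN tab1 t ON t.word = t2.word
WHERE t2.word IS NOT NULL
GROUP BY t.docId LIMIT 1000;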
Having said all that, my point is just that querying with in doesn't seem to be a good idea~~~
Adding some preliminary test results:
select * from dynamics where uid in ({$uidStr}) order by created_at desc limit 10000

For the above query, the dynamics table has about 3 million records, and $uidStr contains 2000 uids.
mysql(5.5)
About 0.19s (with where in, the created_at index does not seem to be used; see the sketch after these results)
mongodb
$cursor = $dyCollection->find(array("uid" => array('$in' => $selectedUid)))->sort(array("created_at" => -1))->limit(10000);
After adding an index on created_at, it takes about 0.09s
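Back on the mysql side, here is the sketch mentioned above for digging into why created_at isn't used under where in. The index name idx_uid_created is hypothetical, and whether a composite (uid, created_at) index actually helps on 5.5 is only a guess I haven't verified:

-- look at the plan; "Using filesort" in the Extra column means the sort is not done via the created_at index
EXPLAIN
SELECT * FROM dynamics
WHERE uid IN (1, 2, 3)        -- stand-in for the 2000 uids in {$uidStr}
ORDER BY created_at DESC
LIMIT 10000;

-- one thing to try: a composite index so each uid's rows are already stored in created_at order
-- (with many uids in the in list, 5.5 may still have to merge and sort the per-uid ranges, so no guarantee)
CREATE INDEX idx_uid_created ON dynamics (uid, created_at);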
In addition, if documents are inserted into mongodb in chronological order, you don't actually need to sort at all: what you get back is in insertion order. With mysql where in, however, the results seem to come back sorted in ascending order of the in field.
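If a particular order actually matters, it's probably safer not to rely on either behaviour, since without an order by the result order isn't guaranteed. In mysql you can pin it down explicitly, for example with FIELD() if you want the order of the in list itself (just an illustration, not part of my test):

-- keep the rows in the same order as the in list, newest first within each uid
SELECT * FROM dynamics
WHERE uid IN (3, 1, 2)
ORDER BY FIELD(uid, 3, 1, 2), created_at DESC
LIMIT 10000;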
Also, when the limit is reduced to below 3000, mysql takes only about 0.04s and mongodb about 0.02s. I don't know how to optimize it any further.