I'm just talking about the query here, since it's similar to a piece of SQL I used recently. In a table with nearly a million rows, I take an article id (docId), look up its words (the words come from segmenting the article title), and then find all related article IDs based on those words. The initial approach was: select docId from tab1 where word in (select word from tab1 where docId=123) group by docId limit 1000;
Indexes were created on each field, yet the query still took about 35 seconds. Later I rewrote it as a join, which got it down to around 50ms. After some discussion, the problem turned out to be the in clause, which kept the query from using the index. The optimized SQL is: select docId from (select word from tab1 where docId=123) as t2 join tab1 t on t.word=t2.word where t2.word is not null group by docId limit 1000
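For anyone who wants to reproduce the comparison, here is a rough sketch of the setup I'm describing (the table/column names tab1, docId, word are the ones above; the column types and index names are my own guesses), with EXPLAIN in front so the two plans can be compared:

-- assumed schema; types and index names are guesses
CREATE TABLE tab1 (
  docId INT NOT NULL,
  word  VARCHAR(64) NOT NULL,
  KEY idx_docid (docId),
  KEY idx_word (word)
);

-- slow version: subquery with in, about 35s on ~1M rows in my case
EXPLAIN
SELECT docId FROM tab1
WHERE word IN (SELECT word FROM tab1 WHERE docId = 123)
GROUP BY docId LIMIT 1000;

-- rewritten version: derived table + join, about 50ms
EXPLAIN
SELECT t.docId
FROM (SELECT word FROM tab1 WHERE docId = 123) AS t2
JOIN tab1 t ON t.word = t2.word
WHERE t2.word IS NOT NULL
GROUP BY t.docId LIMIT 1000;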
Having said all that, my point is just that querying with in doesn't seem to be a good idea~~~
Adding some preliminary test results:
select * from dynamics where uid in ({$uidStr}) order by created_at desc limit 10000

For the above query, the dynamics table has about 3 million records, and $uidStr contains 2000 uids.
mysql(5.5)
About 0.19s (with where in, the created_at index does not seem to be used; see the sketch after these results)
mongodb
$cursor = $dyCollection->find(array("uid" => array('$in' => $selectedUid)))->sort(array("created_at" => -1))->limit(10000);
After adding an index on created_at, it takes about 0.09s
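Back on the mysql side, here is the sketch mentioned above for digging into why created_at isn't used under where in. The index name idx_uid_created is hypothetical, and whether a composite (uid, created_at) index actually helps on 5.5 is only a guess I haven't verified:

-- look at the plan; "Using filesort" in the Extra column means the sort is not done via the created_at index
EXPLAIN
SELECT * FROM dynamics
WHERE uid IN (1, 2, 3)        -- stand-in for the 2000 uids in {$uidStr}
ORDER BY created_at DESC
LIMIT 10000;

-- one thing to try: a composite index so each uid's rows are already stored in created_at order
-- (with many uids in the in list, 5.5 may still have to merge and sort the per-uid ranges, so no guarantee)
CREATE INDEX idx_uid_created ON dynamics (uid, created_at);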
In addition, if documents are inserted into mongodb in chronological order, you don't actually need to sort at all: what you get back is in insertion order. With mysql where in, however, the results seem to come back sorted in ascending order of the in field.
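If a particular order actually matters, it's probably safer not to rely on either behaviour, since without an order by the result order isn't guaranteed. In mysql you can pin it down explicitly, for example with FIELD() if you want the order of the in list itself (just an illustration, not part of my test):

-- keep the rows in the same order as the in list, newest first within each uid
SELECT * FROM dynamics
WHERE uid IN (3, 1, 2)
ORDER BY FIELD(uid, 3, 1, 2), created_at DESC
LIMIT 10000;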
Also, when the limit is reduced to below 3000, mysql takes only about 0.04s and mongodb about 0.02s. I don't know how to optimize it any further.