Composite full-text index in MySQL-PHP Chinese Network Q&A

I want the system to support searching messages by a specific user. Suppose there is the following table:

create table messages( user_id int, message nvarchar(500));

If I want to search all messages from user 1 that contain the word "foo", what index should I use here?

1. A simple, non-unique index on user_id

This filters down to the specific user's messages and then fully scans them for the word.

2. A full-text index on message

This finds matching messages across all users and then filters by ID, which seems inefficient when the number of users is large.

3. A composite index on user_id and message

Here a full-text index tree would be built separately for each user, so each could be searched on its own. During the query, the system would filter the messages by ID and then perform the text search only on the remaining rows in the index.

As far as I know, the last option is impossible. So I assume I should use the first option. Will it perform well if there are a few thousand users?

Wouldn't a full scan cost too many resources if each user had about 100 messages?

Maybe I could include the username in the message and use the BOOLEAN full-text search mode, but I think that would be slower than using the index on user_id.

reply all (2)

P粉421119778 2023-11-08 15:18:58 2 floor

You should add a full-text index on message and a regular index on user_id, and use the following query:

SELECT * FROM messages WHERE MATCH(message) AGAINST(@search_query) AND user_id = @user_id;

You are right, you cannot do option 3. But instead of trying to choose between 1 and 2, let MySQL do the work for you. MySQL will use only one of the two indexes and apply the second filter with a linear scan over the rows that index returns, but it will estimate the effectiveness of each index and choose the better one.
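As a rough sketch of this setup, the two indexes could be created as shown below, and EXPLAIN can be used to see which one the optimizer actually picks. The index names idx_user_id and ft_message and the sample values are made up for illustration; the table and the query are the ones from this thread.

```sql
-- Hypothetical index names; the messages table is the one from the question.
CREATE INDEX idx_user_id ON messages (user_id);
CREATE FULLTEXT INDEX ft_message ON messages (message);

-- Sample values for the placeholders used in the query above.
SET @user_id = 1;
SET @search_query = 'foo';

-- The "key" column of the EXPLAIN output names the index the optimizer chose.
EXPLAIN
SELECT *
FROM messages
WHERE MATCH(message) AGAINST(@search_query)
  AND user_id = @user_id;
```

Running the EXPLAIN against realistic data volumes is the easiest way to check whether the choice matches what you expect.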

NOTE: Only do this if you can afford the overhead of two indexes (slower inserts/updates/deletes). Also, if you know that there will only be a few messages per user, it might make sense to use a simple index and do a regex or something similar in the application layer.
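A minimal sketch of that simpler alternative, assuming only a regular index on user_id exists: fetch the user's rows through the index and do the text check afterwards. The LIKE filter here is a stand-in for the application-layer check, not part of the original suggestion, and it matches substrings rather than whole words.

```sql
-- Assumes a regular (non-full-text) index on user_id; no FULLTEXT index is needed.
-- The user_id index narrows the rows; LIKE then scans only that user's messages.
SELECT message
FROM messages
WHERE user_id = 1
  AND message LIKE '%foo%';  -- substring match, so it also finds "food"
```

With around 100 messages per user, the scan after the index lookup touches only those rows.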


P粉076987386 2023-11-08 12:16:14 1 floor

@Alden Quimby's answer is correct as far as it goes, but there's more to the story: MySQL will only try to choose the best index, and its ability to make that decision is limited by the way the full-text index interacts with the optimizer.

What actually happens is this:

If the specified user_id is found in 0 or 1 rows of the table, the optimizer will be aware of this and choose the index on user_id for this query. Fast execution.

Otherwise, the optimizer will choose the full-text index and then filter every row it matches, eliminating rows whose user_id does not match the WHERE clause. Not so fast.

So this is not really the "best" path. It's more like full text, with a nice optimization that avoids doing a full text search when we know there's almost nothing of interest in the table.

The reason this occurs is that the full-text index does not provide any meaningful statistics to the optimizer. It just says "yeah, I think the query might only require me to check 1 row"... Of course, this pleases the optimizer greatly, so the full-text index wins the bid with the lowest cost, unless the index on the integer column has a comparably low or lower estimated cost.

That doesn't mean I wouldn't try this approach first, though.

There is another option, best suited for full-text queries IN BOOLEAN MODE, and that is to create another column that you populate with CONCAT('user_id_',user_id) or similar, and then declare a 2-column full-text index.

filter_string VARCHAR(48) # populated with CONCAT('user_id_',user_id);
....
FULLTEXT KEY (message,filter_string)

Then specify everything in the query.

SELECT ... WHERE user_id = 500 AND MATCH (message,filter_string) AGAINST ('+kittens +puppies +user_id_500' IN BOOLEAN MODE);

Now the full-text index is responsible for matching only those rows where kittens, puppies and "user_id_500" all appear in the combined full-text index of both columns. You still want the integer filter in there too, to ensure the final result is restricted to that user even if the string "user_id_500" happens to appear in someone else's message.
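For an existing table, the extra column could be set up roughly as follows. This is only a sketch: the index name ft_message_filter and the trigger name are hypothetical, and NVARCHAR is used for filter_string on the assumption that it has to match the message column, since all columns in one FULLTEXT index must share the same character set and collation.

```sql
-- Add the helper column with the same character set as message (NVARCHAR).
ALTER TABLE messages ADD COLUMN filter_string NVARCHAR(48);

-- Backfill existing rows with a token such as 'user_id_500'.
UPDATE messages SET filter_string = CONCAT('user_id_', user_id);

-- Two-column full-text index; MATCH() must list the same columns.
CREATE FULLTEXT INDEX ft_message_filter ON messages (message, filter_string);

-- One way to keep new rows populated (hypothetical trigger name).
CREATE TRIGGER messages_set_filter BEFORE INSERT ON messages
FOR EACH ROW SET NEW.filter_string = CONCAT('user_id_', NEW.user_id);
```

The underscore is treated as a word character by the built-in full-text parser, so 'user_id_500' is indexed as a single token. If user_id can change after insert, a matching BEFORE UPDATE trigger or an application-side update would be needed as well.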