Solution to generate unique database ID in distributed situation-Mysql Tutorial-php.cn

伊谢尔伦

Release: 2016-11-21 14:25:56


IDs serve as unique business identifiers and appear throughout data design, for example:

•Product - product_id

•Order - order_id

•Message - message_id

These identifiers are usually the primary keys of database tables. MySQL (with the InnoDB storage engine) builds a clustered index on the primary key, and that index leads directly to the row data. A secondary index, by contrast, points to the primary key, so a query through it needs an extra lookup in the clustered index; primary-key queries skip that step and are very fast. Businesses such as messaging and orders generally need to query data in reverse chronological order. One way is to index a time column; a better way is to rely on the insertion order of the ID itself. Therefore, a distributed ID needs to meet two core conditions:

• Globally unique

• Ordered by time, at least as a trend

Some people may ask: why not just use MySQL's auto_increment directly? In the early days of a startup, I would choose this solution too. It is simple, efficient, and fast. A startup has to iterate quickly and ship products as soon as possible, and the product changes frequently; an elaborate architecture that takes a long time to build may never pay off and only wastes valuable time. However, this solution has some problems:

• Blocks parallel inserts: if record B depends on record A's primary key, you must wait until A has been inserted and A.id is known before you can insert B

• Makes data recovery difficult: if data is accidentally deleted or lost, the log contains no ID, so the relationships between records cannot be determined directly

• Hinders database and table sharding: since the ID is not known until the row is inserted, you cannot shard by the business primary key
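A minimal sketch of the first problem, using Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made so the example runs anywhere; the tables and the create_order helper are hypothetical). Each order item needs the order's ID, which only exists after the order row has been inserted:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; AUTOINCREMENT plays the role of auto_increment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, user TEXT)")
conn.execute(
    "CREATE TABLE order_items ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, order_id INTEGER, sku TEXT)"
)


def create_order(user, skus):
    # Record A: its id is only known after the INSERT has executed.
    cur = conn.execute("INSERT INTO orders (user) VALUES (?)", (user,))
    order_id = cur.lastrowid
    # Records B depend on A.id, so they cannot be written before or alongside A.
    conn.executemany(
        "INSERT INTO order_items (order_id, sku) VALUES (?, ?)",
        [(order_id, sku) for sku in skus],
    )
    conn.commit()
    return order_id


first = create_order("alice", ["sku-1", "sku-2"])
second = create_order("bob", ["sku-3"])
print(first, second)  # 1 2
```

With an ID generated in the application instead, the order and its items could be prepared and written independently, and the ID would be available for logging and shard routing before the write.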

Therefore, once the business stabilizes, it is worth taking the time to pay off this early technical debt.

Common solutions

Using the database's auto_increment to generate unique IDs

Advantages

• Simple: uses existing database functionality, with little development effort

• Fixed ID step size

Disadvantages

• Single write point, so it is not highly available

• Scaling out to several master databases with different auto_increment starting points improves availability, but strict ID ordering can no longer be guaranteed

• Every ID requires a database access, so the database easily becomes a performance bottleneck
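The multi-master variant in the second bullet is typically configured in MySQL through the auto_increment_offset and auto_increment_increment system variables. The sketch below only simulates that behaviour in plain Python (it does not talk to MySQL, and MasterCounter is a hypothetical name), to show why IDs stay unique but lose their ordering:

```python
class MasterCounter:
    """Simulates one master's auto_increment with a starting offset and a step."""

    def __init__(self, offset, increment):
        self.next_id = offset
        self.increment = increment

    def allocate(self):
        value = self.next_id
        self.next_id += self.increment
        return value


# Two masters: one hands out odd IDs, the other even IDs.
m1 = MasterCounter(offset=1, increment=2)
m2 = MasterCounter(offset=2, increment=2)

# Master 1 receives three inserts before master 2 receives its first one.
issued = [m1.allocate(), m1.allocate(), m1.allocate(), m2.allocate()]
print(issued)  # [1, 3, 5, 2]: unique, but not in insertion order
```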

Pulling IDs in batches and allocating them one by one

This solution also stores the ID state in the database. Each time, the ID service pulls N IDs from the database and updates the stored maximum ID to its previous value + N. When the ID service receives a request for an ID, it returns these N IDs in sequence.
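A minimal in-process sketch of this batch ("segment") allocation, assuming a hypothetical FakeIdTable that stands in for the database row holding the current maximum ID. In MySQL the reservation would be, for example, an UPDATE that adds N to that row inside a transaction; here a lock simulates that atomic step:

```python
import threading


class FakeIdTable:
    """Hypothetical stand-in for the DB row storing the current max ID."""

    def __init__(self):
        self.max_id = 0
        self._lock = threading.Lock()

    def reserve(self, n):
        # Atomically bump max_id by n and return the reserved range [start, end].
        with self._lock:
            start = self.max_id + 1
            self.max_id += n
            return start, self.max_id


class BatchIdService:
    def __init__(self, table, batch_size):
        self.table = table
        self.batch_size = batch_size
        self._next = 1
        self._end = 0  # empty segment: forces a fetch on first use
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next > self._end:
                # Segment exhausted: one database round trip per batch_size IDs.
                self._next, self._end = self.table.reserve(self.batch_size)
            value = self._next
            self._next += 1
            return value


table = FakeIdTable()
service = BatchIdService(table, batch_size=3)
ids = [service.next_id() for _ in range(7)]
print(ids, table.max_id)  # [1, 2, 3, 4, 5, 6, 7] 9
```

If this service were restarted at this point, IDs 8 and 9 would never be handed out: a fresh instance reserves a new segment starting at 10. That is the source of the ID discontinuity listed among the disadvantages.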

Advantages

• IDs are fetched in batches, so the database is not accessed on every request and its load stays low

Disadvantages

• The service as a whole is still a single point of failure

• A service crash or restart leaves gaps in the ID sequence

• It cannot scale horizontally

Improvements

Add a backup service. When the primary service fails, traffic can fail over to the backup, for example through a virtual IP (VIP) managed by keepalived, or through a proxy.
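The keepalived and VIP setup is configuration and is not shown here. What the sketch below simulates, in plain Python, is the property that makes failover safe: the primary and the backup (both instances of a hypothetical IdService class) reserve their batches from the same database row, so switching between them can skip IDs but never repeats one. The get_id function is a simplified stand-in for the VIP or proxy:

```python
class IdTable:
    """Hypothetical shared DB row storing the current max ID."""

    def __init__(self):
        self.max_id = 0

    def reserve(self, n):
        start = self.max_id + 1
        self.max_id += n
        return start, self.max_id


class IdService:
    """Hypothetical batch ID service; `alive` simulates a crash."""

    def __init__(self, table, batch_size):
        self.table = table
        self.batch_size = batch_size
        self.cursor, self.end = 1, 0  # empty segment until the first fetch
        self.alive = True

    def next_id(self):
        if not self.alive:
            raise ConnectionError("ID service unavailable")
        if self.cursor > self.end:
            self.cursor, self.end = self.table.reserve(self.batch_size)
        value = self.cursor
        self.cursor += 1
        return value


def get_id(primary, backup):
    """Stand-in for the VIP/proxy: prefer the primary, fail over to the backup."""
    try:
        return primary.next_id()
    except ConnectionError:
        return backup.next_id()


table = IdTable()
primary, backup = IdService(table, 5), IdService(table, 5)
ids = [get_id(primary, backup) for _ in range(2)]
primary.alive = False  # simulate the primary going down
ids += [get_id(primary, backup) for _ in range(2)]
print(ids)  # [1, 2, 6, 7]: no duplicates, but 3-5 are lost
```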

UUID

Advantages

• IDs are generated locally, so there is no single point of failure and no performance bottleneck

Disadvantages

• Cannot guarantee increasing order

• Too long (128 bits, usually written as a 36-character string), which performs poorly as a primary key
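For comparison, a short sketch with Python's standard uuid module. It uses random (version 4) UUIDs, which is an assumption, since the text does not say which UUID version is meant. The CHAR(36) and BINARY(16) column types in the comments are common storage choices, not something the original prescribes:

```python
import uuid

# Version 4 (random) UUIDs, generated locally with no coordination.
ids = [uuid.uuid4() for _ in range(5)]

print(len(str(ids[0])))   # 36: canonical text form, e.g. stored as CHAR(36)
print(len(ids[0].bytes))  # 16: even as BINARY(16), twice the 8 bytes of a BIGINT
# No time component: the prefixes look random, and sorting the IDs
# generally does not reproduce the order in which they were generated.
print([str(u)[:8] for u in ids])
```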

Snowflake-like algorithm

Snowflake is Twitter's open-source distributed ID generation algorithm. Its core idea is to build a 64-bit long ID from 41 bits for a millisecond timestamp, 10 bits for the machine ID, and 12 bits for a sequence number within the millisecond (the remaining sign bit is left unused). In theory, a single machine can generate up to 1000 × 2^12 = 4,096,000 IDs, about 4 million, per second, which is more than enough for most business needs.
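A minimal single-process sketch of the layout just described, in Python. The custom epoch (2016-01-01 UTC), the class and function names, and the choice to raise an error when the clock moves backwards are assumptions of this sketch rather than details of Twitter's implementation:

```python
import threading
import time

# Layout: 1 unused sign bit | 41-bit ms timestamp | 10-bit machine ID | 12-bit sequence
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095
EPOCH_MS = 1_451_606_400_000              # assumed custom epoch: 2016-01-01T00:00:00Z


def now_ms():
    return time.time_ns() // 1_000_000


class Snowflake:
    def __init__(self, machine_id):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = now_ms()
            if now < self.last_ms:
                # Assumption: refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4096 sequence numbers used this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            elapsed = now - EPOCH_MS
            if elapsed >> TIMESTAMP_BITS:
                raise OverflowError("41-bit timestamp field exhausted")
            return (
                (elapsed << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )


def decode(snowflake_id):
    """Split an ID back into (unix_ms, machine_id, sequence)."""
    return (
        (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS,
        (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID,
        snowflake_id & MAX_SEQUENCE,
    )


gen = Snowflake(machine_id=7)
first, second = gen.next_id(), gen.next_id()
print(first < second, decode(second)[1])  # True 7
```

Because the timestamp occupies the high bits, IDs from one generator increase monotonically, and at most 4096 IDs fit into one millisecond, which is where the 1000 × 2^12 figure comes from.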

By borrowing Snowflake's ideas and adapting them to your company's business logic and concurrency requirements, you can implement your own distributed ID generation algorithm.

Advantages

• The timestamp occupies the high bits, so IDs trend upward over time

• Simple to implement, does not depend on other services, and is easy to scale

Disadvantages

• There is no global clock: IDs from a single machine are strictly ordered, but across the whole cluster they are only ordered as a trend

Notes

• Since IDs are often used as the sharding key when splitting databases and tables, they need some randomness so that data is distributed evenly across shards. Under low traffic, a sequence that always restarts at the same value leaves most IDs with identical low bits, which skews modulo-based sharding. The sequence number can therefore start from a different value each millisecond: rather than always starting from 1, it can start from a random value between 0 and 9.
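A small simulation of why the random starting value matters. It assumes the Snowflake-style layout above (10-bit machine ID, 12-bit sequence), low traffic of one ID per millisecond, and a hypothetical sharding scheme that picks one of 8 shards by `id % 8`; make_id and low_traffic_ids are illustrative names:

```python
import random
from collections import Counter

MACHINE_BITS, SEQUENCE_BITS = 10, 12
SHARDS = 8  # hypothetical shard count, routed by id % SHARDS


def make_id(ms, machine_id, sequence):
    # Same layout as Snowflake: timestamp | machine ID | sequence.
    return (ms << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def low_traffic_ids(start_sequence, n=1000, machine_id=3):
    # One ID per millisecond, so every ID uses that millisecond's starting sequence value.
    return [make_id(ms, machine_id, start_sequence()) for ms in range(n)]


fixed = low_traffic_ids(lambda: 0)
randomized = low_traffic_ids(lambda: random.randint(0, 9))

print(Counter(i % SHARDS for i in fixed))       # Counter({0: 1000}): all in shard 0
print(Counter(i % SHARDS for i in randomized))  # spread across all 8 shards
```

With a fixed starting value every ID has the same low bits, so modulo sharding sends everything to one shard. Starting from a random value in 0-9 spreads IDs over all 8 shards, although shards 0 and 1 receive roughly twice as many because 8 and 9 wrap around; a range that is a multiple of the shard count would even this out.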