Database primary key ID generation strategy-Mysql Tutorial-php.cn

Foreword:

Unique system ID is a problem we often encounter when designing a system. Here are some common ID generation strategies.

● Sequence ID

● UUID

● GUID

● COMB

● Snowflake

First Auto-increment ID In order to meet the needs of separate databases, different starting points will be used under the premise of auto-increment. However, it is extremely troublesome when database expansion is required. For example, when we first design the database of a certain system, there will be 10 tables in the database. Then we need different IDs for the content of each table. We can use different non-increasing forms, for example, The first table is 1, 11, 21, 31. . . The second table is 2, 12, 22, 32. . . The third table is 3, 13, 23, 33. . . The tenth table is 10, 20, 30. . . But the problem is, if one day I find that the 10 tables in this system are no longer enough, and I want to add another table, how should the primary keys be allocated at this time? In addition, if you want to merge data from multiple databases, but for this simple method of generating IDs, the possibility of duplication is very high, so duplication will almost certainly occur. Obviously, the scalability of the previous method will be poor.

Compared with auto-incrementing IDs, UUIDs are more convenient to generate unique primary keys (when the amount of data is very large, there is a possibility of duplication), but due to the disordered nature of UUIDs, the performance is not as good as auto-incrementing IDs and string storage. , large storage space and low query efficiency. Key: The disadvantage of using uuid is low query efficiency!

COMB compared to UUID, increases the orderliness of generated IDs, and improves the efficiency of insertion and query. This article has a simple analysis.

Sonwflake is Twitter’s primary key generation strategy, which can be seen as an improvement of COMB, using a 64-bit long integer instead of a 128-bit string. The composition of the ID: the first 0, a 41-bit time prefix, a 10-bit node identifier, a 12-bit sequence number to avoid concurrency.

Part 1: Sequence ID

The most common way is for the database to grow its own sequence or field. It is maintained by the database and is unique to the database.

Advantages:

Simple, convenient code, acceptable performance.

Number ID is naturally sorted, which is very helpful for paging or results that need to be sorted.

Disadvantages:

Different databases have different syntax and implementation, which needs to be processed during database migration or when supporting multiple database versions.

In the case of a single database or read-write separation or one master and multiple slaves, only one master database can be generated. There is a risk of a single point of failure.

It is difficult to expand when the performance cannot meet the requirements.

If you encounter multiple systems that need to be merged or data migration is involved, it will be quite painful.

There will be trouble when dividing tables and databases.

Optimization plan:

For the single point of the main library, if there are multiple Master libraries, the starting number set for each Master library is different and the step size is the same , which can be the number of Masters.

For example: Master1 generates 1, 4, 7, 10, Master2 generates 2,5,8,11, and Master3 generates 3,6,9,12. This effectively generates unique IDs in the cluster and greatly reduces the load on the ID generation database operations.

Part 2: UUID

npm Management https://www.npmjs.com/package/uuid

Common ways , 128 bits. It can be generated using a database or a program, and is generally unique globally.

UUID is a 128-bit globally unique identifier, usually represented by a 32-byte string. It can ensure the uniqueness of time and space, also called GUID, the full name is: UUID - Universally Unique IDentifier, called UUID in Python.

It ensures the uniqueness of the generated ID through MAC address, timestamp, namespace, random number, and pseudo-random number.

UUID mainly has five algorithms, that is, five methods to implement it.

(1), uuid1()

——Based on timestamp. Generated from MAC address, current timestamp, and random number. Global uniqueness can be guaranteed, but the use of MAC also brings security issues. IP can be used instead of MAC in the local area network.

(2), uuid2()

Based on the distributed computing environment DCE (this function does not exist in Python). The algorithm is the same as uuid1, except that the first 4 positions of the timestamp are replaced with POSIX UID. This method is rarely used in practice.

(3), uuid3()

Name-based MD5 hash value. It is obtained by calculating the MD5 hash value of the name and namespace, ensuring the uniqueness of different names in the same namespace and the uniqueness of different namespaces, but the same name in the same namespace generates the same uuid.

(4), uuid4()

Based on random numbers. Obtained from pseudo-random numbers, there is a certain probability of repetition, and this probability can be calculated.

(5), uuid5()

Name-based SHA-1 hash value. The algorithm is the same as uuid3, except that the Secure Hash Algorithm 1 algorithm is used.

Advantages:

Simple and convenient code.

The only one in the world, it can handle it calmly when encountering data migration, system data merger, or database changes.

Disadvantages:

There is no sorting, and the trend cannot be guaranteed to increase.

UUID is often stored using strings, and the query efficiency is relatively low.

The storage space is relatively large. If it is a massive database, you need to consider the storage amount.

The amount of transmitted data is large

Unreadable.

Optimization solution:

In order to solve the problem that the UUID is unreadable, you can use the UUID to Int64 method.

Part 3: GUID

GUID: It is Microsoft’s implementation of the UUID standard. There are various other implementations of UUID, not just GUID. The advantages and disadvantages are the same as UUID.

Part 4: COMB

COMB (combine) type is a design idea unique to the database and can be understood as an improved GUID, which combines GUID and system time to provide better performance in indexing and retrieval.

There is no COMB type in the database, it was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys". \

The basic design idea of the COMB data type is as follows: Since the UniqueIdentifier data has low index efficiency due to its irregularity, which affects the performance of the system, can we retain the uniqueness of the UniqueIdentifier through combination? The first 10 bytes are used, and the last 6 bytes are used to represent the time when the GUID was generated (DateTime). In this way, we combine the time information with the UniqueIdentifier, which increases the orderliness while retaining the uniqueness of the UniqueIdentifier, thereby improving the index. efficiency.

Advantages:

Solve the problem of UUID disorder and provide Comb algorithm (combined guid/timestamp) in its primary key generation method. Reserve 10 bytes of the GUID and use the other 6 bytes to represent the time when the GUID was generated (DateTime).

Performance is better than UUID.

Part 5: Twitter’s snowflake algorithm

Snowflake is Twitter’s open source distributed ID generation algorithm, and the result is a long ID. The core idea is: use 41 bits as the number of milliseconds, 10 bits as the machine ID (5 bits are the data center, 5 bits the machine ID), and 12 bits as the serial number within milliseconds (meaning that each node can generate 4096 IDs), and there is a sign bit at the end, which is always 0. The snowflake algorithm can be modified according to the needs of your own project. For example, estimate the number of future data centers, the number of machines in each data center, and the number of possible concurrencies in a unified millisecond to adjust the number of bits required in the algorithm.

Advantages:

Does not depend on the database, is flexible and convenient, and has better performance than the database.

ID is incremented on a single machine according to time.

Disadvantages:

is incremental on a single machine, but due to the distributed environment, the clocks on each machine cannot be completely synchronized, and maybe sometimes There may be situations where global increment is not achieved.

6. Use

This is really convenient to use:

npm install uuid --save

Copy after login

Then you can use it!

const uuidv1 = require(‘uuid/v1‘); console.log(‘随机uuid字符串‘, uuidv1());

Copy after login

In this way, we can print out the uuid string. It's different every time.

The above is the detailed content of Database primary key ID generation strategy. For more information, please follow other related articles on the PHP Chinese website!