Generating unique IDs is a problem that comes up in almost every system design, and it is easy to get stuck on. There are many ways to generate IDs, each suited to different scenarios, requirements, and performance needs, so more complex systems often combine several ID generation strategies. Here are some common ones.
1. Database self-increasing sequence or field
This is the most common approach. The database generates the ID, which is unique across the whole database.
Advantages:
- Simple and convenient to code, with acceptable performance.
- Numeric IDs sort naturally, which helps with paging and with results that need to be ordered.
Disadvantages:
- Syntax and implementation differ between databases, so extra work is needed when migrating databases or supporting several database products.
- With a single database, read-write separation, or one master and multiple slaves, only the one master database can generate IDs, which creates a single point of failure.
- It is hard to scale out when performance no longer meets requirements.
- Merging several systems or migrating data becomes quite painful.
- Sharding tables and databases causes trouble.
Optimization plan:
- To remove the single master as a bottleneck, run several master databases, each with a different starting number but the same step size, where the step equals the number of masters. For example: Master1 generates 1, 4, 7, 10; Master2 generates 2, 5, 8, 11; Master3 generates 3, 6, 9, 12. This yields unique IDs across the cluster and spreads the ID generation load over the masters.
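The start-and-step scheme above can be sketched in a few lines of Java. This is only an in-memory illustration: each `SteppedSequence` object stands in for one master database, and the class and method names are hypothetical. In MySQL, for instance, the same effect is normally configured with the `auto_increment_offset` and `auto_increment_increment` system variables rather than written in application code.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative stand-in for one master database in the
 * "different start, same step" scheme. Names are hypothetical.
 */
final class SteppedSequence {
    private final AtomicLong next;
    private final long step;

    /**
     * @param start first ID this node hands out (1..step)
     * @param step  number of nodes sharing the ID space
     */
    SteppedSequence(long start, long step) {
        if (step < 1 || start < 1 || start > step) {
            throw new IllegalArgumentException("need 1 <= start <= step");
        }
        this.next = new AtomicLong(start);
        this.step = step;
    }

    /** Returns the current value, then advances by the shared step. */
    long nextId() {
        return next.getAndAdd(step);
    }

    public static void main(String[] args) {
        SteppedSequence master1 = new SteppedSequence(1, 3);
        SteppedSequence master2 = new SteppedSequence(2, 3);
        SteppedSequence master3 = new SteppedSequence(3, 3);
        // Each row shows one ID from every master; no value repeats.
        for (int i = 0; i < 4; i++) {
            System.out.println(master1.nextId() + " " + master2.nextId() + " " + master3.nextId());
        }
    }
}
```

Because the step is fixed to the number of nodes, adding a fourth master later means re-planning both values, which is the main cost of this approach.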
2. UUID
A UUID can be generated by the database or by application code, and is, for practical purposes, globally unique.
Advantages:
- Simple and convenient to code.
- Generation is very fast and is rarely a performance bottleneck.
- Globally unique, so data migration, merging data from several systems, or changing databases can be handled without ID conflicts.
Disadvantages:
- UUIDs are unordered, so there is no guarantee that IDs trend upward.
- UUIDs are often stored as strings, which makes queries relatively slow.
- They take up more storage space; for a very large database the storage volume needs to be considered.
- More data has to be transferred.
- They are not human-readable.
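As a small illustration of generating a UUID in application code, the sketch below uses the JDK's `java.util.UUID`. The `UuidDemo` class and its method names are hypothetical; `toBytes` shows one possible way to hold a UUID in 16 bytes instead of a 36-character string, which may ease the storage and query costs listed above if the database has a suitable binary column type.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidDemo {
    /** Random (version 4) UUID in canonical 36-character form. */
    static String canonical() {
        return UUID.randomUUID().toString();
    }

    /** Random UUID without hyphens: 32 hexadecimal characters. */
    static String compact() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    /** Packs the 128 bits of a UUID into 16 bytes, most significant half first. */
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id + " (version " + id.version() + ")");
        System.out.println(compact());
        System.out.println(toBytes(id).length + " bytes in binary form");
    }
}
```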
3. Generating IDs with Redis
When generating IDs in the database is not fast enough, Redis is worth trying. Redis executes commands on a single thread, so its atomic INCR and INCRBY operations can be used to generate globally unique IDs.
A Redis cluster can provide higher throughput. Suppose a cluster has 5 Redis instances. Their counters can be initialized to 1, 2, 3, 4, and 5 respectively, with a step size of 5 on each. The IDs generated by each instance are then:
A: 1, 6, 11, 16, 21
B: 2, 7, 12, 17, 22
C: 3, 8, 13, 18, 23
D: 4, 9, 14, 19, 24
E: 5, 10, 15, 20, 25
Whichever instance a request is routed to determines the ID it receives, and every instance hands out different IDs. The step size and initial values must be fixed in advance, which makes the cluster size hard to change later, but 3 to 5 servers are usually enough. Using a Redis cluster also removes the single point of failure.
Redis is also well suited to serial numbers that restart every day. For example, an order number can be the date followed by a counter that increases during that day: create one key in Redis per day and increment it with INCR.
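The daily order number idea can be sketched as follows. To keep the example self-contained, the counter is injected as a function: `inMemoryIncr` is an in-memory stand-in that mimics the return value of Redis `INCR` (1 on the first call for a new key), and a real system would replace it with a call to a Redis client. The class name, the `order:` key prefix, and the six-digit padding are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

final class DailyOrderNumber {
    // Function that increments the counter stored under a key and returns
    // the new value, like Redis INCR. Injected so the sketch runs without Redis.
    private final ToLongFunction<String> incr;

    DailyOrderNumber(ToLongFunction<String> incr) {
        this.incr = incr;
    }

    /** Builds "yyyyMMdd" + zero-padded counter for the given day. */
    String next(LocalDate day) {
        String date = day.format(DateTimeFormatter.BASIC_ISO_DATE); // e.g. 20240131
        long n = incr.applyAsLong("order:" + date);                 // one key per day
        return date + String.format("%06d", n);
    }

    /** In-memory stand-in for Redis INCR (assumption: not a real Redis client). */
    static ToLongFunction<String> inMemoryIncr() {
        ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
        return key -> counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    public static void main(String[] args) {
        DailyOrderNumber generator = new DailyOrderNumber(inMemoryIncr());
        LocalDate today = LocalDate.now();
        System.out.println(generator.next(today));
        System.out.println(generator.next(today));
        System.out.println(generator.next(today.plusDays(1))); // new key, counter restarts
    }
}
```

With a real Redis backend, giving each daily key an expiry (for example with `EXPIRE`) would keep old counters from accumulating.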
Advantages:
- Does not depend on the database, is flexible and convenient, and performs better than database-generated IDs.
- Numeric IDs sort naturally, which helps with paging and with results that need to be ordered.
Disadvantages:
- If the system does not already use Redis, a new component has to be introduced, which increases system complexity.
- Coding and configuration take a fair amount of work.
4. Twitter's Snowflake algorithm
Snowflake is Twitter's open source distributed ID generation algorithm, and the result is a 64-bit long ID. The highest bit is the sign bit, which is always 0. The core idea is to use the next 41 bits for a millisecond timestamp, 10 bits for the machine ID (5 bits for the data center and 5 bits for the worker), and the lowest 12 bits for a serial number within the millisecond, which means each node can generate up to 4096 IDs per millisecond. The specific implementation code can be found at: https://github.com/twitter/snowflake
    public class IdWorker {
        // Custom epoch in milliseconds (2014-12-31T16:00:00Z, i.e. midnight
        // on 2015-01-01 in UTC+8). Timestamps are stored relative to it.
        private final long twepoch = 1420041600000L;

        // Bit widths of the worker, data center, and sequence fields.
        private final long workerIdBits = 5L;
        private final long datacenterIdBits = 5L;
        private final long sequenceBits = 12L;

        // Largest value that fits in each ID field (31 for 5 bits).
        private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
        private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

        // Left shifts that place each field in the 64-bit result.
        private final long workerIdShift = sequenceBits;
        private final long datacenterIdShift = sequenceBits + workerIdBits;
        private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

        // Mask that wraps the sequence back to 0 after 4095.
        private final long sequenceMask = -1L ^ (-1L << sequenceBits);

        private long workerId;
        private long datacenterId;
        private long sequence = 0L;
        private long lastTimestamp = -1L;

        public IdWorker(long workerId, long datacenterId) {
            if (workerId > maxWorkerId || workerId < 0) {
                throw new IllegalArgumentException(
                        String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
            }
            if (datacenterId > maxDatacenterId || datacenterId < 0) {
                throw new IllegalArgumentException(
                        String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
            }
            this.workerId = workerId;
            this.datacenterId = datacenterId;
        }

        public synchronized long nextId() {
            long timestamp = timeGen();

            // Refuse to generate IDs if the system clock has gone backwards,
            // because that could produce duplicates.
            if (timestamp < lastTimestamp) {
                throw new RuntimeException(
                        String.format("Clock moved backwards. Refusing to generate id for %d milliseconds",
                                lastTimestamp - timestamp));
            }

            if (lastTimestamp == timestamp) {
                // Same millisecond: advance the sequence.
                sequence = (sequence + 1) & sequenceMask;
                if (sequence == 0) {
                    // Sequence exhausted for this millisecond: wait for the next one.
                    timestamp = tilNextMillis(lastTimestamp);
                }
            } else {
                // New millisecond: restart the sequence.
                sequence = 0L;
            }

            lastTimestamp = timestamp;

            // Assemble timestamp, data center, worker, and sequence into one long.
            return ((timestamp - twepoch) << timestampLeftShift)
                    | (datacenterId << datacenterIdShift)
                    | (workerId << workerIdShift)
                    | sequence;
        }

        // Busy-waits until the clock moves past lastTimestamp.
        protected long tilNextMillis(long lastTimestamp) {
            long timestamp = timeGen();
            while (timestamp <= lastTimestamp) {
                timestamp = timeGen();
            }
            return timestamp;
        }

        protected long timeGen() {
            return System.currentTimeMillis();
        }

        public static void main(String[] args) {
            IdWorker idWorker = new IdWorker(0, 0);
            for (int i = 0; i < 1000; i++) {
                long id = idWorker.nextId();
                System.out.println(Long.toBinaryString(id));
                System.out.println(id);
            }
        }
    }
The Snowflake algorithm can be adapted to the needs of your own project. For example, estimate the future number of data centers, the number of machines in each data center, and the peak number of IDs requested within a single millisecond, then adjust the number of bits assigned to each field accordingly.
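To make the bit layout concrete, here is a minimal sketch that packs the four fields into a long and extracts them again, using the same field widths and epoch as the IdWorker listing. The class `SnowflakeLayout` and its method names are hypothetical and are not part of Twitter's code; changing the three `*_BITS` constants is where a project would rebalance the fields.

```java
final class SnowflakeLayout {
    static final long EPOCH = 1420041600000L; // same value as twepoch in IdWorker
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final int WORKER_SHIFT = SEQUENCE_BITS;
    static final int DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS;

    /** Largest value representable in the given number of bits. */
    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Packs the four fields into one ID; inputs are assumed to be in range. */
    static long compose(long timestampMillis, long datacenterId, long workerId, long sequence) {
        return ((timestampMillis - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    static long timestampMillis(long id) {
        return (id >>> TIMESTAMP_SHIFT) + EPOCH;
    }

    static long datacenterId(long id) {
        return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS);
    }

    static long workerId(long id) {
        return (id >>> WORKER_SHIFT) & mask(WORKER_BITS);
    }

    static long sequence(long id) {
        return id & mask(SEQUENCE_BITS);
    }

    /** Years of millisecond timestamps that fit in the timestamp field. */
    static double yearsOfTimestamps() {
        return mask(TIMESTAMP_BITS) / (1000.0 * 60 * 60 * 24 * 365.25);
    }

    public static void main(String[] args) {
        long id = compose(System.currentTimeMillis(), 3, 7, 42);
        System.out.println("id         = " + id);
        System.out.println("datacenter = " + datacenterId(id));
        System.out.println("worker     = " + workerId(id));
        System.out.println("sequence   = " + sequence(id));
        System.out.printf("timestamp field lasts about %.1f years%n", yearsOfTimestamps());
    }
}
```

With 41 timestamp bits the field covers roughly 69 years from the chosen epoch; moving a bit from the timestamp to another field halves that span, which is the trade-off to weigh when adjusting the layout.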
Advantages:
- Does not depend on the database, is flexible and convenient, and performs better than database-generated IDs.
- IDs increase over time on a single machine.
Disadvantages:
- IDs increase on a single machine, but in a distributed environment the clocks of different machines cannot be perfectly synchronized, so the IDs are sometimes not globally increasing.
5. Generating unique IDs with ZooKeeper
ZooKeeper mainly generates serial numbers through the data version of a znode. It can produce 32-bit and 64-bit data version numbers, and clients can use such a version number as a unique serial number.
ZooKeeper is rarely used to generate unique IDs, mainly because doing so adds a dependency on ZooKeeper and requires several API calls per ID. Under heavy contention a distributed lock also has to be considered, so performance is not ideal in a highly concurrent distributed environment.
6. MongoDB’s ObjectId
MongoDB's ObjectId is similar to the Snowflake algorithm. It is designed to be lightweight, so that different machines can each generate globally unique values with the same method. MongoDB was designed from the start as a distributed database in which handling multiple nodes is a core requirement, and this makes ObjectIds much easier to generate in a sharded environment. An ObjectId is 12 bytes long, laid out as follows (figure: objectId.png):
- Bytes 0-3: timestamp, in seconds since the standard epoch.
- Bytes 4-6: unique identifier of the host, typically a hash of the machine's hostname.
- Bytes 7-8: process identifier (PID) of the process that generated the ObjectId.
- Bytes 9-11: automatically increasing counter.
The timestamp, combined with the following 5 bytes, provides uniqueness at the granularity of one second. Because the timestamp comes first, ObjectIds sort roughly in insertion order, which is useful when the ObjectId is used as an index. These 4 bytes also record when the document was created, and most client libraries expose a method to read this time from the ObjectId.
The 3 host bytes ensure that different hosts generate different ObjectIds without conflict, and the 2 PID bytes ensure that concurrent processes on the same machine do too. Together, the first 9 bytes guarantee that ObjectIds generated in the same second by different processes on different machines are unique. Note that newer MongoDB versions replace the host and PID bytes with a 5-byte random value generated once per process; the overall 12-byte structure is unchanged.
The last 3 bytes are a counter that guarantees ObjectIds generated by the same process in the same second are also different. Each process can generate up to 256^3 (16,777,216) different ObjectIds in the same second.
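The byte layout described above (4 timestamp bytes, 3 host bytes, 2 PID bytes, 3 counter bytes) can be illustrated by splitting the 24-character hexadecimal form of an ObjectId into its fields. This is a sketch, not the MongoDB driver's API: the class `LegacyObjectId` and its members are hypothetical, and the sample value is just a commonly quoted example ObjectId.

```java
import java.time.Instant;

/** Splits a 24-hex-character ObjectId into the four fields described in the text. */
final class LegacyObjectId {
    final long timestampSeconds; // bytes 0-3
    final int machine;           // bytes 4-6
    final int pid;               // bytes 7-8
    final int counter;           // bytes 9-11

    private LegacyObjectId(long timestampSeconds, int machine, int pid, int counter) {
        this.timestampSeconds = timestampSeconds;
        this.machine = machine;
        this.pid = pid;
        this.counter = counter;
    }

    static LegacyObjectId parse(String hex) {
        if (hex == null || !hex.matches("[0-9a-fA-F]{24}")) {
            throw new IllegalArgumentException("expected 24 hexadecimal characters");
        }
        // Two hex characters per byte, so byte offsets are doubled.
        long timestamp = Long.parseLong(hex.substring(0, 8), 16);
        int machine = Integer.parseInt(hex.substring(8, 14), 16);
        int pid = Integer.parseInt(hex.substring(14, 18), 16);
        int counter = Integer.parseInt(hex.substring(18, 24), 16);
        return new LegacyObjectId(timestamp, machine, pid, counter);
    }

    /** Creation time implied by the leading 4 bytes. */
    Instant createdAt() {
        return Instant.ofEpochSecond(timestampSeconds);
    }

    public static void main(String[] args) {
        LegacyObjectId id = LegacyObjectId.parse("507f1f77bcf86cd799439011");
        System.out.println("created at " + id.createdAt());
        System.out.println("machine    " + Integer.toHexString(id.machine));
        System.out.println("pid        " + id.pid);
        System.out.println("counter    " + id.counter);
    }
}
```

Since the timestamp occupies the leading bytes, comparing two ObjectIds as hexadecimal strings orders them by creation second first, which is what makes them roughly sortable by insertion time.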