Java Basic TutorialThe column introduces how trillions of data should be migrated.
There is a very famous line in Xingye's "Westward Journey": "There was once a sincere love I didn't cherish the relationship before me, and regretted it when I lost it. The most painful thing in the world is this. If God could give me another chance, I would say three words to any girl: I love you. If I have to add a time limit to this love, I hope it will be ten thousand years!" In the eyes of our developers, this feeling is the same as the data in our database. We wish it would last ten thousand years. Not changing, but often backfires. As the company continues to develop and the business continues to change, our requirements for data are also constantly changing. There are probably the following situations:
In actual business development, we will make different migration plans according to different situations. Next, let’s discuss how to migrate data.
Data migration is not accomplished overnight. Each data migration takes a long time, which may be a week or several months. Generally speaking, we migrate data The process is basically similar to the picture below:' First, we need to batch migrate the existing data in our database, and then we need to process the new data. We need to write this part of the data to our new storage in real time after writing the original database. Here We need to continuously verify data during the process. When we verify that the basic problems are not serious, we then perform the stream cutting operation. After the stream is completely cut, we no longer need to perform data verification and incremental data migration.
First of all, let’s talk about how to do stock data migration. After searching for stock data migration in the open source community, we found that there is no easy-to-use tool. At present, Alibaba The cloud's DTS provides existing data migration. DTS supports migration between homogeneous and heterogeneous data sources, and basically supports common databases in the industry such as Mysql, Orcale, SQL Server, etc. DTS is more suitable for the first two scenarios we mentioned before. One is the scenario of sub-database. If you are using Alibaba Cloud's DRDS, you can directly migrate the data to DRDS through DTS. The other is the scenario of heterogeneous data, whether it is Redis, ES, and DTS all support direct migration.
So how to migrate DTS stock? In fact, it is relatively simple and probably consists of the following steps:
select * from table_name where id > curId and id < curId + 10000;复制代码
3. When the id is greater than or equal to maxId, the existing data migration task ends
Of course we may not use Alibaba Cloud during the actual migration process, or in our third scenario, we need to do a lot of conversions between database fields, and DTS does not support it, then we can imitate DTS's approach is to migrate data by reading data in batches in segments. What needs to be noted here is that when we migrate data in batches, we need to control the size and frequency of segments to prevent them from affecting the normal operation of our online operations.
The migration solutions for existing data are relatively limited, but incremental data migration methods are in full bloom. Generally speaking, we have the following methods:
With so many methods, which one should we use? Personally, I recommend the method of monitoring binlog. Monitoring binlog reduces development costs. We only need to implement the consumer logic, and the data can ensure consistency. Because it is a monitored binlog, there is no need to worry about the previous double writing being different. business issues.
All the solutions mentioned above, although many of them are mature cloud services (dts) or middleware (canal), may cause some data loss. Data loss is relatively rare overall, but it is very difficult to troubleshoot. It may be that the dts or canal accidentally shook, or the data was accidentally lost when receiving data. Since we have no way to prevent our data from being lost during the migration process, we should correct it by other means.
Generally speaking, when we migrate data, there will be a step of data verification, but different teams may choose different data verification solutions:
Of course, we also need to pay attention to the following points during the actual development process:
When our data verification basically has no errors, it means that our migration program is relatively stable, then we can use our new data directly ? Of course it's not possible. If we switch it all at once, it will be great if it goes well. But if something goes wrong, it will affect all users.
So we need to perform grayscale next, which is stream cutting. The dimensions of different business flow cuts will be different. For user dimension flow cuts, we usually use the modulo method of userId to cut flows. For tenant or merchant dimension businesses, we need to take the modulo of the tenant id. way to cut the flow. For this traffic cutting, you need to make a traffic cutting plan, in what time period, how much traffic to release, and when cutting traffic, you must choose a time when the traffic is relatively small. Every time you cut traffic, you need to make detailed observations of the logs. , fix problems as soon as possible. The process of releasing traffic is a process from slow to fast. For example, at the beginning, it is continuously superimposed at 1%. Later, we directly use 10% or 20% to quickly Increase the volume. Because if there is a problem, it will often be discovered when the traffic is small. If there is no problem with the small traffic, then the volume can be increased quickly.
In the process of migrating data, special attention should be paid to the primary key ID. In the double-writing solution above, it is also mentioned that the primary key ID needs to be double-written manually. Specify to prevent ID generation sequence errors.
If we are migrating because of sub-databases and tables, we need to consider that our future primary key ID cannot be an auto-increment ID, and we need to use distributed IDs. The more recommended one here is Meituan’s open source leaf. It supports two modes. One is the snowflake algorithm trend increasing, but all ids are Long type, which is suitable for some applications that support Long as id. There is also a number segment mode, which will continue to increase from above based on a basic ID you set. And basically all use memory generation, and the performance is also very fast.
Of course, we still have a situation where we need to migrate the system. The primary key id of the previous system already exists in the new system, so our id needs to be mapped. If we already know which systems will be migrated in the future when migrating the system, we can use the reservation method. For example, the current data of system A is 100 million to 100 million, and the data of system B is also 100 million to 100 million. We Now we need to merge the two systems A and B into a new system, then we can slightly estimate some Buffer, for example, leave 100 to 150 million for system A, so that A does not need to be mapped, and system B is 150 million to 300 million. Then when we convert to the old system ID, we need to subtract 150 million. Finally, the new ID of our new system will increase from 300 million. But what if there is no planned reserved segment in the system? You can do this in the following two ways:
Finally, let’s briefly summarize this routine. It is actually four steps. One note: stock, increment, verification, and cut. Stream, finally pay attention to the id. No matter how large the amount of data is, basically there will be no big problems when migrating according to this routine. I hope this article can help you in your subsequent data migration work.
If you think this article is helpful to you, your attention and forwarding are the greatest support for me, O(∩_∩)O:
Related free learning recommendations:java basic tutorial
The above is the detailed content of How trillions of data should be migrated. For more information, please follow other related articles on the PHP Chinese website!