Actually, this is not a locking problem but a data-distribution problem. Locking is there to prevent dirty data under high concurrency; what you really want is for data that has already been processed, or already claimed by another thread, not to be processed again, right?
How to distribute the data and improve cluster (or multi-thread) processing efficiency should be decided together with your data model.
For example, if each record has a numeric ID and you currently have 10 machines or 10 threads, each of them can read 1/10 of the data by taking the remainder of the ID (% 10): the first machine reads rows whose id % 10 == 0, the second reads rows whose id % 10 == 1, and so on up to remainder 9.
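A minimal sketch of that modulo split, assuming a table named task with a numeric id and a status column (the table, columns, and JDBC URL are all made up for illustration):

```java
import java.sql.*;

// Each worker reads only its own shard via MOD(id, shardCount).
// Table name, columns, and the JDBC URL are hypothetical.
public class ShardedReader {
    public static void main(String[] args) throws SQLException {
        int shardCount = 10;                         // total machines/threads
        int shardIndex = Integer.parseInt(args[0]);  // this worker's slot, 0..9

        String sql = "SELECT id FROM task WHERE MOD(id, ?) = ? AND status = 0";
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/demo", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, shardCount);
            ps.setInt(2, shardIndex);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    process(rs.getLong("id")); // no two shards ever see the same id
                }
            }
        }
    }

    static void process(long id) { /* handle one row */ }
}
```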
You could try a queue: scan the whole table with a single thread and push the pending rows into the queue, then consume it with multiple threads. Since dequeuing is itself atomic, duplicate reads are prevented and performance holds up (especially with Redis). And if a single-threaded table scan on the producer side isn't fast enough, you can have several producer threads each read their modulo share and push into the queue.
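Here is a rough in-JVM version of that producer/consumer setup using java.util.concurrent; with Redis you would replace the queue with LPUSH on the producer and BRPOP on the consumers, which are likewise atomic:

```java
import java.util.concurrent.*;

// Single producer scans the table and enqueues ids; many consumers take() them.
// take()/put() are atomic, so no id is ever handed to two workers.
public class QueueDemo {
    public static void main(String[] args) {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>(10_000);
        ExecutorService consumers = Executors.newFixedThreadPool(10);

        // Producer: stands in for the single-threaded table scan.
        Thread producer = new Thread(() -> {
            for (long id = 1; id <= 1_000_000; id++) {
                try {
                    queue.put(id); // blocks when the queue is full: free back-pressure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        producer.start();

        // Consumers: each take() removes the id for everyone else.
        for (int i = 0; i < 10; i++) {
            consumers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        process(queue.take());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    static void process(long id) { /* handle one row */ }
}
```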
I don't follow. You can't load tens of millions of rows into the cache in one go!
I also noticed you said every row in the table carries a flag. Just check that flag before modifying the row. When you change a row, all you need is to make the Java-side operations on its data bean (whether one step or several) atomic. Once you've changed it, any other thread that fetches the row sees what you wrote.
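One common way to make that check-then-modify step atomic is to fold the flag check into the UPDATE itself, so the database arbitrates which thread wins. A sketch, with a hypothetical status column (0 = pending, 1 = taken):

```java
import java.sql.*;

// Claim a row by flipping its flag in a single atomic UPDATE.
// Column names and status values are hypothetical.
public class RowClaimer {
    static boolean tryClaim(Connection conn, long id) throws SQLException {
        String claim = "UPDATE task SET status = 1 WHERE id = ? AND status = 0";
        try (PreparedStatement ps = conn.prepareStatement(claim)) {
            ps.setLong(1, id);
            // Exactly one thread can flip 0 -> 1; everyone else sees 0 rows updated.
            return ps.executeUpdate() == 1;
        }
    }
}
```

A worker only processes the row when tryClaim returns true; false means another thread got there first.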
For updates over large amounts of data, JDBC batch operations are faster.
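A minimal JDBC batch sketch (table and column names are again hypothetical); executing in chunks and committing once, instead of once per row, is what makes it fast:

```java
import java.sql.*;
import java.util.List;

// JDBC batch update sketch; table, columns, and the status value are hypothetical.
public class BatchUpdater {
    static void markDone(Connection conn, List<Long> ids) throws SQLException {
        String sql = "UPDATE task SET status = 2 WHERE id = ?";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int n = 0;
            for (long id : ids) {
                ps.setLong(1, id);
                ps.addBatch();
                if (++n % 1000 == 0) {
                    ps.executeBatch(); // flush every 1000 rows to bound memory
                }
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();     // one commit for the whole batch
        }
    }
}
```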
Use a queue such as RabbitMQ with the producer-consumer model.
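A bare-bones sketch with the RabbitMQ Java client (amqp-client); the host and queue name are placeholders. The broker delivers each message to exactly one consumer, which is what prevents duplicate processing:

```java
import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;

// Producer and consumer over RabbitMQ; host and queue name are placeholders.
public class MqDemo {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.queueDeclare("tasks", true, false, false, null);

            // Producer side: one message per pending row id.
            channel.basicPublish("", "tasks", null,
                    "42".getBytes(StandardCharsets.UTF_8));

            // Consumer side: prefetch 1 so a busy worker isn't sent more work.
            channel.basicQos(1);
            DeliverCallback onMessage = (tag, delivery) -> {
                long id = Long.parseLong(
                        new String(delivery.getBody(), StandardCharsets.UTF_8));
                // process(id) ...
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            };
            channel.basicConsume("tasks", false, onMessage, tag -> {});
            Thread.sleep(1_000); // let the async consumer run before closing
        }
    }
}
```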