Is there any benefit to establishing multiple database connections for SQL inserts?
P粉585541766 2024-03-30 08:39:46

I am writing a project that involves massive data acquisition. Currently I'm using .NET Framework 4.8 and the MySQL connector package to open connections and insert data into the database server.

I will be inserting approximately 400,000 rows per second, and I'm worried that the SQL connection might become a bottleneck for my program. If I open multiple connections on multiple threads and feed them from a consumer queue, will inserts be faster, and is it worth it (pros and cons)?

My gut says it would be faster, but I'm not sure how much of the gain the thread overhead would eat up. I'm no SQL expert, so it would be great if someone could explain the pros and cons of opening multiple SQL connections on multiple threads.


Replies (1)
P粉373596828

Rumors, opinions, hearsay, facts, version-related benchmarks, some personal experience, etc...

Multiple threads can improve throughput, but there are limitations:

  • The throughput ceiling is roughly half of the theoretical linear-scaling limit; that is your "percentage". (Source: a benchmark of a multi-threaded loading package whose name I forget; that was ten years ago.)
  • Multiple threads compete with each other for mutexes and the other locking mechanisms the server needs.
  • Since around MySQL 5.7, throughput scales up to roughly 64 threads; beyond that it stagnates or even decreases. (Source: many Oracle benchmarks boasting that each version is significantly better than the previous one.) (Meanwhile, per-thread latency goes through the roof.)
  • If possible, each thread should batch its rows; see the sketch after this list.
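
Here is a minimal sketch of the consumer-queue idea in C# (.NET Framework 4.8 with the MySql.Data connector, since that is what you mentioned). The table readings(sensor_id, ts, value), the batch size, and the thread count are all illustrative assumptions, not recommendations:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Text;
    using System.Threading.Tasks;
    using MySql.Data.MySqlClient;

    class BatchedIngest
    {
        // Bounded queue: producers block when it fills, instead of exhausting memory.
        static readonly BlockingCollection<string> Queue =
            new BlockingCollection<string>(boundedCapacity: 100000);

        const int BatchSize = 500;   // "hundreds" of rows per INSERT (see below)
        const string ConnStr = "server=...;database=...;user=...;password=...";

        static void Consumer()
        {
            using (var conn = new MySqlConnection(ConnStr))
            {
                conn.Open();
                var batch = new List<string>(BatchSize);
                foreach (var row in Queue.GetConsumingEnumerable())
                {
                    batch.Add(row);
                    if (batch.Count >= BatchSize)
                    {
                        Flush(conn, batch);
                        batch.Clear();
                    }
                }
                if (batch.Count > 0) Flush(conn, batch);  // drain the remainder
            }
        }

        static void Flush(MySqlConnection conn, List<string> rows)
        {
            // One multi-row INSERT replaces hundreds of round trips.
            // Each element of `rows` is a "(...)" tuple; real code must
            // escape the values or build a parameterized statement.
            var sql = new StringBuilder("INSERT INTO readings (sensor_id, ts, value) VALUES ");
            sql.Append(string.Join(",", rows));
            using (var cmd = new MySqlCommand(sql.ToString(), conn))
                cmd.ExecuteNonQuery();
        }

        static void Main()
        {
            const int ConsumerThreads = 8;   // well under the ~64-thread plateau
            var consumers = new Task[ConsumerThreads];
            for (int i = 0; i < ConsumerThreads; i++)
                consumers[i] = Task.Run(() => Consumer());

            // Demo producer; in real code the acquisition threads call Queue.Add.
            for (int n = 0; n < 100000; n++)
                Queue.Add($"({n % 16}, NOW(), {n})");

            Queue.CompleteAdding();          // lets consumers finish and exit
            Task.WaitAll(consumers);
        }
    }

The bounded queue applies back-pressure to the acquisition threads rather than letting memory grow without limit, and the consumer count stays well under the plateau mentioned above.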

Batch processing:

  • LOAD DATA is the fastest way to INSERT a large number of rows at once from a single thread. But if you count the cost of writing the file for LOAD to read, it may end up slower than a batched INSERT.
  • A batched INSERT (many rows per statement) is next fastest. It tops out at "hundreds" of rows per statement, where some limit or "diminishing returns" kicks in.
  • A batched INSERT is about 10 times as fast as inserting one row per INSERT statement, so it (or LOAD DATA) is worth using for high-speed ingestion. (Source: many different timed tests.) A LOAD DATA sketch follows this list.
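
For the LOAD DATA route from C#, MySql.Data provides MySqlBulkLoader, which wraps LOAD DATA [LOCAL] INFILE. A sketch, with an illustrative CSV file and table; note the server must permit local_infile, and the connection-string option that enables it can differ between connector versions:

    using System;
    using MySql.Data.MySqlClient;

    class LoadDataExample
    {
        static void Main()
        {
            // local_infile must be enabled server-side; the client-side
            // connection-string option varies by connector version.
            var connStr = "server=...;database=...;user=...;password=...;AllowLoadLocalInfile=true";
            using (var conn = new MySqlConnection(connStr))
            {
                conn.Open();

                var loader = new MySqlBulkLoader(conn)
                {
                    TableName = "readings",           // illustrative table
                    FileName = @"C:\data\batch.csv",  // illustrative file
                    FieldTerminator = ",",
                    LineTerminator = "\n",
                    Local = true                      // file lives on the client
                };
                int rows = loader.Load();             // issues LOAD DATA LOCAL INFILE
                Console.WriteLine($"Loaded {rows} rows");
            }
        }
    }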

Data Sources:

  • Some data sources deliver only one row at a time (e.g., sensor data arriving from a vehicle every N seconds). That calls for a middle layer that batches the rows.
  • Discussion of staging the data: http://mysql.rjweb.org/doc.php/staging_table (the table flip at the heart of it is sketched after this list)
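
The core of that staging-table approach is an atomic table swap, so the ingest threads never pause while a single "mover" batch-processes the captured rows. A hypothetical sketch of the flip, with illustrative table names:

    using MySql.Data.MySqlClient;

    class StagingFlip
    {
        // Called periodically by a single "mover" thread.
        static void FlipAndProcess(MySqlConnection conn)
        {
            void Exec(string sql)
            {
                using (var cmd = new MySqlCommand(sql, conn))
                    cmd.ExecuteNonQuery();
            }

            // Atomic swap: the ingest threads keep writing to `staging`
            // the whole time; they never block on the mover.
            Exec("CREATE TABLE IF NOT EXISTS staging_old LIKE staging");
            Exec("RENAME TABLE staging TO tmp, staging_old TO staging, tmp TO staging_old");

            // Batch-move the captured rows into the real fact table,
            // normalizing or summarizing along the way as needed.
            Exec("INSERT INTO fact SELECT * FROM staging_old");
            Exec("TRUNCATE TABLE staging_old");
        }
    }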

What happens after loading the data? Of course, this is not a write-only table.

  • Normalization is useful for shrinking the disk footprint; it is best done in batches. See Normalization.
  • PARTITIONing is rarely useful, except for eventually purging old data. See Partitioning.
  • Huge "fact" tables are difficult to search; consider building summary data while ingesting (see Summary tables and the sketch after this list).
  • You can even do the above processing and then throw away the raw data. It sounds like you might be acquiring a terabyte of data per day.
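
As an illustration of building summaries while ingesting, here is a hypothetical hourly rollup folded in right after each staging flip, assuming a summary table readings_hourly with a unique key on (sensor_id, hr):

    using MySql.Data.MySqlClient;

    class SummaryRollup
    {
        // Fold each freshly flipped batch into an hourly summary so that
        // reports rarely have to scan the huge fact table.
        static void Summarize(MySqlConnection conn)
        {
            const string sql = @"
                INSERT INTO readings_hourly (sensor_id, hr, cnt, total)
                SELECT sensor_id,
                       DATE_FORMAT(ts, '%Y-%m-%d %H:00:00'),
                       COUNT(*),
                       SUM(value)
                FROM staging_old
                GROUP BY sensor_id, DATE_FORMAT(ts, '%Y-%m-%d %H:00:00')
                ON DUPLICATE KEY UPDATE
                    cnt   = cnt + VALUES(cnt),
                    total = total + VALUES(total)";
            using (var cmd = new MySqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }

With the rollup maintained at ingest time, reports read the small readings_hourly table instead of scanning the terabyte-scale fact table.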