Can PHP and MariaDB provide a more efficient solution for performing multiple inserts?
P粉423694341
P粉423694341 2024-01-16 20:14:35
0
2
505

So I'm trying to build a small database to track the high scores of the pinball machines in our league. I have a table of users that only has an AI id column and a column containing their email address. Then I have a game table which is an AI id column, and the name of the machine. Since it's a many-to-many relationship, I created a third table called "scores" that contains user_id, game_id, and Score as columns.

Edit: Include reading code:

$file = fopen('scores.txt', 'r') or die("Unable to open file.");

// Loop through the file line by line
$line_number = 1;
while (($line = fgets($file)) !== false) {

    // Reset flags
    $email_exists = 0;
    $game_exists = 0;
    if (isset($email_id)) unset($email_id);
    if (isset($game_id)) unset($game_id);
    echo ($line_number . " ");

    // Split the line into components
    $line = rtrim($line);
    $array = explode(",", $line, 3);
    $email = strtolower($array[0]);
    $game = $array[1];
    $score = $array[2];
    $stmt = $db->prepare ("SELECT email_id FROM users WHERE email_address = ?");
    $stmt->execute(array($email))
        if ($stmt->rowCount() < 1) {
        $stmt = $db->prepare("INSERT INTO users (email_address) VALUES (?)");
        $stmt->execute(array($email))
        $email_id = $db->lastInsertId();
    } else {
        $row = $stmt->fetch();
        $email_id = $row['email_id'];
        $email_exists = 1;
    }

I use similar code to check if the game is already listed in the database. I need to get the ID number of the game and the third part's email address, the third part is to see if the user already has a score for that game and if they already have a score, if the new score is higher.

    if ($email_exists == 0 || $game_exists == 0) {
        // New user or game added to DB - no prior score can exist
        $stmt = $db->prepare("INSERT INTO scores VALUES (?,?,?)");
        $stmt->execute(array($email_id,$game_id,$score));
    } else {
        $stmt = $db->prepare("SELECT * FROM scores WHERE email_id = ? AND game_id = ?");
        $stmt->execute(array($email_id,$game_id));
        if ($stmt->rowCount() == 0) {
            // No previous score for this game
            $stmt = $db->prepare("INSERT INTO scores VALUES (?,?,?)");
            $stmt->execute(array($email_id,$game_id,$score));
        } else {
            // Previous score exists
            $row = $stmt->fetch();
            if ($score > $row['score']) {
                // New score is higher
                $stmt = $db->prepare("UPDATE scores SET score = ? " .
                    . "WHERE email_id = ? AND game_id =?");
                $stmt->execute(array($score, $email_id, $game_id));
                // No action needed if new score is lower
            }
        }
    }

The code appears to be working fine, but is very slow. Also, it seems to cause the script to timeout or something happen after a few thousand records. Is there a better way to do this job?

I tried recoding it in Python but it was even slower and didn't even seem to insert the row into the database. I barely know Python, which probably doesn't help.

I was thinking of creating an array and storing the items that need to be inserted and then inserting 100 rows at a time or something like that, but I need to get the id of the score join table. I'm also considering using a UNIQUE constraint in the database and trying to figure out how to rewrite the insert code to use it to prevent duplicate email addresses or games.

P粉423694341
P粉423694341

reply all(2)
P粉442576165

There is still a lot of room for improvement here. When it comes to database speed, your main goal should generally be to reduce the number of hits to the database server.

First, you'll want to perform an email-to-ID query on each CSV row, but this is not necessary. At most, you should do it once per user and cache it. Even better, you can do it once for the entire collection, reading the entire contents into a memory array. Something like this:

$stmt = $db->prepare('SELECT email_address, email_id FROM users');
$idMap = array_column($stmt->execute(), 'email_id', 'email_address');

This will give you an array like this:

[
    'foo@bar.com' => 1,
    'baz@bar.com' => 2,
]

Execute this once at the beginning of the script and keep it in memory throughout. From this, you can instantly find the ID of a given email. This will delete 7999 clicks from the database. You are essentially trading memory for CPU and disk time. If you come across an email that is not already in the array, you can insert it and add it to the array.

Next, move the preparation outside of the loop iteration. This will delete at least 3 * 7999 clicks from the database, and possibly up to 5 * 7999 clicks.

Next, use fgetcsv() instead of explode() because it is easier and handles references correctly. and process the entire CSV before performing a single insert. It would be foolish to create such a large amount of database traffic if you are just going to throw away most of the records. So calculate the highest first and then access the database with only these:

$top = [];
$fp = fopen('scores.txt', 'r');
while ([$email, $gameId, $score] = fgetcsv($fp)) {
    if ($score > ($top[$email][$gameId] ?? 0)) {
        $top[$email][$gameId] = $score;
    }
}

Given input file:

foo@bar.com,g1,3
foo@bar.com,g1,1
foo@bar.com,g2,2
baz@bar.com,g1,4
baz@bar.com,g2,5
baz@bar.com,g2,6

This will generate an array of the highest scores for each user:

Array
(
    [foo@bar.com] => Array
        (
            [g1] => 3
            [g2] => 2
        )

    [baz@bar.com] => Array
        (
            [g1] => 4
            [g2] => 6
        )

)

You can then iterate over the array and perform inserts/updates based on only those records. This will save two queries for each redundant CSV row.

foreach ($top as $email => $scores) {
    foreach ($scores as $gameId => $score) {
        // INSERT INTO scores ($idMap[$email], $gameId, $score)
    }
}
P粉860897943

Create a stored procedure with parameters email, game_id and Score. Let the process do all the SQL work for you. Your PHP code will be reduced to a single loop that calls the procedure. The result should be faster and easier to maintain:

create procedure sp_add_email_score (
in_email varchar(320),
in_game_id int,
in_score int
)
begin

declare v_email_id int;

select email_id into v_email_id
from users 
where email_address = in_email;

if (v_email_id is null) then
  INSERT INTO users (email_address) VALUES (in_email);
  
  set v_email_id = LAST_INSERT_ID();
end if;

INSERT INTO scores (email_id, game_id, score) 
VALUES(v_email_id, in_game_id, in_score) 
ON DUPLICATE KEY UPDATE score=if(in_score>score, in_score, score);;

end

If the loop is still too slow, there may be other reasons for the slowdown.

Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template