How would you import 16.000.000 lines from CSV to Mysql database ?
Hey guys,
I was stuck on a point of speed. I'm dealing with big large annoying .csv files. Each lines is a row to be inserted in the database. Pretty easy isn't it ?
But when we are talking about files reaching the 700-800mb (for ~ 16.000.000 lines), how would you approach this import ?
Fortunately, it's an intern project, so I don't need to deal with a big import (users just push them files into a specific folder that is scanned in backend).
I first, checked for a tool that will chunk my file and found : https://github.com/Maatwebsite/Laravel-Excel. It was nice, but PHP gave me an error for timeout,
Alright, then I looked after Laravel Queues/Jobs, it was working, very well with the Laravel-Excel implementation. But for larges files I was talking before, PHP gave me another error for memory leak... (Not supposed to be appened, because of the chunk usage... ?)
Then, I drop my computer off the window, and developed a Java GUI for the import. Easy to make, but not so fast (but working without any error).
Then, I would like to know how would you manage this kind of situation ?
Thanks in advance, Romalops.
Yes the mysql load data would be the quickest. This works well if your fields matches your database table columns and you do not need to change values before you insert them.
If you are handy with regex you can also turn all of your csv lines into a: INSERT INTO table () VALUES ()query. I did this multiple times because this was the quickest way without any coding and I could just run those sql queries in chunks in my database IDE. This is not the fastest option but it worked pretty good in my case.
Another solution could be to just loop over the file line by line and set the time_limit to 0. If you run your script in an artisan command it is already set to 0. This will probably take a while but eventually it should end nicely if you do not build up huge array's with additional data.
set_time_limit(0); // disables the timeou
$handle = fopen("inputfile.txt", "r");
if ($handle) {
while (($columns = fgetcsv($handle, 1000, ",")) !== false) {
// do your insert here
// $columns['colum-1'], $columns['colum-2']
}
fclose($handle);
} else {
// error opening the file.
}
And another solution I just found was the one below. It reads the contents of my.csv in chunks of 1000 and it stores the current row it processing into a file. The next time you run the script it will start at the line where you left.
https://stackoverflow.com/a/19518233
$lastPosition = file_get_contents('last_position.txt');
$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);
$maxLines = 1000;
while ($maxLines > 0 && $columns = fgetcsv($fh)) {
$maxLines--;
file_put_contents('last_position.txt', ftell($fh));
// do your insert here
// $columns['colum-1'], $columns['colum-2']
}
So yeah.. there are quite some options. If you create an import script like this I would suggest you do it in an artisan command and make use of progress bars https://laravel.com/docs/5.5/artisan#writing-output. This will give you an indication how far your script is (in percentage)
Please or to participate in this conversation.