Nov 27, 2014

Level 2

How would you go about populating a table with 2m+ rows on deployment? (Best Practices)

I am trying to find if there is any better way to populate tables that have lots of rows of data then simply importing a CSV manually.

How would you go about doing this? Note that this is only done on deployment which only happens once for each instance of the application, which happens maybe once every 6 months if not more rarely.

I don't find it problematic to manually import the data into the relevant tables. I am simply looking to see what is considered to be the best practice regarding this kind of tasks in deployment.

RemiC

11 years ago

Level 8

You can use a seeder class for this.

1 like

andriusbartulis

11 years ago

Level 2

And is there a Laravel way of seeding with a CSV or SQL file?

Or would I have to fetch the data file myself, convert it to an array of rows and then iterate over it to create the records one by one manually?

arabsight

11 years ago

Level 10

there's a package for that: http://csv.thephpleague.com/

3 likes

pmall

11 years ago

Level 56

Can't you make a dump of this huge table and import it ?

1 like

bashy

11 years ago

Level 65

I would do a command on the server to import it

mysql -u DB_USER -pDB_PASS DB_NAME < file_to_import.sql

3 likes

andriusbartulis

11 years ago

Level 2

@pmall Of course I can. I don't have a problem with importing the data into the table. I am simply asking what would be a good practice, or in other words, how would you do this and why.

Would you automate the task or is it not worth it for so many rows (by the way, at the moment I only need to import a 40k row table, not a 2m one, which I might have to import too)? If you would automate it, how? By creating a seeder class for the table and running it with db:seed command, etc etc.? Or some other way? This is the kind of questions I have.

Currently I am leaning towards doing what RemiC said: to simply make a seeder class for this table. Then I could use the CSV package by phpleague as suggested by arabsight. This would make things uniform as all data would be seeded using one command which is always nice and neat. I could use this for the 40k row table as its only a few megs of data so would only take a second or two.

However I am still undecided if this approach is also practical for bigger data files like the one with 2m rows (about 500mb). I will of course try it and see how long it takes in relation to how long it takes using sql command line directly to import the file. After all 500mb is not that much for modern hardware or software, so it might turn out to be just as fast to use a seeder class as a direct sql command.

But still, its always nice to hear what other people have to say from their experience and consider different approaches. Hence the question.

Thanks for your input! :)

1 like

pmall

11 years ago

Level 56

Yes but it may be slow to insert so many rows one by one. Better to have a dump and import it. I think you are overthiking this.

1 like

bigbite

11 years ago

Best Answer

Level 1

I wouldn't do it through PHP at all, to remove the slow down that will occur with PHP holding all the data and processing it. The best choice here would be to delegate it to actual MySQL (or whatever SQL service you use).

1 like

bashy

11 years ago

Level 65

With PHP there will be execution and upload limits, use mysql or whatever software you use to import it directly.

RemiC

11 years ago

Level 8

There will be no time limit if using PHP CLI , that's why I suggested seed.

And about performance, is it really a concern if it's a one time install ?

2 likes

bashy

11 years ago

Level 65

Good point about PHP on CLI, I would always use mysql directly for a large file personally.

1 like

andriusbartulis

11 years ago

Level 2

So basically I decided to try PHP CLI and see how fast or slow this will actually be. I simply set up a seeder class file for my table. It uses thephpleague's CSV library as suggested by arabsight to open and process the csv file. Then an SQL query is built from the csv, and executed. All rows are added in with one single query.

I tried this with 40000 rows and it only took about 3 seconds, which is no problem on deployment.

I tried it with 2.4 million rows and it took 17 minutes. Longer then I would want to wait.

Note that I was using a homestead virtual machine to do this test, and it had very little resources allocated to it, meaning that in real world scenario it would take less time.

I imported the same csv using mysql LOAD DATA in the terminal, and it only took around 1 minute.

So based on this all, in my opinion it's fine to use PHP CLI seeders to import anything up to maybe 100k - 500k rows. The advantage of this is that all the seeding happens with one command, which is always nice. However, anything over that slows the process down too much (as we saw with 2.4m rows) and it's probably more practical to just load data using mysql command line which is much faster.

Once again thanks all for your input.

1 like

bashy

11 years ago

Level 65

Would also be faster with an SSD but of course, running it all through a seeder and the extra processes involved isn't worth it.

Happy to help :)

1 like

p4bloch

11 years ago

Level 1

I would avoid using the actual production data in development in most cases. If this is your case as well, you can generate a fake database once, with data as similar as possible to the real one, and simply dump from it whenever you need to start from fresh

That's how I'd go about it :)

Please or to participate in this conversation.