imJohnBon's avatar

Seeding with real data?

I'm a little bit confused about how I should go about seeding my database with "real" data, i.e., data that will actually be used in a production environment. For instance, let's say whenever the site is created I also want to create 5 admin accounts.

You would think seeds are the right answer, but I've seen a lot of people say that seeds should be exclusively used for fake test data. The alternative seems to be putting "real" data into the database through migrations.

But that doesn't seem to scale well. Let's say you need to change one of the emails of the admins you are creating? You can't really go back and just change that one migration. You would risk losing data that has changed over time. Otherwise you need to create an entire migration for that simple thing, which seems crazy.

0 likes
10 replies
Michael__'s avatar

We use them for production data as well, and it works perfectly fine for us. I do not see any reason why you should not do that.

2 likes
Michael__'s avatar

@imJohnBon: We use seeds for production data. For example to create the default roles and permissions. Seems like it is exactly what it is meant for. And tbh, who cares if some people say it is not for this as long as it does exactly what you need?
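
A minimal sketch of what a re-runnable seeder for default roles might look like. The `roles` table, its `name` and `description` columns, and the role names are assumptions made for illustration; `updateOrInsert` is the query builder method that updates a matching row or inserts one if none exists.

```php
<?php

namespace Database\Seeders;

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class RoleSeeder extends Seeder
{
    public function run(): void
    {
        // Hypothetical default roles, keyed by a unique name.
        $roles = [
            'admin'  => 'Full access to the application',
            'editor' => 'Can create and edit content',
        ];

        foreach ($roles as $name => $description) {
            // Match on the name; update the description if the row
            // already exists, insert the row otherwise.
            DB::table('roles')->updateOrInsert(
                ['name' => $name],
                ['description' => $description]
            );
        }
    }
}
```

Because each row is matched on its name, running the seeder a second time should not create duplicates, and changing a description later only means editing the array and seeding again.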

4 likes
jekinney's avatar

@Gewora perfect answer!

If I am transferring data from an existing db I run a script. But for new sites I too utilize migrations. The only issue I can see, and have had happen once, is pushing updates. Make doubly sure the migrations don't try to run again, otherwise you may end up with duplicate data. I now run an exists check in final production migrations.
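
An exists check in a data migration could look roughly like this. It is only a sketch: the `settings` table, its `name` and `value` columns, and the row being inserted are assumptions, not something from an actual project.

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        // Guard against double data: only insert when the row is missing.
        $exists = DB::table('settings')
            ->where('name', 'site_name')
            ->exists();

        if (! $exists) {
            DB::table('settings')->insert([
                'name'  => 'site_name',   // hypothetical setting
                'value' => 'My Site',
            ]);
        }
    }

    public function down(): void
    {
        // Remove only the row this migration is responsible for.
        DB::table('settings')->where('name', 'site_name')->delete();
    }
};
```

Laravel already records which migrations have run in its `migrations` table, so the check is a second line of defence for cases where that record and the actual data have drifted apart.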

michaeldyrynda's avatar

For a project I was working on at my last job, migrating from a custom solution to one built on Laravel, I wrote seeds for all the old data to get it into a new database structure, with new field names.

It was a task and a half (lots of weird stuff in the old system), but once the seeds were configured, it meant that I could run the migration scripts (to build the seeds from the old data) and use the two systems alongside each other and always have up to date data for the final switch flip when we moved to the new system.

You can always have two sets of seeds - one for development and one for prod - but at the end of the day, database seeding is to get the data you need in your database into your application ready to use, no matter when or where it's deployed.
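
As a rough sketch, the two sets can be split inside `DatabaseSeeder` by checking the environment. `ReferenceDataSeeder` and `DemoDataSeeder` are hypothetical class names, and the environment names assume the usual `APP_ENV` values.

```php
<?php

namespace Database\Seeders;

use Illuminate\Database\Seeder;

class DatabaseSeeder extends Seeder
{
    public function run(): void
    {
        // Data the application needs everywhere, production included.
        $this->call([
            ReferenceDataSeeder::class, // hypothetical seeder
        ]);

        // Fake data only where it is safe to have it.
        if (app()->environment('local', 'testing')) {
            $this->call([
                DemoDataSeeder::class, // hypothetical seeder
            ]);
        }
    }
}
```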

4 likes
decafmainline's avatar

Came here to ask the same question as OP today.

I'm creating a MariaDB database for my web application that uses LOTS of existing data from an open dataset. I'm recording the data in TOML files and utilizing Seeders to get what I call "structural data" into the database. That is, data that won't be generated by my users through the web app. This data needs to exist prior to the site going live for it to be useful.

This has the added benefit of allowing me to migrate and rollback without fear of losing the data. I can just seed it again anytime I need to. I will verify/test the data in staging and then make php artisan db:seed part of my deployment script.

These answers helped validate that I was on the right track.

Thanks!

J5Dev's avatar

Just wanted to drop onto this one, and offer a little more to the discussion.

I have used migrations for this very purpose as well for some time, and have spoken to many people who, for some reason, think it's the devil's work.

Firstly, it isn't. As some have said, it's a very valid way to get data into the db as part of a rollout, be it first install or update.

Secondly, the environment is irrelevant. I have seeders across my 3 environments: dev, staging, and production. Dev seeders create the usual nonsensical dummy data, which is fine for that level. My staging/UAT environment uses a data set generated from a nightly dump of the live db (with some processing to get it into files), to ensure a better testing experience. Live seeding is explained below...

One thing I have been doing for a while is using a system similar to how the framework tracks migrations.

Any seed data that I need for a production environment is added to files, which are read by the seeders and inserted accordingly. Upon doing so, they add the file information to a table (seeded_files), including last edited dates etc.

This means that should we need to change, add to, or remove anything from our seeded data, it can still be managed by the automated install process (We have db:seed run every time just like migrations).

Put simply, when a seed runs, it collects the files it needs (hosted outside our version control to ensure data is not in there... obviously not making that rookie error) and checks each file's last-edited date against the one stored in the seeded_files table. If it hasn't changed, it does nothing; if it has been updated, it clears out the data and adds/replaces it with the new set.

What you have is a nice way to maintain default, or internal running data, as part of your migration workflow, without compromising using seeders for their more accepted purpose.
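
For anyone wanting to try this, here is a minimal sketch of the idea rather than the real implementation. Everything named in it is an assumption: the `countries` table, the JSON file and its location, and a `seeded_files` table with `file` and `modified_at` (integer timestamp) columns.

```php
<?php

namespace Database\Seeders;

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;
use RuntimeException;

class CountrySeeder extends Seeder
{
    public function run(): void
    {
        $file = 'countries.json';                   // hypothetical data file
        $path = storage_path('seed-data/' . $file); // assumed location outside version control

        if (! is_file($path)) {
            throw new RuntimeException("Seed file not found: {$path}");
        }

        $modifiedAt = filemtime($path);

        // Last-edited timestamp recorded the previous time this file was seeded.
        $recorded = DB::table('seeded_files')
            ->where('file', $file)
            ->value('modified_at');

        if ($recorded !== null && (int) $recorded === $modifiedAt) {
            return; // File unchanged since the last seed: do nothing.
        }

        // Expects a JSON array of row objects matching the table's columns.
        $rows = json_decode(file_get_contents($path), true, 512, JSON_THROW_ON_ERROR);

        DB::transaction(function () use ($file, $modifiedAt, $rows) {
            // Clear out the old set and load the new one.
            DB::table('countries')->delete();
            DB::table('countries')->insert($rows);

            // Remember which version of the file is now in the database.
            DB::table('seeded_files')->updateOrInsert(
                ['file' => $file],
                ['modified_at' => $modifiedAt]
            );
        });
    }
}
```

Note that clearing the table like this will fail or orphan rows if other tables hold foreign keys to it. In that case, updating rows in place by a stable key (for example with `updateOrInsert`) is one alternative to delete-and-reinsert.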

2 likes
besrabasant's avatar

@J5Dev I was also running into the issues of seeding data in a live database. On reading your comment, your approach seems quite promising to me.

I just want to ask one question: **How do you handle re-seeding the database if the already seeded data has related transactional data in the database that was created by the application?**

...Because we are facing such a scenario.
