kiasaty's avatar

How to transfer or convert data when doing a database migration!

assume there is a "users" table with the following columns: id, name, email, age

If we change the "age" column to "birthday" in a migration file, where should we convert the data already inserted into the table?

This is a super simple example. In some migrations, more that one table is involved and transferring data is one of the steps. but if the migration files are to only change the database schema, where should be the data transferred or converted?

I'm looking for a best practice or a standard.

0 likes
20 replies
siangboon's avatar

migration file is just for create the schema... usually inserting data is at the seeder file, you need to prepare your own scripts....

but for your "complex" case, i think do it database side may be easier...

kiasaty's avatar

@siangboon In this scenario, it is not a fresh project. the database tables are already full of actual data. the question is when in an already running project, a structure in the database changes that effects the data, where should be the data converted or transferred?

PS: Although I have seen in most projects that database seeders are used to insert actual data into the database tables, I think seeders are to as the name suggests, seed dummy data into the tables. not actual data that is used in the production. this is another discussion. I don't know if I'm right.

newbie360's avatar

create dob column, convert age data to dob column via command or php code

change your project code for use the new column

seeder is good for some default data

if you have many data, export (backup) / import (restore) via command

tisuchi's avatar

@kiasaty If I am in your situation, I would like to do two things-

  • Change the column name in the migration files.
  • Go to the database, and change the column name manually. So that I just keep my records as it is.
newbie360's avatar

i guess age column is int type ?

and new dob column is date type ?

kiasaty's avatar

@tisuchi this is just a simple example to explain the problem.

by the way, as @newbie360 mentioned the "age" column type is integer and the "birthday" column type is timestamp. data should be converted before transferring. and again, this is just a super simple example to explain the situation.

changing the database manually has 2 problems: it is a hack and it doesn't work in complex migrations.

tisuchi's avatar

@kiasaty In that case, I suggest you create a new column i.e. birthday and then create a method that serves the following purpose-

  • Get each record from the table and then calculate age in date format based on the age column.
  • Next, insert the calculated age to birthday column.

Once done, simply delete your age column from the table and update your code accordingly.

kiasaty's avatar

@newbie360 this is what I do now.

I convert and transfer data in the migration files. but the question is: is this right? is processing data in migration files right? aren't migration files just for changing the database schema?

if we can restrict migration files to only change the schema and keep processing data out of it, it prevents problems in the future.

Snapey's avatar

Sounds like you already know the answers

Migrations are to change the schema; add columns and drop columns.

Migrations do nothing for modifying the data, for that you need to write some custom script either in php or in sql.

Migration files can contain any PHP script, so in your made up example, you could include code to convert age into DOB (if that were even possible!) and then a further section to drop the original column.

Putting the transformation into your migration file means that it is only executed once and is stored in your VCS

kiasaty's avatar

@tisuchi right. but the real question is: Where to put this method? where and how should it be triggered? in migration files? or there is a better place?

tisuchi's avatar

@kiasaty Honestly it's like a one time process. Once you have done, boom.

So, I suggest you create a new get route, (maybe /migrate-data) then write your desire code inside either in route / in a method inside a dedicated class (maybe MigrateDataController).

Once done, push your code to live and run your route http://domain.com/migrate-date . Keep in mind that once done all the migration, you have to remove your code.

That's it.

kiasaty's avatar

@snapey so putting "data modifying code" in migration files is OK?

because as you mentioned, I thought Migrations are to change the schema; add columns and drop columns.

The Problem That led me to this question:

In a migration, I fetched the data, changed the schema, and then inserted the modified data into the new table schema using eloquent models.

but later, the model changed, and suddenly the migration file which is created a few weeks ago failed. This got me thinking, "is modifying data in migration files OK? or there are other ways to do that?"

Options to prevent this in the future:

  1. do not modify data in the migration files. and only change the schema in migration files.
  2. do not make migration files dependent on other classes like models to modify data, and only use query builder.

what do you guys think?

I thought maybe the whole idea of modifying data in the migration files is stupid and there is a standard way to handle these situations. but it seems there isn't.

kiasaty's avatar

@tisuchi this doesn't seem right.

  1. I do the feature, and push it to the repository. I don't publish it. and I don't know when it's gonna be merged into the master branch.

  2. creating and committing a temporary route for each data modification doesn't seem right.

tisuchi's avatar

@kiasaty it seems you are changing your conditions in your answers. In your last answer, you bring another point. :)

newbie360's avatar

well, if this is one time process

  1. terminal run command or even phpMyAdmin add a new dob column

  2. use php code or command convert the age data update to the dob column

  3. for me, i will keep the age column at least 1 week

  4. change your project code to use the new dob column ,BUT still keep maintain the old age column in your project at least 1 week

  5. if without any problems in the pass week, delete age column and split backup 'schema_only.sql' and 'data_only.sql' via command

  6. in your migration file delete the age column and add the new dob column (Be careful the column order)

  7. now your migration file should be MATCH the 'schema_only.sql'

  8. since you have backed up *.sql, you can re-run php artisan migrate:fresh and use command import the 'data_only.sql'

  9. if all fine, you can delete 'schema_only.sql', because it is actually same as the migration file

kiasaty's avatar

@tisuchi yeah. the question is about "modifying data in migration files". I wrote a simple example to give the idea. maybe I should have provided more details. sorry :)

Snapey's avatar

I dont see a problem translating table data in a migration file, but relying on the current state of other tables I can see would be a problem.

I have tackled things like this in custom artisan commands, but the problem with these is knowing when to run, and how such a command might be run within a deployment pipeline

Either is still preferable to running custom sql scripts in db tools as there is no control over these

siangboon's avatar

create the columns as needed, write a script to update the columns from a files (cvs, txt or whatever) according to the id and just execute it anywhere you like even in the web route closure also can since you only need to run one or few.... even run it with sql script in db also can...

jcandan's avatar

I feel the OP's pain. I feel the same about working from convention when possible. So, looking to the community for what we assume would have to be an already solved problem. Not much luck here either.

I found this post from the perspective of Ruby on Rails super helpful. The cons listed are very pertinent. https://www.ombulabs.com/blog/rails/data-migrations/three-useful-data-migrations-patterns-in-rails.html

Have not tried, but also found these packages:

aakarim's avatar

@jcandan I have this issue a lot and I've found a nice way to resolve this without the issues mentioned in the Rails application.

I make a separate Artisan Command that I call from within the migration:

class CreateOrganisationsDomainsTable extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        Schema::create('organisations_domains', function (Blueprint $table) {
            $table->unsignedBigInteger('organisation_id');
            $table->unsignedBigInteger('domain_id');
            $table->timestamps();
            $table->foreign('organisation_id')->references('id')->on('organisations')
                ->onDelete('cascade');
            $table->foreign('domain_id')->references('id')->on('org_email_domains')
                ->onDelete('cascade');
        });


        // data migration
        Artisan::call('migrate:batch:email-domains-multi-org');

        Schema::table('org_email_domains', function (Blueprint $table) {
            $table->dropForeign('org_email_domains_organisation_id_foreign');
            $table->dropColumn('organisation_id');
        });
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        Schema::table('org_email_domains', function (Blueprint $table) {
            $table->unsignedBigInteger('organisation_id')->nullable();
            $table->foreign('organisation_id')->references('id')->on('organisations')
                ->onDelete('cascade');
        });

        // data migration
        Artisan::call('migrate:batch:email-domains-multi-org', ['--down' => true]);

        // run the migration command to move the data back and run the previous migration
        Schema::dropIfExists('organisations_domains');
    }
}

Then the command performs all the data operations

class MigrateEmailDomainsToMultiOrg extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'migrate:batch:email-domains-multi-org {--down=false}';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Migrate domains to a multi-org format';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return int
     */
    public function handle()
    {
        if (!is_null($this->option('down'))) {
            // run the migration command to move the data back
            $this->info('running down');

            DB::transaction(function () {
                OrgEmailDomain::chunk(2000, function ($domains) {
                    $batchUpdate = [];
                    $this->info('starting from ' . $domains->first()->id . ' to ' . $domains->last()->id);
                    foreach ($domains as $domain) {
                        $org = $domain->organisations()->first();
                        $batchUpdate[] = [
                            'id' => $domain->id,
                            'organisation_id' => $org->id,
                        ];
                    }
                    // update
                    DB::table('org_email_domains')->update($batchUpdate);
                });
            });
            return 0;
        }

        $this->info('running up');
        // start a transaction
        DB::transaction(function () {
            OrgEmailDomain::chunk(2000, function ($domains) {
                $this->info('starting from ' . $domains->first()->id . ' to ' . $domains->last()->id);
                $batchUpdate = [];
                foreach ($domains as $domain) {
                    $batchUpdate[] = [
                        'organisation_id' => $domain->organisation_id,
                        'domain_id' => $domain->id,
                        'created_at' => $domain->created_at,
                        'updated_at' => $domain->updated_at,
                    ];
                }
                // insert
                DB::table('organisations_domains')->insert($batchUpdate);
            });
        });
        return 0;
    }
}

This way if the class disappears, like the article mentions, you don't need to modify the migration code if the model disappears, just the command, which makes it much clearer as to what's happening and you can leave a helpful note in there that shows that no data migration is required.

It also means that you can run the command separately if you want to split up the migrations to only deal with one table at a time and you run the command between them. It's a manual step in that case, but at least it's in your repository so you can see what's happening.

Please or to participate in this conversation.