
philbates:

In a multi-tenant application, how should I handle a failed migration that occurs when migrating all tenant databases during an automated deployment?

TL;DR – one application instance connecting to hundreds of databases (one per tenant). I need to apply database migrations to all of them automatically during deploy; if just one fails, how do I gracefully ensure that every tenant application continues to work with no database/application mismatch?

Up until now we've had ~20 clients. Each client has one instance of the application code and their own MySQL database on our production server. Deployment is automated using Capistrano. The majority of the application is legacy CodeIgniter code, but we have implemented a Laravel wrapper to make our lives easier: the request is handled by Laravel, and if a Laravel route isn't found, it sends the request off to the legacy/ subdirectory where it is then handled by CodeIgniter. We also use Laravel migrations to handle database changes. Each client has an identical database structure.

Now, we've just signed a contract with a new client who wants to have 200-400 instances of our application that they will franchise out. Clearly our existing setup of one application instance per client is no longer feasible with such a large number of tenants, so I've set about making the application a single-instance, multi-database application.

After a couple of days of research and work, I came up with this setup: all tenant details (and database credentials) are stored in a master database. When a request comes in, we look up the tenant by the request host name, get their database credentials, and then connect to their database. This may not be the most efficient way, but it works and I'm fairly happy with it. The problem I have is with database migrations during deployment.

I created custom artisan migrate/migrate:rollback/db:seed commands that query the master database, loop through all tenants, and execute the underlying command for every client database. This works well when there are no failures, but I'm having a really hard time deciding what to do if migrating one tenant's database fails during an automated deployment.
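For context, a minimal sketch of what such a tenant-looping command could look like (the master connection name, tenants table, and column names here are assumptions, not the actual code):

```php
// Hypothetical sketch of a tenant-aware migrate command; the 'master'
// connection and the tenants table/columns are assumptions.
class TenantMigrate extends Command
{
    protected $signature = 'tenant:migrate';

    public function handle()
    {
        $tenants = DB::connection('master')->table('tenants')->get();

        foreach ($tenants as $tenant) {
            // Re-point the shared 'tenant' connection at this tenant's database.
            config(['database.connections.tenant.database' => $tenant->database]);
            DB::purge('tenant');

            // Run the stock migrate command against that connection; if it
            // throws, the loop stops and later tenants are left untouched.
            $this->call('migrate', ['--database' => 'tenant', '--force' => true]);
        }
    }
}
```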

When there's a failure, there will be a limbo state – some tenant databases have been migrated, while others haven't. Clearly all databases that were migrated up to the point of failure should be rolled back, because the new (failed) release (requiring the database changes) was never actually made live. But how do I make that happen?

The important thing to note is that I can't rely on Laravel's built-in migrate:rollback functionality during deployment to handle rolling back. This is because a new tenant's production database will have all available migrations applied in one batch as part of a custom "add a new tenant" script, whereas other clients will have had the same set of migrations applied in the same order, but in different batches. This means that just executing migrate:rollback on all tenant databases will produce very different results, which rules out using this command to roll back all tenants in production.

I posted the same question on Reddit last week, and frustratingly every suggested solution was different, and none of them seemed perfect. The best, most practical suggestion IMO was to diff the migration files from the previous release against the release being processed, and store a log in the new release directory listing the migrations that appear in the new release but not the previous one. This log acts as the source of truth for which migrations were applied per release, so that I know exactly which migrations, and in what order, I need to roll back if the deploy fails (or if, shortly after a successful deploy, I realise I need to roll back). After creating this log, I loop through all tenants and migrate their databases (I have a custom artisan command to do this) – if just one migration fails, I roll back the migrations stored in the per-release migrations log for the tenants that have already been migrated (using another custom artisan command). Although I can see this working (and I can write tests to prove it), this solution just doesn't feel "nice" to me, and I'm hesitant to spend the time implementing it. I can't help but feel there must be a better, more "correct" way that someone out there has worked out.
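Mechanically, that per-release log is just a set difference over migration filenames. A minimal sketch, with hard-coded filename lists standing in for a scandir() of each release's database/migrations directory (the filenames themselves are made up for illustration):

```php
<?php
// Hypothetical sketch: build the per-release migration log by diffing the
// migration filenames of the previous release against the new one. The
// hard-coded arrays stand in for scandir() on each release directory.
$previous = [
    '2018_01_01_000000_create_users_table.php',
];
$current = [
    '2018_01_01_000000_create_users_table.php',
    '2018_02_01_000000_add_status_to_users_table.php',
];

// Migrations that appear only in the new release, in filename
// (i.e. chronological) order - on failure, roll these back newest-first.
$newMigrations = array_values(array_diff($current, $previous));
sort($newMigrations);

file_put_contents('migrations.log', implode(PHP_EOL, $newMigrations));
echo implode(PHP_EOL, $newMigrations), PHP_EOL;
```

Because Laravel migration filenames are timestamp-prefixed, a plain sort gives chronological order for free.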

Other proposed solutions included temporarily running two versions of the code base, and when a migration for a tenant succeeds, switching that tenant only to the new code base that relies on the newly added migrations. This seems like a sensible solution, but it involves modifying the default Capistrano behaviour, messing with Apache virtual hosts and symlinks etc. Unless this was done "right" (and to be honest I don't know what the "right" way to do this is), I can see it ending up just as hacky as the first solution above, and as I only know some very basic Ruby (enough to get Capistrano to do the very simple tasks I need it to do), I don't feel comfortable with this suggestion at this point. However, if anyone has ideas or tips on how I could realistically accomplish this, that would be really helpful.

In summary, after many hours of research, I haven't yet come across a solution that would give me peace of mind during a deployment. Then I remembered this forum and wondered if you might have some new ideas. Multi-tenancy is a common architecture, so surely some of you must have come across this problem before? How did you handle migrations in a multi-tenant application?

How can I safely and automatically apply my application migrations to hundreds of databases during a deployment, and gracefully handle the rare cases where a migration fails?

Thanks!

ohffs:

Just out of curiosity - what is the reason that having an instance per tenant isn't suitable?

philbates:

Two major reasons really:

  1. Any code change, no matter how small, needs to be pushed 200 times instead of once. Each of these 200 Capistrano deploys will perform a composer install, php artisan migrate etc. This is a lot of overhead. Even ignoring the overhead, I'm not sure how I'd automate this - if you have any ideas I'd love to hear them.

  2. It really complicates the process of adding a new tenant:

Single application / Multiple database:

  • Create database for tenant
  • Create database user for tenant
  • Add row to master tenant database matching tenant's host name to the database credentials in the previous two steps
  • php artisan tenant:migrate <tenant>

One application per tenant:

  • Create database for tenant
  • Create database user for tenant
  • Create virtual host (I did experiment with using mod_vhost_alias to just have a single virtual host instead of one per tenant, but it wouldn't work due to us also needing an Alias directive in the virtual host)
  • Create new directory on the server at /var/www/<tenant>
  • Add the tenant to the Capistrano repository, specifying the location of the new tenant on the server. Commit Capistrano repository and push to GitLab.
  • Before executing the first deploy, populate shared/.env on the server (the file .env.sample is saved in the application repository, which hasn't yet been cloned down by Capistrano – how do I even do this? Duplicate the file in an Ansible template?).
  • composer install --no-dev
  • php artisan migrate

If you can think of a way that I can keep the one application per tenant architecture, but still keep it easy to push updates and add new tenants, then I'd love to hear your ideas!

ohffs:

I'm from a sysadmin background with a lot of automation tools (puppet and the like) so that doesn't sound too tricky to me - compared with the risks of 'one big db to rule them all' anyway ;-) Have you looked into things like puppet, ansible etc? @fideloper has a series of videos that sound like they might be worth investigating for your set-up (and I think the company he works for went down the install+db-per-customer route for their SaaS, if I'm not getting things mixed up):

https://deploy.serversforhackers.com/

https://serversforhackers.com/video/deploying-with-fabric

philbates:

I've done a tiny amount with Ansible and it seems pretty straightforward to use, but it could take me a while to work out how to get it to do everything I need. I'm currently halfway through the S.F.H. Deploy! series, but so far there's no mention of handling multi-tenancy.

So given your sysadmin background, do you think it's preferable to have 200 separate instances of the application, assuming I could use Ansible to automate the set-up of each client? And when it comes to deploying updates - would it be more correct to modify Capistrano/Fabric etc to loop through the tenants and do many small deploys in each tenant directory on the server within a single actual deploy, or would another tool be more suitable (Ansible? Something else?) to loop through all tenants on the server and trigger an individual Capistrano/Fabric deploy for each client? What if there were 500 tenants? 1000? Would your opinion change?

If you were to implement a multi-tenant solution for a Laravel application right now from scratch, what tools would you use to handle deployment? Any info at all would be incredibly useful :)

Hopefully that makes sense, I really appreciate your input.

kfirba:

@philbates35 hey.

I actually have a solution for you but it's different from any approach you've mentioned here.

Why not have 1 database only for all of your tenants? As I understand it, your tenants have the same DB structure and the only thing that varies is the data stored.

To solve that, each DB record should have a tenant_id column to specify which tenant that piece of data belongs to. Make it a foreign key - that will also index the column.

Of course you will want to set up a cluster of databases. Go with a single master database and a few slaves to begin with. Have those DBs sit on different machines connected over a private subnet.

Also, please check out Percona Server, which is a drop-in replacement for MySQL and has much better performance and stability. Percona also maintains some amazing toolkits.

The only overhead now is to always query only the right subset of data. This can actually be done easily. Add a global scope that adds a where clause for the current tenant id. When you save data, use a trait that listens to the saving event and sets the tenant_id just before the row is inserted into the DB.
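As a rough sketch of that scoping approach (the class names and the app('tenant') resolver are assumptions – in a real app the current tenant would come from wherever the host-name lookup stores it):

```php
// Hypothetical sketch of the global-scope + saving-event approach.
class TenantScope implements Scope
{
    public function apply(Builder $builder, Model $model)
    {
        // Constrain every query on the model to the current tenant.
        $builder->where('tenant_id', app('tenant')->id);
    }
}

trait BelongsToTenant
{
    public static function bootBelongsToTenant()
    {
        static::addGlobalScope(new TenantScope);

        // Stamp the current tenant on every row just before it is written.
        static::saving(function ($model) {
            $model->tenant_id = $model->tenant_id ?? app('tenant')->id;
        });
    }
}
```

Any Eloquent model that uses the trait would then be tenant-scoped automatically; the hard-coded legacy CodeIgniter queries, of course, get no such help.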

By doing this you just don't have to worry about anything.

When performance degrades and the servers are bloated, just upgrade the current master and slaves, and maybe add another slave.

Please note, you should have a fair amount of RAM available on your slave servers and increase the buffer pool. On your master server you don't need much RAM at all.

ohffs:

It's hard to give a 100% answer as it'll depend on exactly your set-up, customers etc. My preference is for making changes as small and isolated as you can so doing (via automation) 500 small individual updates seems preferable to one big change that might mess up all 500 at once :-) It gives you a bit of flexibility too - you could choose to update all the tenants in batches (by name, location etc) and see how it went. Maybe some tenants don't want to move to an updated version so they can stick with the current db/code etc while the others get the new stuff.

Once you learn a little automation (whether puppet, ansible, whatever) you'll never go back. I'm mostly a puppet user so I'm used to barely thinking about setting up vhosts etc any more. Have a look at the vhost modules, for instance, and see how much work and thought disappears from your life (I'm sure there are equivalents for ansible etc). I don't think I'd mind if there were a lot of tenants - assuming we're not talking ludicrous numbers or a very long deploy time - once it's automated it's not 'on your time' any more - it just happens while you go get a coffee and do something more productive (like swanning about on here, ahem....) The automatic deploy will probably be faster and more reliable than I'd be anyway.

As for what I'd use from scratch - it'd depend on how much I was charging them ;-) I'd be pretty tempted with the price of VPS's these days just to do a whole VM per tenant and save a lot of heartache and worry - which in turn can be part of the automation with fabric, puppet/ansible/whatever.

philbates:

Thanks again for your response @ohffs

Let's say I go with one application per tenant - one thing that's still unclear is the link between Ansible (the only configuration management tool that I could feasibly use) and the deployment tool (like I say, we're using Capistrano now).

Let's say my Ansible repository contains a list of all 200+ tenant hosts and the details of the server they reside on. When I need to add a new tenant, I add the relevant details to Ansible and run the playbook to initialise the client. I'm fairly happy with this: creating a new database, MySQL user, /var/www/<tenant> etc.

However, when it comes to using a deployment tool, and rolling out deployments across all 200+ applications, how do I make the deployment tool aware of which tenants we have, and which servers they reside on?

Should there be an Ansible deployment task that loops through the tenants config file (server IP, the directory on the server to deploy to etc) and calls a local Capistrano deploy task with the provided config? Capistrano then handles deployment using the config Ansible gave it.

Or, instead, should there be a common place where tenant config is stored (let's say a simple MySQL database) accessible by both Ansible and Capistrano? In this approach, Ansible queries the database to get all tenants and then provisions the server for every tenant in the result of the query (meaning the process of adding a new client would now involve adding a row to the database instead of updating the Ansible code repository). Similarly, Capistrano would query the same database before a deploy and, one at a time, deploy individually to all tenants found, using the config returned from the query. The benefit I see to this approach is that I can query for subsets of tenants in my Capistrano deployment task to roll out a deploy in stages, like you mentioned above.
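A tiny sketch of what the driver side of that second approach might look like (the credentials, table, and column names are all assumptions, and the `cap` invocation is illustrative rather than real Capistrano configuration):

```php
// Hypothetical sketch: one shared tenants table, read at deploy time,
// driving one small Capistrano deploy per tenant.
$pdo = new PDO('mysql:host=localhost;dbname=master', 'deploy', 'secret');

foreach ($pdo->query('SELECT host, deploy_path FROM tenants') as $tenant) {
    // One isolated deploy per tenant, so a failure only affects that tenant
    // and the loop can be restricted to a subset for staged rollouts.
    passthru(sprintf(
        'cap production deploy HOST=%s DEPLOY_TO=%s',
        escapeshellarg($tenant['host']),
        escapeshellarg($tenant['deploy_path'])
    ));
}
```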

PS - I'm aware that Ansible is considered a configuration management tool, not a deployment tool, which is why I'm trying to keep the distinction. Given that something like Capistrano handles zero-downtime deploys out of the box and is specifically designed for the job, I'm happy to keep using Ansible alongside a deployment tool, unless you think this isn't a good idea - I'd love to hear your thoughts on this.

philbates:

@kfirba I'm afraid that solution isn't really feasible - the majority of the application is very much legacy CodeIgniter, with thousands of hard-coded queries and hard-to-maintain spaghetti with no tests. It's a case of "it works - don't touch it" until the time comes to rewrite. Having to re-scope every query would just be too big a task, and we just don't have the time. More generally, there are also definite plus points to each tenant having their own database.

ohffs:

I'm not very familiar with Capistrano outside of a vague 'it's that ruby thing for deploying stuff' I'm afraid - so I'm not sure what it can and can't do. But I think you're on the right path with using a shared db of tenants and letting ansible do its thing, then capistrano to deal with the deployment/refreshes.

I'm already anticipating reading your blogpost of how it went! :-)

philbates:

@ohffs That's good to hear, I'll start working on a way to keep all tenant server configuration in a master database, and querying it in both Ansible and the deployment tool.

Out of curiosity, what deployment tool would you use for this scenario? Would you still use the shared db of tenant server config, or would you approach it completely differently? Let's say there are 200 application instances, one per tenant, all on the same server to keep things simple. Given you don't use Capistrano and you seem to know what you're talking about, I'm wondering if I'm missing a trick and whether you would recommend an alternative deployment process entirely. :)
