In a multi-tenant application, how should I handle a failed migration that occurs when migrating all tenant databases during an automated deployment?
TL;DR – one application instance, connecting to hundreds of databases (one per tenant). I need to apply database migrations to all of them automatically during deployment. If just one fails, how do I gracefully ensure that every tenant application continues to work with no database/application mismatch?
Up until now we have had ~20 clients. Each client has one instance of the application code, and their own MySQL database on our production server. Deployment is automated using Capistrano. The majority of the application is legacy CodeIgniter code, but we have implemented a Laravel wrapper to make our lives easier, meaning the request is handled by Laravel, and if a Laravel route isn't found, it sends the request off to the legacy/ subdirectory where it is then handled by CodeIgniter. We also use Laravel migrations to handle database changes. Each client has an identical database structure.
Now, we've just signed a contract with a new client who wants 200-400 instances of our application to franchise out. Clearly our existing setup of one application instance per client is no longer feasible with such a large number of tenants, so I've set about making the application a single-instance, multi-database application.
After a couple of days' research and work, I came up with this setup: all tenant details (and database credentials) are stored in a master database. When a request comes in, we look up the tenant by the request host name, get their database credentials, and then connect to their database. This may not be the most efficient approach, but it works and I'm fairly happy with it. The problem I have is to do with database migrations during deployment.
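For context, the lookup-then-connect step can be sketched roughly like this (plain PDO, with an in-memory SQLite master database standing in for MySQL; the `tenants` table and its column names are assumptions for illustration, not my actual schema):

```php
<?php
// Hypothetical master-database lookup: resolve a tenant's DB credentials
// from the request host name. SQLite stands in for MySQL here so the
// sketch is self-contained.
$master = new PDO('sqlite::memory:');
$master->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$master->exec('CREATE TABLE tenants (host TEXT PRIMARY KEY, db_name TEXT, db_user TEXT, db_pass TEXT)');
$master->exec("INSERT INTO tenants VALUES ('acme.example.com', 'tenant_acme', 'acme_user', 'secret')");

function resolveTenant(PDO $master, string $host): ?array
{
    $stmt = $master->prepare('SELECT db_name, db_user, db_pass FROM tenants WHERE host = ?');
    $stmt->execute([$host]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

$tenant = resolveTenant($master, 'acme.example.com');
// From here you would open the per-tenant connection, e.g.:
// new PDO("mysql:host=localhost;dbname={$tenant['db_name']}", $tenant['db_user'], $tenant['db_pass']);
```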
I created custom artisan migrate/migrate:rollback/db:seed commands that query the master database, loop through all tenants, and execute the underlying command against every client database. This works well when there are no failures, but I'm having a really hard time deciding what to do if migrating one tenant's database fails during an automated deployment.
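Stripped of the framework, the loop reduces to something like this (a framework-free sketch; `$migrateOne` is a hypothetical callable standing in for the real per-tenant `artisan migrate` invocation):

```php
<?php
// Migrate every tenant in order; stop at the first failure and report
// which tenants had already been migrated — those are the ones that
// would need rolling back. $migrateOne stands in for the underlying
// `artisan migrate --database=...` call.
function migrateAllTenants(array $tenants, callable $migrateOne): array
{
    $migrated = [];
    foreach ($tenants as $tenant) {
        try {
            $migrateOne($tenant);
            $migrated[] = $tenant;
        } catch (Throwable $e) {
            return ['ok' => false, 'failed' => $tenant, 'migrated' => $migrated];
        }
    }
    return ['ok' => true, 'failed' => null, 'migrated' => $migrated];
}
```

The return value makes the "limbo state" explicit: after a failure you know exactly which tenants are ahead of the others.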
When there's a failure, there will be a limbo state – some tenant databases will have been migrated, while others won't. Clearly all databases migrated up to the point of failure should be rolled back, because the new (failed) release (requiring the database changes) was never actually made live. But how do I make that happen?
The important thing to note is that I can't rely on Laravel's built-in migrate:rollback functionality to handle rolling back during deployment. This is because a new tenant's production database will have all available migrations applied in one batch as part of a custom "add a new tenant" script, whereas existing clients will have had the same set of migrations applied in the same order, but across different batches. This means that running migrate:rollback on all tenant databases will produce very different results per tenant, which rules out using this command to roll back all tenants in production.
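To make the batch problem concrete (hypothetical migration names; this mirrors how Laravel's `migrations` table groups rows by batch number, and how `migrate:rollback` reverts only the highest batch):

```php
<?php
// A tenant's applied migrations, grouped by batch number,
// as recorded in Laravel's `migrations` table.
function lastBatch(array $migrationsByBatch): array
{
    return $migrationsByBatch[max(array_keys($migrationsByBatch))];
}

$newTenant = [1 => ['m1', 'm2', 'm3']];        // everything applied in one batch at tenant creation
$oldTenant = [1 => ['m1', 'm2'], 2 => ['m3']]; // m3 applied in a later deploy

lastBatch($newTenant); // ['m1', 'm2', 'm3'] — rollback would revert everything!
lastBatch($oldTenant); // ['m3'] — rollback reverts only the latest deploy
```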
I posted the same question on Reddit last week, and frustratingly every suggested solution was different, and none of them seemed perfect. The best, most practical suggestion IMO was to diff the migration files between the previous release and the release being deployed, and store a log in the new release directory listing the migrations that appear in the new release but not in the previous one. This log acts as the source of truth for which migrations were introduced per release, so I know exactly which migrations, and in what order, to roll back if the deploy fails (or if, shortly after a successful deploy, I realise I need to roll back). After creating this log, I loop through all tenants and migrate their databases (I have a custom artisan command to do this); if just one migration fails, I roll back the migrations listed in the per-release log on the clients that have already been migrated (using another custom artisan command). Although I can see this working (and I can write tests to prove it), this solution just doesn't feel "nice" to me, and I'm hesitant to spend the time implementing it. I can't help but feel there must be a better, more "correct" way that someone out there has worked out.
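The diff-and-log approach can be sketched in a few lines (plain PHP, no Laravel; the migration names and the two callables are hypothetical stand-ins for the real release directories and artisan commands):

```php
<?php
// 1) Diff the migration file lists between the previous and new release
//    to build the per-release log: "migrations introduced by this release".
function newMigrations(array $previousRelease, array $newRelease): array
{
    return array_values(array_diff($newRelease, $previousRelease));
}

// 2) Migrate all tenants; if any tenant fails, roll back exactly the
//    logged migrations (in reverse order) on every tenant migrated so far.
//    $migrateOne and $rollbackOne stand in for the custom artisan commands.
function deployMigrations(array $tenants, array $log, callable $migrateOne, callable $rollbackOne): bool
{
    $done = [];
    foreach ($tenants as $tenant) {
        try {
            $migrateOne($tenant, $log);
            $done[] = $tenant;
        } catch (Throwable $e) {
            foreach ($done as $t) {
                foreach (array_reverse($log) as $migration) {
                    $rollbackOne($t, $migration);
                }
            }
            return false;
        }
    }
    return true;
}
```

Because the log is derived from the release contents rather than from per-tenant batch numbers, every tenant gets rolled back by the same set of migrations regardless of how their batches were originally grouped.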
Other proposed solutions included temporarily running two versions of the code base, and when a tenant's migration succeeds, switching that tenant only to the new code base that relies on the newly applied migrations. This seems like a sensible solution, but it involves modifying the default Capistrano behaviour and messing with Apache virtual hosts, symlinks, etc. Unless this was done "right" (and to be honest I don't know what the "right" way is), I can see this solution ending up just as hacky as the first one I mentioned above, and as I only know some very basic Ruby (enough to get Capistrano to do the very simple tasks I need it to do), I don't feel comfortable with this suggestion at this point. However, if anyone has ideas or tips on how I could realistically accomplish it, that would be really helpful.
In summary, after many hours of research, I haven't yet come across a solution that gives me peace of mind during a deployment, so I wondered whether anyone here might have some new ideas. Multi-tenancy is a common architecture, so surely some of you must have come across this problem before? How did you handle migrations in a multi-tenant application?
How can I safely and automatically apply my application's migrations to hundreds of databases during a deployment, and gracefully handle the rare cases where a migration fails?
Thanks!