In a multi-tenant application, how should I handle a failed migration that occurs when migrating all tenant databases during an automated deployment?
TL;DR – one application instance, connecting to hundreds of databases (one per tenant). I need to apply database migrations to all of them automatically during deployment. If just one fails, how do I gracefully ensure that every tenant application continues to work with no database/application mismatch?
Up until now we have had ~20 clients. Each client has one instance of the application code, and their own MySQL database on our production server. Deployment is automated using Capistrano. The majority of the application is legacy CodeIgniter code, but we have implemented a Laravel wrapper to make our lives easier, meaning the request is handled by Laravel, and if a Laravel route isn't found, it sends the request off to the legacy/ subdirectory where it is then handled by CodeIgniter. We also use Laravel migrations to handle database changes. Each client has an identical database structure.
Now, we've just signed a contract with a new client who wants 200-400 instances of our application to franchise out. Clearly our existing setup of one application instance per client is no longer feasible with such a large number of tenants, so I've set about making the application a single-instance, multi-database application.
After a couple of days' research and work, I came up with this setup: all tenant details (and database credentials) are stored in a master database. When a request comes in, we look up the tenant by the request host name, get their database credentials, and then connect to their database. This may not be the most efficient approach, but it works and I'm fairly happy with it. The problem I have is to do with database migrations during deployment.
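For context, the lookup-then-connect step can be sketched roughly like this (plain PDO, with an in-memory SQLite master database standing in for MySQL; the `tenants` table and its column names are assumptions for illustration, not my actual schema):

```php
<?php
// Hypothetical master-database lookup: resolve a tenant's DB credentials
// from the request host name. SQLite stands in for MySQL here so the
// sketch is self-contained.
$master = new PDO('sqlite::memory:');
$master->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$master->exec('CREATE TABLE tenants (host TEXT PRIMARY KEY, db_name TEXT, db_user TEXT, db_pass TEXT)');
$master->exec("INSERT INTO tenants VALUES ('acme.example.com', 'tenant_acme', 'acme_user', 'secret')");

function resolveTenant(PDO $master, string $host): ?array
{
    $stmt = $master->prepare('SELECT db_name, db_user, db_pass FROM tenants WHERE host = ?');
    $stmt->execute([$host]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

$tenant = resolveTenant($master, 'acme.example.com');
// From here you would open the per-tenant connection, e.g.:
// new PDO("mysql:host=localhost;dbname={$tenant['db_name']}", $tenant['db_user'], $tenant['db_pass']);
```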
I created custom artisan migrate/migrate:rollback/db:seed commands that query the master database, loop through all tenants, and execute the underlying command against every client database. This works well when there are no failures, but I'm having a really hard time deciding what to do if migrating one tenant's database fails during an automated deployment.
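Stripped of the framework, the loop reduces to something like this (a framework-free sketch; `$migrateOne` is a hypothetical callable standing in for the real per-tenant `artisan migrate` invocation):

```php
<?php
// Migrate every tenant in order; stop at the first failure and report
// which tenants had already been migrated — those are the ones that
// would need rolling back. $migrateOne stands in for the underlying
// `artisan migrate --database=...` call.
function migrateAllTenants(array $tenants, callable $migrateOne): array
{
    $migrated = [];
    foreach ($tenants as $tenant) {
        try {
            $migrateOne($tenant);
            $migrated[] = $tenant;
        } catch (Throwable $e) {
            return ['ok' => false, 'failed' => $tenant, 'migrated' => $migrated];
        }
    }
    return ['ok' => true, 'failed' => null, 'migrated' => $migrated];
}
```

The return value makes the "limbo state" explicit: after a failure you know exactly which tenants are ahead of the others.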
When there's a failure, there will be a limbo state – some tenant databases will have been migrated, while others won't. Clearly all databases migrated up to the point of failure should be rolled back, because the new (failed) release (requiring the database changes) was never actually made live. But how do I make that happen?
The important thing to note is that I can't rely on Laravel's built-in migrate:rollback functionality to handle rolling back during deployment. This is because a new tenant's production database will have all available migrations applied in one batch as part of a custom "add a new tenant" script, whereas existing clients will have had the same set of migrations applied in the same order, but across different batches. This means that running migrate:rollback on all tenant databases will produce very different results per tenant, which rules out using this command to roll back all tenants in production.
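To make the batch problem concrete (hypothetical migration names; this mirrors how Laravel's `migrations` table groups rows by batch number, and how `migrate:rollback` reverts only the highest batch):

```php
<?php
// A tenant's applied migrations, grouped by batch number,
// as recorded in Laravel's `migrations` table.
function lastBatch(array $migrationsByBatch): array
{
    return $migrationsByBatch[max(array_keys($migrationsByBatch))];
}

$newTenant = [1 => ['m1', 'm2', 'm3']];        // everything applied in one batch at tenant creation
$oldTenant = [1 => ['m1', 'm2'], 2 => ['m3']]; // m3 applied in a later deploy

lastBatch($newTenant); // ['m1', 'm2', 'm3'] — rollback would revert everything!
lastBatch($oldTenant); // ['m3'] — rollback reverts only the latest deploy
```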
I posted the same question on Reddit last week, and frustratingly every suggested solution was different, and none of them seemed perfect. The best, most practical suggestion IMO was to diff the migration files between the previous release and the release being deployed, and store a log in the new release directory listing the migrations that appear in the new release but not in the previous one. This log acts as the source of truth for which migrations were introduced per release, so I know exactly which migrations, and in what order, to roll back if the deploy fails (or if, shortly after a successful deploy, I realise I need to roll back). After creating this log, I loop through all tenants and migrate their databases (I have a custom artisan command to do this); if just one migration fails, I roll back the migrations listed in the per-release log on the clients that have already been migrated (using another custom artisan command). Although I can see this working (and I can write tests to prove it), this solution just doesn't feel "nice" to me, and I'm hesitant to spend the time implementing it. I can't help but feel there must be a better, more "correct" way that someone out there has worked out.
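The diff-and-log approach can be sketched in a few lines (plain PHP, no Laravel; the migration names and the two callables are hypothetical stand-ins for the real release directories and artisan commands):

```php
<?php
// 1) Diff the migration file lists between the previous and new release
//    to build the per-release log: "migrations introduced by this release".
function newMigrations(array $previousRelease, array $newRelease): array
{
    return array_values(array_diff($newRelease, $previousRelease));
}

// 2) Migrate all tenants; if any tenant fails, roll back exactly the
//    logged migrations (in reverse order) on every tenant migrated so far.
//    $migrateOne and $rollbackOne stand in for the custom artisan commands.
function deployMigrations(array $tenants, array $log, callable $migrateOne, callable $rollbackOne): bool
{
    $done = [];
    foreach ($tenants as $tenant) {
        try {
            $migrateOne($tenant, $log);
            $done[] = $tenant;
        } catch (Throwable $e) {
            foreach ($done as $t) {
                foreach (array_reverse($log) as $migration) {
                    $rollbackOne($t, $migration);
                }
            }
            return false;
        }
    }
    return true;
}
```

Because the log is derived from the release contents rather than from per-tenant batch numbers, every tenant gets rolled back by the same set of migrations regardless of how their batches were originally grouped.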
Other proposed solutions included temporarily running two versions of the code base, and when a tenant's migration succeeds, switching that tenant only to the new code base that relies on the newly applied migrations. This seems like a sensible solution, but it involves modifying the default Capistrano behaviour and messing with Apache virtual hosts, symlinks, etc. Unless this was done "right" (and to be honest I don't know what the "right" way is), I can see this solution ending up just as hacky as the first one I mentioned above, and as I only know some very basic Ruby (enough to get Capistrano to do the very simple tasks I need it to do), I don't feel comfortable with this suggestion at this point. However, if anyone has ideas or tips on how I could realistically accomplish it, that would be really helpful.
In summary, after many hours of research, I haven't yet come across a solution that gives me peace of mind during a deployment, so I wondered whether anyone here might have some new ideas. Multi-tenancy is a common architecture, so surely some of you must have come across this problem before? How did you handle migrations in a multi-tenant application?
How can I safely and automatically apply my application's migrations to hundreds of databases during a deployment, and gracefully handle the rare cases where a migration fails?
Thanks!