peffraimidis's avatar

Database Replication

Hello everyone,

My team and I are working on a web application that handles a massive volume of data (many tables, many records etc). Without getting into very specific details, we need fresh data both for debugging and for developing features.

The problem is that the app's database is getting too massive to just export and import data to align the production database with the localhost database. Not only it is time-consuming, but also it is impractical (This is what we are doing today).

We were thinking if we could somehow create a database replication (development database) and keep them in sync with the one that is in production. The goal is to automatically insert the same record to the development database whenever the production database executes a transaction (not on Business Layer by code, but on Database Layer). If the copying transaction (from production db to development db) could be done asynchronously without blocking the production database would be even better.

Do you have any suggestions? How would you approach such a problem? Thank you.

0 likes
6 replies
cwhite's avatar

What's the OOM (order of magnitude) for the database size?

peffraimidis's avatar

@cwhite At this point we are talking approximately for 1 gb of data but their growth is exponential.

cwhite's avatar

@pefremidis, just for some insight, we are currently doing something similar to sync our Production and Staging databases once a week with a cron. Our Production DBs are 20+ GB and it takes 1-2 hours to dump, rsync across servers (the compressed dump is ~1-2GB), and restore.

If there's a better way to do this I'm all ears, but this has worked for us so far.

1 like
psrz's avatar

Similar to @cwhite's case. The cron runs once a week, on Saturdays.

Though we synchronize two environments, staging and development. The biggest database is around 12GB currently, but it's not just the DB. Our projects use a lot files that are not stored in the database so we need to rsync that too.

It's a pain, but I we don't need this to be "mirrored" on real time so this is good enough.

1 like
robj's avatar

I dont' know details but I saw a setup on MySQL a few years ago that had replication setup with a MySQL proxy.

It had one primary and 3 secondary (replicated from primary server) - The proxy would send inserts/updates to the primary server and selects to one of the secondary servers.

The replication used transaction logs and was fast - it's in my options list if every I do a heavy read applicaiton which might need horizontal DB scaling.

Anyway, back to your point, is it possible in some use csses. I would seek out advice of a good DBA with experience in your DB layer.

1 like
peffraimidis's avatar

Thank you all for your answers!

The original idea was to dump and sync our DBs on Weekends too, but we were wondering if there was a better approach. The issue is still under discussion by our team, I will keep you posted if we find some other alternative.

Please or to participate in this conversation.