
s.spaan's avatar

Cloud Database

Hi,

I have a database table (currently in MySQL) with the columns id, relation_id, timestamp, and four int value columns, which would probably receive around 10-20 million records per week.

I would like to know which database solutions you (the community) would recommend in this situation. Cost is also a consideration. Preferably one that lets us query on relation_id with timestamp constraints.

If adding disk space to the server and keeping it in MySQL is the cheapest solution, that would also be fine, but I'm wondering if there are any (cloud) solutions that would be a better fit.

Any suggestions?

(AI suggests Amazon RDS or DynamoDB or similar, depending on how I ask the question, but I want community suggestions based on experience.)

Thanks!

0 likes
9 replies
LaryAI's avatar
Level 58

Sure, here are some options and practical considerations based on your use case:

1. MySQL (Cloud Hosted or Managed)

  • Pros: Familiar, easy migration path, mature ecosystem, well-supported by all major cloud providers (e.g., Amazon RDS MySQL, Google Cloud SQL, Azure Database for MySQL).
  • Cons: May require vertical scaling (bigger instance) as data grows; indexing and storage costs can add up.
  • Query Capability: Excellent, especially if you index relation_id and timestamp.

Example index for your query pattern:

ALTER TABLE your_table ADD INDEX idx_relation_timestamp (relation_id, timestamp);
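
A query this composite index serves efficiently (table and column names from the example above; the id value and date range are illustrative):

SELECT * FROM your_table
WHERE relation_id = 42
  AND timestamp BETWEEN '2026-01-01' AND '2026-01-08';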

2. PostgreSQL (Cloud Hosted or Managed)

  • Pros: Similar to MySQL, but often gets higher marks for complex queries and extensibility. Amazon RDS and others offer managed Postgres.
  • Cons: Slightly heavier resource requirements.
  • Query Capability: Great for time-series queries, plus advanced features like partitioning.

Table partitioning example (Postgres):

-- Partition table by week (for example)
CREATE TABLE measurements (
  id serial,
  relation_id int,
  timestamp timestamptz,
  val1 int,
  val2 int,
  val3 int,
  val4 int
) PARTITION BY RANGE (timestamp);
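
Note that a partitioned parent table accepts no rows until at least one partition exists, so you would create one per range. A weekly partition might look like this (the name and date range are illustrative):

-- Rows with a timestamp in this range are routed here automatically
CREATE TABLE measurements_2026_w01 PARTITION OF measurements
  FOR VALUES FROM ('2026-01-01') TO ('2026-01-08');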

3. Time-Series Databases (e.g., TimescaleDB, InfluxDB)

  • Pros: Designed for high-ingest and time-based queries, compression, retention policies.
  • Cons: If your queries are simple and you don't need time-series features, may be overkill.
  • Query Capability: Excellent for time-dependent data analysis.

TimescaleDB is an extension of Postgres and fits your schema:

-- Create hypertable for optimized time-series storage
-- (requires the timescaledb extension to be installed first)
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('your_table', 'timestamp');
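
Once the table is a hypertable, time-bucketed aggregates become straightforward; a sketch, assuming the int columns are named val1 through val4 as in the Postgres example above:

SELECT time_bucket('1 day', timestamp) AS day, avg(val1) AS avg_val1
FROM your_table
WHERE relation_id = 42
  AND timestamp > now() - interval '30 days'
GROUP BY day
ORDER BY day;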

4. NoSQL Options

  • Amazon DynamoDB:

    • Pros: Managed, highly scalable, pay-per-use.
    • Cons: Query flexibility is limited unless you design your table for key-based queries (see the sketch after this list). Joins and aggregations are not as easy as in SQL.
  • Google Bigtable / Firestore:

    • Similar Pros/Cons to DynamoDB; aimed at massive scale, but higher operational complexity for relational-like access.
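
For the meter-reading access pattern in this thread, DynamoDB only works well if the table is keyed to match it: relation_id as the partition key and timestamp as the sort key. A hedged sketch using DynamoDB's PartiQL interface (the table name and timestamp format are assumptions):

SELECT * FROM "meter_readings"
WHERE relation_id = 42
  AND "timestamp" BETWEEN '2026-01-01T00:00:00Z' AND '2026-01-08T00:00:00Z';

This is only efficient because the WHERE clause hits the partition key and sort key; filtering on anything else effectively becomes a full scan.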

5. Cost Considerations

  • MySQL/Postgres on a managed service with periodic scaling and optimized retention/indexing is often the best balance between price and flexibility, especially if you already have SQL knowledge.
  • Time-series DBs are economical at scale for write-heavy workloads, but check ingestion and storage costs.
  • NoSQL can be cheap at low volumes, but may become expensive at high query/read rates if not properly optimized.

Recommendation

Given your requirements (massive inserts, query by relation_id and timestamp, cost sensitivity, optional cloud hosting) — MySQL or PostgreSQL on a managed cloud database (like Amazon RDS or Google Cloud SQL) is a solid starting point. Use proper indexing and, if needed, partitioning.

If your queries are mostly analytics on time-based data, evaluate TimescaleDB (hosted on Timescale Cloud or your own Postgres instance).

With any option, monitor query performance and storage usage as your data grows.

Let me know if you want example provisioning scripts or specific config tips for any of these solutions!

martinbean's avatar

@s.spaan Why do you need a “cloud” database at all? A cloud database is probably going to cost far more than just running a MySQL instance on a five-dollar-a-month VPS.

1 like
s.spaan's avatar

@martinbean Whether I need a cloud database at all is part of the question. A rough calculation gives me around 2.5 GB of data per year for this table, so it would grow rapidly. It's fine if just increasing the disk size of the VPS is the best option; I'm just wondering if any cloud options would be better.

@tray2 If we keep the data on our own VPS we would indeed partition the data (but I would need to check how that works). At the moment we are planning to keep all the data, so no cleanup for now.

Glukinho's avatar

20 million rows per week is about 33 rows per second (20,000,000 rows / 604,800 seconds ≈ 33). I would worry more about performance than about data volume.

Glukinho's avatar

What type of data do you have? Isn't it some sort of log data? Do you really need a relational database for it?

1 like
s.spaan's avatar

@glukinho Meter readings taken at 15-minute intervals. We need a relation to the meter (so we can fetch the data for that meter), but whether we need a relational database is the question. If we can use another database type, that is fine, as long as we can query on the relation.

Tray2's avatar

I agree with @martinbean; just make sure that you use indexes on your tables, and use partitions to separate on created_at or similar. If you add an archived column, you can partition on that so queries only scan active records.

Make sure to remove old data if it isn't needed. If there is a report that is run only once a month, I would move all the older data into a history table and query that for the old data.

Or use another table for the archiving.
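
A minimal sketch of that history-table move, assuming a measurements table like the one earlier in the thread and a measurements_history table with the same columns:

-- Copy rows older than the cutoff into the history table, then delete them
START TRANSACTION;
INSERT INTO measurements_history
  SELECT * FROM measurements WHERE timestamp < '2025-01-01';
DELETE FROM measurements WHERE timestamp < '2025-01-01';
COMMIT;

On a table this size you would want to batch the delete in practice, rather than remove everything in one transaction.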

1 like
jlrdw's avatar

Also look into a NoSQL database solution.

And like @tray2 mentioned:

If there is a report that is run only once a month, I would move all the older data into a history table

That's what I did at a logistics company: archive older data that wasn't needed in the current year.

Edit:

Side note: before messing with migrations, back up your data first. A while back a poster lost over six months of real company data by running a migration incorrectly on live data.

Make sure you fully understand how to use migrations before you use them.

JussiMannisto's avatar

If you don't go for a cloud-based database with built-in replication, I suggest you set up read replicas from the start, or at the very least configure GTIDs on the primary database so that it's easier to add replication later if needed.

If you don't use GTIDs, it may be tricky to set up replication later without downtime given the size of the database.
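
For reference, enabling GTIDs on the primary comes down to two settings in my.cnf (MySQL 5.7+; note that on an already-running server, gtid_mode must be stepped up through the permissive modes rather than switched to ON directly):

[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON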

When you have read replicas, you can direct all reads to replica databases and writes to the primary database. This allows you to reduce load on the primary DB server and scale horizontally. You can mitigate replication lag issues by using a "sticky" connection, so that if any writes are made during a request, all subsequent reads are sent to the primary database.

You can find more information on Laravel's database connections here.
