brentxscholl's avatar

Performance Anxiety: 1 big table or 2 smaller tables?

I keep on encountering this performance anxiety when building out the database structure of my projects. I'm hoping to get an outside opinion on the matter.

Which approach would you say is better when building a large scale project?

Option A. 1 large table with many columns, or

Option B. Break data up into 2 small tables with few columns, but linked with ids.

Here is an example:

Say you are building a project that showcases things to do in an area. We will call these Activities. You have an index page that list all the Activities as cards/thumbnails. Simple data that is just the Activity Title and the Activity Image.

When you view the single Activity page, there are ~20 more columns worth of data that you want to show about the activity. ie. Description, how physically demanding it is, it's location, is it indoor or outdoor, is it family friendly. etc.

Lets also say there are many places throughout the site where you would see the simple cards for the activity (title and image only). but only one time you would need all the details (when viewing the single activity page).

Which is better?:

Option A (single table)

Activity Table
-----------------------------------------------------------------------
id
title
image_id
status
description
physical_level
environment
family_friendly
location_id
start_date
end_date
start_time

Option B (multiple tables)

Activity Table
-----------------------------------------------------------------------
id
title
image_id
status

Activity Details Table
-----------------------------------------------------------------------
id
activity_id
description
physical_level
environment
family_friendly
location_id
start_date
end_date
start_time
end_time

Does it take more cpu to join tables or more to select columns in tables?

What is better for searching, if say we were wanting to search for activities based on environment field?

What is better to query with? Option A: Activity::select(['id', 'title', 'image_id'])->all() for querying activities on index page and Activity::find($id) for querying activity for single/show page.

Option B: Activity::all() for querying activities on index page and Activity::with(['activityDetails']) for querying activity for single/show page.

I appreciate any advice

0 likes
7 replies
Phread's avatar

Your answer may depend on the number of rows for each table, how static the table(s) are, how many Users (100 or 1M), indexes - defined/used/needed, database type, how many times this information is required, caching (and type), the information required for the form, etc.

As an example for only one area: Caching alleviates the database hits, but this assumes that the data is static for a time period that makes sense such as: reference data, period data (previous period(s)), and other similar information. Note: caching can be used for "stuff" other than data.....

Imho, unfortunately there are way too many variables for us to give you a definitive answer. There are tools which can be used to perform load & performance testing, which have a good probability of giving you a better direction/opportunity/answer to get an answer to your question.

Also, I do not have a list of performance/load testing tools, yet.

1 like
brentxscholl's avatar

Thanks for your input! I really appreciate it.

If anyone else has some input on the matter I would appreciate it as well!

sr57's avatar

Option B is better than A in absolute but, with good index and cache in service you'll see no difference till all your data will be in memory, real advantage will arrive when table A that should be the smallest size will still be in memory and not table B.

Don't forget that there good tools to optimize sql in mariadb or postgresl.

newbie360's avatar

Option B: relation one-to-one , only a manager has able access the 2nd table, eg, id_card_number, bank_account_number

diogoko's avatar

I wouldn't split Activities in two tables in this case. From your description, conceptually an Activity is a single entity that may have a large number of properties, but all of them are pertinent to each Activity.

Regarding searching performance, simple indexes will be enough for most situations. If you need free text searching, you can use Laravel Scout; it will sync your data to search providers like Algolia or ElasticSearch, which are quite optimized for complex full text queries.

Finally, watch out for premature optimization. It's always useful to consider performance implications of data structures and algorithms, but unless you're working on a new feature in Twitter's core, your largest cost is developer time. Splitting Activities in two tables will increase the complexity of your code, so your team will spend more time coding, testing and debugging. I would use the simplest solution that works and review it in the future if profiling proves that that solution is not acceptable anymore.

Snapey's avatar

Simply; If the tables end up being 1:1 related then you may as well have had one table and just select the columns you need for each use.

It makes sense to split them when only some of table 1 need attributes stored in table 2. You avoid having a lot of empty cells because a lot of the rows in 1 do not carry the extra attributes.

Tray2's avatar

As a rule of thumb, Never pull more out of the database than you need.

So if you have one table containing lots of columns then you shouldn't just do

Model::where('field', $value)->get();

You should tell it which columns to fetch for you

Model::where('field', $value)->get(['field1', 'field3', 'field6');

However you also need to consider that if one of the fields are a clob or a blob then you probably benefit by moving it to another table.

I like to think of groups when it comes to how I divide my tables.

Let's say that you have a users table that is used for authentiction and stores the users profile. then it might be a good idea to create a profiles table to store the profile specific information.

It all depends like mentioned before, how static is the table, what kind of data is stored, how many records the database contains and how many users.

In your case I probably stick with one table for the activity and then check for repetition of data in the table.

If you notice something like this it's probably a good idea to extract the activity

title   | location_id
Horseback riding | 1
Horseback riding | 2

And make it

activity_id   | location_id
1 | 1
1 | 2
1 like

Please or to participate in this conversation.