V1nk0

Caching complete lists or just the id-references

I'm currently trying to introduce caching into a project. To keep things simple, I'll use the example of a blog with categories and posts. A category can have multiple posts assigned, and a post can be assigned to multiple categories.

I'm torn between the following two strategies:

Strategy A:

  1. Caching the single posts
  2. Caching the complete list of posts for specific lists (e.g. posts by category, top 10, etc.). To keep the example simple, let's ignore pagination here.

Strategy B:

  1. Caching the single posts (exactly like with strategy A)
  2. For lists of posts (e.g. by category, top 10, etc.) I ONLY cache the ids of the posts. Retrieving a list would work as follows: create an empty array for the posts, loop over the post ids for the list, and call the find method to retrieve each single post (this method would of course return the cached post when available). Then add each returned post to the array and finally return the array of posts.
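The retrieval steps for strategy B can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than Laravel code: `DATABASE`, `CATEGORY_POSTS`, the plain `cache` dict and the key names (`post:{id}`, `category:{name}:post_ids`) are all hypothetical stand-ins for the real database and cache driver.

```python
# Hypothetical in-memory stand-ins for the database and the cache store.
DATABASE = {
    1: {"id": 1, "title": "First post"},
    2: {"id": 2, "title": "Second post"},
    3: {"id": 3, "title": "Third post"},
}
CATEGORY_POSTS = {"news": [1, 3]}  # category -> post ids, as a query would return them
cache = {}
db_queries = 0  # counts simulated database queries


def find_post(post_id):
    """Return a single post, reading through the cache."""
    global db_queries
    key = f"post:{post_id}"
    if key not in cache:
        db_queries += 1
        cache[key] = DATABASE[post_id]
    return cache[key]


def post_ids_for_category(category):
    """Return only the ids for a list, reading through the cache."""
    global db_queries
    key = f"category:{category}:post_ids"
    if key not in cache:
        db_queries += 1
        cache[key] = list(CATEGORY_POSTS[category])
    return cache[key]


def posts_for_category(category):
    """Assemble the list by resolving each cached id to a cached post."""
    return [find_post(post_id) for post_id in post_ids_for_category(category)]


first = posts_for_category("news")   # cold cache: 1 list query + 2 post queries
queries_after_first = db_queries
second = posts_for_category("news")  # warm cache: no further queries
```

In this sketch the cold request costs one query for the id list plus one per post, and the warm request costs none; the post data itself is stored only under its own `post:{id}` key.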

I'm leaning towards strategy B for the following reasons:

There are no duplicates in the cache. A post will only exist once in the cache (as a single item). A post (except its non-changing id) will not exist in the lists. This would save storage space in the cache, which sounds welcome when using in-memory cache drivers like Redis (RAM = expensive).

When a post changes I only have to invalidate and update the single cached post. All lists will still remain the same as they only contain the ids.

It would make pagination completely independent of the database (also saving the need for paginated queries) because I can slice the list of ids myself.
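Slicing the cached id list for pagination could look roughly like this. The function name `paginate_ids` and the sample ids are made up for illustration, and the sketch assumes the complete id list for a given list fits comfortably in one cache entry.

```python
def paginate_ids(post_ids, page, per_page):
    """Return the ids for a 1-based page plus the total number of pages."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    start = (page - 1) * per_page
    total_pages = (len(post_ids) + per_page - 1) // per_page  # ceiling division
    return post_ids[start:start + per_page], total_pages


cached_ids = [11, 12, 13, 14, 15, 16, 17]  # hypothetical cached id list
page_two, total_pages = paginate_ids(cached_ids, page=2, per_page=3)
last_page, _ = paginate_ids(cached_ids, page=3, per_page=3)
```

Only the ids on the requested page then need to be resolved to posts, so a page of 3 costs 3 post lookups no matter how long the full list is.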

A downside could be that the initial request needs more database queries (all posts are queried independently, because of the list of ids and the loop that builds the output for a list). But this would only be the case for the first load, because after that most of the content will come from the cache.

Maybe there's a name for this method of caching, but I was unable to find anything about it online, which makes me doubt whether this would be a good approach at all.

I'm very curious about your opinions or possible alternatives that maybe are (even) more efficient.

Mithrandir

I think you have your considerations pretty well thought out already, so I will perhaps just sum up what you already have:

Strategy A: Higher cache costs, more workload when things change, fewer requests to fetch content

Strategy B: Less cache usage, less workload when things change, but (many) more requests to the cache (rather than the database) when fetching content

I think you should test both strategies with many items to determine whether Strategy B (the optimal strategy for data integrity) is faster than fetching from the database without using the cache, and whether Strategy A has (enough) performance benefits over Strategy B.
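As a starting point for such a comparison, the difference in cache requests per list can be counted with a small harness. This is only a sketch: `CountingCache` is a hypothetical dict-backed stand-in for a real cache store, and counting reads is not the same as measuring latency, which depends on the network and the driver and would need a real benchmark.

```python
class CountingCache:
    """A dict-backed cache that counts reads (hypothetical stand-in for Redis)."""

    def __init__(self):
        self.store = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value


posts = {i: {"id": i, "title": f"Post {i}"} for i in range(1, 51)}
ids = list(posts)

# Strategy A: the list key holds the full posts.
cache_a = CountingCache()
cache_a.set("category:news:posts", [posts[i] for i in ids])
list_a = cache_a.get("category:news:posts")

# Strategy B: the list key holds only ids; each post lives under its own key.
cache_b = CountingCache()
cache_b.set("category:news:post_ids", ids)
for i in ids:
    cache_b.set(f"post:{i}", posts[i])
list_b = [cache_b.get(f"post:{i}") for i in cache_b.get("category:news:post_ids")]
```

For a list of 50 posts, strategy A needs 1 cache read here and strategy B needs 51 (one for the id list, one per post), while both return the same content.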

V1nk0

Thanks for your feedback. Yes, your summary captures the advantages and trade-offs pretty well as I see them.

Of course, with strategy B, I could (re)populate the cache for a single post directly after an update or create action in the backend, so a cached version of a post will pretty much always exist already. That way, looping over the ids of a list will normally return a cached version of a post directly and not hit the database at all.

This whole thing started because I was trying to find a way to cache without filling my expensive RAM (e.g. Redis) more than necessary. That's when I first realized that caching a whole Eloquent model would be far more than necessary, and that I could use simple arrays containing only the things I need (I've started another discussion on that specific topic). Then I thought: if I always blindly cache whatever comes back from a certain model method (or repository method), there's going to be duplicate content (e.g. a post being stored under multiple cache keys). That's when I came up with the idea to cache only the post ids in lists and, in parallel, keep a cache of all single posts.

The thing that made me (or still makes me) doubt whether this strategy is good is that I'm not able to find anything about such a cache strategy.
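Repopulating the single-post entry on update could be sketched like this. Again this is Python with hypothetical in-memory stand-ins (`DATABASE`, `cache`, `update_post` and the key names are assumptions), not the actual backend code.

```python
# Hypothetical stand-ins: one post that appears in two cached id lists.
DATABASE = {7: {"id": 7, "title": "Old title"}}
cache = {
    "post:7": {"id": 7, "title": "Old title"},
    "category:news:post_ids": [7],
    "top10:post_ids": [7],
}


def update_post(post_id, **changes):
    """Persist the change, then refresh only the single-post cache entry."""
    DATABASE[post_id] = {**DATABASE[post_id], **changes}
    cache[f"post:{post_id}"] = dict(DATABASE[post_id])
    return cache[f"post:{post_id}"]


lists_before = {k: list(v) for k, v in cache.items() if k.endswith(":post_ids")}
update_post(7, title="New title")
lists_after = {k: list(v) for k, v in cache.items() if k.endswith(":post_ids")}
```

Only `post:7` is rewritten; both id lists are left alone. The id lists would still need refreshing whenever list membership or ordering changes, for example when a post is deleted, moved to another category, or enters or leaves the top 10.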

dkroft

Caching a list of IDs (array $ids) and using Model::find($ids) seems quite efficient.
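One way to combine this with the per-post cache is to resolve as many ids as possible from the cache and load all misses with a single batched query. The sketch below is Python with hypothetical stand-ins: `find_many_in_db` plays the role of one `WHERE id IN (...)` query, and the original order of the id list is re-applied afterwards, on the assumption that such a query does not return rows in the order the ids were passed.

```python
DATABASE = {i: {"id": i, "title": f"Post {i}"} for i in range(1, 6)}
cache = {"post:2": DATABASE[2]}  # only post 2 is cached so far
db_queries = 0  # counts simulated database queries


def find_many_in_db(post_ids):
    """Simulate one `WHERE id IN (...)` query; result order is not the input order."""
    global db_queries
    db_queries += 1
    return [DATABASE[i] for i in sorted(post_ids) if i in DATABASE]


def find_posts(post_ids):
    """Resolve ids from the cache, loading all misses with a single query."""
    found = {i: cache[f"post:{i}"] for i in post_ids if f"post:{i}" in cache}
    missing = [i for i in post_ids if i not in found]
    if missing:
        for post in find_many_in_db(missing):
            cache[f"post:{post['id']}"] = post
            found[post["id"]] = post
    # Re-apply the cached list order and skip ids that no longer exist.
    return [found[i] for i in post_ids if i in found]


result = find_posts([5, 2, 3])        # one query for the two misses (5 and 3)
queries_after_first = db_queries
result_again = find_posts([5, 2, 3])  # fully served from the cache
```

A cold list then costs one query for all missing posts instead of one query per post.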
