Caching strategy: cache complete result lists, or cache single objects plus result lists containing only object IDs?
To make my problem easier to understand, I'll translate it into a simpler example:
I have a website with stories and categories. Categories contain stories and stories can be assigned to one or more categories. I'm introducing caching. I have thought of two strategies:
Strategy 1: Cache the stories for each category. I cache the complete result for every category. This produces cache items with keys "cat:1", "cat:2", "cat:3" and so on, where the number corresponds to the category ID. Each cache item contains a list of FULL stories (so all the story content) for that specific category. Advantage in my opinion: getting all stories of a category requires only one cache lookup. Disadvantage in my opinion: a story that belongs to several categories is stored in the cache multiple times with identical content.
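To make Strategy 1 concrete, here is a minimal sketch in Python. A plain dict stands in for the real cache backend, and the sample data and function names (`warm_category`, `get_category`) are made up purely for illustration:

```python
# Strategy 1 sketch: each category key holds the FULL stories.
# `cache` is a plain dict standing in for the real cache backend (assumption).
cache = {}

# Hypothetical sample data: story 2 belongs to both categories.
stories = {
    1: {"id": 1, "title": "First", "body": "..."},
    2: {"id": 2, "title": "Second", "body": "..."},
}
category_story_ids = {1: [1, 2], 2: [2]}


def warm_category(cat_id):
    """Store a full copy of every story of the category under one key."""
    cache[f"cat:{cat_id}"] = [
        dict(stories[story_id]) for story_id in category_story_ids[cat_id]
    ]


def get_category(cat_id):
    """A single lookup returns everything needed to render the category."""
    return cache[f"cat:{cat_id}"]


warm_category(1)
warm_category(2)
```

Story 2 now lives in the cache twice, once under "cat:1" and once under "cat:2", which is exactly the duplication I mean.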
Strategy 2: Cache single objects and lists of IDs. In this case I create one cache item per story. This means I will have cache items with keys "story:1", "story:2", "story:3" and so on, where the number corresponds to the ID of the story. For the categories I will also have cache items with keys "cat:1", "cat:2", "cat:3" and so on, where the number corresponds to the ID of the category. The category cache items contain a list of ONLY THE IDS of the stories within that category. Then, when a category is requested, I get the list of story IDs from the cache, loop over that list, and look up each story in the cache by its ID.
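The read path of Strategy 2 would look roughly like the sketch below. Again a plain dict stands in for the cache, and the function names (`cache_story`, `cache_category`, `get_category_stories`) are hypothetical:

```python
# Strategy 2 sketch: one item per story, categories hold only story IDs.
# `cache` is a plain dict standing in for the real cache backend (assumption).
cache = {}


def cache_story(story):
    """Store a single story under its own key, e.g. "story:2"."""
    cache[f"story:{story['id']}"] = story


def cache_category(cat_id, story_ids):
    """Store only the list of story IDs for a category."""
    cache[f"cat:{cat_id}"] = list(story_ids)


def get_category_stories(cat_id):
    """One lookup for the ID list, then one lookup per story."""
    story_ids = cache.get(f"cat:{cat_id}", [])
    result = []
    for story_id in story_ids:
        story = cache.get(f"story:{story_id}")
        # A real setup would fall back to the database on a miss;
        # this sketch simply skips stories that are not cached.
        if story is not None:
            result.append(story)
    return result


cache_story({"id": 1, "title": "First", "body": "..."})
cache_story({"id": 2, "title": "Second", "body": "..."})
cache_category(1, [1, 2])
cache_category(2, [2])
```

Here story 2 is stored only once, even though it appears in both categories.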
Disadvantage in my opinion: more cache lookups, as every single story in the list must be looked up individually in the cache. Advantages in my opinion: a story is not stored multiple times in the cache, which saves space, especially useful for the more expensive cache mechanisms such as Redis. It's also very easy to update a story in the cache, as I only have to update that one cache item. If the categories of a story don't change, there's no need to touch the cached category lists. I can also introduce pagination without having to cache paginated results, as I can build my own pagination around the list of IDs (I know the total number of results because I can count the IDs in the category cache item).
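As a rough sketch of the update and pagination ideas, assuming the same key layout and once more a plain dict in place of the real cache (`update_story` and `get_page` are names I made up):

```python
import math

# Hypothetical pre-filled cache: category 1 contains stories 1 to 5.
cache = {
    "cat:1": [1, 2, 3, 4, 5],
    **{f"story:{i}": {"id": i, "title": f"Story {i}"} for i in range(1, 6)},
}


def update_story(story):
    """Updating a story touches exactly one cache item."""
    cache[f"story:{story['id']}"] = story


def get_page(cat_id, page, per_page):
    """Paginate over the cached ID list; pages are numbered from 1."""
    story_ids = cache.get(f"cat:{cat_id}", [])
    total_pages = math.ceil(len(story_ids) / per_page)
    start = (page - 1) * per_page
    page_ids = story_ids[start:start + per_page]
    # Only the stories on the requested page are looked up.
    return [cache[f"story:{story_id}"] for story_id in page_ids], total_pages
```

Calling `update_story({"id": 2, "title": "Edited"})` leaves "cat:1" untouched, and `get_page(1, 1, 2)` then returns stories 1 and 2 together with a total of 3 pages. With Redis, the per-story lookups for one page could presumably be batched into a single round trip with MGET, which would soften the main disadvantage.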
As you might have sensed already, my personal favourite at the moment is Strategy 2. I'm just wondering what the effect is when all stories are stored separately in the cache. I would guess that with SSDs (in the case of a file cache) or the very fast Redis cache, the advantages outweigh the disadvantages, but I'm curious about other opinions. Or would you perhaps recommend a different strategy altogether?
Thanks in advance for any contributions!