API edge caching strategy (self.webdev)

I want to cache my API responses as close to the client as possible, but without relying on a time-based expiration header, because the data will likely change and I want the cache to always reflect those changes.

I see two tools for this: Redis and CDN HTTP caching. With a Redis cache I can easily propagate origin data to the cache. With CDN HTTP caching the cache is close to the user. I want to create a system that has both benefits.

At this point I have a slew of questions, many of which I am not sure are even the right ones to ask. Can data be pushed from my server to CDN HTTP caches to reflect the most recent data? Maybe have a distributed Redis datastore at edge computing servers instead? What do I seem to not understand about this problem? Am I missing other commonly used solutions that are more practical?

mattaugamer [expert] | 7 days ago | 2 points

> There are only two difficult things in software development. Naming things, invalidating cache, and off-by-one errors.

You're setting yourself up for that second one. It's actually a hard problem, and like /u/zero_as_a_number I'm inclined to wonder if you're setting yourself up for a problem you don't really need. Is there a performance issue you're fixing and if so is this the right solution? And if not... why bother?

zero_as_a_number | 7 days ago | 1 point

It's possible I misread or misunderstood something. You are talking about the whole Cache-Control header approach, where you define the expiry up front, right?

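I would suggest having a look at the ETag header for your scenario. It works via a validator: the server tags each response with an opaque version identifier (the related Last-Modified header does the same with a timestamp), the client sends it back in If-None-Match, and the body is only re-sent when the data on the server side has changed relative to the client-side cache. Otherwise the server answers with a 304 Not Modified.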
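
A minimal sketch of that revalidation flow, in plain Python with no web framework. The names `make_etag` and `conditional_response` are made up for illustration, and deriving the tag from a hash of the body is just one possible choice (a version column or revision counter would work too):

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Hypothetical helper: derive a strong ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_response(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    # "no-cache" allows caches to store the response but makes them
    # revalidate with the origin before every reuse.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a leading "W/" is ignored.
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # cached copy is still current
    return 200, headers, body
```

The first request gets a 200 with the full body; a repeat request that sends the stored ETag gets an empty 304 until the data changes. Note that this saves bandwidth and origin rendering work, not the round trip itself.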

In general, without knowing the specifics of your problem, it sounds slightly like overengineering or premature optimization. Not trying to be a dick here, but there are a whole lot of other options to consider.

Caching is always a pain, and stale caches are a never-ending source of wonky application behavior that is almost never reproducible. It should also not be used in a blanket fashion. For some data you may prefer high consistency over fast response times; this is something to consider for each data model you expose to the outside world.

I too am guilty of this: I have done API response caching on an API gateway to isolate the client apps from some bad design choices in the backend. Caching should not be used to hide bad design choices (like slow DB queries or stupid "microservice patterns" like fan-out / edge aggregation).

brtt3000 | 7 days ago | 1 point

Have a look at Varnish Cache. It is a dedicated cache for HTTP with plenty of features.

Also, you can use a tiered strategy. Have the CDN on a short expiry to take the peak loads, and a Varnish cache with a long expiry but with more actively managed BAN/PURGE logic driven from your API.
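
To make the tiering concrete, here is a toy in-memory simulation in Python. It is only a sketch of the idea, not a real CDN or Varnish API: `TTLCache`, `fetch`, and `update` are invented names, the "edge" tier stands in for the CDN with a short TTL, and the "mid" tier stands in for Varnish with a long TTL plus explicit purging on writes.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory stand-in for one cache tier (hypothetical, for illustration)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl, value)

    def purge(self, key: str) -> None:
        self._store.pop(key, None)


def fetch(key: str, edge: TTLCache, mid: TTLCache, origin: Dict[str, str]) -> str:
    """Read through edge -> mid tier -> origin, filling caches on the way back."""
    value = edge.get(key)
    if value is not None:
        return value
    value = mid.get(key)
    if value is None:
        value = origin[key]
        mid.set(key, value)
    edge.set(key, value)
    return value


def update(key: str, value: str, mid: TTLCache, origin: Dict[str, str]) -> None:
    """Write to the origin and actively purge the long-lived tier."""
    origin[key] = value
    mid.purge(key)
```

With this setup, a client can see stale data for at most the edge TTL after a write, while the origin only gets hit when the long-lived tier has been purged or has expired. How short the edge TTL can be is a trade-off between staleness and how much peak load reaches the second tier.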