Half-Measure (of software): Proactive caching

Caching needs no introduction.
There are different flavors and, of course, many vendors and implementations.

However, I have rarely seen a caching solution with a proactive strategy for refreshing its data.

Caching Pattern

The classical caching pattern is as follows:

Determine whether data is held in cache
If not, get the actual data (by loading it from the data store, creating it on the fly, or some hybrid of the two)
Store a copy of the data in the cache
Return data

Usually, a cache (or the item put into cache) has some expiration policy. When expiration happens, the data is often evicted (or left expired). The logic then kicks in again with subsequent requests: load data from store then put copy into cache. And so on.
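The four steps and the expiration policy above can be sketched in a few lines. This is only an illustration, not any vendor's API: the `TtlCache` class, its `get(key, load)` signature and the injectable `clock` are names I made up for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Classical cache pattern with a per-entry time-to-live (hypothetical names)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can be tested without sleeping
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        entry = self._entries.get(key)
        # 1. Determine whether the data is held in the cache (and not expired).
        if entry is not None and self._clock() < entry[1]:
            return entry[0]
        # 2. If not, get the actual data.
        value = load()
        # 3. Store a copy in the cache, with a new expiration time.
        self._entries[key] = (value, self._clock() + self._ttl)
        # 4. Return the data.
        return value
```

Note that the caller who happens to arrive right after expiration is the one who pays for `load()`, which is where the two problems below come from.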

There are two problems with this approach:

Concurrency: under load, multiple threads may end up refreshing the same expired data at the same time.
Latency: while the data is being refreshed from the store, the requests waiting on it will most likely see a drop in performance.
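The concurrency problem can be reproduced with a small simulation. The sketch below is purely illustrative: `simulate_expired_entry` is a made-up helper, and the barrier is there only to force the worst-case interleaving, where every thread sees the miss before any of them has stored the fresh value.

```python
import threading
from typing import Dict


def simulate_expired_entry(num_threads: int) -> int:
    """Return how many times the loader runs when all threads miss the same key."""
    cache: Dict[str, str] = {}  # empty: the entry has just expired
    load_count = 0
    count_lock = threading.Lock()
    all_missed = threading.Barrier(num_threads)

    def load() -> str:
        nonlocal load_count
        with count_lock:
            load_count += 1
        return "fresh data"

    def handle_request() -> None:
        missed = "k" not in cache
        # Worst case: everyone has checked the cache before anyone stores.
        all_missed.wait()
        if missed:
            cache["k"] = load()

    threads = [threading.Thread(target=handle_request) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return load_count
```

With 8 threads the loader runs 8 times for a single key, and each of those requests also waits for its own load to finish, which is the latency problem.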

Preemptive Caching

Preemptive caching solves the two problems of the traditional caching strategy described above.
The idea is to cache the data before it's even requested.
The problem with that approach is threefold:

We need to know ahead of time which data to cache. One could fine-tune this strategy over time by monitoring usage and deciding whether new data needs to be cached and old data evicted
Even this pre-cached data needs to expire
We may not have enough memory to cache all the data (e.g. UGC sites, or sites with massive amounts of data and little to no predictability of end-user behavior)
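As a rough sketch of the "monitor usage, then decide what to cache" idea, one could warm the cache from a request log while respecting a memory budget. The `warm_cache` function, the shape of `request_log` and the `capacity` limit are all assumptions made for this example.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable


def warm_cache(request_log: Iterable[str],
               load: Callable[[str], Any],
               capacity: int) -> Dict[str, Any]:
    """Pre-load the `capacity` most requested keys seen in `request_log`."""
    # Counter.most_common returns (key, hits) pairs, most frequent first.
    most_requested = Counter(request_log).most_common(capacity)
    return {key: load(key) for key, _hits in most_requested}
```

This only addresses the first and third points: the warmed entries still need an expiration policy of their own.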

Proactive caching

Proactive caching could be considered a special expiration policy. The idea is to have the cache refresh the data whenever it's about to expire, based on a policy. Such a policy should specify the maximum amount of time the data should remain in cache (e.g. how many times it should be refreshed), when/whether the data should be refreshed (e.g. only if the item was requested more than once per second over the last 5 minutes), how often to check for expiration, how long before expiration to trigger the refresh, etc.
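To make the policy concrete, here is one possible shape for it. Every field name and default value below is an assumption chosen to mirror the examples in the paragraph above; none of it comes from an existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProactivePolicy:
    """Hypothetical policy object; names and defaults are illustrative only."""
    max_refreshes: int = 10               # caps how long the data stays in cache
    hits_threshold: int = 300             # "more than once per second ...
    window_seconds: float = 300.0         # ... over the last 5 minutes"
    check_interval_seconds: float = 1.0   # how often to check for expiration
    refresh_ahead_seconds: float = 5.0    # how long before expiry to refresh

    def should_refresh(self, now: float, expires_at: float,
                       refresh_count: int, hits_in_window: int) -> bool:
        """Decide whether an entry deserves a proactive refresh right now.

        `hits_in_window` is the number of requests the caller counted over
        the last `window_seconds`.
        """
        about_to_expire = expires_at - now <= self.refresh_ahead_seconds
        still_allowed = refresh_count < self.max_refreshes
        popular = hits_in_window > self.hits_threshold
        return about_to_expire and still_allowed and popular
```

An entry that fails the check is simply left alone and expires the usual way, so unpopular data falls back to the classical pattern.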

Implementation

The simplest approach is to bundle the data to be cached with logic to re-load it.
With any of the existing client/server caching solutions, that would mean changing the implementation to add a new feature.
Another approach (taken here) is to limit this feature to the client side. The client is then responsible for keeping the logic to retrieve the data in memory and for using it before the data expires on the remote server. If a client fails to refresh the data, another client can pick it up and be in charge of refreshing it from that point on.

Architecture

The Command Pattern is the best solution to hold the logic to retrieve data when needed. We can keep it in memory, execute it, re-execute it and even decorate it with other features (see Command Patterns).
Then, we abstract a Cache and its behavior in order to be able to implement a proactive solution and delegate the actual caching to a concrete implementation.
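A minimal sketch of these two abstractions could look like the following. It is not the code from the repository listed in the references: `RetryingCommand`, `DictCache`, `ProactiveCache` and `put_with_command` are names invented here to show a decorated command and a cache that delegates storage to a concrete implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional


class Command(ABC):
    """Holds the logic to (re)load one piece of data."""

    @abstractmethod
    def execute(self) -> Any: ...


class LoadCommand(Command):
    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader

    def execute(self) -> Any:
        return self._loader()


class RetryingCommand(Command):
    """Decorator: re-executes the wrapped command up to `attempts` times."""

    def __init__(self, inner: Command, attempts: int) -> None:
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._inner = inner
        self._attempts = attempts

    def execute(self) -> Any:
        last_error: Optional[Exception] = None
        for _ in range(self._attempts):
            try:
                return self._inner.execute()
            except Exception as error:
                last_error = error
        assert last_error is not None
        raise last_error


class Cache(ABC):
    @abstractmethod
    def get(self, key: str) -> Optional[Any]: ...

    @abstractmethod
    def put(self, key: str, value: Any) -> None: ...


class DictCache(Cache):
    """Stand-in for a concrete cache such as a memcached or Ehcache client."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value


class ProactiveCache(Cache):
    """Keeps commands in memory and delegates the actual caching."""

    def __init__(self, delegate: Cache) -> None:
        self._delegate = delegate
        self._commands: Dict[str, Command] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._delegate.get(key)

    def put(self, key: str, value: Any) -> None:
        self._delegate.put(key, value)

    def put_with_command(self, key: str, command: Command) -> None:
        self._commands[key] = command
        self.refresh(key)

    def refresh(self, key: str) -> None:
        # Re-execute the stored command and overwrite the cached copy.
        self._delegate.put(key, self._commands[key].execute())
```

Because the command is just an object, the same `refresh` call works whether the loader hits a database, a web service or a computation.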

Fig. 1. Architecture

The ProActiveCachingTask is called by a Timer and processes cache entries depending on their expiration state. If an entry is about to expire and has a command, the task uses the command to refresh the data.
This can be done synchronously or asynchronously. While it is being refreshed, the entry should be flagged as such so that ProActiveCachingTask won't run that same command again on its next pass.
Once the command is done getting the data, the entry is reset and is ready to have its command executed again the next time expiration is about to happen.
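The behavior of the task can be sketched as a single pass over the entries, which a timer would then call at a fixed interval. This is a synchronous simplification under my own assumptions: the `Entry` fields, the `run_once(now)` method and the decision to keep the old value when a command fails are not taken from the original source code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Entry:
    value: Any
    expires_at: float
    command: Optional[Callable[[], Any]] = None  # logic to reload the data
    refreshing: bool = False                     # set while a refresh is in flight


class ProActiveCachingTask:
    """One pass of the refresh loop; a timer is expected to call run_once()."""

    def __init__(self, entries: Dict[str, Entry],
                 ttl: float, refresh_ahead: float) -> None:
        self._entries = entries
        self._ttl = ttl
        self._refresh_ahead = refresh_ahead

    def run_once(self, now: float) -> None:
        for entry in list(self._entries.values()):
            about_to_expire = entry.expires_at - now <= self._refresh_ahead
            if not about_to_expire or entry.command is None or entry.refreshing:
                continue
            entry.refreshing = True  # keeps a later pass from reusing the command
            try:
                entry.value = entry.command()
                entry.expires_at = now + self._ttl
            except Exception:
                # Assumption: on failure keep the old value and let it expire.
                pass
            finally:
                entry.refreshing = False  # reset, ready for the next expiration
```

In an asynchronous variant the command would be handed to a worker thread, and the `refreshing` flag is what stops the next timer tick from submitting it a second time.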

Considerations

The policy of the proactive cache is really what will make this pattern useful or utterly useless.
With the right policy, the pattern can pay off even in cases where it might seem counter-intuitive.
One special case is when one wants to improve on an existing feed provided by a third party, which is out of our control. If such a feed proves to be unstable (response times ranging from 1s to 20s, with 80% uptime), one could use this pattern (as a broker or a proxy) to refresh the data from the feed in the background.
In this special case, we actually want to have the data cached all the time but refreshed every once in a while so the cache holds the latest data.
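For the unstable feed case, a sketch of the proxy might look like this. `FeedProxy` and its methods are hypothetical names, and the background scheduling is left out: something like the timer-driven task described earlier would call `refresh()` periodically.

```python
from typing import Any, Callable, Optional


class FeedProxy:
    """Always serves the last good copy of a third-party feed (illustrative)."""

    def __init__(self, fetch: Callable[[], Any]) -> None:
        self._fetch = fetch                  # the slow, unreliable call
        self._latest: Optional[Any] = None   # None until the first success

    def refresh(self) -> bool:
        """Meant to run in the background; returns True if the copy was updated."""
        try:
            self._latest = self._fetch()
            return True
        except Exception:
            # Feed is down or timed out: keep serving the previous copy.
            return False

    def get(self) -> Optional[Any]:
        """Called by end-user requests; never waits on the feed."""
        return self._latest
```

End-user requests only ever read the cached copy, so the feed's 1s to 20s response times and its downtime are absorbed by the background refresh.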

References

[1] Cache-aside Pattern, MSDN
[2] Command Patterns, Luc Pezet, 2014
[3] TTL, Wikipedia
[4] memcached, memcached.org
[5] Pinning, Expiration & Eviction, Ehcache.org
[6] Source code, Luc Pezet, GitHub.com

Saturday, April 5, 2014
