Cache Eviction Policies: LRU, LFU, FIFO Explained
Caches have finite storage. When a cache becomes full and new data needs to be added, one or more existing items must be removed. This process is called cache eviction, and the rules governing which items to remove are known as eviction policies. Choosing an appropriate eviction policy is crucial for cache effectiveness and overall application performance.
Common Cache Eviction Policies
Several eviction policies exist, each with its own trade-offs in terms of performance, overhead, and complexity. Here are some of the most common ones:
LRU (Least Recently Used)
This policy discards the least recently accessed items first. The assumption is that data accessed recently is likely to be accessed again soon, while data not accessed for a while is less likely to be needed.
- How it works: Keeps track of when each item was last accessed. When eviction is needed, the item with the oldest access timestamp is removed.
- Pros: Generally good performance for many workloads due to temporal locality. Relatively straightforward to understand.
- Cons: Can perform poorly when items are re-accessed only after a gap long enough for them to have been evicted (for example, periodic jobs or large scans). Requires updating metadata on every cache hit, which adds overhead.
- Use Cases: General-purpose caching, database query caching, web page caching. Implementations typically pair a hash map with a doubly linked list so that both lookups and recency updates take constant time; a minimal sketch follows this list.
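For illustration, here is a minimal LRU sketch in Python. It uses collections.OrderedDict, which is backed by a hash table plus a doubly linked list, so lookups and "move to most recently used" are both O(1). The class name, capacity parameter, and get/put interface are assumptions for illustration, not a standard API.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: OrderedDict keeps keys ordered from least to
    most recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # cache miss
        # A hit promotes the key to "most recently used".
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            # Evict the least recently used entry (front of the order).
            self._data.popitem(last=False)
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", since "a" was touched more recently.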
LFU (Least Frequently Used)
LFU evicts items that have been accessed the fewest times. The rationale is that frequently used items are more valuable and should remain in the cache, regardless of how recently they were accessed.
- How it works: Maintains an access count for each item. When eviction is necessary, the item with the lowest access count is removed (a minimal sketch follows this list).
- Pros: Can be more effective than LRU when long-lived items stay consistently popular even though they are not the most recently accessed.
- Cons: More complex to implement than LRU. Items that were popular in the past but are no longer needed might stay in the cache too long (cache pollution). A new item might be evicted quickly if it hasn't had a chance to build up its access count, even if it's about to become popular.
- Use Cases: Caching data with stable, long-term popularity; scenarios where access frequency is a better predictor of future use than recency.
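By way of comparison, here is a minimal LFU sketch under the same illustrative assumptions (hypothetical class name and get/put interface). It keeps a per-key access counter and scans for the lowest count at eviction time; real LFU implementations usually maintain frequency buckets so eviction does not require an O(n) scan.

```python
class LFUCache:
    """Minimal LFU cache: evicts the key with the lowest access count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}    # key -> value
        self._counts = {}  # key -> access count

    def get(self, key):
        if key not in self._data:
            return None  # cache miss
        self._counts[key] += 1
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the least frequently used key (O(n) scan for clarity).
            victim = min(self._counts, key=self._counts.get)
            del self._data[victim]
            del self._counts[victim]
        self._data[key] = value
        self._counts[key] = self._counts.get(key, 0) + 1
```

The cons listed above show up directly in this sketch: a key that was hot last week keeps its high count until it is finally evicted, while a brand-new key starts at a count of 1 and is the first candidate to go.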
FIFO (First-In, First-Out)
This is the simplest eviction policy. It evicts items in the order they were added to the cache, without regard to how often or how recently they were accessed.
- How it works: The cache behaves like a queue. The oldest item (the one at the head of the queue) is removed when space is needed; see the sketch after this list.
- Pros: Very simple to implement, low overhead.
- Cons: Often performs poorly because it can evict frequently accessed items that were simply added early. It doesn't consider item popularity or recency.
- Use Cases: Simple caches where overhead is a major concern and access patterns are not well-defined or where items have a very short, predictable lifespan.
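Here is a matching FIFO sketch, again with an illustrative interface. The only difference from the LRU version is that reads never reorder entries, so insertion order alone decides what gets evicted.

```python
from collections import OrderedDict

class FIFOCache:
    """Minimal FIFO cache: evicts entries in insertion order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        # Reads do not affect eviction order.
        return self._data.get(key)

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the oldest inserted entry.
            self._data.popitem(last=False)
        self._data[key] = value
```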
Other Eviction Policies
While LRU, LFU, and FIFO are foundational, other policies exist, such as:
- MRU (Most Recently Used): Evicts the most recently accessed items. This might seem counterintuitive but can be useful in specific scenarios, like when iterating over a dataset larger than the cache, where older items are more likely to be re-accessed.
- RR (Random Replacement): Randomly selects an item for eviction. Simple and low overhead, but performance is unpredictable; a minimal sketch follows this list.
- SLRU (Segmented LRU): An improvement over LRU that divides the cache into segments (e.g., probationary and protected) to better handle items with varying access patterns and resist cache pollution.
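To make the contrast concrete, here is a minimal random-replacement sketch (illustrative class name and interface). MRU and SLRU follow the same get/put shape but choose the eviction victim differently: MRU removes the entry touched most recently, and SLRU additionally tracks which segment each key currently belongs to.

```python
import random

class RandomCache:
    """Minimal random-replacement cache: evicts an arbitrary key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Pick the eviction victim uniformly at random.
            victim = random.choice(list(self._data))
            del self._data[victim]
        self._data[key] = value
```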
Choosing the Right Policy
The best eviction policy depends on the specific access patterns of your data and application requirements. Consider:
- Access Patterns: Does your data exhibit strong temporal locality (LRU)? Is frequency more important (LFU)?
- Overhead: How much computational overhead can you afford for managing the cache policy? FIFO is cheap, while LFU can be expensive.
- Complexity: Simpler policies are easier to implement and debug.
Often, monitoring cache hit/miss rates with different policies in a staging environment can help determine the most effective strategy. Effective observability in modern systems is key to making these informed decisions.
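As one way to gather those numbers, a thin wrapper such as the hypothetical InstrumentedCache below can record hits and misses for whichever policy implementation it wraps; the assumption here is a cache object exposing get/put where get returns None on a miss.

```python
class InstrumentedCache:
    """Wraps any cache with get/put methods and records hit/miss counts."""

    def __init__(self, cache):
        self._cache = cache
        self.hits = 0
        self.misses = 0

    def get(self, key):
        value = self._cache.get(key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def put(self, key, value):
        self._cache.put(key, value)

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Replaying the same request log against, say, InstrumentedCache(LRUCache(1000)) and InstrumentedCache(LFUCache(1000)) and comparing hit_ratio is a simple way to compare policies before committing to one.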
Once you've decided how to manage space within your cache, the next critical step is ensuring the data remains current. Learn more about this in The Art of Cache Invalidation.