Malleable Caches

Caches have become an ubiquitous feature of general-purpose microprocessors, however, their organization is not always suitable for modern applications. In particular, the locality of reference assumption and the LRU replacement strategy of traditional caches are wrong for the stream-based and real-time components of many applications. Moreover, future microprocessors are likely to have even larger caches which will consume a large fraction of chip area and power. For many application domains, however, the marginal utility of larger on-chip caches is decreasing.

Interesting Points

Caching has been around for a long time and has been well studied. However, computers and their use evolve, but caches just get larger. Caches now occupy nearly 95 percent of the transistors of a modern microprocessor. Despite that fact, the bulk of effort is spent improving the processor core (the remaining 5 percent). We believe the time has come to re-examine and improve long standing technology.

We believe that there is a big payoff with cache mechanisms that adapt to the existing workload rather than being purely static ones. After examining various uses of the valuable caching resource, we expect to then examine architecures for adaptive and malleable processors.

Approach

We propose a malleable cache architecture with novel ways of controlling the cache to dramatically improve its utility and power consumption. Small code sequences of stream-processing and real-time applications can achieve several orders of magnitude improvement with malleable caches. Applications with low information entropy can make use of the cache resources for data compaction or memoized functions to dramatically improve execution time. In addition, malleable caches permit many architectural optimizations, such as adaptive prefetching.

A malleable cache can be dynamically partitioned so that competiting data accesses do not cause thrashing. The traditional LRU replacement strategy only looks at the near term past behavior, and so in numerous situations, the wrong elements are replaced. Cache partitioning can isolate elements so that they do not interfer. This idea is not new. Caches have been partitioned for data and instructions for a long time, and recently they have been partitioned for temporal and spacial data. Page coloring is a technique that indirectly partitions a cache, especially one that is direct mapped. This work proposes a mechanism to dynamically partition caches, that has low overhead both in terms of implementation and in the cost of repartitioning. This mechanism, refered to as "column caching," is explained in the following section.