Hardware Caching Notes
Methods of caching in hardware
Basics
Caches are organized into lines/blocks - contiguous regions of memory
Sizes of the cache tag and data fields depend on the cache's configuration
eg. block size trades off: how much spatial/temporal locality is exploited, miss rate, hit latency
Spatial locality = nearby addresses accessed close together; temporal locality = the same address accessed again soon (time)
Structure of the cache depends on the type of cache, block size etc.
Direct Mapped
For a 2^N byte cache, 32 bit memory byte addresses, and a 2^M byte block size:
(32 - N) bits = cache tag (MSB)
M bits = byte select
(N - M) bits = index select
Memory Address:
 31                     N  N-1              M  M-1    0
+------------------------+--------------------+--------+
| Tag | Index | Offset |
+------------------------+--------------------+--------+
Cache Structure:
                                     Block size (2^M bytes)
          +---+------------------------+----------------------+
Index 0 : | V | Tag                    | Byte 0 | Byte 1 | .. |
          +---+------------------------+----------------------+
Index 1 : | V | Tag                    | Byte 0 | Byte 1 | .. |
          +---+------------------------+----------------------+
...
Index = selects which cache line (row) to look at
Tag = high bits of the address, compared with the tag stored at that index (along with the valid bit) to decide hit/miss
Offset = selects the byte within the block's data section (only needed when block size > 1 byte)
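The field split above can be sketched in Python (the function name is illustrative; n and m are the cache- and block-size exponents from above):

```python
def split_address(addr, n, m):
    """Split a byte address for a 2^n-byte direct-mapped cache
    with 2^m-byte blocks into (tag, index, offset)."""
    offset = addr & ((1 << m) - 1)              # low m bits: byte within block
    index = (addr >> m) & ((1 << (n - m)) - 1)  # next n-m bits: line select
    tag = addr >> n                             # remaining high bits
    return tag, index, offset
```

For example, `split_address(0x12345678, 10, 5)` returns `(298261, 19, 24)`.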
Pros + Cons of direct mapped
+Only need to compare one entry (a single comparator)
+Simple hit check and eviction (no replacement policy needed)
-High chance of conflict misses if the access pattern doesn't suit the cache/block size
For example, if a program alternates between two addresses that map to the same index (addresses a multiple of the cache size apart), the tags constantly mismatch and each access evicts the other - cache thrashing
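A quick sketch of the thrashing case (index computed from the field layout above; the specific addresses and sizes are illustrative):

```python
N, M = 10, 5                       # 1 KB cache, 32-byte blocks (illustrative)

def index_of(addr):
    # Index = bits [N-1:M] of the byte address
    return (addr >> M) & ((1 << (N - M)) - 1)

a = 0x0000_0040
b = a + (1 << N)                   # exactly one cache size (2^N bytes) away
assert index_of(a) == index_of(b)  # same cache line...
assert (a >> N) != (b >> N)        # ...but different tags: each access evicts the other
```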
Example: 1 KB direct mapped cache with 32 byte blocks/lines
N = 10
M = 5
tag size = 22 bits
index size = 5 bits
offset size = 5 bits (size 32 bytes = 2^5)
Memory Address:
 31                    10  9                5  4      0
+------------------------+--------------------+--------+
| Tag | Index | Offset |
+------------------------+--------------------+--------+
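The example's field widths can be checked mechanically (a sketch mirroring the numbers above):

```python
import math

cache_bytes, block_bytes, addr_bits = 1024, 32, 32
n = int(math.log2(cache_bytes))    # N = 10
m = int(math.log2(block_bytes))    # M = 5
tag_bits = addr_bits - n           # 22
index_bits = n - m                 # 5
offset_bits = m                    # 5
print(tag_bits, index_bits, offset_bits)  # 22 5 5
```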
Fully Associative
Any cache entry can hold the block for any memory address
+Removes conflict misses
-Need to compare every cache entry's tag for a match
High comparator cost in hardware
-Replacement policies (e.g. LRU) are costly to implement
Memory Address:
       31                                     M  M-1    0
       +--------------------------------------+--------+
  +----| Tag                                  | Offset |
  |    +--------------------------------------+--------+
  |
  |                                        Block size
  |    +----------------------------------+---+----------------------+
  XOR--| Tag                              | V | Byte 0 | Byte 1 | .. |
  |    +----------------------------------+---+----------------------+
  XOR--| Tag                              | V | Byte 0 | Byte 1 | .. |
       +----------------------------------+---+----------------------+
  ...
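A toy model of the fully associative lookup (class name and sizes are illustrative, not a real hardware model); the loop stands in for the parallel tag comparators:

```python
class FullyAssociativeCache:
    def __init__(self, num_entries, block_bytes):
        self.m = block_bytes.bit_length() - 1  # offset bits (power-of-two blocks)
        self.tags = [None] * num_entries       # one stored tag per entry

    def lookup(self, addr):
        tag = addr >> self.m                   # everything above the offset
        # Hardware compares all entries in parallel; here, a loop
        return any(t == tag for t in self.tags)

    def fill(self, addr, entry):
        self.tags[entry] = addr >> self.m

c = FullyAssociativeCache(num_entries=4, block_bytes=32)
c.fill(0x1000, entry=0)
print(c.lookup(0x1010))   # True: same 32-byte block
print(c.lookup(0x2000))   # False: tag not present
```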
N-way Set Associative Cache
Compared with direct mapped:
N comparators vs. 1
Extra MUX delay for the data
Data is available only after the hit/miss decision and way selection
In direct mapped, the data can be read out before the hit/miss decision
Fewer conflict misses, since a colliding block can go into any of the N ways of its set
More complex comparison hardware - but cost grows with N (the associativity), not with the total number of cache lines
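A sketch of the address split for an N-way design (sizes here are illustrative): each set holds `ways` lines, so there are fewer index bits than in an equal-sized direct mapped cache:

```python
import math

def split_set_assoc(addr, cache_bytes=1024, block_bytes=32, ways=4):
    """Split a byte address for a `ways`-way set-associative cache.
    sets = cache_bytes / (block_bytes * ways); the index picks a set,
    then only `ways` tags are compared (vs. one in direct mapped)."""
    m = int(math.log2(block_bytes))              # offset bits
    sets = cache_bytes // (block_bytes * ways)
    s = int(math.log2(sets))                     # index bits
    offset = addr & (block_bytes - 1)
    index = (addr >> m) & (sets - 1)
    tag = addr >> (m + s)
    return tag, index, offset

print(split_set_assoc(0x12345678))  # (1193046, 3, 24)
```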