2010-04-07 23:39:43
A CPU cache is a cache used by the central processing unit (CPU) of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.
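As a rough illustration of this averaging effect, the following sketch computes an average access time; the latencies and hit rate are assumed figures for illustration, not measurements:

```python
# Average memory access time under assumed, illustrative numbers.
cache_latency_ns = 1.0      # assumed time for a cache hit
memory_latency_ns = 100.0   # assumed time to reach main memory
hit_rate = 0.95             # assumed fraction of accesses served by the cache

# A miss pays the cache lookup plus the trip to main memory.
amat_ns = cache_latency_ns + (1.0 - hit_rate) * memory_latency_ns
print(f"average access time: {amat_ns:.1f} ns")  # 6.0 ns, far closer to 1 ns than to 100 ns
```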
When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory.
Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data.
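As a minimal sketch of the third of these, the snippet below models a TLB as a small table of recent virtual-page-to-physical-frame translations; the page size, page-table contents and addresses are illustrative assumptions:

```python
# Minimal TLB sketch: a small cache of recent virtual-page -> physical-frame translations.
PAGE_SIZE = 4096

page_table = {0x00400: 0x12345, 0x7FFF0: 0x00FF7}  # made-up page table entries
tlb = {}                                           # virtual page number -> physical frame number

def translate(virtual_addr):
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn in tlb:                    # TLB hit: translation found without walking the page table
        pfn = tlb[vpn]
    else:                             # TLB miss: consult the page table and cache the result
        pfn = page_table[vpn]
        tlb[vpn] = pfn
    return pfn * PAGE_SIZE + offset

print(hex(translate(0x00400 * PAGE_SIZE + 0x10)))  # 0x12345010
```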
Cache organization involves two memories: the cache and main memory. Each location in each memory holds a datum (a cache line), which in different designs ranges in size from 8 to 512 bytes. The size of the cache line is usually larger than the size of the usual access requested by a CPU instruction, which ranges from 1 to 16 bytes (the largest addresses and data handled by current 32-bit and 64-bit architectures being 128 bits long, i.e. 16 bytes). Each location in each memory also has an index, which is a unique number used to refer to that location. The index for a location in main memory is called an address. Each location in the cache has a tag that contains the index of the datum in main memory that has been cached. In a CPU's data cache these entries are called cache lines or cache blocks.
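The relationship between an address, the cache index and the tag can be sketched as simple bit arithmetic; the line size and the number of cache locations below are illustrative assumptions:

```python
# How a byte address is split for a cache lookup, assuming an illustrative
# geometry: 64-byte cache lines and 128 cache locations.
LINE_SIZE = 64    # bytes per cache line
NUM_SETS  = 128   # number of indexable locations in the cache

def split_address(addr):
    offset = addr % LINE_SIZE                  # byte within the cache line
    index  = (addr // LINE_SIZE) % NUM_SETS    # which cache location to look in
    tag    = addr // (LINE_SIZE * NUM_SETS)    # stored alongside the data to identify it
    return tag, index, offset

print(split_address(0x12345))  # (9, 13, 5): tag 9, index 13, byte offset 5
```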
When the processor needs to read or write a location in main memory, it first checks whether that memory location is in the cache. This is accomplished by comparing the address of the memory location to all tags in the cache that might contain that address. If the processor finds that the memory location is in the cache, we say that a cache hit has occurred; otherwise, we speak of a cache miss. In the case of a cache hit, the processor immediately reads or writes the data in the cache line. The proportion of accesses that result in a cache hit is known as the hit rate, and is a measure of the effectiveness of the cache.
In the case of a cache miss, most caches allocate a new entry, which comprises the tag just missed and a copy of the data from memory. The reference can then be applied to the new entry just as in the case of a hit. Misses are comparatively slow because they require the data to be transferred from main memory. This transfer incurs a delay since main memory is much slower than cache memory, and also incurs the overhead for recording the new data in the cache before it is delivered to the processor.
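Putting the previous two paragraphs together, the following sketch models a tiny direct-mapped cache that counts hits and misses and allocates a new entry on each miss; the geometry, memory contents and access pattern are all illustrative assumptions:

```python
# Tiny direct-mapped cache model: one (tag, line_data) entry per index.
LINE_SIZE, NUM_SETS = 64, 128           # assumed geometry
cache = [None] * NUM_SETS               # each slot holds (tag, line_data) or None
hits = misses = 0

def read(addr, memory):
    global hits, misses
    index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)
    entry = cache[index]
    if entry is not None and entry[0] == tag:     # cache hit: the stored tag matches
        hits += 1
    else:                                         # cache miss: allocate a new entry
        misses += 1
        line_start = (addr // LINE_SIZE) * LINE_SIZE
        cache[index] = (tag, memory[line_start:line_start + LINE_SIZE])
    return cache[index][1][addr % LINE_SIZE]

memory = bytes(range(256)) * 1024                 # made-up main memory contents
for a in [0, 1, 2, 64, 0, 65]:                    # nearby, repeated accesses mostly hit
    read(a, memory)
print(f"hit rate: {hits / (hits + misses):.2f}")  # 4 hits, 2 misses -> 0.67
```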
In order to make room for the new entry on a cache miss, the cache generally has to evict one of the existing entries. The heuristic that it uses to choose the entry to evict is called the replacement policy. The fundamental problem with any replacement policy is that it must predict which existing cache entry is least likely to be used in the future. Predicting the future is difficult, especially for hardware caches that use simple rules amenable to implementation in circuitry, so there are a variety of replacement policies to choose from and no perfect way to decide among them. One popular replacement policy, least recently used (LRU), replaces the least recently used entry.
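A minimal sketch of LRU replacement for a single cache set, assuming an associativity of four entries (the tags and data are made up for illustration):

```python
from collections import OrderedDict

ASSOCIATIVITY = 4
cache_set = OrderedDict()        # tag -> data, ordered from least to most recently used

def access(tag, load_from_memory):
    if tag in cache_set:
        cache_set.move_to_end(tag)           # hit: mark as most recently used
    else:
        if len(cache_set) >= ASSOCIATIVITY:  # set full: evict the least recently used entry
            cache_set.popitem(last=False)
        cache_set[tag] = load_from_memory(tag)
    return cache_set[tag]

for t in [1, 2, 3, 4, 1, 5]:                 # accessing tag 5 evicts tag 2, the LRU entry
    access(t, lambda tag: f"line {tag}")
print(list(cache_set))                       # [3, 4, 1, 5]
```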
When data is written to the cache, it must at some point be written to main memory as well. The timing of this write is controlled by what is known as the write policy. In a write-through cache, every write to the cache causes a write to main memory. Alternatively, in a write-back or copy-back cache, writes are not immediately mirrored to memory. Instead, the cache tracks which locations have been written over (these locations are marked dirty). The data in these locations are written back to main memory when that data is evicted from the cache. For this reason, a miss in a write-back cache may sometimes require two memory accesses to service: one to first write the dirty location to memory and then another to read the new location from memory.
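The two policies can be contrasted with a small sketch; the entry classes and the dict standing in for main memory are illustrative assumptions, not a real cache implementation:

```python
# Contrast of write-through and write-back for a single cached location.
memory = {0x40: 0}   # made-up main memory

class WriteThroughEntry:
    def __init__(self, addr, data):
        self.addr, self.data = addr, data
    def write(self, data):
        self.data = data
        memory[self.addr] = data            # every write is mirrored to memory immediately

class WriteBackEntry:
    def __init__(self, addr, data):
        self.addr, self.data, self.dirty = addr, data, False
    def write(self, data):
        self.data, self.dirty = data, True  # only the cached copy changes; line is now dirty
    def evict(self):
        if self.dirty:                      # dirty data reaches memory only at eviction time
            memory[self.addr] = self.data

wb = WriteBackEntry(0x40, memory[0x40])
wb.write(7)
print(memory[0x40])   # still 0: memory is stale until the line is evicted
wb.evict()
print(memory[0x40])   # 7: the dirty line has been written back
```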
There are intermediate policies as well. The cache may be write-through, but the writes may be held in a store data queue temporarily, usually so that multiple stores can be processed together (which can reduce bus turnarounds and so improve bus utilization).
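A minimal sketch of such a coalescing store queue, under the assumption of 64-byte lines and an explicit flush trigger (both assumptions made for illustration):

```python
# Store queue that coalesces writes to the same cache line, so several CPU
# stores turn into a single bus transaction when the queue is flushed.
LINE_SIZE = 64
pending = {}                    # line address -> {offset: byte} of buffered stores

def store(addr, byte):
    line = addr - addr % LINE_SIZE
    pending.setdefault(line, {})[addr % LINE_SIZE] = byte

def flush():
    for line, bytes_written in pending.items():
        # one bus write per line, regardless of how many stores touched it
        print(f"write line 0x{line:x}: {len(bytes_written)} byte(s)")
    pending.clear()

store(0x100, 1); store(0x101, 2); store(0x13F, 3)   # three stores, all on one line
flush()                                             # -> write line 0x100: 3 byte(s)
```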
The data in main memory being cached may be changed by other entities (e.g. peripherals using direct memory access (DMA) or another processor in a multiprocessor system), in which case the copy in the cache may become out-of-date or stale. Alternatively, when a CPU in a multi-core processor updates the data in its cache, copies of the data in caches associated with other cores will become stale. Communication protocols between the cache managers which keep the data consistent are known as cache coherence protocols.
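The following sketch illustrates only the basic invalidation idea behind such protocols; it is a deliberate simplification for illustration, not MSI or MESI, and the core names and addresses are made up:

```python
# Invalidation-based coherence sketch: a write by one core removes any
# stale copy held by the other core's cache.
class CoreCache:
    def __init__(self, name, peers):
        self.name, self.peers, self.lines = name, peers, {}
    def read(self, addr, memory):
        if addr not in self.lines:
            self.lines[addr] = memory[addr]
        return self.lines[addr]
    def write(self, addr, value, memory):
        for peer in self.peers:               # broadcast: peers drop their stale copies
            peer.lines.pop(addr, None)
        self.lines[addr] = value
        memory[addr] = value                  # write-through kept here for simplicity

memory = {0x40: 1}
peers0, peers1 = [], []
core0, core1 = CoreCache("core0", peers0), CoreCache("core1", peers1)
peers0.append(core1); peers1.append(core0)
core1.read(0x40, memory)          # core1 caches the old value 1
core0.write(0x40, 2, memory)      # core0's write invalidates core1's copy
print(core1.read(0x40, memory))   # 2, re-fetched rather than served stale
```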
The time taken to fetch a datum from memory (read latency) matters because a CPU will often run out of things to do while waiting for the datum. When a CPU reaches this state, it is called a stall. As CPUs become faster, stalls due to cache misses displace more potential computation; modern CPUs can execute hundreds of instructions in the time taken to fetch a single datum from memory. Various techniques have been employed to keep the CPU busy during this time. Out-of-order CPUs (the Pentium Pro and later Intel designs, for example) attempt to execute independent instructions after the instruction that is waiting for the cache miss data. Another technology, used by many processors, is simultaneous multithreading (SMT), or in Intel's terminology hyper-threading (HT), which allows an alternate thread to use the CPU core while a first thread waits for data to come from main memory.
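To give a feel for the scale of such stalls, here is a back-of-the-envelope calculation; the clock rate, issue width and memory latency are assumed figures, not measurements of any particular CPU:

```python
# Rough count of instruction slots lost to a single main-memory fetch.
clock_hz = 3e9               # assumed 3 GHz core
instructions_per_cycle = 2   # assumed sustained issue width
memory_latency_s = 100e-9    # assumed 100 ns main-memory access time

stall_cycles = clock_hz * memory_latency_s
lost_slots = stall_cycles * instructions_per_cycle
print(f"{stall_cycles:.0f} cycles, about {lost_slots:.0f} instruction slots per miss")
# -> 300 cycles, about 600 instruction slots per miss
```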