The PCI Express protocol includes a "no snoop required" attribute in the transaction descriptor.

For a PCIe non-snooped read, the request can go directly to the DRAM controller to obtain the data. The processor caches do not need to be snooped, and the PCIe device does not need to wait for a snoop response before using the data. This can reduce the latency for obtaining the data, which can increase the sustained read bandwidth in the common case that the hardware supports a limited number of concurrent read transactions.

For a PCIe non-snooped store, the request can go directly to the DRAM controller to store the data. The processor caches do not need to be snooped to invalidate any copies of that cache line. This reduces the amount of time that the buffer handling the store is occupied, so that a fixed number of buffers can deliver higher throughput.

These "no snoop required" transactions are typically "safe" for accesses to address ranges for which processor caching is prohibited.

An example use case is a GPU that needs to "borrow" extra memory from the processor(s) for "spill" and "restore" traffic. Only the GPU will be accessing that memory, so it does not need to look in the processor caches to see if any of them has modified copies of the cache lines. The improvement in bandwidth due to the elimination of snooping can improve graphics frame rates.

The PCIe read/write bandwidth improvement is typically the primary reason to implement the "no snoop required" functionality. Secondary benefits include reduced snooping traffic on the processor caches, reduced coherence traffic on the chip-to-chip links in multi-chip systems, and lower overall power consumption.

Since the original message mentioned PCIe-related CBo events, it seems reasonable to assume that the counters are referring to PCIe transactions with the "no snoop required" bit set.
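The bandwidth argument made here for non-snooped reads and stores follows from Little's Law: with a fixed number of transactions in flight, sustained bandwidth is roughly concurrency times payload size divided by the time each transaction occupies its buffer. The following is a minimal sketch of that arithmetic. The function name, the buffer count, the payload size, and both latencies are hypothetical values chosen only for illustration; they are not measurements of any real device.

```python
def sustained_bandwidth(outstanding, payload_bytes, latency_ns):
    """Little's Law estimate of sustained bandwidth in bytes per second.

    outstanding   -- number of transactions the hardware keeps in flight
    payload_bytes -- data moved per transaction
    latency_ns    -- average time each transaction occupies its buffer
    """
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    return outstanding * payload_bytes / (latency_ns * 1e-9)


# Hypothetical values, for illustration only (not measured on real hardware).
OUTSTANDING = 32      # assumed number of concurrent transaction buffers
PAYLOAD_BYTES = 64    # one cache line per transaction
SNOOPED_NS = 500.0    # assumed latency when the caches must be snooped
NO_SNOOP_NS = 400.0   # assumed latency when the snoop is skipped

snooped = sustained_bandwidth(OUTSTANDING, PAYLOAD_BYTES, SNOOPED_NS)
no_snoop = sustained_bandwidth(OUTSTANDING, PAYLOAD_BYTES, NO_SNOOP_NS)

print(f"snooped:  {snooped / 1e9:.2f} GB/s")
print(f"no snoop: {no_snoop / 1e9:.2f} GB/s")
print(f"speedup:  {no_snoop / snooped:.2f}x")
```

With the number of buffers fixed, the speedup equals the ratio of the two latencies, which is why removing the snoop from each transaction shows up directly as bandwidth.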
I am assuming that this is in the context of a Xeon E5-2600 series processor, where the uncore performance monitors are described in Intel document 327043 ("Intel Xeon Processor E5-2600 Product Family Uncore Performance Monitoring Guide").

In section 2.3 (CBo events), Table 2-13 lists a filter for PCIe non-snoop read and non-snoop write operations. These are probably associated with accesses to addresses that are mapped by an MTRR (or by the default memory type) as uncacheable. Since the processors are not allowed to cache these addresses, accesses to them do not require snooping the processor caches. This applies to most memory-mapped IO regions, so it is not surprising to see this in the context of PCIe events in the CBo.

In section 2.4 ("Home Agent" events), Table 2-44 describes a performance counter event related to directory lookups and remote snoops. Intel processors supporting 4 (or more) sockets have a "directory" that keeps track of cache lines that might be cached in another chip. If this "directory" indicates that a particular cache line has not been read by another chip, then local accesses to the line don't need to initiate a global snoop.
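The directory behavior described here can be illustrated with a toy model. This is only a sketch of the idea, not Intel's implementation: the class and method names are hypothetical, and real hardware tracks more state and handles invalidation, both of which are omitted.

```python
class Directory:
    """Toy model of a directory that filters global snoops.

    It records only whether a cache line *might* be cached on another chip.
    """

    def __init__(self):
        self._maybe_remote = set()   # line addresses possibly cached remotely
        self.global_snoops = 0       # number of snoops sent to other chips

    def remote_read(self, line):
        """Another chip reads the line, so it may now hold a copy."""
        self._maybe_remote.add(line)

    def local_access(self, line):
        """Snoop other chips only if the directory says a copy may exist.

        Returns True if a global snoop was issued.
        """
        if line in self._maybe_remote:
            self.global_snoops += 1
            return True
        return False


d = Directory()
d.remote_read(0x1000)                    # line 0x1000 was read by another chip
needs_snoop_a = d.local_access(0x1000)   # possibly cached remotely: must snoop
needs_snoop_b = d.local_access(0x2000)   # never read remotely: no snoop needed
print(needs_snoop_a, needs_snoop_b, d.global_snoops)   # prints: True False 1
```

Only the access to the line that another chip has read triggers a global snoop; the other local access is satisfied without any cross-chip traffic.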