Allocating non-cacheable memory for DMA

First of all, thank you for your detailed and very helpful response.

Your response deals with the Intel architecture. What about the AMD architecture, where there is no bus snooping but a coherency protocol that runs on top of HyperTransport?
PCIe “no-snoop” is translated into the HT “coherent” bit, which presumably generates HT transactions to make sure that the data is not cached in any of the CPUs.

Additionally, we thought of another reason why asking for non-cacheable buffers can increase performance: if a certain region of memory is accessed only infrequently, keeping it out of the cache leaves more room for other, more frequently accessed data. This latter issue is not directly related to I/O and DMA.

Thanks