Thanks Jeffrey - I can speculate, but you can help us all with a definitive answer!
From: Jeffrey Tippet
Sent: Tuesday, April 14, 2015 10:06 PM
To: Windows System Software Devs Interest List
Any clue why this feature is not supported from Windows 8 onwards?
I removed NetDMA in Windows 8. Hopefully, I have slightly more than a clue 
The purpose of NetDMA was to reduce CPU usage by offloading memcpy to a generic offload engine.
But in networking, we tend to handle fairly small buffers. A typical network buffer tends to be no larger than 1500 bytes. (Yes, we can do LSOs of many kilobytes, but NetDMA was limited to only 2 pages of memory per transaction, so at most, NetDMA should be compared to an 8 KB buffer.)
Which uses less CPU:
- Setting up a DMA offload to the hardware, then continuing when the hardware signals completion with an interrupt
- memcpy 1500 bytes on the CPU
With newer CPUs, the answer tends to be #2.
Since the whole purpose of NetDMA was to reduce CPU usage, and it wasn't even providing a clear CPU reduction, NetDMA was of dubious benefit. Add to that its low adoption (not many vendors implemented a NetDMA provider), and the value of keeping the feature wasn't there. Its competitor, memcpy, is simpler, better-supported, easier to debug, and is sometimes even faster.
Is there any alternative provided for NetDMA?
memcpy.
Suppose I need this support in my solution; what feature has replaced it?
You don't need this support. One common misconception is that NetDMA is "how network adapters should do DMA". The feature has a misleading name. NetDMA is actually "how a generic memcpy offload engine can make its services available to the OS". If you need to do DMA to talk to your hardware, we have APIs for that. More than enough APIs, actually. NDIS, WDF, and WDM all have APIs for DMA. These APIs are fully supported and are not related to NetDMA. See for example https://msdn.microsoft.com/en-us/library/windows/hardware/ff543260(v=vs.85).aspx
If you need to expose your NIC’s memcpy offload engine to the OS… are you sure you NEED this? It will be very difficult to beat the main CPU. Also, building a bunch of hardware to speed up memcpy isn’t really the best use of your time/silicon. The OS isn’t stupid; it’s not sitting there memcpying buffers all the time.
If you need to consume memcpy offload from your own hardware… you don’t need to route things through NetDMA APIs. (In fact, I suggest you avoid it even where the OS supports it.) Just talk directly to your hardware; you have all the access to it.
If you need to consume memcpy offload from somebody else’s hardware… sorry, the OS can’t help you. That was never possible on any OS release; the NetDMA consumer API was private and only used by TCPIP. You can contact the vendor directly, and see if they’re willing to give you a direct API to their hardware. There is no generic hardware-agnostic API to this. NetDMA might have been that hardware-agnostic API in theory, but in practice very few vendors implemented it, so it wasn’t very general.
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev