Our simple (application-specific) throughput test of a bulk USB driver against a connected device shows a throughput loss of ca. 20% (2 MByte/s) when using UMDF instead of a ‘plain’ WDM driver.
The application write block size is ca. 15 KB; the USB connection and device are hi-speed (512-byte USB packet size), and the same USB device on the same hub was used (on a different port).
The limitation to ca. 10 MB/s very likely comes from the connected device; it’s more the size of the delta that worries me.
My UMDF bulk USB driver is derived from the OSR USB Fx2 UMDF sample (WDK RTM); the WDM bulk driver was derived from bulkusb.sys (Windows 2000 DDK).
The UMDF vs. WDM performance tests were done on XP. On another PC running Vista x64, the UMDF driver showed almost identical throughput; the WDM driver has not yet been ported or tested there.
Now I am a bit surprised. As far as I know, UMDF should not have such an adverse effect on performance [1]. So of course I assume I’m doing something wrong or have overlooked something.
Does anybody have more information or any hints about this?
Thanks!
-H
[1] “UMDF can easily saturate a USB 2.0 bus’s maximum speed of 480 megabits/sec” and “No adverse performance impact in existing UMDF drivers” (see DEV095_WH06.ppt, slide 14).
Hagen Patzke wrote:
Our simple (application-specific) throughput test of a
bulk USB driver to a connected device shows a throughput
loss of ca. 20% (2MByte/s) when using UMDF instead of
a ‘plain’ WDM driver.
What’s your CPU usage with the WDM and UMDF drivers? Is there a big difference?
Are you doing reading or writing? Are you pending multiple URBs to the device?
xxxxx@gmail.com wrote:
What’s your CPU usage with the WDM and UMDF drivers? Is there a big difference?
Thanks for the good idea of measuring the CPU usage difference! That
should at least give me a hint where the time difference comes from.
Are you doing reading or writing?
Both. Upload and download rates differ by a fixed percentage, and this
ratio did not change, so I did not mention it separately.
Are you pending multiple URBs to the device?
Overlapped I/O would be possible and the proper calls for it are in place
on the host side, but in the test code we also wait for the device’s ACK
for every block before sending the next one - so in fact it is not used.
With a modified version of RwBulk.exe for our device (which I’m currently
writing), I want to compare e.g. overlapped vs. direct I/O and see
whether there’s any significant difference.
BTW, I noticed that unplugging a USB hub that was plugged into the
same root hub actually decreased throughput. As there was no
device attached to that hub, I would have expected a very small
throughput increase (less polling necessary), so IMHO there
must be a timing problem. I’ll try to get a look at the bus to see
what happens.
Tomorrow or Friday I’ll try to check out your questions/suggestions (CPU
usage, overlapped I/O used in the driver), and post the update here.
Thanks - more ideas welcome! 
-H